Web Scraping with Python

What is Web Scraping?
Web scraping, also called web data mining or web harvesting, is the process of building an agent that can automatically download, extract, parse, and organize useful information from the web.
Libraries Used in Web Scraping
Python offers different libraries for different purposes. We will be using the following:
- BeautifulSoup: BeautifulSoup is a Python library for parsing HTML and XML documents. It builds parse trees, which make it easy to extract information quickly.
- Pandas: Pandas is a library for manipulating and analysing data. It is commonly used to organize the extracted data and save it in the desired format.
- Requests: The Requests module lets you send HTTP requests from Python. The response data from an HTTP request is returned as a Response object (content, encoding, status code, etc.); a small example is shown right after this list.
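As a quick, minimal sketch of how these three libraries fit together (https://example.com is only a placeholder URL), you can fetch a page, inspect the Response object, and parse the HTML:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch a page (example.com is only a placeholder URL)
response = requests.get("https://example.com")

# The Response object exposes the data mentioned above
print(response.status_code)   # HTTP status code, e.g. 200
print(response.encoding)      # character encoding reported by the server
print(len(response.content))  # size of the raw response body in bytes

# Parse the HTML and pull out one piece of information
soup = BeautifulSoup(response.content, "html.parser")
print(soup.title.text)        # text of the page's <title> tag

# Store the extracted value in a pandas DataFrame
df = pd.DataFrame({"title": [soup.title.text]})
print(df)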
Steps to do Web Scraping.
1. Find the URL or website you want to scrape
Here I have used a Flipkart search results page to extract the product names and prices. The URL is shown in the code below.
2. Find the information you want to extract
Here, we will extract the product names and their prices.
3. Write the code.
First, create a new Python notebook in Google Colab.
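Google Colab normally ships with all three libraries preinstalled; if any of them are missing from your environment, they can be installed with pip (the leading "!" runs a shell command inside a notebook cell):

# Install the libraries used in this tutorial, if they are not already available
!pip install requests beautifulsoup4 pandas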
Import libraries:
import requests
from bs4 import BeautifulSoup
import pandas as pd
Create empty lists to store the details. Here, two empty lists are created to store the product names and prices.
products=[]
prices=[]
Open the URL and extract the data from the website
url="https://www.flipkart.com/search?q=phones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
response = requests.get(url)
htmlcontent = response.content
soup = BeautifulSoup(htmlcontent,"html.parser")
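Some sites reject requests that do not look like they come from a browser, so it is worth checking the status code of the response. If the request fails or returns an empty page, a browser-like User-Agent header can be passed along with it; the header string below is only an example:

print(response.status_code)  # 200 means the page was fetched successfully

# Optional: retry with a browser-like User-Agent header (the string is only an example)
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get(url, headers=headers)
htmlcontent = response.content
soup = BeautifulSoup(htmlcontent, "html.parser")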
Using the find() and find_all() methods of BeautifulSoup, we extract the data and append it to the lists. The class names used below come from Flipkart's page markup and may change over time.
# Each search result card on the page is an <a> tag with this class
for a in soup.find_all('a', attrs={'class': '_1fQZEK'}):
    name = a.find('div', attrs={'class': '_4rR01T'})
    price = a.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    products.append(name.text)
    prices.append(price.text)

# Strip the leading currency symbol from each price
final = []
for i in prices:
    i = i[1:]
    final.append(i)
4. Store the data in a sheet. We store the data in comma-separated values (CSV) format.
df = pd.DataFrame({'Product Name':products,'Prices':final})
df.head()
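df.head() only previews the first few rows of the DataFrame. To actually save the data as a CSV file, pandas' to_csv method can be used; the file name below is just an example:

# Save the DataFrame to a CSV file (the file name is an example)
df.to_csv('products.csv', index=False)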

Source Code: GitHub link