The internet holds a huge amount of data that can be used for many different purposes. To collect this data, we need to know how to scrape it from a website.
Web scraping is the process of extracting and collecting data from websites and storing it on a local machine or in a database.
In this section, we will use the BeautifulSoup and requests packages to scrape data. The BeautifulSoup version we are using is 4 (beautifulsoup4).
To start scraping websites, you need requests, beautifulsoup4, and a website to scrape.
pip install requests
pip install beautifulsoup4
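If you want to confirm that the packages installed correctly (and see which versions you have), an optional quick check from Python looks like this; the exact version numbers printed depend on your environment:
import requests
import bs4

print(requests.__version__)  # e.g. 2.31.0 -- your version may differ
print(bs4.__version__)       # should start with 4.x for beautifulsoup4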
To scrape data from websites, a basic understanding of HTML tags and CSS selectors is needed. We target content on a website using HTML tags, classes, and/or ids. Let us import the requests and BeautifulSoup modules:
import requests
from bs4 import BeautifulSoup
Let us declare a url variable for the website we are going to scrape.
import requests
from bs4 import BeautifulSoup
url = 'https://archive.ics.uci.edu/ml/datasets.php'
# Let's use the requests get method to fetch the data from the url
response = requests.get(url)
# Let's check the status code of the response
status = response.status_code
print(status) # 200 means the fetching was successful
200
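Any status code other than 200 means the fetching failed. As an optional, defensive variation of the snippet above, you can let requests raise an exception for error responses instead of checking the number yourself:
import requests

url = 'https://archive.ics.uci.edu/ml/datasets.php'
response = requests.get(url)

try:
    # raise_for_status() raises requests.exceptions.HTTPError for 4xx/5xx responses
    response.raise_for_status()
except requests.exceptions.HTTPError as err:
    print('Request failed:', err)
else:
    print('Fetched', len(response.content), 'bytes')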
Using BeautifulSoup to parse content from the page
import requests
from bs4 import BeautifulSoup
url = 'https://archive.ics.uci.edu/ml/datasets.php'
response = requests.get(url)
content = response.content # we get all the HTML content from the website
soup = BeautifulSoup(content, 'html.parser') # BeautifulSoup parses the HTML content
print(soup.title) # <title>UCI Machine Learning Repository: Data Sets</title>
print(soup.title.get_text()) # UCI Machine Learning Repository: Data Sets
print(soup.body) # prints the whole <body> of the page
print(response.status_code)
tables = soup.find_all('table', {'cellpadding': '3'})
# We are targeting the table whose cellpadding attribute has the value 3
# We can select using an id, a class, or an HTML tag; for more information, check the BeautifulSoup documentation
table = tables[0]  # find_all returns a list, so we take the first table out of it
for td in table.find('tr').find_all('td'):
    print(td.text)
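As the comment above says, we can also select elements by tag, class, or id. Here is a small illustrative sketch; the class and id values used below are made-up placeholders, not real elements of the UCI page:
import requests
from bs4 import BeautifulSoup

url = 'https://archive.ics.uci.edu/ml/datasets.php'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# by tag name: the first <h1> element on the page
heading = soup.find('h1')

# by class: all <p> elements with class="normal" (a hypothetical class name)
by_class = soup.find_all('p', {'class': 'normal'})

# by id: the element with id="main" (a hypothetical id)
by_id = soup.find(id='main')

# CSS selectors also work, via select(): all links inside tables
links = soup.select('table a')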
If you run the scraping code above, you will see that the extraction is only half done. You can continue it yourself, since it is part of exercise 1. For reference, check the BeautifulSoup documentation.
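As a hint for the exercise, one possible way to continue is to loop over every row of the table instead of only the first one. This sketch reuses the table variable from the snippet above:
# Loop over every <tr> in the table, not just the first one
rows = []
for tr in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:  # skip rows without <td> cells (e.g. header rows)
        rows.append(cells)

print(len(rows), 'rows extracted')
print(rows[:2])  # preview of the first two rows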
Now do some exercises for your brain and muscles.