๐Ÿ Python Examples - Comprehensive Code Library
โ† Back to PranavKulkarni.org
Lesson 4 · Web Development

Web Scraping with BeautifulSoup & Requests

Extract data from websites using BeautifulSoup, Requests, and Selenium.

The Scraping Toolkit

  • Requests: For making HTTP calls and getting raw HTML.
  • BeautifulSoup: For parsing and navigating the HTML tree.
  • Selenium: For scraping dynamic, JavaScript-rendered sites.

Basic Scraping Flow

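import requests
from bs4 import BeautifulSoup

url = "https://example.com"
# Set a timeout so a stalled server cannot hang the script
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # Raise an error on 4xx/5xx responses

soup = BeautifulSoup(resp.text, 'html.parser')
# find() returns None when the page has no <h1>, so check before reading text
h1 = soup.find('h1')
title = h1.get_text(strip=True) if h1 else None
print(title)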

Navigating the Tree

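Find elements by tag, class, ID, or CSS selector. find_all() and select() return a list of every match; find() returns only the first match, or None when nothing matches.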

# Find all links
links = soup.find_all('a')

# Find by class
articles = soup.find_all('div', class_='article-content')

# CSS Selectors
header_links = soup.select('nav ul li a')
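
To see what these lookups return, here is a minimal sketch that runs them against a small inline HTML string instead of a live site. The markup, the "main" ID, and the variable names are made up for illustration; only the BeautifulSoup calls themselves come from the library.

```python
from bs4 import BeautifulSoup

# Made-up HTML so the example runs without a network request
html = """
<nav><ul>
  <li><a href="/home">Home</a></li>
  <li><a href="/about">About</a></li>
</ul></nav>
<div id="main">
  <div class="article-content"><a href="/post/1">First post</a></div>
  <div class="article-content"><a href="/post/2">Second post</a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# By tag: every <a> element, then read each href attribute
all_hrefs = [a.get("href") for a in soup.find_all("a")]

# By class: class_ has a trailing underscore because class is a Python keyword
articles = soup.find_all("div", class_="article-content")
article_titles = [div.get_text(strip=True) for div in articles]

# By ID: find() returns the first match, or None if nothing matches
main = soup.find(id="main")

# By CSS selector: only the links inside the nav list
nav_hrefs = [a["href"] for a in soup.select("nav ul li a")]

print(all_hrefs)
print(article_titles)
print(nav_hrefs)
```

Using a.get("href") rather than a["href"] is the safer choice on real pages, since it returns None for a link without an href instead of raising a KeyError.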

Respecting Robots.txt

Always check a site's /robots.txt file to see which paths it allows bots to fetch. Be a good bot: add delays between requests and set a descriptive User-Agent that identifies your scraper.
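
One way to put this advice into practice is sketched below, using the standard library's urllib.robotparser together with Requests. The User-Agent string, the helper names build_robot_rules and fetch_allowed, the one-second delay, and the sample rules are all assumptions chosen for illustration, not values any particular site requires.

```python
import time
from urllib.robotparser import RobotFileParser

import requests

# Placeholder identity; replace with your own project name and contact
USER_AGENT = "my-learning-scraper/0.1 (contact: you@example.com)"


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a rule checker."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def fetch_allowed(urls, rules, delay_seconds=1.0):
    """Fetch only the URLs robots.txt permits, pausing between requests."""
    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})
    pages = {}
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # Skip paths the site asks bots to avoid
        resp = session.get(url, timeout=10)
        resp.raise_for_status()
        pages[url] = resp.text
        time.sleep(delay_seconds)  # Be polite: wait before the next request
    return pages


# Made-up rules for illustration; a real scraper would download /robots.txt
sample_rules = build_robot_rules("User-agent: *\nDisallow: /private/\n")
print(sample_rules.can_fetch(USER_AGENT, "https://example.com/articles"))
print(sample_rules.can_fetch(USER_AGENT, "https://example.com/private/data"))
```

With the sample rules, the first check is allowed and the second is blocked. In a real script you would fetch the site's own robots.txt text first (for example with requests.get) and pass it to build_robot_rules before calling fetch_allowed.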

✅ Practice (30 minutes)

  • Install dependencies: pip install requests beautifulsoup4.
  • Scrape a news website and print the titles of the top 5 articles.
  • Extract all image URLs from a landing page.
  • Save your scraped data to a JSON file.
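
For the last task, a possible starting point is sketched here. It assumes your scraper produces a list of dictionaries; the save_as_json helper, the articles.json filename, and the sample records are hypothetical stand-ins for your own results.

```python
import json
from pathlib import Path


def save_as_json(records, path):
    """Write a list of dicts to a UTF-8 JSON file and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English text readable in the file
        json.dump(records, f, ensure_ascii=False, indent=2)
    return path


# Made-up records standing in for real scraped results
articles = [
    {"title": "Example headline one", "url": "https://example.com/1"},
    {"title": "Example headline two", "url": "https://example.com/2"},
]
out_path = save_as_json(articles, "articles.json")
print(f"Saved {len(articles)} records to {out_path}")
```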