Web Scraping with BeautifulSoup & Requests
Extract data from websites using BeautifulSoup, Requests, and Selenium.
The Scraping Toolkit
- Requests: For making HTTP calls and getting raw HTML.
- BeautifulSoup: For parsing and navigating the HTML tree.
- Selenium: For scraping dynamic, JavaScript-rendered sites.
Basic Scraping Flow
import requests
from bs4 import BeautifulSoup

url = "https://example.com"
resp = requests.get(url)
resp.raise_for_status()  # fail fast on 4xx/5xx responses
soup = BeautifulSoup(resp.text, 'html.parser')
title = soup.find('h1').text
print(title)
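One detail the basic flow glosses over: find() returns None when no element matches, so calling .text directly can raise an AttributeError on pages without an h1. A minimal sketch of the defensive pattern, using an inline (hypothetical) HTML string instead of a live request:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a fetched page.
html = "<html><body><h1>Breaking News</h1><p>Story text.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Guard against find() returning None before reading the text.
heading = soup.find("h1")
title = heading.get_text(strip=True) if heading else "(no title)"

# An element that does not exist on this page:
missing = soup.find("h2")
subtitle = missing.get_text(strip=True) if missing else "(no subtitle)"

print(title, "|", subtitle)
```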
Navigating the Tree
Find elements by tag, class, or ID.
# Find all links
links = soup.find_all('a')
# Find by class
articles = soup.find_all('div', class_='article-content')
# CSS Selectors
header_links = soup.select('nav ul li a')
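Once you have the link tags, you usually want their href attributes rather than the tags themselves. Tags support dict-style attribute access, and .get() avoids a KeyError when an attribute is missing; urljoin from the standard library resolves relative paths against the site's base URL. A sketch using hypothetical nav markup:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical nav markup; a real page would come from requests.get().
html = """
<nav><ul>
  <li><a href="/home">Home</a></li>
  <li><a href="/about">About</a></li>
  <li><a href="https://other.example.org/blog">Blog</a></li>
</ul></nav>
"""
soup = BeautifulSoup(html, "html.parser")

base = "https://example.com"
# .get('href', '') returns '' instead of raising when href is absent;
# urljoin turns relative paths into absolute URLs and leaves absolute ones alone.
urls = [urljoin(base, a.get("href", "")) for a in soup.select("nav ul li a")]
print(urls)
```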
Respecting Robots.txt
Always check a site's /robots.txt file to see which paths it allows crawlers to access. Be a good bot: add delays between requests and set a descriptive User-Agent.
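Python's standard library can parse robots.txt rules for you via urllib.robotparser. A sketch using a hypothetical robots.txt body fed in directly (in practice you would fetch https://example.com/robots.txt first); the bot name and URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents.
robots_txt = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

agent = "MyScraperBot/1.0"  # a descriptive User-Agent you would also send as a header
allowed = rp.can_fetch(agent, "https://example.com/articles")
blocked = rp.can_fetch(agent, "https://example.com/admin/users")
delay = rp.crawl_delay(agent)  # seconds to sleep between requests, if specified

print(allowed, blocked, delay)
```

A polite request loop would then pass {"User-Agent": agent} as the headers argument to requests.get() and call time.sleep(delay) between fetches.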
Practice (30 minutes)
- Install dependencies: pip install requests beautifulsoup4.
- Scrape a news website and print the titles of the top 5 articles.
- Extract all image URLs from a landing page.
- Save your scraped data to a JSON file.
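For the last task, the standard-library json module is all you need. A minimal sketch with hypothetical records standing in for your scraped results:

```python
import json

# Hypothetical scraped records; yours would come from soup.find_all().
articles = [
    {"title": "First headline", "url": "https://example.com/a/1"},
    {"title": "Second headline", "url": "https://example.com/a/2"},
]

# ensure_ascii=False keeps any non-ASCII headline text readable in the file.
with open("articles.json", "w", encoding="utf-8") as f:
    json.dump(articles, f, indent=2, ensure_ascii=False)

# Round-trip check: the file parses back to the same structure.
with open("articles.json", encoding="utf-8") as f:
    loaded = json.load(f)
print(len(loaded))
```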