Scrape Images from Google:

Step-by-Step Guide with Python (Requests and bs4)


Introduction :

In this tutorial, we will explore how to scrape images from Google using the Python libraries Requests and BeautifulSoup. Web scraping lets us automate the extraction of data from websites; here, we'll focus on retrieving images from Google search results.

By the end of this tutorial, you will have a basic understanding of how to scrape images and save them to your local machine.

Let's dive in!

Who Can Do This?

This tutorial is suitable for Python enthusiasts, developers, and data scientists looking to automate the process of retrieving images from Google search results.

Prerequisites:

Before we begin, make sure you have Python installed on your machine, along with the Requests and BeautifulSoup libraries. You can install them using the following command:

pip install requests beautifulsoup4

Step 1:

Importing the Required Libraries:

First, let's import the necessary libraries:

import requests
from bs4 import BeautifulSoup
import os
  • The requests module lets the code send HTTP requests to web servers and receive their responses.

  • The BeautifulSoup module parses HTML and XML documents, making it easy to navigate their structure and extract specific data from them.

  • The os module lets the code interact with the operating system, for tasks like creating, accessing, and deleting files and directories.
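Before pointing these libraries at Google, here is a minimal offline sketch of bs4 and os at work (the HTML snippet and the demo_images folder name are made up for illustration; no network call is needed, so requests is omitted):

```python
import os
from bs4 import BeautifulSoup

# A tiny HTML snippet standing in for a real page (not Google's actual markup)
html = '<html><body><img src="https://example.com/a.jpg"><img src="https://example.com/b.jpg"></body></html>'

# bs4: parse the document and pull out every img tag's src attribute
soup = BeautifulSoup(html, "html.parser")
srcs = [img["src"] for img in soup.find_all("img")]
print(srcs)

# os: prepare a folder to hold downloads (no error if it already exists)
os.makedirs("demo_images", exist_ok=True)
print(os.path.isdir("demo_images"))
```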

Step 2:

Sending a Request to Google Images

To start scraping images, we need to send a request to the Google Images page. We'll use the requests library to make the request and retrieve the HTML content:

query = input("Enter your search query: ")
url = f"https://www.google.com/search?q={query}&tbm=isch"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
}

response = requests.get(url, headers=headers)
  1. First, we prompt the user to enter their search query using the input function and store it in the query variable.

  2. Next, we construct the URL by appending the search query to the base Google Images URL. This is done using an f-string (f"https://www.google.com/search?q={query}&tbm=isch"). The tbm=isch parameter ensures that we get image search results.

  3. We set the headers dictionary to mimic a web browser's User-Agent header. Without a browser-like User-Agent, Google may serve a stripped-down page or refuse the request entirely.

  4. Finally, we make a GET request to the constructed URL using requests.get() and store the response in the response variable.
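One caveat: the f-string pastes the query into the URL verbatim, so characters like & or # in the query can corrupt it. A hedged sketch that percent-encodes the query first with the standard library's quote_plus (the example query is invented):

```python
from urllib.parse import quote_plus

query = "red pandas & cubs"  # example query with characters that need encoding
url = f"https://www.google.com/search?q={quote_plus(query)}&tbm=isch"
print(url)
```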

Step 3:

Parsing the HTML Content

Next, we'll use BeautifulSoup to parse the HTML content and extract the image URLs:

soup = BeautifulSoup(response.content, "html.parser")
image_elements = soup.find_all("img")

image_urls = []
for image in image_elements:
    src = image.get("src")
    if src and src.startswith("http"):
        image_urls.append(src)

The code uses the BeautifulSoup library to parse the HTML content and extract image URLs:

  1. It creates a BeautifulSoup object, soup, by parsing the HTML content.

  2. It finds all the image elements in the parsed HTML and stores them in the image_elements list.

  3. It initializes an empty list, image_urls, to store the image URLs.

  4. It loops through each image element and reads the "src" attribute with image.get("src"), which returns None when the attribute is missing. Only http(s) URLs are kept; inline data: URIs, which requests cannot download, are skipped.

  5. After executing the code, the image_urls list will contain the URLs of all downloadable images found in the HTML content.
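On the live results page, many img tags hold tiny base64 data: URIs rather than http URLs, and some have no src attribute at all. A minimal sketch of the filtering in isolation, using made-up sample values rather than Google's actual markup:

```python
# Sample src values of the kinds seen in image-search markup (illustrative only)
srcs = [
    "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP",  # inline placeholder image
    "https://encrypted-tbn0.gstatic.com/images?q=example",  # a thumbnail-style URL
    None,  # an img tag with no src attribute
]

# Keep only values that are present and downloadable over HTTP(S)
image_urls = [s for s in srcs if s and s.startswith("http")]
print(image_urls)
```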

Step 4:

Downloading and Saving the Images

Now that we have a list of image URLs, we can proceed to download and save them on our local machine. We'll create a directory to store the images and then iterate over the URLs, downloading each image using the requests library:

os.makedirs("images", exist_ok=True)

for i, image_url in enumerate(image_urls):
    response = requests.get(image_url, timeout=10)
    file_name = f"images/image{i}.jpg"
    with open(file_name, "wb") as file:
        file.write(response.content)
    print(f"Image {i+1}/{len(image_urls)} downloaded and saved as {file_name}")
  1. The code creates a directory named "images" using the os.makedirs function. The exist_ok=True parameter ensures that the directory is only created if it doesn't already exist.

  2. It iterates through each image URL in the image_urls list using the enumerate function. The enumerate function provides an index (i) along with each image URL.

  3. For each image URL, it sends a request using the requests.get function to retrieve the image content.

  4. It generates a file name for the image based on the index, such as "image0.jpg", "image1.jpg", etc.

  5. It opens the file in binary mode using the open function, with the file name and "wb" mode (write binary).

  6. It writes the content of the image response to the opened file using the write method of the file object.

  7. It prints a message indicating the progress of the download, including the image index (i+1) and the total number of images (len(image_urls)).
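The loop above saves every file with a .jpg extension even when the server actually returned a PNG or WebP. A small hedged helper (guess_extension is a name coined here, not part of any library) that picks the extension from a response's Content-Type header instead:

```python
def guess_extension(content_type):
    """Map a Content-Type header value to a file extension, defaulting to .jpg."""
    mapping = {
        "image/jpeg": ".jpg",
        "image/png": ".png",
        "image/gif": ".gif",
        "image/webp": ".webp",
    }
    # Strip any parameters such as "; charset=binary" before looking up
    return mapping.get(content_type.split(";")[0].strip(), ".jpg")

print(guess_extension("image/png"))
print(guess_extension("image/jpeg; charset=binary"))
```

In the download loop you would call it as guess_extension(response.headers.get("Content-Type", "")) when building file_name.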

Here's a single snippet that combines the entire program for scraping images from Google with Python:

import requests
from bs4 import BeautifulSoup
import os

def scrape_images_from_google(query):
    url = f"https://www.google.com/search?q={query}&tbm=isch"

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
    }

    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, "html.parser")
    images = soup.find_all("img")

    os.makedirs("images", exist_ok=True)

    count = 0
    for image in images:
        src = image.get("src")
        if not src or not src.startswith("http"):
            continue  # skip missing sources and inline data: URIs
        image_data = requests.get(src, timeout=10).content
        with open(f"images/image_{count}.jpg", "wb") as file:
            file.write(image_data)
        count += 1

    print(f"Image scraping completed! {count} images saved.")

query = input("Enter your search query: ")
scrape_images_from_google(query)

Conclusion:

In this tutorial, we explored how to scrape images from Google using the Requests and BeautifulSoup libraries in Python. We learned how to send a request to Google Images, parse the HTML content, extract the image URLs, and save the images to our local machine. Web scraping opens up a world of possibilities for automating data extraction tasks, and with the right techniques, you can scrape images and other information from various websites.

I hope you found this tutorial helpful and have a great time exploring the world of web scraping!