Image crawling with Python on Chrome browser
2022-07-09
Step-by-step instructions
1. Install selenium and beautiful soup on terminal
pip install bs4
pip install selenium
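Note: this post uses the Selenium 3 style find_element_by_* helpers, which were removed in Selenium 4.3. If pip pulls in a newer Selenium, either pin an older release or use the find_element(By.…) form shown in the note under step 3:
pip install "selenium<4"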
2. Import urllib.request, BeautifulSoup, webdriver, Keys, and time
# google.py
import urllib.request  # fetches and saves images from URLs
from bs4 import BeautifulSoup  # parses the page source
from selenium import webdriver  # drives the Chrome browser
from selenium.webdriver.common.keys import Keys  # keyboard keys such as END
import time  # .sleep() lets you pause between actions while crawling images
3. Instantiate the Chrome browser (Windows 10-based)
If you don't have ChromeDriver yet, download and install it first from the ChromeDriver site.
The examples below use my local path.
# google.py
# instantiate browser
binary = r'C:\Users\HP\Desktop\chromedriver\chromedriver.exe'  # path to ChromeDriver
browser = webdriver.Chrome(binary)  # initialize the browser
# open a new, ready-to-search Google Images tab
browser.get("https://www.google.com/imghp?hl=en&search?hl=en&q=")
# "q" is the name attribute of the search input
elem = browser.find_element_by_name("q")
elem.send_keys("golden retriever")  # the keywords you want to search for
elem.submit()  # submit the search form
The r prefix in front of the ChromeDriver path marks it as a raw string, which avoids backslash-escaping errors:
r'C:\Users\HP\Desktop\chromedriver\chromedriver.exe'
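If you are running Selenium 4 instead, here is a minimal sketch of the equivalent setup (assuming the same driver path; Service and By replace the removed positional-path argument and find_element_by_* helpers):
# google.py (Selenium 4 variant -- a sketch, not the original post's code)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

binary = r'C:\Users\HP\Desktop\chromedriver\chromedriver.exe'
browser = webdriver.Chrome(service=Service(binary))  # the driver path now goes through Service
browser.get("https://www.google.com/imghp?hl=en&search?hl=en&q=")
elem = browser.find_element(By.NAME, "q")  # replaces find_element_by_name("q")
elem.send_keys("golden retriever")
elem.submit()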
4. Load all available images in the browser
# google.py
# scroll to the bottom repeatedly so lazily loaded images appear
for i in range(1, 10):
    # find the body tag and press the END key; runs 9 times (i = 1..9)
    # Keys.END simulates pressing the END key
    browser.find_element_by_xpath("//body").send_keys(Keys.END)
    try:
        # "smb" is the id of the "Show more results" button
        browser.find_element_by_id("smb").click()
        time.sleep(5)  # wait 5 seconds for new results to load
    except:
        time.sleep(5)  # button not present yet; just wait
time.sleep(5)  # final wait so the last batch of images finishes loading
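If the END keystrokes don't trigger lazy loading reliably, scrolling with JavaScript until the page height stops growing is a common alternative. A minimal sketch (my addition, not in the original script):
last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give new thumbnails time to load
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:  # height stopped growing: nothing more to load
        break
    last_height = new_height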
5. Fetch the image URLs and download them
Initialize html and soup
# google.py
html = browser.page_source  # get the page source from the browser
# parse it with BeautifulSoup so the image URLs can be extracted
soup = BeautifulSoup(html, "html.parser")
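Before writing the fetch functions, a quick sanity check (my addition, not in the original post) confirms the rg_i class still matches the thumbnails; Google changes its markup regularly:
print(len(soup.find_all("img", class_="rg_i")))  # should be well above zero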
Create methods for fetching and downloading
# google.py
# method for collecting the image URLs into a list
def fetch_list_url():
    params = []  # list that will hold the image URLs
    # find all img tags with class rg_i (Google Images thumbnails)
    imgList = soup.find_all("img", class_="rg_i")
    # extract the image source (URL) from each tag
    for im in imgList:
        try:
            params.append(im["src"])  # loaded thumbnails carry src
        except KeyError:
            params.append(im["data-src"])  # lazily loaded ones carry data-src instead
    return params
# method for downloading the images from the collected URLs
def fetch_detail_url():
    params = fetch_list_url()  # the list of image URLs
    # print(params)
    a = 1  # auto-incrementing numeric file name
    for p in params:
        # p is one image URL; save it to the download path
        # with the auto-incrementing name and a .jpg extension
        urllib.request.urlretrieve(p, r"C:\Users\HP\Documents\python_project\img/" + str(a) + ".jpg")
        a += 1  # increment the file-name counter

# calling fetch_detail_url() runs fetch_list_url() first inside it
fetch_detail_url()
browser.quit() #close the browser
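For a more robust download step, here is a hardened sketch (my addition; the helper name is hypothetical and it assumes the same folder layout) that creates the target folder if needed and keeps going when a single download fails:
import os

def fetch_detail_url_safe(download_dir=r"C:\Users\HP\Documents\python_project\img"):
    os.makedirs(download_dir, exist_ok=True)  # create the folder if it is missing
    for a, p in enumerate(fetch_list_url(), start=1):
        try:
            urllib.request.urlretrieve(p, os.path.join(download_dir, str(a) + ".jpg"))
        except Exception as e:
            print("skipped image", a, "-", e)  # one bad URL should not stop the crawl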
Full code here