42 changes: 25 additions & 17 deletions web_programming/crawl_google_results.py
@@ -1,24 +1,32 @@
import sys
import webbrowser
from sys import argv
from urllib.parse import quote

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from requests import get
Member

Suggested change
-from requests import get
+import requests

A function called get() is not self-documenting and may confuse the reader ("Get what from where?"), so let's use requests.get() instead.

Contributor Author

ok sure!

Contributor Author

from requests import get → import requests
Changed!


if __name__ == "__main__":
    if len(argv) > 1:
        query = "%20".join(argv[1:])
    else:
        query = quote(str(input("Search: ")))

    print("Googling.....")
    url = "https://www.google.com/search?q=" + " ".join(sys.argv[1:])
    res = requests.get(url, headers={"UserAgent": UserAgent().random})
    # res.raise_for_status()
    with open("project1a.html", "wb") as out_file:  # only for knowing the class
        for data in res.iter_content(10000):
            out_file.write(data)
    soup = BeautifulSoup(res.text, "html.parser")
    links = list(soup.select(".eZt8xd"))[:5]

    print(len(links))
    for link in links:
        if link.text == "Maps":
            webbrowser.open(link.get("href"))
        else:
            webbrowser.open(f"http://google.com{link.get('href')}")
    url = f"https://www.google.com/search?q={query}&num=2"

    res = get(
        url,
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0"
Member

@cclauss cclauss Oct 13, 2022

Tests are failing on

  • web_programming/crawl_google_results.py:21:89: E501 line too long (106 > 88 characters)

Why is this version an improvement on the original?
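
One way to satisfy the E501 limit without changing the header value is to split the string literal across lines inside parentheses, since Python joins adjacent literals at compile time. A minimal sketch follows; the names USER_AGENT and HEADERS are illustrative and do not appear in the PR.

```python
# Adjacent string literals inside parentheses are concatenated at compile
# time, so each source line stays under the 88-character limit.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) "
    "Gecko/20100101 Firefox/98.0"
)

# Hypothetical headers dict that could be passed as headers=HEADERS.
HEADERS = {"User-Agent": USER_AGENT}
```

The resulting value is identical to the single-line literal in the diff; only the source layout changes.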

Contributor Author

@cj-praveen cj-praveen Oct 13, 2022

> Why is this version an improvement on the original?

Yeah!

Member

Why is it better?

Contributor Author

@cj-praveen cj-praveen Oct 13, 2022

> Why is it better?

The original version simply types your query into Google. This version actually scrapes the result link from Google and opens it for you, which saves time.

Running the original program:

C:\> python crawl_google_results.py how to code on ipad
Googling.....
3

It opens these three links (URLs), shown below:

Running my program:

C:\> python crawl_google_results.py how to code on ipad
Googling.....

It opens the search result link (URL).
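
For reference, the query handling in the new version can be sketched with the standard library alone. This is a minimal sketch, not the PR's code: build_search_url is a hypothetical helper, and it uses urllib.parse.urlencode so that command-line words and typed input are escaped the same way (the diff joins argv with "%20" but only passes typed input through quote). The num=2 default mirrors the value in the diff.

```python
from urllib.parse import urlencode


def build_search_url(words: list[str], num: int = 2) -> str:
    """Return a Google search URL for the given words (hypothetical helper)."""
    # urlencode percent-escapes reserved characters and encodes spaces as "+".
    params = urlencode({"q": " ".join(words), "num": num})
    return f"https://www.google.com/search?{params}"


print(build_search_url(["how", "to", "code", "on", "ipad"]))
```

With this approach a query containing characters such as "&" or "+" cannot break out of the q parameter, whichever way the query was supplied.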

Member

Understood. Let's move these changes to a new, separate file like launch_google_results.py or open_google_results.py.

Contributor Author

Yeah! Sounds good.

Contributor Author

@cclauss So can you merge now?

        },
    )

    link = (
        BeautifulSoup(res.text, "html.parser")
        .find("div", attrs={"class": "yuRUbf"})
        .find("a")
        .get("href")
    )

    webbrowser.open(link)
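
Note that the chained .find() calls raise AttributeError when no matching div is present, for example if the markup changes. A minimal sketch of a guarded lookup follows. It is an assumption-laden illustration, not the PR's method: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra dependencies, FirstResultLink and first_result_link are hypothetical names, and the class name yuRUbf is taken from the diff and may not match current Google markup.

```python
from __future__ import annotations

from html.parser import HTMLParser


class FirstResultLink(HTMLParser):
    """Record the href of the first <a> inside a <div class="yuRUbf">."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # div nesting depth inside the target div; 0 = outside
        self.href: str | None = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter the target div, or track divs nested inside it.
            if self.depth or "yuRUbf" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def first_result_link(html: str) -> str | None:
    """Return the first result href, or None when no match is found."""
    parser = FirstResultLink()
    parser.feed(html)
    return parser.href


sample = (
    '<div class="g"><div class="yuRUbf">'
    '<a href="https://example.com/">x</a></div></div>'
)
print(first_result_link(sample))
print(first_result_link("<p>no results</p>"))
```

A caller could then skip webbrowser.open() and print a message when the helper returns None, instead of crashing.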