As of August 2016, the free Google API that this library used is no longer available. As a result, googlesearch is no longer in PyPI. See here for more information.
##### Search the web with Python
GoogleSearch is a Python 2 library for searching the web using Google's Custom Search JSON/Atom API. It wraps Google's API in a simple Python interface.
>>> from googlesearch import GoogleSearch
>>> gs = GoogleSearch("An intriguing query")
>>> for url in gs.top_urls():
...     print url
pip install -U googlesearch
Print a list of the top hits for a query, like a miniature first page of Google results.
from googlesearch import GoogleSearch
from pprint import pprint
gs = GoogleSearch("Bacon")
for hit in gs.top_results():
    pprint(hit)
{u'GsearchResultClass': u'GwebSearch',
u'cacheUrl': u'',
u'content': u'<b>Bacon</b> is a meat product prepared from a pig and usually cured. It is first cured \nusing large quantities of salt, either in a brine or in a dry packing; the result is \nfresh\xa0...',
u'title': u'<b>Bacon</b> - Wikipedia, the free encyclopedia',
u'titleNoFormatting': u'Bacon - Wikipedia, the free encyclopedia',
u'unescapedUrl': u'',
u'url': u'',
u'visibleUrl': u''}
{u'GsearchResultClass': u'GwebSearch',
u'cacheUrl': u'',
u'content': u'Francis <b>Bacon</b>, 1st Viscount St. Alban, QC (/\u02c8be\u026ak\u0259n/; 22 January 1561 \u2013 9 April \n1626), was an English philosopher, statesman, scientist, jurist, orator, essayist\xa0...',
u'title': u'Francis <b>Bacon</b> - Wikipedia, the free encyclopedia',
u'titleNoFormatting': u'Francis Bacon - Wikipedia, the free encyclopedia',
u'unescapedUrl': u'',
u'url': u'',
u'visibleUrl': u''}
{u'GsearchResultClass': u'GwebSearch',
u'cacheUrl': u'',
u'content': u'<b>Bacon</b>. <b>Bacon</b>; 900 W 10th St; Austin, Texas 78703. Hours: Monday - Friday: \n11am - 9pm; Saturday: 9am - 9pm; Sunday: 9am - 3pm. View Larger Map\xa0...',
u'title': u'<b>Bacon</b>',
u'titleNoFormatting': u'Bacon',
u'unescapedUrl': u'',
u'url': u'',
u'visibleUrl': u''}
{u'GsearchResultClass': u'GwebSearch',
u'cacheUrl': u'',
u'content': u'Make <b>bacon</b> the star ingredient in pastas, salads, snacks and more from Food \nNetwork Magazine.',
u'title': u'50 Things to Make With <b>Bacon</b> : Recipes and Cooking : Food Network',
u'titleNoFormatting': u'50 Things to Make With Bacon : Recipes and Cooking : Food Network',
u'unescapedUrl': u'',
u'url': u'',
u'visibleUrl': u''}
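The `title` and `content` fields above carry HTML markup (`<b>` tags) and hard line breaks, so they usually need cleaning before display. The following is a minimal sketch, written for Python 3 and independent of the library, of how a hit could be reduced to plain text. The helper names `plain_text` and `format_hit` and the placeholder URL are assumptions for illustration; they are not part of googlesearch.

```python
import re
from html import unescape

# Sample hit shaped like the dictionaries printed above. The URL is a
# placeholder, because the output above does not show one.
sample_hit = {
    "content": "<b>Bacon</b> is a meat product prepared from a pig and "
               "usually cured. It is first cured \nusing large quantities of salt",
    "title": "<b>Bacon</b> - Wikipedia, the free encyclopedia",
    "titleNoFormatting": "Bacon - Wikipedia, the free encyclopedia",
    "url": "http://example.com/bacon",
}


def plain_text(snippet):
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", "", snippet)
    return " ".join(unescape(no_tags).split())


def format_hit(hit):
    """Render one result dictionary as a three-line summary (hypothetical helper)."""
    return "\n".join([hit["titleNoFormatting"], hit["url"], plain_text(hit["content"])])


print(format_hit(sample_hit))
```

The regular expression is only adequate for the simple inline tags seen in these snippets; arbitrary HTML would need a real parser.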
Query Wikipedia and show the top hit.
from googlesearch import GoogleSearch
def search_wikipedia(query):
    gs = GoogleSearch(" %s" % query)
    print gs.top_result()['titleNoFormatting']
    print gs.top_url()
    return gs.top_url()
wiki_url = search_wikipedia("Porcupine")
Porcupine - Wikipedia, the free encyclopedia
Which of the two words is used more on the Internet?
from googlesearch import GoogleSearch
def x_vs_y_count_match(x, y):
    # Compare the total number of matches reported for each query.
    nx = GoogleSearch(x).count()
    ny = GoogleSearch(y).count()
    print '%s vs %s:' % (x, y)
    report = '%s wins with %i vs %i'
    if nx > ny:
        print report % (x, nx, ny)
    elif nx < ny:
        print report % (y, ny, nx)
    else:
        print "it's a tie with %s each!" % nx
    return nx, ny
counts = x_vs_y_count_match("color", "colour")
color vs colour:
color wins with 259000000 vs 55500000
Retrieve the IMDb ID for a movie using only its name (and its year, if there are remakes).
from googlesearch import GoogleSearch
import re
def imdb_id_for_movie(movie_name):
    query = ' %s' % movie_name
    url = GoogleSearch(query).top_url()
    # The ID appears in the URL as a path segment such as /tt0100802/.
    imdb_id = re.search('/tt[0-9]+/', url).group(0).strip('/')
    print 'The imdb id for %s is %s' % (movie_name, imdb_id)
    return imdb_id
TotRecall_id = imdb_id_for_movie("Total Recall 1990")
The imdb id for Total Recall 1990 is tt0100802
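Because the search API is no longer available, the ID-extraction step can be tried on its own. This is a minimal Python 3 sketch that assumes the top hit is an IMDb title URL of the form shown in the demo call. The function name `extract_imdb_id` is hypothetical, and unlike the example above it returns `None` instead of raising when the URL contains no ID.

```python
import re

# Same pattern as in the example above, with a capture group so the
# surrounding slashes do not need to be stripped afterwards.
IMDB_ID_PATTERN = re.compile(r"/(tt[0-9]+)/")


def extract_imdb_id(url):
    """Return the IMDb ID embedded in url, or None if there is none."""
    match = IMDB_ID_PATTERN.search(url)
    return match.group(1) if match else None


# The first URL is an assumed example of an IMDb title page address.
print(extract_imdb_id("http://www.imdb.com/title/tt0100802/"))
print(extract_imdb_id("http://example.com/not-a-movie"))
```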
class googlesearch.GoogleSearch(query, use_proxy=True, verbose=True)
A Google search object for a specific query.
query: str
The search query for this search.
use_proxy: bool, default: True
If True, GoogleSearch will use the proxies listed in the PROXIES_LIST variable of the googlesearch.settings module to do the searches. If a proxy starts getting HTTP 403 FORBIDDEN responses, it will switch to the next proxy in the list. It will raise a GoogleAPIError only if all proxies get 403 responses.
verbose: bool, default: True
If True, GoogleSearch will report to sys.stderr when it switches to another proxy. If False, nothing is logged.
hl: str, default: None
If set, the hl parameter is added to the query, returning search results for the specified language. For example, set hl='es' to get results in Spanish.
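The proxy behaviour described for `use_proxy` and `verbose` can be sketched roughly as follows. This is not the library's implementation: `fetch_with_failover`, `AllProxiesForbidden` (a stand-in for `GoogleAPIError`), and the caller-supplied `fetch` function are all hypothetical names, and a fake fetcher replaces real HTTP requests so the sketch runs offline under Python 3.

```python
import sys


class AllProxiesForbidden(Exception):
    """Hypothetical stand-in for the library's GoogleAPIError."""


def fetch_with_failover(proxies, fetch, verbose=True):
    """Try each proxy in order, moving on whenever one answers 403.

    fetch is a caller-supplied function that takes a proxy and returns
    a (status_code, body) tuple. Any non-403 response is returned as is.
    """
    for index, proxy in enumerate(proxies):
        status, body = fetch(proxy)
        if status != 403:
            return body
        # Report the switch on stderr, but only if another proxy remains.
        if verbose and index + 1 < len(proxies):
            sys.stderr.write("Proxy %s got 403, switching to %s\n"
                             % (proxy, proxies[index + 1]))
    raise AllProxiesForbidden("all %d proxies returned 403" % len(proxies))


def fake_fetch(proxy):
    """Fake fetcher for the demo: proxy-a is blocked, anything else works."""
    if proxy == "proxy-a":
        return 403, None
    return 200, "results via " + proxy


print(fetch_with_failover(["proxy-a", "proxy-b"], fake_fetch))
```

The sketch restarts from the first proxy on every call. A longer-lived object would more likely remember which proxy last worked, which is what "switch to the next proxy" suggests.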
- top_results(): Returns a list of results for a Google search. The Google API determines how many results are returned; the current default is 4. A result is a dictionary with the following fields: GsearchResultClass, cacheUrl, content, title, titleNoFormatting, unescapedUrl, url, visibleUrl.
- top_result(): Returns only the top result, the best match. This is the equivalent of "I'm Feeling Lucky". See GoogleSearch.top_results() for the keys in the result dictionary.
- top_urls(): Returns a list of URLs for a Google search. The Google API determines how many URLs are returned; the current default is 4.
- top_url(): Returns the URL of the top hit.
- count(): Returns the total number of matches to the query.
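Since the underlying API is gone, code written against these methods can still be exercised with a stand-in object. The class below is a hypothetical Python 3 stub, not part of googlesearch: it mimics the five method names used in the examples on canned data and shows how the single-item methods relate to the list-returning ones. The URLs and the match count are placeholders, and the behaviour on an empty result list (an `IndexError` here) is an assumption.

```python
class CannedSearch(object):
    """Hypothetical stub mimicking the GoogleSearch methods on fixed data."""

    def __init__(self, results, total):
        self._results = results  # list of result dictionaries
        self._total = total      # total number of matches to report

    def top_results(self):
        return list(self._results)

    def top_result(self):
        # The best match is simply the first entry of top_results().
        return self.top_results()[0]

    def top_urls(self):
        return [result["url"] for result in self.top_results()]

    def top_url(self):
        return self.top_result()["url"]

    def count(self):
        return self._total


# Placeholder data; only the 'url' and 'titleNoFormatting' keys are filled in.
canned = CannedSearch(
    results=[
        {"titleNoFormatting": "First hit", "url": "http://example.com/1"},
        {"titleNoFormatting": "Second hit", "url": "http://example.com/2"},
    ],
    total=12345,
)

print(canned.top_url())
print(canned.top_urls())
print(canned.count())
```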
- Python >= 2.6
- requests
MIT licensed. See the bundled LICENSE file for more details.