I want to fetch Google images for any query. I have gone through the Google Image Search API but was unable to understand it. I have also seen some methods that fetch images, but only from the first page. I have used the following method.
function getGoogleImg($k)
{
    $url = "http://images.google.it/images?as_q=##query##&hl=it&imgtbs=z&btnG=Cerca+con+Google&as_epq=&as_oq=&as_eq=&imgtype=&imgsz=m&imgw=&imgh=&imgar=&as_filetype=&imgc=&as_sitesearch=&as_rights=&safe=images&as_st=y";
    $web_page = file_get_contents(str_replace("##query##", urlencode($k), $url));

    $tieni = stristr($web_page, "dyn.setResults(");
    $tieni = str_replace("dyn.setResults(", "", str_replace(stristr($tieni, ");"), "", $tieni));
    $tieni = str_replace("[]", "", $tieni);

    $m = preg_split("/[\[\]]/", $tieni);
    $x = array();
    for ($i = 0; $i < count($m); $i++) {
        $m[$i] = str_replace("/imgres?imgurl\\x3d", "", $m[$i]);
        $m[$i] = str_replace(stristr($m[$i], "\\x26imgrefurl"), "", $m[$i]);
        $m[$i] = preg_replace("/^\"/i", "", $m[$i]);
        $m[$i] = preg_replace("/^,/i", "", $m[$i]);
        if ($m[$i] != "") array_push($x, $m[$i]);
    }
    return $x;
}
This function returns only 21 images. I want all the images for this query. I am doing this in PHP.
Sadly, the Image API is being closed down, so I won't suggest moving to that, although I think it would have been a nicer solution.
My best guess is that images 22 onward are loaded using some kind of AJAX/JavaScript (if you search for, say, "logo" and scroll down, you will see placeholders that get loaded as you move down), so you would need to run the page through a JavaScript engine, and I cannot find anyone who has done that with PHP (yet). Have you checked whether $web_page contains more than 21 images? (When I experiment with Google image search, it uses JavaScript to load some of the images.) What happens when you open the link in your normal browser, and what happens if you turn off JavaScript? Is there perhaps a link to the next page in the result you get?
In the now-deprecated Image API there were ways to limit the number of results per page and to step to the next page: https://developers.google.com/image-search/v1/jsondevguide#json_snippets_php
If you wish to keep running searches and fetching images from the results, then http://simplehtmldom.sourceforge.net/ might be a nice alternative to look at later. It fetches an HTML DOM, lets you easily find nodes, and makes them easy to work with. But it still uses file_get_contents or the curl library to fetch the data, so it might need some fiddling to get JavaScript working.
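To illustrate the paging idea only: as far as I remember, the deprecated API took an rsz parameter (results per page, at most 8) and a start offset. Below is a minimal Python 3 sketch that just builds the request URLs without sending anything. The endpoint and parameter names are quoted from memory of the old API, and the helper name image_api_page_urls is made up for this example. Since the API has been shut down, these URLs are not expected to return results any more.

```python
from urllib.parse import urlencode

# Endpoint of the deprecated (now shut down) Google Image Search API,
# quoted from memory; treat it as an assumption.
API_ENDPOINT = "https://ajax.googleapis.com/ajax/services/search/images"


def image_api_page_urls(query, total=24, per_page=8):
    """Return one request URL per page of results (hypothetical helper)."""
    urls = []
    for start in range(0, total, per_page):
        # rsz = results per page, start = offset of the first result on the page
        params = [("v", "1.0"), ("q", query), ("rsz", per_page), ("start", start)]
        urls.append(API_ENDPOINT + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in image_api_page_urls("logo"):
        print(url)
```

The same stepping pattern (fixed page size, increasing offset) is what any paged search endpoint would need, whichever service ends up being used.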
I wrote a script to download images from Google Image Search; it currently downloads 100 original images.
I originally posted the script in this Stack Overflow answer:
Python - Download Images from google Image search?
Here I will explain in detail how I scrape the URLs of the original images from Google Image Search using urllib2 and BeautifulSoup.
For example, if you want to scrape images of the movie Terminator 3 from Google Image Search:
import urllib2
from bs4 import BeautifulSoup

query = "Terminator 3"
query = '+'.join(query.split())  # this turns the query into Terminator+3
url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
# Send a browser-like User-Agent with the request
header = {'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}
req = urllib2.Request(url, headers=header)
soup = BeautifulSoup(urllib2.urlopen(req), 'html.parser')
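Joining the words with "+" works for simple queries, but characters such as "&" would break the URL. Here is a small sketch of the same request built with proper URL encoding. It is written for Python 3 (urllib.request instead of urllib2), the helper name build_search_request is made up for this example, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36")


def build_search_request(query):
    """Build (but do not send) a request for a Google Images results page."""
    # urlencode escapes spaces as '+' and other special characters as %XX
    params = urlencode([("q", query), ("source", "lnms"), ("tbm", "isch")])
    url = "https://www.google.co.in/search?" + params
    return Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    print(build_search_request("Terminator 3").full_url)
```

Passing the returned object to urllib.request.urlopen would perform the actual fetch.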
The variable soup above now contains the HTML of the requested page. Next we need to extract the images; to do that, open the web page in your browser and use Inspect Element on an image.
There you will find the tags containing the URL of the image.
For example, for Google Images I found "div",{"class":"rg_meta"} containing the link to the image.
You can look up find_all in the BeautifulSoup documentation.
print soup.find_all("div",{"class":"rg_meta"})
You will get a list of results such as:
<div class="rg_meta">{"cl":3,"cr":3,"ct":12,"id":"C0s-rtOZqcJOvM:","isu":"emuparadise.me","itg":false,"ity":"jpg","oh":540,"ou":"http://199.101.98.242/media/images/66433-Terminator_3_The_Redemption-1.jpg","ow":960,"pt":"Terminator 3 The Redemption ISO \\u0026lt; GCN ISOs | Emuparadise","rid":"VJSwsesuO1s1UM","ru":"http://www.emuparadise.me/Nintendo_Gamecube_ISOs/Terminator_3_The_Redemption/66433","s":"Screenshot Thumbnail / Media File 1 for Terminator 3 The Redemption","th":168,"tu":"https://encrypted-tbn2.gstatic.com/images?q\\u003dtbn:ANd9GcRs8dp-ojc4BmP1PONsXlvscfIl58k9hpu6aWlGV_WwJ33A26jaIw","tw":300}</div>
The "ou" field in the result above contains the URL of our image:
http://199.101.98.242/media/images/66433-Terminator_3_The_Redemption-1.jpg
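To make the structure concrete, here is a self-contained Python 3 sketch that pulls the "ou" (original URL) and "ity" (image type) fields out of such div elements using only the standard library. The class RgMetaParser and the function extract_image_links are names invented for this example, and the rg_meta markup is simply what I saw at the time; Google may well change it.

```python
import json
from html.parser import HTMLParser


class RgMetaParser(HTMLParser):
    """Collect the JSON found inside <div class="rg_meta"> elements."""

    def __init__(self):
        super().__init__()
        self._in_meta = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "rg_meta" in classes:
            self._in_meta = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_meta:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._in_meta:
            self._in_meta = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except ValueError:
                pass  # skip blocks that are not valid JSON


def extract_image_links(html):
    """Return (original image URL, image type) pairs found in the page."""
    parser = RgMetaParser()
    parser.feed(html)
    parser.close()
    return [(rec["ou"], rec.get("ity", "")) for rec in parser.records if "ou" in rec]


if __name__ == "__main__":
    sample = ('<div class="rg_meta">{"ity":"jpg","ou":"http://199.101.98.242/'
              'media/images/66433-Terminator_3_The_Redemption-1.jpg"}</div>')
    print(extract_image_links(sample))
```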
You can extract these links and download the images as follows:
import json
import os
import urllib2

DIR = "images/"            # assumed output directory; not defined in the original snippet
image_type = "terminator"  # assumed filename prefix; not defined in the original snippet

ActualImages = []  # (link to the large original image, image type) pairs
for a in soup.find_all("div", {"class": "rg_meta"}):
    meta = json.loads(a.text)
    ActualImages.append((meta["ou"], meta["ity"]))

if not os.path.exists(DIR):
    os.mkdir(DIR)

for i, (img, Type) in enumerate(ActualImages):
    try:
        req = urllib2.Request(img, headers=header)  # header is already a dict
        raw_img = urllib2.urlopen(req).read()
        # Number the file after the images already saved with this prefix
        cntr = len([name for name in os.listdir(DIR) if image_type in name]) + 1
        print cntr
        ext = Type if len(Type) > 0 else "jpg"  # default to .jpg when the type is empty
        f = open(os.path.join(DIR, image_type + "_" + str(cntr) + "." + ext), 'wb')
        f.write(raw_img)
        f.close()
    except Exception as e:
        print "could not load : " + img
        print e
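The saving step can also be isolated into a small helper. The following Python 3 sketch keeps the prefix_counter.extension naming used above and falls back to .jpg when the type is empty; next_image_path and save_image are hypothetical names for this example, not part of the original script, and the demo writes placeholder bytes into a temporary directory instead of downloading anything.

```python
import os
import tempfile


def next_image_path(directory, prefix, extension):
    """Return a path like <directory>/<prefix>_<n>.<ext> that is not in use yet."""
    os.makedirs(directory, exist_ok=True)
    ext = extension or "jpg"  # fall back to .jpg when the type is unknown
    counter = 1
    while True:
        path = os.path.join(directory, "{}_{}.{}".format(prefix, counter, ext))
        if not os.path.exists(path):
            return path
        counter += 1


def save_image(raw_bytes, directory, prefix, extension):
    """Write the downloaded bytes to the next free file name and return its path."""
    path = next_image_path(directory, prefix, extension)
    with open(path, "wb") as handle:
        handle.write(raw_bytes)
    return path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(save_image(b"placeholder bytes", demo_dir, "terminator", "jpg"))
```

Using a with block closes the file even if the write fails, and probing for a free name avoids overwriting an earlier download.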
Voilà, now you can use this script to download images from Google search, or to collect training images.
You can get the fully working script here:
https://gist.github.com/rishabhsixfeet/8ff479de9d19549d5c2d8bfc14af9b88