开发者

How do I extract the target URL from a Google search result?

开发者 https://www.devze.com 2023-04-12 20:45 出处:网络
I am trying to extract URLs from Google search results. I use Indy IdHTTP to get HTML results from Google, and I use Achmad Z\'s code for getting the link hrefs from the page. How can I get the real l

I am trying to extract URLs from Google search results. I use Indy IdHTTP to get HTML results from Google, and I use Achmad Z's code for getting the link hrefs from the page. How can I get the real link target for each URL instead of the one that goes through Google's redirector?


I tried th开发者_开发知识库at but I get an "Operand no applicable" error in this part of the code:

function ToUTF8Encode(str: string): string;
var
  b: Byte;
begin
  for b in BytesOf(UTF8Encode(str)) do
  begin
    Result := Format('%s%s%.2x', [Result, '%', b]);
  end;
end;

I use Delphi 7 with Indy 9.00.10. Maybe indy update will help ?


If I get it right you are trying to fetch the Google search results using TIdHTTP.Get method. If so, then
you should definitely focus on some Google Search API implementation because

  1. it's impossible to fetch the results this way because you don't have any access to the document inside the iframe in which the search results are, so you won't get any search results by using HTTP GET in this case (or at least I haven't heard about the request which can do that)
  2. it's against Google policies and you should use proper Google Search API instead, for instance Google SOAP Search API, there are available also several types of Google Search API's for various purposes

You can find e.g. here the Delphi wrapper with the demo code for Google Search API. I've tested it with Delphi 2009 on Windows 7/64 and it works fine for me.


In the previous post here I've tried to explain why you should use Google Search API, in this one I'll try to provide you an example with a hope it will work in your Delphi 7.

You need to have the SuperObject (JSON parser for Delphi), I've used this version (latest at this time). Then you need Indy; the best would be to upgrade to the latest version if possible. I've used the one shipped with Delphi 2009, but I think the TIdHTTP.Get method is so important that it must work fine also in your 9.00.10 version.

Now you need a list box and a button on your form, the following piece of code and a bit of luck (for compatibility :)

The URL request building you can see for instance in the DxGoogleSearchApi.pas mentioned before but the best is to follow the Google Web Search API reference. In DxGoogleSearchApi.pas you can take the inspiration e.g. how to fetch several pages.

So take this as an inspiration

uses
  IdHTTP, IdURI, SuperObject;

const
  GSA_Version = '1.0';
  GSA_BaseURL = 'http://ajax.googleapis.com/ajax/services/search/';

procedure TForm1.GoogleSearch(const Text: string);
var
  I: Integer;
  RequestURL: string;
  HTTPObject: TIdHTTP;
  HTTPStream: TMemoryStream;
  JSONResult: ISuperObject;
  JSONResponse: ISuperObject;
begin
  RequestURL := TIdURI.URLEncode(GSA_BaseURL + 'web?v=' + GSA_Version + '&q=' + Text);

  HTTPObject := TIdHTTP.Create(nil);
  HTTPStream := TMemoryStream.Create;

  try
    HTTPObject.Get(RequestURL, HTTPStream);
    JSONResponse := TSuperObject.ParseStream(HTTPStream, True);

    if JSONResponse.I['responseStatus'] = 200 then
    begin
      ListBox1.Items.Add('Search time: ' + JSONResponse.S['responseData.cursor.searchResultTime']);
      ListBox1.Items.Add('Fetched count: ' + IntToStr(JSONResponse['responseData.results'].AsArray.Length));
      ListBox1.Items.Add('Total count: ' + JSONResponse.S['responseData.cursor.resultCount']);
      ListBox1.Items.Add('');

      for I := 0 to JSONResponse['responseData.results'].AsArray.Length - 1 do
      begin
        JSONResult := JSONResponse['responseData.results'].AsArray[I];
        ListBox1.Items.Add(JSONResult.S['unescapedUrl']);
      end;
    end;

  finally
    HTTPObject.Free;
    HTTPStream.Free;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  GoogleSearch('Delphi');
end;


Answer to my question , maybe it can help someone: Fetching web page :

memo1.Lines.Text := idhttp1.Get('http://ajax.googleapis.com/aja...tart=1&rsz=large&q=max');

extracting URL's :

function ExtractText(const Str, Delim1, Delim2: string; PosStart: integer; var PosEnd: integer): string;
var
  pos1, pos2: integer;
begin
  Result := '';
  pos1 := PosEx(Delim1, Str, PosStart);
  if pos1 > 0 then
  begin
    pos2 := PosEx(Delim2, Str, pos1 + Length(Delim1));
    if pos2 > 0 then
    begin
      PosEnd := pos2 + Length(Delim2);
      Result := Copy(Str, pos1 + Length(Delim1), pos2 - (pos1 + Length(Delim1)));
    end;
  end;
end;

And on Button1 just put :

procedure TForm1.Button1Click(Sender: TObject);
var Pos: integer;
    sText: string;
begin
  sText := ExtractText(Memo1.Lines.Text, '"url":"', '","visibleUrl"', 1, Pos);
  while sText <> '' do
  begin
    Memo2.Lines.Add(sText);
    sText := ExtractText(Memo1.Lines.Text, '"url":"', '","visibleUrl"', Pos, Pos);
  end;
end;

www.delphi.about.com has nice documentation on string manipulation , Zarko Gajic does great job on that site :) NOTE: if google changes it's source this will be useless.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号