How to parse html source code with ruby/nokogiri?

Developer https://www.devze.com 2023-01-21 08:28 Source: Internet
I've successfully used ruby (1.8) and nokogiri's css parsing to pull out front facing data from web pages.

However I now need to pull out some data from a series of pages where the data is in the "meta" tags in the source code of the page.

One of the lines I need is the following:

<meta name="geo.position" content="35.667459;139.706256" />

I've tried using XPath but haven't been able to get it right.

Any help as to what syntax is needed would be much appreciated.

Thanks


This is a good case for a CSS attribute selector. For example:

doc.css('meta[name="geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end

The equivalent XPath expression is almost identical:

doc.xpath('//meta[@name = "geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end
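
If the coordinates are needed as numbers rather than as one string, the content value can be split on the semicolon afterwards. This is only a minimal sketch: it assumes the value always has the "latitude;longitude" layout seen in the sample tag, and parse_geo_position is a hypothetical helper name, not part of Nokogiri.

```ruby
# Hypothetical helper: turns "35.667459;139.706256" into two floats.
# Assumes the "latitude;longitude" layout seen in the sample tag.
# Returns nil when the value is missing or does not fit that layout.
def parse_geo_position(content)
  return nil if content.nil?

  parts = content.split(';').map { |part| part.strip }
  return nil unless parts.length == 2

  parts.map { |part| Float(part) }
rescue ArgumentError
  nil # one of the parts was not a valid number
end

lat, lng = parse_geo_position('35.667459;139.706256')
puts lat # => 35.667459
puts lng # => 139.706256
```

Inside either loop above, this would be called as parse_geo_position(meta_tag['content']).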


require 'nokogiri'

doc = Nokogiri::HTML('<meta name="geo.position" content="35.667459;139.706256" />')
doc.at('//meta[@name="geo.position"]')['content'] # => "35.667459;139.706256"