What is the correct XPath query for 'select onchange'

Developer https://www.devze.com 2023-04-10 02:37 Source: web
I'm learning XPath and looking to extract the URL embedded within the following HTML. I've tried variants of @"//table[contains(@option, 'value')]" without success.

<body>
<div id="Wrapper">
<div id="header">
<span id="logoHolder">
<a href="http://www.foo.com">
<img src="/templates/blank_j15/images/nexus_logo.png" width="167" height="65" border="0"/>
</a>
</span>
<span style="float: left; padding-top: 27px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; color: rgb(0, 182, 222); ">Embracing Diversity. Challenging Minds.</span>
<span id="searchHolder">
<div style="clear: both; "/>
<div id="IE_P_space"/>
<div id="arttotalmenucontent_138" class="hidden">
<script type="text/javascript">
<table cellspacing="0" cellpadding="0" border="0" width="100%" id="wrapper_cont_table">
<tbody>
<tr>
<tr>
<tr>
<td valign="top" id="wrapper_cont_leftNav">
<div class="leftnavCont">
<p>
<select onchange="nl(this.value)" size="8">
<option value="/images/download/newsletter/connect04_300911.pdf">Connect 04: 30/09/2011</option>
<option value="/images/download/newsletter/connect03_230911.pdf">Connect 03: 23/09/2011</option>
<option value="/images/download/newsletter/connect02_150911.pdf">Connect 02: 15/09/2011</option>
<option value="/images/download/newsletter/connect01_120911.pdf">Connect 01: 12/09/2011</option>
</p>


//p/select/option/@value

Seems to work for me.
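
As for why the original attempt matches nothing: //table[contains(@option, 'value')] asks for table elements that carry an attribute named option, whereas in the sample option is a child element of select and value is an attribute of option. Below is a minimal sketch of the difference. It assumes a cut-down, well-formed copy of the markup (the SNIPPET string is made up for illustration) and uses the standard library's ElementTree, which supports only a subset of XPath, so the attribute is read with .get() instead of a trailing /@value step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed reduction of the HTML in the question.
SNIPPET = """
<table id="wrapper_cont_table">
  <tr>
    <td id="wrapper_cont_leftNav">
      <p>
        <select onchange="nl(this.value)" size="8">
          <option value="/images/download/newsletter/connect04_300911.pdf">Connect 04: 30/09/2011</option>
          <option value="/images/download/newsletter/connect03_230911.pdf">Connect 03: 23/09/2011</option>
        </select>
      </p>
    </td>
  </tr>
</table>
""".strip()


def option_values(markup):
    """Return the value attribute of every option under a select that has onchange."""
    root = ET.fromstring(markup)
    # [@onchange] keeps only select elements that carry an onchange attribute.
    options = root.findall(".//select[@onchange]/option")
    return [option.get("value") for option in options]


def tables_with_option_attribute(markup):
    """Mimic the original attempt: table elements that have an 'option' attribute."""
    root = ET.fromstring(markup)
    # iter() also visits the root element, which is the table here.
    return [table for table in root.iter("table") if "option" in table.attrib]


print(option_values(SNIPPET))
print(len(tables_with_option_attribute(SNIPPET)))
```

With a full XPath engine such as lxml, the equivalent of the first function would be //select[@onchange]/option/@value.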

I think the problem must be in how you are using your XPath library. It didn't take me long to find the source of your sample.

Here's a working example with my XML library of preference, lxml.

#!/usr/bin/env python3

import os
from urllib.request import urlopen
from lxml import etree

filename = 'sample.html'
url = 'http://www.foo.example/index.php?option=com_content&view=article&id=186&Itemid=301'
# Some simple caching for a test script: fetch once, then reuse the local copy.
if os.path.exists(filename):
    with open(filename, 'rb') as f:
        data = f.read()
else:
    data = urlopen(url).read()
    with open(filename, 'wb') as f:
        f.write(data)

# etree.HTML uses lxml's lenient HTML parser, so unclosed tags are tolerated.
doc = etree.HTML(data)

# Each result is the string value of one option's "value" attribute.
for v in doc.xpath('//p/select/option/@value'):
    print(v)

Produces:

/images/download/newsletter/connect04_300911.pdf
/images/download/newsletter/connect03_230911.pdf
/images/download/newsletter/connect02_150911.pdf
/images/download/newsletter/connect01_120911.pdf
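
The values above are site-relative paths. If full URLs are what you are after, one possible follow-up step is to resolve each path against the address of the page it came from. This is a small sketch using the standard library's urljoin; PAGE_URL reuses the placeholder address from the script above, and absolute_urls is a helper name invented here.

```python
from urllib.parse import urljoin

# Placeholder page address, same as in the script above.
PAGE_URL = 'http://www.foo.example/index.php?option=com_content&view=article&id=186&Itemid=301'

# Two of the paths returned by the XPath query.
PATHS = [
    '/images/download/newsletter/connect04_300911.pdf',
    '/images/download/newsletter/connect03_230911.pdf',
]


def absolute_urls(page_url, paths):
    """Resolve each (possibly relative) path against the page URL."""
    return [urljoin(page_url, path) for path in paths]


for link in absolute_urls(PAGE_URL, PATHS):
    print(link)
```

Because the paths start with a slash, urljoin keeps only the scheme and host of the page URL and replaces the rest.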