
Setting timeouts to parse webpages using python lxml

Developer https://www.devze.com 2022-12-29 18:40 Source: Internet

I am using the Python lxml library to parse HTML pages:

import lxml.html

# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')

Is there any way to set a timeout for parsing?


It looks like lxml uses urllib.urlopen as the opener, but the easiest way to do this would be to modify the default socket timeout.

import socket
timeout = 10
socket.setdefaulttimeout(timeout)

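Of course, this is a quick-and-dirty solution: the default timeout is process-wide, so it applies to every socket created afterwards, not just the one used for this page.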
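A less global alternative might be to download the page yourself with a per-request timeout and hand the result to lxml afterwards. The sketch below assumes Python 3 (urllib.request rather than the older urllib.urlopen mentioned above); fetch_html is a hypothetical helper name, not part of lxml.

```python
import socket
import urllib.error
import urllib.request


def fetch_html(url, timeout=10.0):
    """Download `url` and return the raw response body as bytes.

    `timeout` is in seconds and applies only to this request. It limits each
    blocking socket operation (connect, read), not the total download time.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Hypothetical usage: handle a timeout explicitly instead of hanging.
# try:
#     html_bytes = fetch_html('http://stackoverflow.com/', timeout=10)
# except (socket.timeout, urllib.error.URLError) as exc:
#     print('fetch failed:', exc)
```

The returned bytes could then be passed to lxml.html.fromstring, which parses an in-memory document, so the parsing step itself never touches the network.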
