Regular Expressions - testing if a String contains another String

开发者 https://www.devze.com 2023-01-21 03:10 Source: web
Suppose you have this string (a single line):

10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"

and you want to extract the part between GET and HTTP (i.e., the URL), but only if it contains the word 'puzzle'. How would you do that using regular expressions in Python?

Here's my solution so far.

match = re.search(r'GET (.*puzzle.*) HTTP', my_string)

It works, but I suspect I should change the first, the second, or both .* to .*? so that they are non-greedy. Does it actually matter in this case?


No need for a regex:

>>> s
'10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"'

>>> s.split("HTTP")[0]
'10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ '

>>> if "puzzle" in s.split("HTTP")[0].split("GET")[-1]:
...     print("found puzzle")
...


It does matter. The User-Agent field can contain anything, including the text HTTP, so a greedy .* can run past the end of the request line. Make both of them non-greedy.
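To make the difference concrete, here is a small sketch comparing the greedy and non-greedy patterns. The two log lines and the "ExampleBot" / "puzzle-bot" User-Agent values are made up for illustration; they are not from the original log. The third pattern, using \S, is an additional suggestion rather than part of the answer above: non-greedy quantifiers only change which match is preferred, so they can still match when 'puzzle' appears only in the User-Agent.

```python
import re

# Hypothetical log lines: the User-Agent values are invented for this demo.
line_a = ('10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] '
          '"GET /puzzle/1/ HTTP/1.0" 302 528 "-" "ExampleBot over HTTP"')
line_b = ('10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] '
          '"GET /keyser/22300/ HTTP/1.0" 302 528 "-" "puzzle-bot over HTTP"')

greedy = re.compile(r'GET (.*puzzle.*) HTTP')
lazy = re.compile(r'GET (.*?puzzle.*?) HTTP')
# Stricter variant (an assumption that URLs in the log contain no whitespace).
strict = re.compile(r'GET (\S*puzzle\S*) HTTP')

# Greedy runs on to the last " HTTP" in the line, inside the User-Agent.
greedy_a = greedy.search(line_a).group(1)
# Non-greedy stops at the first " HTTP", which ends the request URL.
lazy_a = lazy.search(line_a).group(1)

# 'puzzle' appears only in the User-Agent here, yet the lazy pattern
# still finds a match; the \S-based pattern does not.
lazy_b = lazy.search(line_b)
strict_a = strict.search(line_a).group(1)
strict_b = strict.search(line_b)

print(greedy_a)
print(lazy_a)
print(lazy_b is not None, strict_b is None)
```

For line_a the greedy capture spills into the User-Agent while the non-greedy one returns just /puzzle/1/. For line_b only the \S version correctly reports no match.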


>>> s = '10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"'
>>> s.split()[6]
'/keyser/22300/'
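The split() approach can be extended to cover the 'puzzle' condition from the question. This is only a sketch: extract_puzzle_url is a hypothetical helper name, and it assumes every line follows the layout shown above, with the method in field 5 and the URL in field 6.

```python
def extract_puzzle_url(line):
    """Return the requested URL if it contains 'puzzle', else None.

    Assumes the log layout from the example: whitespace-separated
    fields with '"GET' at index 5 and the URL at index 6.
    """
    fields = line.split()
    # Guard against short or non-GET lines instead of raising IndexError.
    if len(fields) < 8 or fields[5] != '"GET':
        return None
    url = fields[6]
    return url if "puzzle" in url else None


sample = ('10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] '
          '"GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0"')
print(extract_puzzle_url(sample))
print(extract_puzzle_url(sample.replace('/keyser/22300/', '/puzzle/1/')))
```

Because only field 6 is inspected, a User-Agent that happens to mention 'puzzle' or HTTP has no effect on the result.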