How can I create search terms with wildcards in Python?

Developer https://www.devze.com 2023-02-28 22:57 Source: web
I want to check whether a certain term is contained in a document. However, sometimes the word appears in several forms (plural, past tense, etc.).

'Hello Worlds'
'Hellos Worlds'
'Jello World'
'Hello Worlded'

How can I create a search term which will find all instances such as

'*ello* World*'

where the star is a wildcard that may match zero or more characters, so it does not have to correspond to anything in the word?

I found documentation for an fnmatch module, but I can't see how that can help me search through a document.


Use regular expressions and just loop through the file:

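import re

# \S* matches zero or more non-whitespace characters, standing in for '*'
pattern = re.compile(r"^\S*ello\S*\sWorld\S*$")

with open('test.file.here', 'r') as f:
    for line in f:
        # The anchors mean the whole line must consist of the two words
        if pattern.match(line):
            print(line, end='')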
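
If the phrase can appear anywhere inside a line rather than being the whole line, a variation without the anchors may be closer to what the question asks for. This is only a sketch: the sample text is made up, and it assumes that `\w*` (letters, digits, underscore) is an acceptable stand-in for the star, so that trailing punctuation is left out of the match.

```python
import re

# \w* stands in for '*' inside a word; \s+ allows any whitespace between the words.
pattern = re.compile(r"\w*ello\w*\s+World\w*")

# Hypothetical sample text, standing in for the contents of a document.
text = "She said Hello Worlds, then Jello World. Hello Worlded! Goodbye World."

matches = pattern.findall(text)
print(matches)  # ['Hello Worlds', 'Jello World', 'Hello Worlded']
```

For a real file, `text` could be replaced by the result of reading the file, or `findall` could be applied line by line.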


I would usually opt for a regular expression, but if for some reason you want to stick to the wildcard format, you can do this:

from fnmatch import fnmatch

pattern = '*ello* World*'

with open('sample.txt') as f:
    for line in f:
        # The pattern's trailing '*' also absorbs the line's newline
        if fnmatch(line, pattern):
            print(line, end='')
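
One caveat worth knowing: `fnmatch.fnmatch` normalizes case according to the operating system's filename rules, so results can differ between platforms. The sketch below uses `fnmatch.fnmatchcase`, which always compares case-sensitively. The list of lines is hypothetical and stands in for lines read from a file.

```python
from fnmatch import fnmatchcase

pattern = '*ello* World*'

# Hypothetical lines, standing in for lines read from a file.
lines = ['Hello Worlds\n', 'Jello World\n', 'Goodbye World\n', 'hello world\n']

# fnmatchcase never folds case, so 'hello world' is rejected on every platform.
matching = [line.rstrip('\n') for line in lines if fnmatchcase(line, pattern)]
print(matching)  # ['Hello Worlds', 'Jello World']
```

Note that the star in an fnmatch pattern matches any characters, spaces included, so a line such as 'Hello big World' would also match.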


The * syntax you describe is known as globbing. It is designed for matching file and directory names, not for searching the text of a document. Regular expressions, as others have noted, are the answer.
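
If you would still like to write the search term in the star notation, one option is to translate it into a regular expression yourself. The helper below is a hypothetical sketch, not a standard library function: it assumes that a star should match zero or more word characters and that the words of the term are separated by whitespace.

```python
import re

def wildcard_to_regex(term):
    """Translate a '*' wildcard term into a compiled regex (hypothetical helper)."""
    parts = []
    for word in term.split():
        # Escape the literal pieces and let each '*' become \w* (zero or more word chars).
        parts.append(r'\w*'.join(re.escape(piece) for piece in word.split('*')))
    # Words in the term may be separated by any run of whitespace in the document.
    return re.compile(r'\s+'.join(parts))

pattern = wildcard_to_regex('*ello* World*')
found = pattern.findall('Hello Worlds and Hellos Worlds')
print(found)  # ['Hello Worlds', 'Hellos Worlds']
```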


If you're doing anything complicated, regular expressions are the way to go. If you're not comfortable with those, I think for your specific question you could also use "in". For example:

x = 'hello world'
if 'ello' in x and 'world' in x:
    print('matches')
else:
    print('does not match')
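
Keep in mind that `in` is case-sensitive, so 'world' would not be found in 'Hello Worlds'. A small sketch of a case-insensitive variant follows. The function name `contains_all` is made up for illustration, and the check ignores the order and adjacency of the fragments.

```python
def contains_all(text, fragments):
    """Return True if every fragment occurs somewhere in text, ignoring case."""
    lowered = text.lower()
    return all(fragment.lower() in lowered for fragment in fragments)

print(contains_all('Hello Worlds', ['ello', 'world']))   # True
print(contains_all('Goodbye World', ['ello', 'world']))  # False
```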


Can you use a regular expression?

import re
# somefile is assumed to hold the document's text as a string
m = re.search(r'.*ello', somefile)

More here:

http://docs.python.org/library/re.html
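
`re.search` returns a match object, or `None` when nothing is found, so the result is usually checked before use. A minimal sketch, with a made-up string in place of the real document text and `\w*` assumed as the wildcard:

```python
import re

# Hypothetical text, standing in for the contents of the document.
somefile = "Say Hello Worlds to everyone"

m = re.search(r'\w*ello\w*', somefile)
if m:
    # group() is the matched text; start() and end() give its position.
    print(m.group(), m.start(), m.end())  # Hello 4 9
else:
    print('no match')
```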
