Regex to convert path to URL

Developer https://www.devze.com 2023-03-19 22:46 Source: Internet

Related topics: python regex

I have this Python script, which is supposed to wrap anything that looks like a path in an <a> tag to turn it into a URL.

import re

def wrap(text, regex):
    start, end = '<a href="/static', '">Link to the file</a>'
    # Process matches from last to first so earlier offsets stay valid.
    matches = sorted([(m.start(), m.end()) for m in re.finditer(regex, text)],
            reverse=True)
    for match in matches:
        text = text[:match[1]] + end + text[match[1]:]
        text = text[:match[0]] + start + text[match[0]:]
    return text

I tried many combinations, like this one:

>>> wrap('HA HA HA /services/nfs_qa/log.lol HO HO HO', '/services/nfs_qa/.* ??')
'HA HA HA <a href="/static/services/nfs_qa/log.lol HO HO HO">Link to the file</a>'

But I can't seem to get it right, so I could use a little help!

Thanks in advance

It depends a bit on which characters you allow in path names, but this does the trick for your example:

>>> wrap('HA HA HA /services/nfs_qa/log.lol HO HO HO', '/services/nfs_qa/[^ ]*')
'HA HA HA <a href="/static/services/nfs_qa/log.lol">Link to the file</a> HO HO HO'

The [^ ] means anything but a space (the opposite of [ ]).
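To see why the original pattern overshoots, here is a small sketch comparing the greedy .* with [^ ]* on the sample string from the question. The variable names are only for illustration.

```python
import re

text = 'HA HA HA /services/nfs_qa/log.lol HO HO HO'

# '.*' is greedy: it runs to the end of the line, spaces included.
greedy = re.search(r'/services/nfs_qa/.*', text).group(0)

# '[^ ]*' stops at the first space, so only the path is matched.
bounded = re.search(r'/services/nfs_qa/[^ ]*', text).group(0)

print(greedy)   # /services/nfs_qa/log.lol HO HO HO
print(bounded)  # /services/nfs_qa/log.lol
```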

If any character (including a space) is allowed in a path name, it's impossible, because nothing marks where the path ends.

"." matches every character (except a newline, by default). You should instead match "everything except whitespace", which is \S or, for this example, [^ ]:

wrap('HA HA HA /services/nfs_qa/log.lol HO HO HO', r'/services/nfs_qa/\S*')
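The two classes only differ when the path is followed by whitespace other than a plain space. A minimal sketch, assuming a made-up input where a tab and a newline follow the path:

```python
import re

# Hypothetical input: the path is followed by a tab, then a newline.
text = 'see /services/nfs_qa/log.lol\tnext\nmore'

# '[^ ]' excludes only the space character, so tabs and newlines still match.
space_only = re.search(r'/services/nfs_qa/[^ ]*', text).group(0)

# '\S' excludes every whitespace character, so the match stops at the tab.
non_white = re.search(r'/services/nfs_qa/\S*', text).group(0)

print(repr(space_only))  # '/services/nfs_qa/log.lol\tnext\nmore'
print(repr(non_white))   # '/services/nfs_qa/log.lol'
```

For text that may contain tabs or line breaks, \S is probably the safer choice.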

Also, your wrap function could be written more simply using re.sub:

import re

def tag_it(match_obj):
    tags = "<a href =\"/static{0}\">Link to the File</a>"
    return tags.format(match_obj.group(0))

def wrap(text, regex):
    return re.sub(regex, tag_it, text)

a = wrap('HA HA HA /services/nfs_qa/log.lol HO HO HO', r'/services/nfs_qa/\S*')
print(a)
#Outputs: 
#HA HA HA <a href ="/static/services/nfs_qa/log.lol">Link to the File</a> HO HO HO
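If the callback feels heavy, one possible variant is to pass a replacement string instead, since re.sub expands \g<0> to the whole match. This is just a sketch of that alternative, not a change to the answer above:

```python
import re

def wrap(text, regex):
    # \g<0> in the replacement string stands for the entire matched text.
    return re.sub(regex, r'<a href="/static\g<0>">Link to the file</a>', text)

result = wrap('HA HA HA /services/nfs_qa/log.lol HO HO HO', r'/services/nfs_qa/\S*')
print(result)
# HA HA HA <a href="/static/services/nfs_qa/log.lol">Link to the file</a> HO HO HO
```

The function form is still the better fit when the replacement needs logic, such as escaping the path.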

You are trying to match too much. You only want to match the path, so a regex like '/services/nfs_qa/\S+' is better suited. The \S+ matches one or more non-whitespace characters after /services/nfs_qa/.
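The difference between \S+ and \S* shows up when the bare directory prefix appears on its own. A short sketch, using a made-up sentence as input:

```python
import re

# Hypothetical text where the directory prefix also appears without a file name.
text = 'root is /services/nfs_qa/ and log is /services/nfs_qa/log.lol'

# '\S*' accepts zero characters, so the bare prefix is matched too.
star = re.findall(r'/services/nfs_qa/\S*', text)

# '\S+' needs at least one character after the prefix.
plus = re.findall(r'/services/nfs_qa/\S+', text)

print(star)  # ['/services/nfs_qa/', '/services/nfs_qa/log.lol']
print(plus)  # ['/services/nfs_qa/log.lol']
```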