
How can I do this using a Python regex?

Developer https://www.devze.com 2022-12-30 03:34 Source: web
I am trying to use a regex to extract the method definitions that comtypes generates for COM interfaces. Some of the method lists are also empty, which causes even more problems for me.

Basically, I have this:

IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['in'], BSTR, 'TypeName' ),
              ( ['in'], BSTR, 'TypeNamespaceURI' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

class EnvironmentManager(CoClass):
    u'Singleton object that manages different environments (collections of configuration information).'
    _reg_clsid_ = GUID('{8A626D49-5F5E-47D9-9463-0B802E9C4167}')
    _idlflags_ = []
    _typelib_path_ = typelib_path
    _reg_typelib_ = ('{5E1F7BC3-67C5-4AEE-8EC6-C4B73AAC42ED}', 1, 0)

INumberFormat._methods_ = [
]

I want to extract the method definitions for both IXMLSerializerAlt and INumberFormat, but I can't figure out a proper regex. For example, for IXMLSerializerAlt I want to extract this:

IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['in'], BSTR, 'TypeName' ),
              ( ['in'], BSTR, 'TypeNamespaceURI' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

I think this regex should work:

^\w+\._methods_\s=\s\[$
(^.+$)*
^]$

I'm testing my regexes in Kodos, but I can't find a way to make this one work.


You're missing the newline characters between $ and ^. You may also not be using the re.MULTILINE flag, which lets those anchors match at the start and end of each line. The following pattern, compiled with re.MULTILINE, would match:

\w+\._methods_\s=\s\[$(?:\n^.+$)*\n^\]$
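You can see the effect of the missing newlines by running both patterns against a shortened copy of the question's input. This is a minimal sketch that assumes Python 3. The question's three pattern lines are joined into one string, and the variable names sample, question_rx and answer_rx are only illustrative.

```python
import re

# A shortened copy of the question's input: one populated list and one
# empty list, separated by a blank line.
sample = """IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

INumberFormat._methods_ = [
]
"""

# The question's three pattern lines joined into one string. Right after "$"
# the position is just before "\n", where "^" cannot match, so nothing matches.
question_rx = re.compile(r'^\w+\._methods_\s=\s\[$(^.+$)*^]$', re.MULTILINE)

# The answer's pattern: every line break is consumed explicitly with "\n".
answer_rx = re.compile(r'\w+\._methods_\s=\s\[$(?:\n^.+$)*\n^\]$', re.MULTILINE)

print(question_rx.findall(sample))  # []

# Keep only the interface name (the text before the first ".").
names = [block.split('.')[0] for block in answer_rx.findall(sample)]
print(names)  # ['IXMLSerializerAlt', 'INumberFormat']
```

The repeated group (?:\n^.+$)* is greedy. It consumes non-empty lines until it reaches an empty line, then backtracks to the last line that contains only ]. This version therefore depends on an empty line between consecutive blocks, which the question's input has.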

However, here's a slightly simplified regex that also matches your examples:

>>> s = '''...\nIXMLSerializerAlt._methods_ = [\n    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',\n              ( ['in'], BSTR, 'XML' ),\n              ( ['in'], BSTR, 'TypeName' ),\n              ( ['in'], BSTR, 'TypeNamespaceURI' ),\n              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),\n]\n...'''
>>> import re
>>> re.findall(r'^\w+\._methods_\s=\s\[$.*?^\]$', s, re.DOTALL | re.MULTILINE)
["IXMLSerializerAlt._methods_ = [\n    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',\n              ( ['in'], BSTR, 'XML' ),\n              ( ['in'], BSTR, 'TypeName' ),\n              ( ['in'], BSTR, 'TypeNamespaceURI' ),\n              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),\n]"]
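The lazy .*? is what limits each match to a single definition. The sketch below assumes Python 3 and uses a hypothetical interface IFoo placed directly above INumberFormat with no blank line between them. It compares the pattern above with the same pattern using a greedy .* instead.

```python
import re

# Hypothetical input: two definitions back to back, no blank line between.
s = """IFoo._methods_ = [
    COMMETHOD([], HRESULT, 'Bar'),
]
INumberFormat._methods_ = [
]
"""

flags = re.DOTALL | re.MULTILINE
lazy = r'^\w+\._methods_\s=\s\[$.*?^\]$'    # the pattern from the answer
greedy = r'^\w+\._methods_\s=\s\[$.*^\]$'   # same pattern with a greedy .*

lazy_hits = re.findall(lazy, s, flags)
greedy_hits = re.findall(greedy, s, flags)

# The lazy version stops at the first line that is just "]"; the greedy one
# runs to the last such line and swallows both definitions.
print(len(lazy_hits), len(greedy_hits))  # 2 1
```

Because DOTALL lets . cross line breaks, the greedy version backtracks only from the end of the string, so it returns both definitions as one match.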


import re

interface_definitions = '''
IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['in'], BSTR, 'TypeName' ),
              ( ['in'], BSTR, 'TypeNamespaceURI' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

class EnvironmentManager(CoClass):
    u'Singleton object that manages different environments (collections of configuration information).'
    _reg_clsid_ = GUID('{8A626D49-5F5E-47D9-9463-0B802E9C4167}')
    _idlflags_ = []
    _typelib_path_ = typelib_path
    _reg_typelib_ = ('{5E1F7BC3-67C5-4AEE-8EC6-C4B73AAC42ED}', 1, 0)

INumberFormat._methods_ = [
]
'''

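# Group 1 captures the interface name. Group 2 captures everything between
# the opening "[" and the "]" that closes the _methods_ list.
RX_METHODS = re.compile(
    r'(\w+)\._methods_\s=\s\[('
    r'.*?'                # text up to the first nested "[" (if any)
    r'(?:\[.*?\].*?)*'    # nested [...] pairs such as ['in'], plus the text after each
    r')\]',               # the "]" that closes the list
    re.DOTALL)            # let "." match newlines as well

for match in RX_METHODS.finditer(interface_definitions):
    print(match.groups())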
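Some of the _methods_ lists are empty, so it may help to collect the matches into a dictionary and pick out the empty ones. This sketch assumes Python 3 and reuses the RX_METHODS pattern above on a shortened copy of the question's input. The names text, methods and empty are only illustrative.

```python
import re

# The pattern from the answer above: group 1 is the interface name,
# group 2 is the text between "[" and the list's closing "]".
RX_METHODS = re.compile(
    r'(\w+)\._methods_\s=\s\[(.*?(?:\[.*?\].*?)*)\]', re.DOTALL)

# Shortened copy of the question's input.
text = """IXMLSerializerAlt._methods_ = [
    COMMETHOD([helpstring(u'Loads an object from an XML string.')], HRESULT, 'LoadFromString',
              ( ['in'], BSTR, 'XML' ),
              ( ['retval', 'out'], POINTER(POINTER(IUnknown)), 'obj' )),
]

class EnvironmentManager(CoClass):
    _idlflags_ = []

INumberFormat._methods_ = [
]
"""

# Map each interface name to its method list text, with whitespace stripped.
methods = {name: body.strip() for name, body in RX_METHODS.findall(text)}

# An empty _methods_ list leaves an empty string after stripping.
empty = [name for name, body in methods.items() if not body]

print(sorted(methods))  # ['INumberFormat', 'IXMLSerializerAlt']
print(empty)            # ['INumberFormat']
```

The _idlflags_ = [] line in the class body is not preceded by ._methods_, so the pattern skips it.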