Forcing a pass-by-reference in Python

Developer https://www.devze.com 2023-03-21 19:36 Source: the web

Boy, am I not understanding Python's pass-by-reference issues... I have created the extremely useful "unpacker" class, which I pass around to various objects that need to unpack from it. Yet given how extraordinarily slow it is, I can tell it's making a copy of binaryStr each time I pass a BU (BinaryUnpacker) object. I know this because if I break the BU into smaller chunks, it runs literally 100x faster (I was originally using it to hold a 16MB file I/O buffer).

So my question is, why is that member not getting passed by reference, and is there a way to force it to? I am pretty sure the BU object itself is passed by reference (since my code works), but the speed suggests the .binaryStr object is copied. Is there something more subtle that I'm missing?

```python
from struct import unpack

# UINT32, UINT64, VAR_INT and unpackVarInt are defined elsewhere
# in the original program.

class BinaryUnpacker(object):
    def __init__(self, binaryStr):
        self.binaryStr = binaryStr
        self.pos = 0

    def get(self, varType, sz=0):
        pos = self.pos
        if varType == UINT32:
            value = unpack('<I', self.binaryStr[pos:pos+4])[0]
            self.pos += 4
            return value
        elif varType == UINT64:
            value = unpack('<Q', self.binaryStr[pos:pos+8])[0]
            self.pos += 8
            return value
        elif varType == VAR_INT:
            # self.binaryStr[pos:] slices from pos to the end of the buffer
            [value, nBytes] = unpackVarInt(self.binaryStr[pos:])
            self.pos += nBytes
            return value
        # ....
```

The use case for this is something along the lines of:

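```python
def unserialize(self, toUnpack):
    # Reuse an existing unpacker, or wrap a raw string in a new one
    if isinstance(toUnpack, BinaryUnpacker):
        buData = toUnpack
    else:  # assume string
        buData = BinaryUnpacker(toUnpack)

    self.var1    = buData.get(VAR_INT)
    self.var2    = buData.get(BINARY_CHUNK, 64)
    self.var3    = buData.get(UINT64)
    self.var4obj = AnotherClass().unserialize(buData)
    # Return self so that callers can write obj = SomeClass().unserialize(...)
    return self
```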

Thanks so much for your help.

The copies are made when you slice a string to get a substring. For example:

```python
[value, nBytes] = unpackVarInt(self.binaryStr[pos:])
```

This will create a copy of the string from index pos to the end, which can take time for a long string. It will be faster if you can determine the number of bytes you actually need before taking the substring, and then use self.binaryStr[pos:pos+nBytes], since taking a small substring is relatively fast.
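A minimal sketch of "determine the number of bytes first, then slice only that much". The poster's unpackVarInt is not shown, so this assumes a Bitcoin-style CompactSize encoding (one prefix byte, optionally followed by 2, 4 or 8 little-endian bytes); `read_var_int` is a hypothetical name, and the code assumes Python 3, where indexing a bytes object returns an int.

```python
from struct import unpack

def read_var_int(data, pos):
    """Hypothetical helper: decode a CompactSize-style integer at data[pos].

    Only the bytes that belong to the integer are sliced, never the whole
    tail of the buffer. Returns (value, nBytes).
    """
    first = data[pos]  # a single index: no substring is created
    if first < 0xfd:
        return first, 1
    # The prefix byte says how many bytes follow, so we slice exactly that many
    fmt, size = {0xfd: ('<H', 2), 0xfe: ('<I', 4), 0xff: ('<Q', 8)}[first]
    value = unpack(fmt, data[pos + 1:pos + 1 + size])[0]
    return value, 1 + size
```

For example, `read_var_int(b'\xfd\x00\x01', 0)` returns `(256, 3)`, and the cost is the same whether the buffer holds three bytes or 16MB.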

Note that the time depends only on the length of the substring, so self.binaryStr[pos:pos+4] should take roughly the same amount of time regardless of the length of self.binaryStr.
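A quick check of both halves of this, assuming Python 3 and using a stripped-down copy of the question's class (only `__init__`); `consume` is a hypothetical stand-in for any function that receives the unpacker. Passing the object hands over a reference to the same object and the same buffer, while slicing produces a new object whose size equals the slice length.

```python
class BinaryUnpacker(object):
    # Stripped-down copy of the question's class: only the constructor
    def __init__(self, binaryStr):
        self.binaryStr = binaryStr
        self.pos = 0

def consume(bu):
    # Hypothetical stand-in for any function that receives the unpacker
    return bu

data = bytes(16 * 1024 * 1024)  # a 16MB buffer, like the original I/O buffer
bu = BinaryUnpacker(data)

# Passing the object around copies neither the object nor its attribute
same = consume(bu)
print(same is bu)              # True
print(same.binaryStr is data)  # True

# Slicing does copy: each result is a new object of the slice's length
small = bu.binaryStr[0:4]
tail = bu.binaryStr[4:]
print(len(small), len(tail))   # 4 16777212
```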

I did not look at your code in depth, but objects that support the buffer protocol (such as strings) can be accessed through memoryview objects without copying the data. Here's the relevant documentation for it.
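A small illustration of the zero-copy behaviour, assuming Python 3. A bytearray is used only because it is mutable, which makes it visible that a memoryview slice still points at the original memory; with an immutable bytes buffer the view works the same way, just read-only.

```python
buf = bytearray(b'\x01\x00\x00\x00\x02\x00\x00\x00')
view = memoryview(buf)

chunk = view[4:8]              # slicing a memoryview makes a new view, not a copy
buf[4] = 0xFF                  # change the underlying buffer...
print(chunk.tobytes())         # ...and the view sees it: b'\xff\x00\x00\x00'

snapshot = bytes(buf[4:8])     # slicing the bytearray itself makes a copy
buf[4] = 0x02
print(snapshot)                # b'\xff\x00\x00\x00' (the copy is unchanged)
print(chunk.tobytes())         # b'\x02\x00\x00\x00' (the view follows the buffer)
```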

You could use a memoryview object instead of slicing the string; this way you would bypass the time-consuming part of your current code.
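A sketch of how the unpacker might look with a memoryview, assuming Python 3. The type tags below are hypothetical placeholders (the question's real constants are defined elsewhere), and VAR_INT is left out because its encoding isn't shown. Fixed-size fields are read with `struct.unpack_from`, which takes an offset, and binary chunks copy only their own `sz` bytes.

```python
from struct import unpack_from

# Hypothetical type tags; the question's real constants are defined elsewhere
UINT32, UINT64, BINARY_CHUNK = range(3)

class BinaryUnpacker(object):
    """Sketch: read fields in place instead of slicing the whole buffer."""

    def __init__(self, binaryStr):
        # memoryview wraps the existing buffer; no data is copied here
        self.view = memoryview(binaryStr)
        self.pos = 0

    def get(self, varType, sz=0):
        pos = self.pos
        if varType == UINT32:
            value = unpack_from('<I', self.view, pos)[0]  # 4 bytes at pos
            self.pos += 4
        elif varType == UINT64:
            value = unpack_from('<Q', self.view, pos)[0]  # 8 bytes at pos
            self.pos += 8
        elif varType == BINARY_CHUNK:
            # Slicing the view is zero-copy; bytes() copies only sz bytes
            value = bytes(self.view[pos:pos + sz])
            self.pos += sz
        else:
            raise ValueError('unsupported type: %r' % (varType,))
        return value

# Usage: a UINT32, a UINT64 and a 5-byte chunk packed back to back
data = (7).to_bytes(4, 'little') + (2**40).to_bytes(8, 'little') + b'hello'
bu = BinaryUnpacker(data)
a = bu.get(UINT32)            # 7
b = bu.get(UINT64)            # 1099511627776
c = bu.get(BINARY_CHUNK, 5)   # b'hello'
```

Because every read is bounded by the field's own size, each call costs the same whether the buffer is 17 bytes or 16MB.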

A few days ago I asked a question about this that perhaps could be useful to you.

I don't think judging by speed alone is reliable. You said you can tell that the string is being copied because it runs much faster when you break it into smaller chunks. But the running time of the unpack() function, which you didn't give details about, could also depend on the data size.
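One way to test rather than infer is to time each suspect operation in isolation, for example with the standard `timeit` module. A rough sketch, assuming Python 3; the 16MB size mirrors the question, `pos = 100` is arbitrary, and the absolute numbers will vary by machine.

```python
import timeit
from struct import unpack

data = bytes(16 * 1024 * 1024)  # same size as the original 16MB buffer
pos = 100                       # arbitrary offset for the measurement

# Time each operation on its own instead of judging from overall speed
tail_slice = timeit.timeit(lambda: data[pos:], number=20)
small_slice = timeit.timeit(lambda: data[pos:pos + 4], number=20)
unpack_call = timeit.timeit(lambda: unpack('<I', data[pos:pos + 4]), number=20)

print('slice to end:       %.6f s' % tail_slice)
print('4-byte slice:       %.6f s' % small_slice)
print('unpack of 4 bytes:  %.6f s' % unpack_call)
```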

Besides, slicing a string, as in

```python
unpack('<I', self.binaryStr[pos:pos+4])[0]
```

creates a new string object, since strings are immutable.
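If even these small temporary strings matter, `struct.unpack_from` reads directly from the buffer at an offset, so no intermediate slice object is created. A short comparison, assuming Python 3; the buffer contents are made up for illustration.

```python
from struct import unpack, unpack_from

data = b'\x00\x00' + (1234).to_bytes(4, 'little') + b'\x00' * 10
pos = 2

# Slicing first builds a new 4-byte bytes object, then unpacks it
via_slice = unpack('<I', data[pos:pos + 4])[0]

# unpack_from reads in place at the given offset, without a temporary slice
via_offset = unpack_from('<I', data, pos)[0]

print(via_slice, via_offset)  # 1234 1234
```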
