Forcing a pass-by-reference in Python

Developer https://www.devze.com 2023-03-21 19:36 Source: Internet

Boy am I not understanding the python pass-by-reference issues... I have created the extremely useful "unpacker" class which I pass around to various objects that need to unpack from it, yet given how extraordinarily slow it is, I can tell it's making a copy of binaryStr each time I pass a BU object. I know this because if I break the BU into smaller chunks, it runs, literally, 100x faster (I was originally using it to hold a 16MB file I/O buffer).

So my question is, why is that member not getting passed by reference, and is there a way to force it to? I am pretty sure the BU object itself is passed by reference (since my code works), but the speed suggests the .binaryStr object is copied. Is there something more subtle that I'm missing?

class BinaryUnpacker(object):
    def __init__(self, binaryStr):
        self.binaryStr = binaryStr
        self.pos = 0

    def get(self, varType, sz=0):
        pos = self.pos
        if varType == UINT32:
            value = unpack('<I', self.binaryStr[pos:pos+4])[0]
            self.pos += 4
            return value
        elif varType == UINT64:
            value = unpack('<Q', self.binaryStr[pos:pos+8])[0]
            self.pos += 8
            return value
        elif varType == VAR_INT:
            [value, nBytes] = unpackVarInt(self.binaryStr[pos:])
            self.pos += nBytes
        ....

The use case for this is something along the lines of:

def unserialize(self, toUnpack):
    if isinstance(toUnpack, BinaryUnpacker):
        buData = toUnpack
    else:  # assume string
        buData = BinaryUnpacker(toUnpack)

    self.var1    = buData.get(VAR_INT)
    self.var2    = buData.get(BINARY_CHUNK, 64)
    self.var3    = buData.get(UINT64)
    self.var4obj = AnotherClass().unserialize(buData)

Thanks so much for your help.


The copies are made when you slice a string to get a substring. For example:

[value, nBytes] = unpackVarInt(self.binaryStr[pos:])

This will create a copy of the string from index pos to the end, which can take time for a long string. It will be faster if you can determine the number of bytes you actually need before taking the substring, and then use self.binaryStr[pos:pos+nBytes], since taking a small substring is relatively fast.

Note that the time depends only on the length of the substring, so self.binaryStr[pos:pos+4] should take roughly the same amount of time regardless of the length of self.binaryStr.
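One way to avoid the substring altogether for fixed-width fields is struct.unpack_from, which reads directly from an offset inside the buffer. The following is only a sketch, assuming Python 3 and a bytes buffer; the helper names read_uint32 and read_uint64 are made up for illustration and are not part of the original class.

```python
import struct

def read_uint32(buf, pos):
    """Read a little-endian uint32 at offset pos without slicing buf."""
    (value,) = struct.unpack_from('<I', buf, pos)
    return value, pos + 4

def read_uint64(buf, pos):
    """Read a little-endian uint64 at offset pos without slicing buf."""
    (value,) = struct.unpack_from('<Q', buf, pos)
    return value, pos + 8

# Illustrative data: a uint32 followed by a uint64, both little-endian.
data = struct.pack('<IQ', 7, 2**40)

first, pos = read_uint32(data, 0)
second, pos = read_uint64(data, pos)
```

A variable-length reader such as unpackVarInt could follow the same pattern by taking the buffer and an offset instead of a pre-sliced string.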


I did not look at your code in depth, but types that support the buffer interface (such as strings) can be accessed through memoryview objects without having to copy the data. Here's the relevant documentation for it.

You could use a memoryview object instead of slicing the string: that way you would avoid the time-consuming copies in your current code.
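A minimal sketch of that idea, assuming Python 3 and a bytes buffer (the variable names are illustrative only): slicing a memoryview yields another view onto the same memory rather than a copy, and struct.unpack accepts such a view directly.

```python
import struct

# Illustrative 1024-byte buffer: the byte values 0..255 repeated four times.
data = bytes(range(256)) * 4
view = memoryview(data)

# Slicing the view does not copy; the slice still refers to `data`.
tail = view[100:]
tail_shares_buffer = tail.obj is data

# Bytes 4..7 are 0x04 0x05 0x06 0x07, read here as a little-endian uint32.
value = struct.unpack('<I', view[4:8])[0]
```

Only the four bytes handed to unpack are ever read, so the cost of taking the "rest of the buffer" no longer grows with the buffer size.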

A few days ago I asked a question about this that perhaps could be useful to you.


I don't think speed alone is enough to draw that conclusion. You said you can tell the string is being copied because the code runs much faster when you break it into smaller chunks. But the running time of the unpack() function, whose details you didn't give, could also depend on the data size.

Besides, slicing a string such as

unpack('<I', self.binaryStr[pos:pos+4])[0]

will create new string objects since strings are immutable objects.
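The difference between passing an object around and slicing it can be checked with an identity test. This is a small sketch, assuming Python 3; the Holder class and same_object function are hypothetical stand-ins for the BinaryUnpacker and the code that receives it.

```python
class Holder(object):
    """Hypothetical stand-in for an object that stores a large buffer."""
    def __init__(self, binary_str):
        self.binary_str = binary_str

def same_object(holder, original):
    # The attribute reached through the passed-in holder is the very
    # same object as the original buffer; nothing was copied.
    return holder.binary_str is original

data = b'\x00' * (1 << 20)   # 1 MiB buffer
holder = Holder(data)
shared = same_object(holder, data)

# A slice, on the other hand, is a brand-new bytes object.
chunk = data[16:]
copied = chunk is not data
```

Here shared shows that handing the holder to a function does not duplicate the buffer, while copied shows that the slice is a separate object whose size depends on how much was sliced.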
