Python operating on multiple data arrays in a rolling window

Consider the following code:

import numpy as np

class MyClass(object):

    def __init__(self):

        self.data_a = np.array(range(100))
        self.data_b = np.array(range(100,200))
        self.data_c = np.array(range(200,300))

    def _method_i_do_not_have_access_to(self, data, window, func):

        output = np.empty(np.size(data))

        for i in xrange(0, len(data)-window+1):
            output[i] = func(data[i:i+window])

        output[-window+1:] = np.nan

        return output

    def apply_a(self):

        a = self.data_a

        def _my_func(val):
            return sum(val)

        return self._method_i_do_not_have_access_to(a, 5, _my_func)

my_class = MyClass()
print my_class.apply_a()

The _method_i_do_not_have_access_to method takes a numpy array, a window parameter, and a user-defined function handle. It returns an array holding the function's output for each run of window consecutive data points in the input array: a generic rolling method. I cannot change this method.

As you can see, _method_i_do_not_have_access_to passes a single argument to the function handle: a slice of the data array it was given. The function handle can therefore only compute its output from window data points of that one array.

What I need is for _my_func (the function handle passed to _method_i_do_not_have_access_to) to operate on data_b and data_c as well, at the same window indexes as the slice it receives. data_b and data_c are instance attributes of MyClass.

The only way I have thought of doing this is including references to data_b and data_c within _my_func like this:

def _my_func(val):
    b = self.data_b
    c = self.data_c
    # do some calculations
    return sum(val)

However, I need to slice b and c at the same indexes as val (remember val is the length-window slice of the array that is passed through _method_i_do_not_have_access_to).

For example, if the loop within _method_i_do_not_have_access_to is currently operating on indexes 45 -> 50 on the input array, _my_func has to be operating on the same indexes on b and c.

The final result would be something like this:

def _my_func(val):

    b = self.data_b # somehow identify which slide we are at
    c = self.data_c # somehow identify which slide we are at

    # if _method_i_do_not_have_access_to is currently
    # operating on indexes 45->50, then the sum of 
    # val, b, and c should be the sum of the values at
    # index 45->50 at each

    return sum(val) * sum(b) + sum(c)

Any thoughts on how I might accomplish this?

The question is how _my_func would know which indices to operate on. If you know the indices in advance when calling your function, the simplest approach is a lambda: lambda val: self._my_func(self.data_b, self.data_c, index, val), with _my_func changed to accept the additional parameters.

Since you don't know the indices, you'll have to write a wrapper around the array you pass in that remembers which index was last accessed (or, better yet, catches the slice operator) and stores it in a variable for your function to use.

Edit: Knocked up a small example, not especially great coding style and all, but should give you the idea:

class Foo():
    def __init__(self, data1, data2):
        self.data1 = data1
        self.data2 = data2
        self.key = 0      

    def getData(self):
        return Foo.Wrapper(self, self.data2)

    def getKey(self):
        return self.key

    class Wrapper():
        def __init__(self, outer, data):
            self.outer = outer
            self.data = data

        def __getitem__(self, key):
            self.outer.key = key
            return self.data[key]

if __name__ == '__main__':
    data1 = [10, 20, 30, 40]
    data2 = [100, 200, 300, 400]
    foo = Foo(data1, data2)
    wrapped_data2 = foo.getData()
    print(wrapped_data2[2:4])
    print(data1[foo.getKey()])
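To tie the wrapper idea back to the question, here is a minimal sketch written for Python 3 (so xrange becomes range). rolling_apply is a copy of the question's _method_i_do_not_have_access_to as a plain function, and SliceRecorder is a hypothetical name, not something from the original post. The size attribute and __len__ are only there because the rolling method calls np.size(data) and len(data) on whatever it is given.

```python
import numpy as np


def rolling_apply(data, window, func):
    # Python 3 copy of the question's _method_i_do_not_have_access_to.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


class SliceRecorder(object):
    """Hypothetical wrapper that remembers the last key used to index it."""

    def __init__(self, data):
        self.data = data
        self.size = data.size   # np.size() uses this attribute when it exists
        self.last_key = None

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        self.last_key = key     # record the slice the rolling loop asked for
        return self.data[key]


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)
wrapped_a = SliceRecorder(a)


def my_func(val):
    key = wrapped_a.last_key    # same slice that produced val
    return val.sum() * b[key].sum() + c[key].sum()


result = rolling_apply(wrapped_a, 5, my_func)
```

This relies on the rolling method slicing the wrapper immediately before each call to func, which is true of the loop shown in the question but is an assumption about any other implementation.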

You can pass a two-dimensional array to _method_i_do_not_have_access_to(). len() and the slice operation both work with it:

In [29]: a = np.arange(100)
In [30]: b = np.arange(100,200)
In [31]: c = np.arange(200,300)
In [32]: data = np.c_[a,b,c] # stack the three 1-D arrays as the columns of one 2-D array

In [35]: data[0:10] # slice operation works.
Out[35]:
array([[  0, 100, 200],
       [  1, 101, 201],
       [  2, 102, 202],
       [  3, 103, 203],
       [  4, 104, 204],
       [  5, 105, 205],
       [  6, 106, 206],
       [  7, 107, 207],
       [  8, 108, 208],
       [  9, 109, 209]])

In [36]: len(data) # len() works.
Out[36]: 100

In [37]: data.shape
Out[37]: (100, 3)

So you can define _my_func as follows:

def _my_func(val):
    s = np.sum(val, axis=0)
    return s[0]*s[1] + s[2]
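One thing to watch with this approach: the rolling method allocates its output with np.size(data), which counts every element of the 2-D array (300 here), not its rows. The sketch below, written for Python 3 with rolling_apply as a plain-function copy of the question's method, shows one way to deal with that by keeping only the leading entries that the loop actually filled. The names raw and n_valid are mine.

```python
import numpy as np


def rolling_apply(data, window, func):
    # Python 3 copy of the question's _method_i_do_not_have_access_to.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)
data = np.c_[a, b, c]                  # shape (100, 3)
window = 5


def my_func(val):
    s = np.sum(val, axis=0)            # column sums over the current window
    return s[0] * s[1] + s[2]


raw = rolling_apply(data, window, my_func)

# raw has 300 entries; only the first len(data) - window + 1 were written
# by the loop, and everything after them is uninitialised or NaN.
n_valid = len(data) - window + 1
result = raw[:n_valid]
```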

Since it appears that _method_i_do_not.. is simply applying your function to your data, could you have the data be precisely an array of indices? Then func would use the indices for windowed access to data_a, data_b, and data_c. There might be faster ways, but I think this would work with a minimum of added complexity.

So in other words, something roughly like this, with additional processing on window added if necessary:

def apply_a(self):

    a = self.data_a
    b = self.data_b
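    c = self.data_c
    # pass positions instead of values; the rolling method slices these
    window_indices = np.arange(len(a))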

    def _my_func(window):
        return sum(a[window]) * sum(b[window]) + sum(c[window])

    return self._method_i_do_not_have_access_to(window_indices, 5, _my_func)
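For reference, a self-contained version of the index-array idea, as a sketch in Python 3. rolling_apply is a plain-function copy of the question's method, and the index array is assumed to be simply np.arange over the data length. The last lines compare the result with a direct loop over the same windows.

```python
import numpy as np


def rolling_apply(data, window, func):
    # Python 3 copy of the question's _method_i_do_not_have_access_to.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)
window = 5

# The "data" handed to the rolling method is just the positions 0..99.
idx = np.arange(len(a))


def my_func(window_idx):
    # window_idx is a slice of idx, e.g. [45, 46, 47, 48, 49]
    return a[window_idx].sum() * b[window_idx].sum() + c[window_idx].sum()


result = rolling_apply(idx, window, my_func)

# Cross-check against slicing the three arrays directly.
n_valid = len(a) - window + 1
expected = [a[i:i + window].sum() * b[i:i + window].sum() + c[i:i + window].sum()
            for i in range(n_valid)]
```

Because idx is one-dimensional, the output keeps the same length as the original arrays and the trailing NaN entries land where the question's code puts them.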

Here's a hack:

Make a new class DataProxy that has a __getitem__ method and proxies the three data arrays (which you can pass to it, e.g. on initialisation). Make func act on DataProxy instances instead of standard numpy arrays, and pass the modified func and the proxy in to the inaccessible method.

Does that make sense? The idea is that there's no constraint on data to be an array, just to be subscriptable. So you can make a custom subscriptable class to use instead of an array.

Example:

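class DataProxy:
    def __init__(self, *data):
        # one tuple per position: (a[i], b[i], c[i], ...)
        self.data = list(zip(*data))

    def __len__(self):
        # the rolling method calls len(data)
        return len(self.data)

    def __getitem__(self, item):
        return self.data[item]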

Then create a new DataProxy, passing in as many arrays as you want when you do so, and make func accept the results of indexing said instance. Try it!
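A rough end-to-end sketch of the proxy in use, in Python 3. rolling_apply is a plain-function copy of the question's method. The __len__ method and the size attribute are my additions, assumed necessary because that method calls len(data) and np.size(data) on its input. Each window arrives in func as a list of (a, b, c) tuples.

```python
import numpy as np


def rolling_apply(data, window, func):
    # Python 3 copy of the question's _method_i_do_not_have_access_to.
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


class DataProxy(object):
    def __init__(self, *data):
        self.data = list(zip(*data))   # one (a, b, c) tuple per position
        self.size = len(self.data)     # added: np.size() uses this attribute

    def __len__(self):                 # added: the rolling method calls len()
        return len(self.data)

    def __getitem__(self, item):
        return self.data[item]


def my_func(rows):
    # rows is a list of (a, b, c) tuples for the current window
    col_a, col_b, col_c = zip(*rows)
    return sum(col_a) * sum(col_b) + sum(col_c)


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)

result = rolling_apply(DataProxy(a, b, c), 5, my_func)
```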