Equivalent of named tuple in NumPy?

Developer https://www.devze.com 2023-04-06 10:02 Source: Internet
Is it possible to create a NumPy object that behaves very much like a collections.namedtuple, in the sense that elements can be accessed like so:

data[1] = 42
data['start date'] = '2011-09-20'  # Slight generalization of what is possible with a namedtuple

I tried to use a structured (compound) data type:

>>> data = numpy.empty(shape=tuple(), dtype=[('start date', 'S11'), ('n', int)])

This creates a 0-dimensional value with a kind of namedtuple type; it almost works:

>>> data['start date'] = '2011-09-20'
>>> data
array(('2011-09-20', -3241474627884561860), 
      dtype=[('start date', '|S11'), ('n', '<i8')])

However, element access does not work, because the "array" is 0-dimensional:

>>> data[0] = '2011-09-20'
Traceback (most recent call last):
  File "<ipython-input-19-ed41131430b9>", line 1, in <module>
    data[0] = '2011-09-20'
IndexError: 0-d arrays can't be indexed.

Is there a way of obtaining the desired behavior described above (item assignment through both a string and an index) with a NumPy object?


You can do something like this using the numpy.rec module. What you need is the record class from this module, but I don't know how to create an instance of it directly. One indirect way is to first create a recarray with a single entry:

>>> a = numpy.recarray(1, names=["start date", "n"], formats=["S11", "i4"])[0]
>>> a[0] = "2011-09-20"
>>> a[1] = 42
>>> a
('2011-09-20', 42)
>>> a["start date"]
'2011-09-20'
>>> a.n
42

If you figure out how to create an instance of record directly, please let me know.
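A variation on the same idea is to build the one-element record array from data instead of from uninitialized memory. This is only a sketch: it assumes numpy.rec.array accepts a list of tuples plus a dtype, uses byte strings so it behaves the same on Python 3, and the variable names are illustrative. It still goes through a one-element array, so it does not construct the record class directly.

```python
import numpy as np

# Field layout from the question; "start date" contains a space on purpose.
dt = [("start date", "S11"), ("n", "i4")]

# Build a one-element record array from data, then take its only record.
rec = np.rec.array([(b"2011-09-20", 0)], dtype=dt)[0]

rec[1] = 42                          # assign by position
rec["start date"] = b"2011-09-21"    # assign by field name

print(type(rec).__name__)            # name of the record scalar type
print(rec[0], rec["n"], rec.n)       # position, field name, attribute access
```

Attribute access (rec.n) only works for field names that are valid Python identifiers, so "start date" has to be reached through item access.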


(Edited, as EOL recommended, to answer the question more specifically.)

Create a 0-dim array (I didn't find a scalar constructor either):

>>> data0 = np.array(('2011-09-20', 0), dtype=[('start date', 'S11'), ('n', int)])
>>> data0.ndim
0

Access the element in a 0-dim array:

>>> type(data0[()])
<class 'numpy.void'>
>>> data0[()][0]
b'2011-09-20'
>>> data0[()]['start date']
b'2011-09-20'

>>> # There is also an item() method, which however returns the element as a plain Python type
>>> type(data0.item())
<class 'tuple'>
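Put together as a script, the two ways of getting the element out of a 0-dim array look roughly like this. It is a minimal sketch that assumes Python 3 (hence the byte strings); the names record and as_tuple are illustrative.

```python
import numpy as np

dt = [("start date", "S11"), ("n", np.int64)]
data0 = np.array((b"2011-09-20", 0), dtype=dt)   # 0-dimensional structured array

record = data0[()]        # empty-tuple index: the numpy.void element
as_tuple = data0.item()   # the same values as a plain Python tuple

print(data0.ndim)                        # 0
print(record[0], record["start date"])   # position and field name both work
print(type(as_tuple).__name__)           # tuple
```

The tuple from item() only supports positional access, so the field names are lost; the numpy.void element keeps both access styles.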

I think the easiest way is to think of structured arrays (or recarrays) as lists or arrays of tuples: indexing by name selects a column, and indexing by integer selects a row.

>>> tupleli = [('2011-09-2%s' % i, i) for i in range(5)]
>>> tupleli
[('2011-09-20', 0), ('2011-09-21', 1), ('2011-09-22', 2), ('2011-09-23', 3), ('2011-09-24', 4)]
>>> dt = [('start date', '|S11'), ('n', np.int64)]
>>> dt
[('start date', '|S11'), ('n', <class 'numpy.int64'>)]

Zero-dimensional array whose element is a tuple, i.e. one record. (Changed: this is not a scalar element; see the end.)

>>> data1 = np.array(tupleli[0], dtype=dt)
>>> data1.shape
()
>>> data1['start date']
array(b'2011-09-20', 
      dtype='|S11')
>>> data1['n']
array(0, dtype=int64)

array with one element

>>> data2 = np.array([tupleli[0]], dtype=dt)
>>> data2.shape
(1,)
>>> data2[0]
(b'2011-09-20', 0)

1d array

>>> data3 = np.array(tupleli, dtype=dt)
>>> data3.shape
(5,)
>>> data3[2]
(b'2011-09-22', 2)
>>> data3['start date']
array([b'2011-09-20', b'2011-09-21', b'2011-09-22', b'2011-09-23',
       b'2011-09-24'], 
      dtype='|S11')
>>> data3['n']
array([0, 1, 2, 3, 4], dtype=int64)

Direct indexing into a single record, as in EOL's example (I didn't know this worked):

>>> data3[2][1]
2
>>> data3[2][0]
b'2011-09-22'

>>> data3[2]['n']
2
>>> data3[2]['start date']
b'2011-09-22'
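The row/column picture above can be condensed into a short script. This is a sketch assuming Python 3; rows, column and row are illustrative names.

```python
import numpy as np

# Five (date, counter) records, with the dates stored as byte strings.
rows = [(("2011-09-2%d" % i).encode("ascii"), i) for i in range(5)]
dt = [("start date", "S11"), ("n", np.int64)]
data3 = np.array(rows, dtype=dt)

column = data3["n"]   # field name selects a column: a plain 1-d integer array
row = data3[2]        # integer selects a row: a single numpy.void record

print(column)
print(row[1], row["n"])                  # a record accepts position or field name
print(data3["n"][2] == data3[2]["n"])    # column-then-row equals row-then-column
```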

Trying to understand EOL's example: a scalar element and a zero-dimensional array are different.

>>> type(data1)
<class 'numpy.ndarray'>
>>> type(data1[()])   #get element out of 0-dim array
<class 'numpy.void'>

>>> data1[0]
Traceback (most recent call last):
  File "<pyshell#98>", line 1, in <module>
    data1[0]
IndexError: 0-d arrays can't be indexed
>>> data1[()][0]
b'2011-09-20'

>>> data1.ndim
0
>>> data1[()].ndim
0

(Note: I accidentally typed the example into an open Python 3.2 interpreter, which is why the strings show up as b'...'.)


OK, I found a solution, but I would love to see a more elegant one:

data = numpy.empty(shape=1, dtype=[('start date', 'S11'), ('n', int)])[0]

This creates a 1-dimensional array with a single element and extracts that element, so that access works with both strings and numerical indices:

>>> data['start date'] = '2011-09-20'  # Contains a space: more flexible than a namedtuple!
>>> data[1] = 123
>>> data
('2011-09-20', 123)

It would be nice if there were a way of constructing data directly, without first having to create an array with one element and extract that element. Since

>>> type(data)
<type 'numpy.void'>

I'm not sure what NumPy constructor could be called… (there is no docstring for numpy.void).
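For what it's worth, newer NumPy releases (1.24 and later, as far as I know) appear to accept a dtype argument in numpy.void, which would give the direct constructor asked for here. The sketch below treats that as an assumption and falls back to the one-element-array trick when the keyword is rejected; make_record is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

dt = np.dtype([("start date", "S11"), ("n", "i8")])

def make_record(values, dtype):
    """Hypothetical helper: return a single structured scalar (numpy.void)."""
    try:
        # Assumed to work on NumPy >= 1.24, where numpy.void takes a dtype.
        return np.void(values, dtype=dtype)
    except TypeError:
        # Older NumPy: go through a one-element array, as in the answer above.
        return np.array([values], dtype=dtype)[0]

data = make_record((b"2011-09-20", 123), dt)

print(type(data).__name__)
print(data["start date"], data[1])   # access by field name and by position
```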


This is nicely implemented by "Series" in the Pandas package.

For example from the tutorial:

>>> from pandas import Series
>>> import numpy as np
>>> s = Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
>>> s
a    -0.125628696947
b    0.0942011098937
c    -0.71375003803
d    -0.590085433392
e    0.993157363933
>>> s[1]
0.094201109893723267
>>> s['b']
0.094201109893723267

I've just been playing around with this for a few days, but it looks like it has a lot to offer.
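Applied to the record from the question, a Series could look like the sketch below. It assumes a reasonably recent pandas, where position-based access is spelled .iloc[...] (plain s[1] on a label-indexed Series is deprecated in newer releases, as far as I can tell); the variable name data is illustrative.

```python
import pandas as pd

# One record with mixed value types; labels may contain spaces.
data = pd.Series({"start date": "2011-09-20", "n": 42})

data.iloc[1] = 123                   # assign by position
data["start date"] = "2011-09-21"    # assign by label

print(data.iloc[0], data["n"])
```

Unlike the NumPy record, the values here are stored as generic Python objects, so this trades the compact fixed-size layout for flexibility.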
