Specify dtype of each object in a Python NumPy array

Developer https://www.devze.com 2023-01-16 05:04 Source: Internet
  • This is a similar question using dtypes in a list

The following snippet creates a "typical test array"; its purpose is to test an assortment of things in my program. Is it possible to change the type of individual elements in an array?

import numpy as np
import random
from random import uniform, randrange, choice

# ... bunch of silly code ...

def gen_test_array( ua, low_inc, med_inc, num_of_vectors ):
  #typical_array = [ zone_id, ua, inc, veh, pop, hh, with_se, is_cbd, re, se=0, oe]
  typical_array = np.zeros( shape = ( num_of_vectors, 11 ) )

  for i in range( 0, num_of_vectors ):
    typical_array[i] = [i, int( ua ), uniform( low_inc / 2, med_inc * 2 ), uniform( 0, 6 ),
                        randrange( 100, 5000 ), randrange( 100, 500 ),
                        choice( [True, False] ), choice( [True, False] ),
                        randrange( 100, 5000 ), randrange( 100, 5000 ),
                        randrange( 100, 5000 ) ]

  return typical_array


The way to do this in numpy is to use a structured array.

However, in many cases where you're using heterogeneous data, a simple Python list is a much better choice. (Or, though it wasn't widely available when this answer was written, a pandas.DataFrame is absolutely ideal for this scenario.)

Regardless, the example you gave above will work perfectly well as a "normal" NumPy array: you can just make everything a float. (Everything appears to be an int, except for two columns of floats. The bools can easily be represented as ints.)
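As a minimal sketch of that idea (the three-column layout and the values here are made up for illustration, not taken from the question), a plain float array holds the ints and bools without loss, and individual columns can be cast back when they are read out:

```python
import numpy as np

# Hypothetical rows: [zone_id, income, is_cbd]
rows = [
    [0, 1.25, True],
    [1, 3.50, False],
    [2, 0.75, True],
]

# Mixed int/float/bool values are all stored as one common dtype
arr = np.array(rows, dtype=float)
print(arr.dtype)          # float64

# Cast single columns back to the type you want when reading them out
zone_id = arr[:, 0].astype(int)
is_cbd = arr[:, 2].astype(bool)
print(zone_id.tolist())   # [0, 1, 2]
print(is_cbd.tolist())    # [True, False, True]
```

A float64 represents every integer up to 2**53 exactly, so IDs and counts of the size used in the question survive the round trip unchanged.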

Nonetheless, to illustrate using structured dtypes...

import numpy as np

ua = 5 # No idea what "ua" is in your code above...
low_inc, med_inc = 0.5, 2.0 # Again, no idea what these are...

num = 100
num_fields = 11

# Use more descriptive names than "col1"! I'm just generating the names as placeholders
# Explicit sized types (the old np.int / np.float aliases were removed in NumPy 1.24)
dtype = {'names': ['col%i' % i for i in range(num_fields)],
         'formats': 2*[np.int64] + 2*[np.float64] + 2*[np.int64] + 2*[np.bool_] + 3*[np.int64]}
data = np.zeros(num, dtype=dtype)

# Being rather verbose...
data['col0'] = np.arange(num, dtype=np.int64)
data['col1'] = int(ua) * np.ones(num)
data['col2'] = np.random.uniform(low_inc / 2, med_inc * 2, num)
data['col3'] = np.random.uniform(0, 6, num)
data['col4'] = np.random.randint(100, 5000, num)
data['col5'] = np.random.randint(100, 500, num)
data['col6'] = np.random.randint(0, 2, num).astype(np.bool_)
data['col7'] = np.random.randint(0, 2, num).astype(np.bool_)
data['col8'] = np.random.randint(100, 5000, num)
data['col9'] = np.random.randint(100, 5000, num)
data['col10'] = np.random.randint(100, 5000, num)

print(data)

Which yields a 100-element array with 11 fields (shown here as its repr; the random values will differ from run to run):

array([ (0, 5, 2.0886534380436226, 3.0111285613794276, 3476, 117, False, False, 4704, 4372, 4062),
       (1, 5, 2.0977199579338115, 1.8687472941590277, 4635, 496, True, False, 4079, 4263, 3196),
       ...
       ...
       (98, 5, 1.1682309811443277, 1.4100766819689299, 1213, 135, False, False, 1250, 2534, 1160),
       (99, 5, 1.746554619056416, 5.210411489007637, 1387, 352, False, False, 3520, 3772, 3249)], 
      dtype=[('col0', '<i8'), ('col1', '<i8'), ('col2', '<f8'), ('col3', '<f8'), ('col4', '<i8'), ('col5', '<i8'), ('col6', '|b1'), ('col7', '|b1'), ('col8', '<i8'), ('col9', '<i8'), ('col10', '<i8')])
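Once the data is in a structured array, each field can be read as an ordinary homogeneous array, and rows can be filtered with a boolean mask. A small sketch, using made-up field names and values rather than the `col0` ... `col10` placeholders above:

```python
import numpy as np

# Hypothetical field names and dtypes, chosen only for this illustration
dtype = [('zone_id', np.int64), ('inc', np.float64), ('is_cbd', np.bool_)]
data = np.array([(0, 1.5, True), (1, 2.5, False), (2, 4.0, True)], dtype=dtype)

# A field name selects one column, with that field's own dtype
print(data['inc'].dtype)      # float64
print(data['inc'].mean())     # mean of 1.5, 2.5 and 4.0

# A boolean field works directly as a mask over the rows
cbd_rows = data[data['is_cbd']]
print(cbd_rows['zone_id'].tolist())   # [0, 2]

# An integer index returns a single record; its fields are addressed by name
first = data[0]
print(first['zone_id'], first['inc'])
```

Note that the rows are given as tuples, not lists: NumPy treats each tuple as one record when building a structured array.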


Quoting the first line of chapter 1 of the NumPy reference:

NumPy provides an N-dimensional array type, the ndarray, which describes a collection of “items” of the same type.

So every member of the array has to be of the same type. The loss of generality here, as compared to regular Python lists, is the trade-off that allows high-speed operations on arrays: loops can run without testing the type of each member.
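This is easy to check directly. In the sketch below (values and field names chosen arbitrarily), mixed inputs are promoted to a single common type, and even a structured array has exactly one dtype, which just happens to be a compound one:

```python
import numpy as np

# Mixed int / float / bool inputs are promoted to one common dtype
mixed = np.array([1, 2.5, True])
print(mixed.dtype)        # float64
print(mixed.tolist())     # [1.0, 2.5, 1.0]

# A structured array is still homogeneous: every element shares one
# compound dtype made up of named fields (names here are hypothetical)
rec = np.zeros(2, dtype=[('id', np.int64), ('flag', np.bool_)])
print(rec.dtype.names)                # ('id', 'flag')
print(rec[0].dtype == rec[1].dtype)   # True
```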
