I want to get a list of the data contained in a histogram bin. I am using numpy, and Matplotlib. I know how to traverse the data and check the bin edges. Howe开发者_C百科ver, I want to do this for a 2D histogram and the code to do this is rather ugly. Does numpy have any constructs to make this easier?
For the 1D case, I can use searchsorted(). But the logic is not that much better, and I don’t really want to do a binary search on each data point when I don’t have to.
Most of the nasty logic is due to the bin boundary regions. All regions have boundaries like this: [left edge, right edge). Except the last bin, which has a region like this: [left edge, right edge].
Here is some sample code for the 1D case:
import numpy as np
data = [0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3]
hist, edges = np.histogram(data, bins=3)
print 'data =', data
print 'histogram =', hist
print 'edges =', edges
getbin = 2  #0, 1, or 2
print '---'
print 'alg 1:'
#for i in range(len(data)):
for d in data:
    if d >= edges[getbin]:
        if (getbin == len(edges)-2) or d < edges[getbin+1]:
            print 'found:', d
        #end if
    #end if
#end for
print '---'
print 'alg 2:'
for d in data:
    val = np.searchsorted(edges, d, side='right')-1
    if val == getbin or val == len(edges)-1:
        print 'found:', d
    #end if
#end for
Here is some sample code for the 2D case:
import numpy as np
xdata = [0, 1.5, 1.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, \
         0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3]
ydata = [0, 5,5, 5, 5, 5, \
         15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, \
         25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 30]
xbins = 3
ybins = 3
hist2d, xedges, yedges = np.histogram2d(xdata, ydata, bins=(xbins, ybins))
print 'data2d =', zip(xdata, ydata)
print 'hist2d ='
print hist2d
print 'xedges =', xedges
print 'yedges =', yedges
getbin2d = 5  #0 through 8
print 'find data in bin #', getbin2d
xedge_i = getbin2d % xbins
yedge_i = int(getbin2d / xbins) #IMPORTANT: this is xbins
for x, y in zip(xdata, ydata):
    # x and y left edges
    if x >= xedges[xedge_i] and y >= yedges[yedge_i]:
        #x right edge
        if xedge_i == xbins-1 or x < xedges[xedge_i + 1]:
            #y right edge
            if yedge_i == ybins-1 or y < yedges[yedge_i + 1]:
                print 'found:', x, y
            #end if
        #end if
    #end if
#end for
Is there a cleaner / more efficient way to do this? It seems like numpy would have something for this.
digitize, from core NumPy, will give you the index of the bin to which each value in your histogram belongs:
import numpy as NP
A = NP.random.randint(0, 10, 100)
bins = NP.array([0., 20., 40., 60., 80., 100.])
# d is an index array holding the bin id for each point in A
d = NP.digitize(A, bins)     
how about something like:
data = numpy.array([0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3])
hist, edges = numpy.histogram(data, bins=3)
for l, r in zip(edges[:-1], edges[1:]):
   print(data[(data > l) & (data < r)]) 
Out:
[ 0.5]
[ 1.5  1.5  1.5]
[ 2.5  2.5  2.5]
with a bit of code to handle the edge cases.
 
         
                                         
                                         
                                         
                                        ![Interactive visualization of a graph in python [closed]](https://www.devze.com/res/2023/04-10/09/92d32fe8c0d22fb96bd6f6e8b7d1f457.gif) 
                                         
                                         
                                         
                                         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论