matrix holes comprehension

Developer https://www.devze.com 2023-04-11 10:42 Source: web
This is an offshoot of a previous question that started to snowball. If I have a matrix A and I want to use the mean of each row's values (everything after the heading, i.e. row[1:]) to create another matrix B, while keeping the row headings intact, this list comprehension works:

# matrix A with row headings and values
A = [('Apple',0.95,0.99,0.89,0.87,0.93),
('Bear',0.33,0.25,0.85,0.44,0.33),
('Crab',0.55,0.55,0.10,0.43,0.22)]

# average of a sequence of numbers (sum() and len() are built-ins)
def average(lst):
    return sum(lst) / len(lst)

# list comprehension: keep each heading, average the values after it
B = [(a[0], average(a[1:])) for a in A]

Expected outcome (values rounded):

B = [('Apple', 0.926), ('Bear', 0.44), ('Crab', 0.37)]

However, if the dataset has holes in it (symbolized by 'x'), the analysis won't run, for example:

# matrix A with row headings and values
A = [('Apple',0.95,x,0.89,0.87,0.93),
('Bear',0.33,0.25,0.85,0.44,0.33),
('Crab',x,0.55,0.10,x,0.22)]

In a matrix where the relative placement of each row and column means something, I can't just delete the "blank" entries, so how can I fill or skip over them and make this work again? In retrospect, my data has more holes than an old bed sheet.

Also, how would I work the filters suggested below into the following definitions (which choke when they hit something that isn't a number) so that hitting an 'x' value returns another 'x' value?

    def plus(matrix, i):
        return [row[i] for row in matrix]

    def minus(matrix, i):
        return [1.00-row[i] for row in matrix]


Try this:

B = [(a[0], average(list(filter(lambda elt: elt != x, a[1:])))) for a in A]

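This should give you the expected result without changing the average function or modifying A. The list() call also makes it work on Python 3, where filter() returns a lazy iterator rather than a list. A lazy filter (itertools.ifilter in Python 2) would avoid building the intermediate list for large matrices, but average() calls len(), which an iterator does not support, so average() would then have to count the items as it sums them.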
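The lazy-filter caveat can be sketched like this. average_iter is a hypothetical helper, not part of the answer; it sums and counts in a single pass, so it accepts a lazy filter directly. The sketch assumes holes are marked with the string 'x' and returns None for a row made entirely of holes.

```python
# Hypothetical helper (not from the answer): averages any iterable,
# including a lazy filter, in one pass without calling len().
def average_iter(values):
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    if count == 0:
        return None  # every entry in the row was a hole
    return total / count

x = 'x'  # assumption: holes are marked with the string 'x'

A = [('Apple', 0.95, x,    0.89, 0.87, 0.93),
     ('Bear',  0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab',  x,    0.55, 0.10, x,    0.22)]

# filter() is lazy on Python 3, much like itertools.ifilter on Python 2
B = [(a[0], average_iter(filter(lambda elt: elt != x, a[1:]))) for a in A]
```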


EDIT

You may want to consider implementing your matrix differently if it is sparse. If you want to keep your current implementation, you should use the value None to represent missing values. This is the Python equivalent to null that you may be familiar with from other languages.

How you implement the matrix drastically changes how you implement the functions you want. I'll try to cover your way and an alternate method that could be more efficient for sparse matrices.

For both I'll use your example matrix with holes:

# matrix A with row headings and values
A = [('Apple',0.95, x,    0.89, 0.87, 0.93),
     ('Bear', 0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab', x,    0.55, 0.10, x,    0.22)]

List of lists (or tuples, or whatever)

Like I said before, use None for an empty value:

A = [('Apple', 0.95, None, 0.89, 0.87, 0.93),
     ('Bear',  0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab',  None, 0.55, 0.10, None, 0.22)]

B is similar to what I posted earlier:

B = [(a[0], average(list(filter(lambda v: v is not None, a[1:])))) for a in A]

Define column so that it returns a generator (an iterable) that yields only the filled values:

def column(M, i):
    i += 1 # this will allow you to use zero-based indices if you want
    return (row[i] for row in M if row[i] is not None)
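A quick check of the None-based pieces on the example matrix, using only the functions defined in this section. Note that column() skips holes, so a column containing a hole comes back shorter than the number of rows, and its values no longer line up with the row headings.

```python
def column(M, i):
    i += 1  # shift past the row heading so i is zero-based over the values
    return (row[i] for row in M if row[i] is not None)

def average(lst):
    return sum(lst) / len(lst)

A = [('Apple', 0.95, None, 0.89, 0.87, 0.93),
     ('Bear',  0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab',  None, 0.55, 0.10, None, 0.22)]

# Row means that skip holes; list() keeps this working with Python 3's lazy filter
B = [(a[0], average(list(filter(lambda v: v is not None, a[1:])))) for a in A]

# First value column: Crab's hole is skipped, so only two values come back
first = list(column(A, 0))
```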

Then you can implement minus more easily and efficiently:

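from operator import sub
from itertools import repeat
try:
    from itertools import imap  # Python 2
except ImportError:
    imap = map                  # Python 3: map() is already lazy

def minus(M, i):
    return list(imap(sub, repeat(1.0), column(M, i)))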
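If, as the question asks, a hole should come back as a hole rather than vanish, one option is to let None pass through instead of filtering it out. column_with_holes, keep_holes, plus_with_holes and minus_with_holes are hypothetical names for this sketch; each result has exactly one entry per row, so positions still match the headings.

```python
def column_with_holes(M, i):
    """Hypothetical variant of column(): yields None for holes instead of skipping them."""
    i += 1  # skip the row heading so i is zero-based over the values
    return (row[i] for row in M)

def keep_holes(f, values):
    """Apply f to every filled value and pass None straight through."""
    return [None if v is None else f(v) for v in values]

def plus_with_holes(M, i):
    return keep_holes(lambda v: v, column_with_holes(M, i))

def minus_with_holes(M, i):
    return keep_holes(lambda v: 1.0 - v, column_with_holes(M, i))

A = [('Apple', 0.95, None, 0.89, 0.87, 0.93),
     ('Bear',  0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab',  None, 0.55, 0.10, None, 0.22)]

result = minus_with_holes(A, 0)  # one entry per row; Crab's entry stays None
```

Keeping None in place costs a little storage, but it preserves alignment, which matters when the position of each value carries meaning.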

Dictionaries

Another way to represent your matrix is with Python dicts. There are some advantages here, especially that you don't waste storage space if you have a lot of holes in the matrix. A con to this method is that it can be more of a pain to create the matrix depending on how you construct it.

Your example might become (whitespace for clarity):

A = [('Apple', dict([(0, 0.95),            (2, 0.89), (3, 0.87), (4, 0.93)])),
     ('Bear',  dict([(0, 0.33), (1, 0.25), (2, 0.85), (3, 0.44), (4, 0.33)])),
     ('Crab',  dict([           (1, 0.55), (2, 0.10),            (4, 0.22)]))]

This is an ugly way to construct it for sure, but if you are constructing the matrix from other data with a loop it can be a lot nicer.
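As an illustration of building the dictionary form with a loop rather than by hand, the sketch below converts the None-based rows from the list-of-tuples section. to_sparse is a hypothetical helper name, and the column keys are zero-based, matching the example above.

```python
# Dense rows with None for holes, as in the list-of-tuples section
A_dense = [('Apple', 0.95, None, 0.89, 0.87, 0.93),
           ('Bear',  0.33, 0.25, 0.85, 0.44, 0.33),
           ('Crab',  None, 0.55, 0.10, None, 0.22)]

def to_sparse(rows):
    """Hypothetical helper: (heading, v0, v1, ...) -> (heading, {col: value}), dropping holes."""
    return [(r[0], {j: v for j, v in enumerate(r[1:]) if v is not None})
            for r in rows]

A = to_sparse(A_dense)

# Row means over the stored (filled) values only
B = [(a[0], sum(a[1].values()) / len(a[1])) for a in A]
```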

Now,

B = [(a[0], sum(a[1].values())/len(a[1])) for a in A]

This is uglier than it should be but I'm not so good at Python and I can't get it to do exactly what I want...

You can define a column function that returns a generator, so values are produced lazily instead of being built into a list up front:

def column(M, i):
    return (row[1][i] for row in M if i in row[1]) 

minus is done exactly as in the other example.


I have a feeling that there is something I'm not getting about what you want, so feel free to let me know what needs fixing. Also, my lack of Python codez probably didn't do the dictionary version justice, but it can be efficient for sparse matrices. This whole example would be easier if you created a matrix class, then you could switch implementations and see which is better for you. Good luck.
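A minimal sketch of that idea, assuming two hypothetical classes, DenseMatrix (list of tuples with None for holes) and SparseMatrix (heading plus a dict of filled cells), that expose the same value() method. Code written against that method, such as the column() helper here, works unchanged with either layout.

```python
class DenseMatrix(object):
    """Hypothetical wrapper for rows like ('Apple', 0.95, None, ...)."""
    def __init__(self, rows):
        self.rows = rows

    def n_rows(self):
        return len(self.rows)

    def value(self, r, c):
        # c is zero-based over the value columns, so skip the heading
        return self.rows[r][c + 1]


class SparseMatrix(object):
    """Hypothetical wrapper for rows like ('Apple', {0: 0.95, 2: 0.89})."""
    def __init__(self, rows):
        self.rows = rows

    def n_rows(self):
        return len(self.rows)

    def value(self, r, c):
        return self.rows[r][1].get(c)  # None when the cell is a hole


def column(m, c):
    """One entry per row, None for holes, whichever class m is."""
    return [m.value(r, c) for r in range(m.n_rows())]


dense = DenseMatrix([('Apple', 0.95, None, 0.89),
                     ('Crab',  None, 0.55, 0.10)])
sparse = SparseMatrix([('Apple', {0: 0.95, 2: 0.89}),
                       ('Crab',  {1: 0.55, 2: 0.10})])
```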


This doesn't work because x is not necessarily a number (you don't tell us what it is either).

So you probably have to write your own summing function that checks whether an item is an x or something else (maybe you'll have to use isinstance(element, int) or isinstance(element, float)).
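A minimal sketch of such a summing function, assuming any value that is not an int or float (an 'x', an empty string, None) should be treated as a hole. numeric_average is a hypothetical name, and it returns None when a row has no numeric values at all.

```python
def numeric_average(values):
    """Average only the int/float entries; anything else counts as a hole."""
    nums = [v for v in values if isinstance(v, (int, float))]
    if not nums:
        return None  # the whole row was holes
    return sum(nums) / float(len(nums))

x = 'x'  # assumption: the hole marker is a non-numeric value such as the string 'x'

A = [('Apple', 0.95, x,    0.89, 0.87, 0.93),
     ('Bear',  0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab',  x,    0.55, 0.10, x,    0.22)]

B = [(a[0], numeric_average(a[1:])) for a in A]
```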


In average(), use a loop to remove all x values from the list with lst.remove(x) before you calculate the average. You'll have to catch the ValueError that remove() raises once there are no more x values to find, and convert a tuple slice such as a[1:] to a list first, since tuples have no remove().

I recommend using something like "" for representing holes, unless you have something made up already.
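The remove() loop might look like this, assuming holes are marked with an empty string as suggested. HOLE is a name introduced for this sketch.

```python
HOLE = ""  # assumption: holes are marked with an empty string

def average(lst):
    values = list(lst)  # copy: a tuple slice has no remove(), and the caller's row stays untouched
    while True:
        try:
            values.remove(HOLE)  # drops the first remaining hole
        except ValueError:       # raised once no hole is left
            break
    return sum(values) / float(len(values))

A = [('Apple', 0.95, HOLE, 0.89, 0.87, 0.93),
     ('Bear',  0.33, 0.25, 0.85, 0.44, 0.33),
     ('Crab',  HOLE, 0.55, 0.10, HOLE, 0.22)]

B = [(a[0], average(a[1:])) for a in A]
```

Each remove() call rescans the list, so this is quadratic in the number of holes per row; the filter-based version above does the same work in a single pass. A row made entirely of holes still raises ZeroDivisionError.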
