Pythonic way to conditionally iterate over items in a list

devze (https://www.devze.com), 2023-02-08 15:37, source: web
New to programming in general, so I'm probably going about this the wrong way. I'm writing an lxml parser where I want to omit HTML table rows that have no content from the parser output. This is what I've got:

for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        sys.stdout.write(cell.text_content() + '\t')
    sys.stdout.write('\n')

The write() stuff is temporary. What I want is for the loop to only return rows where row.text_content() != ''. So I guess I'm asking how to write what my brain thinks should be 'for a in b if a != x', but that doesn't work.

Thanks!


for row in doc.cssselect('tr'):
    cells = [ cell.text_content() for cell in row.cssselect('td') ]
    if any(cells):
        sys.stdout.write('\t'.join(cells) + '\n')

This prints the row only if at least one of its cells has text content.
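
To try the any() check without lxml installed, here is a minimal sketch that uses plain lists of strings as stand-ins for what cell.text_content() would return for each row. The `rows` data and the `non_empty_rows` helper are made up for illustration. One assumption worth flagging: any() treats a whitespace-only string as true, so this sketch strips each cell before testing it, which goes slightly beyond the snippet above.

```python
def non_empty_rows(rows):
    """Yield tab-joined lines for rows with at least one non-blank cell.

    `rows` is a hypothetical stand-in: each row is a list of strings,
    one per <td>, as cell.text_content() would return them.
    """
    for cells in rows:
        # any() is True as soon as one stripped cell is a non-empty string.
        if any(cell.strip() for cell in cells):
            yield '\t'.join(cells)


rows = [
    ['a', 'b'],    # kept: both cells have text
    ['', ''],      # dropped: every cell is empty
    ['', 'c'],     # kept: one cell has text
    [' ', '\n'],   # dropped: whitespace only
]
lines = list(non_empty_rows(rows))
print(lines)
```

If whitespace-only cells should count as content, dropping the .strip() call gives the same test as any(cells).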


ReEdit:

You know, I really don't like my answer at all. I voted up the other answer, but I liked his original version because it was not only clean but also self-explanatory, without getting "fancy", which is the trap I fell into:

for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        if cell.text_content() != '':
            pass  # do stuff here

There isn't a much more elegant solution than that.

Original-ish:

You can transform the second for loop as follows:

[cell for cell in row.cssselect('td') if cell.text_content() != '']

and turn it into a list comprehension. That way you've got a prescreened list. You can take that even further by looking at the following example:

a = [[1, 2], [2, 3], [3, 4]]
newList = [y for x in a for y in x]

which transforms it into [1, 2, 2, 3, 3, 4]. Then you can add in the if statement at the end to screen out values. Hence, you'd reduce that into a single line.
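
As a sketch of the "add in the if statement at the end" step, the flattening comprehension can carry a trailing condition. The list is the one from the example above; the condition `y != 2` is arbitrary and chosen only for illustration.

```python
a = [[1, 2], [2, 3], [3, 4]]

# Flatten the nested list: the outer loop comes first, the inner loop second.
flat = [y for x in a for y in x]

# Same comprehension with a trailing condition that screens out values.
filtered = [y for x in a for y in x if y != 2]

print(flat)
print(filtered)
```

Applied to the table, the same shape would flatten every non-empty cell into one list, so the row boundaries are lost. That is fine for collecting cell values but not for printing one line per row.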

Then again, if you were to look at itertools:

from itertools import ifilter  # Python 2 only
ifilter(lambda x: x.text_content() != '', row.cssselect('td'))

produces an iterator which you can iterate over, skipping all items you don't want.

Edit:

And before I get more downvotes: if you're using Python 3, the built-in filter works the same way and returns an iterator, so there's no need to import ifilter (itertools no longer provides it there).
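
A minimal Python 3 sketch of the same idea, using plain strings in place of lxml cells (the `cell_texts` list is invented for illustration). It assumes Python 3, where the built-in filter is lazy in the same way itertools.ifilter was in Python 2.

```python
cell_texts = ['name', '', 'price', '']

# filter() keeps the items for which the function returns a true value.
non_empty = filter(lambda text: text != '', cell_texts)

# The result is an iterator, so it is used up once it has been looped over.
kept = list(non_empty)
print(kept)

# A generator expression is an equivalent spelling of the same filter.
kept_again = list(text for text in cell_texts if text != '')
```

Because the iterator is consumed on the first pass, wrap it in list() if the filtered cells are needed more than once.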
