Python: Putting specific lines of a file into a list

Developer (开发者) https://www.devze.com 2023-02-25 04:57 Source: Internet
Greetings,

I ran into the following problem:

Given a file of the following structure:

'>some cookies  
chocolatejelly  
peanutbuttermacadamia  
doublecoconutapple  
'>some icecream  
cherryvanillaamaretto  
peanuthaselnuttiramisu  
bananacoffee  
'>some other stuff  
letsseewhatfancythings  
wegotinhere  

Aim: for every line containing '>', put all the entries that follow it (up to the next such line) into a list as a single string.

Code:

def parseSequenceIntoDictionary(filename):
    lis=[]
    seq=''
    with open(filename, 'r') as fp:
        for line in fp:
            if('>' not in line):
                seq+=line.rstrip()
            elif('>' in line):
                lis.append(seq)
                seq=''
        lis.remove('')
        return lis

So this function goes through each line of the file. If a line does not contain '>', it strips the trailing '\n' and concatenates the line onto seq. If a line does contain '>', it appends the concatenated string to the list and 'clears' the string seq so that the next sequence can be collected.

The problem: with the example input file, it only puts the entries from 'some cookies' and 'some icecream' into the list, but not those from 'some other stuff'. So the result is:

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee] but not  

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee, letsseewhatfancythings 
wegotinhere]  

Where is my reasoning wrong? There must be a logic mistake in the iteration that I have overlooked, but I do not know where.

Thanks in advance for any hints!


The problem is that you only store the current section seq when you hit a line with '>' in it. When the file ends, you still have that section open, but you don't store it.

The simplest way to fix your program is this:

def parseSequenceIntoDictionary(filename):
    lis=[]
    seq=''
    with open(filename, 'r') as fp:
        for line in fp:
            if('>' not in line):
                seq+=line.rstrip()
            elif('>' in line):
                lis.append(seq)
                seq=''
        # the file ended
        lis.append(seq) # store the last section
        lis.remove('')
        return lis

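By the way, you should use if line.startswith("'>"): instead of testing '>' in line, so that a data line which happens to contain '>' somewhere is not mistaken for a header.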
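To illustrate why the header test matters, here is a minimal sketch that runs the same loop with both checks. The helper name collect_sections, the is_header parameter, and the input lines are made up for this illustration and are not part of the original code.

```python
def collect_sections(lines, is_header):
    """Join the data lines that follow each header line into one string."""
    sections = []
    seq = ''
    for line in lines:
        if is_header(line):
            sections.append(seq)
            seq = ''
        else:
            seq += line.rstrip()
    sections.append(seq)               # store the last section
    return [s for s in sections if s]  # drop empty entries

# Hypothetical input: one data line happens to contain '>'
lines = ["'>some cookies\n", "choco>latejelly\n", "peanutbutter\n"]

loose = collect_sections(lines, lambda line: '>' in line)
strict = collect_sections(lines, lambda line: line.startswith("'>"))

print(loose)   # ['peanutbutter'] (the line with '>' was treated as a header)
print(strict)  # ['choco>latejellypeanutbutter']
```

With the loose test, the data line containing '>' is silently dropped; with startswith it is kept as part of the section.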


You only append seq to the result list if a new line with > is found. So at the end you have a filled seq (with the data you are missing), but you don't add it to the result list. So after your loop just add seq if there is some data in it and you should be fine.
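A minimal sketch of that idea, assuming the header lines start with '> as in the example file. The function name parse_sequences and the shortened sample data are invented here; the function takes any iterable of lines, so an open file object should work as well.

```python
def parse_sequences(lines):
    """Return one concatenated string per section."""
    result = []
    seq = ''
    for line in lines:
        if line.startswith("'>"):
            if seq:              # a section was collected before this header
                result.append(seq)
            seq = ''
        else:
            seq += line.strip()
    if seq:                      # the loop has ended: store the last section
        result.append(seq)
    return result

# Shortened, hypothetical version of the example file
sample = [
    "'>some cookies\n", "chocolatejelly\n", "peanutbuttermacadamia\n",
    "'>some icecream\n", "cherryvanillaamaretto\n",
    "'>some other stuff\n", "wegotinhere\n",
]
print(parse_sequences(sample))
# ['chocolatejellypeanutbuttermacadamia', 'cherryvanillaamaretto', 'wegotinhere']
```

Checking if seq: before every append also means no empty string ever lands in the list, so the lis.remove('') step is not needed.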


my_list = []
with open('file_in.txt') as f:
    for line in f:
        if line.startswith("'>"):
            my_list.append(line.strip().split("'>")[1])

print(my_list)  # ['some cookies', 'some icecream', 'some other stuff']


Well, you can simply split on '> (if I understand you correctly):

>>> s="""
... '>some cookies
... chocolatejelly
... peanutbuttermacadamia
... doublecoconutapple
... '>some icecream
... cherryvanillaamaretto
... peanuthaselnuttiramisu
... bananacoffee
... '>some other stuff
... letsseewhatfancythings
... wegotinhere  """
>>> s.split("'>")
['\n', 'some cookies  \nchocolatejelly  \npeanutbuttermacadamia  \ndoublecoconutapple  \n', 'some icecream  \ncherryvanillaamaretto  \npeanuthaselnuttiramisu  \nbananacoffee  \n', 'some other stuff  \nletsseewhatfancythings  \nwegotinhere  ']
>>>
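The split result still contains the header text and the newlines. One possible way to get from there to the list the question asks for is sketched below; the variable names are illustrative, and the sketch assumes that '> never occurs inside a data line, because str.split cuts at every occurrence.

```python
text = (
    "'>some cookies\n"
    "chocolatejelly\n"
    "peanutbuttermacadamia\n"
    "doublecoconutapple\n"
    "'>some icecream\n"
    "cherryvanillaamaretto\n"
    "peanuthaselnuttiramisu\n"
    "bananacoffee\n"
    "'>some other stuff\n"
    "letsseewhatfancythings\n"
    "wegotinhere\n"
)

sections = []
for chunk in text.split("'>"):
    chunk_lines = chunk.splitlines()
    # chunk_lines[0] is the header text (e.g. 'some cookies'); the rest is data
    body = ''.join(line.strip() for line in chunk_lines[1:])
    if body:                     # skip the empty chunk before the first header
        sections.append(body)

print(sections)
```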


import re

def parseSequenceIntoDictionary(filename,regx = re.compile('^.*>.*$',re.M)):
    with open(filename) as f:
        for el in regx.split(f.read()):
            if el:
                yield el.replace('\n','')

print(list(parseSequenceIntoDictionary('aav.txt')))
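A variation on the regex approach, sketched on an in-memory string so it can be tried without a file. Unlike the pattern above, this one only treats lines that start with '> as headers, and it strips each data line so trailing spaces do not end up in the result. The names HEADER and iter_sections and the sample text are assumptions for this example.

```python
import re

# A whole line that starts with '> (re.M lets ^ and $ match at line boundaries)
HEADER = re.compile(r"^'>.*$", re.M)

def iter_sections(text):
    """Yield the concatenated data lines between header lines."""
    for chunk in HEADER.split(text):
        body = ''.join(line.strip() for line in chunk.splitlines())
        if body:
            yield body

# Hypothetical sample, with trailing spaces as in the question
sample_text = (
    "'>some cookies  \n"
    "chocolatejelly  \n"
    "peanutbuttermacadamia  \n"
    "'>some other stuff  \n"
    "letsseewhatfancythings  \n"
    "wegotinhere  \n"
)
print(list(iter_sections(sample_text)))
# ['chocolatejellypeanutbuttermacadamia', 'letsseewhatfancythingswegotinhere']
```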