I have two files. One is a CSV and contains the search strings (one per line), and the other is a huge file which contains the search term at the start of each line, followed by extra information that I would like to extract.
The search terms file is called 'search.csv' and looks like this:
3ksr
3ky8
2g5w
2gou
The file containing the other info is called 'CSA.txt' and looks like this:
3ksr,INFO.....
3ky8,INFO.....
2g5w,INFO.....
2gou,INFO.....
However, it is a very big file (over 8 MB) and each search term has more than one occurrence, but the information is different for every occurrence. I have some sample code:
import fileinput
import csv
csa = fileinput.input("CSA.dat", inplace=1)
pdb = csv.reader(open("search.csv"))
outfile = csv.writer(open("outfile.csv"), dielect = 'excel', delimiter = '\t')
for id in pdb:
    for line in csa:
        if id in str(line):
            outfile.writerow([id, line])
csa.close()
However, this code doesn't work and seems to delete CSA.dat every time I try to run it (it's backed up in an archive), or it says 'Text file busy'. Please help! Thanks in advance!
Depending on how many search terms you have, and assuming they're all 4 characters:
# search.csv has one term per line; a set makes the lookup fast
terms = set(open('search.csv').read().split())
with open('CSA.dat', 'r') as f:
    for line in f:
        if line[:4] in terms:
            # do something with line
            print(line)
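If they're not 4 chars, you can use line[:line.find(',')] instead.
That returns everything up to the first ','. Note that if no comma is found, find returns -1, so the slice drops the last character of the line (usually the newline) rather than returning the entire line.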
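If the key length varies, str.partition is one way to take the text before the first comma without depending on what find returns. The following is only a minimal sketch; line_key is a hypothetical helper name, not something from the question.

```python
def line_key(line):
    """Return the text before the first comma, or the whole line if there is none."""
    # partition returns the full string as the head when the separator
    # is absent, whereas find() would return -1 in that case.
    head, _sep, _rest = line.rstrip("\r\n").partition(",")
    return head


print(line_key("3ksr,INFO.....\n"))
print(line_key("3ksr\n"))
```

Both calls print 3ksr, so the check in the loop could become `if line_key(line) in terms:`.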
edit: I had never heard of fileinput, but I just looked at it and "you're doing it wrong."
"Helper class to quickly write a loop over all standard input files."
fileinput is for passing files to your program as command line arguments, which you're not doing. open(filename, mode) is how you open files in Python.
And for something that seems this simple, the csv reader is overkill, though the csv writer is probably worth using for your output file if you really want it in an Excel format.
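Putting those pieces together, the whole job might look roughly like the sketch below. It assumes the file layout described in the question (one term per line in the search file, comma after the key in the data file). The function name extract_matches and the choice to write the key and the remaining info as two tab-separated columns are assumptions made for illustration.

```python
import csv


def extract_matches(search_path, data_path, out_path):
    """Copy lines of data_path whose first field is listed in search_path."""
    # One search term per line; a set gives fast membership tests.
    with open(search_path) as f:
        terms = {line.strip() for line in f if line.strip()}

    matched = 0
    # The output must be opened for writing; newline="" lets csv control line endings.
    with open(data_path) as data, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, dialect="excel", delimiter="\t")
        for line in data:  # streams the big file one line at a time
            key, _sep, info = line.rstrip("\r\n").partition(",")
            if key in terms:
                writer.writerow([key, info])
                matched += 1
    return matched
```

It would be called as extract_matches("search.csv", "CSA.dat", "outfile.csv"). Because the data file is read only once and only with plain open, it is never modified, and every occurrence of a term is kept.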
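It appears that the deletion of CSA.dat happens because you pass inplace=1 to fileinput.input. In in-place mode, fileinput redirects standard output into the file being read, and since your loop never prints anything, the file ends up empty.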
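The effect is easy to reproduce on a scratch copy. The sketch below only reads the file in in-place mode and prints nothing; read_with_inplace is a hypothetical name, and it should not be pointed at a file you want to keep.

```python
import fileinput


def read_with_inplace(path):
    """Iterate over path in in-place mode without printing anything."""
    count = 0
    with fileinput.input(files=[path], inplace=True) as f:
        for _line in f:
            count += 1  # nothing is printed, so nothing is written back
    return count
```

After the call, the file at path still exists but is empty, which matches what the question describes. Dropping inplace, or simply using open, avoids this.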