I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:
20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090118025901 -2.400000 78.100000 1023.200000 0.000000
I don't even know how many lines there are. But I'm trying to split the file by date. The left number is a time stamp (these lines for example are from 2009, january 18th). How can I split this file into pieces according to the date?
The number of entries per date differs, so using split with a constant number won't wor开发者_开发知识库k.
Everything I know would be to grep file '20090118*' > data20090118.dat , but there sure is a way to do all the dates at once, right?
Thanks in advance, Alex
Using awk:
awk '{print  > "data"substr($1,0,8)".dat"}' myfile
This should work if the items are in date sequence:
date=20090101 # Change to the earliest date
while IFS= read -rd $'\n' line
do
    if [ "$(echo "$line" | cut -d ' ' -f 1 | cut -c 1-8)" -eq $date ]
    then
        echo "$line" >> "$date.dat"
    else
        let date++
    fi
done < log.dat
With the caveats that each day needs to have more than 1 record, and that the output file will have blank lines:
uniq --all-repeated=separate -w8 file | csplit -s - '/^$/' '{*}'
We really should have an option to uniq to output even uniq records. Also csplit should have an option to suppress the matched line.
 
         
                                         
                                         
                                         
                                        ![Interactive visualization of a graph in python [closed]](https://www.devze.com/res/2023/04-10/09/92d32fe8c0d22fb96bd6f6e8b7d1f457.gif) 
                                         
                                         
                                         
                                         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论