Split file in blocks with counter

Developer (https://www.devze.com), 2023-03-31 20:04, source: web

The following awk one-liner allows me to split a file according to the character at position 22:

awk -v pdb="${file}" -F "" '{close(c);c=$22}{print > pdb"_"c".pdb"}' ${file}.1tmp 
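Splitting each line into one-character fields with `-F ""` is an extension: GNU awk and several other implementations support it, but POSIX leaves the behaviour of an empty FS unspecified. Below is a sketch of the same grouping that should work in any POSIX awk. It reads column 22 with `substr($0, 22, 1)` instead. The function name `split_by_char` and its argument order are hypothetical. Also note that `close(c)` in the one-liner is given the chain letter rather than the output file name, so it closes nothing. That is harmless here, because closing a file and then reopening it with `>` would truncate it.

```shell
# split_by_char PREFIX INPUT  (hypothetical helper name)
# Writes every line to PREFIX_<char>.pdb, where <char> is column 22,
# i.e. the same grouping as the one-liner above.
split_by_char() {
    awk -v pdb="$1" '
    {
        c = substr($0, 22, 1)          # column 22 via substr(), no -F "" needed
        # Parenthesize the file name expression; no close() here, because
        # reopening an output file with ">" would truncate it.
        print > (pdb "_" c ".pdb")
    }' "$2"
}
```

For example, `split_by_char "$file" "$file.1tmp"` should produce the same files as the original command.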

My files look like this:

ATOM   8911  N   SER W   1      -5.412  94.401  12.569  1.00137.46           N  
ATOM   8912  CA  SER W   1      -4.093  93.709  12.370  1.00137.35           C  
ATOM   8913  C   SER W   1      -3.115  93.771  13.604  1.00137.27           C  
ATOM   8914  O   SER W   1      -2.023  93.177  13.570  1.00137.22           O  
ATOM   8915  CB  SER W   1      -3.417  94.212  11.063  1.00137.29           C  
ATOM      1  N   ASP X   7      70.244 176.432 -72.598  1.00121.87           N  
ATOM      2  CA  ASP X   7      70.164 177.938 -72.649  1.00122.11           C  
ATOM      3  C   ASP X   7      68.705 178.495 -72.843  1.00121.38           C  
ATOM      4  O   ASP X   7      68.482 179.724 -72.941  1.00121.16           O  
ATOM      5  CB  ASP X   7      71.128 178.442 -73.745  1.00122.87           C  
ATOM   5143  N   ASP W   7     -68.623 209.141 -11.831  1.00118.10           N  
ATOM   5144  CA  ASP W   7     -67.698 209.756 -12.845  1.00118.36           C  
ATOM   5145  C   ASP W   7     -66.378 210.288 -12.223  1.00118.02           C  
ATOM   5146  O   ASP W   7     -65.657 211.116 -12.802  1.00118.06           O  
ATOM   5147  CB  ASP W   7     -68.436 210.840 -13.657  1.00118.67           C  

However, the script writes all lines that have a W at position 22 to the same file, even when they come from non-contiguous blocks. I would like to split the file into blocks instead, so that the first contiguous block containing W (or any other character) is named W1, the second W2, and so on. Can this be done easily with awk, or should I use a loop with a counter or something similar?
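The rule described here can be expressed in a single awk pass. A new numbered file starts whenever the column-22 character differs from the previous line's, and each character keeps its own block counter. Here is a minimal sketch of that idea. The helper name `split_blocks` and the `PREFIX_W1.pdb` naming pattern are assumptions chosen to match the W1/W2 wording above.

```shell
# split_blocks PREFIX INPUT  (hypothetical helper name)
# Each contiguous run of lines sharing the same column-22 character goes to
# PREFIX_<char><n>.pdb, where n counts the runs of that character: W1, X1, W2, ...
split_blocks() {
    awk -v pdb="$1" '
    {
        c = substr($0, 22, 1)              # chain ID in fixed column 22
        if (NR == 1 || c != prev) {        # a new contiguous block starts here
            if (NR > 1) close(out)         # previous block is finished; free its descriptor
            n[c]++                         # per-character block counter
            out = pdb "_" c n[c] ".pdb"
            prev = c
        }
        print > out
    }' "$2"
}
```

Closing each file when its block ends is safe here. Every block gets a unique name, so no file is ever reopened and truncated, and the number of simultaneously open files stays at one. If the input still contains short records (for example an `END` line), column 22 is empty for them. Those lines would land in a file such as `PREFIX_1.pdb`, so you may want to filter to `ATOM` lines first.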


awk -v pdb="${file}" 'BEGIN{f=1} NR==1{n=$5;s[$5]=f} $5!=n{s[$5]=f++ ;n=$5} { print > pdb"_"$5"_"s[$5]".txt" }' ${file}
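In this one-liner, `f` is a single counter shared by all characters. For the sample data (W, X, W) it produces `_W_1`, `_X_1`, `_W_2`. For chains running W, X, Y, W, however, it produces `_W_1`, `_X_1`, `_Y_2`, `_W_3`. The one-liner also relies on whitespace splitting, where `$5` is the chain ID. In fixed-width PDB records the chain can run into a four-digit residue number, giving a single token such as `W1000`. Here is a sketch of a minimal change that keeps the `$5` approach but uses one counter per chain. The wrapper name `split_blocks_ws` is hypothetical.

```shell
# split_blocks_ws PREFIX INPUT  (hypothetical helper name)
# Variant of the one-liner above: still uses whitespace field $5 as the
# chain ID, but keeps one counter per chain instead of one shared counter.
split_blocks_ws() {
    awk -v pdb="$1" '
    NR == 1 || $5 != n { s[$5]++; n = $5 }          # new block: bump this chain counter
    { print > (pdb "_" $5 "_" s[$5] ".txt") }' "$2"
}
```

For chains W, X, Y, W this variant should write `_W_1`, `_X_1`, `_Y_1` and `_W_2`.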