
awk getline skipping to last line -- possible newline character issue

开发者 https://www.devze.com 2023-04-13 02:28 Source: Internet

I'm using

while( (getline line < "filename") > 0 )

within my BEGIN statement, but this while loop only seems to read the last line of the file instead of each line. I think it may be a newline character problem, but really I don't know. Any ideas?

I'm trying to read the data in from a file other than the main input file.

The same syntax actually works for one file, but not another, and the only difference I see is that the one for which it DOES work has "^M" at the end of each line when I look at it in Vim, and the one for which it DOESN'T work doesn't have ^M. But this seems like an odd problem to be having on my (UNIX based) Mac.

I wish I understood what was going on with getline a lot better than I do.


You would have to set RS to something more permissive. Here is an ugly hack to get things working:

RS="[\x0d\x0a\x0d]"

Now, this may require some explanation. Different systems use different ways to mark the end of a line. Read http://en.wikipedia.org/wiki/Carriage_return and http://en.wikipedia.org/wiki/Newline if you are interested in it.

Normally awk handles this gracefully, but it appears that in your environment some files are being naughty. A file should use 0x0d (CR), 0x0a (LF), or 0x0d 0x0a (CR+LF) consistently, not a mix of them.
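To find out which terminator a given file really uses, it can help to look at the raw bytes instead of relying on how an editor renders them. The following is only a sketch, assuming a POSIX shell with od, tr, and wc available; the sample file and its contents are made up for illustration.

```shell
# Hypothetical sample file with CR-only line endings (0x0d, no 0x0a).
sample=$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")
printf 'one\rtwo\rthree\r' > "$sample"

# od -c dumps every byte; CR shows up as \r and LF as \n.
od -c "$sample"

# Count CR and LF bytes separately to see which terminator is in use.
# tr -cd keeps only the listed character; the final tr strips wc's padding.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < "$sample" | wc -c | tr -d ' ')
echo "CR=$cr_count LF=$lf_count"

rm -f "$sample"
```

For this sample the last line prints CR=3 LF=0. A file with equal, non-zero counts is most likely CR+LF throughout, while unequal non-zero counts suggest mixed endings.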

So let's try an example of a mixed data stream:

$ echo -e "foo\x0d\x0abar\x0d\x0adoe\x0arar\x0azoe\x0dqwe\x0dtry" |awk 'BEGIN{while((getline r )>0){print "r=["r"]";}}'

Result:

r=[foo]
r=[bar]
r=[doe]
r=[rar]
try]oe

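We can see that the last lines are lost: zoe, qwe, and try are separated only by 0x0d, so awk reads them as one record, and when it is printed the carriage returns make the terminal overwrite the start of the line. Now using the ugly hack on RS: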

$ echo -e "foo\x0d\x0abar\x0d\x0adoe\x0arar\x0azoe\x0dqwe\x0dtry" |awk 'BEGIN{RS="[\x0d\x0a\x0d]";while((getline r )>0){print "r=["r"]";}}'

Result:

r=[foo]
r=[bar]
r=[doe]
r=[rar]
r=[zoe]
r=[qwe]
r=[try]

We can see every line is obtained, regardless of the 0x0d 0x0a junk :-)
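Since the question is about reading a second file from BEGIN, here is how the same idea might look there. This is a sketch under assumptions, not a definitive fix: it assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; POSIX leaves that case unspecified), and the lookup file, its contents, and the variable names are hypothetical. It uses an alternation rather than a character class, because a class such as [\r\n] can produce an empty record between the CR and the LF of a CR+LF pair.

```shell
# Hypothetical side file with CR-only line endings.
lookup=$(mktemp "${TMPDIR:-/tmp}/lookup.XXXXXX")
printf 'alpha\rbeta\rgamma\r' > "$lookup"

result=$(awk -v file="$lookup" '
BEGIN {
    saved_rs = RS                 # remember the separator meant for the main input
    RS = "\r\n|\n|\r"             # regex RS: accept CR+LF, LF, or a bare CR
    while ((getline line < file) > 0)
        print "line=[" line "]"
    close(file)                   # release the file once it has been read
    RS = saved_rs                 # restore RS before any main input is processed
}')
printf '%s\n' "$result"

rm -f "$lookup"
```

Saving and restoring RS keeps the change local to the side file, so the main input is still split the usual way.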


Maybe you should preprocess your input file with, for example, the dos2unix utility (http://sourceforge.net/projects/dos2unix/)?
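If dos2unix is not installed, the standard tr utility can do a similar normalization before awk ever sees the data. A minimal sketch, assuming the problem file uses bare CR terminators; the file names and contents are hypothetical.

```shell
# Hypothetical input with bare-CR line endings.
src=$(mktemp "${TMPDIR:-/tmp}/src.XXXXXX")
fixed=$(mktemp "${TMPDIR:-/tmp}/fixed.XXXXXX")
printf 'foo\rbar\rbaz\r' > "$src"

# Turn every CR into LF. For a CR+LF file, delete the CRs instead: tr -d '\r'
tr '\r' '\n' < "$src" > "$fixed"

# The original getline loop now sees one record per line with the default RS.
count=$(awk -v file="$fixed" 'BEGIN {
    while ((getline line < file) > 0)
        n++
    close(file)
    print n
}')
echo "records read: $count"

rm -f "$src" "$fixed"
```

With the converted file the loop reads three records, and the awk program itself needs no RS change.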
