Reading file in a pattern using awk

Developer https://www.devze.com 2023-01-05 19:59 Source: Internet


I have an input file in the following format:

<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>

I want an awk script that reads this file and produces the following output:

url1 Name1
url2 Name2

Can anyone help me out with this trivial-looking problem? Thanks.

~~Extracting one href per line is relatively simple, so long as the lines conform to XHTML standards, there is at most one href on a line, and you don't care about enclosing tags, but perl is easier:~~

~~$ perl -ne 'print "$1\n" if /href="([^"]+)"/'~~

If you care about enclosing tags, or the markup is not standards-conformant, you cannot use regular expressions to parse HTML. It is impossible.

Added: oops, you do care about context, so forget about regexps and use a real HTML parser.

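Here is an awk script that does the job:

awk '
# On the link line: strip everything up to href=" and everything from the closing quote on
/a href=".*"/ { sub(/^.*a href="/, ""); sub(/".*/, ""); print $0, name }
# Remember field 2 of every line; the name line sets it just before the link line
              { name = $2 }
' infile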

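This might work:

awk 'BEGIN {i=1}
     {line[i++]=$0}
     END {
       j=1
       # print each link line followed by the name line that preceded it
       while (j<i) {
         print line[j+1] line[j]; j+=2
       }
     }' yourfile | awk '
     # $4 looks like href="url1">Link, so skip 6 leading and 6 trailing characters
     {print substr($4,7,length($4)-12), $6}'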
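
The same "join every two lines, then pick fields" idea can probably be written more compactly with paste, which reads two lines at a time when given `- -`. This is only a sketch: the function name pair_links is made up for illustration, and it assumes the exact layout shown in the question (name line first, link line second, href as the seventh whitespace-separated field of the joined line).

```shell
# Hypothetical helper: join each pair of lines with paste, then pick fields.
pair_links() {
  # paste - - reads two lines at a time and joins them with a tab
  paste - - | awk '{
    split($7, attr, "\"")   # $7 looks like href="url1">Link
    print attr[2], $2       # attr[2] is the text between the quotes
  }'
}

# Sample input copied from the question
printf '%s\n' \
  '<td> Name1 </td>' \
  '<td> <span class="test"><a href="url1">Link </a></span></td>' \
  '<td> Name2 </td>' \
  '<td> <span class="test"><a href="url2">Link </a></span></td>' |
  pair_links
```

Splitting on the double quote avoids counting characters with substr, so the URL can be any length.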

gawk '/^<td>/ {n = $2; getline; print gensub(/.*href="([^"]*).*/,"\\1",1), n}' infile

url1 Name1
url2 Name2

awk 'BEGIN{RS="></td>\n"; FS="> | </|\""}{print $7, $2}' infile

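This treats every two lines as one record. Note that a multi-character RS is an extension (gawk and mawk support it), not part of POSIX awk.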
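
If gawk is not available, something along these lines should work with any POSIX awk, since it relies only on match(), RSTART and RLENGTH. It is a sketch under the same assumptions as the answers above: the name sits in field 2 of a line starting with <td>, and the href value is double-quoted on the following line. The function name links_by_name is hypothetical.

```shell
# Hypothetical helper; uses only POSIX awk features (no gensub, no regex RS).
links_by_name() {
  awk '
    # A line containing an href: pull out the quoted value and print it
    match($0, /href="[^"]*"/) {
      # skip the 6 characters of href=" and drop the closing quote
      url = substr($0, RSTART + 6, RLENGTH - 7)
      print url, name
      next
    }
    # Any other <td> line is assumed to hold the name in field 2
    /^<td>/ { name = $2 }
  ' "$@"
}

# Sample input copied from the question
printf '%s\n' \
  '<td> Name1 </td>' \
  '<td> <span class="test"><a href="url1">Link </a></span></td>' \
  '<td> Name2 </td>' \
  '<td> <span class="test"><a href="url2">Link </a></span></td>' |
  links_by_name
```

Called with a file name (links_by_name infile) it reads that file instead of standard input.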
