Help with Regexp from a xml (Tcl)

Developer https://www.devze.com 2023-01-29 15:18 Source: web


Related topics: regex tcl xml

I have an XML file.

 <?xml version="1.0"?>
 <catalog>
    <book id="bk101">
    </book>
 </catalog>

I read the file and store its contents in file_data:

 set data [split $file_data "\n"]
 foreach line $data {
 regexp { book id=\"(.*)\" } $line all dummy
 puts $all
 puts $dummy
 }

As you can see, I am trying to read the book id and print it out, but I get an error saying that dummy is not found. Am I doing it wrong?

Edit

Weirdly, when I try this:

set mydata {<book id="bk101"> testing the code }
puts $mydata

regexp {book id="(.*)"} $mydata all part
puts $all
puts $part

Output

<book id="bk101"> testing the code
book id="bk101"
bk101

I have no idea why the code at the top still shows an error.

Don't use regular expressions for this. (The question usually cited on this point is about XHTML, but XHTML is no worse than any other XML dialect in this respect; plain HTML is, if anything, worse.) In short, XML belongs to a class of languages that REs cannot fully parse.

Instead, use tDOM to parse the XML, and XPath (supported by tDOM) to pick out the interesting parts of the document.

package require tdom

# Get the XML here by whatever method, and parse it here...
set doc [dom parse $file_data]

# Iterate over the books in the document and print their IDs
foreach book [$doc selectNodes "//book"] {
    puts "book with id=[$book @id]"
}

# Tidy up at the end...
$doc delete

Using tDOM to do XML handling is easy. It's actually easier than using REs, and it's correct too. Double win!

The spaces in the RE are significant, and you have placed them at the start and end of the pattern, where the input has no matching spaces. If you want to parse XML, though, it might be best to use tDOM or TclXML.

You should check that regexp returns a non-zero result (meaning it found something). Otherwise 'dummy' will not be set, or it will keep whatever value it had before.
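A minimal sketch of that behaviour, assuming three hypothetical sample lines and a `dummy` variable that was already set before the loop. It records the return value of regexp next to the current value of `dummy`, so the stale value is visible on the lines that do not match:

```tcl
# Hypothetical sample lines; only the second one contains a book id.
set lines [list {<?xml version="1.0"?>} {<book id="bk101">} {</book>}]

# Pretend dummy was set earlier in the program.
set dummy "stale"

set results {}
foreach line $lines {
    # regexp returns 1 on a match and 0 otherwise; on 0 it leaves
    # the match variables untouched.
    set matched [regexp {book id="(.*)"} $line all dummy]
    lappend results [list $matched $dummy]
}
puts $results
# prints: {0 stale} {1 bk101} {0 bk101}
```

On the third line nothing matched, yet `dummy` still holds bk101 from the previous iteration, which is why the return value is the only reliable signal.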

To answer your specific question, you have extra spaces in your regular expression. Look closely at this line of code:

regexp { book id=\"(.*)\" }

Notice the space before the word book. That is significant. You are asking regexp to find a sequence of characters that begins with a space, the literal word 'book', another space, etc. Your pattern doesn't match, in part because ' book' does not appear in the data.
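The effect of the surrounding spaces can be checked directly. This is a small sketch using one line from the sample file; the variable names `padded` and `trimmed` are only for illustration:

```tcl
# One line from the sample XML, including its leading indentation.
set line {    <book id="bk101">}

# Pattern with a leading and trailing space, as in the question.
set padded [regexp { book id="(.*)" } $line all id]

# Same pattern without the surrounding spaces.
set trimmed [regexp {book id="(.*)"} $line all id]

puts "padded pattern matched: $padded"
puts "trimmed pattern matched: $trimmed, id=$id"
# prints:
# padded pattern matched: 0
# trimmed pattern matched: 1, id=bk101
```

The padded pattern fails because the character before book in the data is `<`, not a space, and the character after the closing quote is `>`.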

Two points:

1. If you are reading the data line by line, you need to check that regexp actually made a match before reading the variables.
2. Jeff is right: you have extra whitespace at the beginning and end of your regexp.


  set data [split $file_data "\n"]
  foreach line $data {
    # Only read the match variables when regexp reports a match
    if { [regexp {book id=\"(.*)\"} $line all dummy] } {
      puts $all
      puts $dummy
    }
  }

Another option to consider: if you can do without XML and you control the data file format, you can easily define a format that is both human-readable and Tcl-readable, which makes your life much easier.

catalog {
  book {
    { id "bk101" }
  }
}

etc. This is very easy to read as a Tcl list and to interpret in the program.
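As a rough sketch of how such a file could be walked, assuming its contents are already in `file_data` and that every entry follows the exact nesting shown above (the variable names and the `found` list are illustrative, and `lassign` needs Tcl 8.5 or later):

```tcl
# Stand-in for the contents of the data file shown above.
set file_data {
catalog {
  book {
    { id "bk101" }
  }
}
}

set found {}
# Top level: alternating tag / body pairs (here: catalog {...}).
foreach {tag body} $file_data {
    # Inside the catalog: alternating kind / field-list pairs (here: book {...}).
    foreach {kind fields} $body {
        # Each field is a two-element list such as {id "bk101"}.
        foreach field $fields {
            lassign $field name value
            lappend found "$kind $name=$value"
        }
    }
}
puts [join $found "\n"]
# prints: book id=bk101
```

No parsing code is needed beyond the list commands, because the Tcl list parser handles the braces and quotes itself. The trade-off is that the file must be well-formed as a Tcl list.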
