Help with Regexp from a xml (Tcl)

Developer https://www.devze.com 2023-01-29 15:18 Source: Internet
I have an XML file.

 <?xml version="1.0"?>
 <catalog>
    <book id="bk101">
    </book>
 </catalog>

I read the file and store its contents in file_data:

 set data [split $file_data "\n"]
 foreach line $data {
 regexp { book id=\"(.*)\" } $line all dummy
 puts $all
 puts $dummy
 }

As you can see, I am trying to read the book id and print it out, but I get an error saying that the variable dummy cannot be found. Am I doing it wrong?

Edit

Weirdly, when I try this:

set mydata {<book id="bk101"> testing the code }
puts $mydata

regexp {book id="(.*)"} $mydata all part
puts $all
puts $part

Output

<book id="bk101"> testing the code
book id="bk101"
bk101

I have no idea why the code at the top still shows an error.


Don't do that (though that question is about XHTML, it is no worse than any other XML dialect in this respect; plain HTML is, if anything, worse). In short, XML belongs to a class of languages that regular expressions (REs) cannot fully parse.

Instead, use tDOM to parse the XML, and XPath (supported by tDOM) to pick out the interesting parts of the document.

package require tdom

# Get the XML here by whatever method, and parse it here...
set doc [dom parse $file_data]

# Iterate over the books in the document and print their IDs
foreach book [$doc selectNodes "//book"] {
    puts "book with id=[$book @id]"
}

# Tidy up at the end...
$doc delete

Using tDOM to do XML handling is easy. It's actually easier than using REs, and it's correct too. Double win!


The spaces in the RE are significant, and you have placed them at the start and end of the pattern, where the data contains no spaces. If you want to parse XML, though, it might be best to use tdom or TclXML.

You should check that regexp returns a non-zero result (meaning it found something); otherwise 'dummy' won't get set, or will keep whatever value it had before.
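To illustrate the point about the return value, here is a minimal sketch. It assumes a line that contains no book element (the `<catalog>` line from the question's file) and uses `info exists` to show that the match variable is never created when regexp returns 0. The variable name `found` is only illustrative.

```tcl
set line {<catalog>}          ;# a line with no book element
unset -nocomplain all dummy   ;# make sure nothing is left over from earlier

# regexp returns 1 on a match and 0 otherwise
set found [regexp {book id="(.*)"} $line all dummy]

puts "found: $found"
puts "dummy is set: [info exists dummy]"
```

Because the loop in the question runs over every line of the file, the very first line that does not match leaves dummy unset, and `puts $dummy` then raises the error.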


To answer your specific question, you have extra spaces in your regular expression. Look closely at this line of code:

regexp { book id=\"(.*)\" }

Notice the space before the word book. That is significant. You are asking regexp to find a sequence of characters that begins with a space, the literal word 'book', another space, etc. Your pattern doesn't match, in part because ' book' does not appear in the data.
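As a quick check of this, the sketch below runs both forms of the pattern against the book line from the question. The variable names `withSpaces` and `withoutSpaces` are made up for the example; the escaped quotes are dropped because, inside braces, a plain `"` already matches a literal quote.

```tcl
# The line from the question's XML that holds the id.
set line {    <book id="bk101">}

# Original pattern: needs a space before "book" and one after the closing quote.
set withSpaces [regexp { book id="(.*)" } $line all id]

# Same pattern with the surrounding spaces removed.
set withoutSpaces [regexp {book id="(.*)"} $line all id]

puts "with spaces: $withSpaces"
puts "without spaces: $withoutSpaces, id=$id"
```

The first call reports 0 because the data has `<book`, not ` book`; the second reports 1 and captures bk101.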


Two points:

  1. If you are reading the data line by line, you need to check that regexp actually made a match before reading the variables.
  2. Jeff is right: you have extra whitespace at the beginning and end of your regexp.

  set data [split $file_data "\n"]
  foreach line $data {
    if { [regexp {book id=\"(.*)\"} $line all dummy] } {
      puts $all
      puts $dummy
    }
  }

Another option you might consider: if you can do without XML and you control the data file format, you can easily create a format that is both human-readable and Tcl-readable, which makes your life much easier.

catalog {
  book {
    { id "bk101" }
  }
}

And so on. This is very easy to read as a Tcl list and to interpret in the program.
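One possible way to read such a file is sketched below. It assumes the layout is treated as nested lists of alternating names and bodies, which is only one interpretation of the format above, and it needs Tcl 8.5 or later for `lassign`. The data is inlined here in place of the file contents, and the `ids` list is just a hypothetical place to collect results.

```tcl
# Stand-in for the contents of the data file.
set file_data {
catalog {
  book {
    { id "bk101" }
  }
}
}

set ids {}

# Top level: alternating names and bodies, e.g. catalog {...}
foreach {name body} $file_data {
    # Inside catalog: alternating element names and their field lists
    foreach {element fields} $body {
        # Each field is a two-element list: key and value
        foreach field $fields {
            lassign $field key value
            puts "$element $key=$value"
            lappend ids $value
        }
    }
}
```

No regular expressions are involved: the Tcl list parser deals with the braces, quotes and whitespace.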
