I have a Java SAX parser that downloads and parses XML using parse(new InputSource(conn.getInputStream())). Unfortunately, it sometimes fails on one site's XML with the error "XML or text declaration not at start of entity". Apparently the XML is malformed: the XML declaration has to come first, but this document puts the DOCTYPE ahead of it:
<!DOCTYPE ... stuff here ...>
<?xml  ... stuff here ...?>
Unfortunately, there doesn't seem to be any way to ignore this error. I suppose I could download the entire XML, fix it with a regex or something similar, and then parse it, but then I would lose the benefit of parsing while it's downloading. Is there a way to replace it while it's parsing?
Easy solution: read the first line from the stream yourself, consuming those bytes, and then pass the rest of the stream to the parser.
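A minimal sketch of that easy route, assuming the offending DOCTYPE sits alone on the first line and the document uses an ASCII-compatible encoding such as UTF-8. The class and method names (SkipFirstLine, skipFirstLine, parseRootName) are made up for illustration, and the handler only records the root element name to show that parsing succeeds:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SkipFirstLine {

    /** Consumes bytes up to and including the first '\n', or until end of stream. */
    static void skipFirstLine(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            // discard the byte; it belongs to the misplaced first line
        }
    }

    /** Hypothetical helper: skips the first line, parses the rest, returns the root element name. */
    static String parseRootName(InputStream in) throws Exception {
        skipFirstLine(in);
        final String[] root = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        });
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for conn.getInputStream(): DOCTYPE wrongly placed before the declaration.
        String bad = "<!DOCTYPE feed>\n<?xml version=\"1.0\"?>\n<feed><item/></feed>";
        InputStream in = new ByteArrayInputStream(bad.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseRootName(in));
    }
}
```

Note that this throws the DOCTYPE away, and it would also discard the first line of a well-formed document, so it only fits a source that is known to be broken in this one way.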
Proper Java solution: create an intermediate stream interface that wraps any kind of stream and offers a SAX parser compatible stream in return. Then create a class implementing that interface specifically for your case.
That way, you can detect the problematic header before it ever reaches the SAX parser.
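One way to approximate the wrapping idea is sketched below. It is a simplification, not the interface-plus-implementation design described above: a single static helper buffers only the first few bytes, drops anything that precedes the XML declaration, and hands back a stream that continues with the untouched remainder, so the parser still reads as the download proceeds. The names PrologFixer and dropBeforeXmlDecl and the limit parameter are assumptions for illustration, and the code assumes an ASCII-compatible encoding and a declaration that begins with "<?xml " inside the first limit bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

public class PrologFixer {

    /**
     * Reads at most 'limit' bytes from the start of 'in', discards whatever
     * comes before "<?xml " if it is found there, and returns a stream that
     * yields the kept bytes followed by the unread remainder of 'in'.
     */
    static InputStream dropBeforeXmlDecl(InputStream in, int limit) throws IOException {
        byte[] head = new byte[limit];
        int len = 0;
        int n;
        // Fill the buffer until it is full or the stream ends.
        while (len < limit && (n = in.read(head, len, limit - len)) != -1) {
            len += n;
        }
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        int idx = text.indexOf("<?xml ");
        int start = idx > 0 ? idx : 0; // nothing to drop if absent or already first
        return new SequenceInputStream(
                new ByteArrayInputStream(head, start, len - start), in);
    }

    public static void main(String[] args) throws IOException {
        String bad = "<!DOCTYPE feed>\n<?xml version=\"1.0\"?>\n<feed><item/></feed>";
        InputStream raw = new ByteArrayInputStream(bad.getBytes(StandardCharsets.UTF_8));
        InputStream fixed = dropBeforeXmlDecl(raw, 32);
        // Print the repaired stream; in real use it would go into new InputSource(fixed).
        int b;
        while ((b = fixed.read()) != -1) {
            System.out.write(b);
        }
        System.out.flush();
    }
}
```

In the original setup the result would be passed along as parse(new InputSource(PrologFixer.dropBeforeXmlDecl(conn.getInputStream(), 1024)), handler). A well-formed document passes through unchanged. Like the easy solution, this discards the DOCTYPE rather than moving it after the declaration.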
Edit: I would just use the Apache commons XML parser, or a DOM parser instead of SAX. Also, unless your XML is really long, there's not much difference in parsing it during or after the download.
Have a look at Jsoup. It can deal with malformed XML.