I am parsing XML documents. I call getTextContent() to get the text from the particular section that I want. The text that I get contains tags like
<italic> </italic>
<sub> </sub>
...and some more. I want to strip out these tags and keep just the text, irrespective of what the tags are.
My document looks like this:
<article>
   <sec>Section 1</sec>  
   <sec>Section 2
      <title>Title1</title>
      <sec>
         <title>Subtitle1</title>
         <p>........<italic> </italic>...</p>
      </sec>
      <sec>
         <title>Subtitle2</title>
         <p>........<sub> </sub>...</p>
      </sec>
   </sec>
</article>
I need all the text in <p>...</p> without the tags in it.
How can I go about it? I was thinking of identifying all the tags and replacing them with "". But there has to be a better way.
Thanks
You could apply this regex to the result of getTextContent():
String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");
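For illustration, here is a minimal sketch of how that could look end to end. It assumes the document is parsed with the standard javax.xml DOM API; the class name StripTags, the helper names stripTags and paragraphTexts, and the sample XML string are made up for this example. Note that on a DOM element, getTextContent() already returns only the text of the element and its descendants, so the regex should only be needed if the tags reach you as literal text (for example, escaped or inside CDATA).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StripTags {
    // Regex fallback (hypothetical helper): removes anything shaped like <...>.
    // It is not a real XML parser and will also eat a literal "<" ... ">" in text.
    static String stripTags(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // DOM route (hypothetical helper): collect the text of every <p> element.
    // getTextContent() concatenates the text of all descendants, without markup.
    static List<String> paragraphTexts(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList paragraphs = doc.getElementsByTagName("p");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            out.add(paragraphs.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Sample input invented for this sketch, shaped like the question's document.
        String xml = "<article><sec><title>Subtitle1</title>"
                + "<p>H<sub>2</sub>O is <italic>wet</italic></p></sec></article>";
        System.out.println(paragraphTexts(xml));
        System.out.println(stripTags("H<sub>2</sub>O is <italic>wet</italic>"));
    }
}
```

If paragraphTexts already gives you clean text, you can probably skip the regex entirely.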
You could use a Perl script to go through the file, then use s/ \< [^>]* \> //xg; to get rid of all the tags.