When editing rich text content, our CMS generates XML files with duplicate <br/> tags. I'd like to remove them so that the output can be read by another application that does not tolerate those duplicates.
Example input:
<p>
Lorem ipsum...<br />
<br />
..dolor sit
</p>
Would generate something like this:
<p>
Lorem ipsum...<br />
..dolor sit
</p>
I am already using XSLT to manipulate the output in some other ways, and I have found some examples using regexps and PHP that do the same thing. I just think it would be better to do this with XSLT, given the speed of the engine in our CMS (Roxen).
Thanks in advance!
Building off @Nic's answer, you could use
<xsl:template match='br[preceding-sibling::node()[1][self::br]]'/>
I've just changed * to node().
This would solve the problem of conflating two <br/>s that have text in between. However, it would also stop removing duplicate <br/>s when there is only a whitespace-only text node between them.
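To make the distinction concrete, here is a small hypothetical input (the `doc` wrapper and the sample text are made up for illustration) annotated with how the `preceding-sibling::node()[1][self::br]` pattern treats each case:

```xml
<doc>
  <!-- Adjacent: the node right before the second br is a br, so it is matched -->
  <p>one<br/><br/>two</p>

  <!-- Text in between: the node right before the second br is the text "text",
       so neither br is matched -->
  <p>one<br/>text<br/>two</p>

  <!-- Line break in between: the node right before the second br is a
       whitespace-only text node, so the second br is NOT matched -->
  <p>one<br/>
<br/>two</p>
</doc>
```

The third case is the one in the question, since the CMS output puts each <br /> on its own line.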
To solve that...
Deprecated
At first I had suggested you could strip whitespace-only nodes from p elements in the input doc, by putting this at the top level of your XSLT:
<xsl:strip-space elements="p"/>
But @Alejandro pointed out that this could easily cause you to lose important spaces, as in <p><em>bar</em> <em>baz</em></p>.
So instead, use this modified match pattern:
<xsl:template match='br[preceding-sibling::node()
[not(self::text() and normalize-space(.) = "")][1]
[self::br]]'/>
Kind of ugly but it should work. This will match and suppress "any br for which the preceding sibling node that is not a whitespace-only text node is also a br." :-)
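For reference, here is a minimal sketch of a complete stylesheet built around that pattern. It assumes XSLT 1.0 and that nothing else in your existing stylesheet matches br with a higher priority; the identity template is the standard one and is only there so that everything else is copied through unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy every node and attribute that no
       more specific template matches -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: suppress a br whose nearest preceding sibling,
       ignoring whitespace-only text nodes, is also a br -->
  <xsl:template match="br[preceding-sibling::node()
                          [not(self::text() and normalize-space(.) = '')][1]
                          [self::br]]"/>

</xsl:stylesheet>
```

Applied to the example input from the question, this should drop the second <br /> and keep the first. Note that the whitespace-only text node between the two is still copied, so the output may contain an extra line break where the duplicate used to be.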
Given that the match pattern is so complex, you may prefer to move some of that logic into the template body, as follows. I guess this is more a matter of personal taste and style:
<xsl:template match="br">
<xsl:if test="not(preceding-sibling::node()
[not(self::text() and normalize-space(.) = '')][1]
[self::br])">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:if>
</xsl:template>
Here we use a copy of the identity transform when the <br /> is not one we want to suppress. I don't think <br /> can take child elements or text, but it doesn't hurt to be safe.
(Updated the above. I had forgotten to finish that sample code last time I saved edits.)
Using an identity transform to leave everything else alone, you could simply suppress every <br/> that is directly preceded by another one. Obviously, you can then just fit the template into your existing XSLT.
<xsl:template match='node()|@*'>
<xsl:copy>
<xsl:apply-templates select='node()|@*'/>
</xsl:copy>
</xsl:template>
<xsl:template match='br[preceding-sibling::*[1][self::br]]'/>
The empty template will simply suppress that <br/>.
Update: As @LarsH points out, that template is too liberal in its matching and probably should be something like:
<xsl:template match='br[preceding-sibling::node()
[not(self::text() and normalize-space(.) = "")][1]
[self::br]]'/>