I have a set of interview transcripts in MS Word docx format, which I want to convert to my own custom XML schema.
A paragraph in my Word doc looks like this:
Jon: This is my interview. Now I am shouting Now I am speaking normally again.
and in my custom schema it should look like this:
<para speaker="jon">
<content>This is my interview.</content>
<content emphasis="true">Now I am shouting!</content>
<content>Now I am speaking normally again.</content>
</para>
In the docx XML, I want adjacent w:r elements to be merged into a single element in all other cases.
Any help would be much appreciated.
Thanks
Swami
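Your example doesn't really match your question, but to answer the question of how to merge adjacent elements with XSLT, using your w:r example and assuming the "w" namespace prefix is already declared in scope:
<!-- The first w:r among its siblings absorbs the content of every w:r sibling after it. -->
<xsl:template match="w:r[1]">
  <w:r>
    <xsl:copy-of select="@*|node()" />
    <!-- assuming you don't care about attributes on the following w:r elements -->
    <xsl:copy-of select="following-sibling::w:r/node()" />
  </w:r>
</xsl:template>
<!-- Every other w:r is dropped, because its content was already copied above. -->
<xsl:template match="w:r" />
Note that this merges every w:r under the same parent into one element, whether or not other elements sit between them.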
You can also do this with XSLT 2.0 grouping operations (xsl:for-each-group), which you might want to look into if your case is more complex than this simple example.
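As a rough sketch of the grouping approach, something like the stylesheet below might work with an XSLT 2.0 processor. It assumes that "emphasis" in the Word document means bold (a w:b element inside w:rPr), which the question does not actually state, and that the w:r elements sit directly next to each other inside w:p with no whitespace text nodes in between. Adjacent runs with the same bold/non-bold state are merged; anything that is not a w:r is copied through unchanged.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <!-- Identity template: copy anything not handled by a more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild each paragraph, merging runs of adjacent w:r children. -->
  <xsl:template match="w:p">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Grouping key (an assumption): 'r-true' for bold runs, 'r-false' for
           other runs, 'other' for any child that is not a w:r.
           A w:b with w:val="false" is not treated specially here. -->
      <xsl:for-each-group select="node()"
          group-adjacent="if (self::w:r)
                          then concat('r-', exists(w:rPr/w:b))
                          else 'other'">
        <xsl:choose>
          <xsl:when test="self::w:r">
            <w:r>
              <!-- Keep the run properties of the first run in the group. -->
              <xsl:copy-of select="w:rPr"/>
              <!-- Concatenate the text of every run in the group. -->
              <w:t xml:space="preserve"><xsl:value-of
                  select="current-group()/w:t" separator=""/></w:t>
            </w:r>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the following-sibling version, group-adjacent only merges runs that are truly next to each other, so a change in formatting starts a new w:r. Mapping the merged runs to the custom para/content schema would be a separate step.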
Full code here. Thanks to MarkLogic Blog!
http://www.xqzone.com/blog/smallchanges/2007-12-18