Parsing XML to HTML in Java

开发者 https://www.devze.com 2023-04-03 03:43 Source: web
I googled many Java APIs for parsing XML into HTML, but I am confused about where to start. I have never done an XML-to-HTML task before. The input is the output of a third-party resume parser, delivered as XML data, and I have to transform it into HTML.

Best regards


There is no "parse to HTML"; maybe you mean "transform to HTML". In that case, take a look at XSLT.

XSLT is a language (itself written in XML) for transforming one XML document into another. XHTML happens to be XML, so XSLT can transform your XML into XHTML.

As for the Java library, you can use the classes that ship with the JRE directly, namely javax.xml.transform.TransformerFactory and related classes. Otherwise you can use XALAN directly (see http://xalan.apache.org), or SAXON, or Cocoon 3 (http://cocoon.apache.org), which makes parsing, transforming, and saving the result file transparent.
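To make the JRE route concrete, here is a minimal sketch using only javax.xml.transform. The class name XmlToHtml, the sample XML, and the tiny stylesheet are made up for illustration; in practice you would point the StreamSource objects at your real resume XML and .xsl files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToHtml {

    // Hypothetical input; stands in for the XML produced by the resume parser.
    static final String SAMPLE_XML = "<resume><name>Jane Doe</name></resume>";

    // Hypothetical stylesheet; a real one would live in its own .xsl file.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/resume\">"
      + "<html><body><h1><xsl:value-of select=\"name\"/></h1></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xml, String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet into a Transformer.
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // Run the transformation, collecting the output in memory.
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

To work with files instead of strings, StreamSource and StreamResult also accept a java.io.File, so the same three calls (newInstance, newTransformer, transform) cover that case too.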


Disclaimer: I work for Sovren, a Resume/CV Parser vendor.

There are generally two approaches for converting a Resume/CV into HTML:

  1. Convert directly from the original format (DOC, DOCX, RTF, etc.) into HTML, retaining the layout and appearance of the original Resume/CV. There are many general purpose document conversion tools out there that can do this. Some Resume/CV Parser vendors include this functionality in their product (Sovren does).

  2. Parse the Resume/CV to extract the data into a structured format like XML, then transform that XML into HTML. This approach has the advantage of being able to transform widely varied Resume/CV layouts into a common "branded" layout for your users. Check with your Resume/CV parsing vendor to see if they provide XSLT templates for transforming their XML into HTML, RTF, etc. The process of Parsing-Resume/CV-to-XML and then Transforming-XML-to-HTML can eliminate over 90% of the manual effort, but do not plan on this process being 100% automated. Even the best Resume/CV parsers have trouble interpreting some Resumes/CVs, so there will be some oddball results, and you will want a human to verify and edit the results before you show the generated HTML Resume/CV to a client.

Sovren provides starter XSLT templates for transforming their XML into other formats. I can't supply a full XSLT template, but here is a rewritten subset of an XSLT template for transforming ContactInfo data from HR-XML Resume 2.5 into HTML:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:hr="http://ns.hr-xml.org/2006-02-28" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
                exclude-result-prefixes="xsl xsi hr">

    <xsl:output method="xml"
        media-type="text/html"
        doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
        indent="yes"
        encoding="utf-8"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>HTML Resume from HR-XML Resume 2.5</title>
                <style type="text/css">
                    body { font-family: sans-serif; font-size: 10pt }
                    th { font-family: sans-serif; font-size: 10pt; font-weight: bold; padding-right: 16px; text-align: left;}
                    td { font-family: sans-serif; font-size: 10pt; padding-right:16px; }
                    h1 { font-family: sans-serif; font-size: 12pt; background-color: #FFFFCC; margin-top: 20px }
                    h2 { font-family: sans-serif; font-size: 10pt; font-weight: bold; margin-top: 20px }
                </style>
            </head>
            <body>
                <p>
                    <img src="Logo.png" alt="" />
                </p>
                <xsl:for-each select="/hr:Resume/hr:StructuredXMLResume/hr:ContactInfo">
                    <h1>CONTACT INFORMATION</h1>
                    <p>
                        <b>
                            <xsl:value-of select="hr:PersonName/hr:FormattedName"/>
                        </b>
                        <br/>
                        <xsl:for-each select=".//hr:PostalAddress">
                            <xsl:text>Location: </xsl:text>
                            <xsl:value-of select="hr:Municipality"/>
                            <xsl:if test="string-length(hr:Municipality) > 0 and string-length(hr:Region) > 0">
                                <xsl:text>, </xsl:text>
                            </xsl:if>
                            <xsl:value-of select="hr:Region"/>
                            <xsl:if test="string-length(hr:Municipality) > 0 or string-length(hr:Region) > 0">
                                <xsl:text>&#160;</xsl:text>
                            </xsl:if>
                            <xsl:value-of select="hr:CountryCode"/>
                        </xsl:for-each>
                        <br/>
                        <xsl:for-each select=".//hr:InternetEmailAddress">
                            Email: <a href="mailto:{.}"><xsl:value-of select="."/></a>
                            <br/>
                        </xsl:for-each>
                        <xsl:for-each select=".//hr:Telephone/hr:FormattedNumber">
                            Phone: <xsl:value-of select="."/>
                            <br/>
                        </xsl:for-each>
                        <xsl:for-each select=".//hr:Mobile/hr:FormattedNumber">
                            Mobile: <xsl:value-of select="."/>
                            <br/>
                        </xsl:for-each>
                        <xsl:for-each select=".//hr:Fax/hr:FormattedNumber">
                            Fax: <xsl:value-of select="."/>
                            <br/>
                        </xsl:for-each>
                    </p>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>

</xsl:stylesheet>
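
If you run a stylesheet like this from Java for many resumes, it can help to compile it once into a javax.xml.transform.Templates object and create a fresh Transformer per document. The sketch below assumes a cut-down stylesheet and a hand-written sample document in the HR-XML namespace; the class name ContactInfoRenderer and the sample data are hypothetical and not part of any vendor API. It also shows an attribute value template ({.}) placing the e-mail address inside the mailto: link.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ContactInfoRenderer {

    static final String HR_NS = "http://ns.hr-xml.org/2006-02-28";

    // Cut-down stylesheet for illustration: only renders e-mail addresses.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:hr=\"" + HR_NS + "\""
      + " exclude-result-prefixes=\"hr\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/\">"
      + "<p><xsl:for-each select=\"//hr:InternetEmailAddress\">"
      + "<a href=\"mailto:{.}\"><xsl:value-of select=\".\"/></a>"
      + "</xsl:for-each></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hand-written sample input; real parser output will contain far more data.
    static final String SAMPLE_RESUME =
        "<Resume xmlns=\"" + HR_NS + "\"><StructuredXMLResume><ContactInfo>"
      + "<ContactMethod><InternetEmailAddress>jane@example.com"
      + "</InternetEmailAddress></ContactMethod>"
      + "</ContactInfo></StructuredXMLResume></Resume>";

    // Compiled stylesheet; Templates objects are safe to share across threads.
    private final Templates templates;

    ContactInfoRenderer() throws TransformerException {
        templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
    }

    // Transformer instances are not thread-safe, so create one per document.
    String render(String resumeXml) throws TransformerException {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
            new StreamSource(new StringReader(resumeXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(new ContactInfoRenderer().render(SAMPLE_RESUME));
    }
}
```

Note that the XPath expressions must use the hr: prefix bound to the same namespace URI as the input document; an unprefixed name such as //InternetEmailAddress would match nothing here.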