iTextSharp HTML to PDF preserving spaces

I am using FreeTextBox.dll to get user input and storing it as HTML in the database. A sample of the user's input is below:

133 Peachtree St NE

Atlanta, GA 30303

404-652-7777

Cindy Cooley

www.somecompany.com

Product Stewardship Mgr

9/9/2011

Deidre's Company

123 Test St

Atlanta, GA 30303

Test test.

I want HTMLWorker to preserve the whitespace the user enters, but it strips it out. Is there a way to preserve the user's whitespace? Below is an example of how I am creating my PDF document:

    Public Shared Sub CreatePreviewPDF(ByVal vsHTML As String, ByVal vsFileName As String)

        Dim output As New MemoryStream()
        Dim oDocument As New Document(PageSize.LETTER)
        Dim writer As PdfWriter = PdfWriter.GetInstance(oDocument, output)
        Dim oFont As New Font(Font.FontFamily.TIMES_ROMAN, 8, Font.NORMAL, BaseColor.BLACK)

        Using output
            Using writer
                Using oDocument
                    oDocument.Open()
                    Using sr As New StringReader(vsHTML)
                        Using worker As New html.simpleparser.HTMLWorker(oDocument)

                            worker.StartDocument()
                            worker.SetInsidePRE(True)
                            worker.Parse(sr)
                            worker.EndDocument()
                            worker.Close()
                            oDocument.Close()

                        End Using
                    End Using

                    HttpContext.Current.Response.ContentType = "application/pdf"
                    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment;filename={0}.pdf", vsFileName))
                    HttpContext.Current.Response.BinaryWrite(output.ToArray())
                    HttpContext.Current.Response.End()

                End Using
            End Using
            output.Close()
        End Using


    End Sub

There's a glitch in iText and iTextSharp, but you can fix it fairly easily if you don't mind downloading the source and recompiling it. You need to change two files; my changes are commented inline in the code. Line numbers are based on the 5.1.2.0 source, revision 240.

The first is in iTextSharp.text.html.HtmlUtilities.cs. Look for the function EliminateWhiteSpace at line 249 and change it to:

    public static String EliminateWhiteSpace(String content) {
        // multiple spaces are reduced to one,
        // newlines are treated as spaces,
        // tabs, carriage returns are ignored.
        StringBuilder buf = new StringBuilder();
        int len = content.Length;
        char character;
        bool newline = false;
        bool space = false;//Detect whether we have written at least one space already
        for (int i = 0; i < len; i++) {
            switch (character = content[i]) {
            case ' ':
                if (!newline && !space) {//If we are not at a new line AND ALSO did not just append a space
                    buf.Append(character);
                    space = true;  //flag that we just wrote a space
                }
                break;
            case '\n':
                if (i > 0) {
                    newline = true;
                    buf.Append(' ');
                }
                break;
            case '\r':
                break;
            case '\t':
                break;
            default:
                newline = false;
                space = false;  //reset flag
                buf.Append(character);
                break;
            }
        }
        return buf.ToString();
    }
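To see what the modified EliminateWhiteSpace does in isolation, here is a sketch that copies its logic (lightly restructured, same behaviour) into a standalone class so it can be run without rebuilding iTextSharp. WhitespaceDemo is a hypothetical name, not part of the library. Running it shows that the modified function still collapses runs of spaces to one, turns each newline into a single space and drops tabs, so on its own it normalizes whitespace rather than keeping every space.

```csharp
using System;
using System.Text;

// Hypothetical standalone copy of the modified HtmlUtilities.EliminateWhiteSpace,
// so its behaviour can be checked without recompiling iTextSharp.
public static class WhitespaceDemo
{
    public static String EliminateWhiteSpace(String content)
    {
        StringBuilder buf = new StringBuilder();
        bool newline = false; // true right after a '\n' was turned into a space
        bool space = false;   // true right after a ' ' was written
        for (int i = 0; i < content.Length; i++)
        {
            char character = content[i];
            switch (character)
            {
                case ' ':
                    // Write a space only if we did not just write one
                    // and are not directly after a newline.
                    if (!newline && !space)
                    {
                        buf.Append(character);
                        space = true;
                    }
                    break;
                case '\n':
                    // A newline (except at position 0) becomes a single space.
                    if (i > 0)
                    {
                        newline = true;
                        buf.Append(' ');
                    }
                    break;
                case '\r':
                case '\t':
                    break; // carriage returns and tabs are dropped entirely
                default:
                    // Any other character resets both flags.
                    newline = false;
                    space = false;
                    buf.Append(character);
                    break;
            }
        }
        return buf.ToString();
    }
}
```

For example, WhitespaceDemo.EliminateWhiteSpace("a   b") returns "a b", and WhitespaceDemo.EliminateWhiteSpace("a\tb") returns "ab".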

The second change is in iTextSharp.text.xml.simpleparser.SimpleXMLParser.cs. In the Go function, which starts at line 185, change line 248 to:

    if (html /*&& nowhite*/) {//removed the nowhite check from here because that should be handled by the HTML parser later, not the XML parser

Thanks for the help, everyone. I was able to find a small workaround by doing the following:

    ' String.Replace returns a new string, so assign the result back to vsHTML
    vsHTML = vsHTML.Replace("  ", "&nbsp;&nbsp;").Replace(Chr(9), "&nbsp;&nbsp;&nbsp;&nbsp;").Replace(Chr(160), "&nbsp;").Replace(vbCrLf, "<br />")

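The first Replace turns each pair of spaces into &nbsp;&nbsp;, the second turns Chr(9) (a tab) into four &nbsp; entities, the third turns Chr(160) (a literal non-breaking space character) into &nbsp;, and the last turns each vbCrLf line break into <br />.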
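For C# users, the same replacement chain might look like the sketch below. WhitespaceWorkaround and PreserveWhitespace are hypothetical names; the replacements and their order are taken from the VB line above, with "\u00A0" standing in for Chr(160) (the non-breaking space, assuming the usual Windows-1252 code page behind VB's Chr).

```csharp
using System;

// Hypothetical C# equivalent of the VB workaround: rewrite plain whitespace as
// HTML entities/tags so HTMLWorker does not collapse or strip it.
public static class WhitespaceWorkaround
{
    public static string PreserveWhitespace(string html)
    {
        return html
            .Replace("  ", "&nbsp;&nbsp;")              // each pair of spaces
            .Replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;")  // a tab becomes four non-breaking spaces
            .Replace("\u00A0", "&nbsp;")                // literal non-breaking space character
            .Replace("\r\n", "<br />");                 // Windows line breaks become <br />
    }
}
```

Because the chain runs over the whole HTML string, it also rewrites double spaces that happen to sit inside tags (for example between attributes), and a run of three spaces keeps one ordinary space after the two entities: PreserveWhitespace("a   b") gives "a&nbsp;&nbsp; b".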

I would recommend using wkhtmltopdf instead of iText. wkhtmltopdf renders the HTML with the WebKit engine (the engine behind Safari, and behind Google Chrome at the time), so the PDF matches what a browser shows instead of depending on iText's own HTML conversion. It is a standalone command-line binary that you can call from your application. That said, I would also check the HTML to make sure the user input actually contains paragraphs and/or line breaks; they might be stripped out before the conversion.
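Calling the binary from .NET could look roughly like the sketch below. It assumes wkhtmltopdf is installed and reachable as "wkhtmltopdf" on the PATH (ExePath is a hypothetical setting), uses only the basic "wkhtmltopdf input.html output.pdf" form of the command line, and treats any nonzero exit code as a failure. WkHtmlToPdfSketch and its members are illustrative names, not an existing API.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical wrapper around the wkhtmltopdf command-line tool.
public static class WkHtmlToPdfSketch
{
    // Assumption: the executable is on the PATH; adjust for your install.
    public static string ExePath = "wkhtmltopdf";

    // Builds the basic "input.html output.pdf" argument string with quoted paths.
    public static string BuildArguments(string inputPath, string outputPath)
    {
        return "\"" + inputPath + "\" \"" + outputPath + "\"";
    }

    public static byte[] HtmlToPdf(string html)
    {
        string input = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        string output = Path.ChangeExtension(input, ".pdf");
        try
        {
            // Written as UTF-8; non-ASCII text may need <meta charset="utf-8"> in the HTML.
            File.WriteAllText(input, html);
            var psi = new ProcessStartInfo(ExePath, BuildArguments(input, output))
            {
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardError = true
            };
            using (Process p = Process.Start(psi))
            {
                // Read stderr before waiting so a full pipe cannot block the child.
                string err = p.StandardError.ReadToEnd();
                p.WaitForExit();
                if (p.ExitCode != 0)
                    throw new InvalidOperationException("wkhtmltopdf failed: " + err);
            }
            return File.ReadAllBytes(output);
        }
        finally
        {
            // Clean up both temporary files whether or not the conversion succeeded.
            if (File.Exists(input)) File.Delete(input);
            if (File.Exists(output)) File.Delete(output);
        }
    }
}
```

In the question's ASP.NET code, the returned bytes could take the place of output.ToArray(), for example HttpContext.Current.Response.BinaryWrite(WkHtmlToPdfSketch.HtmlToPdf(vsHTML)).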