Easiest way of porting html table data to readable document

Posted on https://www.devze.com, 2023-04-04 17:43. Source: the web.
OK,

For the past 6 months I've been struggling to build a system that accepts user input in the form of big sexy textareas (with loads of support for tables, lists, etc.). It pretty much lets the user enter data as if it were Word. However, when it comes to exporting all this data, I haven't been able to find a working solution...

My first step was to find reporting software that supports raw HTML from the data source and renders it as normal HTML. That worked perfectly, except that the keep-together function is awful: either the data (tables, lists, etc.) is split in half, which I don't want, or the report always skips to the next page to avoid this, leaving 15+ empty pages in the final document.

So I'm looking for a tip or direction on the best way to export my data into a readable document (PDF or Word preferred).

What I've got is the following data breakdown, where the data is often raw HTML.

-Period

--Unit

---Group

----Question

-----Data

What would be the best choice? Trying to render HTML to PDF or RTF? I need tips :(

Also, sometimes the data is 2-3 pages long, with mixed tables, lists and plain text.


I would suggest that you try to keep this in the browser, and add a print stylesheet to the HTML to make it render one way on the screen and another way on paper. Adding a print stylesheet to your HTML is as easy as this:

<link rel="stylesheet" media="print" href="print.css">
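Since the stored data is raw HTML fragments, one way to try this out is to wrap a fragment in a full page that carries its own print rules. This is only a rough sketch: `PrintableDocument` and `Wrap` are made-up names, and the rules are inlined in a `<style>` element rather than linked from `print.css`. The CSS properties are standard, but how well a given browser honours break avoidance inside tables varies, and a table taller than one page will still be split.

```csharp
using System.Net;
using System.Text;

// Hypothetical helper: wraps a stored HTML fragment in a page with print-only CSS.
public static class PrintableDocument
{
    // Ask the browser to keep tables, rows and lists together on one page,
    // repeat table headers after a page break, and keep headings with their content.
    private const string PrintCss =
        "@media print {" +
        " table, tr, ul, ol { page-break-inside: avoid; break-inside: avoid; }" +
        " thead { display: table-header-group; }" +
        " h1, h2, h3, h4 { page-break-after: avoid; break-after: avoid; }" +
        " }";

    public static string Wrap(string title, string bodyHtml)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html><html><head><meta charset=\"utf-8\">");
        // The title is plain text, so encode it; the body is already HTML.
        sb.Append("<title>").Append(WebUtility.HtmlEncode(title)).Append("</title>");
        sb.Append("<style>").Append(PrintCss).Append("</style>");
        sb.Append("</head><body>").Append(bodyHtml).Append("</body></html>");
        return sb.ToString();
    }
}
```

The older `page-break-*` properties are listed next to their `break-*` replacements so that older rendering engines still see a rule they understand.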

You should be able to parse the input with something like Html Agility Pack and transform it (e.g. with XSLT) into whatever output format you want.
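A minimal sketch of the XSLT half, using only the .NET base class library. It assumes the markup has already been turned into well-formed XML (for example by Html Agility Pack, which is not shown here). `FragmentTransformer` and the `RowsToText` stylesheet are made up for illustration; the stylesheet just flattens each table row into one comma-separated line of text.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies an XSLT stylesheet to well-formed XML markup.
public static class FragmentTransformer
{
    // Illustrative stylesheet: emits each <tr> as one comma-separated text line.
    public const string RowsToText =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        "<xsl:template match=\"tr\">" +
        "<xsl:for-each select=\"td\">" +
        "<xsl:value-of select=\".\"/>" +
        "<xsl:if test=\"position() != last()\">, </xsl:if>" +
        "</xsl:for-each>" +
        "<xsl:text>&#10;</xsl:text>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string wellFormedXml, string stylesheet)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(xslReader);
        }

        using (var input = XmlReader.Create(new StringReader(wellFormedXml)))
        using (var output = new StringWriter())
        {
            // No XSLT parameters are needed, so the argument list is null.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Swapping in a different stylesheet changes the output format without touching the C# code, which is the main appeal of this route.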

Another option is to write HTML to the browser with the Content-Type set to a Microsoft Word-specific variant (there are several to choose from, depending on the version of Word you're targeting). This should make the browser ask whether the user wants to open the page with Microsoft Word. With Word 2007 and newer you can also write Office Open XML Word directly, since it's XML-based.

The content-types you can use are:

application/msword

For binary Microsoft Word files, but should also work for HTML.

application/vnd.openxmlformats-officedocument.wordprocessingml.document

For the newer "Office Open XML" formats of Word 2007 and newer.


One solution is to run an application on the server, using System.Diagnostics.Process, that converts the page and saves it as a PDF document.

You could use wkhtmltopdf, an open-source console program that converts HTML to PDF or to an image.

The installer for Windows can be obtained from wkhtmltox-0.10.0_rc2 Windows Installer (i368).

After installing wkhtmltopdf, copy the files from the installation folder into your solution. The code below assumes they sit in a wkhtmltopdf folder at the site root, with a pdf subfolder for output (the screenshot of this layout from the original post is not available).

The converted PDFs will be saved to the pdf folder.

And here is code for doing the conversion:

// Requires: using System.Diagnostics;
var wkhtmltopdfLocation = Server.MapPath("~/wkhtmltopdf/") + "wkhtmltopdf.exe";
var htmlUrl = @"http://stackoverflow.com/q/7384558/750216";
// Quote the output path so a physical path containing spaces stays one argument.
var pdfSaveLocation = "\"" + Server.MapPath("~/wkhtmltopdf/pdf/") + "question.pdf\"";

// Run wkhtmltopdf without a shell or console window and wait for it to finish.
using (var process = new Process())
{
    process.StartInfo.UseShellExecute = false;
    process.StartInfo.CreateNoWindow = true;
    process.StartInfo.FileName = wkhtmltopdfLocation;
    process.StartInfo.Arguments = htmlUrl + " " + pdfSaveLocation;
    process.Start();
    process.WaitForExit();
}

The htmlUrl is the location of the page you need to convert to PDF. Here it is set to this Stack Overflow page. :)
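On a web server it is probably worth guarding that call a little, since a hung or failed conversion would otherwise block the request or go unnoticed. The following is a sketch under assumptions, not a tested production wrapper: `PdfConverter`, `Quote`, `BuildArguments` and the timeout parameter are hypothetical names, while `--quiet` is wkhtmltopdf's own option for suppressing progress output.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around the wkhtmltopdf command line.
public static class PdfConverter
{
    // Wrap a value in double quotes so that paths with spaces stay one argument.
    // Note: this simple quoting does not handle values ending in a backslash.
    public static string Quote(string value)
    {
        return "\"" + value.Replace("\"", "\\\"") + "\"";
    }

    public static string BuildArguments(string source, string outputPath)
    {
        return "--quiet " + Quote(source) + " " + Quote(outputPath);
    }

    public static void Convert(string exePath, string source, string outputPath, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(source, outputPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Give up instead of blocking the request thread forever.
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException("wkhtmltopdf did not finish within " + timeoutMs + " ms.");
            }

            // A non-zero exit code means the conversion failed.
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("wkhtmltopdf exited with code " + process.ExitCode + ".");
            }
        }
    }
}
```

Never pass unvalidated user input as the source or output path, because both end up on a command line.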


It's a general question, but two things come to mind: the Visitor pattern and changing the MIME type.

Visitor pattern: you can have two separate rendering techniques over the same data. The details would be up to your implementation.
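As a rough illustration of what that could look like for the Period > Unit > Group > Question > Data breakdown from the question, here is a sketch with one tree and two renderers. All type names (`ReportNode`, `Section`, `DataItem`, `IReportVisitor`, `HtmlRenderer`, `OutlineRenderer`) are invented for this example, and treating every level above Data as a generic titled section is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// One visitor interface with one method per node type.
public interface IReportVisitor
{
    void VisitSection(Section section, int depth);
    void VisitData(DataItem item, int depth);
}

public abstract class ReportNode
{
    public abstract void Accept(IReportVisitor visitor, int depth);
}

// A titled container: Period, Unit, Group or Question.
public sealed class Section : ReportNode
{
    public string Title { get; private set; }
    public List<ReportNode> Children { get; private set; }

    public Section(string title, params ReportNode[] children)
    {
        Title = title;
        Children = new List<ReportNode>(children);
    }

    public override void Accept(IReportVisitor visitor, int depth)
    {
        visitor.VisitSection(this, depth);
        foreach (ReportNode child in Children)
            child.Accept(visitor, depth + 1);
    }
}

// A leaf holding the user's raw HTML.
public sealed class DataItem : ReportNode
{
    public string Html { get; private set; }

    public DataItem(string html) { Html = html; }

    public override void Accept(IReportVisitor visitor, int depth)
    {
        visitor.VisitData(this, depth);
    }
}

// Renders headings plus the raw HTML, e.g. for a browser or a Word download.
public sealed class HtmlRenderer : IReportVisitor
{
    public readonly StringBuilder Output = new StringBuilder();

    public void VisitSection(Section section, int depth)
    {
        int level = Math.Min(depth, 6); // HTML only defines h1 to h6
        Output.AppendFormat("<h{0}>{1}</h{0}>", level, WebUtility.HtmlEncode(section.Title));
    }

    public void VisitData(DataItem item, int depth)
    {
        Output.Append(item.Html);
    }
}

// Renders the dash-prefixed outline used in the question.
public sealed class OutlineRenderer : IReportVisitor
{
    public readonly StringBuilder Output = new StringBuilder();

    public void VisitSection(Section section, int depth)
    {
        Output.Append('-', depth).Append(section.Title).Append('\n');
    }

    public void VisitData(DataItem item, int depth)
    {
        Output.Append('-', depth).Append("Data").Append('\n');
    }
}
```

Build the tree once, then call `Accept(renderer, 1)` on the root with whichever renderer you need. Adding a third output format means adding one more visitor class, with no changes to the node classes.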

MIME type: when the request is made, write the data out to the Response, for example:

// Assumes `filename` holds the download name without its extension.
var response = HttpContext.Current.Response;
response.Clear();
// Keep the declared charset and the actual byte encoding in agreement.
response.Charset = "utf-8";
response.ContentEncoding = System.Text.Encoding.UTF8;
response.AddHeader("content-disposition", string.Format("attachment; filename={0}.doc", filename));
response.ContentType = "application/msword";
response.Write("-Period");
response.Write("\n");
response.Write("--Unit");
response.Write("\n");
response.Write("---Group");
response.Write("\n");
response.Write("----Question");
response.Write("\n");
response.Write("-----Data");
response.Write("\n");
response.End();


Here is another option: take screenshots of the page. (It doesn't handle scrolling, but I think you should be able to build that in.) This example can be expanded to meet the needs of your business, although it is a hack of sorts. You pass it a URL and it generates an image.

Call it like this:

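// Requires: using System.Drawing.Imaging; using System.IO;
protected void Page_Load(object sender, EventArgs e)
{
    int screenWidth = Convert.ToInt32(Request["ScreenWidth"]);
    int screenHeight = Convert.ToInt32(Request["ScreenHeight"]);
    string url = Request["Url"];
    string bitmapName = Request["BitmapName"];

    WebURLToImage webUrlToImage = new WebURLToImage()
    {
        Url = url,
        BrowserHeight = screenHeight,
        BrowserWidth = screenWidth,
        ImageHeight = 0, // 0 keeps the full-size capture (no thumbnail)
        ImageWidth = 0
    };

    webUrlToImage.GenerateBitmapForUrl();

    // Path.Combine supplies the separator whether or not MapPath returns a trailing slash.
    string savePath = Path.Combine(Server.MapPath("~/Images"), bitmapName + ".bmp");
    // Pass the format explicitly; the file extension alone does not select the encoder.
    webUrlToImage.GeneratedImage.Save(savePath, ImageFormat.Bmp);
}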

Generate an image from a webpage.

using System;
using System.Drawing;
using System.Windows.Forms;
using System.Threading;
using System.IO;

public class WebURLToImage
{
    public string Url { get; set; }
    public Bitmap GeneratedImage { get; private set; }
    public int ImageWidth { get; set; }
    public int ImageHeight { get; set; }
    public int BrowserWidth { get; set; }
    public int BrowserHeight { get; set; }

    public Bitmap GenerateBitmapForUrl()
    {
        ThreadStart threadStart = new ThreadStart(ImageGenerator);
        Thread thread = new Thread(threadStart);

        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return GeneratedImage;
    }

    private void ImageGenerator()
    {
        WebBrowser webBrowser = new WebBrowser();
        webBrowser.ScrollBarsEnabled = false;
        // Subscribe before navigating so the completion event cannot be missed.
        webBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
        webBrowser.Navigate(Url);

        // Pump messages on this STA thread until the page has finished loading.
        while (webBrowser.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();
        webBrowser.Dispose();
    }

    void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser webBrowser = (WebBrowser)sender;
        webBrowser.ClientSize = new Size(BrowserWidth, this.BrowserHeight);
        webBrowser.ScrollBarsEnabled = false;
        GeneratedImage = new Bitmap(webBrowser.Bounds.Width, webBrowser.Bounds.Height);
        webBrowser.BringToFront();

        webBrowser.DrawToBitmap(GeneratedImage, webBrowser.Bounds);

        // Optionally scale the capture down when both thumbnail dimensions are set.
        if (ImageHeight != 0 && ImageWidth != 0)
            GeneratedImage = (Bitmap)GeneratedImage.GetThumbnailImage(ImageWidth, ImageHeight, null, IntPtr.Zero);
    }
}