
Get document vector count in a pdf document?

Developer https://www.devze.com 2023-02-11 09:37 Source: web
I have a problem with some user-provided PDF documents. They are created from 3D packages and are basically a HUGE list of vector lines that take an age to render (over 60 seconds).

How can I generate a report on the number of vector lines present in a pdf document using iTextSharp (5.0.5)?

I can get text and image data, but I can't see where to get a handle on the vector data. It doesn't seem to be represented as an image.


iText[Sharp]'s parser package doesn't yet handle lineTo or curveTo commands. It's a goal, but not one that's been important enough to implement yet. Other things are getting attention at the moment.

If you're feeling adventurous, you should check out PdfContentStreamProcessor. In a private function populateOperators, there's a long list of commands that are currently handled (in one fashion or another).

You'd need to write similar command classes for all the line art commands (moveTo, lineTo, rect, stroke, fill, clip), and expose them in some way.

Actually, if all you want to do is COUNT the number of paths, you could just implement stroke and fill to increment some static integer[s], then check them after parsing. Should be fairly simple (I'm writing in Java, but it's easy enough to translate):

private static class CountOps implements ContentOperator {
  public static int operationCount = 0;
  public void invoke(PdfContentStreamProcessor processor, PdfLiteral operator, ArrayList<PdfObject> operands) {
    ++operationCount;
  }
}

Ah! registerContentOperator is a public function. You don't need to change iText's source at all:

// contentBytes is the page's content stream (e.g. from PdfReader.getPageContent(pageNumber));
// pageResourceDict is that page's /Resources dictionary.
PdfContentStreamProcessor proc = new PdfContentStreamProcessor(null);
CountOps countOps = new CountOps();
proc.registerContentOperator("S", countOps); // stroke the path
proc.registerContentOperator("s", countOps); // close & stroke
proc.registerContentOperator("F", countOps); // fill, backward compat
proc.registerContentOperator("f", countOps); // fill
proc.registerContentOperator("f*", countOps); // fill with even-odd winding rule
proc.registerContentOperator("B", countOps); // fill & stroke
proc.registerContentOperator("B*", countOps); // fill & stroke with even-odd
proc.registerContentOperator("b", countOps); // close, fill, & stroke
proc.registerContentOperator("b*", countOps); // close, fill, & stroke with even-odd

proc.processContent( contentBytes, pageResourceDict );

int totalStrokesAndFills = CountOps.operationCount; // note that stroke&fill operators will be counted once, not twice.

Something like that. One caveat: a null RenderListener will cause a null pointer exception if you run into any text or images. You could whip up a no-op listener yourself or use one of the existing ones and ignore its output.

PS: iTextSharp 5.0.6 should be released any day now if it isn't out already.


There is no specific vector image object. Vector artwork is normally just added to the page's content stream, which is essentially a stream of vector drawing commands for the whole page.

There is a blog article which you might find useful for understanding this at http://www.jpedal.org/PDFblog/2010/11/grow-your-own-pdf-file-%E2%80%93-part-5-path-objects/
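To make the "content stream as a list of drawing commands" idea concrete, here is a minimal sketch in plain Java that does not use iText at all. It assumes you already have the page's content stream as decoded text, and it uses a naive whitespace tokenizer, so it will miscount if the stream contains literal strings, comments, or inline image data that happen to look like operators, and it does not follow form XObjects. The class name PathOperatorCounter and its Tally type are made up for this illustration. The operator names (m, l, c, v, y, re, and the painting operators) come from the PDF specification.

```java
// Hypothetical helper, not part of iText: tallies path operators in an
// already-decoded PDF content stream using a naive whitespace tokenizer.
final class PathOperatorCounter {

    // Simple holder for the counts this sketch tracks.
    static final class Tally {
        int lineSegments; // "l"  (lineTo)
        int curves;       // "c", "v", "y" (Bezier curveTo variants)
        int rectangles;   // "re" (rectangle)
        int paintedPaths; // any stroke and/or fill operator, counted once each
    }

    static Tally count(String decodedContent) {
        Tally tally = new Tally();
        // Assumption: tokens are separated by whitespace. Real content streams
        // may also contain strings, arrays, and inline images, which this ignores.
        for (String token : decodedContent.trim().split("\\s+")) {
            switch (token) {
                case "l":
                    tally.lineSegments++;
                    break;
                case "c":
                case "v":
                case "y":
                    tally.curves++;
                    break;
                case "re":
                    tally.rectangles++;
                    break;
                case "S": case "s":               // stroke, close & stroke
                case "f": case "F": case "f*":    // fill variants
                case "B": case "B*":              // fill & stroke
                case "b": case "b*":              // close, fill, & stroke
                    tally.paintedPaths++;
                    break;
                default:
                    break; // operands and operators we do not track
            }
        }
        return tally;
    }

    public static void main(String[] args) {
        // Two open paths (one stroked, one closed/filled/stroked) and one filled rectangle.
        String sample = "10 10 m 100 10 l 100 100 l S\n"
                      + "0 0 50 50 re f\n"
                      + "0 0 m 10 10 l 20 0 l b*\n";
        Tally t = count(sample);
        System.out.println("line segments: " + t.lineSegments);
        System.out.println("curves:        " + t.curves);
        System.out.println("rectangles:    " + t.rectangles);
        System.out.println("painted paths: " + t.paintedPaths);
    }
}
```

For the sample above this reports 4 line segments, 0 curves, 1 rectangle, and 3 painted paths. On a real document, counting "l" operators per page is probably closer to a "number of vector lines" report than counting painted paths, since a single stroked path can contain thousands of segments.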
