I need to extract table objects from PDF documents, preferably programmatically using Perl. I can cut and paste into Excel, but the table then requires quite a bit of manual editing once the data is imported.
I've done some searching, but so far most forums suggest that the available APIs are very primitive.
The best module I know of for dealing with PDFs in Perl is PDF::API2. However, without knowing more about the manipulation you need to do, it's hard to give further recommendations. Another possibility is to use Excel's built-in VB functionality, so that when you copy the tables into your Excel spreadsheet it fires off a macro that performs the formatting for you.
I think the best CPAN module for this would probably be CAM::PDF.
However, I've not used the module, so I cannot confirm it will (easily) do what you require, but it is a PDF manipulation library, and the module's author does answer questions about CAM::PDF here on SO.
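As a rough starting point, here is a minimal, untested-against-your-files sketch of how CAM::PDF's `getPageText` could feed a simple table heuristic. The helper names `split_columns` and `to_csv_row` are hypothetical, and the key assumption (that column gaps survive in the extracted text as runs of two or more spaces) may well not hold for a given PDF, since extracted text does not always preserve layout.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of extracted text into cells,
# treating runs of two or more whitespace characters as a column gap.
# This is a heuristic and an assumption about how the text comes out.
sub split_columns {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;          # trim leading/trailing whitespace
    return split /\s{2,}/, $line;
}

# Hypothetical helper: build one CSV row, quoting every cell and
# doubling any embedded double quotes.
sub to_csv_row {
    my @cells = @_;
    for my $cell (@cells) {
        $cell =~ s/"/""/g;
        $cell = qq{"$cell"};
    }
    return join ',', @cells;
}

# CAM::PDF is only loaded when a file name is given on the command line.
if (@ARGV) {
    require CAM::PDF;
    my $pdf = CAM::PDF->new($ARGV[0])
        or die "Cannot open $ARGV[0]\n";
    for my $page (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($page);
        next unless defined $text;
        for my $line (split /\n/, $text) {
            my @cells = split_columns($line);
            # Keep only lines that look like table rows (2+ cells).
            print to_csv_row(@cells), "\n" if @cells > 1;
        }
    }
}
```

Run as `perl extract.pl report.pdf > report.csv` (both file names are placeholders) and open the CSV in Excel. If the columns come out merged, the spacing assumption has failed for that document and a different splitting rule would be needed.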
Also see this previous question: How can I extract text from a PDF file in Perl?
/I3az/