Open Source Java Text Parsers

Developer https://www.devze.com 2023-03-15 04:30 Source: Internet
Is there a single Java text parser that can parse Microsoft Office (Windows) documents, OpenOffice documents, and PDFs? Or do I need to use something like Apache POI for Word documents and other libraries for OpenOffice and PDFs? If so, what are the best options for OpenOffice and PDFs?


Apache Tika:

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.

Not sure whether this qualifies as "single" for your purposes.
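To give a feel for the "detects" half of that description, here is a minimal sketch of format sniffing by leading magic bytes, using only the JDK. It is not Tika's code, and Tika's real detection is considerably more thorough; the class name `FormatSniffer` and the `Kind` enum are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Tika or any other library.
public class FormatSniffer {
    public enum Kind { PDF, OLE2_OFFICE, ZIP_CONTAINER, UNKNOWN }

    // PDF files begin with "%PDF-".
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};
    // ZIP local file header; both .docx/.xlsx (OOXML) and .odt/.ods (ODF) are ZIP archives.
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};
    // OLE2 compound file header used by legacy .doc/.xls/.ppt.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Classifies a file from its first few bytes.
    public static Kind sniff(byte[] header) {
        if (startsWith(header, PDF_MAGIC)) return Kind.PDF;
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2_OFFICE;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP_CONTAINER;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FormatSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            // Eight bytes are enough for the longest signature above.
            System.out.println(sniff(in.readNBytes(8)));
        }
    }
}
```

A `ZIP_CONTAINER` result still has to be opened to tell an OOXML file from an OpenDocument one, which is part of why a toolkit that handles detection and then hands off to the right parser is convenient.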


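If the task is reading PDF documents, iText is your best bet. For Microsoft Office documents, POI would be my solution; it can also read OpenOffice (LibreOffice) documents as long as they are saved in a Microsoft Office format, since POI does not parse native OpenDocument files.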
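On the OpenOffice part of the question: a native OpenDocument file (.odt) is a ZIP archive whose body text lives in `content.xml`, so a rough extraction can be sketched with the JDK alone. This is only an illustration under that assumption, not how POI, iText, or Tika work internally; the class name `OdfTextSketch` is hypothetical, and the sketch collects `text:p` paragraphs only (headings, which are `text:h` elements, and finer formatting are ignored).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library.
public class OdfTextSketch {
    // Namespace of the OpenDocument "text:" prefix.
    private static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Returns the text of every text:p element in content.xml, one per line.
    public static String extractText(InputStream odf) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations so untrusted files cannot pull in external entities.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                // Buffer the entry so the XML parser cannot close the ZIP stream underneath us.
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList paragraphs = doc.getElementsByTagNameNS(TEXT_NS, "p");
                StringBuilder out = new StringBuilder();
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    out.append(paragraphs.item(i).getTextContent()).append('\n');
                }
                return out.toString();
            }
        }
        throw new IOException("content.xml not found; is this an OpenDocument file?");
    }
}
```

Calling `OdfTextSketch.extractText(Files.newInputStream(path))` on an .odt file would return its paragraphs as plain text. For anything beyond a quick look, a dedicated library is the safer choice.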
