Splitting a pdf with pdfbox, but losing the font

Developer https://www.devze.com 2023-04-10 02:00 Source: Internet

I wrote some code in Java using the PDFBox API that splits a PDF document into its individual pages, looks through the pages for a specific string, and then makes a new PDF from the page with the string on it. My problem is that when the new page is saved, I lose my font. I just made a quick Word document to test it and the default font was Calibri, so when I run the program I get an error box that reads: "Cannot extract the embedded font..." So it replaces the font with some other default.

I have seen a lot of example code that shows how to change the font when you are adding text to a PDF, but nothing that sets the font for an existing PDF page.

If anyone is familiar with a way to do this (or can find documentation/examples), I would greatly appreciate it!

Edit: forgot to include some sample code

if (pageContent.indexOf(findThis) >= 0) {
    PDPage pageToRip = pages.get(i);
    // TODO: set the font of pageToRip here
    res.importPage(pageToRip); // res is the new document that will be saved
}

I don't know if that helps any, but I figured I'd include it.
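
For reference, the selection step in the snippet above can be sketched without PDFBox at all. This is only a minimal illustration, assuming the text of each page has already been extracted into a list of strings; the class name `PageSelector` and the method `pagesContaining` are hypothetical and not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds the pages whose extracted text contains a search string. */
final class PageSelector {
    private PageSelector() {}

    /**
     * Returns the zero-based indices of every page whose text contains findThis.
     * Assumption: pageTexts.get(i) holds the already-extracted text of page i.
     */
    static List<Integer> pagesContaining(List<String> pageTexts, String findThis) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            String pageContent = pageTexts.get(i);
            // Skip pages with no extracted text instead of failing on null.
            if (pageContent != null && pageContent.contains(findThis)) {
                matches.add(i);
            }
        }
        return matches;
    }
}
```

Each returned index would then be used the same way as `i` in the snippet: fetch the page with `pages.get(i)` and hand it to `res.importPage(...)`. Collecting the indices first keeps the text search separate from the document copying, which makes the search easy to check on its own.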

Also, this is what the change looks like if the PDF is written in Calibri and split:

[Image: Splitting a pdf with pdfbox, but losing the font]

Note: This might be a non-issue; it depends on the fonts used in the files that will need to be processed. I tried some fonts besides Calibri and they worked out fine.

From "How to extract fonts from a PDF":

You actually cannot extract a font from a PDF, not even if the font is fully embedded. There are two reasons why this is not feasible:

• Most fonts are copyrighted, making it illegal to use an extractor.

• When a font is embedded in a PDF, not all of the font data are included. Obviously the font outline data are included as well as the font width tables. Other information, such as data about ligatures, are irrelevant within the PDF so those data do not get enclosed in a PDF.

I am not aware of any font extraction tools but if you come across one, the above reasons should make it clear that these utilities are to be avoided.
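
The point about incomplete font data can be made a little more concrete. PDF producers often embed only a subset of a font's glyphs, and the PDF specification has such fonts carry a name that starts with a tag of six uppercase letters followed by a plus sign (for example `ABCDEF+Calibri`). The sketch below checks a font name for that tag. The class and method names are hypothetical, and how the name is read out of the PDF is left out and assumed to happen elsewhere.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for spotting subset-embedded fonts by their name. */
final class SubsetFontNames {
    // Six uppercase ASCII letters, a plus sign, then the underlying font name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+(.+)$");

    private SubsetFontNames() {}

    /** True if the name carries a subset tag such as "ABCDEF+Calibri". */
    static boolean isSubset(String baseFontName) {
        return baseFontName != null && SUBSET_TAG.matcher(baseFontName).matches();
    }

    /** Strips the subset tag if present; otherwise returns the name unchanged. */
    static String withoutSubsetTag(String baseFontName) {
        if (baseFontName == null) {
            return null;
        }
        Matcher m = SUBSET_TAG.matcher(baseFontName);
        return m.matches() ? m.group(1) : baseFontName;
    }
}
```

A name that passes `isSubset` suggests the file holds only the glyphs the original document used, which is one reason an embedded font is not a full replacement for the installed one.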