
Crop and extract text from PDF

Developer https://www.devze.com 2023-03-09 09:39 Source: Internet

I have cropped a PDF using the following command.

gswin32c.exe ^
-o cropped.pdf ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [64 418 348 803] /PAGE pdfmark" ^
-f original.pdf

The PDF is cropped as expected. I then used the following command to extract the text from the cropped PDF.

gswin32c.exe ^
-q ^
-sFONTPATH=c:/windows/fonts ^
-dNODISPLAY ^
-dSAFER ^
-dDELAYBIND ^
-dWRITESYSTEMDICT ^
-dSIMPLE ^
-f ps2ascii.ps ^
-dFirstPage=1 ^
-dLastPage=1 ^
cropped.pdf ^
-> c:\output.txt ^
-dQUIET 

The output contains the text of the whole original page, not just the text inside the cropped area.

Can someone help me extract only the text that lies inside the cropped area?

Thanks, Nazeer


The result you got is exactly what is to be expected.

  • Cropping a PDF page does NOT mean: cut off everything outside the cropped area and delete it.

  • Cropping means: display only what is inside the cropped area (and zoom to it), and hide everything outside it.

So when you convert such a page to text, you'll also get the hidden content back.
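To make this concrete, here is a minimal sketch that models a page as a list of positioned text fragments. The CropBox values are the ones from the question; the fragments and the helper names `inside_crop_box` and `visible_text` are made up for illustration. The point is that the CropBox is only a rectangle stored next to the content, so an extractor has to filter by position itself if it wants to honor the crop.

```python
# Hypothetical illustration: a page modeled as positioned text fragments.
# The CropBox values come from the question; the fragments are made up.
CROP_BOX = (64, 418, 348, 803)  # (left, bottom, right, top) in PDF points


def inside_crop_box(x, y, box=CROP_BOX):
    """Return True if the point (x, y) lies within the crop rectangle."""
    left, bottom, right, top = box
    return left <= x <= right and bottom <= y <= top


def visible_text(fragments, box=CROP_BOX):
    """Keep only the fragments whose origin falls inside the crop rectangle."""
    return [text for (x, y, text) in fragments if inside_crop_box(x, y, box)]


# Made-up fragments as (x, y, text). Cropping leaves all of them in the file.
fragments = [
    (72, 780, "inside the crop area"),
    (400, 780, "right of the crop area"),
    (72, 100, "below the crop area"),
]

print(visible_text(fragments))
```

A converter that ignores the CropBox behaves like reading `fragments` directly, without the filter, which is why the hidden text shows up in the output.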


You may have better luck if you try a different way of converting cropped.pdf to text:

Open it in Acrobat/Adobe Reader.

Click 'File --> Save as Text...'
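If Acrobat is not available, another possible route is a text extractor that accepts a region. The sketch below builds a command line for Poppler's `pdftotext`, a different tool from the one used in the question, using its `-x`, `-y`, `-W` and `-H` crop options. It rests on several assumptions: those options are measured from the top-left corner of the page at the default 72 dpi (so pixels equal PDF points), the page's MediaBox starts at 0 0, and the page is 842 points high (A4). The real height has to be read from the PDF. The helper name `pdftotext_region_args` is hypothetical.

```python
def pdftotext_region_args(crop_box, page_height, pdf="cropped.pdf", out="output.txt"):
    """Build a pdftotext command line for the region given by a PDF CropBox.

    crop_box is (left, bottom, right, top) in PDF points, measured from the
    bottom-left corner. pdftotext's crop options are assumed to be measured
    from the top-left corner, so the vertical coordinate is flipped.
    """
    left, bottom, right, top = crop_box
    return [
        "pdftotext",
        "-f", "1", "-l", "1",          # first page only, as in the question
        "-x", str(left),               # left edge of the region
        "-y", str(page_height - top),  # top edge, counted down from the page top
        "-W", str(right - left),       # region width
        "-H", str(top - bottom),       # region height
        pdf,
        out,
    ]


# CropBox from the question; page_height=842 is an assumption (A4).
args = pdftotext_region_args((64, 418, 348, 803), page_height=842)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where `pdftotext` is installed. Check the result against the visible page, since the coordinate assumptions above may not hold for every PDF.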
