Create an index from a PDF [duplicate]

Developer (https://www.devze.com), 2023-03-24 07:25. Source: the web
This question already has answers here. Closed 11 years ago.

Possible Duplicate:

How do I Index PDF files and search for keywords?

Create an index out of a PDF.


I think you can use the pyPdf Python library for this. This code prints the numbers of the pages that contain the required word:

from pyPdf import PdfFileReader

# Open in binary mode; file() is the Python 2 built-in
reader = PdfFileReader(file("YourPDFFile.pdf", "rb"))

numberOfPages = reader.getNumPages()

# getPage() is zero-based, so start at 0 to include the first page
i = 0
while i < numberOfPages:
    oPage = reader.getPage(i)
    text = oPage.extractText()
    if text.find('What are you looking for') != -1:
        print i + 1  # report 1-based page numbers
    i += 1

The same, but working with Python 3:

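# pyPdf itself targets Python 2; its fork PyPDF2 (before 3.0) keeps this API on Python 3
from PyPDF2 import PdfFileReader

reader = PdfFileReader(open("YourPDFFile.pdf", "rb"))

numberOfPages = reader.getNumPages()

# getPage() is zero-based, so start at 0 to include the first page
i = 0
while i < numberOfPages:
    oPage = reader.getPage(i)
    text = oPage.extractText()
    if text.find('What are you looking for') != -1:
        print(i + 1)  # report 1-based page numbers
    i += 1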
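
The code above searches for one phrase at a time. Since the question asks for an index, here is a rough sketch of how the per-page text could be turned into a word-to-pages lookup. The function name build_index and the sample page strings are made up for illustration; in practice the list would hold whatever extractText() returns for each page, and text extraction quality varies a lot between PDFs.

```python
import re
from collections import defaultdict


def build_index(page_texts):
    """Map each lowercased word to the sorted 1-based page numbers containing it."""
    index = defaultdict(set)
    for page_number, text in enumerate(page_texts, start=1):
        # \w+ is a simple tokenizer; real documents may need something smarter
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(pages) for word, pages in index.items()}


# Hypothetical page texts standing in for the extracted text of each page
pages = ["Intro to PDF files", "Searching PDF text", "Closing notes"]
index = build_index(pages)
print(index["pdf"])    # [1, 2]
print(index["notes"])  # [3]
```

Building the index once means that later lookups are plain dictionary accesses, so you do not have to re-read the PDF for every keyword.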