I want to convert a PDF to text. I tried this code at the Python command prompt, but it does not show any output. Can you please tell me where I'm going wrong? Thanks in advance.
import pyPdf

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

print getPDFContent("test.pdf").encode("ascii", "ignore")
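If your PDF contains only images (e.g. scanned pages), it has no text layer, so there is nothing for extractText() to return and the script prints an empty line. Getting text out of such a file requires OCR.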
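To tell an image-only PDF apart from a bug in the script, it can help to check whether any text came back at all. Below is a minimal sketch, not a definitive implementation. It assumes Python 3 and the pypdf package (the maintained successor to pyPdf, with PdfReader, pages and extract_text()). The names collapse_whitespace and get_pdf_content are hypothetical helpers, not part of any library.

```python
def collapse_whitespace(page_texts):
    """Join per-page strings and collapse every run of whitespace to one space."""
    # A page with no text may yield "" (or None in some versions); treat both as empty.
    joined = "\n".join(text or "" for text in page_texts)
    # Replace non-breaking spaces, as the original script does, then normalise.
    return " ".join(joined.replace("\xa0", " ").split())


def get_pdf_content(path):
    """Return the collapsed text of the PDF at `path`, or "" if no text was found."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here,
    # not at module level, so collapse_whitespace stays usable without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return collapse_whitespace(page.extract_text() for page in reader.pages)
```

Calling get_pdf_content("test.pdf") and testing the result with `if not content:` makes the empty case explicit: an empty string suggests that the file has no extractable text layer, rather than that the loop or the print is at fault.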