Extract text from PDF (I have link to PDF) in Ruby


Developer https://www.devze.com 2023-02-08 11:34 Source: web



I have a link like

      http://www.downloads.com/help.pdf

I want to download this and parse it to get the text content.

How do I go about this? I also plan to tag-ize (if there is such a word) the extracted text.

You can either use the pdf-reader gem (the example/text.rb example is simple and worked for me): https://github.com/yob/pdf-reader
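A minimal sketch of the pdf-reader route, assuming the gem is installed (`gem install pdf-reader`) and that the URL points directly at a PDF. The helper names `pages_to_text` and `pdf_text_from_url` are made up for illustration; `PDF::Reader.new`, `#pages` and `#text` come from the gem.

```ruby
require 'open-uri'

# Joins the text of each page object (anything that responds to #text).
def pages_to_text(pages)
  pages.map(&:text).join("\n")
end

# Downloads the PDF at `url` and returns its text content.
# Hypothetical helper; the gem is loaded here so the rest of the file
# still loads when pdf-reader is not installed.
def pdf_text_from_url(url)
  require 'pdf-reader'
  URI.open(url) do |io|
    # PDF::Reader accepts a file name or an IO-like object.
    reader = PDF::Reader.new(io)
    pages_to_text(reader.pages)
  end
end
```

Usage would look like `puts pdf_text_from_url('http://www.downloads.com/help.pdf')`. Scanned PDFs that contain only images will come back empty, since no OCR is involved.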

Or the command-line utility pdftotext.
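If you go the pdftotext route, a rough sketch of calling it from Ruby follows. It assumes the `pdftotext` binary (shipped with Poppler or Xpdf) is on your PATH and that the PDF has already been saved locally; the method names are hypothetical. Passing `-` as the output file makes pdftotext write to standard output.

```ruby
require 'open3'

# Builds the argument list for pdftotext.
# "-" as the output file sends the extracted text to stdout.
def pdftotext_command(pdf_path)
  ['pdftotext', pdf_path, '-']
end

# Runs pdftotext on a local file and returns the extracted text.
def pdftotext(pdf_path)
  stdout, stderr, status = Open3.capture3(*pdftotext_command(pdf_path))
  raise "pdftotext failed: #{stderr}" unless status.success?
  stdout
end
```

Passing the arguments as an array avoids going through a shell, so file names with spaces or quotes need no escaping.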

The Yomu gem will also be able to extract the text from a PDF (as well as other MIME types) for you.

require 'yomu'
Yomu.new(file_path).text

You can also take a look at DocRipper, a gem I maintain, that provides a Ruby interface for text extraction from a number of document formats including PDF, doc, docx and sketch.

DocRipper uses pdftotext under the hood and avoids Java dependencies.

require 'doc_ripper'

DocRipper::rip('/path/to/file.pdf') # => "Pdf text"

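You can read remote files using open-uri from the Ruby standard library:

require 'open-uri'
require 'doc_ripper'

# On Ruby 3.0+, use URI.open; Kernel#open no longer accepts URLs.
# Note: open-uri returns a StringIO (which has no #path) for responses under about 10 KB.
tmp_file = URI.open("some_uri")
DocRipper::rip(tmp_file.path)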
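The question also mentions turning the extracted text into tags. None of the gems above do that for you. As a naive sketch, assuming English text and a simple frequency-based notion of "tag", you could count words after dropping very short ones and a small stop-word list. `STOP_WORDS` and `top_tags` are illustrative names, not part of any library.

```ruby
require 'set'

# A tiny, illustrative stop-word list; a real one would be much longer.
STOP_WORDS = %w[the a an and or of to in is it this that for on with].to_set.freeze

# Returns the `limit` most frequent words in `text`, most frequent first.
# Ties are broken alphabetically so the result is deterministic.
def top_tags(text, limit: 5, min_length: 3)
  words = text.downcase.scan(/[a-z][a-z'-]*/)
  counts = words
           .reject { |w| w.length < min_length || STOP_WORDS.include?(w) }
           .tally
  counts.sort_by { |word, count| [-count, word] }.first(limit).map(&:first)
end
```

For example, `top_tags(extracted_text, limit: 10)` would give ten candidate tags. `Enumerable#tally` needs Ruby 2.7 or newer.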
