开发者

Problem with PdfTextExtractor in itext!

开发者 https://www.devze.com 2023-01-15 20:11 出处:网络
first excuse me for my bad english! I want to search in pdf document for a word like \"Hello\" . So I must read each page in pdf by PdfTextExtractor. I did it well. I can read all words in each page s

first excuse me for my bad english! I want to search in pdf document for a word like "Hello" . So I must read each page in pdf by PdfTextExtractor. I did it well. I can read all words in each page separately an save it in string buffer. but when i push this code in For loop ,(for example from page 1 to 7 for search in it) earlier page's words will remain in string buffer.I hop you unders开发者_运维百科tand my problem. Tanx all. this is my code :

        PdfReader reader2 = new PdfReader(openFileDialog1.FileName);
        int pagen = reader2.NumberOfPages;
        reader2.Close();
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
        for (int i = 1; i < pagen; i++)
        {
            textBox1.Text = "";
            PdfReader reader = new PdfReader(openFileDialog1.FileName);

            String  s = PdfTextExtractor.GetTextFromPage(reader, i, its);
            //MessageBox.Show(s.Length.ToString());
            //PdfTextArray h = new PdfTextArray(s);

            //
            // s = "";
            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            textBox1.Text = s;
            reader.Close();

}


SimpleTextExtractionStrategy doesn't let you reset it unfortunately, so you must move your "new SimpleTextExtractionStrategy()" inside the loop instead of reusing the same object.


There is another potential problem in the statement which controls your loop:

for (int i = 1; i < pagen; i++)

If pagen = 1, the loop is not executed at all. It should read:

for (int i = 1; i <= pagen; i++)


public string ReadPdfFile(object Filename,DataTable ReadLibray)
    {
     PdfReader reader2 = new PdfReader((string)Filename);
     string strText = string.Empty;

     for (int page = 1; page <= reader2.NumberOfPages; page++)
     {
         ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();       
         PdfReader reader = new PdfReader((string)Filename);  
         String  s = PdfTextExtractor.GetTextFromPage(reader, page, its);

         s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
         strText = strText + s;
         reader.Close(); 
      }
      return strText;
    }

This Code is very HelpFull to read PDf using itext

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号