Java Apache FileUtils readFileToString and writeStringToFile problems

I need to read a file (actually a .pdf) into a String and write it back to a file. Between those two steps I'll apply some patches to the string, but that is not important here. I've written the following JUnit test case:

    String f1String=FileUtils.readFileToString(f1);
    File temp=File.createTempFile("deleteme", "deleteme");
    FileUtils.writeStringToFile(temp, f1String);
    assertTrue(FileUtils.contentEquals(f1, temp));

This test converts a file to a string and writes it back. However, the test fails. I think it may be because of the encodings, but the FileUtils documentation doesn't give much detail about this. Can anyone help? Thanks!

Added for further understanding: why do I need this? I have very large PDFs on one machine that are replicated on another one. The first machine is in charge of creating those PDFs. Because the second machine has poor connectivity and the PDFs are large, I don't want to sync the whole files, only the changes. To create and apply patches I'm using Google's DiffMatchPatch library, which creates patches between two strings. So I need to load a PDF into a String, apply a generated patch, and put it back into a file.

A PDF is not a text file. Decoding (into Java characters) and re-encoding of binary files that are not encoded text is asymmetrical. For example, if the input bytestream is invalid for the current encoding, you can be assured that it won't re-encode correctly. In short - don't do that. Use readFileToByteArray and writeByteArrayToFile instead.
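To make the asymmetry concrete, here is a minimal sketch that uses only the JDK. It assumes the JVM's default encoding is UTF-8 (the real default depends on the platform), and the class and method names are made up for illustration. It pushes a few PDF-like bytes through a String and compares the result with the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Commons IO.
final class LossyDecodeDemo {

    // Decode the bytes as UTF-8 text and encode them again, which is what a
    // String round trip amounts to when the default encoding is UTF-8
    // (an assumption made for this sketch).
    static byte[] viaUtf8String(byte[] input) {
        String text = new String(input, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four bytes >= 0x80 that do not form valid UTF-8.
        byte[] pdfLike = {0x25, 0x50, 0x44, 0x46,
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        byte[] roundTripped = viaUtf8String(pdfLike);
        System.out.println("identical after String round trip: "
                + Arrays.equals(pdfLike, roundTripped));
        System.out.println("identical after byte[] copy: "
                + Arrays.equals(pdfLike, pdfLike.clone()));
    }
}
```

Each invalid byte sequence is decoded to a replacement character, so the original byte values cannot be recovered when the string is encoded again. A plain byte[] never goes through that step.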

Just a few thoughts:

There might actually be some BOM (byte order mark) bytes in one of the files that either get stripped when reading or added when writing. Is there a difference in file size? (If it is the BOM, the difference should be 2 or 3 bytes.)
The line breaks might not match, depending on which system the files were created on, i.e. one might have CR LF while the other has only LF or CR (1 byte difference per line break).
According to the JavaDoc, both methods should use the JVM's default encoding, which should be the same for both operations. However, try testing with an explicitly set encoding (the JVM's default encoding can be queried with System.getProperty("file.encoding")).
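The checks suggested above could be sketched roughly as follows, using only the JDK. The helper class and its method names are hypothetical, and the byte arrays in main stand in for the contents of the original and the rewritten file.

```java
import java.nio.charset.Charset;

// Hypothetical helper for comparing the original and the rewritten file contents.
final class RoundTripDiagnostics {

    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Index of the first differing byte, the shorter length if one array is a
    // prefix of the other, or -1 if both arrays are identical.
    static int firstMismatch(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());

        // Stand-ins for the bytes of the original and the rewritten file.
        byte[] original = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', '\r', '\n'};
        byte[] rewritten = {'a', '\n'};

        System.out.println("BOM in original: " + startsWithUtf8Bom(original));
        System.out.println("size difference: " + (original.length - rewritten.length));
        System.out.println("first mismatch at: " + firstMismatch(original, rewritten));
    }
}
```

The offset of the first mismatch usually narrows things down quickly: offset 0 points at a BOM, while a mismatch at the first line break or at the first byte >= 0x80 points at line endings or the encoding.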

Ed Staub's answer points out why my solution was not working, and he suggested using bytes instead of Strings. In my case I need a String, so the final working solution I've found is the following:

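    @Test
    public void testFileRWAsArray() throws IOException {
        byte[] bytes = FileUtils.readFileToByteArray(f1);
        // Map each byte to exactly one char; StringBuilder avoids the
        // quadratic cost of concatenating Strings in a loop.
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b);
        }
        String f1String = sb.toString();

        File temp = File.createTempFile("deleteme", "deleteme");
        // Narrow each char back to the byte it came from.
        byte[] newBytes = new byte[f1String.length()];
        for (int i = 0; i < f1String.length(); ++i) {
            newBytes[i] = (byte) f1String.charAt(i);
        }
        FileUtils.writeByteArrayToFile(temp, newBytes);
        assertTrue(FileUtils.contentEquals(f1, temp));
    }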

By casting each byte to a char and back, the conversion is symmetric: every byte maps to exactly one char, and that char maps back to the same byte. Thank you all!
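A possible alternative to the hand-written cast loops, offered as a sketch and not as the original poster's method: ISO-8859-1 maps every byte value to exactly one char, so the JDK's charset support gives the same symmetry in two calls. The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; uses only the JDK.
final class Latin1RoundTrip {

    // Every byte 0x00..0xFF becomes the char U+0000..U+00FF.
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Every char U+0000..U+00FF becomes the byte it came from.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Exercise all 256 possible byte values.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String text = toText(all);
        System.out.println("chars: " + text.length());
        System.out.println("round trip identical: " + Arrays.equals(all, toBytes(text)));
    }
}
```

Two differences from the cast version are worth keeping in mind. The cast sign-extends, so bytes of 0x80 and above become chars in the range U+FF80..U+FFFF, whereas ISO-8859-1 yields U+0080..U+00FF; the two string forms are therefore not interchangeable. Also, if a patch introduces a char above U+00FF, getBytes with ISO-8859-1 replaces it with '?', so the symmetry only holds for strings produced by toText.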

Try this code...

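    // Uses org.apache.commons.io.FileUtils and org.apache.commons.codec.binary.Base64.
    public static String fetchBase64binaryEncodedString(String path) {
        File inboundDoc = new File(path);
        byte[] pdfData;
        try {
            pdfData = FileUtils.readFileToByteArray(inboundDoc);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        byte[] encodedPdfData = Base64.encodeBase64(pdfData);
        // Base64 output is plain ASCII, so name the charset instead of
        // relying on the platform default.
        return new String(encodedPdfData, StandardCharsets.US_ASCII);
    }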

    // How to decode it
    public void testConversionPDFtoBase64() throws IOException {
        String path = "C:/Documents and Settings/kantab/Desktop/GTR_SDR/MSDOC.pdf";
        File origFile = new File(path);
        String encodedString = CreditOneMLParserUtil.fetchBase64binaryEncodedString(path);

        // Now decode it. Keep the result as bytes; turning it into a String
        // with new String(decodeData) would corrupt the binary content again.
        byte[] decodeData = Base64.decodeBase64(encodedString.getBytes(StandardCharsets.US_ASCII));

        // Or give the actual path to a PDF file.
        File decodedfile = File.createTempFile("DECODED", ".pdf");
        FileUtils.writeByteArrayToFile(decodedfile, decodeData);
        Assert.assertTrue(FileUtils.contentEquals(origFile, decodedfile));
    }
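On Java 8 or later, the same Base64 round trip can probably be done without Commons Codec by using the JDK's java.util.Base64. This is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper; uses only the JDK (Java 8+).
final class JdkBase64RoundTrip {

    // Binary data to a Base64 String that contains only ASCII characters.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Base64 String back to the original binary data.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // "%PDF" followed by some bytes >= 0x80 as a stand-in for file content.
        byte[] pdfLike = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3};
        String encoded = encode(pdfLike);
        System.out.println("encoded: " + encoded);
        System.out.println("round trip identical: "
                + Arrays.equals(pdfLike, decode(encoded)));
    }
}
```

Base64 text is about a third larger than the binary data it represents, which may matter when the strings are large or are fed to a diff library.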