Quickest way to read text-file line by line in Java

Developer (开发者) https://www.devze.com 2023-02-28 19:22 Source: the web
For log processing my application needs to read text files line by line. First I used the function readLine() of BufferedReader but I read on the internet that BufferedReader is slow when reading files.

Afterwards I tried to use FileInputStream together with a FileChannel and MappedByteBuffer, but in this case there's no function similar to readLine(), so I search my text for a line break and process it:

    try {
        FileInputStream f = new FileInputStream(file);
        FileChannel ch = f.getChannel();
        MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
        byte[] bytes = new byte[1024];  // holds the bytes of the current line
        int i = 0;                      // number of bytes stored for the current line
        while (mb.hasRemaining()) {
            byte get = mb.get();
            if (get == '\n') {
                // End of line: run the automaton on it, then clear the buffer.
                if (ra.run(new String(bytes)))
                    cnt++;
                for (int j = 0; j <= i; j++)
                    bytes[j] = 0;
                i = 0;
            } else {
                bytes[i++] = get;
            }
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }

I know this is probably not a good way to implement it, but when I just read the text file in bytes it is 3 times faster than using BufferedReader. However, calling new String(bytes) creates a new String and makes the program even slower than when using a BufferedReader.

So I wanted to ask what is the fastest way to read a text-file line by line? Some say BufferedReader is the only solution to this problem.

P.S.: ra is an instance of RunAutomaton from the dk.brics.Automaton library.


I very much doubt that BufferedReader is going to cause a significant overhead. Adding your own code is likely to be at least as inefficient, and quite possibly wrong too.

For example, in the code that you've given you're calling new String(bytes) which is always going to create a string from 1024 bytes, using the platform default encoding... not a good idea. Sure, you clear the array afterwards, but your strings are still going to contain a bunch of '\0' characters - which means a lot of wasted space, apart from anything else. You should at least restrict the portion of the byte array the string is being created from (which also means you don't need to clear the array afterwards).
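The suggestion above (decode only the bytes that belong to the line, with an explicit charset) could look roughly like the following. This is a minimal sketch, not the asker's code: the class name BufferLines, the method split, and the choice to collect lines into a List are assumptions made for illustration, and the caller has to know the file's actual encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, named for this example only.
final class BufferLines {
    /** Splits the remaining bytes of buf on '\n' and decodes each line with cs. */
    static List<String> split(ByteBuffer buf, Charset cs) {
        List<String> lines = new ArrayList<>();
        byte[] line = new byte[1024];
        int len = 0; // number of valid bytes in 'line'
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (b == '\n') {
                // Decode only the first 'len' bytes, so the array never needs clearing.
                lines.add(new String(line, 0, len, cs));
                len = 0;
            } else {
                if (len == line.length) {
                    // Grow the buffer instead of overflowing on lines longer than 1024 bytes.
                    line = Arrays.copyOf(line, len * 2);
                }
                line[len++] = b;
            }
        }
        if (len > 0) {
            // Last line without a trailing line break.
            lines.add(new String(line, 0, len, cs));
        }
        return lines;
    }
}
```

A MappedByteBuffer is a ByteBuffer, so the result of ch.map(...) can be passed in directly. The sketch does not strip a trailing '\r', so files with Windows line endings would need one extra check.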

Have you actually tried using BufferedReader and found it to be too slow? You should usually write the simplest code which will meet your goals first, and then check whether it's fast enough... especially if your only reason for not doing so is an unspecified resource you "read on the internet". Do you want me to find hundreds of examples of people spouting incorrect performance suggestions? :)

As an alternative, you might want to look at Guava's overload of Files.readLines() which takes a LineProcessor.


Using plain BufferedReader I got 100+ MB/s. It is highly likely that the speed at which you can read the data from disk is your bottleneck, so how you do the reading won't make much difference.

BufferedReader is not the only solution, but it is fast enough for 99% of use cases, so why make things more complicated than they need to be?
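To check a claim like this on your own files, a rough throughput measurement with plain BufferedReader might look like the sketch below. The class and method names (ReadBaseline, countLines, megabytesPerSecond) are made up for this example, and a single timed run is only a ballpark figure: a second run over the same file is usually served from the operating system's cache and will look faster than the disk really is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, named for this example only.
final class ReadBaseline {
    /** Reads every line from reader and returns how many lines were seen. */
    static long countLines(BufferedReader reader) throws IOException {
        long lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }

    /** Reads the whole file line by line once and returns the observed MB/s. */
    static double megabytesPerSecond(Path file, Charset cs) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader reader = Files.newBufferedReader(file, cs)) {
            countLines(reader);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        double megabytes = Files.size(file) / (1024.0 * 1024.0);
        return megabytes / seconds;
    }
}
```

Calling ReadBaseline.megabytesPerSecond(path, StandardCharsets.UTF_8) on a representative log file, before and after adding the per-line processing, gives a first idea of whether reading or processing dominates.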


Are frameworks an alternative?

I don't know about the performance, but Apache Commons IO defines very easy-to-use helper classes for such cases (see the IOUtils class):

http://commons.apache.org/io/

http://commons.apache.org/io/api-release/index.html


According to this SO post, you might also want to give the Scanner class a shot.


I have a very simple loop that reads about 2000 lines (50k bytes) from a file on the SD card using BufferedReader, and it reads them all in about 100 ms in debug mode on a Galaxy Tab 2. Not too bad. Then I put a Scanner in the loop and the time went through the roof (tens of seconds), plus lots of GC_CONCURRENT messages:

    Scanner scanner = new Scanner(line);
    int eventType = scanner.nextInt(16);

So at least in my case it's the Scanner that's the problem. I guess I need to scan the ints another way, but I have no idea why it could be so slow.
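One way to read the hex value without creating a Scanner for every line is to cut out the first token by hand and pass it to Integer.parseInt with radix 16. The sketch below assumes the event type is the first whitespace-delimited token on the line; the actual log format is not shown in the post, and the class name EventLine is made up for this example. A likely reason for the slowdown is that each new Scanner sets up its own buffer and regular-expression state, which adds up when it happens once per line.

```java
// Hypothetical helper, named for this example only.
final class EventLine {
    /**
     * Parses the leading hexadecimal token of a line.
     * Throws NumberFormatException if the line is blank or the token is not hex.
     */
    static int eventType(String line) {
        String trimmed = line.trim();
        int end = 0;
        // Advance to the first whitespace character, or to the end of the line.
        while (end < trimmed.length() && !Character.isWhitespace(trimmed.charAt(end))) {
            end++;
        }
        return Integer.parseInt(trimmed.substring(0, end), 16);
    }
}
```

Inside the BufferedReader loop this would replace the two Scanner lines with a single call such as int eventType = EventLine.eventType(line);.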
