
Best practices for Java I/O when creating a large CSV file

Developer https://www.devze.com 2023-04-09 07:54 Source: Internet
Hi, I need to create a few large CSV files; the number of entries could be on the order of 2 million, so I was wondering how to do it efficiently. A few questions come to mind:


  1. When we write a file via a BufferedWriter, how often should we flush? I think BufferedWriter maintains its own buffer and flushes it automatically once the buffer is full; if that is the case, why is there a flush method at all?

  2. The file I am going to create will be big. When I start writing, will the data be committed to disk automatically (before I call writer.close()), or does the whole file remain in main memory until I close the writer?

    • By committing I mean that no part of the already written portion is in main memory, i.e. it is ready for GC.


  1. The BufferedWriter implementation should do a pretty good job of flushing when appropriate. In your case, you should never need to call flush.

    As for why there is a flush method, this is because sometimes you will want output written immediately rather than waiting for BufferedWriter's buffer to become full. BufferedWriter isn't just for files; it can also be used for writing to the console or a socket. For example, you may want to send some data over a network but not quite enough data to cause BufferedWriter to automatically flush. In order to send this data immediately, you would use flush.

  2. All the data you have written to the BufferedWriter will not remain in memory all at the same time. It is written out in pieces (flushed) as BufferedWriter's buffer fills up. Once you call close at the end, BufferedWriter will do one more final flush for everything remaining in its buffer that it hasn't already written to disk and close the file.
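A minimal sketch of the two points above, assuming Java 8 or later and a made-up two-column row layout. The class name CsvWriterSketch, the method writeCsv and the file name large.csv are hypothetical. The writer is opened in try-with-resources, so close() performs the final flush and no explicit flush() call appears anywhere.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CsvWriterSketch {

    // Writes a header plus `rows` data lines; the columns are invented for illustration.
    static void writeCsv(Path target, int rows) throws IOException {
        // Files.newBufferedWriter returns a BufferedWriter with the default buffer size.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("id,value");
            out.newLine();
            for (int i = 0; i < rows; i++) {
                out.write(Integer.toString(i));
                out.write(',');
                out.write("row-" + i);
                out.newLine();
            }
        } // close() flushes whatever is still in the buffer, then closes the file
    }

    public static void main(String[] args) throws IOException {
        writeCsv(Paths.get("large.csv"), 2_000_000);
    }
}
```

Only one buffer's worth of characters is held by the writer at any moment, regardless of how many rows are written.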


If you wrap your writer in a BufferedWriter, you can specify how many characters are held in memory before they are written through to the underlying writer. (If you don't specify a size, a default is used. It has long been 8192 characters, although the documentation only promises that it is large enough for most purposes.)

If you use a PrintWriter, it flushes after each println only if it was created with autoFlush set to true; otherwise it does not flush on its own.

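Other writers, such as FileWriter and OutputStreamWriter, pass every write call straight on to the underlying stream and invoke the character encoder each time. With many small writes that usually makes for poor performance. That's why file writers should generally be wrapped in a BufferedWriter.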
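To make the buffering behaviour concrete, here is a small sketch in which a ByteArrayOutputStream stands in for the file, so the number of bytes that have actually left the writers can be inspected. The class and method names are hypothetical, and the 64K buffer size is an arbitrary choice for illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

final class PrintWriterFlushSketch {

    // Returns how many bytes reached the sink after one println,
    // before any explicit flush() or close().
    static int bytesAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file
        BufferedWriter buffered = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8),
                64 * 1024); // explicit buffer size, in chars
        PrintWriter out = new PrintWriter(buffered, autoFlush);
        out.println("a,b,c");
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=false: " + bytesAfterPrintln(false) + " bytes");
        System.out.println("autoFlush=true:  " + bytesAfterPrintln(true) + " bytes");
    }
}
```

With autoFlush off, the line stays in the BufferedWriter until the buffer fills or the writer is flushed or closed. With autoFlush on, every println pushes the line through.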


My inclination would be to work in segments, flushing to disk after every 1k or 2k lines. With that much data, it would seem to be pushing a memory limit. Since this operation is likely to be slow already, err on the safe side and write to disk often.
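A sketch of that segmented approach, assuming the rows arrive as already formatted strings. The names SegmentedCsvWriter, writeInSegments and flushEvery are hypothetical. Note that flush() only hands the buffered characters to the operating system; the writer's own buffer has a fixed size either way.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SegmentedCsvWriter {

    // Writes each line and calls flush() after every `flushEvery` lines.
    static void writeInSegments(Path target, Iterable<String> lines, int flushEvery)
            throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            int sinceFlush = 0;
            for (String line : lines) {
                out.write(line);
                out.newLine();
                if (++sinceFlush == flushEvery) {
                    out.flush();   // push this segment out now
                    sinceFlush = 0;
                }
            }
        } // close() flushes the last, possibly partial, segment
    }
}
```

A call such as writeInSegments(path, rows, 2000) would match the "every 2k lines" suggestion.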

That's my $0.02 anyways :)


BufferedWriter uses a fixed-size buffer, and will flush automatically when the buffer gets full. Hence any big file will be written in chunks.

The flush method exists because sometimes you might wish to push data out before the buffer is full. A typical example is a BufferedWriter wrapping a socket's output stream. If you do:

writer.write(request);
reader.read(response);

your thread is likely to block indefinitely, because the request will not be sent until the buffer gets full. You'd therefore do:

writer.write(request);
writer.flush(); // make sure the request is sent now
reader.read(response);

instead.
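The effect can be observed without a network. In this sketch a ByteArrayOutputStream stands in for the socket's output stream; the class name FlushDemo and the "PING" request are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

final class FlushDemo {

    // Returns {bytes on the wire before flush, bytes on the wire after flush}.
    // Assumes `request` is shorter than the BufferedWriter's buffer.
    static int[] sizesBeforeAndAfterFlush(String request) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stand-in for the socket
        BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(wire, StandardCharsets.US_ASCII));
        writer.write(request);
        int before = wire.size(); // request is still sitting in the buffer
        writer.flush();           // make sure the request is sent now
        int after = wire.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesBeforeAndAfterFlush("PING\n");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Until flush() is called nothing has reached the stand-in stream, which is why a peer waiting for the request would never see it.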
