
Why character streams?

开发者 (Developer) https://www.devze.com 2023-02-18 03:35 Source: Internet

I understand that Java character streams wrap byte streams such that the underlying byte stream is interpreted as per the system default or an otherwise specifically defined character set.

My system's default charset is UTF-8.

If I use a FileReader to read in a text file, everything looks normal, as the default charset is used to interpret the bytes from the underlying byte stream. If I explicitly create an InputStreamReader that reads the UTF-8 encoded text file as UTF-16, everything obviously looks strange. If I use a byte stream like FileInputStream and copy its output to System.out, everything looks fine.

So, my questions are:

  • Why is it useful to use a character stream?

  • Why would I use a character stream instead of directly using a byte stream?

  • When is it useful to define a specific char-set?


Code that deals with strings should only "think" in terms of text - for example, when reading an input source line by line, you don't want to care about the nature of that source.

However, storage is usually byte-oriented - so you need to create a conversion between the byte-oriented view of a source (encapsulated by InputStream) and the character-oriented view of a source (encapsulated by Reader).

So a method which (say) counts the lines of text in an input source should take a Reader parameter. If you want to count the lines of text in two files, one of which is encoded in UTF-8 and one of which is encoded in UTF-16, you'd create an InputStreamReader around a FileInputStream for each file, specifying the appropriate encoding each time.
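A minimal sketch of that line-counting idea, in case it helps. The class name LineCounter, the method countLines and the temporary sample files are assumptions made up for illustration; they are not from any library.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineCounter {
    // Counts lines of text. It only sees characters, so it does not need to
    // know where they came from or how they were encoded.
    // The caller owns the Reader and is responsible for closing it.
    static int countLines(Reader source) throws IOException {
        BufferedReader lines = new BufferedReader(source);
        int count = 0;
        while (lines.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample files, created here so the sketch is self-contained.
        String text = "first\nsecond\nthird\n";
        Path utf8File = Files.createTempFile("sample-utf8", ".txt");
        Path utf16File = Files.createTempFile("sample-utf16", ".txt");
        Files.write(utf8File, text.getBytes(StandardCharsets.UTF_8));
        Files.write(utf16File, text.getBytes(StandardCharsets.UTF_16));

        // The encoding is chosen where the bytes are turned into characters,
        // once per file; countLines itself is identical for both.
        try (Reader utf8 = new InputStreamReader(
                     new FileInputStream(utf8File.toFile()), StandardCharsets.UTF_8);
             Reader utf16 = new InputStreamReader(
                     new FileInputStream(utf16File.toFile()), StandardCharsets.UTF_16)) {
            System.out.println(countLines(utf8));  // prints 3
            System.out.println(countLines(utf16)); // prints 3
        } finally {
            Files.deleteIfExists(utf8File);
            Files.deleteIfExists(utf16File);
        }
    }
}
```

Because countLines takes a Reader, the same method also works on a StringReader or a network stream wrapped in an InputStreamReader.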

(Personally I would avoid FileReader completely - on Java versions before 11 it doesn't let you specify an encoding, which makes it useless IMO.)


An InputStream reads bytes, while a Reader reads characters. Because the mapping from bytes to characters depends on the encoding, you should specify the character set (or encoding) when you create an InputStreamReader; if you don't, the platform default character set is used.
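To illustrate, here is a small sketch that decodes the same bytes twice with different charsets. CharsetDemo and readAll are hypothetical names chosen for this example, and a ByteArrayInputStream stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    // Returns every character the Reader produces when it decodes `in` with `charset`.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is "hello" with an e-acute, written as an escape so the
        // source file's own encoding does not matter.
        byte[] utf8Bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // Same bytes, two interpretations.
        String asUtf8 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String asUtf16 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_16);

        System.out.println(asUtf8.equals("h\u00e9llo"));  // prints true
        System.out.println(asUtf16.equals("h\u00e9llo")); // prints false
    }
}
```

The bytes never change; only the charset passed to InputStreamReader decides which characters come out.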


When you are reading/writing text which could contain characters with values > 127 (that is, outside ASCII), use a char stream. When you are reading/writing binary data, use a byte stream.

You can read text as binary if you wish, but unless you make a lot of assumptions it rarely gains you much.
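As a rough illustration of the kind of assumption involved, the sketch below treats each byte as one character, which only holds for ASCII. ByteVsChar and bytesAsChars are hypothetical names used just for this example.

```java
import java.nio.charset.StandardCharsets;

final class ByteVsChar {
    // Naive "text as binary": assumes one byte is one character.
    static String bytesAsChars(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        for (byte b : bytes) {
            text.append((char) (b & 0xFF)); // mask to avoid negative byte values
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String ascii = "plain";
        String accented = "caf\u00e9"; // the last character is U+00E9, above 127

        byte[] asciiBytes = ascii.getBytes(StandardCharsets.UTF_8);
        byte[] accentedBytes = accented.getBytes(StandardCharsets.UTF_8);

        // ASCII text survives the byte-per-character assumption.
        System.out.println(bytesAsChars(asciiBytes).equals(ascii)); // prints true

        // The accented character takes two bytes in UTF-8, so 4 characters
        // become 5 bytes and the naive conversion no longer matches.
        System.out.println(accentedBytes.length);                         // prints 5
        System.out.println(bytesAsChars(accentedBytes).equals(accented)); // prints false
    }
}
```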
