
What is the meaning of Kanatype Sensitive (KS) and Width Sensitive?

开发者 https://www.devze.com 2023-04-06 09:47 Source: web

When creating a new database I had to set the collation type or accept the default... fine.

But I need to know what Kanatype Sensitive (KS) and Width Sensitive mean. I know that, for example, case sensitive means that uppercase and lowercase letters are treated as different. What about Kanatype Sensitive and Width Sensitive?


Both have to do with sorting, and typically you would not select these two options. Here is a description, courtesy of Microsoft.

Kanatype Sensitive

Distinguishes between the two types of Japanese kana characters: Hiragana and Katakana.

If this option is not selected, SQL Server considers Hiragana and Katakana characters to be equal for sorting purposes.

Width Sensitive

Distinguishes between a single-byte character and the same character when represented as a double-byte character.

If this option is not selected, SQL Server considers the single-byte and double-byte representation of the same character to be identical for sorting purposes.
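To make the width distinction concrete, here is a minimal Python sketch. It is not how SQL Server implements collations; it only assumes that Unicode NFKC normalization is a reasonable stand-in for "width insensitive", since NFKC maps full-width Latin letters and half-width katakana to their ordinary forms (it also folds other compatibility characters, so it is an approximation). The helper name `fold_width` is made up for this illustration.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Approximate width-insensitivity by applying NFKC normalization.

    Hypothetical helper for illustration only; NFKC also folds other
    compatibility characters, so this is not an exact model of a collation.
    """
    return unicodedata.normalize("NFKC", text)


fullwidth_latin = "ＳＱＬ"  # full-width letters U+FF33 U+FF31 U+FF2C
halfwidth_kana = "ｶﾅ"      # half-width katakana U+FF76 U+FF85

# Width-sensitive comparison: the code points differ, so the strings differ.
print(fullwidth_latin == "SQL")              # False

# Width-insensitive comparison: compare after folding both forms.
print(fold_width(fullwidth_latin) == "SQL")  # True
print(fold_width(halfwidth_kana) == "カナ")   # True
```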


TL;DR:

Kanatype insensitivity makes sorting Japanese text more intuitive and should generally be used unless you have a reason not to.


FULL EXPLANATION:

In general, if you're storing any Japanese text that needs to be sorted, you probably want to go with kanatype insensitive. Why? Because it makes the sort order more intuitive for Japanese.

In English, since we have only one writing system, it's easy to sort things algorithmically. We simply order the characters by their character codes (already in alphabetical order) and we're done. In Japanese, though, because there are multiple ways to write out equivalent sounds, sorting can get a bit tricky. Hiragana and Katakana live in separate Unicode blocks, so when we sort with kanatype sensitivity, we end up with results that aren't completely intuitive.

Imagine you had a list of names that you wanted to sort:

{ "ピカチュウ","さとし","マリオ","まちだ","はるか" }

The romanized equivalent to the list is:

{ "Pikachu","Satoshi","Mario","Machida","Haruka" }

When sorted kanatype sensitive, you would get the following result:

{ "さとし","はるか","まちだ","ピカチュウ","マリオ" }

{ "Satoshi","Haruka","Machida","Pikachu","Mario" }

When sorted kanatype insensitive, you would get this result instead:

{ "さとし","はるか","ピカチュウ","まちだ","マリオ" }

{ "Satoshi","Haruka","Pikachu","Machida","Mario" }

To Japanese speakers, the second sort is a lot more intuitive, as the results are actually sorted phonetically instead of based on character sets. "まちだ" and "マリオ" both start with the same phonetic sound, but because one uses hiragana "ma" and the other uses katakana "ma", they are separated when kanatype sensitivity is enabled. With kanatype insensitivity, the list can be properly sorted so that the two words appear next to each other on the list despite their writing system differences.
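The two sort orders above can be reproduced with a short Python sketch. It is only an approximation of what a collation does: the kana-sensitive sort is modeled as a plain code-point sort, and the kana-insensitive sort as a sort on a key in which katakana is folded to hiragana. The helper `to_hiragana` is hypothetical and relies on the fact that katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana counterparts.

```python
def to_hiragana(text: str) -> str:
    """Fold katakana (U+30A1..U+30F6) to hiragana by shifting code points.

    Hypothetical helper for illustration; other characters pass through.
    """
    folded = []
    for ch in text:
        cp = ord(ch)
        if 0x30A1 <= cp <= 0x30F6:
            folded.append(chr(cp - 0x60))
        else:
            folded.append(ch)
    return "".join(folded)


names = ["ピカチュウ", "さとし", "マリオ", "まちだ", "はるか"]

# Kana-sensitive (modeled as code-point order): hiragana sorts before katakana.
kana_sensitive = sorted(names)

# Kana-insensitive: compare on the folded key, keep the original spelling.
kana_insensitive = sorted(names, key=to_hiragana)

print(kana_sensitive)    # ['さとし', 'はるか', 'まちだ', 'ピカチュウ', 'マリオ']
print(kana_insensitive)  # ['さとし', 'はるか', 'ピカチュウ', 'まちだ', 'マリオ']
```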

A good analogy in English would be case sensitivity. Imagine you wanted to sort a list of words for a dictionary, some of them proper nouns and others not:

{"New York","new","jet","Japan","squirm","SQL"}

If we ignored the fact that uppercase and lowercase letters represent the same letter and just sort based on character code, we would get something like this:

{"Japan", "New York", "SQL", "jet", "new", "squirm"}

A dictionary sorted like this would hardly be useful, especially if we wanted to look up a word without knowing whether it started with an uppercase or lowercase letter. We'd have to check the first part of the dictionary with all the proper nouns before checking the last part with all other words.

If we ran a case-insensitive sort that treats "A" and "a" as the same letter despite their separate character codes, we would get a result that is much more intuitive:

{"Japan","jet","new","New York","SQL","squirm"}
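The same comparison can be tried in Python as a rough sketch: the default `sorted` compares code points (so it behaves like a case-sensitive binary sort), while `key=str.casefold` compares case-folded keys. This only illustrates the idea; a database collation has more rules than this.

```python
words = ["New York", "new", "jet", "Japan", "squirm", "SQL"]

# Code-point order: every uppercase letter sorts before every lowercase one.
case_sensitive = sorted(words)

# Case-insensitive order: compare case-folded keys, keep original spelling.
case_insensitive = sorted(words, key=str.casefold)

print(case_sensitive)    # ['Japan', 'New York', 'SQL', 'jet', 'new', 'squirm']
print(case_insensitive)  # ['Japan', 'jet', 'new', 'New York', 'SQL', 'squirm']
```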

So in general, unless you have a specific reason not to, you should disable kanatype sensitivity. A phonebook lookup, for example, should be kanatype insensitive.

Note that Japanese also has an additional character type, kanji, that you would need to work with. Kanji is much harder to sort, as there are almost always multiple ways to read each kanji and there is no real "alphabetical" order for kanji. Most forms intended for Japanese people therefore have two fields for names: the user's name as it is normally written, and the user's name written entirely in katakana. Not only does this tell people how to pronounce a name that might be ambiguous when written solely in kanji, it also lets software sort by the unambiguous katakana-only field, which makes the kana type irrelevant to the sort order.
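The two-field approach for names can be sketched in Python as follows. The `Person` class, its field names, and the sample names are all hypothetical and chosen only for illustration; the point is that sorting on a katakana-only reading field sidesteps both the kanji problem and the hiragana/katakana split.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str     # name as normally written (may contain kanji)
    reading: str  # the same name written entirely in katakana


# Hypothetical sample data for illustration.
people = [
    Person(name="田中", reading="タナカ"),
    Person(name="鈴木", reading="スズキ"),
    Person(name="佐藤", reading="サトウ"),
]

# Sort on the katakana-only field; every key uses one script, so a plain
# code-point comparison already gives a phonetic (gojuon-like) order here.
by_reading = sorted(people, key=lambda p: p.reading)

print([p.name for p in by_reading])  # ['佐藤', '鈴木', '田中']
```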

For more information, I definitely recommend checking out this excellent article, which explains the issues with sorting in Japanese much better than I can.

Reference: https://japanese.stackexchange.com/questions/29612/what-do-you-need-kanatype-sensitivity-for
