My understanding of code units and code points:
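1. In UTF-16, a code point may consist of one or two code units.
2. In my test program, "我" also takes only one code unit, so the string "我 " ("我" followed by a space) has as many code points as code units. This is because "我" (U+6211) lies in the Basic Multilingual Plane; only code points above U+FFFF need two code units (a surrogate pair).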
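Point 1 can be checked directly. The following is a minimal sketch, assuming U+20000 (the first character of CJK Unified Ideographs Extension B) as an example of a Chinese character outside the Basic Multilingual Plane; the class name CodePointDemo and the helper counts are hypothetical. The characters are written as escapes or built with Character.toChars so the result does not depend on the source file's encoding.

```java
public class CodePointDemo {
    // Returns {number of UTF-16 code units, number of code points} for s.
    static int[] counts(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        String bmp = "\u6211";                                 // U+6211, inside the BMP
        String supp = new String(Character.toChars(0x20000));  // U+20000, needs a surrogate pair
        for (String s : new String[] { bmp, supp }) {
            int[] c = counts(s);
            System.out.printf("U+%04X: %d code unit(s), %d code point(s)%n",
                    s.codePointAt(0), c[0], c[1]);
        }
    }
}
```

Run as written, it should report one code unit for U+6211 and two code units (but still one code point) for U+20000.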
Below are some notes on Unicode support for Chinese, Korean and Japanese that I found on the official Unicode website:
Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?
A: There is a lot of misinformation floating around about the support of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X 0221, or JIS X 0213, for example, and many more. This is true no matter which encoding form of Unicode is used: UTF-8, UTF-16, or UTF-32.
Unicode supports over 70,000 CJK characters right now, and work is underway to encode further additions. The International Standard ISO/IEC 10646 and the Unicode Standard are completely synchronized in repertoire and content. And that means that Unicode has the same repertoire as GB 18030, since that also is synchronized with ISO 10646 — although with a different ordering and byte format.
Does that mean every encoding form (UTF-8, UTF-16, or UTF-32) fully supports Chinese?
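One way to check the quoted claim is to encode the same characters in all three forms and decode them again. This is a sketch only; the class name EncodingFormsDemo is hypothetical, and it assumes the JVM provides a charset named "UTF-32BE" (OpenJDK does, although it is not one of the StandardCharsets constants). The big-endian variants are used so that no byte order mark is added to the byte counts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingFormsDemo {
    // UTF-32BE is looked up by name (assumption: available, as on OpenJDK).
    static final Charset[] FORMS = {
        StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, Charset.forName("UTF-32BE")
    };

    // Encodes s with cs, decodes the bytes again, and reports whether nothing was lost.
    static boolean roundTrips(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    public static void main(String[] args) {
        String[] samples = { "\u6211", new String(Character.toChars(0x20000)) };
        for (String s : samples) {
            for (Charset cs : FORMS) {
                System.out.printf("U+%04X in %-8s: %d bytes, round trip ok = %b%n",
                        s.codePointAt(0), cs.name(), s.getBytes(cs).length, roundTrips(s, cs));
            }
        }
    }
}
```

The forms differ only in size: U+6211 takes 3, 2 and 4 bytes in UTF-8, UTF-16 and UTF-32, while U+20000 takes 4 bytes in each, and every round trip succeeds.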
My test program is as follows:
public class test0 {
    public static void main(String[] args) {
        String a = "我 ";  // "我" followed by a space
        // String.length() counts UTF-16 code units (chars), not characters
        int cuCount = a.length();
        System.out.println("the number of code units required for string \"我 \" in the UTF-16 encoding is " + cuCount);
        // codePointCount counts Unicode code points in the range [0, a.length())
        int cpCount = a.codePointCount(0, a.length());
        System.out.println("the number of code points is " + cpCount);
        // charAt returns a single code unit; here it is the trailing space
        System.out.println("the end of string \"我 \" is " + a.charAt(a.length() - 1));
    }
}
The output is:
the number of code units required for string "我 " in the UTF-16 encoding is 2
the number of code points is 2
the end of string "我 " is [space]
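The last line works here only because the final character is in the BMP: charAt returns one code unit, so for a string ending in a supplementary character it would return just the low surrogate. A minimal sketch of a code-point-aware alternative, using String.codePointBefore; the class name LastCodePointDemo and the helper lastCodePoint are hypothetical, and the second test string is made up for illustration.

```java
public class LastCodePointDemo {
    // Last code point of a non-empty string; codePointBefore combines a trailing
    // surrogate pair instead of returning only its low surrogate.
    static int lastCodePoint(String s) {
        return s.codePointBefore(s.length());
    }

    public static void main(String[] args) {
        String a = "\u6211 ";                                          // the string from the test program
        String b = "\u6211" + new String(Character.toChars(0x20000)); // ends outside the BMP
        for (String s : new String[] { a, b }) {
            char lastUnit = s.charAt(s.length() - 1);
            System.out.printf("charAt: U+%04X (low surrogate? %b), codePointBefore: U+%04X%n",
                    (int) lastUnit, Character.isLowSurrogate(lastUnit), lastCodePoint(s));
        }
    }
}
```

For the original string both approaches agree (U+0020); for the second string charAt yields the low surrogate U+DC00 while codePointBefore yields U+20000.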
In Eclipse I found the Set Encoding option, which sets the character encoding used for the file being edited.
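The encoding setting matters for the test above: a.length() reports 2 only if the compiler reads the source file in the same encoding it was saved in. The sketch below simulates a mismatch, assuming a file saved as UTF-8 but read as ISO-8859-1 (chosen only because it maps every byte to one character; another default such as GBK would garble the text differently). The class name SourceEncodingDemo and the helper misread are hypothetical. In practice, javac's -encoding option (or the Eclipse setting) should match the file's actual encoding, or the literal can be written as an escape like "\u6211".

```java
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {
    // Simulates a compiler reading UTF-8 source bytes as if they were ISO-8859-1.
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String intended = "\u6211 ";
        String garbled = misread(intended);
        System.out.println("intended length: " + intended.length()); // 2
        System.out.println("garbled length:  " + garbled.length());  // 4
    }
}
```

Because "我" occupies three bytes in UTF-8, the misread string has four characters instead of two.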