How to incorporate and use Unicode (in programs) for multilingual support
Level: Introductory
Thomas W. Burger
(twburger@bigfoot.com), Owner, Thomas Wolfgang Burger Consulting
August 1, 2001
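Unicode, a multibyte character representation system for computers, provides for the encoding and exchange of text in all of the world's languages. This article explains the importance of international language support in Linux applications, and the concepts involved in planning Unicode support and incorporating it into Linux applications.

Unicode is not just a programming tool; it is also a political and economic tool. Applications that do not incorporate support for the world's languages can often be used only by people who read and write a language that ASCII supports. This puts computer technology built on ASCII out of reach of most of the world's people. Unicode allows programs to use any of the world's character sets, and therefore to support any language.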
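Unicode lets programmers give ordinary people software they can use in their native language. Users no longer have to learn a foreign language first, and the social and financial benefits of computer technology become easier to realize. It is easy to imagine how little computer use there would be in America if users had to learn Urdu to use an Internet browser. The Web would never have happened.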
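Linux has a strong commitment to Unicode. Unicode support is built into both the kernel and the code development libraries. For the most part, a few simple calls in a program are enough to incorporate it automatically into the code.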
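The basis of all modern character sets is the American Standard Code for Information Interchange (ASCII), published in 1968 as ANSI X3.4. A notable exception is IBM's Extended Binary Coded Decimal Interchange Code (EBCDIC), which predates that ASCII standard. ASCII is a coded character set (CCS): in other words, a mapping from integers to character representations. ASCII itself defines 128 characters in 7 bits. Stored in an eight-bit field or byte (each bit holding the binary value 0 or 1), an extended ASCII set can represent 2^8 = 256 characters. This is a very restricted coded character set. It cannot represent all the characters of many languages (such as Chinese and Japanese), it cannot represent scientific symbols, and it certainly cannot represent ancient scripts (runes and hieroglyphics) or musical notation. Encoding a larger character set by changing the length of a byte might seem to work, but it is completely impractical, because all computers are built around eight-bit bytes. The solution is a character encoding scheme (CES): fixed-length or variable-length sequences of several bytes represent numbers larger than 256, and the coded character set then maps those numbers to the characters they represent.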
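"Unicode" is often used loosely as a general term for double-byte character encoding schemes. The official name of the Unicode 3.1 coded character set is ISO 10646-1, the Universal Multiple-Octet Coded Character Set (UCS). Unicode 3.1 added 44,946 new coded characters. Together with the 49,194 characters already in Unicode 3.0, that makes a total of 94,140.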
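The Unicode coded character set uses a four-dimensional coding space made up of 128 three-dimensional groups. Each group contains 256 two-dimensional planes. Each plane consists of 256 one-dimensional rows, and each row has 256 cells. Each cell in this coding space either codes one character or is declared unused. This coding concept is called UCS-4: four octets represent each character, specifying its group, plane, row, and cell.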
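The group, plane, row, and cell are simply the four octets of a 31-bit code value, from most to least significant. The following minimal sketch splits a code value into those coordinates. The struct and function names are illustrative and do not come from any library.

```c
#include <stdint.h>

/* Illustrative only: the four UCS-4 coordinates of one code value. */
struct ucs4_coords {
    unsigned group; /* 0..127: only 7 bits, since there are 128 groups */
    unsigned plane; /* 0..255 */
    unsigned row;   /* 0..255 */
    unsigned cell;  /* 0..255 */
};

/* Split a UCS-4 code value into group/plane/row/cell octets. */
struct ucs4_coords ucs4_split(uint32_t code)
{
    struct ucs4_coords c;
    c.group = (code >> 24) & 0x7F;
    c.plane = (code >> 16) & 0xFF;
    c.row   = (code >> 8)  & 0xFF;
    c.cell  =  code        & 0xFF;
    return c;
}
```

For example, U+2260 lands in group 0, plane 0, row 0x22, cell 0x60. U+1D11E MUSICAL SYMBOL G CLEF, one of the characters added in Unicode 3.1, lands in plane 1.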
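The first plane (plane 00 of group 00) is the Basic Multilingual Plane (BMP). The BMP defines the characters in general use, grouped as alphabets, syllabaries, ideographs, and various symbols and digits. Subsequent planes hold additional characters or other coded entities that have not yet been invented. We need this full range to handle all of the world's languages, in particular some East Asian languages that have almost 64,000 characters.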
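The BMP is used as a two-octet coded character set, specified as the ISO 10646 UCS-2 form. ISO 10646 UCS-2 is what is commonly called Unicode (the two assign identical codes in the BMP). Like every UCS plane, the BMP contains 256 rows of 256 cells each. A character in the BMP is coded using only its row and cell octets. This allows 16-bit coded characters to be used to write most of the commercially important languages. UCS-2 requires no code-page switching, code extensions, or code states. UCS-2 is a simple way to incorporate Unicode into software, but it is limited to the Unicode BMP.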
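Because the BMP is plane 0 of group 0, only the row and cell octets are needed. The sketch below writes a character as two bytes, row octet first. The helper name is hypothetical. The big-endian byte order is also an assumption made for illustration, since UCS-2 can be stored in either byte order.

```c
#include <stdint.h>

/* Hypothetical helper: store a BMP character as big-endian UCS-2
   (row octet, then cell octet). Returns 0 on success, or -1 if the
   code value is outside the BMP and so cannot be expressed in UCS-2. */
int ucs2_put_be(uint32_t code, unsigned char out[2])
{
    if (code > 0xFFFF)
        return -1;                          /* not in plane 0 of group 0 */
    out[0] = (unsigned char)(code >> 8);    /* row  */
    out[1] = (unsigned char)(code & 0xFF);  /* cell */
    return 0;
}
```

Anything above 0xFFFF, such as the musical symbols in plane 1, is rejected. This is exactly the limitation of UCS-2 described above.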
To represent a coded character set (CCS) of more than 2^8 = 256 characters with eight-bit bytes, you need a character encoding scheme (CES).
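On UNIX, the most widely used character encoding scheme is UTF-8. It provides full support for all of Unicode, every page and plane, and it still reads ASCII correctly. Alternatives to UTF-8 include UCS-4, UTF-16, UTF-7.5, UTF-7, SCSU, and the HTML and Java escape notations.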
The Unicode Transformation Formats (UTFs) are character encoding schemes that support Unicode by mapping values to multibyte encodings. This article examines the most popular of them, the UTF-8 character encoding system.
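UTF-8 is becoming the dominant way to exchange international text, because it can support all of the world's languages and is also compatible with ASCII. UTF-8 uses a variable-length encoding. Characters from 0 to 0x7F (127) encode as themselves in a single byte, and characters with larger values are encoded in 2 to 6 bytes: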
```
0x00000000 - 0x0000007F: 0xxxxxxx
0x00000080 - 0x000007FF: 110xxxxx 10xxxxxx
0x00000800 - 0x0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
0x00010000 - 0x001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0x00200000 - 0x03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
0x04000000 - 0x7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
```

A byte of the form 10xxxxxx is a continuation byte. The x bit positions are filled with the bits of the character's code number in binary, most significant bits first. Only the shortest possible multibyte sequence that can represent a code may be used.
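The table translates almost directly into code. Below is a sketch of an encoder for the six-byte form shown above; the function name is illustrative. Note that RFC 3629 (2003), published after this article, restricted UTF-8 to code points up to U+10FFFF, so current software uses at most four-byte sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-4 value using the six-byte UTF-8 table above.
   out must hold at least 6 bytes. Returns the number of bytes written,
   or 0 if code is beyond the 31-bit range. Each branch picks the
   shortest length that fits, as the table requires. */
size_t utf8_encode(uint32_t code, unsigned char out[6])
{
    size_t len;
    unsigned char lead;              /* length marker bits of byte 1 */

    if (code <= 0x7F) {              /* 0xxxxxxx: ASCII stands for itself */
        out[0] = (unsigned char)code;
        return 1;
    }
    if (code <= 0x7FF)           { len = 2; lead = 0xC0; }  /* 110xxxxx */
    else if (code <= 0xFFFF)     { len = 3; lead = 0xE0; }  /* 1110xxxx */
    else if (code <= 0x1FFFFF)   { len = 4; lead = 0xF0; }  /* 11110xxx */
    else if (code <= 0x3FFFFFF)  { len = 5; lead = 0xF8; }  /* 111110xx */
    else if (code <= 0x7FFFFFFF) { len = 6; lead = 0xFC; }  /* 1111110x */
    else return 0;

    /* Fill the 10xxxxxx continuation bytes from the low-order end. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (code & 0x3F));
        code >>= 6;
    }
    /* The remaining high-order bits go after the lead byte's marker. */
    out[0] = (unsigned char)(lead | code);
    return len;
}
```

Encoding 0xA9 yields 0xC2 0xA9, and encoding 0x2260 yields 0xE2 0x89 0xA0. These match the worked examples for the copyright and "not equal to" signs.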
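The Unicode copyright sign, character 0xA9 = 1010 1001, is encoded in UTF-8 as follows:

11000010 10101001 = 0xC2 0xA9

The "not equal to" sign, character 0x2260 = 0010 0010 0110 0000, is encoded as follows:

11100010 10001001 10100000 = 0xE2 0x89 0xA0

Stripping the marker bits (shown in brackets) from the lead byte and the continuation bytes recovers the original data:

[1110]0010 [10]001001 [10]100000
0010 001001 100000
0010 0010 0110 0000 = 0x2260

The number of leading 1 bits in the first byte gives the total number of bytes in the sequence. A first byte of 0x7F or less is a complete character: the equivalent ASCII value. Every byte of a multibyte sequence has its high bit set, and every continuation byte has the form 10xxxxxx, so no byte of a multibyte sequence can be confused with an ASCII value.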
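The manual decoding above (strip the [1110] and [10] markers, then concatenate the payload bits) can be sketched as a decoder for a single sequence. The function name is illustrative. The final range check rejects overlong forms, which enforces the shortest-sequence rule stated with the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *code.
   Returns the number of bytes consumed, or 0 if the input is malformed:
   a bad lead byte, truncation, a byte that is not 10xxxxxx where a
   continuation byte is required, or an overlong (non-shortest) form. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *code)
{
    /* Smallest value that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[7] =
        { 0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
    size_t len;
    uint32_t c;

    if (n == 0)
        return 0;
    unsigned char b = s[0];
    if (b < 0x80) { *code = b; return 1; }              /* ASCII */
    else if ((b & 0xE0) == 0xC0) { len = 2; c = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { len = 3; c = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { len = 4; c = b & 0x07; }
    else if ((b & 0xFC) == 0xF8) { len = 5; c = b & 0x03; }
    else if ((b & 0xFE) == 0xFC) { len = 6; c = b & 0x01; }
    else return 0;          /* stray continuation byte, or 0xFE/0xFF */

    if (n < len)
        return 0;                                  /* truncated */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                              /* must be 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);              /* drop the [10] marker */
    }
    if (c < min_for_len[len])
        return 0;                                  /* overlong form */
    *code = c;
    return len;
}
```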
Before using UTF-8 on a Linux platform, make sure your distribution includes glibc 2.2 and XFree86 4.0 or later. Earlier versions lack UTF-8 locale support and ISO 10646-1 X11 fonts.
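Before UTF-8 was released, Linux users used various language-specific extensions of ASCII. European users used ISO 8859-1 or ISO 8859-2, Greek users used ISO 8859-7, and Russian users used KOI-8, ISO 8859-5, or CP1251 (Cyrillic). This made data exchange problematic and meant that application software had to be written to handle the differences between these encodings. Language support was incomplete, and data exchange was untested. The major Linux distributors and application developers are working to make Unicode, mainly in its UTF-8 form, the standard in Linux.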
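To identify Unicode files, Microsoft recommends that all Unicode files begin with the character ZERO WIDTH NO-BREAK SPACE (U+FEFF). This character acts as a "signature" or "byte-order mark" (BOM) that identifies the encoding and byte order used in the file. Linux/UNIX does not use a BOM, however, because it would break existing syntax conventions for ASCII files. For example, a script's #! line must begin with exactly those two bytes. On POSIX systems, the selected locale identifies the encoding expected for all input and output files of a process.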
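Files created under Microsoft's convention may therefore start with U+FEFF, which in UTF-8 is the three bytes 0xEF 0xBB 0xBF. A POSIX program could skip such a mark before processing text. The following is a minimal sketch under two assumptions: the helper name is hypothetical, and the input is a NUL-terminated string.

```c
/* Hypothetical helper: return a pointer just past a leading UTF-8 BOM
   (EF BB BF), or s itself if there is none. The && chain stops at the
   first mismatch, so it never reads past a terminating NUL. */
const char *skip_utf8_bom(const char *s)
{
    const unsigned char *u = (const unsigned char *)s;

    if (u[0] == 0xEF && u[1] == 0xBB && u[2] == 0xBF)
        return s + 3;
    return s;
}
```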
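There are two ways to add UTF-8 support to a Linux application. In the first, data is kept in UTF-8 form everywhere, so very little of the software needs to change (the passive approach). In the second, UTF-8 data is converted after reading into wide-character arrays with standard C library functions (the conversion approach). On output, the function wcsrtombs() converts the strings back to UTF-8: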
Listing 1. wcsrtombs()

```c
#include <wchar.h>

size_t wcsrtombs(char *dest, const wchar_t **src, size_t len, mbstate_t *ps);
```
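The output step of the conversion approach can be sketched with a common two-pass pattern. The first call to wcsrtombs() passes a null destination, so it only measures the result; the second call converts. The helper name wide_to_multibyte is hypothetical. The sketch also assumes the program has already called setlocale(LC_CTYPE, ""), so that the target encoding is UTF-8 in a UTF-8 locale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to a newly allocated
   multibyte string in the current LC_CTYPE encoding. Returns NULL if a
   character cannot be represented or allocation fails; caller frees. */
char *wide_to_multibyte(const wchar_t *ws)
{
    mbstate_t st;
    const wchar_t *src = ws;

    /* Pass 1: with dest == NULL, wcsrtombs() only counts the bytes. */
    memset(&st, 0, sizeof st);
    size_t need = wcsrtombs(NULL, &src, 0, &st);
    if (need == (size_t)-1)
        return NULL;                  /* unrepresentable character */

    char *out = malloc(need + 1);     /* + 1 for the terminating '\0' */
    if (out == NULL)
        return NULL;

    /* Pass 2: convert for real, starting again from the beginning. */
    src = ws;
    memset(&st, 0, sizeof st);
    wcsrtombs(out, &src, need + 1, &st);
    return out;
}
```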
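Which approach to choose depends on the nature of the application. Most applications can work with the passive approach, which is why UTF-8 is so popular on UNIX platforms. Programs such as cat and echo need no modification: a byte stream is still just a byte stream, and they do no processing on it. ASCII characters and control codes are unchanged in a UTF-8 locale. Programs that count characters by counting bytes need some small changes, because in UTF-8 an application must not count continuation bytes. If a UTF-8 locale is selected, calls to the C library function strlen(s) that are meant to count characters must be replaced with mbstowcs(NULL, s, 0):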
Listing 2. The mbstowcs() function

```c
#include <stdlib.h>

size_t mbstowcs(wchar_t *pwcs, const char *s, size_t n);
```
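Both ways of counting characters can be sketched side by side. utf8_count is a hypothetical locale-independent helper: it counts every byte that is not a 10xxxxxx continuation byte. mb_count relies on standard mbstowcs() behavior. With a null destination, mbstowcs() returns the number of wide characters without storing them, and the result depends on the current LC_CTYPE locale.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count code points in a UTF-8 string without consulting the locale:
   every byte that is not a continuation byte starts a new character. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}

/* Count characters in the current locale's multibyte encoding.
   Returns (size_t)-1 on an invalid sequence. For UTF-8 input, call
   setlocale(LC_CTYPE, "") with a UTF-8 locale selected first. */
size_t mb_count(const char *s)
{
    return mbstowcs(NULL, s, 0);
}
```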
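A common use of strlen is to estimate display width. Chinese characters and other ideographs occupy two column positions. The wcwidth() function returns the display width of each character: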
Listing 3. The wcwidth() function

```c
#define _XOPEN_SOURCE 700  /* wcwidth() is an X/Open (XSI) function */
#include <wchar.h>

int wcwidth(wchar_t wc);
```
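To get the width of a whole string from per-character widths, you first have to decode the string. Below is a sketch using mbrtowc() and wcwidth(). The name display_width is hypothetical, and the sketch assumes setlocale(LC_CTYPE, "") has already selected a UTF-8 locale.

```c
#define _XOPEN_SOURCE 700   /* needed for wcwidth() on glibc */
#include <locale.h>         /* callers must run setlocale(LC_CTYPE, "") */
#include <string.h>
#include <wchar.h>

/* Estimate how many terminal columns a multibyte string occupies.
   Returns -1 for an invalid sequence or a non-printable character. */
int display_width(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    int width = 0;

    memset(&st, 0, sizeof st);              /* initial conversion state */
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                      /* invalid or truncated */
        int w = wcwidth(wc);                /* 2 for CJK ideographs */
        if (w < 0)
            return -1;                      /* non-printable character */
        width += w;
        s += n;
        left -= n;
    }
    return width;
}
```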
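Starting with GNU glibc 2.2, the wchar_t type is officially intended to hold only 32-bit ISO 10646 values, independent of the locale currently in use. The implementation signals this to applications by defining the __STDC_ISO_10646__ macro, as provided for by ISO C99. A definition of __STDC_ISO_10646__ indicates that wchar_t is Unicode. Its exact value is a decimal constant of the form yyyymmL. For example, the definition: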
Listing 4. Indicating that wchar_t is Unicode

```c
#define __STDC_ISO_10646__ 200104L
```
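indicates that values of type wchar_t are the coded representations of the characters defined by ISO/IEC 10646, along with all amendments and technical corrigenda as of the specified year and month. The compiler and C library define this macro; an application only tests it.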
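Because the implementation supplies the macro, a program can only query it. The sketch below reports which edition of ISO 10646 the implementation claims, or 0 when wchar_t carries no such guarantee. The function name is hypothetical.

```c
/* Hypothetical helper: the yyyymmL value of __STDC_ISO_10646__, or 0 if
   the implementation makes no ISO 10646 promise for wchar_t. */
long iso10646_edition(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0;
#endif
}
```

With GCC on glibc systems, the macro typically comes from the automatically pre-included <stdc-predef.h>, so the function needs no #include.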
The following example shows one use of wchar_t. It tests the macro to decide how to write an opening double quotation mark in portable ISO C99 code.
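Listing 5. Determining how to write a double quotation mark

```c
#include <stdio.h>
#include <wchar.h>

/* Call setlocale(LC_CTYPE, "") first so that %lc uses the locale's encoding. */
void put_open_quote(void)
{
#ifdef __STDC_ISO_10646__
    printf("%lc", (wint_t)0x201C);  /* wchar_t is Unicode: U+201C */
#else
    putchar('"');                   /* no such guarantee: ASCII quote */
#endif
}
```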
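The proper way to activate UTF-8 is through the POSIX locale mechanism. A locale is a configuration setting that describes culture-specific conventions for software behavior. It covers the character encoding, date and time notation, sorting (collation) rules, and the measurement system. A locale name usually consists of an ISO 639-1 language code and an ISO 3166-1 country or region code, optionally followed by an encoding name and other qualifiers. The command locale -a lists all locales installed on the system (usually under /usr/lib/locale/). If no UTF-8 locale is preinstalled, you can generate one with the localedef command. To generate and activate a German UTF-8 locale for a particular user, use the following commands: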
Listing 6. Generating a locale for a specific user

```sh
localedef -v -c -i de_DE -f UTF-8 $HOME/local/locale/de_DE.UTF-8
export LOCPATH=$HOME/local/locale
export LANG=de_DE.UTF-8
```
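It is sometimes useful to add a UTF-8 locale for all users. The root user can do this with the following command:

Listing 7. Generating a locale for all users

```sh
# With no slash in the output name, localedef installs the locale in the
# system's default locale location, where the C library looks for it.
localedef -v -c -i de_DE -f UTF-8 de_DE.UTF-8
```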
To make this locale the default for every user, add the following line to the /etc/profile file:

Listing 8. Setting the default locale for all users

```sh
export LANG=de_DE.UTF-8
```
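The behavior of functions that handle multibyte character sequences depends on the LC_CTYPE category of the current locale, which determines the locale-dependent multibyte encoding. The value LANG=de_DE (German) causes output to be formatted in ISO 8859-1, and the value LANG=de_DE.UTF-8 formats output as UTF-8. The locale setting also causes the %ls format specifier in printf to call wcsrtombs(), which converts the wide-character argument string into the locale-dependent multibyte encoding. The country or region code in a locale name matters mainly in the LC_MONETARY category. For example, en_GB (British English) and en_AU (Australian English) differ chiefly in the name of the currency and the rules for printing monetary amounts. Set the environment variable LANG to your preferred locale. A C program then activates that locale by calling the setlocale() function: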
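Listing 9. The setlocale() function

```c
#include <stdio.h>
#include <locale.h>

/* char *setlocale(int category, const char *locale); */

int main(void)
{
    /* "" means: take the locale from LC_ALL, LC_CTYPE, or LANG. */
    if (!setlocale(LC_CTYPE, "")) {
        fprintf(stderr, "Locale not specified. Check LANG, LC_CTYPE, LC_ALL.\n");
        return 1;
    }
    /* ... rest of the program ... */
    return 0;
}
```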
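When setlocale(LC_CTYPE, "") is called, the C library tests the environment variables LC_ALL, LC_CTYPE, and LANG, in that order. The first one that has a value determines which locale data is loaded for the LC_CTYPE category. Locale data is split into separate categories: for example, LC_CTYPE defines the character encoding, and LC_COLLATE defines the sort order. The LANG environment variable sets the default locale for all categories, and the LC_* variables can override individual categories.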
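The lookup order can be mimicked in a few lines, for example to print a helpful diagnostic when setlocale() fails. This is only a sketch. ctype_locale_name is a hypothetical helper, and it treats empty variables as unset, as glibc does. The authoritative decision is always the one setlocale() makes.

```c
#include <stdlib.h>

/* Hypothetical helper: name of the locale that LC_CTYPE would use,
   following the order LC_ALL, LC_CTYPE, LANG. Falls back to "C",
   the usual default when none of them is set. */
const char *ctype_locale_name(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && v[0] != '\0')   /* set and non-empty wins */
            return v;
    }
    return "C";
}
```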
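You can query the name of the character encoding of the current locale with the command locale charmap. If you have successfully selected a UTF-8 locale in the LC_CTYPE category, it prints UTF-8. The command locale -m lists the names of all installed character encodings.

Suppose you use only the C library's multibyte functions for all conversion between the external character encoding and the wchar_t encoding used internally. In that case, the C library takes responsibility for using the correct encoding according to LC_CTYPE, and the program does not even need to know explicitly which multibyte encoding is current.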
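Some applications need to implement UTF-8 (or other encoding) conversion explicitly rather than use the libc multibyte functions. Such an application must determine whether to activate its UTF-8 mode. X/Open-compliant systems that provide the <langinfo.h> header can use the following code: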
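Listing 10. Detecting whether the current locale uses UTF-8

```c
#include <langinfo.h>
#include <stdbool.h>
#include <string.h>

static bool utf8_mode = false;

/* Call after setlocale(LC_CTYPE, "") has selected the locale. */
static void detect_utf8_mode(void)
{
    if (strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
        utf8_mode = true;
}
```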
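Before testing whether the current locale uses UTF-8, you must call setlocale(LC_CTYPE, "") to set the locale according to the environment variables. The locale charmap command also calls nl_langinfo(CODESET) to find the name of the encoding that the current locale specifies. Another possible method is to query the locale environment variables: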
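Listing 11. Querying the locale environment variables

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static bool utf8_mode = false;

static void guess_utf8_mode(void)
{
    const char *s;

    /* Same order as the C library: LC_ALL, then LC_CTYPE, then LANG. */
    if ((s = getenv("LC_ALL")) || (s = getenv("LC_CTYPE")) || (s = getenv("LANG"))) {
        if (strstr(s, "UTF-8"))
            utf8_mode = true;
    }
}
```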
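This test assumes that the name of a UTF-8 locale contains the string "UTF-8". That is not always the case; a locale may be named de_DE.utf8, for example. You should therefore prefer the nl_langinfo() method.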
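The two recommended steps can be combined into one sketch: select a locale, then ask nl_langinfo(CODESET) for its encoding instead of guessing from the locale name. The helper name is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: select the LC_CTYPE locale `name` ("" means take
   it from LC_ALL / LC_CTYPE / LANG) and report whether its codeset is
   UTF-8. Returns 1 for UTF-8, 0 for another codeset, -1 if the locale
   is not available. */
int locale_is_utf8(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

Passing "" selects the locale from the environment. Because the test examines the codeset rather than the locale name, a locale named de_DE.utf8 is still recognized.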
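To support all of the world's languages, you need a character encoding system that works with eight-bit bytes yet provides far more than the 2^8 = 256 characters that an eight-bit extended ASCII set can hold. Unicode is such a system. Its four-dimensional coding space consists of 128 three-dimensional groups, with 94,140 defined character values as of Unicode 3.1, and many character encoding schemes support it. On Linux, the most popular of these schemes is the Unicode Transformation Format UTF-8.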
You can read the original English version of this article on the developerWorks worldwide site.
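TW Burger has been programming computers since 1979. He has taught computer courses and written books on computer technology, and he runs an information technology consulting company. You can contact him at twburger@bigfoot.com.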