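Character sets are the biggest headache for people in non-English-speaking countries, especially in China, where there is a national standard for everything. So programmers here end up with one more skill point than their Western counterparts, whatever else they learn: dealing with encoding problems.
With NIO we face the same encoding and decoding problems.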
* 6. Character sets: Charset
* Encoding: string -> byte array
* Decoding: byte array -> string
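The two directions can be sketched with nothing more than `String.getBytes(Charset)` and `new String(byte[], Charset)`. This is a minimal illustration rather than the NIO buffer API; the class name `EncodeDecodeBasics` and its helper methods are made up for this example, and the Chinese characters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDecodeBasics {
    // Encoding: string -> byte array
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Decoding: byte array -> string
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "happyBKs" followed by three Chinese characters, as Unicode escapes
        String text = "happyBKs\u7684\u535a\u5ba2";
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        // 8 ASCII letters at 1 byte each plus 3 Chinese characters at 3 bytes each
        System.out.println(utf8.length);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));
    }
}
```

Passing the charset explicitly in both directions avoids relying on the platform default, which is a common source of garbled text.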
Which encodings are available?
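    import java.nio.charset.Charset;
    import java.util.Map.Entry;
    import java.util.SortedMap;
    import org.junit.Test;

    @Test
    public void test5() {
        // Maps each canonical charset name to its Charset object
        SortedMap<String, Charset> availableCharsets = Charset.availableCharsets();
        for (Entry<String, Charset> entry : availableCharsets.entrySet()) {
            System.out.println(String.format("%s: %s", entry.getKey(), entry.getValue()));
        }
    }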
This prints every encoding that NIO supports:
    Big5: Big5
    Big5-HKSCS: Big5-HKSCS
    CESU-8: CESU-8
    EUC-JP: EUC-JP
    EUC-KR: EUC-KR
    GB18030: GB18030
    GB2312: GB2312
    GBK: GBK
    IBM-Thai: IBM-Thai
    IBM00858: IBM00858
    IBM01140: IBM01140
    IBM01141: IBM01141
    IBM01142: IBM01142
    IBM01143: IBM01143
    IBM01144: IBM01144
    IBM01145: IBM01145
    IBM01146: IBM01146
    IBM01147: IBM01147
    IBM01148: IBM01148
    IBM01149: IBM01149
    IBM037: IBM037
    IBM1026: IBM1026
    IBM1047: IBM1047
    IBM273: IBM273
    IBM277: IBM277
    IBM278: IBM278
    IBM280: IBM280
    IBM284: IBM284
    IBM285: IBM285
    IBM290: IBM290
    IBM297: IBM297
    IBM420: IBM420
    IBM424: IBM424
    IBM437: IBM437
    IBM500: IBM500
    IBM775: IBM775
    IBM850: IBM850
    IBM852: IBM852
    IBM855: IBM855
    IBM857: IBM857
    IBM860: IBM860
    IBM861: IBM861
    IBM862: IBM862
    IBM863: IBM863
    IBM864: IBM864
    IBM865: IBM865
    IBM866: IBM866
    IBM868: IBM868
    IBM869: IBM869
    IBM870: IBM870
    IBM871: IBM871
    IBM918: IBM918
    ISO-2022-CN: ISO-2022-CN
    ISO-2022-JP: ISO-2022-JP
    ISO-2022-JP-2: ISO-2022-JP-2
    ISO-2022-KR: ISO-2022-KR
    ISO-8859-1: ISO-8859-1
    ISO-8859-13: ISO-8859-13
    ISO-8859-15: ISO-8859-15
    ISO-8859-2: ISO-8859-2
    ISO-8859-3: ISO-8859-3
    ISO-8859-4: ISO-8859-4
    ISO-8859-5: ISO-8859-5
    ISO-8859-6: ISO-8859-6
    ISO-8859-7: ISO-8859-7
    ISO-8859-8: ISO-8859-8
    ISO-8859-9: ISO-8859-9
    JIS_X0201: JIS_X0201
    JIS_X0212-1990: JIS_X0212-1990
    KOI8-R: KOI8-R
    KOI8-U: KOI8-U
    Shift_JIS: Shift_JIS
    TIS-620: TIS-620
    US-ASCII: US-ASCII
    UTF-16: UTF-16
    UTF-16BE: UTF-16BE
    UTF-16LE: UTF-16LE
    UTF-32: UTF-32
    UTF-32BE: UTF-32BE
    UTF-32LE: UTF-32LE
    UTF-8: UTF-8
    windows-1250: windows-1250
    windows-1251: windows-1251
    windows-1252: windows-1252
    windows-1253: windows-1253
    windows-1254: windows-1254
    windows-1255: windows-1255
    windows-1256: windows-1256
    windows-1257: windows-1257
    windows-1258: windows-1258
    windows-31j: windows-31j
    x-Big5-HKSCS-2001: x-Big5-HKSCS-2001
    x-Big5-Solaris: x-Big5-Solaris
    x-euc-jp-linux: x-euc-jp-linux
    x-EUC-TW: x-EUC-TW
    x-eucJP-Open: x-eucJP-Open
    x-IBM1006: x-IBM1006
    x-IBM1025: x-IBM1025
    x-IBM1046: x-IBM1046
    x-IBM1097: x-IBM1097
    x-IBM1098: x-IBM1098
    x-IBM1112: x-IBM1112
    x-IBM1122: x-IBM1122
    x-IBM1123: x-IBM1123
    x-IBM1124: x-IBM1124
    x-IBM1166: x-IBM1166
    x-IBM1364: x-IBM1364
    x-IBM1381: x-IBM1381
    x-IBM1383: x-IBM1383
    x-IBM300: x-IBM300
    x-IBM33722: x-IBM33722
    x-IBM737: x-IBM737
    x-IBM833: x-IBM833
    x-IBM834: x-IBM834
    x-IBM856: x-IBM856
    x-IBM874: x-IBM874
    x-IBM875: x-IBM875
    x-IBM921: x-IBM921
    x-IBM922: x-IBM922
    x-IBM930: x-IBM930
    x-IBM933: x-IBM933
    x-IBM935: x-IBM935
    x-IBM937: x-IBM937
    x-IBM939: x-IBM939
    x-IBM942: x-IBM942
    x-IBM942C: x-IBM942C
    x-IBM943: x-IBM943
    x-IBM943C: x-IBM943C
    x-IBM948: x-IBM948
    x-IBM949: x-IBM949
    x-IBM949C: x-IBM949C
    x-IBM950: x-IBM950
    x-IBM964: x-IBM964
    x-IBM970: x-IBM970
    x-ISCII91: x-ISCII91
    x-ISO-2022-CN-CNS: x-ISO-2022-CN-CNS
    x-ISO-2022-CN-GB: x-ISO-2022-CN-GB
    x-iso-8859-11: x-iso-8859-11
    x-JIS0208: x-JIS0208
    x-JISAutoDetect: x-JISAutoDetect
    x-Johab: x-Johab
    x-MacArabic: x-MacArabic
    x-MacCentralEurope: x-MacCentralEurope
    x-MacCroatian: x-MacCroatian
    x-MacCyrillic: x-MacCyrillic
    x-MacDingbat: x-MacDingbat
    x-MacGreek: x-MacGreek
    x-MacHebrew: x-MacHebrew
    x-MacIceland: x-MacIceland
    x-MacRoman: x-MacRoman
    x-MacRomania: x-MacRomania
    x-MacSymbol: x-MacSymbol
    x-MacThai: x-MacThai
    x-MacTurkish: x-MacTurkish
    x-MacUkraine: x-MacUkraine
    x-MS932_0213: x-MS932_0213
    x-MS950-HKSCS: x-MS950-HKSCS
    x-MS950-HKSCS-XP: x-MS950-HKSCS-XP
    x-mswin-936: x-mswin-936
    x-PCK: x-PCK
    x-SJIS_0213: x-SJIS_0213
    x-UTF-16LE-BOM: x-UTF-16LE-BOM
    X-UTF-32BE-BOM: X-UTF-32BE-BOM
    X-UTF-32LE-BOM: X-UTF-32LE-BOM
    x-windows-50220: x-windows-50220
    x-windows-50221: x-windows-50221
    x-windows-874: x-windows-874
    x-windows-949: x-windows-949
    x-windows-950: x-windows-950
    x-windows-iso2022jp: x-windows-iso2022jp
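The exact list depends on the JVM and its installed modules, so it can be safer to check for a charset at run time than to assume it exists. The following is a small sketch of such a check; `CharsetLookup` and `canonicalName` are hypothetical names chosen for this example, while `Charset.isSupported`, `Charset.forName` and `Charset.name` are standard methods.

```java
import java.nio.charset.Charset;

class CharsetLookup {
    // Returns the canonical name for a charset name or alias,
    // or null if this JVM does not support it.
    static String canonicalName(String nameOrAlias) {
        if (!Charset.isSupported(nameOrAlias)) {
            return null;
        }
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // Lookups are case-insensitive and accept aliases such as "utf8"
        System.out.println(canonicalName("utf8"));
        // Whether GBK is present depends on the JVM in use
        System.out.println(canonicalName("GBK"));
        // A syntactically legal name that no JVM is expected to provide
        System.out.println(canonicalName("no-such-charset"));
    }
}
```

Note that `Charset.isSupported` itself throws `IllegalCharsetNameException` for names containing illegal characters (a space, for instance), so input from users may need an extra guard.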
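How to encode and decode
Call Charset.forName(String) to obtain a Charset, then create an encoder and a decoder from it with newEncoder() and newDecoder(). The encoder turns the characters in a CharBuffer into bytes, and the decoder turns the bytes in a ByteBuffer back into characters.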
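Note, however, that before encoding a CharBuffer or decoding a ByteBuffer you must call flip() on it to switch it to read mode. Otherwise the encoder or decoder starts at the current position, after the data you wrote, and none of that data is processed.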
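The effect of a missing flip() can be seen by comparing how many bytes the encoder produces in each case. This is only a sketch: the class `FlipBeforeEncode`, the 16-character buffer size and the use of UTF-8 are choices made for this example, not something taken from the article's code.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class FlipBeforeEncode {
    // Writes text into a 16-char buffer (text must fit), optionally flips it,
    // and returns how many bytes the encoder produced.
    static int encodedLength(String text, boolean flip) {
        CharBuffer chars = CharBuffer.allocate(16);
        chars.put(text);      // position now sits just after the written chars
        if (flip) {
            chars.flip();     // limit = old position, position = 0
        }
        try {
            // encode() reads from position to limit
            ByteBuffer bytes = StandardCharsets.UTF_8.newEncoder().encode(chars);
            return bytes.remaining();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With flip(): the 3 written characters are encoded
        System.out.println(encodedLength("abc", true));
        // Without flip(): only the 13 unused slots (NUL characters) are encoded
        System.out.println(encodedLength("abc", false));
    }
}
```

Without the flip, the written text is skipped entirely and what comes out is the zero-filled tail of the buffer, which prints as nothing visible.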
If the bytes are decoded with a different charset from the one used to encode them, the result is garbled text.
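    @Test
    public void test6() throws CharacterCodingException {
        Charset charset1 = Charset.forName("GBK");
        // Get the encoder
        CharsetEncoder encoder = charset1.newEncoder();
        // Get the decoder
        CharsetDecoder decoder = charset1.newDecoder();

        CharBuffer charBuffer = CharBuffer.allocate(1024);
        charBuffer.put("happyBKs的博客");

        // Encode
        charBuffer.flip(); // encoding reads from charBuffer, so switch it to read mode first
        ByteBuffer byteBuffer = encoder.encode(charBuffer);
        // byteBuffer.limit() is 14: 1 byte per English letter, 2 bytes per Chinese character
        // (the original listing breaks off inside this loop; its bound and body are
        // inferred from the comment above and from the byte values in the output)
        for (int i_byteBuffer = 0; i_byteBuffer < byteBuffer.limit(); i_byteBuffer++) {
            System.out.print(byteBuffer.get(i_byteBuffer));
        }
        // ... the decoding steps that follow are not preserved in the source
    }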
Output:
    104971121121216675115-75-60-78-87-65-51
    happyBKs的博客
    ---------------------------------------------------
    happyBKs�IJ���
    ---------------------------------------------------
    happyBKs的博客
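One way to reproduce results like those above is sketched here: encode with GBK, then decode the same bytes once with GBK and once with a different charset. The choice of UTF-8 as the mismatched charset is an assumption, as are the names `GbkRoundTrip`, `encodeGbk` and `decodeWith`. The sketch also assumes the JVM provides GBK, and it uses `Charset.decode`, which substitutes a replacement character for malformed input instead of throwing.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GbkRoundTrip {
    // Assumes this JVM ships the GBK charset
    static final Charset GBK = Charset.forName("GBK");

    static ByteBuffer encodeGbk(String text) {
        CharBuffer chars = CharBuffer.allocate(1024);
        chars.put(text);
        chars.flip(); // switch to read mode before encoding
        try {
            return GBK.newEncoder().encode(chars);
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String decodeWith(Charset charset, ByteBuffer bytes) {
        // duplicate() shares the content but has its own position,
        // so the caller's buffer can be decoded again afterwards
        return charset.decode(bytes.duplicate()).toString();
    }

    public static void main(String[] args) {
        // "happyBKs" followed by three Chinese characters, as Unicode escapes
        String text = "happyBKs\u7684\u535a\u5ba2";
        ByteBuffer bytes = encodeGbk(text);
        System.out.println(bytes.limit());                              // number of GBK bytes
        System.out.println(decodeWith(GBK, bytes));                     // matching charset
        System.out.println(decodeWith(StandardCharsets.UTF_8, bytes));  // mismatched charset
    }
}
```

The ASCII prefix survives the mismatched decode because GBK and UTF-8 agree on those byte values; only the multi-byte part is damaged.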
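So when something goes wrong in later NIO programming against the file system, the causes can be grouped like this:
If the result is empty, the buffer was not switched to read mode before encoding or decoding.
If the result is garbled, the encoder and decoder do not match. It may also be that only part of the data had reached the buffer, so a multi-byte character was cut in half.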
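The truncated-character case can be handled by letting a `CharsetDecoder` carry incomplete bytes over to the next read instead of decoding each chunk on its own. The following is a rough sketch under several assumptions: the data is UTF-8, the chunks are small enough for the fixed 64-element buffers, and `ChunkedDecode` and `decodeChunks` are names invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ChunkedDecode {
    // Decodes bytes that arrive in arbitrary chunks. Bytes of an incomplete
    // multi-byte character stay in the input buffer until the rest arrives.
    // Assumes the total pending input never exceeds 64 bytes.
    static String decodeChunks(byte[][] chunks) {
        if (chunks.length == 0) {
            return "";
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(64);
        CharBuffer out = CharBuffer.allocate(64);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < chunks.length; i++) {
            in.put(chunks[i]);
            in.flip();                                  // read mode for the decoder
            boolean last = (i == chunks.length - 1);
            CoderResult r = decoder.decode(in, out, last);
            if (r.isError()) {
                throw new IllegalArgumentException("cannot decode input: " + r);
            }
            in.compact();                               // keep undecoded tail bytes, back to write mode
            out.flip();
            result.append(out);                         // appends the chars between position and limit
            out.clear();
        }
        decoder.flush(out);                             // emit anything the decoder still holds
        out.flip();
        result.append(out);
        return result.toString();
    }

    public static void main(String[] args) {
        byte[] all = "\u535a\u5ba2".getBytes(StandardCharsets.UTF_8); // two 3-byte characters
        // Split in the middle of the first character
        byte[][] chunks = { Arrays.copyOfRange(all, 0, 2), Arrays.copyOfRange(all, 2, all.length) };
        System.out.println(decodeChunks(chunks).equals("\u535a\u5ba2"));
    }
}
```

Passing `false` as the third argument of `decode` for every chunk but the last tells the decoder that more input may follow, so a partial character at the end of a chunk is left in place rather than reported as malformed.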
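Everything covered here and in the earlier posts has mainly concerned the file system. The next post moves on to the main course: NIO programming for the network.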