Summary of Java Chinese Garbled-Text Errors Caused by a Wrong Locale
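While running MapReduce jobs in production, we traced a problem to garbled Chinese characters in the data computed on some servers, even though every server had exactly the same configuration. Because our keys can contain Chinese, the garbling showed up as randomness whenever map records were merged: each run produced a different result, because Map tasks are assigned to servers at random. (Note: most of this article draws on the solution a colleague found.)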
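A minimal sketch of one way a wrong charset can make merge results nondeterministic, assuming (this is not stated in the original) that the job converts keys to bytes with the platform default charset, for example via a bare `String.getBytes()`. The demo passes the charset explicitly so it behaves the same everywhere. The class and method names are hypothetical. Under US-ASCII, every Chinese character is unmappable and is replaced by `?`, so two different keys collapse into the same bytes and get merged. Which keys collapse depends on which server ran the task.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical demo: encoding Chinese keys with an ASCII charset
 * (what the JVM reports as ANSI_X3.4-1968) makes distinct keys collide.
 */
public class KeyCollisionDemo {

    // Stand-in for code that relies on the default charset; here the
    // charset is a parameter so both cases can be shown side by side.
    static byte[] encodeKey(String key, Charset charset) {
        return key.getBytes(charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII, so it compiles
        // even on a machine whose own locale is broken.
        String a = "\u4e2d\u6587"; // first Chinese key
        String b = "\u82f1\u6587"; // a different Chinese key

        byte[] asciiA = encodeKey(a, StandardCharsets.US_ASCII);
        byte[] asciiB = encodeKey(b, StandardCharsets.US_ASCII);

        // Each unmappable char becomes '?', so both keys become "??".
        System.out.println("ASCII: " + new String(asciiA, StandardCharsets.US_ASCII)
                + " vs " + new String(asciiB, StandardCharsets.US_ASCII)
                + ", equal = " + Arrays.equals(asciiA, asciiB));

        // With UTF-8 each key keeps its own 6-byte encoding.
        System.out.println("UTF-8 equal = "
                + Arrays.equals(encodeKey(a, StandardCharsets.UTF_8),
                                encodeKey(b, StandardCharsets.UTF_8)));
    }
}
```

Running it prints `ASCII: ?? vs ??, equal = true` followed by `UTF-8 equal = false`.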
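On the servers with garbled output, the JVM reported the following (ANSI_X3.4-1968 is the formal name of US-ASCII):
file.encoding = ANSI_X3.4-1968
sun.jnu.encoding = ANSI_X3.4-1968

On the servers that worked correctly:
sun.jnu.encoding = UTF-8
file.encoding = UTF-8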
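To compare these values across machines, a small check like the following could be run on each server. This is a sketch. The class name and the "all three must be UTF-8" rule are assumptions for illustration. `sun.jnu.encoding` is a JDK-internal property and may be absent on some JVMs, in which case the check simply reports it as not UTF-8.

```java
import java.nio.charset.Charset;

/** Hypothetical check: print the encoding properties the JVM derived from the locale. */
public class EncodingCheck {

    // True only if all three values are UTF-8; a null property counts as "not UTF-8".
    static boolean isUtf8(String fileEncoding, String jnuEncoding, Charset defaultCharset) {
        return "UTF-8".equalsIgnoreCase(fileEncoding)
                && "UTF-8".equalsIgnoreCase(jnuEncoding)
                && "UTF-8".equals(defaultCharset.name());
    }

    public static void main(String[] args) {
        String fileEncoding = System.getProperty("file.encoding");
        String jnuEncoding = System.getProperty("sun.jnu.encoding");
        Charset defaultCharset = Charset.defaultCharset();

        System.out.println("file.encoding = " + fileEncoding);
        System.out.println("sun.jnu.encoding = " + jnuEncoding);
        System.out.println("Charset.defaultCharset() = " + defaultCharset);
        System.out.println(isUtf8(fileEncoding, jnuEncoding, defaultCharset)
                ? "OK: UTF-8 everywhere"
                : "WARNING: JVM is not running with a UTF-8 locale");
    }
}
```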
Forcing both encodings when launching the JVM:
java -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 ${mainClass}
LANG="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_CTYPE="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_ALL=
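Which of these variables actually decides the character encoding follows the POSIX precedence rule: a non-empty `LC_ALL` wins, then `LC_CTYPE`, then `LANG`; a variable that is set but empty (like `LC_ALL=` above) counts as unset. The sketch below resolves that name from an environment map. It is an illustration only: the class name is hypothetical, it does not check whether the locale is actually installed (if it is not, the C library falls back to the C/POSIX locale, which the JVM reports as ANSI_X3.4-1968), and "POSIX" stands in for the implementation-defined default.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: which locale name governs LC_CTYPE under POSIX precedence. */
public class EffectiveLocale {

    // POSIX treats a variable that is set to the empty string as unset.
    private static String nonEmpty(Map<String, String> env, String name) {
        String value = env.get(name);
        return (value == null || value.isEmpty()) ? null : value;
    }

    // LC_ALL overrides LC_CTYPE, which overrides LANG; otherwise the default locale.
    static String effectiveCtype(Map<String, String> env) {
        String value = nonEmpty(env, "LC_ALL");
        if (value == null) value = nonEmpty(env, "LC_CTYPE");
        if (value == null) value = nonEmpty(env, "LANG");
        return value == null ? "POSIX" : value;
    }

    public static void main(String[] args) {
        System.out.println("this process: " + effectiveCtype(System.getenv()));

        // The settings listed above: LC_ALL is empty, so LC_CTYPE decides.
        Map<String, String> sample = new HashMap<>();
        sample.put("LANG", "zh_CN.UTF-8");
        sample.put("LC_CTYPE", "zh_CN.UTF-8");
        sample.put("LC_ALL", "");
        System.out.println("sample: " + effectiveCtype(sample));
    }
}
```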
“Mac OSX uses a special kind of decomposed UTF-8 to store filenames. If you need to read in filenames and write them to a ‘normal’ UTF-8 file, you must normalize them. My understanding of this is that when you pass a name with an accented character like é, it will decompose this into e plus ’ before saving it to the filesystem (this behavior is defined by the Unicode standard).”
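In Java, the normalization the quote calls for can be done with `java.text.Normalizer`. A minimal sketch, assuming the filenames arrive in decomposed form as the quote describes; the class and method names are hypothetical. NFC recombines a base letter and its combining accent into one precomposed character.

```java
import java.text.Normalizer;

/** Hypothetical helper: turn decomposed filenames into composed (NFC) form. */
public class FilenameNormalizer {

    // NFC merges base character + combining mark into a single code point where one exists.
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";      // 'e' followed by COMBINING ACUTE ACCENT (U+0301)
        String composed = toNfc(decomposed); // single precomposed character (U+00E9)

        System.out.println(decomposed.length() + " -> " + composed.length()); // 2 -> 1
        System.out.println(composed.equals("\u00e9"));                          // true
    }
}
```

Without this step, the decomposed and composed spellings of the same name compare as different strings.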
LC_CTYPE=UTF-8
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export LC_NUMERIC=en_US.UTF-8
export LC_TIME=en_US.UTF-8
export LC_COLLATE=en_US.UTF-8
export LC_MONETARY=en_US.UTF-8
export LC_MESSAGES=en_US.UTF-8
export LC_PAPER=en_US.UTF-8
export LC_NAME=en_US.UTF-8
export LC_ADDRESS=en_US.UTF-8
export LC_TELEPHONE=en_US.UTF-8
export LC_MEASUREMENT=en_US.UTF-8
export LC_IDENTIFICATION=en_US.UTF-8
export LC_ALL=en_US.UTF-8
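If a Java program launches the worker JVM itself, the same environment and the `-D` flags can be applied through `ProcessBuilder` instead of a shell profile. This is a sketch under assumptions: the helper name and `com.example.Main` are hypothetical, only `LANG` and `LC_ALL` are set (a non-empty `LC_ALL` overrides every other `LC_*` variable), and it assumes the `en_US.UTF-8` locale is installed on the machine. The builder is returned unstarted so the caller decides when to run it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical launcher: start a child JVM with a UTF-8 locale and encoding flags. */
public class Utf8Launcher {

    // Builds, but does not start, the child process.
    static ProcessBuilder build(String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Dsun.jnu.encoding=UTF-8");
        cmd.add("-Dfile.encoding=UTF-8");
        cmd.add(mainClass);

        ProcessBuilder pb = new ProcessBuilder(cmd);
        // environment() starts as a copy of this process's environment;
        // changes here affect only the child.
        Map<String, String> env = pb.environment();
        env.put("LANG", "en_US.UTF-8");
        env.put("LC_ALL", "en_US.UTF-8"); // overrides any inherited LC_* value
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = build(args.length > 0 ? args[0] : "com.example.Main");
        System.out.println("command: " + pb.command());
        System.out.println("LC_ALL=" + pb.environment().get("LC_ALL"));
        // pb.inheritIO().start() would actually launch it.
    }
}
```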