Reading UTF-8, Unicode, and other character-encoded text files in Java
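When Java reads a data stream, always specify the stream's character encoding, at least when the stream is read as text (Strings). If no encoding is given, the reader falls back to the platform default, which may not match the file.
To read a file stored in a particular encoding, pass the matching charset name to the reader, for example UTF-8, UTF-16, UNICODE, GBK, GB2312, ISO-8859-1, or Big5.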
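To see why the encoding has to match, here is a minimal sketch that decodes the same UTF-8 bytes twice, once with the right charset and once with the wrong one. The class name CharsetMismatchDemo and the helper decode are made up for this illustration; only the standard String and StandardCharsets APIs are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical helper: decode raw bytes with an explicitly named charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // encoding does not matter.
        String original = "\u4e2d\u6587";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        // Each of the 6 UTF-8 bytes became its own Latin-1 character.
        System.out.println(wrong.length());         // 6
    }
}
```

Decoding with the wrong charset does not raise an error here; it silently produces different text, which is why the charset should be stated explicitly.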
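The following sample reads a UTF-8 file line by line (f is the File to read):
InputStreamReader read = new InputStreamReader(new FileInputStream(f), "UTF-8"); // or "UNICODE", "UTF-16"
BufferedReader reader = new BufferedReader(read);
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close(); // also closes the wrapped InputStreamReader and FileInputStream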
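The same read loop can be sketched as a self-contained program. This version assumes Java 7 or later and swaps in Files.newBufferedReader with try-with-resources instead of the explicit close calls; the class name ReadWithCharset, the helper readLines, and the temporary sample file are all invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    // Hypothetical helper: read every line of a file using the given charset.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Create a small UTF-8 sample file so the example runs on its own.
        Path tmp = Files.createTempFile("utf8-sample", ".txt");
        Files.write(tmp, "caf\u00e9\n\u4e2d\u6587\n".getBytes(StandardCharsets.UTF_8));
        try {
            for (String line : readLines(tmp, StandardCharsets.UTF_8)) {
                System.out.println(line);
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

One difference worth knowing: a reader obtained from Files.newBufferedReader reports malformed input as an IOException, whereas InputStreamReader substitutes a replacement character and keeps going.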
The following program converts a GB2312 file into a UTF-8 file:
import java.io.*;

public class inputtest {
    public static void main(String[] args) {
        // A missing argument is passed as null, so convert() falls back to
        // standard input / standard output.
        String infile = args.length > 0 ? args[0] : null;
        String outfile = args.length > 1 ? args[1] : null;
        try {
            convert(infile, outfile, "GB2312", "UTF8"); // or "BIG5"
        } catch (Exception e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }

    public static void convert(String infile, String outfile, String from, String to)
            throws IOException, UnsupportedEncodingException {
        // Set up byte streams.
        InputStream in;
        if (infile != null) in = new FileInputStream(infile);
        else in = System.in;
        OutputStream out;
        if (outfile != null) out = new FileOutputStream(outfile);
        else out = System.out;

        // Use the default encoding if no encoding is specified.
        if (from == null) from = System.getProperty("file.encoding");
        if (to == null) to = System.getProperty("file.encoding");

        // Set up character streams; try-with-resources flushes and closes both.
        try (Reader r = new BufferedReader(new InputStreamReader(in, from));
             Writer w = new BufferedWriter(new OutputStreamWriter(out, to))) {
            // Copy characters from input to output. The InputStreamReader
            // converts from the input encoding to Unicode, and the OutputStreamWriter
            // converts from Unicode to the output encoding. Characters that cannot be
            // represented in the output encoding are output as '?'.
            char[] buffer = new char[4096];
            int len;
            while ((len = r.read(buffer)) != -1) {
                w.write(buffer, 0, len);
            }
        }
    }
}
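The copy loop and the '?' substitution mentioned in the comments can be checked in memory, without touching any files. The sketch below is an illustration only: the class TranscodeDemo and the method transcode are hypothetical names, and it uses UTF-16BE, UTF-8, and US-ASCII because those charsets are guaranteed on every Java platform. Where the JDK provides it, Charset.forName("GB2312") could be passed in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // Hypothetical helper: decode input with `from`, re-encode it with `to`.
    static byte[] transcode(byte[] input, Charset from, Charset to) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer w = new OutputStreamWriter(sink, to)) {
            char[] buffer = new char[4096];
            int len;
            while ((len = r.read(buffer)) != -1) {
                w.write(buffer, 0, len);
            }
        } // closing the writer flushes the remaining bytes into sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "A\u4e2d"; // one ASCII letter and one Chinese character
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);

        // UTF-8 can represent every character, so the text survives intact.
        byte[] utf8 = transcode(utf16, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true

        // US-ASCII cannot represent the Chinese character; it is written as '?'.
        byte[] ascii = transcode(utf16, StandardCharsets.UTF_16BE, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // A?
    }
}
```

The second conversion loses information without reporting an error, so the target encoding should be one that covers every character in the input.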