Java的正则表达式匹配汉字
- Macyou - Script Ahead, Code Behind刚才有同事问,顺便发出来记一下. 以前读《精通正则表达式》的时候已经知道Java的正则表达式支持\p{Inname}的形式来指定匹配属于name的Unicode区块的字符. JDK的Pattern类的文档上也有相关例子,\p{InGreek},表示匹配一个属于名为“Greek”的Unicode区块的字符.
public class TestUnicodeBlock {
public static void main(String[] args) {
final String regex = "\\p{InCJK Unified Ideographs}";
final String[] testChars = {
"\u6211", // a Chinese character
"\u300A", // a full-width punctuation
"\u3042", // a Hiragana character
"\uD55C" // a Hangul character
};
for (String s : testChars) {
System.out.printf("matched %s: %b\n", s, s.matches(regex));
}
}
}