Java的正则表达式匹配汉字
- Macyou - Script Ahead, Code Behind刚才有同事问,顺便发出来记一下. 以前读《精通正则表达式》的时候已经知道Java的正则表达式支持\p{Inname}的形式来指定匹配属于name的Unicode区块的字符. JDK的Pattern类的文档上也有相关例子,\p{InGreek},表示匹配一个属于名为“Greek”的Unicode区块的字符.
public class TestUnicodeBlock { public static void main(String[] args) { final String regex = "\\p{InCJK Unified Ideographs}"; final String[] testChars = { "\u6211", // a Chinese character "\u300A", // a full-width punctuation "\u3042", // a Hiragana character "\uD55C" // a Hangul character }; for (String s : testChars) { System.out.printf("matched %s: %b\n", s, s.matches(regex)); } } }