
Lucene: Escaping Special Characters

From Javalobby, by R.J. Lorimer

When integrating Lucene into an application so it can directly take user input, it is often valuable to use the QueryParser class. This class is a very handy user-readable-text to functional query converter; perfect for taking user input without a lot of work on your part, but if you don't properly handle special characters, it will fail with a nasty-gram exception:

Was expecting one of:
     "(" ...<QUOTED> ... <TERM> ... 
     <PREFIXTERM> ... <WILDTERM> ...  
     "[" ... "{" ... <NUMBER> ...
       at org.apache.lucene.queryParser.QueryParser.generateParseException(QueryParser.java:1226)
       at org.apache.lucene.queryParser.QueryParser.jj_consume_token(QueryParser.java:1109)
       at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:759)
       at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:712)
       at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:122)
  [...]

Thankfully, the code needed to fix this isn't all that difficult. There are two scenarios at this point: (1) you are using Lucene 1.9 or newer, or (2) you are using Lucene 1.4 or earlier.

If you are using Lucene 1.9 or newer, the task of escaping user input for the query parser is very straightforward:

Lucene 1.9 Escaping

String userQuery = // ...
String escaped = QueryParser.escape(userQuery);
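// "contents" and analyzer are placeholders for your own default field and Analyzer.
Query query = new QueryParser("contents", analyzer).parse(escaped);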
// ...

If, however, you are using Lucene 1.4 or prior, there is no escape convenience utility. Instead, you must write your own. The characters that need to be escaped are: + - ! ( ) { } [ ] ^ " ~ * ? : \

Here is a regex-powered block of code that does this (you could also code this using a StringBuffer, indexOf, and all those goodies if you prefer):

Lucene 1.4 Escaping

String userInput = // ...
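// Character class covering every special character listed above, including [ and ".
String escapeChars = "[\\\\+\\-\\!\\(\\)\\:\\^\\[\\]\\{\\}\\~\\*\\?\"]";
String escaped = userInput.replaceAll(escapeChars, "\\\\$0");
// "contents" and analyzer are placeholders for your own default field and Analyzer.
Query query = new QueryParser("contents", analyzer).parse(escaped);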
// ...

The 'escapeChars' string is a regex character class that matches every character that should be escaped. In the replaceAll call, $0 stands for whatever character was matched, so each match is replaced by itself with a backslash prepended (the "\\\\" in the Java source produces a single literal backslash in the output).
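For readers who prefer the non-regex route mentioned above, here is a minimal sketch of the same escaping done by hand. The class name ManualLuceneEscaper is hypothetical and not part of Lucene, and it uses StringBuilder (Java 5+); on a Java 1.4 runtime you would swap in StringBuffer. It assumes the character list given earlier is the complete set you need to escape.

```java
// Sketch only: ManualLuceneEscaper is a hypothetical helper, not a Lucene class.
final class ManualLuceneEscaper {

    // The special characters listed in the article: + - ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIAL_CHARS = "+-!(){}[]^\"~*?:\\";

    private ManualLuceneEscaper() {
    }

    /** Returns the input with a backslash inserted before every special character. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL_CHARS.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints \(1\+1\)\:2
        System.out.println(escape("(1+1):2"));
    }
}
```

A single pass over the characters avoids the regex engine entirely, which may matter if you escape a lot of input.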

Now, I always hate those articles on the web that do some hand-waving and over-simplification to explain how easy something is, but don't explain the consequences. In this case, using regular expressions like I have here carries some (most likely) unnecessary overhead, because String.replaceAll recompiles the pattern on every call, and to compress the code into a digestible format I have taken some less-than-best-practice shortcuts. If you are going to be escaping this text frequently, I'd recommend you compile the pattern ahead of time and use some constants:

Lucene 1.4 Escaping (More Complete)

// Some constants.
// Character class covering every special character listed above, including [ and ".
private static final String LUCENE_ESCAPE_CHARS = "[\\\\+\\-\\!\\(\\)\\:\\^\\[\\]\\{\\}\\~\\*\\?\"]";
private static final Pattern LUCENE_PATTERN = Pattern.compile(LUCENE_ESCAPE_CHARS);
private static final String REPLACEMENT_STRING = "\\\\$0";
 
// ... Then, in your code somewhere...
String userInput = // ...
String escaped = LUCENE_PATTERN.matcher(userInput).replaceAll(REPLACEMENT_STRING);
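// "contents" and analyzer are placeholders for your own default field and Analyzer.
Query query = new QueryParser("contents", analyzer).parse(escaped);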
// ...
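If it helps, the precompiled-pattern approach could be wrapped in a small standalone helper so that callers never touch the regex directly. This is only a sketch: LuceneQueryEscaper is a hypothetical name rather than a Lucene API, and the character class is written in a slightly shorter form that should match the same characters listed earlier.

```java
import java.util.regex.Pattern;

// Sketch only: LuceneQueryEscaper is a hypothetical helper, not a Lucene class.
final class LuceneQueryEscaper {

    // Character class covering + - ! ( ) { } [ ] ^ " ~ * ? : \
    private static final Pattern SPECIAL_CHARS =
            Pattern.compile("[\\\\+\\-!(){}\\[\\]^\"~*?:]");

    // In a replacement string, "\\\\" yields one literal backslash and $0 is the matched text.
    private static final String REPLACEMENT = "\\\\$0";

    private LuceneQueryEscaper() {
    }

    /** Returns the input with every special character preceded by a backslash. */
    static String escape(String userInput) {
        return SPECIAL_CHARS.matcher(userInput).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        // prints title\:\(foo AND bar\)
        System.out.println(escape("title:(foo AND bar)"));
    }
}
```

Because Pattern instances are immutable and thread-safe, the single static constant can be shared by every caller.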

Tags: java, lucene


Posted by IT瘾 on January 6, 2007 at 11:44:17 AM
