Code Samples - Zoie - Confluence

Zoie is a real-time search and indexing system built on Apache Lucene.

Zoie was donated by LinkedIn.com on July 19, 2008, and has been deployed on a large-scale, real-time consumer website: LinkedIn.com, which handles millions of searches as well as millions of updates daily.

Configuration

Zoie can be configured via Spring:

    
<!-- An instance of a DataProvider:
     FileDataProvider recurses through a given directory and provides the DataConsumer
     with indexing requests built from the gathered files.
     In this example, the provider needs to be started manually, which is done via JMX.
-->
<bean id="dataprovider" class="proj.zoie.impl.indexing.FileDataProvider">
  <constructor-arg value="file:${source.directory}"/>
  <property name="dataConsumer" ref="indexingSystem" />
</bean>
 
 
<!--
  An instance of an IndexableInterpreter:
  FileIndexableInterpreter converts a text file into a Lucene document; for example
  purposes only.
-->
<bean id="fileInterpreter" class="proj.zoie.impl.indexing.FileIndexableInterpreter" />
 
<!-- A decorator for an IndexReader instance:
     The default decorator is just a pass through, the input IndexReader is returned.
-->
<bean id="idxDecorator" class="proj.zoie.impl.indexing.DefaultIndexReaderDecorator" />
 
<!-- A zoie system declaration, passed as a DataConsumer to the DataProvider declared above -->
<bean id="indexingSystem" class="proj.zoie.impl.indexing.ZoieSystem" init-method="start" destroy-method="shutdown">
 
  <!-- disk index directory-->
  <constructor-arg index="0" value="file:${index.directory}"/>
 
  <!-- sets the interpreter -->
  <constructor-arg index="1" ref="fileInterpreter" />
 
  <!-- sets the decorator -->
  <constructor-arg index="2">
    <ref bean="idxDecorator"/>
  </constructor-arg>
 
  <!-- set the Analyzer, if null is passed, Lucene's StandardAnalyzer is used -->
  <constructor-arg index="3">
    <null/>
  </constructor-arg>
 
  <!-- sets the Similarity, if null is passed, Lucene's DefaultSimilarity is used -->
  <constructor-arg index="4">
    <null/>
  </constructor-arg>
 
  <!-- The following two parameters control how often batched indexing is triggered:
       whichever of the two events happens first triggers indexing.
  -->
 
  <!-- Batch size: how many items to put on the queue before indexing is triggered -->
  <constructor-arg index="5" value="1000" />
 
  <!-- Batch delay: how long to wait before indexing is triggered -->
  <constructor-arg index="6" value="300000" />
 
  <!-- flag turning on/off real time indexing -->
  <constructor-arg index="7" value="true" />
</bean>
 
<!-- a search service -->
<bean id="mySearchService" class="com.mycompany.search.SearchService">
  <!-- IndexReader factory that produces index readers to build Searchers from -->
  <constructor-arg ref="indexingSystem" />
</bean>
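
The batch size and batch delay arguments above describe a "whichever comes first" trigger. As a rough illustration of that rule only, the following self-contained Java sketch flushes a queue when either threshold is reached. The class `BatchTrigger` and its methods are hypothetical names invented for this example; this is not Zoie's implementation, and the caller passes the clock value in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "flush on batch size OR batch delay"; not Zoie's actual code. */
final class BatchTrigger<T> {
    private final int batchSize;      // plays the role of constructor-arg index 5
    private final long batchDelayMs;  // plays the role of constructor-arg index 6 (unit assumed)
    private final List<T> queue = new ArrayList<T>();
    private long lastFlushMs;

    BatchTrigger(int batchSize, long batchDelayMs, long nowMs) {
        this.batchSize = batchSize;
        this.batchDelayMs = batchDelayMs;
        this.lastFlushMs = nowMs;
    }

    /** Queues one item; returns the batch to index if a threshold was reached, else null. */
    List<T> offer(T item, long nowMs) {
        queue.add(item);
        return flushIfDue(nowMs);
    }

    /** Returns the queued items if either condition holds, else null. */
    List<T> flushIfDue(long nowMs) {
        boolean sizeReached = queue.size() >= batchSize;
        boolean delayElapsed = !queue.isEmpty() && nowMs - lastFlushMs >= batchDelayMs;
        if (!sizeReached && !delayElapsed) {
            return null;
        }
        List<T> batch = new ArrayList<T>(queue);
        queue.clear();
        lastFlushMs = nowMs;
        return batch;
    }

    public static void main(String[] args) {
        BatchTrigger<String> trigger = new BatchTrigger<String>(1000, 300000L, 0L);
        System.out.println(trigger.offer("doc-1", 5L));       // neither threshold reached yet
        System.out.println(trigger.flushIfDue(300000L));      // delay elapsed, batch is returned
    }
}
```

With the values from the configuration (1000 items, 300000 delay), a slow trickle of updates would be flushed by the delay, while a burst would be flushed by the size limit.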


            

        

Basic Search

This example shows how to set up basic indexing and search, using one indexing thread and one search thread.

thread 1: (indexing thread)

    
long batchVersion = 0;
while(true){
  Data[] data = buildDataEvents(...); // build a batch of data objects to index
 
  // construct a collection of indexing events
  ArrayList<DataEvent<Data>> eventList = new ArrayList<DataEvent<Data>>(data.length);
  for (Data datum : data){
    eventList.add(new DataEvent<Data>(batchVersion,datum));
  }
 
  // do indexing (pass the list built above)
  indexingSystem.consume(eventList);
 
  // increment my version
  batchVersion++;
}

        

thread 2: (search thread)

    
// get the IndexReaders
List<ZoieIndexReader<MyDoNothingFilterIndexReader>> readerList = indexingSystem.getIndexReaders();
 
// MyDoNothingFilterIndexReader instances can be obtained by calling
// ZoieIndexReader.getDecoratedReaders()
 
List<MyDoNothingFilterIndexReader> decoratedReaders = ZoieIndexReader.extractDecoratedReaders(readerList);
SubReaderAccessor<MyDoNothingFilterIndexReader> subReaderAccessor = ZoieIndexReader.getSubReaderAccessor(decoratedReaders);
 
// combine the readers
MultiReader reader = new MultiReader(readerList.toArray(new IndexReader[readerList.size()]),false);
// do search
IndexSearcher searcher = new IndexSearcher(reader);
Query q = buildQuery("myquery",indexingSystem.getAnalyzer());
 
TopDocs docs = searcher.search(q,10);
 
ScoreDoc[] scoreDocs = docs.scoreDocs;
 
// convert to UID for each doc
for (ScoreDoc scoreDoc : scoreDocs){
   int docid = scoreDoc.doc;
 
   SubReaderInfo<MyDoNothingFilterIndexReader> readerInfo = subReaderAccessor.getSubReaderInfo(docid);
 
   long uid = (long)((ZoieIndexReader<MyDoNothingFilterIndexReader>)readerInfo.subreader.getInnerReader()).getUID(readerInfo.subdocid);
}
 
// return readers
indexingSystem.returnIndexReaders(readerList);
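
The search loop above maps each global docid from the combined reader back to a sub-reader and a local docid. One common way to do such a mapping, sketched here under the assumption that the sub-readers are simply concatenated in order, is a prefix-sum array plus a binary search. `DocIdResolver` is a hypothetical name for this illustration; Zoie's `SubReaderAccessor` may be implemented differently.

```java
import java.util.Arrays;

/** Illustrative only: resolves a global docid to {segment index, local docid}. */
final class DocIdResolver {
    // starts[i] is the first global docid of segment i; the last entry is the total doc count
    private final int[] starts;

    DocIdResolver(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Returns a two-element array: {segmentIndex, localDocId}. */
    int[] resolve(int docid) {
        if (docid < 0 || docid >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("docid out of range: " + docid);
        }
        int pos = Arrays.binarySearch(starts, docid);
        // exact hit: docid is the first doc of a segment; otherwise take the segment before
        // the insertion point
        int segment = pos >= 0 ? pos : -pos - 2;
        // skip empty segments, which share the same start value as the segment after them
        while (starts[segment + 1] == starts[segment]) {
            segment++;
        }
        return new int[] { segment, docid - starts[segment] };
    }

    public static void main(String[] args) {
        DocIdResolver resolver = new DocIdResolver(new int[] { 5, 0, 3 });
        System.out.println(Arrays.toString(resolver.resolve(6)));
    }
}
```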

Tags: database, java, lucene

Posted by IT瘾 on December 28, 2014 at 07:43:00 PM

Apache Solr vs ElasticSearch - the Feature Smackdown!

API

Feature	Solr 4.7.0	ElasticSearch 1.0
Format	XML,CSV,JSON	JSON
HTTP REST API
Binary API	SolrJ	TransportClient, Thrift (through a plugin)
JMX support		ES specific stats are exposed through the REST API
Client libraries	PHP, Ruby, Perl, Scala, Python, .NET, Javascript	PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Erlang, Clojure
3rd-party product integration (open-source)	Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna)	Drupal, Django, Symfony2, Wordpress, CouchBase
3rd-party product integration (commercial)	DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR	SearchBlox, Hortonworks Data Platform, MapR
Output	JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java	JSON, XML/HTML (via plugin)

Indexing

Feature	Solr 4.7.0	ElasticSearch 1.0
Data Import	DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File	Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication
DocValues
Partial Doc Updates	with stored fields	with _source field
Custom Analyzers and Tokenizers
Per-field analyzer chain
Per-doc/query analyzer chain
Synonyms		Supports Solr and Wordnet synonym format
Multiple indexes
Near-Realtime Search/Indexing
Complex documents	Flat document structure. No native support for nesting documents
Schemaless	4.4+
Multiple document types per schema	One set of fields per schema, one schema per core
Online schema changes	Schema change requires restart. Workaround possible using MultiCore.	Only backward-compatible changes.
Apache Tika integration
Dynamic fields
Field copying		via multi-fields
Hash-based deduplication

Searching

Feature	Solr 4.7.0	ElasticSearch 1.0
Lucene Query parsing
Structured Query DSL	Need to programmatically create queries if going beyond Lucene query syntax.
Span queries	via SOLR-2703
Spatial search
Multi-point spatial search
Faceting		Top-N facets are currently computed by getting the top N from each shard and merging the results. This can give incorrect counts when the number of shards is greater than 1.
Advanced Faceting		blog post
Pivot Facets
More Like This
Boosting by functions
Boosting using scripting languages
Push Queries	JIRA issue	Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping		possibly 1.0+ link
Spellcheck		Suggest API
Autocomplete		Added in 0.90.3 here
Query elevation		workaround
Joins	It's not supported in distributed search. See LUCENE-3759.	via has_children and top_children queries
Resultset Scrolling	New to 4.7.0	via scan search type
Filter queries		also supports filtering by native scripts
Filter execution order	local params and cache property	_cache and _cache_key property
Alternative QueryParsers	DisMax, eDisMax	query_string, dis_max, match, multi_match etc
Negative boosting	but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
Search across multiple indexes	it can search across multiple compatible collections
Result highlighting
Custom Similarity
Searcher warming on index reload		Warmers API
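
The Faceting row above notes that merging per-shard top-N lists can give incorrect counts. The following Java sketch is a simplified model of that effect, with invented shard data and hypothetical names (`ShardFacetMerge`, `mergeTopN`, `mergeAll`); it is not taken from ElasticSearch or Solr. It compares the merged per-shard top N against a full merge of all counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of merging per-shard top-N facet counts; not search-engine code. */
final class ShardFacetMerge {

    /** Returns the n largest entries; ties are broken by term so the order is deterministic. */
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return new ArrayList<Map.Entry<String, Integer>>(
                entries.subList(0, Math.min(n, entries.size())));
    }

    /** Each shard reports only its own top n; the coordinator sums what it receives. */
    static Map<String, Integer> mergeTopN(List<Map<String, Integer>> shards, int n) {
        Map<String, Integer> merged = new HashMap<String, Integer>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : topN(shard, n)) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    /** Reference result: sum every term's count across all shards. */
    static Map<String, Integer> mergeAll(List<Map<String, Integer>> shards) {
        Map<String, Integer> merged = new HashMap<String, Integer>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = new HashMap<String, Integer>();
        shard1.put("a", 10); shard1.put("b", 9); shard1.put("c", 8);
        Map<String, Integer> shard2 = new HashMap<String, Integer>();
        shard2.put("d", 11); shard2.put("c", 9); shard2.put("b", 1);
        List<Map<String, Integer>> shards = new ArrayList<Map<String, Integer>>();
        shards.add(shard1);
        shards.add(shard2);
        System.out.println("merged top-2 per shard: " + topN(mergeTopN(shards, 2), 1));
        System.out.println("full merge:             " + topN(mergeAll(shards), 1));
    }
}
```

In this invented data set, term `c` is only the third most frequent term on the first shard, so its count there never reaches the coordinator when N is 2. Requesting more than N terms from each shard narrows the gap at the cost of larger responses.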

Customizability

Feature	Solr 4.7.0	ElasticSearch 1.0
Pluggable API endpoints
Pluggable search workflow	via SearchComponents
Pluggable update workflow
Pluggable Analyzers/Tokenizers
Pluggable Field Types
Pluggable Function queries
Pluggable scoring scripts
Pluggable hashing
Pluggable webapps		site plugin
Automated plugin installation		Installable from GitHub, maven, sonatype or elasticsearch.org

Distributed

Feature	Solr 4.7.0	ElasticSearch 1.0
Self-contained cluster	Depends on separate ZooKeeper server	Only ElasticSearch nodes
Automatic node discovery	ZooKeeper	internal Zen Discovery or ZooKeeper
Partition tolerance	The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function.	Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. See this
Automatic failover	If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards.
Automatic leader election
Shard replication
Sharding
Automatic shard rebalancing		Can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes, and it can be configured not to place a shard and its replicas on nodes with the same tags.
Change # of shards	Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime.	each index has 5 shards by default. Number of primary shards cannot be changed once the index is created. Replicas can be increased anytime.
Relocate shards and replicas	Can be done by creating a shard replica on the desired node and then removing the shard from the source node	Can move shards and replicas to any node in the cluster on demand
Control shard routing	shards or _route_ parameter	routing parameter
Consistency	Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they have finished replicating the index.	Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per-document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available.
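
The Partition tolerance row above gives the rule of thumb N/2+1 for discovery.zen.minimum_master_nodes. The arithmetic can be sketched as follows; `QuorumMath` and its method names are hypothetical, and N is taken here to be the number of nodes counted toward the quorum, which the table describes as the size of the cluster.

```java
/** Illustrative quorum arithmetic for the N/2+1 rule; names are hypothetical. */
final class QuorumMath {

    /** Smallest group that is a strict majority of n nodes (integer division). */
    static int minimumMasterNodes(int n) {
        return n / 2 + 1;
    }

    /** True if a partition of the given size can still form a majority. */
    static boolean hasQuorum(int partitionSize, int clusterSize) {
        return partitionSize >= minimumMasterNodes(clusterSize);
    }

    public static void main(String[] args) {
        int clusterSize = 5;
        System.out.println("minimum_master_nodes = " + minimumMasterNodes(clusterSize));
        System.out.println("partition of 3 keeps working: " + hasQuorum(3, clusterSize));
        System.out.println("partition of 2 keeps working: " + hasQuorum(2, clusterSize));
    }
}
```

Because the threshold is a strict majority, at most one side of any split can satisfy it. An even-sized cluster split exactly in half leaves neither side with a quorum.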

Misc

Feature	Solr 4.7.0	ElasticSearch 1.0
Web Admin interface	bundled with Solr	via site plugins: elasticsearch-head, bigdesk, kopf, elasticsearch-HQ, Hammer
Hosting providers	WebSolr, Searchify, Hosted-Solr, IndexDepot, OpenSolr, gotosolr	bonsai.io, Indexisto, qbox.io, IndexDepot

Thoughts...

As a number of folks point out in the discussion below, feature comparisons are inherently shallow and only go so far. I think they serve a purpose, but shouldn't be taken to be the last word on these 2 fantastic search products.

If you're running a smallish site and need search features without fancy bells-and-whistles, I think you'll be very happy with either Solr or ElasticSearch.

I've found ElasticSearch to be friendlier to teams which are used to REST APIs, JSON etc and don't have a Java background. If you're planning a large installation that requires running distributed search instances, I suspect you're also going to be happier with ElasticSearch.

As Matt Weber points out below, ElasticSearch was built to be distributed from the ground up, not tacked on as an 'afterthought' like it was with Solr. This is totally evident when examining the design and architecture of the 2 products, and also when browsing the source code.

Resources

My other sites may be of interest if you're new to Lucene, Solr and ElasticSearch:
The Solr wiki and the ElasticSearch Guide are your friends.

Tags: database, java, lucene

Posted by IT瘾 on December 28, 2014 at 06:24:00 AM

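Location-Aware Full-Text Search with Lucene-Spatial - haiker - ITeye Technology Site

Lucene's Spatial package adds support for location-based full-text search. The most typical use case is: "find hot pot restaurants within 1 km of Zhongguancun, sorted by distance." Adding location support with Lucene-Spatial differs from ordinary text search in two main ways:

1. Convert the coordinates into Cartesian tiers and index them
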
     private void indexLocation(Document document, JSONObject jo)  
        throws Exception {  
  
    double longitude = jo.getDouble("longitude");  
    double latitude = jo.getDouble("latitude");  
  
    document.add(new Field("lat", NumericUtils  
            .doubleToPrefixCoded(latitude), Field.Store.YES,  
            Field.Index.NOT_ANALYZED));  
    document.add(new Field("lng", NumericUtils  
            .doubleToPrefixCoded(longitude), Field.Store.YES,  
            Field.Index.NOT_ANALYZED));  
  
    for (int tier = startTier; tier <= endTier; tier++) {  
        ctp = new CartesianTierPlotter(tier, projector,  
                CartesianTierPlotter.DEFALT_FIELD_PREFIX);  
        final double boxId = ctp.getTierBoxId(latitude, longitude);  
        document.add(new Field(ctp.getTierFieldName(), NumericUtils  
                .doubleToPrefixCoded(boxId), Field.Store.YES,  
                Field.Index.NOT_ANALYZED_NO_NORMS));  
    }  
}  

2. At search time, apply the distance filter built by DistanceQueryBuilder

DistanceQueryBuilder dq = new DistanceQueryBuilder(latitude,  
                longitude, miles, "lat", "lng",  
                CartesianTierPlotter.DEFALT_FIELD_PREFIX, true, startTier,  
                endTier);  
DistanceFieldComparatorSource dsort = new DistanceFieldComparatorSource(  
                dq.getDistanceFilter());  
Sort sort = new Sort(new SortField("geo_distance", dsort));  
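
The tier fields written in step 1 let a search narrow its candidates to a handful of grid boxes before exact distances are computed. The sketch below illustrates that general idea with a deliberately simplified grid: tier t splits the longitude and latitude ranges into 2^t cells each. This is an assumption made for illustration, not the formula used by `CartesianTierPlotter`, which works on projected coordinates and encodes box ids differently. `GridTierSketch` and `cellId` are hypothetical names.

```java
/** Simplified grid-tier illustration; NOT the CartesianTierPlotter algorithm. */
final class GridTierSketch {

    /** Returns a cell id for the point at the given tier (2^tier cells per axis). */
    static long cellId(double lat, double lng, int tier) {
        long cells = 1L << tier;
        // scale each coordinate to [0, cells) and clamp the upper edge into the last cell
        long x = Math.min(cells - 1, (long) ((lng + 180.0) / 360.0 * cells));
        long y = Math.min(cells - 1, (long) ((lat + 90.0) / 180.0 * cells));
        return y * cells + x;
    }

    public static void main(String[] args) {
        // two coordinates taken from the sample data used later in this article
        double lat1 = 39.9629015, lng1 = 116.3838183;
        double lat2 = 39.9629573, lng2 = 116.38629;
        for (int tier = 8; tier <= 20; tier += 4) {
            System.out.println("tier " + tier + ": "
                    + cellId(lat1, lng1, tier) + " vs " + cellId(lat2, lng2, tier));
        }
    }
}
```

Coarse tiers put nearby points in the same cell, while fine tiers separate them. Indexing one field per tier, as the loop in step 1 does, lets a query pick the tier whose cell size best matches the search radius.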

The complete code below is based on Lucene 3.2.0 and JUnit 4.8.2.

<dependencies>  
    <dependency>  
        <groupId>junit</groupId>  
        <artifactId>junit</artifactId>  
        <version>4.8.2</version>  
        <type>jar</type>  
        <scope>test</scope>  
    </dependency>  
    <dependency>  
        <groupId>org.apache.lucene</groupId>  
        <artifactId>lucene-core</artifactId>  
        <version>3.2.0</version>  
        <type>jar</type>  
        <scope>compile</scope>  
    </dependency>  
    <dependency>  
        <groupId>org.apache.lucene</groupId>  
        <artifactId>lucene-spatial</artifactId>  
        <version>3.2.0</version>  
        <type>jar</type>  
        <scope>compile</scope>  
    </dependency>  
    <dependency>  
        <groupId>org.json</groupId>  
        <artifactId>json</artifactId>  
        <version>20100903</version>  
        <type>jar</type>  
        <scope>compile</scope>  
    </dependency>  
</dependencies>  

First, prepare the test data:

{"id":12,"title":"时尚码头美容美发热烫特价","longitude":116.3838183,"latitude":39.9629015}  
{"id":17,"title":"审美个人美容美发套餐","longitude":116.386564,"latitude":39.966102}  
{"id":23,"title":"海底捞吃300送300","longitude":116.38629,"latitude":39.9629573}  
{"id":26,"title":"仅98元！享原价335元李老爹","longitude":116.3846175,"latitude":39.9629125}  
{"id":29,"title":"都美造型烫染美发护理套餐","longitude":116.38629,"latitude":39.9629573}  
{"id":30,"title":"仅售55元！原价80元的老舍茶馆相声下午场","longitude":116.0799914,"latitude":39.9655391}  
{"id":33,"title":"仅售55元！原价80元的新笑声客栈早场","longitude":116.0799914,"latitude":39.9655391}  
{"id":34,"title":"仅售39元（红色礼盒）！原价80元的平谷桃","longitude":116.0799914,"latitude":39.9655391}  
{"id":46,"title":"仅售38元！原价180元地质礼堂白雪公主","longitude":116.0799914,"latitude":39.9655391}  
{"id":49,"title":"仅99元！享原价342.7元自助餐","longitude":116.0799914,"latitude":39.9655391}  
{"id":58,"title":"桑海教育暑期学生报名培训九折优惠券","longitude":116.0799914,"latitude":39.9655391}  
{"id":59,"title":"全国发货：仅29元！贝玲妃超模粉红高光光","longitude":116.0799914,"latitude":39.9655391}  
{"id":65,"title":"海之屿生态水族用品店抵用券","longitude":116.0799914,"latitude":39.9655391}  
{"id":67,"title":"小区东门时尚烫染个人护理美发套餐","longitude":116.3799914,"latitude":39.9655391}  
{"id":74,"title":"《郭德纲相声专辑》CD套装","longitude":116.0799914,"latitude":39.9655391}  

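Based on the test data above, write test cases that search within 3 km of the coordinates (116.3838183, 39.9629015), once for the keyword "美发" (hairdressing) and once for all content. The two searches should return 4 and 6 results, respectively.
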
import static org.junit.Assert.assertEquals;  
import static org.junit.Assert.fail;  
  
import java.util.List;  
  
import org.junit.Test;  
  
public class LuceneSpatialTest {  
      
    private static LuceneSpatial spatialSearcher = new LuceneSpatial();  
  
    @Test  
    public void testSearch() {  
        try {  
            long start = System.currentTimeMillis();  
            List<String> results = spatialSearcher.search("美发", 116.3838183, 39.9629015, 3.0);  
            System.out.println(results.size()  
                    + "个匹配结果，共耗时 "  
                    + (System.currentTimeMillis() - start) + "毫秒。\n");  
            assertEquals(4, results.size());  
        } catch (Exception e) {  
            // print the stack trace first: fail() throws, so a statement after it never runs  
            e.printStackTrace();  
            fail("Exception occurs...");  
        }  
    }  
  
    @Test  
    public void testSearchWithoutKeyword() {  
        try {  
            long start = System.currentTimeMillis();  
            List<String> results = spatialSearcher.search(null, 116.3838183, 39.9629015, 3.0);  
            System.out.println( results.size()  
                    + "个匹配结果，共耗时 "  
                    + (System.currentTimeMillis() - start) + "毫秒.\n");  
            assertEquals(6, results.size());  
        } catch (Exception e) {  
            // print the stack trace first: fail() throws, so a statement after it never runs  
            e.printStackTrace();  
            fail("Exception occurs...");  
        }  
    }  
}  

Below is the LuceneSpatial class. Its constructor initializes the fields and creates the index:

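import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.QueryParser.Operator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.spatial.tier.DistanceFieldComparatorSource;
import org.apache.lucene.spatial.tier.DistanceQueryBuilder;
import org.apache.lucene.spatial.tier.projections.CartesianTierPlotter;
import org.apache.lucene.spatial.tier.projections.IProjector;
import org.apache.lucene.spatial.tier.projections.SinusoidalProjector;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.util.Version;
import org.json.JSONObject;

public class LuceneSpatial {  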
  
    private Analyzer analyzer;  
    private IndexWriter writer;  
    private FSDirectory indexDirectory;  
    private IndexSearcher indexSearcher;  
    private IndexReader indexReader;  
    private String indexPath = "c:/lucene-spatial";  
  
    // Spatial  
    private IProjector projector;  
    private CartesianTierPlotter ctp;  
    public static final double RATE_MILE_TO_KM = 1.609344; // kilometers per mile  
    public static final String LAT_FIELD = "lat";  
    public static final String LON_FIELD = "lng";  
    private static final double MAX_RANGE = 15.0; // maximum range supported by the index, in kilometers  
    private static final double MIN_RANGE = 3.0;  // minimum range supported by the index, in kilometers  
    private int startTier;  
    private int endTier;  
  
    public LuceneSpatial() {  
        try {  
            init();  
        } catch (Exception e) {  
            e.printStackTrace();  
        }  
    }  
  
    private void init() throws Exception {  
        initializeSpatialOptions();  
  
        analyzer = new StandardAnalyzer(Version.LUCENE_32);  
  
        File path = new File(indexPath);  
  
        boolean isNeedCreateIndex = false;  
  
        if (path.exists() && !path.isDirectory())  
            throw new Exception("Specified path is not a directory");  
  
        if (!path.exists()) {  
            path.mkdirs();  
            isNeedCreateIndex = true;  
        }  
  
        indexDirectory = FSDirectory.open(new File(indexPath));  
  
        // build the index  
        if (isNeedCreateIndex) {  
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(  
                    Version.LUCENE_32, analyzer);  
            indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);  
            writer = new IndexWriter(indexDirectory, indexWriterConfig);  
            buildIndex();  
        }  
  
        indexReader = IndexReader.open(indexDirectory, true);  
        indexSearcher = new IndexSearcher(indexReader);  
  
    }  
  
    @SuppressWarnings("deprecation")  
    private void initializeSpatialOptions() {  
        projector = new SinusoidalProjector();  
        ctp = new CartesianTierPlotter(0, projector,  
                CartesianTierPlotter.DEFALT_FIELD_PREFIX);  
        startTier = ctp.bestFit(MAX_RANGE / RATE_MILE_TO_KM);  
        endTier = ctp.bestFit(MIN_RANGE / RATE_MILE_TO_KM);  
    }  
  
  
  
    private int mile2Meter(double miles) {  
        double dMeter = miles * RATE_MILE_TO_KM * 1000;  
  
        return (int) dMeter;  
    }  
  
    private double km2Mile(double km) {  
        return km / RATE_MILE_TO_KM;  
    }  

The index-building implementation:

private void buildIndex() {  
    BufferedReader br = null;  
    try {  
        // add the test data to the index line by line; the data file is in the same directory as the source file  
        br = new BufferedReader(new InputStreamReader(  
                LuceneSpatial.class.getResourceAsStream("data")));  
        String line = null;  
        while ((line = br.readLine()) != null) {  
            index(new JSONObject(line));  
        }  
  
        writer.commit();  
    } catch (Exception e) {  
        e.printStackTrace();  
    } finally {  
        if (br != null) {  
            try {  
                br.close();  
            } catch (IOException e) {  
                e.printStackTrace();  
            }  
        }  
    }  
}  
  
private void index(JSONObject jo) throws Exception {  
    Document doc = new Document();  
  
    doc.add(new Field("id", jo.getString("id"), Field.Store.YES,  
            Field.Index.ANALYZED));  
  
    doc.add(new Field("title", jo.getString("title"), Field.Store.YES,  
            Field.Index.ANALYZED));  
  
    // add the location information to the index  
    indexLocation(doc, jo);  
  
    writer.addDocument(doc);  
}  
  
private void indexLocation(Document document, JSONObject jo)  
        throws Exception {  
  
    double longitude = jo.getDouble("longitude");  
    double latitude = jo.getDouble("latitude");  
  
    document.add(new Field("lat", NumericUtils  
            .doubleToPrefixCoded(latitude), Field.Store.YES,  
            Field.Index.NOT_ANALYZED));  
    document.add(new Field("lng", NumericUtils  
            .doubleToPrefixCoded(longitude), Field.Store.YES,  
            Field.Index.NOT_ANALYZED));  
  
    for (int tier = startTier; tier <= endTier; tier++) {  
        ctp = new CartesianTierPlotter(tier, projector,  
                CartesianTierPlotter.DEFALT_FIELD_PREFIX);  
        final double boxId = ctp.getTierBoxId(latitude, longitude);  
        document.add(new Field(ctp.getTierFieldName(), NumericUtils  
                .doubleToPrefixCoded(boxId), Field.Store.YES,  
                Field.Index.NOT_ANALYZED_NO_NORMS));  
    }  
}  

The search implementation:

public List<String> search(String keyword, double longitude,  
        double latitude, double range) throws Exception {  
    List<String> result = new ArrayList<String>();  
  
    double miles = km2Mile(range);  
      
    DistanceQueryBuilder dq = new DistanceQueryBuilder(latitude,  
            longitude, miles, "lat", "lng",  
            CartesianTierPlotter.DEFALT_FIELD_PREFIX, true, startTier,  
            endTier);  
  
    // sort by distance  
    DistanceFieldComparatorSource dsort = new DistanceFieldComparatorSource(  
            dq.getDistanceFilter());  
    Sort sort = new Sort(new SortField("geo_distance", dsort));  
  
    Query query = buildQuery(keyword);  
  
    // run the search  
    TopDocs hits = indexSearcher.search(query, dq.getFilter(),  
            Integer.MAX_VALUE, sort);  
    // get the distance that corresponds to each hit  
    Map<Integer, Double> distances = dq.getDistanceFilter()  
            .getDistances();  
  
    for (int i = 0; i < hits.totalHits; i++) {  
        final int docID = hits.scoreDocs[i].doc;  
  
        final Document doc = indexSearcher.doc(docID);  
  
        final StringBuilder builder = new StringBuilder();  
        builder.append("找到了: ")  
                .append(doc.get("title"))  
                .append("， 距离: ")  
                .append(mile2Meter(distances.get(docID)))  
                .append("米。");  
        System.out.println(builder.toString());  
  
        result.add(builder.toString());  
    }  
  
    return result;  
}  
  
private Query buildQuery(String keyword) throws Exception {  
    // if no keyword is given, return every result within range  
    if (keyword == null || keyword.isEmpty()) {  
        return new MatchAllDocsQuery();  
    }  
    QueryParser parser = new QueryParser(Version.LUCENE_32, "title",  
            analyzer);  
  
    parser.setDefaultOperator(Operator.AND);  
  
    return parser.parse(keyword.toString());  
}  

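Running the test cases produces the following output, where 找到了 means "found", 距离 means "distance", 米 means "meters", 个匹配结果 means "matching results", and 共耗时 N 毫秒 means "took N ms":
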
找到了: 时尚码头美容美发热烫特价， 距离: 0米。  
找到了: 都美造型烫染美发护理套餐， 距离: 210米。  
找到了: 审美个人美容美发套餐， 距离: 426米。  
找到了: 小区东门时尚烫染个人护理美发套餐， 距离: 439米。  
4个匹配结果，共耗时 119毫秒。  
  
找到了: 时尚码头美容美发热烫特价， 距离: 0米。  
找到了: 仅98元！享原价335元李老爹， 距离: 68米。  
找到了: 海底捞吃300送300， 距离: 210米。  
找到了: 都美造型烫染美发护理套餐， 距离: 210米。  
找到了: 审美个人美容美发套餐， 距离: 426米。  
找到了: 小区东门时尚烫染个人护理美发套餐， 距离: 439米。  
6个匹配结果，共耗时 3毫秒.  
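
The distances in the output above can be sanity-checked independently of Lucene with the haversine great-circle formula. The sketch below is only such a cross-check: `HaversineCheck` is a hypothetical helper, and the mean Earth radius it uses is an assumption, so its results should be close to the values Lucene reports without necessarily matching them to the meter.

```java
/** Independent cross-check of point-to-point distances; not part of Lucene-Spatial. */
final class HaversineCheck {
    // mean Earth radius in meters (assumed value; Lucene's spatial code may use another)
    static final double EARTH_RADIUS_M = 6371008.8;

    /** Great-circle distance in meters between two points given in degrees. */
    static double distanceMeters(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // search center used by the test cases
        double lat = 39.9629015, lng = 116.3838183;
        // coordinates copied from the test data: ids 23, 67 and 30
        System.out.println("id 23: " + distanceMeters(lat, lng, 39.9629573, 116.38629) + " m");
        System.out.println("id 67: " + distanceMeters(lat, lng, 39.9655391, 116.3799914) + " m");
        System.out.println("id 30: " + distanceMeters(lat, lng, 39.9655391, 116.0799914) + " m");
    }
}
```

The records at longitude 116.0799914 are tens of kilometers from the search center, which is why they never appear in the 3 km results.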

References:

How Lucene-Spatial works: http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm

GeoHash: http://en.wikipedia.org/wiki/Geohash

Examples (most of the code here comes from them):

Spatial search with Lucene

Lucene Spatial Example

Location-aware search with Apache Lucene and Solr

Tags: database, java, lucene

Posted by IT瘾 on December 28, 2014 at 06:17:00 AM