
Code Samples - Zoie - Confluence

Zoie is a real-time search and indexing system built on Apache Lucene.

Zoie was donated by LinkedIn on July 19, 2008, and has been deployed on a real-time, large-scale consumer website: LinkedIn.com, which handles millions of searches and millions of updates daily.

 

Configuration

Zoie can be configured via Spring:

<!-- An instance of a DataProvider:
     FileDataProvider recurses through a given directory and feeds the DataConsumer
     indexing requests built from the gathered files.
     In this example, the provider must be started manually, which is done via JMX.
-->
<bean id="dataprovider" class="proj.zoie.impl.indexing.FileDataProvider">
  <constructor-arg value="file:${source.directory}"/>
  <property name="dataConsumer" ref="indexingSystem" />
</bean>
 
 
<!--
  An instance of an IndexableInterpreter:
  FileIndexableInterpreter converts a text file into a Lucene document; for example
  purposes only.
-->
<bean id="fileInterpreter" class="proj.zoie.impl.indexing.FileIndexableInterpreter" />
 
<!-- A decorator for an IndexReader instance:
     The default decorator is a pass-through: the input IndexReader is returned unchanged.
-->
<bean id="idxDecorator" class="proj.zoie.impl.indexing.DefaultIndexReaderDecorator" />
 
<!-- A zoie system declaration, passed as a DataConsumer to the DataProvider declared above -->
<bean id="indexingSystem" class="proj.zoie.impl.indexing.ZoieSystem" init-method="start" destroy-method="shutdown">
 
  <!-- disk index directory-->
  <constructor-arg index="0" value="file:${index.directory}"/>
 
  <!-- sets the interpreter -->
  <constructor-arg index="1" ref="fileInterpreter" />
 
  <!-- sets the decorator -->
  <constructor-arg index="2">
    <ref bean="idxDecorator"/>
  </constructor-arg>
 
  <!-- sets the Analyzer; if null is passed, Lucene's StandardAnalyzer is used -->
  <constructor-arg index="3">
    <null/>
  </constructor-arg>
 
  <!-- sets the Similarity; if null is passed, Lucene's DefaultSimilarity is used -->
  <constructor-arg index="4">
    <null/>
  </constructor-arg>
 
  <!-- The following two parameters control when batched indexing is triggered;
       indexing runs as soon as either threshold is reached
  -->
 
  <!-- Batch size: how many items to put on the queue before indexing is triggered -->
  <constructor-arg index="5" value="1000" />
 
  <!-- Batch delay: how long (in ms) to wait before indexing is triggered -->
  <constructor-arg index="6" value="300000" />
 
  <!-- flag turning on/off real time indexing -->
  <constructor-arg index="7" value="true" />
</bean>
 
<!-- a search service -->
<bean id="mySearchService" class="com.mycompany.search.SearchService">
  <!-- IndexReader factory that produces index readers to build Searchers from -->
  <constructor-arg ref="indexingSystem" />
</bean>

Basic Search

This example shows how to set up basic indexing and search:

thread 1: (indexing thread)

long batchVersion = 0;
while (true) {
  Data[] data = buildDataEvents(...); // build a batch of data objects to index

  // construct a collection of indexing events
  ArrayList<DataEvent<Data>> eventList = new ArrayList<DataEvent<Data>>(data.length);
  for (Data datum : data) {
    eventList.add(new DataEvent<Data>(batchVersion, datum));
  }

  // do indexing
  indexingSystem.consume(eventList);

  // increment my version
  batchVersion++;
}

thread 2: (search thread)

// get the IndexReaders
List<ZoieIndexReader<MyDoNothingFilterIndexReader>> readerList = indexingSystem.getIndexReaders();
 
// extract the decorated MyDoNothingFilterIndexReader instances from the Zoie readers
 
List<MyDoNothingFilterIndexReader> decoratedReaders = ZoieIndexReader.extractDecoratedReaders(readerList);
SubReaderAccessor<MyDoNothingFilterIndexReader> subReaderAccessor = ZoieIndexReader.getSubReaderAccessor(decoratedReaders);
 
// combine the readers
MultiReader reader = new MultiReader(readerList.toArray(new IndexReader[readerList.size()]),false);
// do search
IndexSearcher searcher = new IndexSearcher(reader);
Query q = buildQuery("myquery",indexingSystem.getAnalyzer());
 
TopDocs docs = searcher.search(q,10);
 
ScoreDoc[] scoreDocs = docs.scoreDocs;
 
// convert to UID for each doc
for (ScoreDoc scoreDoc : scoreDocs){
   int docid = scoreDoc.doc;
 
   SubReaderInfo<MyDoNothingFilterIndexReader> readerInfo = subReaderAccessor.getSubReaderInfo(docid);
 
   long uid = (long)((ZoieIndexReader<MyDoNothingFilterIndexReader>)readerInfo.subreader.getInnerReader()).getUID(readerInfo.subdocid);
}
 
 
// return readers
indexingSystem.returnIndexReaders(readerList);


Apache Solr vs ElasticSearch - the Feature Smackdown!

API

Feature Solr 4.7.0 ElasticSearch 1.0
Format XML,CSV,JSON JSON
HTTP REST API
Binary API   SolrJ  TransportClient, Thrift (through a plugin)
JMX support  ES-specific stats are exposed through the REST API
Client libraries  PHP, Ruby, Perl, Scala, Python, .NET, Javascript PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Erlang, Clojure
3rd-party product integration (open-source) Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) Drupal, Django, Symfony2, Wordpress, CouchBase
3rd-party product integration (commercial) DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR SearchBlox, Hortonworks Data Platform, MapR
Output JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java JSON, XML/HTML (via plugin)

 

Indexing

Feature Solr 4.7.0 ElasticSearch 1.0
Data Import DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia
ID field for updates and deduplication
DocValues 
Partial Doc Updates   with stored fields  with _source field
Custom Analyzers and Tokenizers 
Per-field analyzer chain 
Per-doc/query analyzer chain 
Synonyms   Supports Solr and Wordnet synonym format
Multiple indexes 
Near-Realtime Search/Indexing 
Complex documents   Flat document structure. No native support for nesting documents
Schemaless   4.4+
Multiple document types per schema   One set of fields per schema, one schema per core
Online schema changes   Schema change requires restart. Workaround possible using MultiCore.  Only backward-compatible changes.
Apache Tika integration 
Dynamic fields 
Field copying   via multi-fields
Hash-based deduplication 

 

Searching

Feature Solr 4.7.0 ElasticSearch 1.0
Lucene Query parsing 
Structured Query DSL   Need to programmatically create queries if going beyond Lucene query syntax.
Span queries   via SOLR-2703
Spatial search 
Multi-point spatial search 
Faceting   The way top N facets work now is by getting the top N from each shard, and merging the results. This can give incorrect counts when num shards > 1.
Advanced Faceting   blog post
Pivot Facets 
More Like This
Boosting by functions 
Boosting using scripting languages 
Push Queries  JIRA issue  Percolation. Distributed percolation supported in 1.0
Field collapsing/Results grouping   possibly 1.0+ link
Spellcheck  Suggest API
Autocomplete  Added in 0.90.3 here
Query elevation  workaround
Joins   It's not supported in distributed search. See LUCENE-3759.  via has_children and top_children queries
Resultset Scrolling   New to 4.7.0  via scan search type
Filter queries   also supports filtering by native scripts
Filter execution order   local params and cache property  _cache and _cache_key property
Alternative QueryParsers   DisMax, eDisMax  query_string, dis_max, match, multi_match etc
Negative boosting   but awkward. Involves positively boosting the inverse set of negatively-boosted documents.
Search across multiple indexes  it can search across multiple compatible collections
Result highlighting
Custom Similarity 
Searcher warming on index reload   Warmers API
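
The Faceting caveat in the table above (top N from each shard, then merge) is worth a concrete illustration. The following self-contained Java simulation uses made-up shard counts — the class and data are hypothetical and not tied to Solr or ElasticSearch internals — to show how a term can be undercounted when each shard reports only its local top N:

```java
import java.util.*;

// Simulates distributed faceting: each shard reports only its local top-N
// term counts, and the coordinator sums whatever it received. Hypothetical data.
public class FacetMergeDemo {

    public static Map<String, Integer> mergeTopN(List<Map<String, Integer>> shards, int n) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            shard.entrySet().stream()
                    .sorted((a, b) -> b.getValue() - a.getValue())
                    .limit(n)
                    .forEach(e -> merged.merge(e.getKey(), e.getValue(), Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("red", 10); shard1.put("green", 9); shard1.put("blue", 8);
        Map<String, Integer> shard2 = new HashMap<>();
        shard2.put("red", 10); shard2.put("blue", 9); shard2.put("green", 8);

        // "green" occurs 17 times in total, but falls below the top-2 cut-off
        // on shard2, so the merged count is only 9
        System.out.println(mergeTopN(Arrays.asList(shard1, shard2), 2));
    }
}
```

Both products mitigate this in practice (for example by over-requesting terms from each shard), so treat this purely as an illustration of the failure mode the table alludes to.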

 

Customizability

Feature Solr 4.7.0 ElasticSearch 1.0
Pluggable API endpoints 
Pluggable search workflow   via SearchComponents
Pluggable update workflow 
Pluggable Analyzers/Tokenizers
Pluggable Field Types
Pluggable Function queries
Pluggable scoring scripts
Pluggable hashing 
Pluggable webapps   site plugin
Automated plugin installation   Installable from GitHub, maven, sonatype or elasticsearch.org

 

Distributed

Feature Solr 4.7.0 ElasticSearch 1.0
Self-contained cluster   Depends on a separate ZooKeeper server  Only ElasticSearch nodes
Automatic node discovery  ZooKeeper  internal Zen Discovery or ZooKeeper
Partition tolerance  The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function.  Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes is set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. See this
Automatic failover  If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards.
Automatic leader election
Shard replication
Sharding 
Automatic shard rebalancing  it can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes and it can be configured to not assign the same shard and its replicas to nodes with the same tags.
Change # of shards  Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime.  each index has 5 shards by default. Number of primary shards cannot be changed once the index is created. Replicas can be increased anytime.
Relocate shards and replicas   can be done by creating a shard replicate on the desired node and then removing the shard from the source node  can move shards and replicas to any node in the cluster on demand
Control shard routing   shards or _route_ parameter  routing parameter
Consistency Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas; they will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they have finished replicating the index. Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per-document basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available.

 

Misc

Feature Solr 4.7.0 ElasticSearch 1.0
Web Admin interface  bundled with Solr  via site plugins: elasticsearch-head, bigdesk, kopf, elasticsearch-HQ, Hammer
Hosting providers  WebSolr, Searchify, Hosted-Solr, IndexDepot, OpenSolr, gotosolr  bonsai.io, Indexisto, qbox.io, IndexDepot

 


Thoughts...

As a number of folks point out in the discussion below, feature comparisons are inherently shallow and only go so far. I think they serve a purpose, but shouldn't be taken to be the last word on these 2 fantastic search products.

If you're running a smallish site and need search features without fancy bells-and-whistles, I think you'll be very happy with either Solr or ElasticSearch.

I've found ElasticSearch to be friendlier to teams which are used to REST APIs, JSON etc and don't have a Java background. If you're planning a large installation that requires running distributed search instances, I suspect you're also going to be happier with ElasticSearch.

As Matt Weber points out below, ElasticSearch was built to be distributed from the ground up, not tacked on as an 'afterthought' like it was with Solr. This is totally evident when examining the design and architecture of the 2 products, and also when browsing the source code.

 


Resources


Implementing Geo-Aware Full-Text Search with Lucene-Spatial - haiker - ITeye

Lucene supports location-based full-text search through its Spatial package. The canonical scenario: "find hot-pot restaurants within 1 km of Zhongguancun, sorted by distance." Adding location support with Lucene-Spatial differs from ordinary text search in two main ways:

        1. Convert the coordinates into Cartesian tiers when building the index

 

private void indexLocation(Document document, JSONObject jo)
        throws Exception {

    double longitude = jo.getDouble("longitude");
    double latitude = jo.getDouble("latitude");

    document.add(new Field("lat", NumericUtils
            .doubleToPrefixCoded(latitude), Field.Store.YES,
            Field.Index.NOT_ANALYZED));
    document.add(new Field("lng", NumericUtils
            .doubleToPrefixCoded(longitude), Field.Store.YES,
            Field.Index.NOT_ANALYZED));

    for (int tier = startTier; tier <= endTier; tier++) {
        ctp = new CartesianTierPlotter(tier, projector,
                CartesianTierPlotter.DEFALT_FIELD_PREFIX);
        final double boxId = ctp.getTierBoxId(latitude, longitude);
        document.add(new Field(ctp.getTierFieldName(), NumericUtils
                .doubleToPrefixCoded(boxId), Field.Store.YES,
                Field.Index.NOT_ANALYZED_NO_NORMS));
    }
}


        2. At search time, specify a DistanceQueryFilter

 

 

DistanceQueryBuilder dq = new DistanceQueryBuilder(latitude,
        longitude, miles, "lat", "lng",
        CartesianTierPlotter.DEFALT_FIELD_PREFIX, true, startTier,
        endTier);
DistanceFieldComparatorSource dsort = new DistanceFieldComparatorSource(
        dq.getDistanceFilter());
Sort sort = new Sort(new SortField("geo_distance", dsort));


      Below is the complete code, based on Lucene 3.2.0 and JUnit 4.8.2.

 

 

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.8.2</version>
        <type>jar</type>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>3.2.0</version>
        <type>jar</type>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-spatial</artifactId>
        <version>3.2.0</version>
        <type>jar</type>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20100903</version>
        <type>jar</type>
        <scope>compile</scope>
    </dependency>
</dependencies>

 

 

        First, prepare the test data:

 

{"id":12,"title":"时尚码头美容美发热烫特价","longitude":116.3838183,"latitude":39.9629015}
{"id":17,"title":"审美个人美容美发套餐","longitude":116.386564,"latitude":39.966102}
{"id":23,"title":"海底捞吃300送300","longitude":116.38629,"latitude":39.9629573}
{"id":26,"title":"仅98元!享原价335元李老爹","longitude":116.3846175,"latitude":39.9629125}
{"id":29,"title":"都美造型烫染美发护理套餐","longitude":116.38629,"latitude":39.9629573}
{"id":30,"title":"仅售55元!原价80元的老舍茶馆相声下午场","longitude":116.0799914,"latitude":39.9655391}
{"id":33,"title":"仅售55元!原价80元的新笑声客栈早场","longitude":116.0799914,"latitude":39.9655391}
{"id":34,"title":"仅售39元(红色礼盒)!原价80元的平谷桃","longitude":116.0799914,"latitude":39.9655391}
{"id":46,"title":"仅售38元!原价180元地质礼堂白雪公主","longitude":116.0799914,"latitude":39.9655391}
{"id":49,"title":"仅99元!享原价342.7元自助餐","longitude":116.0799914,"latitude":39.9655391}
{"id":58,"title":"桑海教育暑期学生报名培训九折优惠券","longitude":116.0799914,"latitude":39.9655391}
{"id":59,"title":"全国发货:仅29元!贝玲妃超模粉红高光光","longitude":116.0799914,"latitude":39.9655391}
{"id":65,"title":"海之屿生态水族用品店抵用券","longitude":116.0799914,"latitude":39.9655391}
{"id":67,"title":"小区东门时尚烫染个人护理美发套餐","longitude":116.3799914,"latitude":39.9655391}
{"id":74,"title":"《郭德纲相声专辑》CD套装","longitude":116.0799914,"latitude":39.9655391}


     Based on the test data above, we write test cases that search within 3 km of the coordinates (116.3838183, 39.9629015): once for "美发" (hair styling) and once with no keyword. They should return 4 and 6 results respectively.
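
As a quick sanity check on those expected distances, the great-circle distance between the query point and, say, item id=17 can be computed directly. This standalone sketch is not part of the original example: it uses the haversine formula on a spherical-Earth approximation rather than the Cartesian-tier math, but it lands close to the 426 m the sample output reports:

```java
// Standalone sanity check (not from the original article): haversine distance
// between the query point and item id=17, spherical Earth with R = 6371 km.
public class HaversineCheck {

    static final double EARTH_RADIUS_M = 6371000.0;

    public static double distanceMeters(double lat1, double lon1,
                                        double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_M * 2 * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // query point (39.9629015, 116.3838183) vs. item id=17 (39.966102, 116.386564)
        System.out.println(Math.round(
                distanceMeters(39.9629015, 116.3838183, 39.966102, 116.386564))); // ~426
    }
}
```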

 

 

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import java.util.List;

import org.junit.Test;

public class LuceneSpatialTest {

    private static LuceneSpatial spatialSearcher = new LuceneSpatial();

    @Test
    public void testSearch() {
        try {
            long start = System.currentTimeMillis();
            List<String> results = spatialSearcher.search("美发", 116.3838183, 39.9629015, 3.0);
            System.out.println(results.size()
                    + "个匹配结果,共耗时 "
                    + (System.currentTimeMillis() - start) + "毫秒。\n");
            assertEquals(4, results.size());
        } catch (Exception e) {
            e.printStackTrace();
            fail("Exception occurs...");
        }
    }

    @Test
    public void testSearchWithoutKeyword() {
        try {
            long start = System.currentTimeMillis();
            List<String> results = spatialSearcher.search(null, 116.3838183, 39.9629015, 3.0);
            System.out.println(results.size()
                    + "个匹配结果,共耗时 "
                    + (System.currentTimeMillis() - start) + "毫秒.\n");
            assertEquals(6, results.size());
        } catch (Exception e) {
            e.printStackTrace();
            fail("Exception occurs...");
        }
    }
}


         Below is the LuceneSpatial class; its constructor initializes the fields and creates the index:

 

public class LuceneSpatial {

    private Analyzer analyzer;
    private IndexWriter writer;
    private FSDirectory indexDirectory;
    private IndexSearcher indexSearcher;
    private IndexReader indexReader;
    private String indexPath = "c:/lucene-spatial";

    // Spatial
    private IProjector projector;
    private CartesianTierPlotter ctp;
    public static final double RATE_MILE_TO_KM = 1.609344; // kilometers per mile
    public static final String LAT_FIELD = "lat";
    public static final String LON_FIELD = "lng";
    private static final double MAX_RANGE = 15.0; // largest search radius the index supports, in km
    private static final double MIN_RANGE = 3.0;  // smallest search radius the index supports, in km
    private int startTier;
    private int endTier;

    public LuceneSpatial() {
        try {
            init();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void init() throws Exception {
        initializeSpatialOptions();

        analyzer = new StandardAnalyzer(Version.LUCENE_32);

        File path = new File(indexPath);

        boolean isNeedCreateIndex = false;

        if (path.exists() && !path.isDirectory())
            throw new Exception("Specified path is not a directory");

        if (!path.exists()) {
            path.mkdirs();
            isNeedCreateIndex = true;
        }

        indexDirectory = FSDirectory.open(new File(indexPath));

        // build the index
        if (isNeedCreateIndex) {
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(
                    Version.LUCENE_32, analyzer);
            indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
            writer = new IndexWriter(indexDirectory, indexWriterConfig);
            buildIndex();
        }

        indexReader = IndexReader.open(indexDirectory, true);
        indexSearcher = new IndexSearcher(indexReader);
    }

    @SuppressWarnings("deprecation")
    private void initializeSpatialOptions() {
        projector = new SinusoidalProjector();
        ctp = new CartesianTierPlotter(0, projector,
                CartesianTierPlotter.DEFALT_FIELD_PREFIX);
        startTier = ctp.bestFit(MAX_RANGE / RATE_MILE_TO_KM);
        endTier = ctp.bestFit(MIN_RANGE / RATE_MILE_TO_KM);
    }

    private int mile2Meter(double miles) {
        double dMeter = miles * RATE_MILE_TO_KM * 1000;
        return (int) dMeter;
    }

    private double km2Mile(double km) {
        return km / RATE_MILE_TO_KM;
    }

 

 

              Building the index:

 

private void buildIndex() {
    BufferedReader br = null;
    try {
        // add the test data to the index line by line; the data file
        // lives in the same directory as the source file
        br = new BufferedReader(new InputStreamReader(
                LuceneSpatial.class.getResourceAsStream("data")));
        String line = null;
        while ((line = br.readLine()) != null) {
            index(new JSONObject(line));
        }

        writer.commit();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (br != null) {
            try {
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

private void index(JSONObject jo) throws Exception {
    Document doc = new Document();

    doc.add(new Field("id", jo.getString("id"), Field.Store.YES,
            Field.Index.ANALYZED));

    doc.add(new Field("title", jo.getString("title"), Field.Store.YES,
            Field.Index.ANALYZED));

    // add the location information to the index
    indexLocation(doc, jo);

    writer.addDocument(doc);
}

private void indexLocation(Document document, JSONObject jo)
        throws Exception {

    double longitude = jo.getDouble("longitude");
    double latitude = jo.getDouble("latitude");

    document.add(new Field("lat", NumericUtils
            .doubleToPrefixCoded(latitude), Field.Store.YES,
            Field.Index.NOT_ANALYZED));
    document.add(new Field("lng", NumericUtils
            .doubleToPrefixCoded(longitude), Field.Store.YES,
            Field.Index.NOT_ANALYZED));

    for (int tier = startTier; tier <= endTier; tier++) {
        ctp = new CartesianTierPlotter(tier, projector,
                CartesianTierPlotter.DEFALT_FIELD_PREFIX);
        final double boxId = ctp.getTierBoxId(latitude, longitude);
        document.add(new Field(ctp.getTierFieldName(), NumericUtils
                .doubleToPrefixCoded(boxId), Field.Store.YES,
                Field.Index.NOT_ANALYZED_NO_NORMS));
    }
}


          The search implementation:

 

 

public List<String> search(String keyword, double longitude,
        double latitude, double range) throws Exception {
    List<String> result = new ArrayList<String>();

    double miles = km2Mile(range);

    DistanceQueryBuilder dq = new DistanceQueryBuilder(latitude,
            longitude, miles, "lat", "lng",
            CartesianTierPlotter.DEFALT_FIELD_PREFIX, true, startTier,
            endTier);

    // sort results by distance
    DistanceFieldComparatorSource dsort = new DistanceFieldComparatorSource(
            dq.getDistanceFilter());
    Sort sort = new Sort(new SortField("geo_distance", dsort));

    Query query = buildQuery(keyword);

    // run the search
    TopDocs hits = indexSearcher.search(query, dq.getFilter(),
            Integer.MAX_VALUE, sort);
    // the distance corresponding to each hit, keyed by docid
    Map<Integer, Double> distances = dq.getDistanceFilter()
            .getDistances();

    for (int i = 0; i < hits.totalHits; i++) {
        final int docID = hits.scoreDocs[i].doc;

        final Document doc = indexSearcher.doc(docID);

        final StringBuilder builder = new StringBuilder();
        builder.append("找到了: ")
                .append(doc.get("title"))
                .append(", 距离: ")
                .append(mile2Meter(distances.get(docID)))
                .append("米。");
        System.out.println(builder.toString());

        result.add(builder.toString());
    }

    return result;
}

private Query buildQuery(String keyword) throws Exception {
    // with no keyword, return everything within range
    if (keyword == null || keyword.isEmpty()) {
        return new MatchAllDocsQuery();
    }
    QueryParser parser = new QueryParser(Version.LUCENE_32, "title",
            analyzer);

    parser.setDefaultOperator(Operator.AND);

    return parser.parse(keyword);
}

       

 

             Running the test cases produces the following output:

 

找到了: 时尚码头美容美发热烫特价, 距离: 0米。
找到了: 都美造型烫染美发护理套餐, 距离: 210米。
找到了: 审美个人美容美发套餐, 距离: 426米。
找到了: 小区东门时尚烫染个人护理美发套餐, 距离: 439米。
4个匹配结果,共耗时 119毫秒。

找到了: 时尚码头美容美发热烫特价, 距离: 0米。
找到了: 仅98元!享原价335元李老爹, 距离: 68米。
找到了: 海底捞吃300送300, 距离: 210米。
找到了: 都美造型烫染美发护理套餐, 距离: 210米。
找到了: 审美个人美容美发套餐, 距离: 426米。
找到了: 小区东门时尚烫染个人护理美发套餐, 距离: 439米。
6个匹配结果,共耗时 3毫秒.


            References:

 

            An introduction to how Lucene-Spatial works: http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm

            GeoHash: http://en.wikipedia.org/wiki/Geohash

            Two examples (most of the code above comes from them):

            Spatial search with Lucene
            

      Lucene Spatial Example

 

            

     Location-aware search with Apache Lucene and Solr
