Apache Solr vs ElasticSearch - the Feature Smackdown!
API
Feature | Solr 4.7.0 | ElasticSearch 1.0 |
---|---|---|
Format | XML,CSV,JSON | JSON |
HTTP REST API | ||
Binary API | SolrJ | TransportClient, Thrift (through a plugin) |
JMX support | | ES-specific stats are exposed through the REST API |
Client libraries | PHP, Ruby, Perl, Scala, Python, .NET, Javascript | PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Erlang, Clojure |
3rd-party product integration (open-source) | Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) | Drupal, Django, Symfony2, Wordpress, CouchBase |
3rd-party product integration (commercial) | DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR | SearchBlox, Hortonworks Data Platform, MapR |
Output | JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java | JSON, XML/HTML (via plugin) |
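
To make the Format and Output rows a little more concrete, here is a minimal sketch of how the same keyword search might be addressed to each engine over HTTP. The helper names, the hosts and ports used in the usage note, and the `products` collection/index are assumptions for illustration. Only Solr's `/select` handler with its `q` and `wt` parameters and ElasticSearch's `/_search` endpoint with a `query_string` query come from the engines themselves.

```python
import json
from urllib.parse import urlencode


def solr_select_url(base, collection, query, wt="json"):
    """Build a Solr /select URL; `wt` chooses the response writer (json, xml, csv, ...)."""
    params = urlencode({"q": query, "wt": wt})
    return f"{base}/solr/{collection}/select?{params}"


def es_search_request(base, index, query):
    """Return (url, json_body) for an ElasticSearch _search call; the body is always JSON."""
    url = f"{base}/{index}/_search"
    body = json.dumps({"query": {"query_string": {"query": query}}})
    return url, body
```

For example, `solr_select_url("http://localhost:8983", "products", "title:lucene", wt="xml")` asks Solr for XML output, whereas the ElasticSearch request carries the query in a JSON body and gets JSON back.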
Indexing
Searching
Feature | Solr 4.7.0 | ElasticSearch 1.0 |
---|---|---|
Lucene Query parsing | ||
Structured Query DSL | Need to programmatically create queries if going beyond Lucene query syntax. | |
Span queries | via SOLR-2703 | |
Spatial search | ||
Multi-point spatial search | ||
Faceting | The way top N facets work now is by getting the top N from each shard and merging the results. This can give incorrect counts when the number of shards is greater than 1. | |
Advanced Faceting | blog post | |
Pivot Facets | ||
More Like This | ||
Boosting by functions | ||
Boosting using scripting languages | ||
Push Queries | JIRA issue | Percolation. Distributed percolation supported in 1.0 |
Field collapsing/Results grouping | | possibly 1.0+ |
Spellcheck | Suggest API | |
Autocomplete | | Added in 0.90.3 |
Query elevation | workaround | |
Joins | Not supported in distributed search. See LUCENE-3759. | via has_child and top_children queries |
Resultset Scrolling | New to 4.7.0 | via scan search type |
Filter queries | also supports filtering by native scripts | |
Filter execution order | local params and cache property | _cache and _cache_key property |
Alternative QueryParsers | DisMax, eDisMax | query_string, dis_max, match, multi_match etc |
Negative boosting | but awkward. Involves positively boosting the inverse set of negatively-boosted documents. | |
Search across multiple indexes | it can search across multiple compatible collections | |
Result highlighting | ||
Custom Similarity | ||
Searcher warming on index reload | | Warmers API |
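
The Structured Query DSL and Filter queries rows are easier to compare side by side. The sketch below expresses one filtered search both ways: as Solr request parameters (`q` in Lucene syntax plus an `fq` filter) and as an ElasticSearch 1.0-era `filtered` query. The field names `title` and `category` and the helper names are assumptions made up for this example.

```python
from urllib.parse import urlencode


def solr_filtered_params(text, category):
    """Solr: main query in Lucene syntax via `q`, cached filter via `fq`."""
    return urlencode([("q", f"title:{text}"), ("fq", f"category:{category}")])


def es_filtered_query(text, category):
    """ElasticSearch: the same search as a structured JSON query (a plain dict here)."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"title": text}},
                "filter": {"term": {"category": category}},
            }
        }
    }
```

On the Solr side everything is a string of query parameters, so anything beyond Lucene syntax has to be assembled by the client. On the ElasticSearch side the query is a nested structure that can be composed programmatically before it is serialised to JSON.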
Customizability
Distributed
Feature | Solr 4.7.0 | ElasticSearch 1.0 |
---|---|---|
Self-contained cluster | Depends on separate ZooKeeper server | Only ElasticSearch nodes |
Automatic node discovery | ZooKeeper | internal Zen Discovery or ZooKeeper |
Partition tolerance | The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function. | Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes is set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. |
Automatic failover | If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards. | |
Automatic leader election | ||
Shard replication | ||
Sharding | ||
Automatic shard rebalancing | It can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes, and it can be configured not to place a shard and its replicas on nodes with the same tags. | |
Change # of shards | Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime. | each index has 5 shards by default. Number of primary shards cannot be changed once the index is created. Replicas can be increased anytime. |
Relocate shards and replicas | can be done by creating a shard replica on the desired node and then removing the shard from the source node | can move shards and replicas to any node in the cluster on demand |
Control shard routing | shards or _route_ parameter | routing parameter |
Consistency | Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they have finished replicating the index. | Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per-document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available. |
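
The N/2+1 rule in the Partition tolerance row is simple arithmetic, sketched below under the assumption that N/2 means integer division. The function names are hypothetical. They only illustrate the value you would put in `discovery.zen.minimum_master_nodes` and which side of a network split could keep operating.

```python
def minimum_master_nodes(cluster_size):
    """Smallest quorum for a cluster of `cluster_size` nodes: N/2 + 1, using integer division."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def partition_has_quorum(partition_size, cluster_size):
    """True if a partition holding `partition_size` nodes still reaches the quorum."""
    return partition_size >= minimum_master_nodes(cluster_size)
```

With 5 nodes the quorum is 3, so a 3/2 split leaves exactly one side working. With 4 nodes the quorum is also 3, so an even 2/2 split stops both sides rather than letting them diverge, which is one reason odd cluster sizes are usually preferred.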
Misc
Feature | Solr 4.7.0 | ElasticSearch 1.0 |
---|---|---|
Web Admin interface | bundled with Solr | via site plugins: elasticsearch-head, bigdesk, kopf, elasticsearch-HQ, Hammer |
Hosting providers | WebSolr, Searchify, Hosted-Solr, IndexDepot, OpenSolr, gotosolr | bonsai.io, Indexisto, qbox.io, IndexDepot |
Thoughts...
As a number of folks point out in the discussion below, feature comparisons are inherently shallow and only go so far. I think they serve a purpose, but they shouldn't be taken as the last word on these two fantastic search products.
If you're running a smallish site and need search features without fancy bells-and-whistles, I think you'll be very happy with either Solr or ElasticSearch.
I've found ElasticSearch to be friendlier to teams that are used to REST APIs, JSON, etc. and don't have a Java background. If you're planning a large installation that requires running distributed search instances, I suspect you're also going to be happier with ElasticSearch.
As Matt Weber points out below, ElasticSearch was built to be distributed from the ground up, whereas in Solr distribution was tacked on as an 'afterthought'. This is evident when examining the design and architecture of the two products, and also when browsing the source code.
Resources
- My other sites may be of interest if you're new to Lucene, Solr and ElasticSearch:
- The Solr wiki and the ElasticSearch Guide are your friends.