ElasticSearch Location Search
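In ElasticSearch, geographic locations are supported through the geo_point data type. Location data must supply a latitude and a longitude; when either value is invalid, ES rejects the new document. This type supports distance calculations, range queries, and similar operations. Under the hood, the index is implemented with Geohash.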
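As a rough illustration of what "invalid" means here, the sketch below checks coordinates against the usual latitude and longitude ranges. The function name `is_valid_geo_point` is hypothetical and is not part of any ES client; it only mirrors the range rule.

```python
def is_valid_geo_point(lat: float, lon: float) -> bool:
    """Return True when lat is in [-90, 90] and lon is in [-180, 180]."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Beijing, with latitude and longitude in the right order.
ok = is_valid_geo_point(39.91667, 116.41667)
# The same values swapped: 116.41667 is not a legal latitude.
swapped = is_valid_geo_point(116.41667, 39.91667)
```

Swapping latitude and longitude is a common way to end up with a point that fails this check.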
1. Creating the index
Use PUT to create an index named cn_large_cities, with a mapping type named city:
{
"mappings":{
"city":{
"properties":{
"city":{"type":"string"},
"state":{"type":"string"},
"location":{"type":"geo_point"}}}}}
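The geo_point type must be declared explicitly; ES cannot infer it from the data. In ES, location data can be expressed in three forms (string, object, and array), shown below in that order: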
# string: "lat,lon"
"location":"40.715,-74.011"
# object
"location":{
"lat":40.715,
"lon":-74.011}
# array: [lon,lat]
"location":[-74.011,40.715]
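The three forms differ in coordinate order, which is easy to get wrong: the string is "lat,lon" while the array is [lon, lat]. The following sketch normalizes each form to a (lat, lon) tuple. The helper `to_lat_lon` is a hypothetical name used only for illustration, and it handles only the three forms shown above.

```python
from typing import Tuple


def to_lat_lon(value) -> Tuple[float, float]:
    """Normalize a location given as string, object, or array to (lat, lon)."""
    if isinstance(value, str):
        # String form is "lat,lon".
        lat, lon = value.split(",")
        return float(lat), float(lon)
    if isinstance(value, dict):
        # Object form has explicit "lat" and "lon" keys.
        return float(value["lat"]), float(value["lon"])
    # Array form is [lon, lat], the reverse of the string form.
    lon, lat = value
    return float(lat), float(lon)


as_string = to_lat_lon("40.715,-74.011")
as_object = to_lat_lon({"lat": 40.715, "lon": -74.011})
as_array = to_lat_lon([-74.011, 40.715])
```

All three calls describe the same point.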
POST the following 5 test documents:
{"city":"Beijing", "state":"BJ","location":{"lat":"39.91667", "lon":"116.41667"}}
{"city":"Shanghai", "state":"SH","location":{"lat":"34.50000", "lon":"121.43333"}}
{"city":"Xiamen", "state":"FJ","location":{"lat":"24.46667", "lon":"118.10000"}}
{"city":"Fuzhou", "state":"FJ","location":{"lat":"26.08333", "lon":"119.30000"}}
{"city":"Guangzhou", "state":"GD","location":{"lat":"23.16667", "lon":"113.23333"}}
View all documents:
curl -XGET "http://localhost:9200/cn_large_cities/city/_search?pretty=true"
This returns all 5 documents, each with a score of 1.
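2. Location filtering
ES has 4 location-related filters for filtering on location data:
- geo_distance: finds locations within a given distance of a center point
- geo_bounding_box: finds locations inside a rectangular area
- geo_distance_range: finds locations whose distance from a center point lies between a min and a max
- geo_polygon: finds locations inside a polygon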
geo_distance
The area this filter searches is shown in the figure below:
[Figure: geo_distance search area]
Here is an example query:
{
"query":{
"filtered":{
"filter":{
"geo_distance":{
"distance":"1km",
"location":{
"lat":40.715,
"lon":-73.988}}}}}}
The following query finds cities within 500 km of Xiamen:
{
"query":{
"filtered":{
"filter":{
"geo_distance" :{
"distance" :"500km",
"location" :{
"lat" :24.46667,
"lon" :118.10000}}}}}}
geo_distance_range
{
"query":{
"filtered":{
"filter":{
"geo_distance_range":{
"gte":"1km",
"lt":"2km",
"location":{
"lat":40.715,
"lon":-73.988}}}}}}
geo_bounding_box
{
"query":{
"filtered":{
"filter":{
"geo_bounding_box":{
"location":{
"top_left":{
"lat":40.8,
"lon":-74.0},
"bottom_right":{
"lat":40.715,
"lon":-73.0}}}}}}}
3. Sorting by distance
Next, we sort the cities by their distance from Xiamen:
{
"sort" :[
{
"_geo_distance" :{
"location" :{
"lat" :24.46667,
"lon" :118.10000},
"order" :"asc",
"unit" :"km"}}
],
"query":{
"filtered" :{
"query" :{
"match_all" :{}}}}}
The results come back in the order Xiamen, Fuzhou, Guangzhou, and so on, which matches common sense:
{
"took":8,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0},
"hits":{
"total":5,
"max_score":null,
"hits":[
{
"_index":"us_large_cities",
"_type":"city",
"_id":"AVaiSGXXjL0tfmRppc_p",
"_score":null,
"_source":{
"city":"Xiamen",
"state":"FJ",
"location":{
"lat":"24.46667",
"lon":"118.10000"}},
"sort":[0]},
{
"_index":"us_large_cities",
"_type":"city",
"_id":"AVaiSSuNjL0tfmRppc_r",
"_score":null,
"_source":{
"city":"Fuzhou",
"state":"FJ",
"location":{
"lat":"26.08333",
"lon":"119.30000"}},
"sort":[216.61105485607183]},
{
"_index":"us_large_cities",
"_type":"city",
"_id":"AVaiSd02jL0tfmRppc_s",
"_score":null,
"_source":{
"city":"Guangzhou",
"state":"GD",
"location":{
"lat":"23.16667",
"lon":"113.23333"}},
"sort":[515.9964950041397]},
{
"_index":"us_large_cities",
"_type":"city",
"_id":"AVaiR7_5jL0tfmRppc_o",
"_score":null,
"_source":{
"city":"Shanghai",
"state":"SH",
"location":{
"lat":"34.50000",
"lon":"121.43333"}},
"sort":[1161.512141925948]},
{
"_index":"us_large_cities",
"_type":"city",
"_id":"AVaiRwLUjL0tfmRppc_n",
"_score":null,
"_source":{
"city":"Beijing",
"state":"BJ",
"location":{
"lat":"39.91667",
"lon":"116.41667"}},
"sort":[1725.4543712286697]}
]}}
The sort field in each hit is the distance in kilometers. Adding a size limit returns only the nearest city:
{
"from":0,
"size":1,
"sort" :[
{
"_geo_distance" :{
"location" :{
"lat" :24.46667,
"lon" :118.10000},
"order" :"asc",
"unit" :"km"}}
],
"query":{
"filtered" :{
"query" :{
"match_all" :{}}}}}
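4. Geo aggregations
ES provides 3 kinds of geo aggregation:
- geo_distance: aggregates by distance from a given center point
- geohash_grid: aggregates by Geohash cell
- geo_bounds: aggregates by region
4.1 geo_distance aggregation
The following query aggregates by distance from Xiamen and returns buckets for 0-500 km and 500-8000 km: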
{
"query":{
"filtered":{
"filter":{
"geo_distance" :{
"distance" :"10000km",
"location" :{
"lat" :24.46667,
"lon" :118.10000}}}}},
"aggs":{
"per_ring":{
"geo_distance":{
"field":"location",
"unit":"km",
"origin":{
"lat" :24.46667,
"lon" :118.10000},
"ranges":[
{"from":0, "to":500},
{"from":500, "to":8000}
]}}}}
The aggregation result is as follows:
"aggregations": {
"per_ring":{
"buckets":[
{
"key":"*-500.0",
"from":0,
"from_as_string":"0.0",
"to":500,
"to_as_string":"500.0",
"doc_count":2},
{
"key":"500.0-8000.0",
"from":500,
"from_as_string":"500.0",
"to":8000,
"to_as_string":"8000.0",
"doc_count":3}
]}}
As you can see, 2 cities lie within 0-500 km of Xiamen, and 3 lie within 500-8000 km.
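The bucket counts can be reproduced offline with a short sketch. It assumes haversine distance on a sphere of radius 6371 km and treats each range as including `from` and excluding `to`; both are assumptions about how the aggregation behaves, not taken from the ES source. The names `haversine_km` and `ring_counts` are hypothetical.

```python
import math

# (lat, lon) of the five test documents.
CITIES = {
    "Beijing": (39.91667, 116.41667),
    "Shanghai": (34.50000, 121.43333),
    "Xiamen": (24.46667, 118.10000),
    "Fuzhou": (26.08333, 119.30000),
    "Guangzhou": (23.16667, 113.23333),
}


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers (spherical Earth assumed)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def ring_counts(origin, ranges):
    """Count cities per distance ring; each ring is [from, to) in km."""
    lat0, lon0 = origin
    distances = [haversine_km(lat0, lon0, lat, lon) for lat, lon in CITIES.values()]
    return [sum(1 for d in distances if lo <= d < hi) for lo, hi in ranges]


counts = ring_counts(CITIES["Xiamen"], [(0, 500), (500, 8000)])
```

For the sample data this yields the same split as the ES response: 2 cities in the inner ring and 3 in the outer one.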
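4.2 geohash_grid aggregation
This aggregation groups geo_point values by the geohash cell they fall into. The granularity of the cells is controlled by the precision property, which is the length of the geohash string: each additional character subdivides the cell once more.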
{
"query":{
"filtered":{
"filter":{
"geo_distance" :{
"distance" :"10000km",
"location" :{
"lat" :24.46667,
"lon" :118.10000}}}}},
"aggs":{
"grid_agg":{
"geohash_grid":{
"field":"location",
"precision":2}}}}
The aggregation result is as follows:
"aggregations": {
"grid_agg":{
"buckets":[
{
"key":"ws",
"doc_count":3},
{
"key":"wx",
"doc_count":1},
{
"key":"ww",
"doc_count":1}
]}}
As you can see, 3 cities fall into the geohash cell ws. Raising the precision to 5 gives the following result:
"aggregations": {
"grid_agg":{
"buckets":[
{
"key":"wx4g1",
"doc_count":1},
{
"key":"wwnk7",
"doc_count":1},
{
"key":"wssu6",
"doc_count":1},
{
"key":"ws7gp",
"doc_count":1},
{
"key":"ws0eb",
"doc_count":1}
]}}
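The cell keys above can be checked with a small geohash encoder. The sketch below implements the standard geohash scheme (bits alternate between longitude and latitude, starting with longitude, and every 5 bits become one base-32 character) and then counts cities per cell. The function names are hypothetical, and this is not how ES computes the grid internally.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

# (lat, lon) of the five test documents.
CITIES = {
    "Beijing": (39.91667, 116.41667),
    "Shanghai": (34.50000, 121.43333),
    "Xiamen": (24.46667, 118.10000),
    "Fuzhou": (26.08333, 119.30000),
    "Guangzhou": (23.16667, 113.23333),
}


def geohash_encode(lat, lon, precision):
    """Encode (lat, lon) as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # the first bit refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # 5 bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def grid_counts(precision):
    """Number of cities per geohash cell at the given precision."""
    return Counter(geohash_encode(lat, lon, precision) for lat, lon in CITIES.values())


coarse = grid_counts(2)
fine = grid_counts(5)
```

At precision 2 the five cities collapse into three cells, and at precision 5 each city gets a cell of its own, as in the two responses above.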
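4.3 geo_bounds aggregation
This aggregation computes the smallest region that covers every geo_point in the query results, and returns it as the smallest enclosing rectangle: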
{
"query":{
"filtered":{
"filter":{
"geo_distance" :{
"distance" :"10000km",
"location" :{
"lat" :24.46667,
"lon" :118.10000}}}}},
"aggs":{
"map-zoom":{
"geo_bounds":{
"field":"location"}}}}
The result is as follows:
"aggregations": {
"map-zoom":{
"bounds":{
"top_left":{
"lat":39.91666993126273,
"lon":113.2333298586309},
"bottom_right":{
"lat":23.16666992381215,
"lon":121.43332997336984}}}}
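In other words, the rectangle defined by these two points contains every city within 10000 km of Xiamen. If we reduce the distance to 500 km, the rectangle covering the matching cities becomes: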
"aggregations": {
"map-zoom":{
"bounds":{
"top_left":{
"lat":26.083329990506172,
"lon":118.0999999679625},
"bottom_right":{
"lat":24.46666999720037,
"lon":119.29999999701977}}}}
5. References
An illustrated explanation of how MongoDB geospatial indexes are implemented: http://blog.nosqlfan.com/html/1811.html
The geo_point data type: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html