使用ElasticSearch完成百万级数据查询附近的人功能 - tianyaleixiaowu的专栏 - CSDN博客

标签： | 发表时间：2018-11-08 13:45 | 作者：

出处：https://blog.csdn.net

我们来看一下使用ElasticSearch完成大数据量查询附近的人功能，搜索N米范围的内的数据。

准备环境

本机测试使用了ElasticSearch最新版5.5.1，SpringBoot1.5.4，spring-data-ElasticSearch2.1.4.

新建Springboot项目，勾选ElasticSearch和web。

pom文件如下

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

<groupId>com.tianyalei</groupId>

<artifactId>elasticsearch</artifactId>

<version>0.0.1-SNAPSHOT</version>

<name>elasticsearch</name>

<description>Demo project for Spring Boot</description>

<groupId>org.springframework.boot</groupId>

<artifactId>spring-boot-starter-parent</artifactId>

<version>1.5.4.RELEASE</version>

</parent>

<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>

<java.version>1.8</java.version>

</properties>

<groupId>org.springframework.boot</groupId>

<artifactId>spring-boot-starter-data-elasticsearch</artifactId>

</dependency>

<groupId>org.springframework.boot</groupId>

<artifactId>spring-boot-starter-web</artifactId>

</dependency>

<groupId>org.springframework.boot</groupId>

<artifactId>spring-boot-starter-test</artifactId>

</dependency>

</dependency>

</dependencies>

<build>

<groupId>org.springframework.boot</groupId>

<artifactId>spring-boot-maven-plugin</artifactId>

</plugin>

</plugins>

</build>

</project>

新建model类Person

package com.tianyalei.elasticsearch.model;

import org.springframework.data.annotation.Id;

import org.springframework.data.elasticsearch.annotations.Document;

import org.springframework.data.elasticsearch.annotations.GeoPointField;

import java.io.Serializable;

/**

* model类

@Document(indexName="elastic_search_project",type="person",indexStoreType="fs",shards=5,replicas=1,refreshInterval="-1")

public class Person implements Serializable {

@Id

private int id;

private String name;

private String phone;

/**

* 地理位置经纬度

* lat纬度，lon经度 "40.715,-74.011"

* 如果用数组则相反[-73.983, 40.719]

@GeoPointField

private String address;

public int getId() {

return id;

}

public void setId(int id) {

this.id = id;

}

public String getName() {

return name;

}

public void setName(String name) {

this.name = name;

}

public String getPhone() {

return phone;

}

public void setPhone(String phone) {

this.phone = phone;

}

public String getAddress() {

return address;

}

public void setAddress(String address) {

this.address = address;

}

我用address字段表示经纬度位置。注意，使用String[]和String分别来表示经纬度时是不同的，见注释。

import com.tianyalei.elasticsearch.model.Person;

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface PersonRepository extends ElasticsearchRepository<Person, Integer> {

}

看一下Service类，完成插入测试数据的功能，查询的功能我放在Controller里了，为了方便查看，正常是应该放在Service里

package com.tianyalei.elasticsearch.service;

import com.tianyalei.elasticsearch.model.Person;

import com.tianyalei.elasticsearch.repository.PersonRepository;

import org.springframework.beans.factory.annotation.Autowired;

import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;

import org.springframework.data.elasticsearch.core.query.IndexQuery;

import org.springframework.stereotype.Service;

import java.util.ArrayList;

import java.util.List;

@Service

public class PersonService {

@Autowired

PersonRepository personRepository;

@Autowired

ElasticsearchTemplate elasticsearchTemplate;

private static final String PERSON_INDEX_NAME = "elastic_search_project";

private static final String PERSON_INDEX_TYPE = "person";

public Person add(Person person) {

return personRepository.save(person);

}

public void bulkIndex(List<Person> personList) {

int counter = 0;

try {

if (!elasticsearchTemplate.indexExists(PERSON_INDEX_NAME)) {

elasticsearchTemplate.createIndex(PERSON_INDEX_TYPE);

}

List<IndexQuery> queries = new ArrayList<>();

for (Person person : personList) {

IndexQuery indexQuery = new IndexQuery();

indexQuery.setId(person.getId() + "");

indexQuery.setObject(person);

indexQuery.setIndexName(PERSON_INDEX_NAME);

indexQuery.setType(PERSON_INDEX_TYPE);

//上面的那几步也可以使用IndexQueryBuilder来构建

//IndexQuery index = new IndexQueryBuilder().withId(person.getId() + "").withObject(person).build();

queries.add(indexQuery);

if (counter % 500 == 0) {

elasticsearchTemplate.bulkIndex(queries);

queries.clear();

System.out.println("bulkIndex counter : " + counter);

}

counter++;

}

if (queries.size() > 0) {

elasticsearchTemplate.bulkIndex(queries);

}

System.out.println("bulkIndex completed.");

} catch (Exception e) {

System.out.println("IndexerService.bulkIndex e;" + e.getMessage());

throw e;

}

注意看bulkIndex方法，这个是批量插入数据用的，bulk也是ES官方推荐使用的批量插入数据的方法。这里是每逢500的整数倍就bulk插入一次。

package com.tianyalei.elasticsearch.controller;

import com.tianyalei.elasticsearch.model.Person;

import com.tianyalei.elasticsearch.service.PersonService;

import org.elasticsearch.common.unit.DistanceUnit;

import org.elasticsearch.index.query.GeoDistanceQueryBuilder;

import org.elasticsearch.index.query.QueryBuilders;

import org.elasticsearch.search.sort.GeoDistanceSortBuilder;

import org.elasticsearch.search.sort.SortBuilders;

import org.elasticsearch.search.sort.SortOrder;

import org.springframework.beans.factory.annotation.Autowired;

import org.springframework.data.domain.PageRequest;

import org.springframework.data.domain.Pageable;

import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;

import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

import org.springframework.data.elasticsearch.core.query.SearchQuery;

import org.springframework.web.bind.annotation.GetMapping;

import org.springframework.web.bind.annotation.RestController;

import java.text.DecimalFormat;

import java.util.ArrayList;

import java.util.List;

import java.util.Random;

@RestController

public class PersonController {

@Autowired

PersonService personService;

@Autowired

ElasticsearchTemplate elasticsearchTemplate;

@GetMapping("/add")

public Object add() {

double lat = 39.929986;

double lon = 116.395645;

List<Person> personList = new ArrayList<>(900000);

for (int i = 100000; i < 1000000; i++) {

double max = 0.00001;

double min = 0.000001;

Random random = new Random();

double s = random.nextDouble() % (max - min + 1) + max;

DecimalFormat df = new DecimalFormat("######0.000000");

// System.out.println(s);

String lons = df.format(s + lon);

String lats = df.format(s + lat);

Double dlon = Double.valueOf(lons);

Double dlat = Double.valueOf(lats);

Person person = new Person();

person.setId(i);

person.setName("名字" + i);

person.setPhone("电话" + i);

person.setAddress(dlat + "," + dlon);

personList.add(person);

}

personService.bulkIndex(personList);

// SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(QueryBuilders.queryStringQuery("spring boot OR 书籍")).build();

// List<Article> articles = elas、ticsearchTemplate.queryForList(se、archQuery, Article.class);

// for (Article article : articles) {

// System.out.println(article.toString());

// }

return "添加数据";

}

/**

geo_distance: 查找距离某个中心点距离在一定范围内的位置

geo_bounding_box: 查找某个长方形区域内的位置

geo_distance_range: 查找距离某个中心的距离在min和max之间的位置

geo_polygon: 查找位于多边形内的地点。

sort可以用来排序

@GetMapping("/query")

public Object query() {

double lat = 39.929986;

double lon = 116.395645;

Long nowTime = System.currentTimeMillis();

//查询某经纬度100米范围内

GeoDistanceQueryBuilder builder = QueryBuilders.geoDistanceQuery("address").point(lat, lon)

.distance(100, DistanceUnit.METERS);

GeoDistanceSortBuilder sortBuilder = SortBuilders.geoDistanceSort("address")

.point(lat, lon)

.unit(DistanceUnit.METERS)

.order(SortOrder.ASC);

Pageable pageable = new PageRequest(0, 50);

NativeSearchQueryBuilder builder1 = new NativeSearchQueryBuilder().withFilter(builder).withSort(sortBuilder).withPageable(pageable);

SearchQuery searchQuery = builder1.build();

//queryForList默认是分页，走的是queryForPage，默认10个

List<Person> personList = elasticsearchTemplate.queryForList(searchQuery, Person.class);

System.out.println("耗时：" + (System.currentTimeMillis() - nowTime));

return personList;

}

看Controller类，在add方法中，我们插入90万条测试数据，随机产生不同的经纬度地址。

在查询方法中，我们构建了一个查询100米范围内、按照距离远近排序，分页每页50条的查询条件。如果不指明Pageable的话，ESTemplate的queryForList默认是10条，通过源码可以看到。

启动项目，先执行add，等待百万数据插入，大概几十秒。

然后执行查询，看一下结果。

第一次查询花费300多ms，再次查询后时间就大幅下降，到30ms左右，因为ES已经自动缓存到内存了。

可见，ES完成地理位置的查询还是非常快的。适用于查询附近的人、范围查询之类的功能。

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

后记，在后来的使用中，Elasticsearch2.3版本时，按上面的写法出现了geo类型无法索引的情况，进入es的为String，而不是标注的geofiled。在此记录一下解决方法，将String类型修改为GeoPoint，且是org.springframework.data.elasticsearch.core.geo.GeoPoint包下的。然后需要在创建index时，显式调用一下mapping方法，才能正确的映射为geofield。

如下

if (!elasticsearchTemplate.indexExists("abc")) {

elasticsearchTemplate.createIndex("abc");

elasticsearchTemplate.putMapping(Person.class);

}

---------------------

使用ElasticSearch完成百万级数据查询附近的人功能 - tianyaleixiaowu的专栏 - CSDN博客

相关 [elasticsearch 百万数据] 推荐：