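Elasticsearch Basic Operations (CRUD)

Elasticsearch is a search engine built on top of the full-text search library Apache Lucene(TM). It can store, search, and analyze large volumes of data quickly. To use it for full-text search, you only need its REST API.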

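Installation

First, install a Java 8 environment.

Download the Elasticsearch package for your platform from https://www.elastic.co/cn/downloads/elasticsearch. After extracting it, run bin/elasticsearch from the extracted directory to start it: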

tar xzf elasticsearch-6.5.0.tar.gz
cd elasticsearch-6.5.0
bin/elasticsearch

Elasticsearch also provides an official Docker image, so you can run it directly with Docker:

docker run -p 9200:9200 -d --name=elastic elasticsearch:6.4.1

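If everything works, Elasticsearch listens on the default port 9200. A request to port 9200 on the local machine returns information like the following: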

[heql@ubuntu ~]$ curl localhost:9200
{
    "name" : "me8cxrE",
    "cluster_name" : "elasticsearch",
    "cluster_uuid" : "5e1tgNawRb61AmrjvDwrHA",
    "version" : {
        "number" : "6.5.0",
        "build_flavor" : "default",
        "build_type" : "tar",
        "build_hash" : "816e6f6",
        "build_date" : "2018-11-09T18:58:36.352602Z",
        "build_snapshot" : false,
        "lucene_version" : "7.5.0",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
    },
    "tagline" : "You Know, for Search"
}
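Since the handling of types differs between major versions, a client may want to read the version from this response before doing anything else. The following is a minimal sketch of that check, assuming the response body has already been fetched as a string; the function name parse_root_info and the shortened sample are only illustrative.

```python
import json


def parse_root_info(body: str) -> dict:
    """Pick a few fields out of the JSON returned by GET / on port 9200."""
    info = json.loads(body)
    version = info["version"]
    return {
        "cluster_name": info["cluster_name"],
        "version": version["number"],
        "lucene_version": version["lucene_version"],
        # The major version is the part before the first dot, e.g. 6 for "6.5.0".
        "major": int(version["number"].split(".")[0]),
    }


# Shortened copy of the response shown above.
sample = """
{
    "name" : "me8cxrE",
    "cluster_name" : "elasticsearch",
    "version" : {
        "number" : "6.5.0",
        "lucene_version" : "7.5.0"
    },
    "tagline" : "You Know, for Search"
}
"""

root_info = parse_root_info(sample)
print(root_info)
```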

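Basic Concepts

Node and Cluster

A node is a running instance of Elasticsearch and is the building block of a cluster. Multiple nodes form a cluster, which serves requests as a single unit.

Index

Elasticsearch stores data in one or more indices. An index is a collection of documents with similar characteristics, and is roughly comparable to a database in a relational database. An index is identified by its name (which must be all lowercase), and that name is used when creating, searching, updating, and deleting documents.

Each index has its own mapping, which defines its field names and types.
A cluster can contain multiple indices.

Type

A type is comparable to a table in a relational database. However, an index created in Elasticsearch 6.x may contain only one type, types are deprecated in 7.x, and they are removed entirely in 8.0.

Document

A document is the data a user stores in Elasticsearch and is the smallest unit of data, similar to a row in a relational database. Elasticsearch represents a document as a JSON object.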

Index API

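Elasticsearch provides an API for working with indices: creating, listing, and deleting them.

Create

The following creates an index named test_index. The URL parameter pretty=true asks for the response in a human-readable format: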

[heql@ubuntu ~]$ curl -X PUT 'localhost:9200/test_index?pretty=true'

The server returns a JSON object. The field "acknowledged" : true indicates that the operation succeeded:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test_index"
}

Query

The following lists the indices that already exist in Elasticsearch:

[heql@ubuntu ~]$ curl -X GET 'localhost:9200/_cat/indices'
green  open .kibana_1      u-xmAJQ4QJyo7Quo5L4eGg 1 0     3 0  9.4kb  9.4kb
green  open .kibana_2      A01iwR_oRem-oeGPQMVSgg 1 0    10 1 61.3kb 61.3kb
green  open .tasks         Ray_rqVcT76opeY0eU0tog 1 0     1 0  6.2kb  6.2kb
yellow open dating_profile LciVfunhSpKlDP6JsKaq_Q 5 1 12954 0  3.6mb  3.6mb
yellow open test_index     c9iGVakfQa-GkyM8P32-jA 5 1     0 0  1.2kb  1.2kb
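The listing above has no header row, so the meaning of each column is easy to lose. The sketch below splits each line into named fields, assuming the default column order of _cat/indices (health, status, index name, uuid, primary shards, replicas, document count, deleted documents, total store size, primary store size). The function name is hypothetical, and lines with a different number of columns are simply skipped.

```python
from typing import Dict, List

# Assumed default column order of GET /_cat/indices.
COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]


def parse_cat_indices(text: str) -> List[Dict[str, str]]:
    """Turn the whitespace-separated _cat/indices output into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(COLUMNS):
            continue  # blank line or a row that does not have all columns
        rows.append(dict(zip(COLUMNS, parts)))
    return rows


sample = """\
yellow open dating_profile LciVfunhSpKlDP6JsKaq_Q 5 1 12954 0  3.6mb  3.6mb
yellow open test_index     c9iGVakfQa-GkyM8P32-jA 5 1     0 0  1.2kb  1.2kb
"""

rows = parse_cat_indices(sample)
for row in rows:
    print(row["index"], row["health"], row["docs.count"])
```

Appending ?v to the URL asks Elasticsearch to print the header row itself, and ?format=json returns the same data as JSON, which is usually the easier choice for scripts.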

Delete

The following deletes the index test_index. The server returns a JSON object, and "acknowledged" : true indicates that the operation succeeded:

[heql@ubuntu ~]$ curl -X DELETE 'localhost:9200/test_index?pretty=true'
{
    "acknowledged" : true
}
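The create and delete calls above differ only in the HTTP method, which a small helper can make explicit. This is a sketch using only the Python standard library; BASE_URL and the function name index_request are assumptions for illustration, and the request is built but not sent, so the snippet runs without a server.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local instance on the default port


def index_request(name: str, method: str) -> urllib.request.Request:
    """Build (but do not send) a PUT or DELETE request for an index."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT (create) or DELETE (delete)")
    if name != name.lower():
        raise ValueError("index names must be all lowercase")
    return urllib.request.Request(f"{BASE_URL}/{name}?pretty=true", method=method)


create_req = index_request("test_index", "PUT")
delete_req = index_request("test_index", "DELETE")
print(create_req.get_method(), create_req.full_url)
print(delete_req.get_method(), delete_req.full_url)

# To actually send a request against a running instance:
# with urllib.request.urlopen(create_req) as resp:
#     print(resp.read().decode("utf-8"))
```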

Document API

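Elasticsearch provides an API for working with documents: creating, retrieving, updating, and deleting them.

Create

The following adds a document with id 1 to the index test_index under the type person: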

[heql@ubuntu ~]$ curl -H "Content-Type: application/json" -X PUT 'localhost:9200/test_index/person/1?pretty=true' -d '
{
    "username": "alfred",
    "age": 1
}'

The JSON object returned by the server includes the index, type, id, version, and other information:

{
  "_index" : "test_index",
  "_type" : "person",
  "_id" : "1",
  "_version" : 3,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

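When a document is created and the index does not exist yet, Elasticsearch creates the index and type automatically.

A document id does not have to be a number; any string (for example abc) works. You can also omit the id when creating a document, in which case Elasticsearch generates one for it. The request must then be a POST: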

[heql@ubuntu ~]$ curl -H "Content-Type: application/json" -X POST 'localhost:9200/test_index/person?pretty=true' -d '
{
    "username": "tom",
    "age": 20
}'

Elasticsearch generates the id vMVGsWcBcp5-QXcUXGs6 for the document:

{
  "_index" : "test_index",
  "_type" : "person",
  "_id" : "vMVGsWcBcp5-QXcUXGs6",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}
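The rule from the two examples above (PUT when the id is given, POST when Elasticsearch should generate it) can be captured in one helper. This is a sketch with the Python standard library; BASE_URL and create_document_request are hypothetical names, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumption: a local instance on the default port


def create_document_request(index: str, doc_type: str, doc: dict,
                            doc_id: Optional[str] = None) -> urllib.request.Request:
    """Build a request that indexes doc, with or without an explicit id."""
    url = f"{BASE_URL}/{index}/{doc_type}"
    if doc_id is None:
        method = "POST"  # Elasticsearch generates the id
    else:
        method = "PUT"   # the caller chooses the id
        url += "/" + quote(str(doc_id), safe="")
    return urllib.request.Request(
        url + "?pretty=true",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


with_id = create_document_request("test_index", "person",
                                  {"username": "alfred", "age": 1}, doc_id="1")
auto_id = create_document_request("test_index", "person",
                                  {"username": "tom", "age": 20})
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```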

Query

Send a GET request to /Index/Type/Id to view the document:

[heql@ubuntu ~]$ curl 'localhost:9200/test_index/person/1?pretty=true'

In the response, "found" : true indicates that the document was found, and the _source field contains the original document.

{
  "_index" : "test_index",
  "_type" : "person",
  "_id" : "1",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "username" : "alfred",
    "age" : 1
  }
}

If the document does not exist, no data is returned and found is false:

[heql@ubuntu ~]$ curl 'localhost:9200/test_index/person/2?pretty=true'
{
    "_index" : "test_index",
    "_type" : "person",
    "_id" : "2",
    "found" : false
}
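A caller usually wants either the stored document or a clear "not there" signal. The sketch below reads the found flag from a response body that has already been fetched; source_or_none is a hypothetical helper name. As far as I know, Elasticsearch also answers a missing document with HTTP status 404, so a client built on urllib would see an HTTPError and would need to read the body from that exception.

```python
import json
from typing import Optional


def source_or_none(body: str) -> Optional[dict]:
    """Return the stored document, or None when found is false."""
    result = json.loads(body)
    if not result.get("found", False):
        return None
    return result["_source"]


found_body = """
{
  "_index" : "test_index", "_type" : "person", "_id" : "1",
  "_version" : 3, "found" : true,
  "_source" : { "username" : "alfred", "age" : 1 }
}
"""
missing_body = """
{ "_index" : "test_index", "_type" : "person", "_id" : "2", "found" : false }
"""

print(source_or_none(found_body))
print(source_or_none(missing_body))
```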

To retrieve all documents:

[heql@ubuntu ~]$ curl 'localhost:9200/test_index/person/_search?pretty=true'

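In the response, took is the time the query took in milliseconds, total is the number of matching documents, and hits holds the returned documents (the first 10 by default).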

{
  "took" : 219,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "person",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "username" : "alfred",
          "age" : 1
        }
      },
      {
        "_index" : "test_index",
        "_type" : "person",
        "_id" : "vMVGsWcBcp5-QXcUXGs6",
        "_score" : 1.0,
        "_source" : {
          "username" : "tom",
          "age" : 20
        }
      }
    ]
  }
}
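To make the structure of the search response concrete, here is a small sketch that pulls the total and the stored documents out of it. The helper name summarize_search is hypothetical. The isinstance check is there because, to my knowledge, 7.x reports hits.total as an object with a value field rather than a plain number as in the 6.x output above.

```python
import json
from typing import List, Tuple


def summarize_search(body: str) -> Tuple[int, List[Tuple[str, dict]]]:
    """Return (total, [(id, source), ...]) from a _search response body."""
    result = json.loads(body)
    hits = result["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # assumed 7.x shape: {"value": 2, "relation": "eq"}
        total = total["value"]
    docs = [(hit["_id"], hit["_source"]) for hit in hits["hits"]]
    return total, docs


# Shortened copy of the response shown above.
sample = """
{
  "took" : 219,
  "timed_out" : false,
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      { "_id" : "1", "_score" : 1.0,
        "_source" : { "username" : "alfred", "age" : 1 } },
      { "_id" : "vMVGsWcBcp5-QXcUXGs6", "_score" : 1.0,
        "_source" : { "username" : "tom", "age" : 20 } }
    ]
  }
}
"""

total, docs = summarize_search(sample)
print(total)
for doc_id, source in docs:
    print(doc_id, source["username"], source["age"])
```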

Update

To update a document, send the data again with a PUT request:

[heql@ubuntu ~]$ curl -H "Content-Type: application/json" -X PUT 'localhost:9200/test_index/person/1?pretty=true' -d '
{
    "username": "alfred",
    "age": 18
}'

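The request above changes age from 1 to 18. A PUT replaces the whole document, so username has to be sent again even though its value is unchanged. In the response, _version increases from 3 to 4 and result changes from created to updated: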

{
  "_index" : "test_index",
  "_type" : "person",
  "_id" : "1",
  "_version" : 4,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}
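Because a PUT sends the whole document, it can help to see it next to a partial update that sends only the changed fields. The sketch below builds both requests without sending them. The _update endpoint with a doc body is the 6.x partial-update form as I understand it, and is not used in the curl examples here; BASE_URL and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local instance on the default port


def _json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build a JSON request for the given path without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def replace_document(index: str, doc_type: str, doc_id: str, doc: dict):
    """Full replacement: every field of the document must be included."""
    return _json_request("PUT", f"/{index}/{doc_type}/{doc_id}", doc)


def partial_update(index: str, doc_type: str, doc_id: str, changes: dict):
    """Partial update: only the changed fields go under the doc key."""
    return _json_request("POST", f"/{index}/{doc_type}/{doc_id}/_update",
                         {"doc": changes})


full = replace_document("test_index", "person", "1",
                        {"username": "alfred", "age": 18})
partial = partial_update("test_index", "person", "1", {"age": 18})
print(full.get_method(), full.full_url, full.data.decode("utf-8"))
print(partial.get_method(), partial.full_url, partial.data.decode("utf-8"))
```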

Delete

To delete a document, send a DELETE request:

[heql@ubuntu ~]$ curl -X DELETE 'localhost:9200/test_index/person/vMVGsWcBcp5-QXcUXGs6?pretty=true'
{
    "_index" : "test_index",
    "_type" : "person",
    "_id" : "vMVGsWcBcp5-QXcUXGs6",
    "_version" : 2,
    "result" : "deleted",
    "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 0
    },
    "_seq_no" : 5,
    "_primary_term" : 1
}

Using Kibana with Elasticsearch

Download the Kibana package that matches your Elasticsearch version, extract it, and run bin/kibana. If Elasticsearch and Kibana are not on the same machine, change the Elasticsearch URL in config/kibana.yml:

[heql@ubuntu kibana-6.5.0-linux-x86_64]$ bin/kibana

Once Kibana has started, open http://192.168.1.138:5601 in a browser to reach the Kibana interface.

Click Dev Tools to call the Elasticsearch API directly. For example, the following inserts a document with id 2:

PUT /test_index/person/2
{
    "username": "tom",
    "age": 20
}

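Creating Documents in Bulk

Elasticsearch lets you create multiple documents in a single request, which reduces network overhead and improves write throughput.

The following request performs three operations: index creates a document with id 3, delete removes the document with id 1, and update modifies the document with id 2. The difference between index and create is that create fails if the document already exists, while index does not fail and overwrites the existing document: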

POST /_bulk
{"index":{"_index":"test_index","_type":"person","_id":"3"}}
{"username":"tom","age":20}
{"delete":{"_index":"test_index","_type":"person","_id":"1"}}
{"update":{"_id":"2","_index":"test_index","_type":"person"}}
{"doc":{"age":20}}

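The server returns the following. errors indicates whether any operation failed, and items holds the result of each operation. The update reports noop because the age of document 2 was already 20: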

{
  "took" : 25,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test_index",
        "_type" : "person",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "test_index",
        "_type" : "person",
        "_id" : "1",
        "_version" : 5,
        "result" : "deleted",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 6,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "update" : {
        "_index" : "test_index",
        "_type" : "person",
        "_id" : "2",
        "_version" : 1,
        "result" : "noop",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    }
  ]
}
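The bulk body is newline-delimited JSON: one action line, followed by a payload line for every action except delete, with a newline after the last line. The sketch below builds the same body as the request above and picks failed items out of a response. The function names are hypothetical, and the request would be sent with Content-Type application/x-ndjson.

```python
import json
from typing import List, Optional, Tuple

Action = Tuple[str, dict, Optional[dict]]  # (operation, metadata, payload)


def build_bulk_body(actions: List[Action]) -> str:
    """Serialize actions into the newline-delimited body expected by _bulk."""
    lines = []
    for op, meta, payload in actions:
        lines.append(json.dumps({op: meta}))
        if op == "delete":
            continue  # delete has no payload line
        if payload is None:
            raise ValueError(f"{op} needs a payload line")
        lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def failed_items(response: dict) -> List[Tuple[str, str]]:
    """Return (operation, id) for every item that carries an error object."""
    failures = []
    for item in response.get("items", []):
        for op, result in item.items():
            if "error" in result:
                failures.append((op, result.get("_id")))
    return failures


def meta(doc_id: str) -> dict:
    return {"_index": "test_index", "_type": "person", "_id": doc_id}


body = build_bulk_body([
    ("index", meta("3"), {"username": "tom", "age": 20}),
    ("delete", meta("1"), None),
    ("update", meta("2"), {"doc": {"age": 20}}),
])
print(body, end="")
```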

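Retrieving Documents in Bulk

Elasticsearch also lets you retrieve multiple documents in a single request.

For example, the following retrieves the documents with ids 2 and 3: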

GET /_mget
{
    "docs": [
        {"_index": "test_index", "_type": "person", "_id": 2},
        {"_index": "test_index", "_type": "person", "_id": 3}
    ]
}

The server returns the result of each lookup:

{
  "docs" : [
    {
      "_index" : "test_index",
      "_type" : "person",
      "_id" : "2",
      "_version" : 1,
      "found" : true,
      "_source" : {
        "username" : "tom",
        "age" : 20
      }
    },
    {
      "_index" : "test_index",
      "_type" : "person",
      "_id" : "3",
      "_version" : 1,
      "found" : true,
      "_source" : {
        "username" : "tom",
        "age" : 20
      }
    }
  ]
}
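The _mget request and response are both lists under a docs key, so building one and reading the other takes only a few lines. This is a sketch with hypothetical helper names; ids are written as strings, which is also how the response reports them.

```python
import json
from typing import Dict, Iterable


def build_mget_body(index: str, doc_type: str, ids: Iterable) -> dict:
    """Build the JSON body for GET /_mget for several ids in one index and type."""
    return {
        "docs": [
            {"_index": index, "_type": doc_type, "_id": str(doc_id)}
            for doc_id in ids
        ]
    }


def found_sources(response: dict) -> Dict[str, dict]:
    """Map id to _source for every document that was found."""
    return {
        doc["_id"]: doc["_source"]
        for doc in response["docs"]
        if doc.get("found")
    }


request_body = build_mget_body("test_index", "person", [2, 3])
print(json.dumps(request_body, indent=2))

# Shortened response, with one extra missing document added for illustration.
sample_response = {
    "docs": [
        {"_id": "2", "found": True, "_source": {"username": "tom", "age": 20}},
        {"_id": "3", "found": True, "_source": {"username": "tom", "age": 20}},
        {"_id": "9", "found": False},
    ]
}
sources = found_sources(sample_response)
print(sources)
```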