Elasticsearch: Create, Update, and Delete

Date: 2022-07-24

I've had to work with es on the job recently, but I wasn't familiar with querying documents in it, so I've spent the past two weeks studying es and skimming through Elasticsearch: The Definitive Guide. Another day slips by while I pretend to work.

es is a real-time distributed search and analytics engine built on Lucene. Today we won't talk about its use cases; instead, let's look at creating, deleting, and updating es indices.

Environment: CentOS 7, Elasticsearch 6.8.3, JDK 8

(The latest es at the time of writing is version 7, which requires JDK 11 or later, so I installed es 6.8.3.)
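Before creating anything, it helps to confirm the node is reachable; a plain GET on the root endpoint returns the node name and version:

GET http://192.168.197.100:9200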

All of the examples below use a student index.

1. Creating an index

PUT http://192.168.197.100:9200/student
{
    "mapping":{
      "_doc":{ //“_doc”是类型type,es6中一个索引下只有一个type,不能有其它type
        "properties":{
          "id": {
              "type": "keyword"
          },
          "name":{
            "type":"text",
            "index":"analyzed",
            "analyzer":"standard"
          },
          "age":{
            "type":"integer",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above":256
              }
            }
          },
          "birthday":{
            "type":"date"
          },
          "gender":{
            "type":"keyword"
          },
          "grade":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                 "ignore_above":256
              }
            }
          },
          "class":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                 "ignore_above":256
              }
            }
          }
        }
      }
    },
    "settings":{
      //number of primary shards
      "number_of_shards" : 1, 
      //number of replicas per primary shard
      "number_of_replicas" : 1
    }
}

The difference between the text and keyword types:

(1) text is analyzed (split into tokens) when indexed, and is used for full-text search

(2) keyword is not analyzed; it is matched as an exact value and is used for sorting and aggregations

The grade/grade.keyword multi-fields in the mapping above give a single field both behaviors, as the example below shows.
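A quick way to see the difference is the _analyze API with a field parameter, which analyzes text exactly as that field would (this assumes the student index above has been created; the sample text is made up):

POST http://192.168.197.100:9200/student/_analyze
{
  "field":"grade",
  "text":"seventh grade"
}
//the text field produces two tokens: "seventh" and "grade"

POST http://192.168.197.100:9200/student/_analyze
{
  "field":"grade.keyword",
  "text":"seventh grade"
}
//the keyword sub-field produces a single token: "seventh grade"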

The index property controls whether and how a field is indexed. In es 2.x it took one of three string values:

(1) analyzed: the field is analyzed and supports fuzzy matching, similar to LIKE in SQL

(2) not_analyzed: the field can only be matched exactly, similar to "=" in SQL

(3) no: the field is not searchable at all

Since es 5, index is simply a boolean: true (indexed and searchable, the default) or false (not indexed). es 6 rejects the old string values, which is why the mapping above uses "index": true.

The analyzer property selects the analyzer for a text field. For Chinese, the ik analyzer is the usual choice; you can also define a custom analyzer, as sketched below.
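A custom analyzer is declared under settings.analysis when an index is created. A minimal sketch (the index name my_index and analyzer name my_lowercase are made up for illustration):

PUT http://192.168.197.100:9200/my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_lowercase": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
//a text field can then opt in with "analyzer":"my_lowercase"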

The number_of_shards setting is the number of primary shards. The default is 5 in es 6, and it cannot be changed after the index is created.

The number_of_replicas setting is the number of replicas per primary shard. The default is 1, and it can be changed at any time.

On success, the following JSON is returned:

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "student"
}

How do you view the details of an index after creating it? The _mapping endpoint returns the field mappings:

GET http://192.168.197.100:9200/student/_mapping
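_mapping returns only the mappings; requesting the index itself returns the full picture, settings included:

GET http://192.168.197.100:9200/student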

As noted above, in es 6 an index can contain only one type, such as the "_doc" type used here.

A rough comparison between es and a relational database (the classic analogy):

index ↔ database
type ↔ table
document ↔ row
field ↔ column

2. Modifying an index

//change the number of replicas to 2
PUT http://192.168.197.100:9200/student/_settings
{
  "number_of_replicas":2
}
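Settings are not the only thing that can be modified after creation: new fields can also be added to the mapping (existing field definitions cannot be changed, only added to). A minimal sketch that adds a hypothetical email field:

PUT http://192.168.197.100:9200/student/_mapping/_doc
{
  "properties": {
    "email": {
      "type": "keyword"
    }
  }
}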

3. Deleting an index

//delete a single index
DELETE http://192.168.197.100:9200/student

//delete all indices (dangerous; use with caution)
DELETE http://192.168.197.100:9200/_all
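Because _all and wildcard deletes are so dangerous, production clusters often disable them by setting action.destructive_requires_name: true in elasticsearch.yml; with that in place, indices can only be deleted by their exact name.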

4. The default standard analyzer vs. the ik analyzer

The default analyzer in es is standard. It splits English on word boundaries (roughly, whitespace and punctuation), but splits Chinese into individual characters, so it is a poor choice for Chinese text.

For example, standard analyzing English:

//this API shows how a given text is analyzed
POST http://192.168.197.100:9200/_analyze
{
  "text":"the People's Republic of China",
  "analyzer":"standard"
}

The result:

{
    "tokens": [
        {
            "token": "the",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "people's",
            "start_offset": 4,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "republic",
            "start_offset": 13,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "of",
            "start_offset": 22,
            "end_offset": 24,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "china",
            "start_offset": 25,
            "end_offset": 30,
            "type": "<ALPHANUM>",
            "position": 4
        }
    ]
}

Analyzing Chinese (the sample text means "Long live the People's Republic of China"):

POST http://192.168.197.100:9200/_analyze
{
  "text":"中华人民共和国万岁",
  "analyzer":"standard"
}

The result (every character becomes its own token):

{
    "tokens": [
        {
            "token": "中",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "华",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "人",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "民",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        },
        {
            "token": "共",
            "start_offset": 4,
            "end_offset": 5,
            "type": "<IDEOGRAPHIC>",
            "position": 4
        },
        {
            "token": "和",
            "start_offset": 5,
            "end_offset": 6,
            "type": "<IDEOGRAPHIC>",
            "position": 5
        },
        {
            "token": "国",
            "start_offset": 6,
            "end_offset": 7,
            "type": "<IDEOGRAPHIC>",
            "position": 6
        },
        {
            "token": "万",
            "start_offset": 7,
            "end_offset": 8,
            "type": "<IDEOGRAPHIC>",
            "position": 7
        },
        {
            "token": "岁",
            "start_offset": 8,
            "end_offset": 9,
            "type": "<IDEOGRAPHIC>",
            "position": 8
        }
    ]
}

The ik analyzer supports word-level segmentation of Chinese text. It provides two analyzers: ik_smart and ik_max_word.
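Note that ik is a third-party plugin (medcl/elasticsearch-analysis-ik on GitHub) and does not ship with es. It has to be installed with ./bin/elasticsearch-plugin install, using the release zip that matches your es version (v6.8.3 here), and the node restarted afterwards.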

(1) ik_smart: the coarsest-grained segmentation; splits the text into the fewest, largest words

For example:

POST http://192.168.197.100:9200/_analyze
{
  "text":"中华人民共和国万岁",
  "analyzer":"ik_smart"
}

The result:

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "万岁",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}

(2) ik_max_word: the finest-grained segmentation; extracts as many words as possible from the text

For example:

POST http://192.168.197.100:9200/_analyze
{
  "text":"中华人民共和国万岁",
  "analyzer":"ik_max_word"
}

The result:

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "中华人民",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "中华",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "华人",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "人民共和国",
            "start_offset": 2,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "人民",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "共和国",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "共和",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "国",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 8
        },
        {
            "token": "万岁",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "万",
            "start_offset": 7,
            "end_offset": 8,
            "type": "TYPE_CNUM",
            "position": 10
        },
        {
            "token": "岁",
            "start_offset": 8,
            "end_offset": 9,
            "type": "COUNT",
            "position": 11
        }
    ]
}

ik analyzing English:

POST http://192.168.197.100:9200/_analyze
{
  "text":"the People's Republic of China",
  "analyzer":"ik_smart"
}

The result is below: ik_smart drops the unimportant stop words (the, of), which the standard analyzer keeps, and splits People's at the apostrophe. (My English has decayed to the point where I no longer know what part of speech a, an, and the are; as a Chinese speaker, that's nothing to be proud of.)

{
    "tokens": [
        {
            "token": "people",
            "start_offset": 4,
            "end_offset": 10,
            "type": "ENGLISH",
            "position": 0
        },
        {
            "token": "s",
            "start_offset": 11,
            "end_offset": 12,
            "type": "ENGLISH",
            "position": 1
        },
        {
            "token": "republic",
            "start_offset": 13,
            "end_offset": 21,
            "type": "ENGLISH",
            "position": 2
        },
        {
            "token": "china",
            "start_offset": 25,
            "end_offset": 30,
            "type": "ENGLISH",
            "position": 3
        }
    ]
}

5. Adding a document

Fields can be added freely; a document is not limited to the fields in the mapping (unmapped fields are picked up by dynamic mapping).

//1 is the value of "_id" and must be unique; es can also generate one automatically (see below)
POST http://192.168.197.100:9200/student/_doc/1
{
  "id":1,
  "name":"tom",
  "age":20,
  "gender":"male",
  "grade":"7",
  "class":"1"
}
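If you POST to the type endpoint without an id, es generates a random _id and returns it in the response (the values below are made-up sample data):

POST http://192.168.197.100:9200/student/_doc
{
  "id":2,
  "name":"lucy",
  "age":19,
  "gender":"female",
  "grade":"7",
  "class":"2"
}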

6. Updating a document

POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "doc":{
    "name":"jack"
  }
}
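Note that _update with a "doc" object is a partial update: only the listed fields are merged into the document. A plain PUT to the document URL instead replaces the whole document, so any omitted fields are lost:

//full replacement: the entire previous document is overwritten
PUT http://192.168.197.100:9200/student/_doc/1
{
  "id":1,
  "name":"jack",
  "age":21,
  "gender":"male",
  "grade":"7",
  "class":"1"
}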

7. Deleting a document

//1 is the value of "_id"
DELETE http://192.168.197.100:9200/student/_doc/1
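Besides deleting by id, es also supports deleting every document that matches a condition via _delete_by_query. A minimal sketch (the query DSL itself is the subject of the next post):

POST http://192.168.197.100:9200/student/_delete_by_query
{
  "query": {
    "match": {
      "grade": "7"
    }
  }
}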

The above is a quick tour of creating, modifying, and deleting indices, plus adding, updating, and deleting documents. To keep this post from getting too long, document queries will be covered in the next one.