Elasticsearch One-Step Installation with Chinese Word Segmentation

Date: 2022-05-03
This article covers installing Elasticsearch (24.1), managing the Chinese word-segmentation plugin (24.4), creating an index (24.4.2), deleting an index (24.4.3), and configuring the analyzer for an index (24.4.4), with worked examples for each step.

This article is excerpted from the e-book "Netkiller Database 手札".

24.1. Installing Elasticsearch

Use the Netkiller OSCM scripts to install Elasticsearch 5.2 in one step.

# Java
curl -s https://raw.githubusercontent.com/oscm/shell/master/lang/java/openjdk/java-1.8.0-openjdk.sh | bash

# Install
curl -s https://raw.githubusercontent.com/oscm/shell/master/search/elasticsearch/elasticsearch-5.2.sh | bash

# Bind 0.0.0.0
curl -s https://raw.githubusercontent.com/oscm/shell/master/search/elasticsearch/network.bind_host.sh | bash

# Auto create index
curl -s https://raw.githubusercontent.com/oscm/shell/master/search/elasticsearch/action.auto_create_index.sh | bash

# elasticsearch-analysis-ik
curl -s https://raw.githubusercontent.com/oscm/shell/master/search/elasticsearch/elasticsearch-analysis-ik-5.2.2.sh | bash
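
After the scripts above have run, it can be useful to confirm that the node answers and that the IK plugin was loaded. The following is a minimal sketch, not part of the OSCM scripts. ES_URL and the function names are hypothetical, and the check assumes the plugin is listed under a name containing "analysis-ik".

```shell
#!/usr/bin/env bash
# Sketch: confirm that the node answers and that the IK plugin is loaded.
# ES_URL is a hypothetical variable; it defaults to the local node.

es_url() {
    # Join the base URL and a request path.
    printf '%s%s\n' "${ES_URL:-http://localhost:9200}" "$1"
}

node_info() {
    # The root endpoint returns the node name and version as JSON.
    curl -s "$(es_url '/?pretty')"
}

ik_installed() {
    # _cat/plugins lists the installed plugins, one per line.
    # Assumption: the IK plugin appears under a name containing "analysis-ik".
    curl -s "$(es_url '/_cat/plugins')" | grep -q 'analysis-ik'
}

# Usage:
#   node_info
#   ik_installed && echo "IK plugin found"
```

If ik_installed fails right after installation, the node may simply not have been restarted yet; plugins are loaded at startup.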

24.4. Managing the Chinese word-segmentation plugin

24.4.1. Installing the plugin manually

curl -s https://raw.githubusercontent.com/oscm/shell/master/search/elasticsearch/elasticsearch-analysis-ik-5.2.2.sh | bash

24.4.2. Creating an index

curl -XPUT http://localhost:9200/information

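24.4.3. Deleting an index

If the index already exists, delete it and then create it again.

# Elasticsearch 2.0 and later no longer support deleting a mapping on its own,
# so on 5.2 this first request is expected to return an error.
curl -XDELETE 'http://localhost:9200/information/news/_mapping?pretty'
# Deleting the index removes its mappings as well.
curl -XDELETE 'http://localhost:9200/information/?pretty'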
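
The "delete only if it exists" rule can be sketched as a small guard around the delete request. This is an illustration under assumptions: INDEX_URL and the function names are hypothetical, and the check relies on a HEAD request against the index returning 200 when it exists and 404 when it does not.

```shell
#!/usr/bin/env bash
# Sketch: delete the index only when it exists.
# INDEX_URL is a hypothetical variable; it defaults to the index used in this section.
INDEX_URL="${INDEX_URL:-http://localhost:9200/information}"

status_means_exists() {
    # A HEAD request on an index answers 200 when it exists, 404 when it does not.
    [ "$1" = "200" ]
}

index_exists() {
    # -I sends a HEAD request; -w prints only the HTTP status code.
    status_means_exists "$(curl -s -o /dev/null -I -w '%{http_code}' "$INDEX_URL")"
}

delete_if_exists() {
    if index_exists; then
        curl -s -XDELETE "$INDEX_URL?pretty"
    else
        echo "index not found, nothing to delete"
    fi
}

# Usage:
#   delete_if_exists
```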

24.4.4. Configuring the analyzer for an index

curl -XPOST 'http://localhost:9200/information/news/_mapping?pretty' -d'
{
    "news": {
        "_all": {
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
            "term_vector": "no",
            "store": false
        },
        "properties": {
            "content": {
                "type": "text",
                "store": false,
                "term_vector": "with_positions_offsets",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "include_in_all": true,
                "boost": 8
            }
        }
    }
}'
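
Before indexing any documents, the analyzer named in the mapping can be exercised directly through the _analyze API, which returns the tokens produced for a given text. The sketch below is an illustration only. ES_URL and the function names are hypothetical, and the body builder does no JSON escaping, so it assumes the text contains no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch: ask Elasticsearch how an analyzer splits a piece of text.

analyze_body() {
    # Build the JSON body for _analyze: $1 = analyzer name, $2 = text.
    # Assumption: the text has no double quotes or backslashes (no escaping here).
    printf '{"analyzer": "%s", "text": "%s"}\n' "$1" "$2"
}

analyze() {
    # ES_URL is a hypothetical variable; it defaults to the local node.
    curl -s -XPOST "${ES_URL:-http://localhost:9200}/information/_analyze?pretty" \
         -H 'Content-Type: application/json' \
         -d "$(analyze_body "$1" "$2")"
}

# Usage:
#   analyze ik_max_word "管风琴的天籁之音"
#   analyze ik_smart "管风琴的天籁之音"
```

Comparing the two calls shows the difference between the IK plugin's ik_max_word analyzer (finest-grained split) and its ik_smart analyzer (coarser split).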

24.4.4.1. Testing the segmentation result

Index two sample documents, then search them with highlighting enabled.

curl -XPOST http://localhost:9200/information/news/ -d'
{"title": "越南胡志明游记·教堂·管风琴的天籁之音","content":"这是我平生第一次去教堂,也是第一次完整的参加宗教仪式。当我驻足教堂外的时候,耳边传来天籁之音,是管风琴,确切的说是电子风琴。真正的管风琴造价昂贵,管风琴通常需要根据教堂尺寸定制,无法量产。我记得中国只有4座管风琴,深圳音乐厅有一座。"}
'
curl -XPOST http://localhost:9200/information/news/ -d'
{"title": "越南胡志明游记·信仰·法事","content":"佛经的形成过程是与佛教的发展相始终的,按照佛教发展的时间顺序,最早形成的是小乘佛教三藏,之后形成的是大乘佛教三藏,最后形成的是密宗三藏。"}
'

curl -XPOST http://localhost:9200/information/news/_search  -d'
{
    "query" : { "term" : { "content" : "佛经" }},
    "highlight" : {
        "pre_tags" : ["<strong>", "<strong>"],
        "post_tags" : ["</strong>", "</strong>"],
        "fields" : {
            "content" : {}
        }
    }
}'

curl -XPOST http://localhost:9200/information/news/_search  -d'
{
    "query" : { "term" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<b>", "<i>"],
        "post_tags" : ["</b>", "</i>"],
        "fields" : {
            "content" : {}
        }
    }
}'
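
The term queries above do not analyze the search text, so they only match when the text is exactly one of the tokens the analyzer produced at index time. A match query runs the search text through the field's search analyzer first, which is usually more forgiving for longer phrases. The following is a minimal sketch under assumptions: ES_URL and the function names are hypothetical, and the query text is assumed to contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch: search with a match query instead of a term query.

match_query() {
    # Build a match query with highlighting: $1 = field, $2 = query text.
    # Assumption: the text has no double quotes or backslashes (no escaping here).
    printf '{"query": {"match": {"%s": "%s"}}, "highlight": {"fields": {"%s": {}}}}\n' "$1" "$2" "$1"
}

search_news() {
    # ES_URL is a hypothetical variable; it defaults to the local node.
    curl -s -XPOST "${ES_URL:-http://localhost:9200}/information/news/_search?pretty" \
         -H 'Content-Type: application/json' \
         -d "$(match_query content "$1")"
}

# Usage:
#   search_news "大乘佛教三藏"
```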