
Elasticsearch pinyin (ES Pinyin Plugin) Installation Tutorial

zhengcog... · about 2 min · Blog · Search Engine · Elasticsearch

Plugin Introduction

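On e-commerce sites (such as JD or Taobao), typing pinyin into the search box still finds the products you want. How does Elasticsearch do this? The answer is the pinyin plugin. Below I'll show how to install it and use it in a few simple ways. If you spot any mistakes, please point them out in the comments. Thanks!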

Installation

Installed on Elasticsearch 6.2.4; operating system: Mac OS

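1. Install with the elasticsearch-plugin tool (the plugin version must match your Elasticsearch version exactly, so pick the release that corresponds to your node)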

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.2.4/elasticsearch-analysis-pinyin-6.2.4.zip
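Because the release URL embeds the Elasticsearch version twice, a small helper can keep the two in sync when you script the install. This is a minimal sketch: the function names `pinyin_plugin_url` and `install_command` are hypothetical, and the URL pattern is simply copied from the command above. It only builds the command; it does not run it.

```python
from typing import List

# Hypothetical helper: URL pattern copied from the install command above.
RELEASE_URL = (
    "https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/"
    "v{version}/elasticsearch-analysis-pinyin-{version}.zip"
)


def pinyin_plugin_url(es_version: str) -> str:
    """Return the plugin zip URL for an Elasticsearch version like '6.2.4'."""
    parts = es_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a version like '6.2.4', got {es_version!r}")
    return RELEASE_URL.format(version=es_version)


def install_command(es_version: str) -> List[str]:
    """Build (but do not execute) the elasticsearch-plugin argument list."""
    return ["./bin/elasticsearch-plugin", "install", pinyin_plugin_url(es_version)]


if __name__ == "__main__":
    print(" ".join(install_command("6.2.4")))
```

Passing the argument list to `subprocess.run` from the Elasticsearch home directory would perform the actual install.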

2. Restart Elasticsearch

Examples

1. Create an index with a custom pinyin analyzer

PUT /medcl/ 
{
  "index" : {
      "analysis" : {
          "analyzer" : {
              "pinyin_analyzer" : {
                  "tokenizer" : "my_pinyin"
              }
          },
          "tokenizer" : {
              "my_pinyin" : {
                  "type" : "pinyin",
                  "keep_separate_first_letter" : false,
                  "keep_full_pinyin" : true,
                  "keep_original" : true,
                  "limit_first_letter_length" : 16,
                  "lowercase" : true,
                  "remove_duplicated_term" : true
              }
          }
      }
  }
}
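If you prefer to send the index settings from a script instead of the Kibana console, the sketch below prepares the same `PUT /medcl/` body with Python's standard `urllib`. It assumes a local node at `http://localhost:9200` (an assumption, adjust as needed); `build_request` is a hypothetical helper name. It sets `Content-Type: application/json`, which Elasticsearch 6.x requires for JSON bodies. The request is only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node on the default port

# Same analysis settings as the PUT /medcl/ body above.
index_body = {
    "index": {
        "analysis": {
            "analyzer": {"pinyin_analyzer": {"tokenizer": "my_pinyin"}},
            "tokenizer": {
                "my_pinyin": {
                    "type": "pinyin",
                    "keep_separate_first_letter": False,
                    "keep_full_pinyin": True,
                    "keep_original": True,
                    "limit_first_letter_length": 16,
                    "lowercase": True,
                    "remove_duplicated_term": True,
                }
            },
        }
    }
}


def build_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON request to Elasticsearch."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


req = build_request("PUT", "/medcl", index_body)
# To actually send it to a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```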

2. Test the analyzer on a Chinese name, for example 刘德华 (Andy Lau)

GET /medcl/_analyze
{
  "text": ["刘德华"],
  "analyzer": "pinyin_analyzer"
}

Result:

{
 "tokens" : [
   {
     "token" : "liu",
     "start_offset" : 0,
     "end_offset" : 1,
     "type" : "word",
     "position" : 0
   },
   {
     "token" : "de",
     "start_offset" : 1,
     "end_offset" : 2,
     "type" : "word",
     "position" : 1
   },
   {
     "token" : "hua",
     "start_offset" : 2,
     "end_offset" : 3,
     "type" : "word",
     "position" : 2
   },
   {
     "token" : "刘德华",
     "start_offset" : 0,
     "end_offset" : 3,
     "type" : "word",
     "position" : 3
   },
   {
     "token" : "ldh",
     "start_offset" : 0,
     "end_offset" : 3,
     "type" : "word",
     "position" : 4
   }
 ]
}
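To see where each token in this result comes from, here is a toy re-creation of the token stream. It is not the plugin's algorithm: it uses a hypothetical three-character lookup table `TOY_PINYIN` in place of the plugin's real dictionary. Full-pinyin terms come from `keep_full_pinyin`, the untouched input from `keep_original`, and `ldh` from `keep_first_letter` (not set above, so it stays at its default of true), truncated to `limit_first_letter_length`.

```python
from typing import Dict, List

# Hypothetical demo-only table; the real plugin ships its own pinyin dictionary.
TOY_PINYIN: Dict[str, str] = {"刘": "liu", "德": "de", "华": "hua"}


def toy_pinyin_tokens(text: str, limit_first_letter_length: int = 16) -> List[dict]:
    """Imitate the analyzer output above. A toy, not the plugin's algorithm."""
    tokens = []
    position = 0
    # keep_full_pinyin: one term per character, with that character's offsets.
    for i, ch in enumerate(text):
        tokens.append({"token": TOY_PINYIN[ch], "start_offset": i,
                       "end_offset": i + 1, "type": "word", "position": position})
        position += 1
    # keep_original: the input text spanning the whole string.
    tokens.append({"token": text, "start_offset": 0, "end_offset": len(text),
                   "type": "word", "position": position})
    position += 1
    # keep_first_letter: initials of every syllable joined together.
    initials = "".join(TOY_PINYIN[ch][0] for ch in text)
    tokens.append({"token": initials[:limit_first_letter_length], "start_offset": 0,
                   "end_offset": len(text), "type": "word", "position": position})
    return tokens


for t in toy_pinyin_tokens("刘德华"):
    print(t["token"], t["start_offset"], t["end_offset"], t["position"])
```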

3. Create the mapping

POST /medcl/folks/_mapping 
{
  "folks": {
      "properties": {
          "name": {
              "type": "keyword",
              "fields": {
                  "pinyin": {
                      "type": "text",
                      "store": false,
                      "term_vector": "with_offsets",
                      "analyzer": "pinyin_analyzer",
                      "boost": 10
                  }
              }
          }
      }
  }
}

4. Add a test document

POST /medcl/folks/andy 
{"name":"刘德华"}
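Steps 3 and 4 can also be scripted. In the mapping, `name` is a `keyword` field (matched exactly), while the sub-field `name.pinyin` runs through `pinyin_analyzer`. The sketch below prepares both requests in order with the standard library. The helper `json_request` and the `http://localhost:9200` address are assumptions, and nothing is sent unless you uncomment the loop.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node on the default port


def json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Prepare (not send) a JSON request; ES 6.x requires the Content-Type header."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Step 3: keyword field "name" plus an analyzed sub-field "name.pinyin".
mapping = {
    "folks": {
        "properties": {
            "name": {
                "type": "keyword",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "store": False,
                        "term_vector": "with_offsets",
                        "analyzer": "pinyin_analyzer",
                        "boost": 10,
                    }
                },
            }
        }
    }
}

# Step 4: a document with id "andy" in type "folks".
doc = {"name": "刘德华"}

steps = [
    json_request("POST", "/medcl/folks/_mapping", mapping),
    json_request("POST", "/medcl/folks/andy", doc),
]
# Send in order against a running node:
# for r in steps:
#     with urllib.request.urlopen(r) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```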

5. Search

curl http://localhost:9200/medcl/folks/_search?q=name:刘德华
curl http://localhost:9200/medcl/folks/_search?q=name.pinyin:刘德华
curl http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu
curl http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh
curl http://localhost:9200/medcl/folks/_search?q=name.pinyin:de+hua
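In these URI searches the `+` in `de+hua` is an encoded space, so the query string becomes `name.pinyin:de hua`. Raw Chinese characters in a URL are also best percent-encoded as UTF-8. A minimal sketch with the standard library; `search_url` is a hypothetical helper and `http://localhost:9200` is assumed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ES_URL = "http://localhost:9200"  # assumption: local node on the default port


def search_url(index: str, doc_type: str, query: str) -> str:
    """Build a URI-search URL; urlencode percent-encodes non-ASCII text as
    UTF-8 and turns spaces into '+'."""
    return f"{ES_URL}/{index}/{doc_type}/_search?" + urlencode({"q": query})


queries = [
    "name:刘德华",
    "name.pinyin:刘德华",
    "name.pinyin:liu",
    "name.pinyin:ldh",
    "name.pinyin:de hua",  # same as the de+hua example above
]
for q in queries:
    url = search_url("medcl", "folks", q)
    # Round-trip check: decoding the query string gives back the original text.
    print(url, parse_qs(urlsplit(url).query)["q"][0])
```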

6. Use the pinyin token filter

PUT /medcl1/ 
{
  "index" : {
      "analysis" : {
          "analyzer" : {
              "user_name_analyzer" : {
                  "tokenizer" : "whitespace",
                  "filter" : "pinyin_first_letter_and_full_pinyin_filter"
              }
          },
          "filter" : {
              "pinyin_first_letter_and_full_pinyin_filter" : {
                  "type" : "pinyin",
                  "keep_first_letter" : true,
                  "keep_full_pinyin" : false,
                  "keep_none_chinese" : true,
                  "keep_original" : false,
                  "limit_first_letter_length" : 16,
                  "lowercase" : true,
                  "trim_whitespace" : true,
                  "keep_none_chinese_in_first_letter" : true
              }
          }
      }
  }
}

Token Test: 刘德华 张学友 郭富城 黎明 四大天王

GET /medcl1/_analyze
{
  "text": ["刘德华 张学友 郭富城 黎明 四大天王"],
  "analyzer": "user_name_analyzer"
}

Result:

{
"tokens" : [
  {
    "token" : "ldh",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 0
  },
  {
    "token" : "zxy",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  },
  {
    "token" : "gfc",
    "start_offset" : 8,
    "end_offset" : 11,
    "type" : "word",
    "position" : 2
  },
  {
    "token" : "lm",
    "start_offset" : 12,
    "end_offset" : 14,
    "type" : "word",
    "position" : 3
  },
  {
    "token" : "sdtw",
    "start_offset" : 15,
    "end_offset" : 19,
    "type" : "word",
    "position" : 4
  }
]
}
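The offsets in this result come from the `whitespace` tokenizer: the token filter replaces each token's text with its pinyin initials but keeps the original start and end offsets. The toy below reproduces that behavior. `TOY_INITIALS` is a hypothetical table that covers only the characters in this example, and non-Chinese characters simply pass through, which loosely mirrors `keep_none_chinese_in_first_letter`. It is a sketch, not the plugin's implementation.

```python
import re
from typing import Dict, List

# Hypothetical initials table covering only the characters in this demo.
TOY_INITIALS: Dict[str, str] = {
    "刘": "l", "德": "d", "华": "h", "张": "z", "学": "x", "友": "y",
    "郭": "g", "富": "f", "城": "c", "黎": "l", "明": "m",
    "四": "s", "大": "d", "天": "t", "王": "w",
}


def toy_first_letter_filter(text: str, limit: int = 16) -> List[dict]:
    """Whitespace tokenizer + first-letter filter, as a toy: each token keeps
    its offsets and position; only its text is replaced by initials."""
    tokens = []
    for position, m in enumerate(re.finditer(r"\S+", text)):
        initials = "".join(TOY_INITIALS.get(ch, ch) for ch in m.group())
        tokens.append({
            "token": initials[:limit].lower(),  # limit_first_letter_length, lowercase
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        })
    return tokens


for t in toy_first_letter_filter("刘德华 张学友 郭富城 黎明 四大天王"):
    print(t)
```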

7. Use in phrase queries

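  • Option 1 (this re-creates the medcl index, so run DELETE /medcl first, then repeat steps 3 and 4 so the mapping and document exist before searching)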
PUT /medcl/
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin"
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "keep_first_letter":false,
                    "keep_separate_first_letter" : false,
                    "keep_full_pinyin" : true,
                    "keep_original" : false,
                    "limit_first_letter_length" : 16,
                    "lowercase" : true
                }
            }
        }
    }
}

GET /medcl/folks/_search
{
  "query": {"match_phrase": {
    "name.pinyin": "刘德华"
  }}
}
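A `match_phrase` query only matches when the query's terms appear at consecutive positions in the document. Option 1 turns off `keep_original` and `keep_first_letter`, presumably so that only one full-pinyin term per character occupies each position. The toy below illustrates that positional check. `TOY_PINYIN` is a hypothetical demo table and both functions are illustrative, not Elasticsearch internals.

```python
from typing import Dict, List

TOY_PINYIN: Dict[str, str] = {"刘": "liu", "德": "de", "华": "hua"}  # demo-only table


def analyze_full_pinyin(text: str) -> List[str]:
    """Toy version of option 1: one full-pinyin term per character,
    at positions 0..n-1."""
    return [TOY_PINYIN[ch] for ch in text]


def phrase_match(doc_terms: List[str], query_terms: List[str]) -> bool:
    """True if query_terms occur in doc_terms at consecutive positions."""
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))


doc_terms = analyze_full_pinyin("刘德华")
for q in ["刘德华", "德华", "华刘"]:
    print(q, phrase_match(doc_terms, analyze_full_pinyin(q)))
```

Here "德华" matches because `de hua` sits at positions 1 and 2, while "华刘" does not, since the terms are present but out of order.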
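  • Option 2 (again, delete and re-create the medcl index, and re-create the mapping from step 3 before adding the document)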
PUT /medcl/
{
   "index" : {
       "analysis" : {
           "analyzer" : {
               "pinyin_analyzer" : {
                   "tokenizer" : "my_pinyin"
               }
           },
           "tokenizer" : {
               "my_pinyin" : {
                   "type" : "pinyin",
                   "keep_first_letter":false,
                   "keep_separate_first_letter" : true,
                   "keep_full_pinyin" : false,
                   "keep_original" : false,
                   "limit_first_letter_length" : 16,
                   "lowercase" : true
               }
           }
       }
   }
}

POST /medcl/folks/andy
{"name":"刘德华"}

GET /medcl/folks/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "刘德h"
 }}
}

GET /medcl/folks/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "刘dh"
 }}
}

GET /medcl/folks/_search
{
 "query": {"match_phrase": {
   "name.pinyin": "dh"
 }}
}
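Option 2 enables only `keep_separate_first_letter`, so the name indexes as one initial per character. This lets mixed inputs such as "刘德h" or "dh" line up position by position. The toy below shows why these phrase queries can match. It assumes, for this sketch only, that Latin letters in the query also come out as one term per letter. `TOY_INITIALS` and both functions are hypothetical, and the plugin's real handling of non-Chinese input may differ.

```python
from typing import Dict, List

# Demo-only table mapping each character to its pinyin initial.
TOY_INITIALS: Dict[str, str] = {"刘": "l", "德": "d", "华": "h"}


def analyze_separate_initials(text: str) -> List[str]:
    """Toy for option 2: one single-letter term per character.
    Assumption: Latin letters pass through one letter per term."""
    return [TOY_INITIALS.get(ch, ch.lower()) for ch in text]


def phrase_match(doc_terms: List[str], query_terms: List[str]) -> bool:
    """True if query_terms occur in doc_terms at consecutive positions."""
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))


doc_terms = analyze_separate_initials("刘德华")  # one initial per character
for q in ["刘德h", "刘dh", "dh"]:
    print(q, phrase_match(doc_terms, analyze_separate_initials(q)))
```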
Contributors: Hyman