Elasticsearch: Using field collapsing to reduce search results based on a single field

Date: 2022-07-24



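Field collapsing allows search results to be collapsed based on the value of a field. Collapsing is done by selecting only the top-sorted document for each collapse key. This is not hard to understand. Take a Baidu Music page as an example:

On the page above we can see many popular songs on display. Each of these songs is probably the most prominent track of an album. When we build such a page, there is no need to put every song of an album in the cover position. We may only want to show the most-clicked or most popular song as the representative of that album. When we click the album, we can then see the other songs in it:

Field collapsing was made for exactly this. The same applies to news headlines shown in a title bar: after clicking through, we can see more news in the related category.

Below we walk through an example to show how to use it.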

Preparing the data

The data we use today is a dataset of the best games. It can be downloaded from my GitHub project:

git clone https://github.com/liu-xiao-guo/best_games_json_data

Then we import the downloaded JSON data into Elasticsearch as follows:

We name this index best_games:

Now our data is ready. The whole index contains 500 documents. Each document in the index looks like this:
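The import step itself is not shown in the text. The `_meta` entry in the mapping further down suggests the data was loaded through Kibana's file data visualizer. As an alternative, here is a minimal sketch that builds a request body for the `_bulk` API, assuming the downloaded records are already available as a list of Python dicts. The helper name `to_bulk_ndjson` and the shortened sample record are hypothetical, and sending the payload to the cluster is left out.

```python
import json

def to_bulk_ndjson(docs, index_name="best_games"):
    """Build the newline-delimited JSON body expected by the _bulk API."""
    lines = []
    for doc in docs:
        # Action line: index the next document into index_name (auto-generated _id).
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"

# Shortened, hypothetical sample based on the document shown in this article.
sample = [
    {"id": "madden-nfl-2002-ps2-2001", "name": "Madden NFL 2002", "publisher": "Electronic Arts"}
]
payload = to_bulk_ndjson(sample)
print(payload)
```

The resulting string can be posted to the `_bulk` endpoint with the content type `application/x-ndjson`.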

{"id":"madden-nfl-2002-ps2-2001","name":"Madden NFL 2002","year":2001,"platform":"PS2","genre":"Sports","publisher":"Electronic Arts","global_sales":3.08,"critic_score":94,"user_score":7,"developer":"EA Sports","image_url":"http://www.mobygames.com/images/covers/l/202684-madden-nfl-2002-playstation-2-back-cover.png"}

Its mapping is:

{
  "best_games" : {
    "mappings" : {
      "_meta" : {
        "created_by" : "ml-file-data-visualizer"
      },
      "properties" : {
        "critic_score" : {
          "type" : "long"
        },
        "developer" : {
          "type" : "text"
        },
        "genre" : {
          "type" : "keyword"
        },
        "global_sales" : {
          "type" : "double"
        },
        "id" : {
          "type" : "keyword"
        },
        "image_url" : {
          "type" : "keyword"
        },
        "name" : {
          "type" : "text"
        },
        "platform" : {
          "type" : "keyword"
        },
        "publisher" : {
          "type" : "keyword"
        },
        "user_score" : {
          "type" : "long"
        },
        "year" : {
          "type" : "long"
        }
      }
    }
  }
}

Field collapsing

Now let's search our data using collapsing:

GET best_games/_search
{
  "query": {
    "match": {
      "name": "Final Fantasy"
    }
  },
  "collapse": {
    "field": "publisher"
  },
  "sort": [
    {
      "critic_score": {
        "order": "desc"
      }
    }
  ]
}

The search result is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "E3JzF28BjrINWI3xtt80",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-ix-ps-2000",
          "name" : "Final Fantasy IX",
          "year" : 2000,
          "platform" : "PS",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "global_sales" : 5.3,
          "critic_score" : 94,
          "user_score" : 8,
          "developer" : "SquareSoft",
          "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg"
        },
        "fields" : {
          "publisher" : [ "SquareSoft" ]
        },
        "sort" : [ 94 ]
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "wnJzF28BjrINWI3xtt40",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-vii-ps-1997",
          "name" : "Final Fantasy VII",
          "year" : 1997,
          "platform" : "PS",
          "genre" : "Role-Playing",
          "publisher" : "Sony Computer Entertainment",
          "global_sales" : 9.72,
          "critic_score" : 92,
          "user_score" : 9,
          "developer" : "SquareSoft",
          "image_url" : "https://r.hswstatic.com/w_907/gif/finalfantasyvii-MAIN.jpg"
        },
        "fields" : {
          "publisher" : [ "Sony Computer Entertainment" ]
        },
        "sort" : [ 92 ]
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "_nJzF28BjrINWI3xtt40",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-xii-ps2-2006",
          "name" : "Final Fantasy XII",
          "year" : 2006,
          "platform" : "PS2",
          "genre" : "Role-Playing",
          "publisher" : "Square Enix",
          "global_sales" : 5.95,
          "critic_score" : 92,
          "user_score" : 7,
          "developer" : "Square Enix",
          "image_url" : "https://m.media-amazon.com/images/M/MV5BM2I4MDMyMDQtNjM2OC00ZWNkLTg0ODQtNzYxZjY0M2QxODQyXkEyXkFqcGdeQXVyNjY5NTM5MjA@._V1_.jpg"
        },
        "fields" : {
          "publisher" : [ "Square Enix" ]
        },
        "sort" : [ 92 ]
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "FXJzF28BjrINWI3xtt80",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-x-2-ps2-2003",
          "name" : "Final Fantasy X-2",
          "year" : 2003,
          "platform" : "PS2",
          "genre" : "Role-Playing",
          "publisher" : "Electronic Arts",
          "global_sales" : 5.29,
          "critic_score" : 85,
          "user_score" : 6,
          "developer" : "SquareSoft",
          "image_url" : "https://upload.wikimedia.org/wikipedia/en/thumb/6/6c/FFX-2_box.jpg/220px-FFX-2_box.jpg"
        },
        "fields" : {
          "publisher" : [ "Electronic Arts" ]
        },
        "sort" : [ 85 ]
      }
    ]
  }
}

The result above shows:

  • We search for all games whose name matches Final Fantasy and sort them by critic_score in descending order.
  • Because we collapse on the publisher field, each publisher contributes only one search result, even though a publisher may have many matching games.

For example, we can look up the games whose publisher is SquareSoft and whose name contains Final Fantasy; there are three of them:

GET best_games/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Final Fantasy"
          }
        },
        {
          "match": {
            "publisher": "SquareSoft"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "critic_score": {
        "order": "desc"
      }
    }
  ]
}

The result of the query above:

    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "E3JzF28BjrINWI3xtt80",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-ix-ps-2000",
          "name" : "Final Fantasy IX",
          "year" : 2000,
          "platform" : "PS",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "global_sales" : 5.3,
          "critic_score" : 94,
          "user_score" : 8,
          "developer" : "SquareSoft",
          "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg"
        },
        "sort" : [ 94 ]
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "0nJzF28BjrINWI3xtt40",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-viii-ps-1999",
          "name" : "Final Fantasy VIII",
          "year" : 1999,
          "platform" : "PS",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "global_sales" : 7.86,
          "critic_score" : 90,
          "user_score" : 8,
          "developer" : "SquareSoft",
          "image_url" : "https://gamingheartscollection.files.wordpress.com/2018/02/final-fantasy-8.png?w=585"
        },
        "sort" : [ 90 ]
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "SHJzF28BjrINWI3xtuA1",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-tactics-ps-1997",
          "name" : "Final Fantasy Tactics",
          "year" : 1997,
          "platform" : "PS",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "global_sales" : 2.45,
          "critic_score" : 83,
          "user_score" : 8,
          "developer" : "SquareSoft",
          "image_url" : "https://www.thefinalfantasy.com/gallery/screenshots/ff-tactics/dynamic_previews/ff-tactics-screenshot-1_scale_800_700.jpg"
        },
        "sort" : [ 83 ]
      }
    ]
  }

However, because we used collapse, only one of these games is returned: the one with the highest critic_score.
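The behavior described above can be mimicked in memory. The following is a minimal sketch, not how Elasticsearch implements collapsing: it sorts the matching documents and keeps the first one seen for each collapse key. The function name `collapse` is hypothetical, and the sample list contains only the six games visible in this article, with most fields omitted.

```python
def collapse(hits, field, sort_key):
    """Keep only the top-sorted document for each distinct value of `field`."""
    # Sort descending; Python's sort is stable, so ties keep their input order.
    ranked = sorted(hits, key=lambda d: d[sort_key], reverse=True)
    seen = set()
    collapsed = []
    for doc in ranked:
        key = doc[field]
        if key not in seen:
            # First document for this key is the best one under the sort.
            seen.add(key)
            collapsed.append(doc)
    return collapsed

games = [
    {"name": "Final Fantasy IX", "publisher": "SquareSoft", "critic_score": 94},
    {"name": "Final Fantasy VIII", "publisher": "SquareSoft", "critic_score": 90},
    {"name": "Final Fantasy Tactics", "publisher": "SquareSoft", "critic_score": 83},
    {"name": "Final Fantasy VII", "publisher": "Sony Computer Entertainment", "critic_score": 92},
    {"name": "Final Fantasy XII", "publisher": "Square Enix", "critic_score": 92},
    {"name": "Final Fantasy X-2", "publisher": "Electronic Arts", "critic_score": 85},
]

top_per_publisher = collapse(games, "publisher", "critic_score")
for game in top_per_publisher:
    print(game["publisher"], "->", game["name"], game["critic_score"])
```

Of the three SquareSoft games, only Final Fantasy IX survives, which matches the collapsed response shown earlier.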

Note: the field used for collapsing must be a single-valued keyword or numeric field with doc_values enabled.
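To see which fields of best_games meet this requirement, the mapping can be checked with a small helper. This is a rough sketch: the function name `collapsible_fields` and the list of numeric type names are my own choices, and the check assumes doc_values are on unless the mapping disables them explicitly. It cannot tell whether a field is single-valued, since that depends on the data and not on the mapping.

```python
NUMERIC_TYPES = {
    "long", "integer", "short", "byte", "double",
    "float", "half_float", "scaled_float", "unsigned_long",
}

def collapsible_fields(properties):
    """Return names of keyword or numeric fields whose doc_values are not disabled."""
    result = []
    for name, spec in properties.items():
        field_type = spec.get("type")
        # Assumption: doc_values are enabled unless set to false in the mapping.
        doc_values_on = spec.get("doc_values", True)
        if (field_type == "keyword" or field_type in NUMERIC_TYPES) and doc_values_on:
            result.append(name)
    return sorted(result)

# The "properties" section of the best_games mapping shown in this article.
properties = {
    "critic_score": {"type": "long"},
    "developer": {"type": "text"},
    "genre": {"type": "keyword"},
    "global_sales": {"type": "double"},
    "id": {"type": "keyword"},
    "image_url": {"type": "keyword"},
    "name": {"type": "text"},
    "platform": {"type": "keyword"},
    "publisher": {"type": "keyword"},
    "user_score": {"type": "long"},
    "year": {"type": "long"},
}

candidates = collapsible_fields(properties)
print(candidates)
```

The text fields name and developer are excluded, while publisher, the field collapsed on in this article, is included.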

Expanding collapse results

We can also expand each collapsed top hit with the inner_hits option:

GET best_games/_search
{
  "query": {
    "match": {
      "name": "Final Fantasy"
    }
  },
  "collapse": {
    "field": "publisher",
    "inner_hits": {
      "name": "top 3 games",
      "size": 3,
      "sort": [{"user_score": "desc"}]
    }
  },
  "sort": [
    {
      "critic_score": {
        "order": "desc"
      }
    }
  ]
}

Running it gives the following result:

  "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "E3JzF28BjrINWI3xtt80",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-ix-ps-2000",
          "name" : "Final Fantasy IX",
          "year" : 2000,
          "platform" : "PS",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "global_sales" : 5.3,
          "critic_score" : 94,
          "user_score" : 8,
          "developer" : "SquareSoft",
          "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg"
        },
        "fields" : {
          "publisher" : [ "SquareSoft" ]
        },
        "sort" : [ 94 ],
        "inner_hits" : {
          "top 3 games" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "0nJzF28BjrINWI3xtt40",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-viii-ps-1999",
                    "name" : "Final Fantasy VIII",
                    "year" : 1999,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 7.86,
                    "critic_score" : 90,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "https://gamingheartscollection.files.wordpress.com/2018/02/final-fantasy-8.png?w=585"
                  },
                  "sort" : [ 8 ]
                },
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "E3JzF28BjrINWI3xtt80",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-ix-ps-2000",
                    "name" : "Final Fantasy IX",
                    "year" : 2000,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 5.3,
                    "critic_score" : 94,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg"
                  },
                  "sort" : [ 8 ]
                },
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "SHJzF28BjrINWI3xtuA1",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-tactics-ps-1997",
                    "name" : "Final Fantasy Tactics",
                    "year" : 1997,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 2.45,
                    "critic_score" : 83,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "https://www.thefinalfantasy.com/gallery/screenshots/ff-tactics/dynamic_previews/ff-tactics-screenshot-1_scale_800_700.jpg"
                  },
                  "sort" : [ 8 ]
                }
              ]
            }
          }
        }
      },

We can see that, for each publisher, inner_hits contains up to three documents under the name "top 3 games", sorted by user_score in descending order.

You can also request multiple inner_hits for each collapsed hit. This is useful when you want several representations of the collapsed hits.

GET best_games/_search
{
  "query": {
    "match": {
      "name": "Final Fantasy"
    }
  },
  "collapse": {
    "field": "publisher",
    "inner_hits": [
      {
        "name": "top user liked",
        "size": 3,
        "sort": [
          {
            "user_score": "desc"
          }
        ]
      },
      {
        "name": "top most recent games",
        "size": 3,
        "sort": [
          {
            "year": "desc"
          }
        ]
      }
    ]
  },
  "sort": [
    {
      "critic_score": {
        "order": "desc"
      }
    }
  ]
}
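The relationship between the collapsed top hit and its inner_hits can be illustrated with a small in-memory model. This is only a sketch of the idea under simplifying assumptions, not Elasticsearch's implementation. The function name `collapse_with_inner_hits` and the reduced sample records are hypothetical.

```python
from collections import defaultdict

def collapse_with_inner_hits(docs, field, sort_key, inner_sort_key, inner_size):
    """Group docs by `field`; return one top hit per group plus its expanded inner hits."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)

    results = []
    for key, members in groups.items():
        # The collapsed hit is the best document of the group under the outer sort.
        top_hit = max(members, key=lambda d: d[sort_key])
        # The inner hits are the group's documents under their own sort, cut to inner_size.
        inner = sorted(members, key=lambda d: d[inner_sort_key], reverse=True)[:inner_size]
        results.append({"key": key, "top_hit": top_hit, "inner_hits": inner})

    # Order the collapsed hits by the outer sort, best first.
    results.sort(key=lambda r: r["top_hit"][sort_key], reverse=True)
    return results

games = [
    {"name": "Final Fantasy IX", "publisher": "SquareSoft", "critic_score": 94, "user_score": 8},
    {"name": "Final Fantasy VIII", "publisher": "SquareSoft", "critic_score": 90, "user_score": 8},
    {"name": "Final Fantasy Tactics", "publisher": "SquareSoft", "critic_score": 83, "user_score": 8},
    {"name": "Final Fantasy VII", "publisher": "Sony Computer Entertainment", "critic_score": 92, "user_score": 9},
]

expanded = collapse_with_inner_hits(games, "publisher", "critic_score", "user_score", 3)
for group in expanded:
    print(group["key"], "top:", group["top_hit"]["name"],
          "inner:", [d["name"] for d in group["inner_hits"]])
```

The outer sort and the inner sort are independent of each other, which is why the top hit for SquareSoft is chosen by critic_score while its inner hits are ordered by user_score.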

The result is:

"hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "E3JzF28BjrINWI3xtt80",
        "_score" : null,
        "_source" : {
          "id" : "final-fantasy-ix-ps-2000",
          "name" : "Final Fantasy IX",
          "year" : 2000,
          "platform" : "PS",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "global_sales" : 5.3,
          "critic_score" : 94,
          "user_score" : 8,
          "developer" : "SquareSoft",
          "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg"
        },
        "fields" : {
          "publisher" : [ "SquareSoft" ]
        },
        "sort" : [ 94 ],
        "inner_hits" : {
          "top user liked" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "0nJzF28BjrINWI3xtt40",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-viii-ps-1999",
                    "name" : "Final Fantasy VIII",
                    "year" : 1999,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 7.86,
                    "critic_score" : 90,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "https://gamingheartscollection.files.wordpress.com/2018/02/final-fantasy-8.png?w=585"
                  },
                  "sort" : [ 8 ]
                },
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "E3JzF28BjrINWI3xtt80",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-ix-ps-2000",
                    "name" : "Final Fantasy IX",
                    "year" : 2000,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 5.3,
                    "critic_score" : 94,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg"
                  },
                  "sort" : [ 8 ]
                },
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "SHJzF28BjrINWI3xtuA1",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-tactics-ps-1997",
                    "name" : "Final Fantasy Tactics",
                    "year" : 1997,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 2.45,
                    "critic_score" : 83,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "https://www.thefinalfantasy.com/gallery/screenshots/ff-tactics/dynamic_previews/ff-tactics-screenshot-1_scale_800_700.jpg"
                  },
                  "sort" : [ 8 ]
                }
              ]
            }
          },
          "top most recent games" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "E3JzF28BjrINWI3xtt80",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-ix-ps-2000",
                    "name" : "Final Fantasy IX",
                    "year" : 2000,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 5.3,
                    "critic_score" : 94,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg"
                  },
                  "sort" : [ 2000 ]
                },
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "0nJzF28BjrINWI3xtt40",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-viii-ps-1999",
                    "name" : "Final Fantasy VIII",
                    "year" : 1999,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 7.86,
                    "critic_score" : 90,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "https://gamingheartscollection.files.wordpress.com/2018/02/final-fantasy-8.png?w=585"
                  },
                  "sort" : [ 1999 ]
                },
                {
                  "_index" : "best_games",
                  "_type" : "_doc",
                  "_id" : "SHJzF28BjrINWI3xtuA1",
                  "_score" : null,
                  "_source" : {
                    "id" : "final-fantasy-tactics-ps-1997",
                    "name" : "Final Fantasy Tactics",
                    "year" : 1997,
                    "platform" : "PS",
                    "genre" : "Role-Playing",
                    "publisher" : "SquareSoft",
                    "global_sales" : 2.45,
                    "critic_score" : 83,
                    "user_score" : 8,
                    "developer" : "SquareSoft",
                    "image_url" : "https://www.thefinalfantasy.com/gallery/screenshots/ff-tactics/dynamic_previews/ff-tactics-screenshot-1_scale_800_700.jpg"
                  },
                  "sort" : [ 1997 ]
                }
              ]
            }
          }
        }
      },

This way, for each publisher we get both the three games that users like most and the three most recent games.
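When the number of inner_hits groups varies, the request body can be assembled in code instead of written by hand. Below is a minimal sketch that reproduces the body of the multi inner_hits request from this article as a Python dict. The helper name `build_collapse_query` and its parameters are hypothetical, and sending the body to Elasticsearch is left out. Because the response keys each group by its name, the sketch rejects duplicate names.

```python
import json

def build_collapse_query(text, collapse_field, inner_hits_specs, sort_field="critic_score"):
    """Build a search body that collapses on one field and expands several inner_hits groups.

    inner_hits_specs is a list of (name, sort_field, size) tuples.
    """
    names = [name for name, _, _ in inner_hits_specs]
    if len(set(names)) != len(names):
        raise ValueError("inner_hits names must be unique")

    inner_hits = [
        {"name": name, "size": size, "sort": [{field: "desc"}]}
        for name, field, size in inner_hits_specs
    ]
    return {
        "query": {"match": {"name": text}},
        "collapse": {"field": collapse_field, "inner_hits": inner_hits},
        "sort": [{sort_field: {"order": "desc"}}],
    }

body = build_collapse_query(
    "Final Fantasy",
    "publisher",
    [("top user liked", "user_score", 3), ("top most recent games", "year", 3)],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `GET best_games/_search` in the Kibana console.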

Reference:

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-collapse

