@xtccc
2018-07-14T00:03:33.000000Z

ElasticSearch
Before diving into queries, get familiar with the following concepts:
- Mapping: How the data in each field is interpreted
- Analysis: How full text is processed to make it searchable
- Query DSL: The flexible, powerful query language used by ElasticSearch
We will use this test data throughout.
In the type named employee, search for documents whose lastname is Xiao:
curl localhost:9200/megacorp/employee/_search?q=lastname:Xiao
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0},
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {"_index" : "megacorp", "_type" : "employee", "_id" : "AVCE4xivMv8zm4P4wh-e", "_score" : 1.0,
       "_source" : {"firstname" : "Tao", "lastname" : "Xiao", "age" : 30, "about" : "Hello, my wife is CCC", "interests" : ["coding", "jogging"]}},
      {"_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 0.30685282,
       "_source" : {"firstname" : "Tao", "lastname" : "Xiao", "age" : 30, "about" : "Hello, my wife is CCC", "interests" : ["coding", "jogging"]}}
    ]
  }
}
By default the first 10 matching documents are returned. We can also ask for the first 20:
GET megacorp/employee/_search
{"query" : {"match" : {"lastname" : "xiao"}}, "size" : 20}
Or return documents 21 to 30 (from is a zero-based offset, so from = 20 skips the first 20 hits):
GET megacorp/employee/_search
{"query" : {"match" : {"lastname" : "xiao"}}, "from" : 20, "size" : 10}
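The from/size arithmetic is easy to get off by one, so here is a minimal sketch of a helper that turns a 1-based page number into a request body. The function name page_body and its parameters are made up for illustration; only the "query", "from" and "size" keys come from the requests above.

```python
import json

def page_body(field, text, page, page_size=10):
    """Build a match-query body for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match": {field: text}},
        "from": (page - 1) * page_size,  # zero-based offset of the first hit
        "size": page_size,
    }

# Page 3 with 10 hits per page corresponds to documents 21 to 30.
print(json.dumps(page_body("lastname", "xiao", page=3)))
```

The printed JSON can be used as the body of the GET megacorp/employee/_search request.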
In the response, _source holds the full content of each matching document. We can ask for only part of it:
GET megacorp/employee/_search
{"query" : {"match" : {"lastname" : "xiao"}}, "_source" : ["age", "about"]}
We can search within specific indices:
GET index1,index2,index/_search
{"query" : { "match" : { "lastname" : "xiao"}}}
Or across all indices:
GET _all/_search
{"query" : { "match" : { "lastname" : "xiao"}}}
curl 'localhost:9200/megacorp/employee/_search?pretty' -d '{"query" : { "match" : { "lastname" : "Xiao" }}}'
A document with lastname = "A B Xiao cc DD" matches, while one with lastname = "A B tXiaoy cc DD" does not: the query term has to match a whole word.
curl 'localhost:9200/megacorp/employee/_search?pretty' -d '{"query" : { "match" : { "lastname" : "Xiao lane" }}}'
A document matches if its lastname contains "Xiao" or "lane".
curl 'localhost:9200/megacorp/employee/_search?pretty' -d '{"query" : { "match_phrase" : { "lastname" : "Xiao lane" }}}'
The lastname must contain "Xiao lane" as a whole phrase.
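The whole-word behaviour of match can be pictured with a toy model. The sketch below is not how ElasticSearch is implemented: it assumes a very simple analyzer that splits on non-alphanumeric characters and lowercases, and the names tokenize and match_any are hypothetical.

```python
import re

def tokenize(text):
    """Toy analyzer: split on non-alphanumeric characters and lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def match_any(field_value, query):
    """True if the field shares at least one whole term with the query."""
    return bool(set(tokenize(field_value)) & set(tokenize(query)))

print(match_any("A B Xiao cc DD", "Xiao"))     # whole word present
print(match_any("A B tXiaoy cc DD", "Xiao"))   # only a substring, not a term
print(match_any("Main lane", "Xiao lane"))     # one of the two terms is enough
```

Because matching happens on terms rather than on raw substrings, "tXiaoy" is a different term from "xiao" and does not match.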
Now we want to find all documents of type employee whose lastname is Xiao and whose age is greater than 20. For this we use a range filter:
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "age" : {"gt" : 20}
        }
      },
      "query" : {
        "match" : {
          "lastname" : "Xiao"
        }
      }
    }
  }
}'
The query returns the correct results.
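For reference, here is a small sketch that builds the same request body programmatically. The first function mirrors the filtered query used above with ElasticSearch 1.7.2. The second shows the bool/filter shape which, as far as I know, replaces the filtered query in later releases (deprecated in 2.0, removed in 5.0). Both function names are hypothetical.

```python
import json

def filtered_query(field, text, range_field, gt):
    """ES 1.x style: a 'filtered' query wrapping a match query and a range filter."""
    return {"query": {"filtered": {
        "filter": {"range": {range_field: {"gt": gt}}},
        "query": {"match": {field: text}},
    }}}

def bool_filter_query(field, text, range_field, gt):
    """Assumed equivalent for later releases: a bool query with a filter clause."""
    return {"query": {"bool": {
        "must": {"match": {field: text}},
        "filter": {"range": {range_field: {"gt": gt}}},
    }}}

print(json.dumps(filtered_query("lastname", "Xiao", "age", 20), indent=2))
```

Either body can be passed to curl with -d, depending on the version of the cluster.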
Full-text search comes in several cases.
For the search request
curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=about,lastname&pretty'
the following result is returned:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0},
  "hits" : {
    "total" : 2,
    "max_score" : 0.43920785,
    "hits" : [
      {"_index" : "megacorp", "_type" : "employee", "_id" : "AVCE4xivMv8zm4P4wh-e", "_score" : 0.43920785,
       "fields" : {"about" : [ "Hello, my wife is CCC" ], "lastname" : [ "Xiao" ]}},
      {"_index" : "megacorp", "_type" : "employee", "_id" : "4", "_score" : 0.3125,
       "fields" : {"about" : [ "Hello, my wife is CCC" ], "lastname" : [ "Xiao" ]}}
    ]
  }
}
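- took: The time the search took, in milliseconds.
- timed_out: Whether the search timed out. By default a search never times out, but a limit can be set with the timeout=<duration> parameter on the request. If the search times out, only the partial results gathered before the timeout are returned.
- _shards: Statistics on the shards involved in the search. If some nodes are down and part of the shards are unavailable, the search results may be incomplete.
- hits: The results of the search. By default ES returns the first 10 of all results. Adding size=<number of results> to the request changes how many are returned.
- fields: The fields requested in the search request. If the request does not specify any fields, this is replaced by _source (the original JSON document).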
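To make the response fields concrete, the following sketch pulls the pieces described above out of a parsed response. The summarize function and the key names of its result are hypothetical, and the sample dictionary is a trimmed copy of the response shown earlier.

```python
def summarize(response):
    """Extract the headline fields from a parsed _search response."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        # fewer successful shards than total means the result may be incomplete
        "partial": shards["successful"] < shards["total"],
        "total": hits["total"],
        "ids": [h["_id"] for h in hits["hits"]],
    }

sample = {
    "took": 4,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 0.43920785,
        "hits": [
            {"_id": "AVCE4xivMv8zm4P4wh-e", "_score": 0.43920785},
            {"_id": "4", "_score": 0.3125},
        ],
    },
}

print(summarize(sample))
```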
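The examples here use GET requests.
case 1: one of the fields satisfies the condition
$ curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=lastname,about&pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0},
  "hits" : {
    "total" : 2,
    "max_score" : 0.43920785,
    "hits" : [
      {"_index" : "megacorp", "_type" : "employee", "_id" : "AVCE4xivMv8zm4P4wh-e", "_score" : 0.43920785,
       "fields" : {"about" : [ "Hello, my wife is CCC" ], "lastname" : [ "Xiao" ]}},
      {"_index" : "megacorp", "_type" : "employee", "_id" : "4", "_score" : 0.3125,
       "fields" : {"about" : [ "Hello, my wife is CCC" ], "lastname" : [ "Xiao" ]}}
    ]
  }
}
This retrieves the documents that contain "xiao" and returns their lastname and about fields. Note that fields only selects which fields are returned; when q carries no field prefix, the term is matched against the default search field (_all unless configured otherwise).
case 2: all of the fields must satisfy the condition
To be written.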
For example, search for documents within the get-together index:
$ curl 'ecs1:9200/get-together/_search?q=elasticsearch&pretty'
For example, search for documents within both the get-together and other-index indices:
$ curl 'ecs1:9200/get-together,other-index/_search?q=elasticsearch&pretty'
$ curl 'ecs1:9200/_search?q=elasticsearch&pretty'
$ curl 'ecs1:9200/get-together/group,event/_search?q=elasticsearch&pretty'
$ curl 'ecs1:9200/_all/group,event/_search?q=elasticsearch&pretty'
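The URL patterns above follow one rule: an optional comma-separated list of indices, then an optional comma-separated list of types, then _search. A minimal sketch of that rule, with a hypothetical function name search_path:

```python
def search_path(indices=None, types=None):
    """Return the _search path for the given indices and types (None means all)."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")  # a type list needs an index segment in front of it
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)

print(search_path(["get-together"]))                      # one index
print(search_path(["get-together", "other-index"]))       # several indices
print(search_path())                                      # every index
print(search_path(["get-together"], ["group", "event"]))  # two types in one index
print(search_path(None, ["group", "event"]))              # two types in every index
```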
By putting JSON content in the query request body, we can express complex query requirements.
By default, every document that matches any one of the query terms is returned.
curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '{"query" : {"query_string" : {"query" : "xiao LA","default_field" : "my"}}}'
For example, the query above returns the following two documents:
"hits" : [ {..."_source": {..."about" : "Hello, my wife is CCC",}}, {..."_source": {"about" : "Hello, my wife is CCC",}}]
If the queried field must contain all of the query terms, add the default_operator parameter as follows:
curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '{"query" : {"query_string" : {"query" : "xiao LA","default_field" : "my","default_operator" : "AND"}}}'
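The difference between the default OR behaviour and default_operator = AND can be sketched with plain sets. This toy model assumes the query consists of plain terms only (no query_string syntax such as wildcards or explicit operators) and uses a simplistic tokenizer; the names terms and term_match are hypothetical.

```python
import re

def terms(text):
    """Toy analyzer: the set of lowercased alphanumeric terms in the text."""
    return {t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)}

def term_match(field_value, query, default_operator="OR"):
    """OR: any query term present. AND: every query term present."""
    wanted, present = terms(query), terms(field_value)
    if default_operator.upper() == "AND":
        return wanted <= present
    return bool(wanted & present)

print(term_match("Xiao", "xiao LA"))                  # OR: one term is enough
print(term_match("Xiao", "xiao LA", "AND"))           # AND: "la" is missing
print(term_match("Xiao lives in LA", "xiao LA", "AND"))
```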
A filter only cares whether a document matches the condition and does not compute a relevance score (every result gets a score of 1.0), so a filter is faster than a regular query.
Now let us find all documents whose about field contains "am Jack".
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
{
  "query" : {
    "match" : {
      "about" : "am Jack"
    }
  }
}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0},
  "hits" : {
    "total" : 2,
    "max_score" : 0.70710677,
    "hits" : [
      {"_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 0.70710677,
       "_source" : {"firstname" : "Jack", "lastname" : "Chen", "age" : 40, "about" : "Hello, I am Jack", "interests" : ["sports", "music"]}},
      {"_index" : "megacorp", "_type" : "employee", "_id" : "3", "_score" : 0.02250402,
       "_source" : {"firstname" : "Lucy", "lastname" : "Liu", "age" : 50, "about" : "Hello, I am Lucy", "interests" : ["tv", "talking"]}}
    ]
  }
}
Here we used a match query to run a full-text search on the about field. By default, ES sorts the results by their relevance _score.
The query returned two documents. The about field of the first is "Hello, I am Jack", which contains all of the query terms, and its relevance is 0.70710677. The about field of the second is "Hello, I am Lucy", which contains only one of the query terms, and its relevance is 0.02250402.
To match the whole phrase (all of its words), use match_phrase:
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
{
  "query" : {
    "match_phrase" : {
      "about" : "am Jack"
    }
  }
}'
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0},
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {"_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 1.0,
       "_source" : {"firstname" : "Jack", "lastname" : "Chen", "age" : 40, "about" : "Hello, I am Jack", "interests" : ["sports", "music"]}}
    ]
  }
}
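With match_phrase, the about field of a matching document must satisfy all of the following:
- It contains both words, am and Jack.
- am and Jack are adjacent: no other word may come between them, although punctuation may (for example a comma, whether Chinese or English punctuation).
- am comes first and Jack comes after it.
Neither of the following conditions returns any result:
- { "about" : "Jack am" }
- { "about" : "am a Jack" }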
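The three rules above amount to looking for the query terms as a consecutive, ordered run in the token stream of the field. A toy sketch of that idea follows; it assumes a simplistic analyzer that drops punctuation, and tokenize and phrase_match are hypothetical names, not ElasticSearch APIs.

```python
import re

def tokenize(text):
    """Toy analyzer: lowercased alphanumeric terms in order, punctuation dropped."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def phrase_match(field_value, phrase):
    """True if the phrase terms occur consecutively and in order in the field."""
    doc, want = tokenize(field_value), tokenize(phrase)
    n = len(want)
    return n > 0 and any(doc[i:i + n] == want for i in range(len(doc) - n + 1))

about = "Hello, I am Jack"
print(phrase_match(about, "am Jack"))    # adjacent and in order
print(phrase_match(about, "Jack am"))    # wrong order
print(phrase_match(about, "am a Jack"))  # extra word in between
print(phrase_match("I am, Jack", "am Jack"))  # punctuation in between is fine
```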
With highlight, the response also shows which piece of text in the document hit the search condition:
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
{
  "query" : {
    "match_phrase" : {
      "about" : "am Jack"
    }
  },
  "highlight" : {
    "fields" : {
      "about" : {}
    }
  }
}'
{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0},
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {"_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 1.0,
       "_source" : {"firstname" : "Jack", "lastname" : "Chen", "age" : 40, "about" : "Hello, I am Jack", "interests" : ["sports", "music"]},
       "highlight" : {"about" : [ "Hello, I <em>am</em> <em>Jack</em>" ]}}
    ]
  }
}
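The shape of the highlight fragment can be reproduced with a few lines of Python. This is only an illustration of the output format, not the highlighter ElasticSearch uses; the function name highlight and its pre/post parameters are hypothetical, with the <em> tags taken from the response above.

```python
import re

def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of text that equals a query term (case-insensitive)."""
    wanted = {t.lower() for t in re.findall(r"[A-Za-z0-9]+", query)}

    def wrap(match):
        word = match.group(0)
        return pre + word + post if word.lower() in wanted else word

    return re.sub(r"[A-Za-z0-9]+", wrap, text)

print(highlight("Hello, I am Jack", "am Jack"))
```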
Next, we analyze the 4 documents in <megacorp, employee> and aggregate them on the interests field.
curl localhost:9200/megacorp/employee/_search?pretty -d '
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" }
    }
  }
}'
{
  "took" : 66,
  "timed_out" : false,
  "_shards" : {"total" : 5, "successful" : 5, "failed" : 0},
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [
      {"_index" : "megacorp", "_type" : "employee", "_id" : "1", "_score" : 1.0,
       "_source" : {"firstname" : "Tao", "lastname" : "Xiao", "age" : 30, "about" : "Hello, my wife is CCC", "interests" : ["coding", "jogging"]}},
      {"_index" : "megacorp", "_type" : "employee", "_id" : "AVCE4xivMv8zm4P4wh-e", "_score" : 1.0,
       "_source" : {"firstname" : "Tao", "lastname" : "Xiao", "age" : 30, "about" : "Hello, my wife is CCC", "interests" : ["coding", "jogging"]}},
      {"_index" : "megacorp", "_type" : "employee", "_id" : "2", "_score" : 1.0,
       "_source" : {"firstname" : "Jack", "lastname" : "Chen", "age" : 40, "about" : "Hello, I am Jack", "interests" : ["sports", "music"]}},
      {"_index" : "megacorp", "_type" : "employee", "_id" : "3", "_score" : 1.0,
       "_source" : {"firstname" : "Lucy", "lastname" : "Liu", "age" : 50, "about" : "Hello, I am Lucy", "interests" : ["tv", "talking"]}}
    ]
  },
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {"key" : "coding", "doc_count" : 2},
        {"key" : "jogging", "doc_count" : 2},
        {"key" : "music", "doc_count" : 1},
        {"key" : "sports", "doc_count" : 1},
        {"key" : "talking", "doc_count" : 1},
        {"key" : "tv", "doc_count" : 1}
      ]
    }
  }
}
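What the terms aggregation computes can be mirrored locally: one bucket per distinct value of the field, counting the documents that contain it. The sketch below works on an in-memory copy of the four test documents (trimmed to the relevant fields); EMPLOYEES and terms_buckets are hypothetical names, and the sort order is chosen to match the buckets shown above.

```python
from collections import Counter

EMPLOYEES = [
    {"lastname": "Xiao", "age": 30, "interests": ["coding", "jogging"]},
    {"lastname": "Xiao", "age": 30, "interests": ["coding", "jogging"]},
    {"lastname": "Chen", "age": 40, "interests": ["sports", "music"]},
    {"lastname": "Liu", "age": 50, "interests": ["tv", "talking"]},
]

def terms_buckets(docs, field):
    """Count documents per distinct value of a multi-valued field."""
    counts = Counter(v for d in docs for v in set(d.get(field, [])))
    # highest doc_count first, ties broken alphabetically by key
    ordered = sorted(counts.items(), key=lambda kc: (-kc[1], kc[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered]

for bucket in terms_buckets(EMPLOYEES, "interests"):
    print(bucket)
```

Restricting the aggregation to a subset, as a query does in the request, corresponds to filtering the list before calling terms_buckets, for example keeping only the documents whose lastname is "Xiao".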
To restrict the aggregation with a condition (for example, requiring lastname to be Xiao), add a query:
[root@ecs1 ~]# curl localhost:9200/megacorp/employee/_search?pretty -d '
{
  "query" : {
    "match" : {
      "lastname" : "Xiao"
    }
  },
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" }
    }
  }
}'
We can also ask: for people who share the same interest, what is their average age?
To illustrate this, first add one more person:
curl localhost:9200/megacorp/employee/4?pretty -d '
{
  "firstname" : "Tao",
  "lastname" : "Xiao",
  "age" : 20,
  "about" : "Hello, my wife is CCC",
  "interests" : ["music", "tv"]
}'
Now send the query:
curl localhost:9200/megacorp/employee/_search?pretty -d '
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" },
      "aggs" : {
        "avg_age" : {
          "avg" : { "field" : "age" }
        }
      }
    }
  }
}'
--> The response is as follows
···
"aggregations" : {
  "all_interests" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {"key" : "coding", "doc_count" : 2, "avg_age" : {"value" : 30.0}},
      {"key" : "jogging", "doc_count" : 2, "avg_age" : {"value" : 30.0}},
      {"key" : "music", "doc_count" : 2, "avg_age" : {"value" : 30.0}},
      {"key" : "tv", "doc_count" : 2, "avg_age" : {"value" : 35.0}},
      {"key" : "sports", "doc_count" : 1, "avg_age" : {"value" : 40.0}},
      {"key" : "talking", "doc_count" : 1, "avg_age" : {"value" : 50.0}}
    ]
  }
}
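The nested avg aggregation can be checked by hand: group the ages by interest, then average each group. A small sketch over an in-memory copy of the five documents (trimmed to age and interests); PEOPLE and avg_age_by_interest are hypothetical names.

```python
from collections import defaultdict

PEOPLE = [
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 40, "interests": ["sports", "music"]},
    {"age": 50, "interests": ["tv", "talking"]},
    {"age": 20, "interests": ["music", "tv"]},  # the newly added document with id 4
]

def avg_age_by_interest(docs):
    """Bucket documents by interest, then average the age inside each bucket."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    return {interest: sum(v) / len(v) for interest, v in ages.items()}

for interest, avg in sorted(avg_age_by_interest(PEOPLE).items()):
    print(interest, avg)
```

For example, music is shared by the people aged 40 and 20, which gives the average of 30.0 seen in the response, and tv is shared by the people aged 50 and 20, which gives 35.0.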