@xtccc
2018-07-14T08:03:33.000000Z
ElasticSearch
Before diving into queries, get familiar with the following concepts:
- Mapping: how the data in each field is interpreted
- Analysis: how full text is processed to make it searchable
- Query DSL: the flexible, powerful query language used by Elasticsearch
The examples here use this set of test data.
In the type named employee, search for documents whose lastname is Xiao:
curl localhost:9200/megacorp/employee/_search?q=lastname:Xiao
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":2,
"max_score":1.0,
"hits":[ {
"_index":"megacorp",
"_type":"employee",
"_id":"AVCE4xivMv8zm4P4wh-e",
"_score":1.0,
"_source": {
"firstname" : "Tao",
"lastname" : "Xiao",
"age" : 30,
"about" : "Hello, my wife is CCC",
"interests" : ["coding", "jogging"]
}
}, {
"_index":"megacorp",
"_type":"employee",
"_id":"1",
"_score":0.30685282,
"_source": {
"firstname" : "Tao",
"lastname" : "Xiao",
"age" : 30,
"about" : "Hello, my wife is CCC",
"interests" : ["coding", "jogging"]
}
}]
}
}
By default the first 10 matching records are returned. We can also ask for the first 20:
GET megacorp/employee/_search
{
"query" : {
"match" : {
"lastname" : "xiao"
}
},
"size" : 20
}
Or return the 21st through 30th documents (from is a zero-based offset, so from 20 skips the first 20 hits):
GET megacorp/employee/_search
{
"query" : {
"match" : {
"lastname" : "xiao"
}
},
"from" : 20,
"size" : 10
}
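The relationship between a page number and the from/size pair can be sketched in Python. This is only an illustration, not something from the original post: the helper names page_params and search_body and the 1-based page numbering are assumptions.

```python
def page_params(page, page_size=10):
    """Return the from/size pair for a 1-based page number.

    `from` is a zero-based offset, so page 1 starts at offset 0.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(lastname, page, page_size=10):
    """Build a match query on lastname plus pagination parameters."""
    body = {"query": {"match": {"lastname": lastname}}}
    body.update(page_params(page, page_size))
    return body


print(search_body("xiao", page=3))
```

Page 3 with 10 hits per page yields from 20 and size 10, the same body as the request above.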
In the results, _source holds the full content of each matching document. We can ask for only part of it:
GET megacorp/employee/_search
{
"query" : {
"match" : {
"lastname" : "xiao"
}
},
"_source" : ["age", "about"]
}
We can search within specific indices:
GET index1,index2,index/_search
{
"query" : { "match" : { "lastname" : "xiao"}}
}
Or across all indices:
GET _all/_search
{
"query" : { "match" : { "lastname" : "xiao"}}
}
curl 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : { "match" : { "lastname" : "Xiao" }}
}'
A document with lastname = "A B Xiao cc DD" matches, while one with lastname = "A B tXiaoy cc DD" does not. In other words, the match is on whole words.
curl 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : { "match" : { "lastname" : "Xiao lane" }}
}'
A document matches if its lastname contains "Xiao" or "lane".
curl 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : { "match_phrase" : { "lastname" : "Xiao lane" }}
}'
The lastname must contain "Xiao lane" as a whole phrase.
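The whole-word behaviour of match can be imitated with a toy analyzer in Python. This is a rough simplification for illustration only: the real analyzer does considerably more than lowercasing and splitting on non-word characters, and the function names and the sample value "Tom Lane" are made up here.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_any(field_value, query):
    """Mimic a match query with the default OR operator on one field."""
    field_terms = set(analyze(field_value))
    return any(term in field_terms for term in analyze(query))


print(match_any("A B Xiao cc DD", "Xiao"))    # whole term present
print(match_any("A B tXiaoy cc DD", "Xiao"))  # only a substring of a term
print(match_any("Tom Lane", "Xiao lane"))     # one of the two terms present
```

Because comparison happens term by term, "tXiaoy" is a different term from "xiao" and does not match, while any single shared term is enough for a multi-word match query.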
Suppose we want to find all documents of type employee whose lastname is Xiao and whose age is greater than 20. Here we use a range filter:
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
> "query" : {
> "filtered" : {
> "filter" : {
> "range" : {
> "age" : {"gt" : 20}
> }
> },
> "query" : {
> "match" : {
> "lastname" : "Xiao"
> }
> }
> }
> }
> }'
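A note for readers on newer versions: the filtered query above belongs to the 1.x API used in this post (the prompt shows elasticsearch-1.7.2). It was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause plays the same role. The following Python sketch builds such a request body; the function name lastname_older_than is an assumption made for illustration.

```python
import json


def lastname_older_than(lastname, min_age):
    """Build a bool query: a scored match on lastname plus an unscored range filter on age."""
    return {
        "query": {
            "bool": {
                # The match clause contributes to the relevance score.
                "must": {"match": {"lastname": lastname}},
                # The filter clause only includes or excludes documents.
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


print(json.dumps(lastname_older_than("Xiao", 20), indent=2))
```

The printed JSON can be sent as the request body in place of the filtered form.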
The query returns the correct results.
Full-text search involves the following cases:
For the search request
curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=about,lastname&pretty'
the following result is returned:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.43920785,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "AVCE4xivMv8zm4P4wh-e",
"_score" : 0.43920785,
"fields" : {
"about" : [ "Hello, my wife is CCC" ],
"lastname" : [ "Xiao" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "4",
"_score" : 0.3125,
"fields" : {
"about" : [ "Hello, my wife is CCC" ],
"lastname" : [ "Xiao" ]
}
} ]
}
}
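- took: the time this search took, in milliseconds
- timed_out: whether the search timed out. By default a search never times out, but a timeout can be set with the timeout=<duration> parameter when issuing the request. If the search times out, only the partial results gathered before the timeout are returned.
- _shards: statistics on the shards involved in the search. If some nodes are down and part of the shards are unavailable, the results may be incomplete.
- hits: the results of the search. By default ES returns the first 10 of all results; adding size=<number of results> to the request changes how many are returned.
- fields: the fields requested in the query. If no field is specified in the request, this becomes _source (the original JSON document content).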
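The fields described above can be read out of a response with a few lines of Python. This sketch assumes the 1.x response shape shown in this post (for example, hits.total as a plain integer); the function name summarize and the "complete" flag are assumptions added for illustration.

```python
def summarize(response):
    """Pull the top-level fields described above out of a search response."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        # Results may be incomplete after a timeout or when some shards did not respond.
        "complete": (not response["timed_out"])
        and shards["successful"] == shards["total"],
        "total_hits": hits["total"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


# Trimmed copy of the response shown above.
sample = {
    "took": 4,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 0.43920785,
        "hits": [
            {"_id": "AVCE4xivMv8zm4P4wh-e", "_score": 0.43920785},
            {"_id": "4", "_score": 0.3125},
        ],
    },
}

print(summarize(sample))
```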
The demonstrations here use GET requests.
case 1: one of the fields satisfies the condition
$ curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=lastname,about&pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.43920785,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "AVCE4xivMv8zm4P4wh-e",
"_score" : 0.43920785,
"fields" : {
"about" : [ "Hello, my wife is CCC" ],
"lastname" : [ "Xiao" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "4",
"_score" : 0.3125,
"fields" : {
"about" : [ "Hello, my wife is CCC" ],
"lastname" : [ "Xiao" ]
}
} ]
}
}
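This retrieves the documents that contain "xiao" and returns their lastname and about fields. Note that in this URI form the fields parameter selects which fields are returned for each hit; a q value without a field prefix is searched against the default field (_all unless configured otherwise).
case 2: all of the fields must satisfy the condition
To be written.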
For example, search documents within the get-together index:
$ curl 'ecs1:9200/get-together/_search?q=elasticsearch&pretty'
For example, search documents within the two indices get-together and other-index:
$ curl 'ecs1:9200/get-together,other-index/_search?q=elasticsearch&pretty'
$ curl 'ecs1:9200/_search?q=elasticsearch&pretty'
$ curl 'ecs1:9200/get-together/group,event/_search?q=elasticsearch&pretty'
$ curl 'ecs1:9200/_all/group,event/_search?q=elasticsearch&pretty'
By constructing JSON content in the query request body, we can express complex queries.
By default, every document that matches any one of the query terms is returned.
curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"query_string" : {
"query" : "xiao LA",
"default_field" : "my"
}
}
}'
For example, the query above returns the following two documents:
"hits" : [ {
...
"_source": {
...
"about" : "Hello, my wife is CCC",
}
}, {
...
"_source": {
"about" : "Hello, my wife is CCC",
}
}
]
To require that the searched field contain all of the query terms, add the default_operator parameter, as follows:
curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"query_string" : {
"query" : "xiao LA",
"default_field" : "my",
"default_operator" : "AND"
}
}
}'
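A filter only cares whether a result matches the condition, not about its score (every returned result has a score of 1.0), so a filter is faster than an ordinary query.
Now we will find all documents whose about field contains "am Jack".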
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
> "query": {
> "match" : {
> "about" : "am Jack"
> }
> }
> }'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.70710677,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.70710677,
"_source" : {
"firstname" : "Jack",
"lastname" : "Chen",
"age" : 40,
"about" : "Hello, I am Jack",
"interests" : ["sports", "music"]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 0.02250402,
"_source" : {
"firstname" : "Lucy",
"lastname" : "Liu",
"age" : 50,
"about" : "Hello, I am Lucy",
"interests" : ["tv", "talking"]
}
} ]
}
}
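Here we used a match query to run a full-text search on the about field. By default, ES sorts the returned results by relevance (_score).
As the result shows, this query returned two documents. One has the about field "Hello, I am Jack", which contains all of the query terms, with a relevance score of 0.70710677; the other has the about field "Hello, I am Lucy", which contains only one of the query terms, with a relevance score of 0.02250402.
To match the entire phrase (all of the words must be present), use match_phrase: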
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
> "query": {
> "match_phrase" : {
> "about" : "am Jack"
> }
> }
> }'
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source": {
"firstname" : "Jack",
"lastname" : "Chen",
"age" : 40,
"about" : "Hello, I am Jack",
"interests" : ["sports", "music"]
}
} ]
}
}
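With match_phrase, the about field of a matching document must:
- contain both words, am and Jack
- have am and Jack directly next to each other, with no other word in between, although punctuation is allowed (for example a comma, in either Chinese or English punctuation)
- have am first and Jack second
Neither of the following two query conditions returns a result: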
- { "about" : "Jack am" }
- { "about" : "am a Jack" }
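The three rules above can be checked with a toy phrase matcher in Python. It is a simplified illustration, not how ES implements phrase queries internally: it tokenizes by lowercasing and splitting on non-word characters, then looks for the query terms as a consecutive, ordered run. The function names and the sample value "I am, Jack" are made up for this sketch.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def match_phrase(field_value, phrase):
    """Return True if the phrase terms occur consecutively and in order."""
    tokens = analyze(field_value)
    terms = analyze(phrase)
    if not terms:
        return False
    width = len(terms)
    return any(
        tokens[i:i + width] == terms
        for i in range(len(tokens) - width + 1)
    )


about = "Hello, I am Jack"
print(match_phrase(about, "am Jack"))         # adjacent and in order
print(match_phrase(about, "Jack am"))         # wrong order
print(match_phrase(about, "am a Jack"))       # extra term in between
print(match_phrase("I am, Jack", "am Jack"))  # punctuation is not a token
```

Only the first and last calls succeed, which mirrors the two failing query conditions listed above and the note that punctuation between the words is allowed.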
With highlight, the response shows which piece of text in the document hit the search condition:
[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
> "query" : {
> "match_phrase" : {
> "about" : "am Jack"
> }
> },
> "highlight" : {
> "fields" : {
> "about" : {}
> }
> }
> }'
{
"took" : 28,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source": {
"firstname" : "Jack",
"lastname" : "Chen",
"age" : 40,
"about" : "Hello, I am Jack",
"interests" : ["sports", "music"]
},
"highlight" : {
"about" : [ "Hello, I <em>am</em> <em>Jack</em>" ]
}
} ]
}
}
Next, we analyze the 4 documents within <megacorp,employee>, aggregating on the interests field.
curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
> "aggs" : {
> "all_interests" : {
> "terms" : { "field" : "interests" }
> }
> }
> } '
{
"took" : 66,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.0,
"_source": {
"firstname" : "Tao",
"lastname" : "Xiao",
"age" : 30,
"about" : "Hello, my wife is CCC",
"interests" : ["coding", "jogging"]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "AVCE4xivMv8zm4P4wh-e",
"_score" : 1.0,
"_source": {
"firstname" : "Tao",
"lastname" : "Xiao",
"age" : 30,
"about" : "Hello, my wife is CCC",
"interests" : ["coding", "jogging"]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source": {
"firstname" : "Jack",
"lastname" : "Chen",
"age" : 40,
"about" : "Hello, I am Jack",
"interests" : ["sports", "music"]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 1.0,
"_source": {
"firstname" : "Lucy",
"lastname" : "Liu",
"age" : 50,
"about" : "Hello, I am Lucy",
"interests" : ["tv", "talking"]
}
} ]
},
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "coding",
"doc_count" : 2
}, {
"key" : "jogging",
"doc_count" : 2
}, {
"key" : "music",
"doc_count" : 1
}, {
"key" : "sports",
"doc_count" : 1
}, {
"key" : "talking",
"doc_count" : 1
}, {
"key" : "tv",
"doc_count" : 1
} ]
}
}
}
To add a restricting condition to the aggregation (for example, requiring lastname to be Xiao), do the following:
[root@ecs1 ~]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
> "query" : {
> "match" : {
> "lastname" : "Xiao"
> }
> },
> "aggs" : {
> "all_interests" : {
> "terms" : { "field" : "interests" }
> }
> }
> }'
In addition, we can ask: for people who share the same interest, what is their average age?
To illustrate this, we first add one more person:
curl localhost:9200/megacorp/employee/4?pretty -d '
> {
> "firstname" : "Tao",
> "lastname" : "Xiao",
> "age" : 20,
> "about" : "Hello, my wife is CCC",
> "interests" : ["music", "tv"]
> }'
Now issue the query request:
curl localhost:9200/megacorp/employee/_search?pretty -d '
{
"aggs" : {
"all_interests" : {
"terms" : { "field" : "interests"},
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age"}
}
}
}
}
}'
--> The result is as follows:
···
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "coding",
"doc_count" : 2,
"avg_age" : {
"value" : 30.0
}
}, {
"key" : "jogging",
"doc_count" : 2,
"avg_age" : {
"value" : 30.0
}
}, {
"key" : "music",
"doc_count" : 2,
"avg_age" : {
"value" : 30.0
}
}, {
"key" : "tv",
"doc_count" : 2,
"avg_age" : {
"value" : 35.0
}
}, {
"key" : "sports",
"doc_count" : 1,
"avg_age" : {
"value" : 40.0
}
}, {
"key" : "talking",
"doc_count" : 1,
"avg_age" : {
"value" : 50.0
}
} ]
}
}
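To make the bucket counts and averages above easier to verify, the same grouping can be reproduced in plain Python over the five employee documents. This is only a sketch of what a terms aggregation with an avg sub-aggregation computes, not how ES executes it; the function name terms_with_avg is an assumption, and the sort order is chosen to match the output shown above.

```python
from collections import defaultdict

# The five employee documents used in this post (only the fields needed here).
employees = [
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 40, "interests": ["sports", "music"]},
    {"age": 50, "interests": ["tv", "talking"]},
    {"age": 20, "interests": ["music", "tv"]},
]


def terms_with_avg(docs, terms_field, avg_field):
    """Bucket docs by each value of terms_field and average avg_field per bucket."""
    grouped = defaultdict(list)
    for doc in docs:
        # A document is counted once per distinct value it holds.
        for key in set(doc[terms_field]):
            grouped[key].append(doc[avg_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
        for key, vals in grouped.items()
    ]
    # Largest buckets first, ties broken alphabetically by key.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


for bucket in terms_with_avg(employees, "interests", "age"):
    print(bucket)
```

For example, the tv bucket holds the documents aged 50 and 20, which gives the average of 35.0 reported above.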