@xtccc 2018-07-14T08:03:33.000000Z

Query



ElasticSearch



Structured Query


Before diving into queries, it helps to know the following concepts:

  • Mapping:
         How the data in each field is interpreted

  • Analysis:
         How full text is processed to make it searchable

  • Query DSL:
         The flexible, powerful query language used by ElasticSearch

The examples here use this test data.

Search the employee type for documents whose lastname is Xiao:

    curl localhost:9200/megacorp/employee/_search?q=lastname:Xiao
    {
      "took":2,
      "timed_out":false,
      "_shards":{
        "total":5,
        "successful":5,
        "failed":0
      },
      "hits":{
        "total":2,
        "max_score":1.0,
        "hits":[ {
          "_index":"megacorp",
          "_type":"employee",
          "_id":"AVCE4xivMv8zm4P4wh-e",
          "_score":1.0,
          "_source": {
            "firstname" : "Tao",
            "lastname" : "Xiao",
            "age" : 30,
            "about" : "Hello, my wife is CCC",
            "interests" : ["coding", "jogging"]
          }
        }, {
          "_index":"megacorp",
          "_type":"employee",
          "_id":"1",
          "_score":0.30685282,
          "_source": {
            "firstname" : "Tao",
            "lastname" : "Xiao",
            "age" : 30,
            "about" : "Hello, my wife is CCC",
            "interests" : ["coding", "jogging"]
          }
        }]
      }
    }
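The URI search above packs the whole query into the q parameter. A minimal Python sketch of assembling such a URL with the standard library, assuming a node at localhost:9200 as in the examples; uri_search_url is a hypothetical helper, and it only builds a string without contacting Elasticsearch:

```python
from urllib.parse import urlencode


def uri_search_url(host, index, doc_type, query, **params):
    """Build a URI-search URL like /megacorp/employee/_search?q=lastname:Xiao.

    Hypothetical helper: it formats a string and sends nothing.
    """
    # safe=":" keeps the field:value separator readable instead of encoding it as %3A
    qs = urlencode({"q": query, **params}, safe=":")
    return f"http://{host}/{index}/{doc_type}/_search?{qs}"


url = uri_search_url("localhost:9200", "megacorp", "employee", "lastname:Xiao")
print(url)  # http://localhost:9200/megacorp/employee/_search?q=lastname:Xiao
```

Extra URI parameters such as size can be passed as keyword arguments, for example uri_search_url(..., "xiao", size=20).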

By default the first 10 matching documents are returned. We can instead ask for the first 20:

    GET megacorp/employee/_search
    {
      "query" : {
        "match" : {
          "lastname" : "xiao"
        }
      },
      "size" : 20
    }

Or return documents 21 through 30 (from is a zero-based offset, so "from" : 20 skips the first 20 hits):

    GET megacorp/employee/_search
    {
      "query" : {
        "match" : {
          "lastname" : "xiao"
        }
      },
      "from" : 20,
      "size" : 10
    }
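Since from is an offset and size a count, page-style navigation is just arithmetic. A sketch that turns a 1-based page number into a request body; page_body and the 1-based page convention are assumptions for illustration:

```python
import json


def page_body(query, page, per_page=10):
    """Return a search body for a 1-based page number (hypothetical helper)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    return {
        "query": query,
        "from": (page - 1) * per_page,  # zero-based offset of the first hit on this page
        "size": per_page,               # number of hits on this page
    }


body = page_body({"match": {"lastname": "xiao"}}, page=3)
print(json.dumps(body))  # {"query": {"match": {"lastname": "xiao"}}, "from": 20, "size": 10}
```

Page 3 with 10 hits per page gives the same from/size as the request above. In newer versions, from + size is capped by the index.max_result_window setting (10,000 by default), so very deep pages need another approach.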



In each hit, _source holds the full content of the matched document. We can ask for only part of it:

    GET megacorp/employee/_search
    {
      "query" : {
        "match" : {
          "lastname" : "xiao"
        }
      },
      "_source" : ["age", "about"]
    }
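The effect of "_source" : ["age", "about"] on each hit can be imitated locally. This sketch handles only top-level field names (real _source filtering also accepts wildcards and nested paths), and filter_source is a hypothetical name:

```python
def filter_source(source, fields):
    """Keep only the listed top-level keys, mimicking "_source": [...] (simplified)."""
    return {k: v for k, v in source.items() if k in fields}


doc = {
    "firstname": "Tao",
    "lastname": "Xiao",
    "age": 30,
    "about": "Hello, my wife is CCC",
    "interests": ["coding", "jogging"],
}
print(filter_source(doc, ["age", "about"]))  # {'age': 30, 'about': 'Hello, my wife is CCC'}
```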


You can search within specific indices:

    GET index1,index2,index/_search
    {
      "query" : { "match" : { "lastname" : "xiao"}}
    }

or across all indices:

    GET _all/_search
    {
      "query" : { "match" : { "lastname" : "xiao"}}
    }


Query DSL

Containing a given word

    curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query" : { "match" : { "lastname" : "Xiao" }}
    }'

A document matches if lastname = "A B Xiao cc DD", but not if lastname = "A B tXiaoy cc DD": the word has to match as a whole.

Containing at least one of the words

    curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query" : { "match" : { "lastname" : "Xiao lane" }}
    }'

A document matches if its lastname contains "Xiao" or "lane".
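Why the two cases above behave this way: the analyzer splits text into lowercase terms, and match compares whole terms, succeeding if any query term is present. A toy model in Python, assuming an analyzer that only lowercases and splits on non-alphanumeric ASCII characters (simpler than Elasticsearch's default standard analyzer); analyze and match are hypothetical names:

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on runs of non-alphanumeric (ASCII) characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match(field_text, query_text):
    """Toy match query: true if any query term equals a whole term of the field."""
    field_terms = set(analyze(field_text))
    return any(t in field_terms for t in analyze(query_text))


print(analyze("A B tXiaoy cc DD"))          # ['a', 'b', 'txiaoy', 'cc', 'dd']
print(match("A B Xiao cc DD", "Xiao"))      # True: "xiao" is a whole term
print(match("A B tXiaoy cc DD", "Xiao"))    # False: "txiaoy" is not "xiao"
print(match("Chen lane", "Xiao lane"))      # True: "lane" alone is enough
```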

Containing a phrase

    curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query" : { "match_phrase" : { "lastname" : "Xiao lane" }}
    }'

The lastname must contain "Xiao lane" as a whole phrase.

Filter

Within the employee type, we want to find every document whose lastname is Xiao and whose age is greater than 20. For this we use a range filter:

    [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query" : {
        "filtered" : {
          "filter" : {
            "range" : {
              "age" : {"gt" : 20}
            }
          },
          "query" : {
            "match" : {
              "lastname" : "Xiao"
            }
          }
        }
      }
    }'

The query returns the expected results.
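The filtered query above is from the Elasticsearch 1.x era: it was deprecated in 2.0 and removed in 5.0, where the same request is written as a bool query with the scored full-text clause under must and the range condition under filter. A Python sketch that builds that body; xiao_older_than is a hypothetical helper, and the field names come from the test data:

```python
import json


def xiao_older_than(min_age):
    """bool/filter equivalent of the filtered example (for ES 2.x and later)."""
    return {
        "query": {
            "bool": {
                # scored full-text clause
                "must": {"match": {"lastname": "Xiao"}},
                # non-scoring clause: only decides whether a document matches
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


print(json.dumps(xiao_older_than(20), indent=2))
```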




Full-Text Query


Full-text search involves the following cases:


The search response

For the search request
    curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=about,lastname&pretty'
the following result is returned:

    {
      "took" : 4,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.43920785,
        "hits" : [ {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "AVCE4xivMv8zm4P4wh-e",
          "_score" : 0.43920785,
          "fields" : {
            "about" : [ "Hello, my wife is CCC" ],
            "lastname" : [ "Xiao" ]
          }
        }, {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "4",
          "_score" : 0.3125,
          "fields" : {
            "about" : [ "Hello, my wife is CCC" ],
            "lastname" : [ "Xiao" ]
          }
        } ]
      }
    }
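
  • took: time spent on the search, in milliseconds.
  • timed_out: whether the search timed out. By default a search never times out, but you can set one with the timeout=<duration> parameter on the request. If the search times out, only the partial results collected before the timeout are returned.
  • _shards: statistics on the shards involved in the search. If some nodes are down and some shards are unavailable, the results may be incomplete.
  • hits: the search results. By default ES returns the first 10 of all matching results; adding size=<count> to the request changes how many are returned.
  • fields: the fields requested with the fields parameter. If the request does not specify fields, _source (the original JSON document) is returned instead.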
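To make the fields above concrete, here is a sketch that reads them from a response body with Python's json module; the sample is trimmed from the response above. One version note: from Elasticsearch 7.0, hits.total is an object such as {"value": 2, "relation": "eq"} rather than a number, so the helper accepts both shapes. summarize is a hypothetical name:

```python
import json

RESPONSE = """
{"took": 4, "timed_out": false,
 "_shards": {"total": 5, "successful": 5, "failed": 0},
 "hits": {"total": 2, "max_score": 0.43920785,
          "hits": [{"_id": "AVCE4xivMv8zm4P4wh-e", "_score": 0.43920785,
                    "fields": {"lastname": ["Xiao"]}},
                   {"_id": "4", "_score": 0.3125,
                    "fields": {"lastname": ["Xiao"]}}]}}
"""


def summarize(body):
    """Pull the commonly checked top-level fields out of a search response."""
    resp = json.loads(body)
    total = resp["hits"]["total"]
    if isinstance(total, dict):  # 7.0+ shape: {"value": n, "relation": "eq"}
        total = total["value"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "complete": resp["_shards"]["failed"] == 0,  # False if some shards failed
        "total": total,
        "ids": [h["_id"] for h in resp["hits"]["hits"]],
    }


print(summarize(RESPONSE))
```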


Searching specific fields

Searching multiple fields

The examples here use GET requests.

Case 1: any one of the fields matches

    $ curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=lastname,about&pretty'
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.43920785,
        "hits" : [ {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "AVCE4xivMv8zm4P4wh-e",
          "_score" : 0.43920785,
          "fields" : {
            "about" : [ "Hello, my wife is CCC" ],
            "lastname" : [ "Xiao" ]
          }
        }, {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "4",
          "_score" : 0.3125,
          "fields" : {
            "about" : [ "Hello, my wife is CCC" ],
            "lastname" : [ "Xiao" ]
          }
        } ]
      }
    }

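Here the documents whose lastname or about field contains "xiao" are returned. Strictly speaking, q=xiao without a field prefix searches the default field (_all in ES 1.x), and fields=lastname,about only selects which fields appear in each hit. To restrict the search itself to these two fields, prefix the terms, e.g. q=lastname:xiao OR about:xiao (URL-encoded).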
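In a request body, the fields to search can be named explicitly with a multi_match query. A Python sketch that builds such a body for the lastname and about fields; fields_search_body is a hypothetical helper:

```python
import json


def fields_search_body(text, fields):
    """multi_match body: `text` is searched in each listed field, any field may match."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
            }
        }
    }


print(json.dumps(fields_search_body("xiao", ["lastname", "about"])))
```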


Case 2: all fields must match

To be written.


Specifying the search scope (index, type)

Searching within one index

For example, to search documents within the get-together index:

    $ curl 'ecs1:9200/get-together/_search?q=elasticsearch&pretty'


Searching within multiple indices

For example, to search documents within both the get-together and other-index indices:

    $ curl 'ecs1:9200/get-together,other-index/_search?q=elasticsearch&pretty'


Searching across all indices

    $ curl 'ecs1:9200/_search?q=elasticsearch&pretty'


Searching within several types of one index

    $ curl 'ecs1:9200/get-together/group,event/_search?q=elasticsearch&pretty'


Searching within specific types across all indices

    $ curl 'ecs1:9200/_all/group,event/_search?q=elasticsearch&pretty'
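All of these scopes differ only in the path before /_search: a comma-separated index list (or _all), optionally followed by a comma-separated type list. A Python sketch of that rule; search_path is a hypothetical helper. Note that mapping types were deprecated in Elasticsearch 7.x and are not accepted in request paths in 8.x, so the type segment only applies to older clusters such as the one used here:

```python
def search_path(indices=None, types=None):
    """Build the path of a search URL, e.g. /get-together/group,event/_search."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")  # a type list needs an index segment in front of it
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path(["get-together"], ["group", "event"]))  # /get-together/group,event/_search
```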




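Request-Body Queries


By building JSON content in the query request body, we can express complex queries.

Matching any query term vs. matching all query terms

By default, every document that matches any one of the query terms is returned.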

    curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '
    {
      "query" : {
        "query_string" : {
          "query" : "xiao LA",
          "default_field" : "my"
        }
      }
    }'

For example, the query above returns the following two documents:

    "hits" : [ {
      ...
      "_source": {
        ...
        "about" : "Hello, my wife is CCC",
      }
    }, {
      ...
      "_source": {
        "about" : "Hello, my wife is CCC",
      }
    }
    ]

To require the queried field to contain all of the query terms, add the default_operator parameter:

    curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '
    {
      "query" : {
        "query_string" : {
          "query" : "xiao LA",
          "default_field" : "my",
          "default_operator" : "AND"
        }
      }
    }'
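What default_operator changes can be modelled in a few lines: the query text is split into terms, and a field matches when any term (OR) or every term (AND) is present. This is a toy model only, assuming a simple lowercase/split analyzer and illustrative field strings; the real query_string parser also understands explicit operators, field prefixes and wildcards. query_string_matches is a hypothetical name:

```python
import re


def terms(text):
    """Toy analyzer: lowercase, split on non-alphanumeric (ASCII) characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def query_string_matches(field_text, query, default_operator="OR"):
    """Toy model of default_operator: OR needs any query term, AND needs all of them."""
    field_terms = set(terms(field_text))
    hits = [t in field_terms for t in terms(query)]
    if default_operator.upper() == "AND":
        return bool(hits) and all(hits)
    return any(hits)


print(query_string_matches("Xiao", "xiao LA"))         # True: "xiao" is present
print(query_string_matches("Xiao", "xiao LA", "AND"))  # False: "la" is missing
```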


Filter

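A filter only decides whether a document matches the condition and does not compute a relevance score (every returned result has a score of 1.0), so filters are generally faster than ordinary queries.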
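In request-body form, one way to get matching without relevance scoring is the constant_score query: it wraps a filter and gives every hit the same score, its boost (1.0 by default). A Python sketch that builds such a body; the age range is borrowed from the earlier filter example and unscored is a hypothetical name:

```python
import json


def unscored(filter_clause, boost=1.0):
    """Wrap a filter so every matching document gets the same score (`boost`)."""
    return {"query": {"constant_score": {"filter": filter_clause, "boost": boost}}}


print(json.dumps(unscored({"range": {"age": {"gt": 20}}})))
```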

match

Now we find all documents whose about field contains "am Jack".

    [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query": {
        "match" : {
          "about" : "am Jack"
        }
      }
    }'
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.70710677,
        "hits" : [ {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "2",
          "_score" : 0.70710677,
          "_source" : {
            "firstname" : "Jack",
            "lastname" : "Chen",
            "age" : 40,
            "about" : "Hello, I am Jack",
            "interests" : ["sports", "music"]
          }
        }, {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "3",
          "_score" : 0.02250402,
          "_source" : {
            "firstname" : "Lucy",
            "lastname" : "Liu",
            "age" : 50,
            "about" : "Hello, I am Lucy",
            "interests" : ["tv", "talking"]
          }
        } ]
      }
    }

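Here we used a match query to run a full-text search on the about field. By default, ES sorts the results by relevance (_score).

The result contains two documents. One has the about field "Hello, I am Jack", which contains all of the query terms; its relevance is 0.70710677. The other has the about field "Hello, I am Lucy", which contains only one of the query terms; its relevance is 0.02250402.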


match_phrase

To require the whole phrase to match (all of the words, together), use match_phrase:

    [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query": {
        "match_phrase" : {
          "about" : "am Jack"
        }
      }
    }'
    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "2",
          "_score" : 1.0,
          "_source": {
            "firstname" : "Jack",
            "lastname" : "Chen",
            "age" : 40,
            "about" : "Hello, I am Jack",
            "interests" : ["sports", "music"]
          }
        } ]
      }
    }

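With match_phrase, the about field of every returned document must:

  • contain both words, am and Jack
  • have am and Jack next to each other, with no other word between them; punctuation in between is allowed (for example a comma, either Chinese or English)
  • have am before Jack

Neither of the following queries returns any result: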

  • { "about" : "Jack am" }
  • { "about" : "am a Jack" }
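The three rules above follow from term positions: the analyzer records where each term occurs and drops punctuation, and match_phrase needs the query terms at consecutive positions in the same order. A toy model in Python, assuming the same simplified lowercase/split analyzer and ignoring the slop parameter that real match_phrase offers for allowing gaps; match_phrase here is a hypothetical local function:

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics; punctuation disappears."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match_phrase(field_text, phrase):
    """True if the phrase's terms occur consecutively and in order in the field."""
    doc, q = analyze(field_text), analyze(phrase)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


print(match_phrase("Hello, I am Jack", "am Jack"))    # True
print(match_phrase("Hello, I am Jack", "Jack am"))    # False: wrong order
print(match_phrase("Hello, I am Jack", "am a Jack"))  # False: "a" is not in the text
```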


Highlighting search results

With highlight, the response also shows which fragment of the document text matched the search condition:

    [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query" : {
        "match_phrase" : {
          "about" : "am Jack"
        }
      },
      "highlight" : {
        "fields" : {
          "about" : {}
        }
      }
    }'
    {
      "took" : 28,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "2",
          "_score" : 1.0,
          "_source": {
            "firstname" : "Jack",
            "lastname" : "Chen",
            "age" : 40,
            "about" : "Hello, I am Jack",
            "interests" : ["sports", "music"]
          },
          "highlight" : {
            "about" : [ "Hello, I <em>am</em> <em>Jack</em>" ]
          }
        } ]
      }
    }
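The <em> and </em> markers are the highlighter's default tags (configurable with pre_tags and post_tags). A rough Python model of the output above that wraps every word whose lowercase form is a query term; it assumes the toy analysis used earlier and, unlike a phrase-aware highlighter, would also mark an isolated "am" elsewhere in the text. highlight is a hypothetical local function:

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of `text` whose lowercase form is one of the query terms."""
    wanted = {t for t in re.split(r"[^0-9a-z]+", query.lower()) if t}

    def wrap(m):
        word = m.group(0)
        return f"{pre}{word}{post}" if word.lower() in wanted else word

    return re.sub(r"[0-9A-Za-z]+", wrap, text)


print(highlight("Hello, I am Jack", "am Jack"))  # Hello, I <em>am</em> <em>Jack</em>
```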


Aggregation

Next, we analyze the 4 documents in <megacorp, employee>, aggregating on the interests field.

    curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "aggs" : {
        "all_interests" : {
          "terms" : { "field" : "interests" }
        }
      }
    } '
    {
      "took" : 66,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 4,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "1",
          "_score" : 1.0,
          "_source": {
            "firstname" : "Tao",
            "lastname" : "Xiao",
            "age" : 30,
            "about" : "Hello, my wife is CCC",
            "interests" : ["coding", "jogging"]
          }
        }, {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "AVCE4xivMv8zm4P4wh-e",
          "_score" : 1.0,
          "_source": {
            "firstname" : "Tao",
            "lastname" : "Xiao",
            "age" : 30,
            "about" : "Hello, my wife is CCC",
            "interests" : ["coding", "jogging"]
          }
        }, {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "2",
          "_score" : 1.0,
          "_source": {
            "firstname" : "Jack",
            "lastname" : "Chen",
            "age" : 40,
            "about" : "Hello, I am Jack",
            "interests" : ["sports", "music"]
          }
        }, {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "3",
          "_score" : 1.0,
          "_source": {
            "firstname" : "Lucy",
            "lastname" : "Liu",
            "age" : 50,
            "about" : "Hello, I am Lucy",
            "interests" : ["tv", "talking"]
          }
        } ]
      },
      "aggregations" : {
        "all_interests" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "coding",
            "doc_count" : 2
          }, {
            "key" : "jogging",
            "doc_count" : 2
          }, {
            "key" : "music",
            "doc_count" : 1
          }, {
            "key" : "sports",
            "doc_count" : 1
          }, {
            "key" : "talking",
            "doc_count" : 1
          }, {
            "key" : "tv",
            "doc_count" : 1
          } ]
        }
      }
    }

To restrict the aggregation with a condition (for example, lastname must be Xiao), add a query:

    [root@ecs1 ~]# curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "query" : {
        "match" : {
          "lastname" : "Xiao"
        }
      },
      "aggs" : {
        "all_interests" : {
          "terms" : { "field" : "interests" }
        }
      }
    }'
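Conceptually, a terms aggregation counts, for each distinct term, how many of the matched documents contain it; with a query, only the documents the query matches are counted. The sketch below reproduces the earlier buckets from the four test documents, sorting by doc_count descending and then by key ascending to match the order in the response above. Version note: from Elasticsearch 5.0, dynamically mapped strings become text fields with a keyword sub-field, so such an aggregation would usually target interests.keyword. terms_agg is a hypothetical name, and the lastname predicate is a simplification of the match query:

```python
from collections import Counter

DOCS = [
    {"lastname": "Xiao", "age": 30, "interests": ["coding", "jogging"]},
    {"lastname": "Xiao", "age": 30, "interests": ["coding", "jogging"]},
    {"lastname": "Chen", "age": 40, "interests": ["sports", "music"]},
    {"lastname": "Liu", "age": 50, "interests": ["tv", "talking"]},
]


def terms_agg(docs, field, match=lambda d: True):
    """Count documents per distinct value of `field`, only over docs where match(d)."""
    counts = Counter()
    for doc in docs:
        if match(doc):
            counts.update(set(doc[field]))  # a document counts once per distinct term
    # doc_count descending, then key ascending, as in the response above
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(terms_agg(DOCS, "interests"))
print(terms_agg(DOCS, "interests", match=lambda d: d["lastname"].lower() == "xiao"))
```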

We can also ask: for people who share an interest, what is their average age?
To illustrate this, we first add one more person:

    curl localhost:9200/megacorp/employee/4?pretty -d '
    {
      "firstname" : "Tao",
      "lastname" : "Xiao",
      "age" : 20,
      "about" : "Hello, my wife is CCC",
      "interests" : ["music", "tv"]
    }'

Now send the search request:

    curl localhost:9200/megacorp/employee/_search?pretty -d '
    {
      "aggs" : {
        "all_interests" : {
          "terms" : { "field" : "interests"},
          "aggs" : {
            "avg_age" : {
              "avg" : { "field" : "age"}
            }
          }
        }
      }
    }'
    --> the response:
    ...
    "aggregations" : {
      "all_interests" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [ {
          "key" : "coding",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 30.0
          }
        }, {
          "key" : "jogging",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 30.0
          }
        }, {
          "key" : "music",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 30.0
          }
        }, {
          "key" : "tv",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 35.0
          }
        }, {
          "key" : "sports",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 40.0
          }
        }, {
          "key" : "talking",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 50.0
          }
        } ]
      }
    }
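Each bucket's avg_age is the mean age of the documents in that bucket. The sketch below recomputes the numbers above from the five documents (the four earlier ones plus employee 4); it is a local illustration, not how Elasticsearch computes aggregations internally, and avg_age_by_interest is a hypothetical name:

```python
from collections import defaultdict

DOCS = [
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 40, "interests": ["sports", "music"]},
    {"age": 50, "interests": ["tv", "talking"]},
    {"age": 20, "interests": ["music", "tv"]},  # employee 4, added above
]


def avg_age_by_interest(docs):
    """terms buckets on `interests`, each with a nested average of `age`."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    buckets = [
        {"key": k, "doc_count": len(v), "avg_age": sum(v) / len(v)}
        for k, v in ages.items()
    ]
    # doc_count descending, then key ascending, matching the response above
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


for b in avg_age_by_interest(DOCS):
    print(b["key"], b["doc_count"], b["avg_age"])
```

For example, music is shared by Jack (40) and employee 4 (20), giving (40 + 20) / 2 = 30.0, while tv gives (50 + 20) / 2 = 35.0.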