@xtccc 2018-07-14T00:03:33.000000Z

Query

ElasticSearch

Structured Query

Before diving into queries, it helps to understand the following concepts:

Mapping:
     How the data in each field is interpreted

Analysis:
     How full text is processed to make it searchable

Query DSL:
     The flexible, powerful query language used by ElasticSearch

The examples below use this set of test data.

Lightweight Search / Query-string Search

Search the type named employee for documents whose lastname is Xiao:

curl localhost:9200/megacorp/employee/_search?q=lastname:Xiao
{
  "took":2,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
   }, 
   "hits":{
     "total":2,
     "max_score":1.0,
     "hits":[ {
        "_index":"megacorp",
        "_type":"employee",
        "_id":"AVCE4xivMv8zm4P4wh-e",
        "_score":1.0,
        "_source": {
          "firstname" : "Tao",
          "lastname"  : "Xiao",
          "age"       : 30,
          "about"     : "Hello, my wife is CCC",
          "interests" : ["coding", "jogging"]
        }
     }, {
        "_index":"megacorp",
        "_type":"employee",
        "_id":"1",
        "_score":0.30685282,
        "_source": { 
          "firstname" : "Tao",
          "lastname"  : "Xiao",
          "age"       : 30,
          "about"     : "Hello, my wife is CCC",
          "interests" : ["coding", "jogging"]
        }
     }]
  }
}
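
The query-string form above can also be assembled programmatically. The following Python sketch only builds the URL and does not contact a server. The helper name uri_search_url is made up for illustration; the host, index, type and field values are the ones from the curl command above.

```python
from urllib.parse import quote, urlencode


def uri_search_url(host, index, doc_type, field, value):
    """Build a query-string search URL of the form <index>/<type>/_search?q=field:value.

    Hypothetical helper for illustration; it performs no HTTP request.
    """
    # Keep commas unescaped so that comma-separated index lists stay readable.
    path = "/".join(quote(part, safe=",") for part in (index, doc_type, "_search"))
    query = urlencode({"q": f"{field}:{value}"})
    return f"http://{host}/{path}?{query}"


url = uri_search_url("localhost:9200", "megacorp", "employee", "lastname", "Xiao")
print(url)
```

The colon in lastname:Xiao is percent-encoded as %3A, which the server decodes back before parsing the query.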

By default only the first 10 matching documents are returned. We can ask for the first 20 instead:

GET megacorp/employee/_search
{
  "query" : {
    "match" : {
      "lastname" : "xiao"
    }
  },
  "size" : 20
}

Or skip the first 20 hits and return the 21st through 30th documents (from is a zero-based offset):

GET megacorp/employee/_search
{
  "query" : {
    "match" : {
      "lastname" : "xiao"
    }
  },
  "from" : 20,
  "size" : 10
}
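
The relationship between from, size and a page number is easy to get off by one. As a small sketch, assuming 1-based page numbers (the helper names page_window and search_body are hypothetical), the request body can be derived like this:

```python
def page_window(page, page_size=10):
    """Return the from/size pair for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # "from" is a zero-based offset: page 1 starts at offset 0.
    return {"from": (page - 1) * page_size, "size": page_size}


def search_body(lastname, page, page_size=10):
    """Build a match query on lastname with paging parameters attached."""
    body = {"query": {"match": {"lastname": lastname}}}
    body.update(page_window(page, page_size))
    return body


body = search_body("xiao", page=3)
first_hit = body["from"] + 1           # 1-based rank of the first returned hit
last_hit = body["from"] + body["size"]  # 1-based rank of the last returned hit
print(body, first_hit, last_hit)
```

Page 3 with a page size of 10 yields from = 20 and size = 10, the same values as in the request above.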

In each hit, _source holds the full content of the matched document. We can ask for only part of it:

GET megacorp/employee/_search
{
  "query" : {
    "match" : {
      "lastname" : "xiao"
    }
  },
  "_source" : ["age", "about"]
}
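
To make the effect of the _source list concrete, here is a local Python sketch of what the trimming amounts to for top-level fields. It is a simplification: the function filter_source is hypothetical and ignores the wildcard and nested-path forms that the server itself accepts.

```python
def filter_source(source, wanted):
    """Keep only the top-level keys listed in `wanted` (no wildcards, no nested paths)."""
    return {key: value for key, value in source.items() if key in wanted}


# One of the sample documents shown earlier.
hit_source = {
    "firstname": "Tao",
    "lastname": "Xiao",
    "age": 30,
    "about": "Hello, my wife is CCC",
    "interests": ["coding", "jogging"],
}

trimmed = filter_source(hit_source, ["age", "about"])
print(trimmed)
```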

We can search within specific indices:

GET index1,index2,index/_search
{
  "query" : { "match" : { "lastname" : "xiao"}}
}

Or across all indices:

GET _all/_search
{
  "query" : { "match" : { "lastname" : "xiao"}}
}

Query DSL

Contains a given word

curl 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : { "match" : { "lastname" : "Xiao" }}
}'

A document with lastname = "A B Xiao cc DD" matches, while one with lastname = "A B tXiaoy cc DD" does not: the query term must match a whole word.

Contains at least one of the words

curl 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : { "match" : { "lastname" : "Xiao lane" }}
}'

A document matches if its lastname contains "Xiao" or "lane".

Contains a phrase

curl 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : { "match_phrase" : { "lastname" : "Xiao lane" }}
}'

The lastname must contain "Xiao lane" as a whole phrase.
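
The whole-word and any-word behaviour of match follows from how text is broken into terms. The Python sketch below imitates this locally. The analyze function is only a rough stand-in for the default analyzer (lowercase, split on anything that is not an ASCII letter or digit), and match_any is a hypothetical name, not part of any client library.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase and split on non-alphanumerics."""
    return [term for term in re.split(r"[^0-9a-z]+", text.lower()) if term]


def match_any(field_value, query):
    """True if at least one query term equals a whole term of the field value."""
    field_terms = set(analyze(field_value))
    return any(term in field_terms for term in analyze(query))


print(match_any("A B Xiao cc DD", "Xiao"))     # whole word present
print(match_any("A B tXiaoy cc DD", "Xiao"))   # only a substring, not a term
print(match_any("Lane", "Xiao lane"))          # one of the two terms is enough
```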

Filter

Within the employee type, we want to find every document whose lastname is Xiao and whose age is greater than 20. For this we use a range filter:

[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
>    "query" : {
>       "filtered" : {
>         "filter" : {
>           "range" : {
>             "age" : {"gt" : 20}
>           }
>         },
>         "query" : {
>            "match" : {
>               "lastname" : "Xiao"
>            }
>         }
>       }
>    }
> }'

The query returns the expected results.
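
The nesting of the filtered query is easier to see when the body is built as a data structure. The Python sketch below constructs the same body as the curl request above. It also shows, as an assumption worth checking against the version in use, the bool form with a filter clause that later major ElasticSearch versions use in place of filtered. Both function names are hypothetical.

```python
import json


def filtered_body(lastname, min_age):
    """Body for ES 1.x: a match query wrapped in a filtered query with a range filter."""
    return {
        "query": {
            "filtered": {
                "filter": {"range": {"age": {"gt": min_age}}},
                "query": {"match": {"lastname": lastname}},
            }
        }
    }


def bool_body(lastname, min_age):
    """Assumed equivalent for later versions: bool query with a non-scoring filter clause."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"lastname": lastname}},
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


old_style = filtered_body("Xiao", 20)
new_style = bool_body("Xiao", 20)
print(json.dumps(old_style, indent=2))
```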

Full-Text Query

Full-text search covers the following cases:

Searching a specified set of fields
Searching all fields of a document
Restricting the search scope (index, type)

Response to a search request

For the search request
curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=about,lastname&pretty'
the following result was returned:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.43920785,
    "hits" : [ {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "AVCE4xivMv8zm4P4wh-e",
          "_score" : 0.43920785,
          "fields" : {
            "about" : [ "Hello, my wife is CCC" ],
            "lastname" : [ "Xiao" ]
          }
      }, {
          "_index" : "megacorp",
          "_type" : "employee",
          "_id" : "4",
          "_score" : 0.3125,
          "fields" : {
            "about" : [ "Hello, my wife is CCC" ],
            "lastname" : [ "Xiao" ]
          }
    } ]
  }
}

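took: the time this search took, in milliseconds

timed_out: whether the search timed out. By default a search never times out, but a limit can be set on the request with the timeout=<duration> parameter. If the search times out, only the partial results gathered before the timeout are returned.

_shards: statistics on the shards involved in the search. If some nodes are down and part of the shards are unavailable, the results may be incomplete.

hits: the search results. By default ES returns the first 10 of all matching results; adding size=<number of results> to the request changes how many are returned.

fields: the fields requested through the fields parameter of the request. If the request does not specify any fields, this entry is replaced by _source (the original JSON document).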
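
As a sketch of how a client might read these entries, the Python snippet below summarizes a response that has already been parsed from JSON. It assumes the ES 1.x response shape shown above, where hits.total is a plain integer; the function name summarize and the "partial" flag are made up for illustration.

```python
def summarize(response):
    """Pull the entries described above out of a parsed search response."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        # Results may be incomplete after a timeout or when some shards did not answer.
        "partial": response["timed_out"] or shards["successful"] < shards["total"],
        "total": hits["total"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


# Trimmed copy of the response shown above.
sample = {
    "took": 4,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 0.43920785,
        "hits": [
            {"_id": "AVCE4xivMv8zm4P4wh-e", "_score": 0.43920785},
            {"_id": "4", "_score": 0.3125},
        ],
    },
}

summary = summarize(sample)
print(summary)
```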

Searching specified fields

Specifying multiple fields

The examples here use GET requests.

case 1: one of the fields satisfies the condition

$ curl 'ecs1:9200/megacorp/employee/_search?q=xiao&fields=lastname,about&pretty'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.43920785,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "AVCE4xivMv8zm4P4wh-e",
      "_score" : 0.43920785,
      "fields" : {
        "about" : [ "Hello, my wife is CCC" ],
        "lastname" : [ "Xiao" ]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "4",
      "_score" : 0.3125,
      "fields" : {
        "about" : [ "Hello, my wife is CCC" ],
        "lastname" : [ "Xiao" ]
      }
    } ]
  }
}

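This returns the documents that match "xiao". Note that in the URI search API the fields parameter selects which fields are returned for each hit; a bare q=xiao is matched against the default field (_all unless df is set), which here covers both lastname and about.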

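case 2: all of the fields must satisfy the condition

To be written.

Restricting the search scope (index, type)

Searching within one index

For example, search the documents in the get-together index: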

$ curl 'ecs1:9200/get-together/_search?q=elasticsearch&pretty'

Searching within several indices

For example, search the documents in both the get-together and other-index indices:

$ curl 'ecs1:9200/get-together,other-index/_search?q=elasticsearch&pretty'

Searching across all indices

$ curl 'ecs1:9200/_search?q=elasticsearch&pretty'

Searching within several types of one index

$ curl 'ecs1:9200/get-together/group,event/_search?q=elasticsearch&pretty'

Searching within specified types across all indices

$ curl 'ecs1:9200/_all/group,event/_search?q=elasticsearch&pretty'
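
The five scope variants above follow one pattern: comma-separated indices, then comma-separated types, then _search, with _all standing in when types are given without an index. A minimal Python sketch of that pattern (the helper search_path is hypothetical) could look like this:

```python
def search_path(indices=(), types=()):
    """Build the URL path for a search limited to the given indices and types."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # Types without an index need the _all placeholder in the index position.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path(["get-together"]))
print(search_path(["get-together", "other-index"]))
print(search_path())
print(search_path(["get-together"], ["group", "event"]))
print(search_path(types=["group", "event"]))
```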

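Query with a request body

By building JSON content in the query request body, we can express complex search requirements.

Matching any query term vs. matching all query terms

By default, every document that matches any one of the query terms is returned.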

curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "query_string" : {
            "query" : "xiao LA",
            "default_field" : "my"
        }
    }
}'

For example, the query above returns the following two documents:

   "hits" : [ {
     ...    
     "_source": {
        ...
        "about"     : "Hello, my wife is CCC",
      }
     }, {
       ...
       "_source": {
       "about"     : "Hello, my wife is CCC",
       }
     }
   ]

To require that the searched field contain all of the query terms, add the default_operator parameter:

curl 'ecs1:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "query_string" : {
            "query" : "xiao LA",
            "default_field" : "my",
            "default_operator" : "AND"
        }
    }
}'
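
The two request bodies above differ only in the default_operator entry. A small Python sketch that produces either variant is shown below; the function name query_string_body and its require_all flag are hypothetical, and the query and field values are simply copied from the requests above.

```python
import json


def query_string_body(query, default_field, require_all=False):
    """Build a query_string body; require_all=True adds default_operator AND."""
    query_string = {"query": query, "default_field": default_field}
    if require_all:
        # Without this entry the terms are combined with OR.
        query_string["default_operator"] = "AND"
    return {"query": {"query_string": query_string}}


any_body = query_string_body("xiao LA", "my")
all_body = query_string_body("xiao LA", "my", require_all=True)
print(json.dumps(all_body, indent=2))
```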

Filter

A filter only cares whether a document matches the condition, not about its score (every returned result has a score of 1.0), so a filter is faster than an ordinary query.

match

Now we will find all documents whose about field contains "am Jack".

[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
>    "query": {
>      "match" : {
>         "about" : "am Jack"
>      }
>    }
> }'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.70710677,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 0.70710677,
      "_source" : { 
        "firstname" : "Jack",
        "lastname"  : "Chen",
        "age"       : 40,
        "about"     : "Hello, I am Jack",
        "interests" : ["sports", "music"]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "3",
      "_score" : 0.02250402,
      "_source" : { 
        "firstname" : "Lucy",
        "lastname"  : "Liu",
        "age"       : 50,
        "about"     : "Hello, I am Lucy",
        "interests" : ["tv", "talking"]
       }
    } ]
  }
}

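Here we used a match query to run a full-text search on the about field. By default, ES sorts the results by relevance (_score).

The result shows that this query returned two documents. One has the about value "Hello, I am Jack", which contains all of the query terms, with a relevance score of 0.70710677. The other has the about value "Hello, I am Lucy", which contains only one of the query terms, with a relevance score of 0.02250402.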

match_phrase

To match the phrase as a whole (all of its words must be present), use match_phrase:

[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
>    "query": {
>       "match_phrase" : {
>          "about" : "am Jack"
>       }
>    }
> }'
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 1.0,
      "_source": { 
         "firstname" : "Jack",
         "lastname"  : "Chen",
         "age"       : 40,
         "about"     : "Hello, I am Jack",
         "interests" : ["sports", "music"]
       }
    } ]
  }
}

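With match_phrase, the about field of a matching document must:

Contain both words, am and Jack

Have am and Jack directly adjacent, with no other word between them; punctuation in between is allowed (for example a comma, whether Chinese or English)

Have am first and Jack second

Neither of the following two query conditions returns any result: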

{ "about" : "Jack am" }

{ "about" : "am a Jack" }
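
The three rules above (all words present, adjacent, in order) amount to finding the query terms as a contiguous run in the analyzed field. The Python sketch below imitates that locally. As before, analyze is only an approximation of the default analyzer that keeps ASCII letters and digits, and match_phrase here is a local function, not a client API.

```python
import re


def analyze(text):
    """Approximate analysis: lowercase, keep runs of ASCII letters and digits as terms."""
    return re.findall(r"[0-9a-z]+", text.lower())


def match_phrase(field_value, phrase):
    """True if the phrase terms occur consecutively and in order in the field value."""
    doc_terms = analyze(field_value)
    wanted = analyze(phrase)
    if not wanted:
        return False
    width = len(wanted)
    # Slide a window over the document terms and compare it with the phrase terms.
    return any(doc_terms[i:i + width] == wanted
               for i in range(len(doc_terms) - width + 1))


about = "Hello, I am Jack"
print(match_phrase(about, "am Jack"))
print(match_phrase(about, "Jack am"))
print(match_phrase(about, "am a Jack"))
```

Punctuation never becomes a term in this model, so a comma between am and Jack does not break the match.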

Highlighting search results

With highlight, the response shows which part of each document's text hit the search condition:

[root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
>    "query" : {
>       "match_phrase" : {
>          "about" : "am Jack"
>       }
>    },
>    "highlight" : {
>       "fields" : {
>          "about" : {}
>       }
>    }
> }'
{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 1.0,
      "_source": { 
         "firstname" : "Jack",
         "lastname"  : "Chen",
         "age"       : 40,
         "about"     : "Hello, I am Jack",
         "interests" : ["sports", "music"]
      },
      "highlight" : {
        "about" : [ "Hello, I <em>am</em> <em>Jack</em>" ]
      }
    } ]
  }
}
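
The highlight entry in this response wraps each matched term in em tags. The Python sketch below reproduces that wrapping locally for simple word terms. It only illustrates the output format: the function highlight and its pre/post parameters are hypothetical, and it does not check phrase adjacency the way the server-side query does.

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every whole-word occurrence of a query term in pre/post tags."""
    terms = [re.escape(term) for term in re.findall(r"\w+", query)]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)


fragment = highlight("Hello, I am Jack", "am Jack")
print(fragment)
```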

Aggregation

Next we analyze the 4 documents in <megacorp,employee>, aggregating on the interests field.

curl localhost:9200/megacorp/employee/_search?pretty -d '
> {
>    "aggs" : {
>      "all_interests" : {
>         "terms" : { "field" : "interests" }
>      }
>    }
> } '
{
  "took" : 66,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "1",
      "_score" : 1.0,
      "_source": { 
         "firstname" : "Tao",
         "lastname"  : "Xiao",
         "age"       : 30,
         "about"     : "Hello, my wife is CCC",
         "interests" : ["coding", "jogging"]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "AVCE4xivMv8zm4P4wh-e",
      "_score" : 1.0,
      "_source": {
         "firstname" : "Tao",
         "lastname"  : "Xiao",
         "age"       : 30,
         "about"     : "Hello, my wife is CCC",
         "interests" : ["coding", "jogging"]
       }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_score" : 1.0,
      "_source": { 
         "firstname" : "Jack",
         "lastname"  : "Chen",
         "age"       : 40,
         "about"     : "Hello, I am Jack",
         "interests" : ["sports", "music"]
       }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "3",
      "_score" : 1.0,
      "_source": { 
         "firstname" : "Lucy",
         "lastname"  : "Liu",
         "age"       : 50,
         "about"     : "Hello, I am Lucy",
         "interests" : ["tv", "talking"]
      }
    } ]
  },
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "coding",
        "doc_count" : 2
      }, {
        "key" : "jogging",
        "doc_count" : 2
      }, {
        "key" : "music",
        "doc_count" : 1
      }, {
        "key" : "sports",
        "doc_count" : 1
      }, {
        "key" : "talking",
        "doc_count" : 1
      }, {
        "key" : "tv",
        "doc_count" : 1
      } ]
    }
  }
}
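
The buckets in this response can be reproduced by hand, which helps when checking an aggregation against small test data. The Python sketch below counts, for each interest, how many of the four sample documents contain it, and orders the buckets by descending count and then by key, which is the order seen in the response above. The function terms_agg is a local illustration, not a client call.

```python
from collections import Counter

# The interests of the four sample documents.
employees = [
    {"firstname": "Tao", "interests": ["coding", "jogging"]},
    {"firstname": "Tao", "interests": ["coding", "jogging"]},
    {"firstname": "Jack", "interests": ["sports", "music"]},
    {"firstname": "Lucy", "interests": ["tv", "talking"]},
]


def terms_agg(docs, field):
    """Count documents per distinct value of `field`, like a terms aggregation."""
    counts = Counter()
    for doc in docs:
        # A document counts once per value, even if the value is repeated in it.
        counts.update(set(doc.get(field, [])))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_agg(employees, "interests")
for bucket in buckets:
    print(bucket)
```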

To add a constraint to the aggregation (for example, requiring lastname to be Xiao), do the following:

[root@ecs1 ~]# curl localhost:9200/megacorp/employee/_search?pretty -d '
>  {
>     "query" : {
>        "match" : {
>           "lastname" : "Xiao"
>        }
>     },
>     "aggs" : {
>        "all_interests" : {
>           "terms" : { "field" : "interests" }
>        }
>     }
>  }'

We can also ask: for people who share an interest, what is their average age?
To illustrate this, we first add one more person:

curl localhost:9200/megacorp/employee/4?pretty -d '
> {
>     "firstname" : "Tao",
>     "lastname"  : "Xiao",
>     "age"       : 20,
>     "about"     : "Hello, my wife is CCC",
>     "interests" : ["music", "tv"]
> }'

Now issue the search request:

curl localhost:9200/megacorp/employee/_search?pretty -d '
{ 
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests"},
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age"}
                }
            }
        }
    }
}'
--> the response is as follows
...
"aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "coding",
        "doc_count" : 2,
        "avg_age" : {
          "value" : 30.0
        }
      }, {
        "key" : "jogging",
        "doc_count" : 2,
        "avg_age" : {
          "value" : 30.0
        }
      }, {
        "key" : "music",
        "doc_count" : 2,
        "avg_age" : {
          "value" : 30.0
        }
      }, {
        "key" : "tv",
        "doc_count" : 2,
        "avg_age" : {
          "value" : 35.0
        }
      }, {
        "key" : "sports",
        "doc_count" : 1,
        "avg_age" : {
          "value" : 40.0
        }
      }, {
        "key" : "talking",
        "doc_count" : 1,
        "avg_age" : {
          "value" : 50.0
        }
      } ]
   }
}
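
The nested avg aggregation computes one average per bucket. To verify the numbers above by hand, the following Python sketch groups the five sample documents by interest and averages the ages in each group. The function avg_age_by_interest is a local illustration under the assumption that the index holds exactly these five documents.

```python
from collections import defaultdict

# Age and interests of the five sample documents (including the one added as id 4).
employees = [
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 30, "interests": ["coding", "jogging"]},
    {"age": 40, "interests": ["sports", "music"]},
    {"age": 50, "interests": ["tv", "talking"]},
    {"age": 20, "interests": ["music", "tv"]},
]


def avg_age_by_interest(docs):
    """Bucket documents by interest and compute the average age per bucket."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    buckets = [
        {"key": key, "doc_count": len(values), "avg_age": sum(values) / len(values)}
        for key, values in ages.items()
    ]
    # Same ordering as the response: larger buckets first, ties broken by key.
    buckets.sort(key=lambda bucket: (-bucket["doc_count"], bucket["key"]))
    return buckets


result = avg_age_by_interest(employees)
for bucket in result:
    print(bucket)
```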
