@xtccc 2015-12-09T10:37:42.000000Z





1. Storing and searching documents

1.1 Storing documents (indexing)

1.1.1 What indexing is

Storing data is called indexing. To index a document, we must tell Elasticsearch which index, and which type within that index, it should go to.


By default, when you index a document, it’s first sent to one of the primary shards, which is chosen based on a hash of the document’s ID. Then, the document is sent to be indexed in all of that primary shard’s replicas.


The Elasticsearch node that receives your indexing request first selects the shard the document should be indexed into. With the default routing, documents are distributed evenly across shards.
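A minimal sketch of the routing step described above, assuming the rule shard = hash(routing) % number_of_primary_shards, where routing defaults to the document ID. The DJB2 function and the name pick_primary_shard are illustrative stand-ins; the hash Elasticsearch actually uses differs between versions.

```python
from typing import Optional


def djb2(text: str) -> int:
    """DJB2 string hash, used here only as a stand-in for the real routing hash."""
    h = 5381
    for ch in text:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def pick_primary_shard(doc_id: str, number_of_primary_shards: int,
                       routing: Optional[str] = None) -> int:
    """Return the primary shard a document would be sent to (toy model)."""
    key = routing if routing is not None else doc_id  # routing defaults to the ID
    return djb2(key) % number_of_primary_shards


# The same ID always maps to the same shard, so a later GET can find the document.
assert pick_primary_shard("1", 5) == pick_primary_shard("1", 5)
```

Because the result depends on number_of_primary_shards, changing that number would send existing IDs to different shards; this is why the number of primary shards is fixed when an index is created.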

1.1.2 Example: indexing a document


  [root@ecs1 elasticsearch-1.7.2]# curl -XPUT localhost:9200/megacorp/employee/1?pretty -d '
  > {
  >   "firstname" : "Tao",
  >   "lastname" : "Xiao",
  >   "age" : 30,
  >   "about" : "Hello, my wife is CCC",
  >   "interests" : ["coding", "jogging"]
  > }'
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 1,
    "created" : true
  }

Any index or type that does not exist yet is created automatically when a document is indexed into it.

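The response to the PUT request contains the indexed document's _index, _type, _id, _version and created fields. _version is the document's version number: every change to the document (including a DELETE) increments the version in its metadata by 1. created: true means the document was created for the first time; false means an existing document with the same ID was replaced.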
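To make the _version and created rules concrete, here is a toy in-memory model of one index/type. ToyIndex is a hypothetical name; it only mimics the bookkeeping described above. The assumption that re-indexing a deleted ID continues from the old version number reflects ES keeping the version of a deleted document for a limited time, not a permanent guarantee.

```python
from typing import Any, Dict


class ToyIndex:
    """Toy model of one index/type: tracks each document's source and _version."""

    def __init__(self) -> None:
        self.sources: Dict[str, Dict[str, Any]] = {}
        self.versions: Dict[str, int] = {}

    def index(self, doc_id: str, source: Dict[str, Any]) -> Dict[str, Any]:
        created = doc_id not in self.sources          # true only for a new document
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        self.sources[doc_id] = dict(source)           # the whole source is replaced
        return {"_id": doc_id, "_version": self.versions[doc_id], "created": created}

    def delete(self, doc_id: str) -> Dict[str, Any]:
        found = doc_id in self.sources
        if found:
            del self.sources[doc_id]
            self.versions[doc_id] += 1                # a DELETE also bumps the version
        return {"_id": doc_id, "_version": self.versions.get(doc_id, 0), "found": found}
```

Indexing the same ID twice yields version 1 with created true, then version 2 with created false, matching the responses in this section.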

1.2 Retrieving documents


When you search an index, Elasticsearch has to look in a complete set of shards for that index. Those shards can be either primaries or replicas, because primary and replica shards typically contain the same documents. Elasticsearch distributes the search load between the primary and replica shards of the index you're searching, which makes replicas useful for both search performance and fault tolerance.

Elasticsearch uses round-robin to forward the request to the cluster's nodes and shards. As shown in figure 2.9, Elasticsearch then gathers the results from those shards, aggregates them into a single reply, and sends the reply back to the client application.
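A toy model of this scatter/gather flow, assuming each shard copy already returns its own scored hits. ShardCopies and search are hypothetical names; real Elasticsearch does considerably more (separate query and fetch phases, failure handling), so this only shows the round-robin choice of copy and the merge by score.

```python
from itertools import count
from typing import List, Tuple

Hit = Tuple[str, float]  # (document ID, score)


class ShardCopies:
    """The primary and replica copies of one shard; all hold the same documents."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = nodes
        self._counter = count()

    def next_copy(self) -> str:
        # Round-robin: successive searches go to successive copies of this shard.
        return self.nodes[next(self._counter) % len(self.nodes)]


def search(shards: List[ShardCopies], hits_per_shard: List[List[Hit]],
           size: int = 10) -> Tuple[List[str], List[Hit]]:
    """Query one copy of every shard, then merge the partial results by score."""
    nodes_used = [shard.next_copy() for shard in shards]
    merged = [hit for hits in hits_per_shard for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)  # best score first
    return nodes_used, merged[:size]
```

Every search still covers one full set of shards; only the choice of which copy answers for each shard rotates.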


1.2.0 multi-index, multi-type

1.2.1 Retrieving a document's full content and metadata

  [root@ecs1 elasticsearch-1.7.2]# curl -i localhost:9200/megacorp/employee/1?pretty
  HTTP/1.1 200 OK
  Content-Type: application/json; charset=UTF-8
  Content-Length: 283
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 1,
    "found" : true,
    "_source":
    {
      "firstname" : "Tao",
      "lastname" : "Xiao",
      "age" : 30,
      "about" : "Hello, my wife is CCC",
      "interests" : ["coding", "jogging"]
    }
  }

The response contains the document's metadata, and the full JSON document is returned in the _source field.

1.2.2 Retrieving only the document body, without metadata


  curl localhost:9200/megacorp/employee/1/_source?pretty

1.2.3 Retrieving only some fields of a document

  curl localhost:9200/megacorp/employee/1?_source=firstname,interests

1.2.4 Retrieving multiple documents at once (_mget)

  [root@ecs1 tmp]# curl localhost:9200/_mget?pretty -d '
  > {
  >   "docs" : [
  >     {
  >       "_index" : "megacorp",
  >       "_type" : "employee",
  >       "_id" : 2
  >     },
  >     {
  >       "_index" : "megacorp",
  >       "_type" : "employee",
  >       "_id" : 3
  >     }
  >   ]
  > } '
  {
    "docs" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "2",
      "_version" : 1,
      "found" : true,
      "_source": {
        "firstname" : "Jack",
        "lastname" : "Chen",
        "age" : 40,
        "about" : "Hello, I am Jack",
        "interests" : ["sports", "music"]
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "3",
      "_version" : 1,
      "found" : true,
      "_source": {
        "firstname" : "Lucy",
        "lastname" : "Liu",
        "age" : 50,
        "about" : "Hello, I am Lucy",
        "interests" : ["tv", "talking"]
      }
    } ]
  }


If all the documents are in the same index and type, you can put the index and type in the URL and list only the IDs:

  [root@ecs1 tmp]# curl localhost:9200/megacorp/employee/_mget -d '
  > {
  >   "docs" : [
  >     { "_id" : 2 },
  >     { "_id" : 3 }
  >   ]
  > }'

1.2.5 Paginated queries


1.3 Updating documents

1.3.1 A document can only be replaced as a whole, not partially updated

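Documents in ES are immutable, so the only way to update a document is to replace it. For example, here we update the document at megacorp/employee/1. The new body replaces the old one entirely, so fields not included in the request (firstname, lastname, interests) are no longer part of the document, and created is false because the document already existed: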

  [root@ecs1 ~]# curl -XPUT localhost:9200/megacorp/employee/1?pretty -d '
  > {
  >   "age" : 31,
  >   "about" : "I am updated"
  > }'
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 2,
    "created" : false
  }


1.3.2 Achieving the effect of a partial update

The _update API can do this, but internally it still follows the same retrieve -> change -> reindex process.

A partial update passes the changes in a doc object; it can add new fields as well as modify existing ones:

  [root@ecs1 ~]# curl localhost:9200/megacorp/employee/8?pretty
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "8",
    "_version" : 20,
    "found" : true,
    "_source": {
      "name" : "Tao",
      "city" : "NJ"
    }
  }
  [root@ecs1 ~]# curl localhost:9200/megacorp/employee/8/_update?pretty -d '
  > {
  >   "doc" : {
  >     "tags" : ["A", "B"],
  >     "gender" : "male",
  >     "city" : "Taixing"
  >   }
  > }'
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "8",
    "_version" : 21
  }
  [root@ecs1 ~]# curl localhost:9200/megacorp/employee/8?pretty
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "8",
    "_version" : 21,
    "found" : true,
    "_source":{
      "name":"Tao",
      "city":"Taixing",
      "tags":["A","B"],
      "gender":"male"
    }
  }
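The difference between a full replacement (PUT) and the doc-based _update shown here can be modelled in a few lines of Python. This is a sketch of the merge rule as commonly documented (nested objects are merged, other existing values are overwritten, new fields are added); replace_document, merge_doc and partial_update are hypothetical names.

```python
import copy
from typing import Any, Dict

Source = Dict[str, Any]


def replace_document(old: Source, new_body: Source) -> Source:
    """PUT semantics: the new body becomes the whole source; old fields are dropped."""
    return copy.deepcopy(new_body)


def merge_doc(target: Source, changes: Source) -> Source:
    """Merge `changes` into `target` in place: objects merge, other values overwrite."""
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge_doc(target[key], value)        # merge nested objects recursively
        else:
            target[key] = copy.deepcopy(value)   # add a new field or overwrite an old one
    return target


def partial_update(old: Source, doc: Source) -> Source:
    """_update with "doc": retrieve, change, and return the source ES would reindex."""
    retrieved = copy.deepcopy(old)     # 1. retrieve the current source
    return merge_doc(retrieved, doc)   # 2. change it; 3. ES then reindexes the result
```

Applied to the document above, partial_update turns {"name": "Tao", "city": "NJ"} into the four-field source returned by the last GET.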

1.4 Preventing an existing document from being overwritten

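Sometimes a document should be indexed only if no document with the same index/type/id exists yet; an existing document must not be overwritten. This can be done with either the _create endpoint or the op_type=create parameter. In both cases ES rejects the request with 409 Conflict if the document already exists: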


  [root@ecs1 ~]# curl -i -XPUT localhost:9200/megacorp/employee/1/_create -d '
  > {
  >   "age" : 31,
  >   "about" : "I am updated"
  > }'
  HTTP/1.1 409 Conflict
  Content-Type: application/json; charset=UTF-8
  Content-Length: 109
  {"error":"DocumentAlreadyExistsException[[megacorp][4] [employee][5]: document already exists]","status":409}


  [root@ecs1 ~]# curl -i -XPUT localhost:9200/megacorp/employee/1?op_type=create -d '
  > {
  >   "age" : 31,
  >   "about" : "I am updated"
  > }'
  HTTP/1.1 409 Conflict
  Content-Type: application/json; charset=UTF-8
  Content-Length: 109
  {"error":"DocumentAlreadyExistsException[[megacorp][6] [employee][7]: document already exists]","status":409}
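The create-only rule behind _create and op_type=create can be sketched as a check before writing. create_only and DocumentAlreadyExists are hypothetical names for this toy model; the status attribute only mirrors the 409 shown above.

```python
from typing import Any, Dict


class DocumentAlreadyExists(Exception):
    """Raised instead of overwriting; ES answers such a request with 409 Conflict."""
    status = 409


def create_only(docs: Dict[str, Dict[str, Any]], doc_id: str,
                source: Dict[str, Any]) -> Dict[str, Any]:
    """Store `source` under `doc_id` only if that ID is still free (op_type=create)."""
    if doc_id in docs:
        raise DocumentAlreadyExists(f"[{doc_id}]: document already exists")
    docs[doc_id] = dict(source)
    return {"_id": doc_id, "created": True}
```

A plain PUT would skip the membership check and overwrite the stored source.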

1.5 Deleting a document


  [root@ecs1 elasticsearch-1.7.2]# curl -i -XDELETE localhost:9200/megacorp/employee/1?pretty
  HTTP/1.1 200 OK
  Content-Type: application/json; charset=UTF-8
  Content-Length: 103
  {
    "found" : true,
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 2
  }
  [root@ecs1 elasticsearch-1.7.2]# curl -i -XHEAD localhost:9200/megacorp/employee/1?pretty
  HTTP/1.1 404 Not Found
  Content-Type: text/plain; charset=UTF-8
  Content-Length: 0


1.6 Checking whether a document exists


  [root@ecs1 elasticsearch-1.7.2]# curl -i -XHEAD localhost:9200/megacorp/employee/1?pretty
  HTTP/1.1 200 OK
  Content-Type: text/plain; charset=UTF-8
  Content-Length: 0

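A HEAD request returns only the HTTP headers, with no body: 200 OK means the document exists, and 404 Not Found (as after the DELETE above) means it does not.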

1.7 Auto-generated document IDs


  [root@ecs1 elasticsearch-1.7.2]# curl -XPOST localhost:9200/megacorp/employee?pretty -d '
  > {
  >   "firstname" : "Tao",
  >   "lastname" : "Xiao",
  >   "age" : 30,
  >   "about" : "Hello, my wife is CCC",
  >   "interests" : ["coding", "jogging"]
  > }'
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "AVCE4xivMv8zm4P4wh-e",
    "_version" : 1,
    "created" : true
  }

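Note that the ID is left out of the URL and -XPUT is replaced with -XPOST. The ID generated by ES is a URL-safe, Base64-encoded universally unique identifier (UUID). It is often described as 22 characters long, although the ID generated in this example (AVCE4xivMv8zm4P4wh-e) has 20 characters.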
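Where a 22-character length comes from: a 128-bit UUID encoded in URL-safe Base64 (6 bits per character) needs 22 characters once the "=" padding is dropped. The sketch below only illustrates that format; url_safe_base64_uuid is a hypothetical helper, not ES's ID generator. By the same arithmetic, a 20-character ID like the one above encodes 120 bits (15 bytes).

```python
import base64
import uuid


def url_safe_base64_uuid() -> str:
    """Encode a random 128-bit UUID as URL-safe Base64 without '=' padding."""
    raw = uuid.uuid4().bytes                     # 16 bytes = 128 bits
    encoded = base64.urlsafe_b64encode(raw)      # 24 characters, ending in "=="
    return encoded.rstrip(b"=").decode("ascii")  # 22 characters remain
```

URL-safe Base64 uses "-" and "_" instead of "+" and "/", so the ID can appear in a URL path such as megacorp/employee/<id> without escaping.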

1.8 Handling conflicting concurrent updates

In the database world, two approaches are commonly used to ensure that changes are not lost when making concurrent updates:

  1. Pessimistic concurrency control

    Widely used by relational databases, this approach assumes that conflicting changes are likely to happen and so blocks access to a resource in order to prevent conflicts. A typical example is locking a row before reading its data, ensuring that only the thread that placed the lock is able to make changes to the data in that row.

  2. Optimistic concurrency control

    Used by Elasticsearch, this approach assumes that conflicts are unlikely to happen and doesn’t block operations from being attempted. However, if the underlying data has been modified between reading and writing, the update will fail. It is then up to the application to decide how it should resolve the conflict. For instance, it could reattempt the update, using the fresh data, or it could report the situation to the user.

Reference: Optimistic Concurrency Control

1.9 Optimistic locking with the version parameter

1.9.1 Internal Version Number


First, index document 7; its version is 1:

  [root@ecs1 ~]# curl -XPUT localhost:9200/megacorp/employee/7?pretty -d '
  > {
  >   "name" : "Tao",
  >   "city" : "NJ"
  > }'
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "7",
    "_version" : 1,
    "created" : true
  }


Updating with version=1 succeeds because 1 is the document's current version; afterwards the version is 2:

  [root@ecs1 ~]# curl -XPUT localhost:9200/megacorp/employee/7?version=1 -d '
  > {
  >   "name" : "Tao",
  >   "city" : "SH"
  > }'
  {"_index":"megacorp","_type":"employee","_id":"7","_version":2,"created":false}



Repeating the request with version=1 now fails with 409 Conflict, because the current version is 2:

  [root@ecs1 ~]# curl -XPUT localhost:9200/megacorp/employee/7?version=1 -d '
  > {
  >   "name" : "Tao",
  >   "city" : "BJ"
  > }'
  {"error":"VersionConflictEngineException[[megacorp][3] [employee][7]: version conflict, current [2], provided [1]]","status":409}
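The internal-versioning check can be modelled as a compare-before-write: a request carrying ?version=N is accepted only if N equals the stored version. VersionedStore and VersionConflict are hypothetical names; the error text imitates the fragment of the message above.

```python
from typing import Any, Dict, Optional, Tuple


class VersionConflict(Exception):
    """Toy counterpart of ES's 409 VersionConflictEngineException."""
    status = 409


class VersionedStore:
    """Toy store enforcing internal versioning: ?version=N must equal the current version."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def put(self, doc_id: str, source: Dict[str, Any],
            version: Optional[int] = None) -> Dict[str, Any]:
        current = self.docs[doc_id][0] if doc_id in self.docs else 0
        if version is not None and version != current:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]")
        self.docs[doc_id] = (current + 1, dict(source))  # each write bumps the version
        return {"_id": doc_id, "_version": current + 1, "created": current == 0}
```

Replaying the three requests of this section against the model gives version 1, then version 2, then a conflict that leaves city as "SH".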

1.9.2 External Version

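Sometimes we import data from an external store (e.g. MySQL) into ES. In that case an attribute of the source data, such as a timestamp, can serve as the document's version number in ES. To do this, pass the parameter version_type=external: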

  [root@ecs1 ~]# curl -XPUT "localhost:9200/megacorp/employee/8?version=10&version_type=external" -d '
  > {
  >   "name" : "Tao",
  >   "city" : "BJ"
  > }'
  {"_index":"megacorp","_type":"employee","_id":"8","_version":10,"created":true}

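With external versioning, each write to the same document must specify a version number greater than the document's current version (it does not have to be exactly one greater), and the value must lie in the range (0, 9.2e+18). To verify: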

  [root@ecs1 ~]# curl -XPUT "localhost:9200/megacorp/employee/8?version=20&version_type=external" -d '
  > {
  >   "name" : "Tao",
  >   "city" : "NJ"
  > }'
  {"_index":"megacorp","_type":"employee","_id":"8","_version":20,"created":false}
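The external-version rule, sketched as a toy check: a write is accepted only if its version is strictly greater than the stored one, and the stored version becomes exactly the value the caller supplied. put_external is a hypothetical name, and the upper bound of 2**63 - 1 (the largest Java long, about 9.2e18) is an assumption matching the range quoted above.

```python
from typing import Any, Dict, Tuple

MAX_EXTERNAL_VERSION = 2**63 - 1  # assumption: largest Java long, about 9.2e18


class VersionConflict(Exception):
    status = 409


def put_external(docs: Dict[str, Tuple[int, Dict[str, Any]]], doc_id: str,
                 source: Dict[str, Any], version: int) -> Dict[str, Any]:
    """version_type=external: accept only a version greater than the stored one."""
    if not 0 < version <= MAX_EXTERNAL_VERSION:
        raise ValueError("external version out of range")
    if doc_id in docs:
        current = docs[doc_id][0]
        if version <= current:  # equal or older versions are rejected
            raise VersionConflict(f"current [{current}], provided [{version}]")
    created = doc_id not in docs
    docs[doc_id] = (version, dict(source))  # store the caller's version as-is
    return {"_id": doc_id, "_version": version, "created": created}
```

Unlike internal versioning, the stored version jumps straight from 10 to 20 here, because ES records whatever version the external system supplies.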

1.9.3 Retrying on conflict



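With retry_on_conflict, the _update API retries the retrieve -> change -> reindex cycle up to the given number of times when another request changes the document in between. The string value in the script must be quoted (escaped as \"NY\" inside the JSON string); a bare NY would be read by Groovy as an undefined variable.

  [root@ecs1 ~]# curl -XPOST localhost:9200/megacorp/employee/7/_update?retry_on_conflict=6 -d '
  > {
  >   "script" : "ctx._source.city=\"NY\""
  > }'
  {
    "_index":"megacorp",
    "_type":"employee",
    "_id":"7",
    "_version":3
  }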
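A sketch of what retry_on_conflict amounts to, modelled as a loop: re-read the document, reapply the change to the fresh data, and write only if the version is still the one that was read. Store, update_with_retries and the before_write test hook are hypothetical; ES runs the equivalent loop on the server side of the _update API.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Source = Dict[str, Any]


class ConflictError(Exception):
    pass


class Store:
    """Toy store with a compare-and-set write keyed on the version."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, Source]] = {}

    def get(self, doc_id: str) -> Tuple[int, Source]:
        version, source = self.docs[doc_id]
        return version, dict(source)  # hand out a copy, like a fresh read

    def put_if_version(self, doc_id: str, source: Source, expected: int) -> int:
        current = self.docs[doc_id][0]
        if current != expected:
            raise ConflictError(f"current [{current}], expected [{expected}]")
        self.docs[doc_id] = (current + 1, source)
        return current + 1


def update_with_retries(store: Store, doc_id: str, change: Callable[[Source], None],
                        retry_on_conflict: int = 0,
                        before_write: Optional[Callable[[int], None]] = None) -> int:
    """Retrieve -> change -> reindex, retrying up to `retry_on_conflict` times."""
    if retry_on_conflict < 0:
        raise ValueError("retry_on_conflict must be >= 0")
    attempt = 0
    while True:
        version, source = store.get(doc_id)  # retrieve fresh data on every attempt
        change(source)                       # change
        if before_write is not None:
            before_write(attempt)            # test hook: simulate a concurrent writer
        try:
            return store.put_if_version(doc_id, source, version)  # reindex
        except ConflictError:
            if attempt == retry_on_conflict:
                raise                        # out of retries: report the conflict
            attempt += 1
```

With the default of 0 retries, the first conflicting write is reported straight back to the caller.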

1.10 Script

1.10.1 Partial updates with a script


  [root@ecs1 elasticsearch-1.7.2]# curl -XPUT localhost:9200/megacorp/employee/8?pretty -d '
  > {
  >   "tags" : ["A", "B"],
  >   "gender" : "male",
  >   "city" : "Taixing",
  >   "age" : 30
  > }'
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "8",
    "_version" : 32,
    "created" : true
  }


  [root@ecs1 elasticsearch-1.7.2]# curl -XPOST localhost:9200/megacorp/employee/8/_update -d '
  > {
  >   "script" : "ctx._source.age+=1"
  > }'


  [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/8?pretty
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "8",
    "_version" : 33,
    "found" : true,
    "_source":{
      "tags":["A","B"],
      "gender":"male",
      "city":"Taixing",
      "age":31}
  }


Scripts can also take parameters through params; here the value of new_tag ("C") is appended to the tags array:

  [root@ecs1 elasticsearch-1.7.2]# curl -XPOST localhost:9200/megacorp/employee/8/_update -d '
  > {
  >   "script" : "ctx._source.tags+=new_tag",
  >   "params" : {
  >     "new_tag" : "C"
  >   }
  > }'
  [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/8?pretty
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "8",
    "_version" : 34,
    "found" : true,
    "_source":{
      "tags":["A","B","C"],
      "gender":"male",
      "city":"Taixing",
      "age":31}
  }
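The update scripts above work on a ctx object whose _source field holds the document. A Python model of that contract, with the two Groovy scripts rewritten as Python functions; run_update_script, increment_age and add_tag are hypothetical names, and the default op value "index" (meaning "reindex the modified source") is an assumption of this model.

```python
import copy
from typing import Any, Callable, Dict, Optional

Ctx = Dict[str, Any]
Params = Dict[str, Any]


def run_update_script(source: Dict[str, Any], script: Callable[[Ctx, Params], None],
                      params: Optional[Params] = None) -> Ctx:
    """Toy _update: expose a copy of the source as ctx["_source"] and run the script."""
    ctx: Ctx = {"_source": copy.deepcopy(source), "op": "index"}  # assumed default op
    script(ctx, params or {})
    return ctx


def increment_age(ctx: Ctx, params: Params) -> None:
    ctx["_source"]["age"] += 1  # Groovy: ctx._source.age+=1


def add_tag(ctx: Ctx, params: Params) -> None:
    # Groovy: ctx._source.tags+=new_tag, with new_tag supplied through params
    ctx["_source"]["tags"] = ctx["_source"]["tags"] + [params["new_tag"]]
```

Passing values through params rather than writing them into the script text keeps the script identical across calls, so the same compiled script can be reused with different values.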

1.10.2 Deleting a document based on its content


  [root@ecs1 elasticsearch-1.7.2]# curl -XPOST localhost:9200/megacorp/employee/8/_update -d '
  > {
  >   "script" : "ctx.op = ctx._source.age == target_age ? 'delete' : 'none'",
  >   "params" : {
  >     "target_age" : 31
  >   }
  > }'


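  {
    "error":"ElasticsearchIllegalArgumentException[failed to execute script];
      nested: GroovyScriptExecutionException[MissingPropertyException[No such property: delete for class: da51a7f9a0e50de83432eb2d5f50321c8bce1178]]; ",
    "status":400
  }

The request fails with this error. The single quotes around delete and none close the shell's single-quoted -d argument, so the script ES receives is ctx.op = ctx._source.age == target_age ? delete : none, and Groovy treats delete as an undefined variable. Writing the strings as \"delete\" and \"none\" inside the JSON string avoids this. When the script sets ctx.op to "delete", the document is deleted; "none" leaves it unchanged.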
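Quoting a script inside a single-quoted shell argument is error-prone (the single quotes around delete and none above end the -d argument early). One way around it is to build the body with a JSON library and pass it to curl as a single argument without a shell. A sketch in Python; curl_args, send and delete_if_age are hypothetical names, and send needs a reachable node at the URL from the example.

```python
import json
import subprocess
from typing import Any, Dict, List


def curl_args(url: str, body: Dict[str, Any]) -> List[str]:
    """Build curl's argument list directly, so no shell quoting is involved."""
    return ["curl", "-XPOST", url, "-d", json.dumps(body)]


def send(url: str, body: Dict[str, Any]) -> str:
    """Run curl without a shell and return its stdout (requires a running ES node)."""
    result = subprocess.run(curl_args(url, body), capture_output=True, text=True, check=True)
    return result.stdout


delete_if_age = {
    # Double-quoted string literals inside the script; json.dumps escapes them as \"...\".
    "script": 'ctx.op = ctx._source.age == target_age ? "delete" : "none"',
    "params": {"target_age": 31},
}
```

Because subprocess.run receives a list and no shell is involved, the JSON reaches curl byte for byte.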


1.10.3 Updating a document that may not exist yet


With upsert, if the document does not exist yet, the upsert body is indexed as a new document and the script is not run; on later calls the script updates the existing document:

  [root@ecs1 ~]# curl -XPOST localhost:9200/megacorp/employee/9/_update -d '
  > {
  >   "script" : "ctx._source.views+=1",
  >   "upsert" : {
  >     "views" : 1
  >   }
  > }'
  {
    "_index":"megacorp",
    "_type":"employee",
    "_id":"9",
    "_version":1
  }
  [root@ecs1 ~]# curl localhost:9200/megacorp/employee/9?pretty
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "9",
    "_version" : 1,
    "found" : true,
    "_source":{"views":1}
  }
  [root@ecs1 ~]# curl -XPOST localhost:9200/megacorp/employee/9/_update -d '
  > {
  >   "script" : "ctx._source.views+=1",
  >   "upsert" : {
  >     "views" : 1
  >   }
  > }'
  {
    "_index":"megacorp",
    "_type":"employee",
    "_id":"9",
    "_version":2
  }
  [root@ecs1 ~]# curl localhost:9200/megacorp/employee/9?pretty
  {
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "9",
    "_version" : 2,
    "found" : true,
    "_source":{"views":2}
  }
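The upsert behaviour in this transcript (views is 1 after the first call and 2 after the second) can be modelled like this; update_or_upsert and increment_views are hypothetical names for a toy in-memory version.

```python
import copy
from typing import Any, Callable, Dict

Source = Dict[str, Any]


def update_or_upsert(docs: Dict[str, Source], doc_id: str,
                     script: Callable[[Source], None], upsert: Source) -> Source:
    """Toy _update with "upsert": index the upsert body if missing, else run the script."""
    if doc_id not in docs:
        docs[doc_id] = copy.deepcopy(upsert)  # first call: the script is NOT executed
    else:
        script(docs[doc_id])                  # later calls: update the existing source
    return docs[doc_id]


def increment_views(source: Source) -> None:
    source["views"] += 1                      # Groovy: ctx._source.views+=1
```

This is why the upsert body holds the value the counter should start at (1) rather than 0: on the first call the script never sees it.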


Index three sample documents:

  [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/1?pretty -d '
  > {
  >   "firstname" : "Tao",
  >   "lastname" : "Xiao",
  >   "age" : 30,
  >   "about" : "Hello, my wife is CCC",
  >   "interests" : ["coding", "jogging"]
  > }'
  [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/2?pretty -d '
  > {
  >   "firstname" : "Jack",
  >   "lastname" : "Chen",
  >   "age" : 40,
  >   "about" : "Hello, I am Jack",
  >   "interests" : ["sports", "music"]
  > }'
  [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/3?pretty -d '
  > {
  >   "firstname" : "Lucy",
  >   "lastname" : "Liu",
  >   "age" : 50,
  >   "about" : "Hello, I am Lucy",
  >   "interests" : ["tv", "talking"]
  > }'


Searching with _search and no query matches all documents; in this run it returned four hits, including the document with the auto-generated ID from section 1.7:

  [root@ecs1 elasticsearch-1.7.2]# curl localhost:9200/megacorp/employee/_search?pretty
  {
    "took" : 3,
    "timed_out" : false,
    "_shards" : {
      "total" : 5,
      "successful" : 5,
      "failed" : 0
    },
    "hits" : {
      "total" : 4,
      "max_score" : 1.0,
      "hits" : [ {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 1.0,
        "_source":
        {
          "firstname" : "Tao",
          "lastname" : "Xiao",
          "age" : 30,
          "about" : "Hello, my wife is CCC",
          "interests" : ["coding", "jogging"]
        }
      }, {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "AVCE4xivMv8zm4P4wh-e",
        "_score" : 1.0,
        "_source":
        {
          "firstname" : "Tao",
          "lastname" : "Xiao",
          "age" : 30,
          "about" : "Hello, my wife is CCC",
          "interests" : ["coding", "jogging"]
        }
      }, {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.0,
        "_source":
        {
          "firstname" : "Jack",
          "lastname" : "Chen",
          "age" : 40,
          "about" : "Hello, I am Jack",
          "interests" : ["sports", "music"]
        }
      }, {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 1.0,
        "_source":
        {
          "firstname" : "Lucy",
          "lastname" : "Liu",
          "age" : 50,
          "about" : "Hello, I am Lucy",
          "interests" : ["tv", "talking"]
        }
      } ]
    }
  }


