@xtccc
2015-11-19T09:15:04.000000Z
HBase
HBase handles basically two kinds of files: one is used for the write-ahead log and the other for the actual data storage. Both kinds are handled by the HRegionServer.
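A client first contacts the ZooKeeper (ZK) ensemble and asks which node holds the -ROOT- region. It then asks that region server (RS) which node hosts the .META. region covering the target rowkey, and that .META. region in turn tells it which server hosts the user-table region containing the rowkey. (The results of all these lookups are cached by the client.) Finally, the client contacts the target server and talks to its RS to run the actual data query. Because the client caches this metadata, it learns more and more of it over time and needs to query the .META. table less and less often.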
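The lookup chain and its cache can be sketched roughly as follows. This is a toy in-memory model, not the HBase client API: the class CachingLocator, the dictionary key "root-region-server", and the server names rs1 to rs5 are all hypothetical and chosen only for illustration.

```python
class CachingLocator:
    """Toy model of the ZK -> -ROOT- -> .META. lookup with a client cache."""

    def __init__(self, zk, root, meta):
        self.zk = zk          # step 1 source: which server hosts -ROOT-
        self.root = root      # step 2 source: which server hosts .META.
        self.meta = meta      # step 3 source: (start_key, end_key, server) rows
        self.cache = []       # region entries this client has already seen
        self.meta_lookups = 0

    @staticmethod
    def _covers(region, rowkey):
        start, end, _ = region
        # An empty end key means "up to the end of the table".
        return start <= rowkey and (end == "" or rowkey < end)

    def locate(self, rowkey):
        for region in self.cache:
            if self._covers(region, rowkey):
                return region[2]                     # cache hit, no metadata traffic
        root_server = self.zk["root-region-server"]  # ask ZooKeeper
        meta_server = self.root[root_server]         # ask the -ROOT- holder
        self.meta_lookups += 1
        for region in self.meta[meta_server]:        # ask the .META. holder
            if self._covers(region, rowkey):
                self.cache.append(region)
                return region[2]
        raise KeyError(rowkey)
```

After the first lookup for a rowkey in a given region, later lookups for any rowkey in the same region are answered from the cache, which is why the metadata traffic shrinks over time.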
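The HRegionServer opens a region and creates a corresponding HRegion instance. Once the HRegion is open, a Store instance is created for each column family of the table. Each Store instance holds a number of StoreFile instances, and a StoreFile is essentially a wrapper around an HFile.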
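The containment relationship described above can be pictured with a few plain data classes. This is only an illustrative model: the Python classes below borrow the HBase names but are not the real Java classes, and open_region is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class StoreFile:
    hfile_path: str  # a StoreFile wraps exactly one HFile


@dataclass
class Store:
    family: str
    store_files: list = field(default_factory=list)


@dataclass
class HRegion:
    name: str
    stores: dict = field(default_factory=dict)  # column family -> Store


def open_region(name, families):
    """Create one Store per column family when a region is opened."""
    region = HRegion(name)
    for family in families:
        region.stores[family] = Store(family)
    return region
```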
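When a client writes a record to HBase, the data is first written to the WAL (write-ahead log). (The WAL makes it possible to recover data that has not yet been persisted if the RS crashes.) Once the data is in the WAL, it is placed in the MemStore. At the same time, HBase checks whether the MemStore is full. The buffer size is set by hbase.hregion.memstore.flush.size; the default was 64 MB in early releases and is 128 MB in later ones, including 0.98. If the MemStore is full, it is flushed to disk. The flush request is served by a separate thread in the RS, which writes the data to a new HFile in HDFS.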
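The ordering of the write path (log first, then buffer, then flush when the threshold is reached) can be sketched as below. It is a simplified, synchronous model under stated assumptions: RegionWritePath is a hypothetical class, sizes are counted as string lengths, and the flush runs inline instead of in a separate thread.

```python
class RegionWritePath:
    """Toy write path: WAL append, MemStore insert, flush when full."""

    def __init__(self, flush_size):
        self.flush_size = flush_size  # stands in for hbase.hregion.memstore.flush.size
        self.wal = []                 # append-only log of every write
        self.memstore = []            # in-memory buffer of (rowkey, value)
        self.memstore_bytes = 0
        self.hfiles = []              # each flush produces one new sorted file

    def put(self, rowkey, value):
        self.wal.append((rowkey, value))       # 1. write to the log first
        self.memstore.append((rowkey, value))  # 2. then buffer in memory
        self.memstore_bytes += len(rowkey) + len(value)
        if self.memstore_bytes >= self.flush_size:
            self.flush()                       # 3. flush once the buffer is full

    def flush(self):
        # A flush writes the buffered cells, sorted by rowkey, to a new file.
        self.hfiles.append(sorted(self.memstore))
        self.memstore = []
        self.memstore_bytes = 0
```

Note that the WAL keeps every write even after a flush; in this sketch nothing trims it.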
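Under /hbase/WALs in HDFS there are several directories, one per RS, each containing a number of HLog files. All regions on an RS share the same set of HLog files.
Once the data in a log file has been persisted to store files, the log file is moved to the /hbase/oldWALs directory. The master deletes files there once they are older than 10 minutes (the interval is set by hbase.master.logcleaner.ttl).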
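The TTL rule for archived log files amounts to a simple age check. The sketch below assumes a hypothetical helper expired_wals and a plain mapping from file name to archive time; it only models the selection of files, not the master's actual cleaner.

```python
def expired_wals(files, now_ms, ttl_ms=600_000):
    """Return the names of archived WAL files older than the TTL.

    files:  mapping of file name -> time in ms when the file was archived
    ttl_ms: stands in for hbase.master.logcleaner.ttl (10 minutes here)
    """
    return sorted(
        name for name, archived_ms in files.items()
        if now_ms - archived_ms > ttl_ms
    )
```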
The two files /hbase/hbase.id and /hbase/hbase.version hold the ID of the HBase cluster and the version of the file format, respectively.
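Every table has its own directory. For example, the HDFS directory for testtable is /hbase/data/default/testtable (testtable is in the default namespace). Each table directory has a subdirectory .tabledesc, whose files contain metadata about the table and its column families.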
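In addition, the table directory contains several subdirectories, one for each region of the table. Each subdirectory is named after the MD5 hash portion of the region's name. For example, the HBase Web UI shows that testtable has 5 regions, named: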
testtable,,1444656645598.f655a770d069d70bce5a3c85826c550a.
testtable,row-300,1444656645598.9a55b2955f0e98a79fceadef74331ebb.
testtable,row-500,1444656645598.c75ed551d1b7895505fbea08d82e137d.
testtable,row-700,1444656645598.3c6450f6bf407275a623ba9faa08fa5f.
testtable,row-900,1444656645598.af90f4069bc0a763bc424cdfee4dd2bc.
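A region name consists of the table name, the start key, and a timestamp, separated by commas, followed by the MD5 hash portion enclosed in dots.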
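As a rough illustration of that format, the sketch below splits a region name into its parts and derives the HDFS directory of the region from the hash portion. The names RegionName, parse_region_name, and region_dir are hypothetical helpers, and the default namespace and /hbase/data root are taken from the example paths in this post.

```python
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    timestamp: int
    encoded: str  # the MD5 hash portion, also the HDFS directory name


def parse_region_name(name):
    # Drop the trailing dot, then split the hash off at the last remaining dot.
    body, encoded = name.rstrip(".").rsplit(".", 1)
    # The table name ends at the first comma, the timestamp starts at the last.
    table, rest = body.split(",", 1)
    start_key, timestamp = rest.rsplit(",", 1)
    return RegionName(table, start_key, int(timestamp), encoded)


def region_dir(name, root="/hbase/data", namespace="default"):
    region = parse_region_name(name)
    return f"{root}/{namespace}/{region.table}/{region.encoded}"
```

The first region of a table has an empty start key, which is why the name testtable,,1444656645598.f655a770d069d70bce5a3c85826c550a. contains two consecutive commas.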
Listing the directories of the testtable table in HDFS gives:
# hdfs dfs -ls /hbase/data/default/testtable
/hbase/data/default/testtable/.tabledesc
/hbase/data/default/testtable/.tmp
/hbase/data/default/testtable/3c6450f6bf407275a623ba9faa08fa5f
/hbase/data/default/testtable/9a55b2955f0e98a79fceadef74331ebb
/hbase/data/default/testtable/af90f4069bc0a763bc424cdfee4dd2bc
/hbase/data/default/testtable/c75ed551d1b7895505fbea08d82e137d
/hbase/data/default/testtable/f655a770d069d70bce5a3c85826c550a
Each of these directories contains further subdirectories, for example:
# hdfs dfs -ls /hbase/data/default/testtable/3c6450f6bf407275a623ba9faa08fa5f
/hbase/data/default/testtable/3c6450f6bf407275a623ba9faa08fa5f/.regioninfo
/hbase/data/default/testtable/3c6450f6bf407275a623ba9faa08fa5f/.tmp
/hbase/data/default/testtable/3c6450f6bf407275a623ba9faa08fa5f/colfam1
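Here, .regioninfo holds the serialized metadata of the region, and colfam1 is the directory of the column family colfam1, where that family's store files are kept.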
HBase has two catalog tables, -ROOT- and .META.
The -ROOT- table is used to refer to all regions in the .META. table. The design considers only one root region, that is, the root region is never split to guarantee a three-level, B+ tree-like lookup scheme: the first level is a node stored in ZooKeeper that contains the location of the root table's region (in other words, the name of the region server hosting that specific region). The second level is the lookup of a matching meta region from the -ROOT- table, and the third is the retrieval of the user table region from the .META. table.
In fact, in HBase 0.98 the -ROOT- table no longer exists, and neither does .META., which has become the table hbase:meta.
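In the hbase:meta table, the rowkey of each row is a region name, for example testtable,row-300,1444656645598.9a55b2955f0e98a79fceadef74331ebb.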
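Because the rowkeys are region names and each region name embeds its start key, finding the region for a given rowkey reduces to finding the last region whose start key is not greater than that rowkey. The sketch below does this in memory with a binary search over the five testtable regions listed earlier. It only illustrates the idea, not how the HBase client actually scans hbase:meta, and start_key and find_region are hypothetical helpers.

```python
from bisect import bisect_right

REGION_NAMES = [
    "testtable,,1444656645598.f655a770d069d70bce5a3c85826c550a.",
    "testtable,row-300,1444656645598.9a55b2955f0e98a79fceadef74331ebb.",
    "testtable,row-500,1444656645598.c75ed551d1b7895505fbea08d82e137d.",
    "testtable,row-700,1444656645598.3c6450f6bf407275a623ba9faa08fa5f.",
    "testtable,row-900,1444656645598.af90f4069bc0a763bc424cdfee4dd2bc.",
]


def start_key(region_name):
    # The start key sits between the first comma and the last comma.
    return region_name.split(",", 1)[1].rsplit(",", 1)[0]


def find_region(region_names, rowkey):
    names = sorted(region_names, key=start_key)
    starts = [start_key(n) for n in names]
    # Pick the rightmost region whose start key is <= rowkey.
    return names[bisect_right(starts, rowkey) - 1]
```

The start key is inclusive, so a lookup for row-300 itself lands in the region that starts at row-300.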
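Because HBase is a distributed system, state changes are very frequent: each region, for example, may pass through states such as Offline -> Pending Open -> Opening -> Open -> Pending Close -> Closing -> Closed -> Splitting -> Split. HBase therefore uses znodes in ZK to track these state changes.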
The following is quoted from What are HBase znodes?:
In Apache HBase, ZooKeeper coordinates, communicates, and shares state between the Masters and RegionServers. HBase has a design policy of using ZooKeeper only for transient data (that is, for coordination and state communication). Thus if the HBase's ZooKeeper data is removed, only the transient operations are affected – data can continue to be written and read to/from HBase.
In fact, if the /hbase znode is deleted from ZK, HBase still runs normally after a restart.
HBase uses ZK for the following:
tracking region servers
where the root region is hosted
HBase creates a series of znodes under its root znode in ZK (/hbase by default, configurable through zookeeper.znode.parent):
meta-region-server: the name of the server hosting the .META. region
backup-masters
table: Used by the master to track the table state during assignments (disabling/enabling states, for example).
draining: Used to decommission more than one RegionServer at a time by creating sub-znodes with the form serverName,port,startCode (for example, /hbase/draining/m1.host,60020,1338936306752). This lets you decommission multiple RegionServers without having the risk of regions temporarily moved to a RegionServer that will be decommissioned later. Read this to learn more about /hbase/draining.
region-in-transition
table-lock
running
balancer
master: the name of the server hosting the master
namespace
hbaseid: the cluster ID, identical to the contents of the HDFS file /hbase/hbase.id
online-snapshot
replication
splitWAL
recovering-regions
rs
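The znode names in the list above are children of the configured parent, and the draining sub-znodes follow the serverName,port,startCode form. A minimal sketch of composing such paths and reading a server name back out of one is shown below. ServerName, znode_path, and parse_server_name are hypothetical helpers for illustration, not part of any HBase or ZooKeeper client library.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int


def znode_path(child, parent="/hbase"):
    """Join a child znode onto the parent (the zookeeper.znode.parent value)."""
    return parent.rstrip("/") + "/" + child


def parse_server_name(znode):
    """Parse the last path component of the form serverName,port,startCode."""
    host, port, start_code = znode.rsplit("/", 1)[-1].split(",")
    return ServerName(host, int(port), int(start_code))
```

For example, znode_path("draining") yields /hbase/draining, and parsing /hbase/draining/m1.host,60020,1338936306752 gives the host m1.host, the port 60020, and the start code 1338936306752.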
If security (for example Kerberos) is enabled, there may be additional znodes.
The Access Control List (ACL) and the Token Provider coprocessors add two more znodes: one to synchronize access to table ACLs and the other to synchronize the token encryption keys across the cluster nodes.
/hbase/acl: The acl znode is used for synchronizing the changes made to the acl table by the grant/revoke commands. Each table will have a sub-znode (/hbase/acl/tableName) containing the ACLs of the table. (Read this for more information about the access controller and the ZooKeeper interaction.)
/hbase/tokenauth: The token provider is usually used to allow a MapReduce job to access the HBase cluster. When a user asks for a new token the information will be stored in a sub-znode created for the key (/hbase/tokenauth/keys/key-id).