@xtccc 2015-12-17T01:55:44.000000Z

Table and View

Phoenix

Creating a table in Phoenix

In sqlline.py, create a table named TA1 whose primary key is MYKEY and whose two columns are C1 and C2, then insert one record:

create table ta1(mykey varchar not null primary key, c1 varchar, c2 integer);
upsert into ta1 values ('row-1', 'c1-val', 100);

Scan the table from the HBase shell:
[Image: HBase shell scan output]

As the scan shows, in Phoenix SQL any column name or table name that is not enclosed in double quotes is automatically converted to upper case.
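The case rule above can be modelled in a few lines of Python. This is only a sketch of the rule as described here, not Phoenix's parser; the function name `normalize_identifier` is made up for illustration.

```python
def normalize_identifier(name: str) -> str:
    """Model the case rule: unquoted names are upper-cased, quoted names keep their case."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]   # quoted: strip the quotes, keep the case as written
    return name.upper()     # unquoted: folded to upper case


print(normalize_identifier('ta1'))     # TA1
print(normalize_identifier('"test"'))  # test
```

This is why `create table ta1 ...` shows up in HBase as TA1, while `create table "test" ...` shows up as test.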

Note that for every row, HBase contains a column whose name is _0 and whose value is empty.

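An empty cell is created for each row. It is used to enforce PRIMARY KEY constraints, because HBase doesn't store cells with NULL values: without it, a row whose non-key columns are all NULL would have no cells at all and would not exist in HBase.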

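Phoenix primary key & HBase rowkey

In the example above, the primary key mykey that Phoenix declares with mykey varchar not null primary key is exactly the rowkey in HBase.

There is another way to put a primary key constraint (CONSTRAINT PK) on a Phoenix table: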

> create table "test" ("row" varchar not null, "c1" varchar, "c2" varchar CONSTRAINT PK primary key ("row"));

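This creates a table with three columns (row, c1, c2) and uses CONSTRAINT PK to make the column row the primary key of the table.

Next, insert two records with Phoenix SQL and look at the table contents: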

> upsert into "test" values ('row-1', 'c1-1', 'c2-1');
> upsert into "test" values ('row-2', 'c1-2', 'c2-2');
> select * from "test";
+---------------------------+---------------------------+---------------------------+
|    row                    |           c1              |             c2            |
+---------------------------+---------------------------+---------------------------+
| row-1                     | c1-1                      | c2-1                      |
| row-2                     | c1-2                      | c2-2                      |
+---------------------------+---------------------------+---------------------------+

Now use the HBase shell to look at what is actually stored in the HBase table:

hbase(main):002:0> scan 'test'
ROW                                   COLUMN+CELL                                                                                               
 row-1                                column=0:_0, timestamp=1445235997184, value=                                                              
 row-1                                column=0:c1, timestamp=1445235997184, value=c1-1                                                          
 row-1                                column=0:c2, timestamp=1445235997184, value=c2-1                                                          
 row-2                                column=0:_0, timestamp=1445236002241, value=                                                              
 row-2                                column=0:c1, timestamp=1445236002241, value=c1-2                                                          
 row-2                                column=0:c2, timestamp=1445236002241, value=c2-2                                                          
2 row(s) in 0.0600 seconds
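The relationship between a Phoenix row and the cells in the scan above can be sketched as a small model. This is a simplification written for illustration (values are kept as strings, and `to_hbase_cells` is a hypothetical helper, not a Phoenix or HBase API); it only mirrors what the scan output shows.

```python
def to_hbase_cells(rowkey, columns, family="0"):
    """Return the (row, 'family:qualifier', value) cells for one Phoenix row.

    Simplified model: None (SQL NULL) columns produce no cell at all,
    which is why the empty `_0` marker cell is needed.
    """
    cells = [(rowkey, f"{family}:_0", "")]           # empty marker cell
    for qualifier, value in columns.items():
        if value is not None:                         # NULLs are not stored
            cells.append((rowkey, f"{family}:{qualifier}", value))
    return cells


for cell in to_hbase_cells("row-1", {"c1": "c1-1", "c2": "c2-1"}):
    print(cell)

# A row with only NULL columns still leaves the marker cell behind.
print(to_hbase_cells("row-3", {"c1": None, "c2": None}))
```

The primary key never appears as a column of its own: it is the rowkey of every cell.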

Namespace

Every HBase table belongs to a namespace (the default namespace is default). A namespace can also be given when creating a table in Phoenix. For example, the following creates the table test in the namespace spark:

create table "spark:test" ("pk" varchar not null primary key, "c" varchar);
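The `namespace:table` naming convention used by HBase can be illustrated with a tiny parser. It is a sketch only; `split_hbase_table_name` is a name invented here, not an HBase function.

```python
def split_hbase_table_name(full_name: str):
    """Split an HBase table name of the form 'namespace:table'."""
    namespace, sep, table = full_name.partition(":")
    if not sep:                       # no ':' means the default namespace
        return "default", full_name
    return namespace, table


print(split_hbase_table_name("spark:test"))  # ('spark', 'test')
print(split_hbase_table_name("test"))        # ('default', 'test')
```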

Family

If only a column name is given in Phoenix, the family name in HBase defaults to 0. A family name can also be given explicitly in Phoenix, for example a family named F:

create table ta2(mykey integer not null primary key, f.c1 varchar);
upsert into ta2 values (101, 'hello');

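So when creating a table in Phoenix, a column name should not contain the character . unless a family is intended, because the string before the . is treated as the family name.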
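How a dotted column reference is split into family and qualifier can be sketched as follows. The sketch handles unquoted names only (so both parts are upper-cased) and `resolve_column` is a hypothetical helper, not part of Phoenix.

```python
def resolve_column(ref: str, default_family: str = "0"):
    """Map an unquoted Phoenix column reference to (family, qualifier)."""
    family, sep, qualifier = ref.partition(".")
    if not sep:                                # no '.', so the default family
        return default_family, ref.upper()
    return family.upper(), qualifier.upper()


print(resolve_column("c1"))    # ('0', 'C1')
print(resolve_column("f.c1"))  # ('F', 'C1')
```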

Several family names can be used in the same table:

create table ta3(mykey integer not null primary key, f1.c1 varchar, f1.c2 varchar, f2.c1 varchar, f2.c2 varchar);
upsert into ta3 values (1, 'hello', 'hbase', 'hi', 'phoenix');

Compression

Enabling compression can improve I/O efficiency for large tables.

> create table "ta1" ("pk" varchar not null primary key, "f1"."c" varchar, "f2"."c" varchar) COMPRESSION='SNAPPY';

Creating a Phoenix view over an existing HBase table

If a table already exists in HBase, a view can be created in Phoenix on top of that HBase table.

Note that the data in the HBase table must be serialized in the same way that Phoenix serializes data for the table/view. For the varchar, char and unsigned_* types, Phoenix uses the HBase Bytes methods.
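To make the serialization point concrete, the sketch below compares the 4 bytes that HBase `Bytes.toBytes(int)` writes with the layout that, as far as I understand the Phoenix data type documentation, the signed INTEGER type uses (the same bytes with the sign bit flipped so that values sort correctly). The Python helper names are invented for illustration.

```python
import struct


def hbase_bytes_from_int(value: int) -> bytes:
    """What HBase Bytes.toBytes(int) produces: 4 bytes, big-endian two's complement."""
    return struct.pack(">i", value)


def phoenix_integer_bytes(value: int) -> bytes:
    """Assumed Phoenix INTEGER layout: the same 4 bytes with the sign bit flipped."""
    raw = bytearray(struct.pack(">i", value))
    raw[0] ^= 0x80
    return bytes(raw)


print(hbase_bytes_from_int(100).hex())   # 00000064
print(phoenix_integer_bytes(100).hex())  # 80000064
print("hello".encode("utf-8"))           # VARCHAR: plain UTF-8 bytes
```

Under that assumption, an int written by an HBase client with `Bytes.toBytes` should be mapped as UNSIGNED_INT rather than INTEGER in the view, otherwise the bytes are interpreted differently.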

Example:

Create a table in the HBase shell and insert data
[Image: HBase shell session]

Create a view mapped to testtable in the Phoenix shell

> create view "testtable" ( pk varchar primary key, "f1"."val" varchar);
> select * from "testtable";
+------------------------------------------+------------------------------------------+
|                    PK                    |                   val                    |
+------------------------------------------+------------------------------------------+
| row-1                                    | value-1                                  |
| row-2                                    | value-2                                  |
| row-3                                    |                                          |
| row-4                                    |                                          |
+------------------------------------------+------------------------------------------+
> upsert into "testtable" values("r1", "v1");
Error: ERROR 505 (42000): Table is read only. (state=42000,code=505)

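This shows that:
1. The name of the Phoenix view must be the same as the name of the HBase table
2. select * from {view} on a Phoenix view returns only the columns that have been mapped in the view
3. The primary key of the Phoenix view is the row key of the HBase table
4. A view mapped onto an HBase table in this way is read-only: data cannot be inserted, deleted or updated through it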

Salted Table

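When creating a table, its regions can be pre-split either by specifying the number of salt buckets or by specifying explicit split points.

Creating a salted table by specifying the number of buckets

Use the keyword SALT_BUCKETS to specify the number of buckets; the table is pre-split into that many regions.

The following creates a salted table named test with 10 pre-split regions.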

create table "test" ("pk" varchar not null primary key, "c" varchar) SALT_BUCKETS=10;

The HBase web UI shows that the table test has 10 pre-split regions after creation, and that the split keys between the regions are \x01 ~ \x09.

[Image: HBase web UI region list]

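For a salted table created with SALT_BUCKETS, Phoenix automatically prepends one salt byte to the rowkey whenever data is written through the Phoenix API. The byte is not truly random: it is derived from a hash of the rowkey modulo the number of buckets, so the same rowkey always lands in the same bucket. Because only one byte is added, the number of buckets is at most 256, and the start key and end key of each region of the new table are a single byte.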
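The idea of salting can be sketched as below. As far as I understand it, the salt has to be computable from the rowkey alone, otherwise a point lookup could not find the row again. The hash used here is a simple stand-in chosen for illustration, so it will not reproduce the salt bytes Phoenix actually writes; all function names are hypothetical.

```python
def salt_byte(rowkey: bytes, buckets: int) -> int:
    """Toy salt: a deterministic hash of the rowkey, modulo the bucket count.

    The hash is a stand-in, not Phoenix's actual hash function.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("the salt is one byte, so at most 256 buckets")
    h = 0
    for b in rowkey:
        h = (31 * h + b) & 0xFFFFFFFF
    return h % buckets


def salted_rowkey(rowkey: bytes, buckets: int) -> bytes:
    """Prepend the one-byte salt to the original rowkey."""
    return bytes([salt_byte(rowkey, buckets)]) + rowkey


def salt_split_points(buckets: int):
    """Split keys for a table pre-split into one region per bucket."""
    return [bytes([i]) for i in range(1, buckets)]


print(salted_rowkey(b"EF123", 10))
print(salt_split_points(10))  # nine one-byte keys, \x01 through \x09
```

With 10 buckets there are 9 split keys, which matches the \x01 ~ \x09 boundaries of the 10 regions.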

Example: inserting data into the table

First, insert data into the table:

> create table "test" ("pk" varchar not null primary key, "c" varchar) SALT_BUCKETS=10;
> upsert into "test" values('EF123', 'hello');
> upsert into "test" values('EGC', 'phoenix');
> upsert into "test" values('AVE', 'hi');
> upsert into "test" values('XZCC', 'hbase');

Then query the inserted data with Phoenix SQL:

> select * from "test";
+------------------------------------------+------------------------------------------+
|                    pk                    |                    c                     |
+------------------------------------------+------------------------------------------+
| AVE                                      | hi                                       |
| EF123                                    | hello                                    |
| EGC                                      | phoenix                                  |
| XZCC                                     | hbase                                    |
+------------------------------------------+------------------------------------------+

Now query the same data from the HBase shell:

hbase(main):010:0> scan 'test'
ROW                                   COLUMN+CELL                                                                                               
 \x00EF123                            column=0:_0, timestamp=1445160796559, value=                                                              
 \x00EF123                            column=0:c, timestamp=1445160796559, value=hello                                                          
 \x01AVE                              column=0:_0, timestamp=1445160809493, value=                                                              
 \x01AVE                              column=0:c, timestamp=1445160809493, value=hi                                                             
 \x03XZCC                             column=0:_0, timestamp=1445160816462, value=                                                              
 \x03XZCC                             column=0:c, timestamp=1445160816462, value=hbase                                                          
 \x08EGC                              column=0:_0, timestamp=1445160802829, value=                                                              
 \x08EGC                              column=0:c, timestamp=1445160802829, value=phoenix                                                        
4 row(s) in 0.3840 seconds

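Pre-splitting a table by specifying split points

Use the keyword SPLIT ON to specify the split points.

The following creates a table named test and requires 'CS', 'ED' and 'XYZ' to be the split points between its regions: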

create table "test" ("pk" varchar not null primary key, "c" varchar) SPLIT ON ('CS', 'ED', 'XYZ');

The HBase web UI shows that the table test has 4 pre-split regions after creation, separated as expected.

[Image: HBase web UI region list]

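For a table created with SPLIT ON, Phoenix does not prepend a salt byte to the rowkey when data is written through the Phoenix API: the table is pre-split, but the rowkeys are stored unchanged.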
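Which region a rowkey falls into follows directly from the split points, since each region covers a half-open range [start key, end key). A minimal sketch using a binary search (`region_index` is a name made up for this illustration):

```python
from bisect import bisect_right


def region_index(rowkey: bytes, split_points):
    """Index of the region whose [start, end) key range contains rowkey."""
    return bisect_right(split_points, rowkey)


splits = [b"CS", b"ED", b"XYZ"]
for key in (b"AVE", b"EF123", b"EGC", b"XZCC"):
    print(key, region_index(key, splits))
# AVE -> 0, EF123 -> 2, EGC -> 2, XZCC -> 3
```

Three split points give four regions. A key equal to a split point belongs to the region that starts at that key, which is why `bisect_right` is used rather than `bisect_left`.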

Example: inserting data into the table

First, insert data through Phoenix:

> create table "test" ("pk" varchar not null primary key, "c" varchar) SPLIT ON ('CS', 'ED', 'XYZ');
> upsert into "test" values('EF123', 'hello');
> upsert into "test" values('EGC', 'phoenix');
> upsert into "test" values('AVE', 'hi');
> upsert into "test" values('XZCC', 'hbase');

Then query the inserted data with Phoenix SQL:

> select * from "test";
+------------------------------------------+------------------------------------------+
|                    pk                    |                    c                     |
+------------------------------------------+------------------------------------------+
| AVE                                      | hi                                       |
| EF123                                    | hello                                    |
| EGC                                      | phoenix                                  |
| XZCC                                     | hbase                                    |
+------------------------------------------+------------------------------------------+

Now query the same data from the HBase shell:

hbase(main):009:0> scan 'test'
ROW                                   COLUMN+CELL                                                                                               
 AVE                                  column=0:_0, timestamp=1445160002033, value=                                                              
 AVE                                  column=0:c, timestamp=1445160002033, value=hi                                                             
 EF123                                column=0:_0, timestamp=1445159966190, value=                                                              
 EF123                                column=0:c, timestamp=1445159966190, value=hello                                                          
 EGC                                  column=0:_0, timestamp=1445159983411, value=                                                              
 EGC                                  column=0:c, timestamp=1445159983411, value=phoenix                                                        
 XZCC                                 column=0:_0, timestamp=1445160025898, value=                                                              
 XZCC                                 column=0:c, timestamp=1445160025898, value=hbase                                                          
4 row(s) in 0.0490 seconds
