@xtccc 2017-05-20T15:49:41.000000Z

CQL

`Cassandra`

CQL
1. About CQL
2. Starting cqlsh
3. Keyspace
4. Table
5. Indexing
- 5.1 When to use an index
- 5.2 Using an index
6. WHERE clause

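1. About CQL

CQL (Cassandra Query Language) is similar to SQL. It lets you run conditional queries against data stored in Cassandra and is the primary interface for interacting with Cassandra.

The main difference from SQL is that CQL does not support joins or subqueries, except for batch analysis through Hive.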

Command-line tool: cqlsh
GUI tool: DataStax DevCenter
Drivers for various programming languages: Open-source Drivers
set_cql_version: the Thrift-based access method

2. Starting cqlsh

On a node where the Cassandra service is installed:

bin/cqlsh

If the Cassandra service runs on a different node, you also need to specify its IP and port:

bin/cqlsh {host} {port}

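How do I make Cassandra listen on multiple network interfaces?
By default, Cassandra listens for client connections only on the localhost interface.
To accept client connections on all of the machine's network interfaces, change two settings in cassandra.yaml: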

rpc_address: 0.0.0.0
broadcast_rpc_address: 127.0.0.1

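I'm not sure what value broadcast_rpc_address should really be set to, but in my tests the configuration above works. (When rpc_address is 0.0.0.0, broadcast_rpc_address must be set to something other than 0.0.0.0. It is the address advertised to drivers and other nodes, so on a multi-node cluster it would normally be the node's own reachable IP rather than 127.0.0.1.)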

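My setup:
The host is a Mac, with Windows running in a VMware virtual machine, and I need to reach the Cassandra instance on the Mac from inside Windows.
How do you find the Mac's IP from inside the VM? Run ipconfig to get the Windows IP, then replace its last octet with 1 or 2.
For example: 192.168.166.60 -> 192.168.166.1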
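The last-octet trick above can be sketched with Python's standard ipaddress module. This assumes the VM sits on a /24 NAT or host-only subnet, which is common with VMware but is an assumption about your setup; the helper name candidate_host_ips is hypothetical.

```python
import ipaddress


def candidate_host_ips(guest_ip: str, last_octets=(1, 2)) -> list[str]:
    """Replace the last octet of the guest's IPv4 address with each candidate value.

    Assumes a /24 network, so only the last octet identifies the host.
    """
    addr = ipaddress.IPv4Address(guest_ip)
    base = int(addr) & 0xFFFFFF00  # keep the first three octets, zero the last one
    return [str(ipaddress.IPv4Address(base | octet)) for octet in last_octets]


print(candidate_host_ips("192.168.166.60"))  # ['192.168.166.1', '192.168.166.2']
```

Try each candidate with bin/cqlsh {host} {port} from inside the VM; which one is the Mac depends on how VMware's virtual network is configured.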

3. Keyspace

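What is a keyspace

A keyspace is roughly the equivalent of a database in MySQL. It is a namespace that defines how the data it contains is replicated across nodes, and each keyspace can define its own replication factor.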

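Creating a keyspace

See CREATE KEYSPACE

When creating a keyspace you must specify its strategy class. For evaluating Cassandra you can use SimpleStrategy, which is suitable only for a single data center. In production, use the multi-data-center strategy, NetworkTopologyStrategy.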
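SimpleStrategy's placement rule puts the first replica on the node that owns the key's token and the remaining replicas on the next nodes clockwise around the ring, ignoring racks and data centers. A toy model of that rule, assuming the ring is given as a hypothetical token -> node mapping (real tokens come from the partitioner, Murmur3Partitioner by default):

```python
import bisect


def simple_strategy_replicas(ring: dict[int, str], key_token: int, rf: int) -> list[str]:
    """Toy model of SimpleStrategy replica placement.

    A node with token T owns the range (previous token, T], so the first replica is
    the first node whose token is >= key_token, wrapping around the ring. Further
    replicas are the next distinct nodes clockwise; topology is ignored.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, key_token) % len(tokens)
    replicas: list[str] = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


# Hypothetical four-node ring with hand-picked tokens.
RING = {0: "A", 100: "B", 200: "C", 300: "D"}
print(simple_strategy_replicas(RING, 150, 3))  # ['C', 'D', 'A']
```

Because no rack or data center information is used, all replicas can end up in the same rack, which is one reason SimpleStrategy is meant for evaluation only.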

If you use NetworkTopologyStrategy on a single node, you must specify the default data center name. You can look it up with bin/nodetool status:

[Screenshot: output of bin/nodetool status]

When using NetworkTopologyStrategy in production, replace the default snitch (SimpleSnitch) with a network-aware snitch, define one or more data center names in the snitch properties file, and use those names when defining keyspaces. Otherwise Cassandra cannot complete any write request and reports the error: Unable to complete request: one or more nodes were unavailable.

Creating the keyspace

-- Using NetworkTopologyStrategy in a single data center
create keyspace if not exists cycling with replication = {
    'class' : 'NetworkTopologyStrategy',
    'datacenter1' : 3
};
-- Using SimpleStrategy instead
create keyspace if not exists cycling with replication = {
    'class' : 'SimpleStrategy',
    'replication_factor' : 1
};

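Note: it is best to set the replication factor of system_auth to the number of nodes in each data center. Its default replication factor is 1, so if that single copy of the data is lost, you can no longer log in to the cluster.

Finally, switch to the keyspace you just created: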

use cycling;

Altering a keyspace

First run a CQL command such as:

alter keyspace system_auth with replication = {
    'class' : 'NetworkTopologyStrategy',
    'dc1'  : 3, 'dc2' : 2
} ;

or

alter keyspace "Excuse" with replication = {
    'class' : 'SimpleStrategy',
    'replication_factor' : 3
} ;

Then run nodetool repair on every affected node, one node at a time: wait until the command has finished on one node before running it on the next.
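The one-node-at-a-time repair above can be scripted. A minimal sketch, assuming nodetool is on the PATH and each node's JMX port is reachable with nodetool -h; by default Cassandra's JMX accepts only local connections, so in practice you may have to run the command on each node (for example over ssh) instead. The function names are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def run_command(cmd: List[str]) -> int:
    """Run cmd, wait for it to finish, and return its exit code."""
    return subprocess.run(cmd).returncode


def rolling_repair(hosts: Sequence[str], keyspace: str,
                   run: Callable[[List[str]], int] = run_command) -> None:
    """Repair keyspace on one node at a time, stopping at the first failure."""
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", keyspace]
        exit_code = run(cmd)  # blocks until the repair on this node has finished
        if exit_code != 0:
            raise RuntimeError(f"nodetool repair failed on {host} (exit code {exit_code})")
```

For example, rolling_repair(["10.0.0.1", "10.0.0.2"], "system_auth") with hypothetical host addresses. The run parameter is injectable so the sequencing can be checked without a live cluster.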

List all tables in a keyspace

use [key_space_name];
describe tables;

Dropping a keyspace

drop keyspace [key_space_name];

List all keyspaces

describe keyspaces;

Describe a single keyspace

describe keyspace [key_space_name];

4. Table

4.1 CRUD

When creating a table you define its primary key, columns, and table properties. Table properties such as caching and compaction are configured with a WITH clause.

Creating tables

use xt_space;
create table people (
  id uuid primary key, -- primary key
  name text,
  age int,
  home text ) ;
create table cars (
  brand text,
  series text,
  volume double,
  price double, 
  primary key (brand, series) ); -- compound primary key: brand is the partition key, series the clustering column
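How the compound primary key of cars organizes rows can be sketched as a toy in-memory model: brand selects the partition, and within a partition rows are kept sorted by the clustering column series (ascending is Cassandra's default). The class name and the prices are hypothetical; this is not Cassandra's storage engine.

```python
import bisect
from collections import defaultdict


class ToyCarsTable:
    """Toy model of cars with PRIMARY KEY (brand, series)."""

    def __init__(self):
        self.partitions = defaultdict(list)  # brand -> list of (series, row) sorted by series

    def insert(self, brand, series, **row):
        part = self.partitions[brand]
        i = bisect.bisect_left([s for s, _ in part], series)
        if i < len(part) and part[i][0] == series:
            # Same primary key: an insert is an upsert, merging the given columns.
            part[i] = (series, {**part[i][1], **row})
        else:
            part.insert(i, (series, row))

    def select_partition(self, brand):
        """WHERE brand = ...: one partition, already in clustering order."""
        return list(self.partitions.get(brand, []))


cars = ToyCarsTable()
cars.insert("Audi", "A6", volume=2.0, price=40.0)
cars.insert("Audi", "A4", volume=2.0, price=30.0)
cars.insert("Audi", "A6", price=34.8)  # same (brand, series): updates the existing row
print([s for s, _ in cars.select_partition("Audi")])  # ['A4', 'A6']
```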

Inserting data

insert into people (id, name, age, home) 
values (62c36092-82a1-3a00-93d1-46196ee77204, 'Jack', 28, 'New York');
insert into people (id, name, age, home)
values (7db1a490-5878-11e2-bcfd-0800200c9a66, 'Tom', 26, 'Silicon Valley');

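A UUID can be generated independently on many machines without any coordination, which makes it a convenient replacement for an auto-increment key. Time-based (version 1) UUIDs also embed their creation time, which is what makes them useful for sequencing data.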
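The two sample ids above are different kinds of UUID, which Python's standard uuid module can show. A small sketch: Tom's id is a version 1 (time-based) UUID whose embedded timestamp can be decoded, while Jack's is version 3 (name-based) and carries no time. In CQL itself, now() produces a time-based timeuuid and uuid() a random version 4 UUID. The helper v1_timestamp is hypothetical.

```python
import uuid
from datetime import datetime, timezone

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100-ns units.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def v1_timestamp(u: uuid.UUID) -> datetime:
    """Return the creation time embedded in a version 1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError(f"not a time-based UUID (version {u.version})")
    return datetime.fromtimestamp((u.time - UUID_EPOCH_OFFSET) / 1e7, tz=timezone.utc)


tom = uuid.UUID("7db1a490-5878-11e2-bcfd-0800200c9a66")
jack = uuid.UUID("62c36092-82a1-3a00-93d1-46196ee77204")
print(tom.version, jack.version)  # 1 3
print(v1_timestamp(tom).date())   # 2013-01-07
print(uuid.uuid1().version, uuid.uuid4().version)  # 1 4
```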

Querying data

> select * from people;
 id                                   | age | home           | name
--------------------------------------+-----+----------------+------
 7db1a490-5878-11e2-bcfd-0800200c9a66 |  26 | Silicon Valley |  Tom
 62c36092-82a1-3a00-93d1-46196ee77204 |  28 |       New York | Jack
(2 rows)
> select age, home from people where id = 7db1a490-5878-11e2-bcfd-0800200c9a66 ;
 age | home
-----+----------------
  26 | Silicon Valley
(1 rows)

At this point you cannot filter on age or name yet: such a query would require a sequential scan, and neither column is a partition key or a clustering column:

[Screenshot: cqlsh rejecting a query that filters on age]

Solution: create indexes on these two columns, add ALLOW FILTERING to the CQL statement, and run the query again.

> select age, home from people where age > 25 and age < 28 allow filtering ;
 age | home
-----+----------------
  26 | Silicon Valley
(1 rows)
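Why ALLOW FILTERING is needed can be illustrated with a toy in-memory model of the people table: a lookup by partition key goes straight to one row, while a predicate on age has to read every row and test it. The names PEOPLE, select_by_id and select_allow_filtering are hypothetical, and a real cluster reads partitions from many nodes rather than from a dict.

```python
from dataclasses import dataclass


@dataclass
class Person:
    id: str
    name: str
    age: int
    home: str


# Toy stand-in for the people table: partition key (id) -> row.
PEOPLE = {
    p.id: p
    for p in [
        Person("7db1a490-5878-11e2-bcfd-0800200c9a66", "Tom", 26, "Silicon Valley"),
        Person("62c36092-82a1-3a00-93d1-46196ee77204", "Jack", 28, "New York"),
    ]
}


def select_by_id(table, pk):
    """WHERE id = ...: the partition key locates the row directly."""
    row = table.get(pk)
    return [row] if row is not None else []


def select_allow_filtering(table, predicate):
    """WHERE age > 25 AND age < 28 ALLOW FILTERING: every row is read, then tested."""
    rows_read = 0
    matches = []
    for row in table.values():
        rows_read += 1
        if predicate(row):
            matches.append(row)
    return matches, rows_read


matches, rows_read = select_allow_filtering(PEOPLE, lambda p: 25 < p.age < 28)
print([(p.age, p.home) for p in matches], rows_read)  # [(26, 'Silicon Valley')] 2
```

The cost of the filtering query grows with the size of the whole table, not with the size of the result, which is why CQL makes you opt in explicitly.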

References:
Range query on secondary index in cassandra
A deep look at the CQL WHERE clause

Updating data

update cars set price = 34.8 
    where brand = 'Audi' and series = 'A6' ;

Deleting data

1. Delete all data in a table

truncate [table];

2. Delete some rows from a table

delete from [table] where ··· ;

Changing a table's schema

1. Add a new column

alter table [table_name] add [column_name] [column_type];

2. Drop an existing column

alter table [table_name] drop [column_name];

4.2 Data Types

You can declare a type for each column. The built-in column types Cassandra currently supports include uuid, ascii, bigint, blob, boolean, date, decimal, text, time, set, list, map ···

Collection types
Cassandra supports the collection types set, list, and map.

Basic types
uuid, ascii, bigint, blob, boolean, date, decimal, text, time, tuple

User-defined types
See Using a user-defined type

System tables
See Querying a System Table

4.3 Loading Data

Loading CSV data
The cqlsh COPY FROM command imports data from a CSV file; see CQLSH Command : COPY FROM

Example:

use xt_space ;
copy persons (name, age, home) 
  from '/disk1/users/tao/data/persons.txt' 
  with delimiter=',' and header=true ;
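To try the COPY FROM command above, you need a file in the shape it expects: comma-delimited, with a header line first (which is what header=true declares) and fields in the order name, age, home. A minimal sketch using Python's csv module; the helper names and sample rows are hypothetical, and the file goes to a temporary directory rather than the path in the example.

```python
import csv
import os
import tempfile


def write_persons_csv(path, rows):
    """Write rows as comma-delimited text, preceded by a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=",")
        writer.writerow(["name", "age", "home"])  # header line expected because of header=true
        writer.writerows(rows)


def count_data_rows(path):
    """Count the records after the header, i.e. how many rows COPY FROM should import."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header line
        return sum(1 for _ in reader)


path = os.path.join(tempfile.mkdtemp(), "persons.txt")
write_persons_csv(path, [("Jack", 28, "New York"), ("Tom", 26, "Silicon Valley")])
print(count_data_rows(path))  # 2
```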

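COPY FROM is suitable only for importing a modest number of records (a few million or fewer) from CSV files into Cassandra. To import large volumes of data, use the bulk loader.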

Bulk loading
See Cassandra bulk loader (sstableloader)

4.4 Batch Operations

Batch operations
See Batch Statement Reference

5. Indexing

You can index column values; the index data is stored in a separate, hidden table.
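The idea of an index living in a separate hidden table can be sketched as a toy model: the base table maps primary key to row, and the "index table" maps each indexed value to the primary keys of the rows holding it, kept in sync on every write. rebuild_index mimics what a rebuild does conceptually (regenerate the index from the base data). This only illustrates the idea; it is not how Cassandra stores or distributes index data, and all names are hypothetical.

```python
from collections import defaultdict


class IndexedTable:
    """Toy base table plus a secondary index on one column."""

    def __init__(self, indexed_column):
        self.col = indexed_column
        self.rows = {}                 # primary key -> row (the base table)
        self.index = defaultdict(set)  # indexed value -> primary keys (the hidden table)

    def upsert(self, pk, row):
        old = self.rows.get(pk)
        if old is not None:
            self.index[old[self.col]].discard(pk)  # drop the stale index entry
        self.rows[pk] = row
        self.index[row[self.col]].add(pk)

    def delete(self, pk):
        old = self.rows.pop(pk, None)
        if old is not None:
            self.index[old[self.col]].discard(pk)

    def lookup(self, value):
        """WHERE indexed_column = value: read the index, then fetch the base rows."""
        return [self.rows[pk] for pk in sorted(self.index.get(value, ()))]

    def rebuild_index(self):
        """Regenerate the index entirely from the base table."""
        self.index = defaultdict(set)
        for pk, row in self.rows.items():
            self.index[row[self.col]].add(pk)


table = IndexedTable("artist")
table.upsert("s1", {"artist": "Artist A", "song": "Song 1"})
table.upsert("s2", {"artist": "Artist A", "song": "Song 2"})
table.upsert("s3", {"artist": "Artist B", "song": "Song 3"})
table.upsert("s2", {"artist": "Artist B", "song": "Song 2"})  # moves s2 to Artist B's entry
print([r["song"] for r in table.lookup("Artist B")])  # ['Song 2', 'Song 3']
```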

Hot rebuild of an index
Use the command: nodetool rebuild_index

5.1 When to use an index

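When an index is a good fit
The more unique values a column has, the more overhead an index on it carries. For example, in a playlists table with artist and song columns, if each artist has many different songs (so many rows share the same artist value), artist is a good candidate for an index.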
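The cardinality rule of thumb above can be made concrete with a simple ratio: the average number of rows that share each distinct value of a column. A high ratio (many rows per artist) suggests a reasonable index candidate; a ratio near 1 (every song unique) suggests a poor one. This is a toy heuristic of my own for illustration, not a Cassandra tool, and the playlist rows are hypothetical.

```python
def avg_rows_per_value(rows, column):
    """Average number of rows that share each distinct value of column."""
    values = [row[column] for row in rows]
    distinct = len(set(values))
    return len(values) / distinct if distinct else 0.0


playlists = [
    {"artist": "Artist A", "song": "Song 1"},
    {"artist": "Artist A", "song": "Song 2"},
    {"artist": "Artist A", "song": "Song 3"},
    {"artist": "Artist B", "song": "Song 4"},
    {"artist": "Artist B", "song": "Song 5"},
    {"artist": "Artist B", "song": "Song 6"},
]
print(avg_rows_per_value(playlists, "artist"))  # 3.0
print(avg_rows_per_value(playlists, "song"))    # 1.0
```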

When not to use an index

5.2 Using an index

Create an index on a column

create index [index_name] on [table]([column]) ;

The index name (index_name) is optional; if you omit it, Cassandra generates one. If you do specify it, it must be unique within the keyspace.

Single-column index
Once the index is built, you can query on that column directly.

create index volume_idx on cars(volume) ;
select * from cars where volume = 3.0 ;

Indexes on multiple columns

create index on cars(volume);
create index on cars(price);
select * from cars 
    where volume = 3.0 and price = 48.8
    allow filtering ;

6. WHERE clause

See A deep look at the CQL WHERE clause

Partition key columns, clustering columns, and normal columns play different roles, so each kind is subject to different restrictions in a WHERE clause. These restrictions also differ between SELECT, UPDATE, and DELETE statements.

6.1 SELECT Statements

Partition Key Restrictions
The partition key supports only two operators: = and IN
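The partition key rule above can be expressed as a toy validator. This is only a sketch of the rule as stated here: the (column, operator) input format and the function name are hypothetical, and it ignores other parts of CQL such as the token() function and the restrictions on clustering and normal columns.

```python
ALLOWED_PARTITION_KEY_OPERATORS = {"=", "IN"}


def check_partition_key_restrictions(restrictions, partition_key_columns):
    """Reject operators other than = and IN on partition key columns.

    restrictions is a list of (column, operator) pairs standing in for a parsed
    WHERE clause; restrictions on other columns are not checked here.
    """
    for column, operator in restrictions:
        if column in partition_key_columns and operator.upper() not in ALLOWED_PARTITION_KEY_OPERATORS:
            raise ValueError(
                f"partition key column {column!r} supports only = and IN, got {operator!r}"
            )


# cars: brand is the partition key, series the clustering column.
check_partition_key_restrictions([("brand", "in"), ("series", ">")], {"brand"})  # passes
```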
