@xtccc 2015-10-18T18:22:04.000000Z

Introduction to Apache Phoenix



Phoenix




What is Apache Phoenix


Apache Phoenix is a SQL skin over Apache HBase that has joined Cloudera Labs. Its design and implementation are heavily customized to leverage HBase features, including coprocessors and skip scans.



How does Phoenix work


Internally, Phoenix takes your SQL query, compiles it into a series of native HBase API calls, and pushes as much work as possible onto the cluster for parallel execution. It automatically creates a metadata repository that provides typed access to data stored in HBase tables. Phoenix's direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
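
The idea of pushing work onto the cluster and running it in parallel can be illustrated with a toy model. This is a minimal sketch, not Phoenix's implementation: the `REGIONS` list stands in for an HBase table split into regions, and `region_partial_sum` and `parallel_sum` are hypothetical names. It mimics what a query such as `SELECT SUM(v) FROM t WHERE v >= 15` might do: filter and aggregate inside each region, then merge the small partial results on the client.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for an HBase table split into three regions.
# Each region holds (row_key, value) pairs. All names are hypothetical.
REGIONS = [
    [("a1", 10), ("a7", 25)],
    [("m2", 40), ("m9", 5)],
    [("t3", 30), ("z8", 15)],
]


def region_partial_sum(region, min_value):
    """Filter and aggregate inside one region (the 'server-side' step)."""
    return sum(value for _key, value in region if value >= min_value)


def parallel_sum(regions, min_value):
    """Fan one logical query out to every region, then merge on the client."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = list(
            pool.map(lambda region: region_partial_sum(region, min_value), regions)
        )
    # Only one number per region travels back to the client.
    return sum(partials)


total = parallel_sum(REGIONS, min_value=15)
```

The design point is that each region returns a single partial sum rather than its raw rows, so the client merges a handful of numbers instead of scanning the data itself.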



Multi-tenancy


Phoenix provides multi-tenancy via a combination of multi-tenant tables and tenant-specific connections. Over a tenant-specific connection, a tenant can only access data that belongs to it: in multi-tenant tables it sees only its own rows, while in regular tables it sees all data.
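
The visibility rule above can be sketched with a small toy model. The `Connection` class, its methods, and the sample tables are hypothetical and exist only for illustration; Phoenix's own mechanism (the `MULTI_TENANT=true` table option and the `TenantId` connection property) is not exercised here.

```python
# Toy model of tenant visibility. Names are hypothetical, not Phoenix APIs.

# A multi-tenant table: every row carries the tenant it belongs to.
MULTI_TENANT_TABLE = [
    ("acme", "order-1"),
    ("acme", "order-2"),
    ("globex", "order-3"),
]

# A regular table: rows are not tied to any tenant.
REGULAR_TABLE = ["usd", "eur"]


class Connection:
    """A tenant-specific connection bound to one tenant id."""

    def __init__(self, tenant_id):
        self.tenant_id = tenant_id

    def scan_multi_tenant(self):
        """Return only the rows that belong to this connection's tenant."""
        return [
            row for tenant, row in MULTI_TENANT_TABLE if tenant == self.tenant_id
        ]

    def scan_regular(self):
        """Regular tables are not partitioned by tenant: every row is visible."""
        return list(REGULAR_TABLE)


acme = Connection("acme")
acme_orders = acme.scan_multi_tenant()
acme_currencies = acme.scan_regular()
```

Here `acme_orders` contains only the two rows tagged `acme`, while `acme_currencies` contains every row of the regular table.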



Limitations


  1. Phoenix doesn't support cross-row transactions yet.

  2. Its query optimizer and join mechanisms are less sophisticated than most COTS DBMSs.

  3. Because secondary indexes are implemented using a separate index table, they can get out of sync with the primary table (although perhaps only for very short periods). These indexes are therefore not fully ACID-compliant.

  4. Multi-tenancy is constrained: internally, Phoenix uses a single HBase table.
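
The third limitation, an index kept in a separate table, can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of Phoenix internals: the two dictionaries stand in for the primary and index tables, all function names are hypothetical, and the write order (primary first, index second) is chosen only for illustration.

```python
# Toy model of a secondary index stored in a separate table.
# Names and write order are assumptions for illustration only.
primary = {}  # row key -> indexed column value
index = {}    # indexed column value -> row key (the separate index table)


def write_primary(row_key, value):
    """Write the row to the primary table only."""
    primary[row_key] = value


def write_index(row_key, value):
    """Write the matching entry to the index table only."""
    index[value] = row_key


def lookup_by_value(value):
    """Resolve a row key through the index table, as an indexed query would."""
    return index.get(value)


# Step 1: the primary table is updated, the index table is not yet.
write_primary("row-1", "alice")
stale = lookup_by_value("alice")  # the row exists but the index cannot find it

# Step 2: the index table catches up.
write_index("row-1", "alice")
fresh = lookup_by_value("alice")
```

Between the two steps, a query that goes through the index misses a row that already exists in the primary table, which is the window of inconsistency the limitation describes.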



Comparisons to Hive and Impala


  1. The main goal of Phoenix is to provide a high-performance relational database layer over HBase for low-latency applications. Impala's primary focus is to enable interactive exploration of large data sets by providing high-performance, low-latency SQL queries on data stored in popular Hadoop file formats. Hive is mainly concerned with providing data warehouse infrastructure, especially for long-running batch-oriented tasks.

  2. Phoenix is a good choice, for example, in CRUD applications where you need the scalability of HBase along with the facility of SQL access. In contrast, Impala is a better option for strictly analytic workloads and Hive is well suited for batch-oriented tasks like ETL.

  3. Phoenix is comparatively lightweight since it doesn't need an additional server.

  4. Phoenix supports advanced functionality such as multiple secondary-index implementations optimized for different workloads, flashback queries, and so on. Neither Impala nor Hive has any provision for supporting secondary index lookups yet.
