@xtccc
2015-10-18T18:22:04.000000Z
Phoenix
Apache Phoenix is a SQL skin over Apache HBase that has joined Cloudera Labs. Its design and implementation are heavily customized to leverage HBase features, including coprocessors and skip scans.
Internally, Phoenix takes your SQL query, compiles it into a series of native HBase API calls, and pushes as much work as possible onto the cluster for parallel execution. It automatically creates a metadata repository that provides typed access to data stored in HBase tables. Phoenix's direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
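To make the "SQL compiled into native HBase calls" idea concrete, here is a minimal toy sketch of how an equality predicate on the leading primary-key column could become a bounded range scan over sorted row keys. It does not use Phoenix or HBase at all: the `host|date` key layout, the `|` separator, and the names `compile_host_equals` and `range_scan` are assumptions made for illustration, and Phoenix's real row-key encoding differs.

```python
from bisect import bisect_left

# Toy stand-in for an HBase table: (row key, columns) pairs kept sorted by key.
# The "host|date" key layout is an assumption of this sketch.
ROWS = sorted([
    (b"web1|2015-10-17", {"cpu": 35}),
    (b"web1|2015-10-18", {"cpu": 41}),
    (b"web10|2015-10-18", {"cpu": 12}),
    (b"web2|2015-10-18", {"cpu": 77}),
])


def compile_host_equals(host):
    """Turn `WHERE host = ?` on the leading key column into scan bounds."""
    start = host.encode("utf-8") + b"|"
    # Smallest key greater than every key with this prefix: bump the last byte.
    stop = start[:-1] + bytes([start[-1] + 1])
    return start, stop


def range_scan(rows, start, stop):
    """Return rows with start <= key < stop, located by binary search."""
    keys = [key for key, _ in rows]
    return rows[bisect_left(keys, start):bisect_left(keys, stop)]


start, stop = compile_host_equals("web1")
matches = range_scan(ROWS, start, stop)
for key, columns in matches:
    print(key, columns)
```

The point of the sketch is only that a predicate on the key prefix narrows the scan to a contiguous key range, so rows for `web10` and `web2` are never returned even though no per-row filter is applied.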
Phoenix provides multi-tenancy through a combination of multi-tenant tables and tenant-specific connections. A tenant-specific connection sees only that tenant's own rows in multi-tenant tables, while still seeing all data in regular tables.
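The visibility rule above can be modeled in a few lines. This is a toy model, not Phoenix code: the dictionary layout, the table names, and the `visible_rows` function are hypothetical. In Phoenix itself, a table is declared multi-tenant with the `MULTI_TENANT=true` table property and a connection becomes tenant-specific through the `TenantId` connection property.

```python
def visible_rows(table, tenant_id=None):
    """Return the rows a connection can see in `table`.

    `tenant_id=None` models a global (non-tenant) connection.
    """
    if tenant_id is None or not table["multi_tenant"]:
        # Global connections, and any connection on a regular table, see everything.
        return list(table["rows"])
    # Tenant-specific connection on a multi-tenant table: own rows only.
    return [row for row in table["rows"] if row["tenant_id"] == tenant_id]


# Hypothetical sample data for the sketch.
events = {
    "multi_tenant": True,
    "rows": [
        {"tenant_id": "acme", "event": "login"},
        {"tenant_id": "globex", "event": "logout"},
        {"tenant_id": "acme", "event": "purchase"},
    ],
}
countries = {
    "multi_tenant": False,
    "rows": [{"code": "CN"}, {"code": "US"}],
}

acme_events = visible_rows(events, tenant_id="acme")
acme_countries = visible_rows(countries, tenant_id="acme")
global_events = visible_rows(events)
print(len(acme_events), len(acme_countries), len(global_events))
```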
Phoenix doesn't support cross-row transactions yet.
Its query optimizer and join mechanisms are less sophisticated than those of most commercial off-the-shelf (COTS) DBMSs.
Because secondary indexes are implemented as separate index tables, they can get out of sync with the primary table (although perhaps only for very short periods). These indexes are therefore not fully ACID-compliant.
Multi-tenancy is constrained: internally, Phoenix uses a single HBase table.
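The secondary-index limitation in the list above comes from the index living in its own table, so a data write and its index write are two separate steps. The following toy model illustrates the window between them. The `IndexedTable` class and its methods are hypothetical and only mimic the idea; they say nothing about how Phoenix actually orders or retries index updates.

```python
class IndexedTable:
    """Toy model: a data table plus a separately maintained index table."""

    def __init__(self):
        self.data = {}    # row key -> value of the indexed column
        self.index = {}   # indexed value -> set of row keys

    def write_data(self, key, value):
        # Step 1: update the primary table only.
        self.data[key] = value

    def write_index(self, key):
        # Step 2: bring the index in line with the current data for this row.
        for keys in self.index.values():
            keys.discard(key)  # drop any stale entry for this row
        self.index.setdefault(self.data[key], set()).add(key)

    def lookup_via_index(self, value):
        return sorted(self.index.get(value, set()))


table = IndexedTable()
table.write_data("row1", "blue")

# Between the two steps the index lags behind the data table.
before = table.lookup_via_index("blue")

table.write_index("row1")
after = table.lookup_via_index("blue")
print(before, after)
```

A reader that goes through the index between the two steps misses `row1` even though the primary table already holds it, which is the kind of short-lived inconsistency the limitation describes.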
The main goal of Phoenix is to provide a high-performance relational database layer over HBase for low-latency applications. Impala's primary focus is to enable interactive exploration of large data sets by providing high-performance, low-latency SQL queries on data stored in popular Hadoop file formats. Hive is mainly concerned with providing data warehouse infrastructure, especially for long-running batch-oriented tasks.
Phoenix is a good choice, for example, in CRUD applications where you need the scalability of HBase along with the facility of SQL access. In contrast, Impala is a better option for strictly analytic workloads and Hive is well suited for batch-oriented tasks like ETL.
Phoenix is comparatively lightweight since it doesn't need an additional server.
Phoenix supports advanced functionality such as multiple secondary-index implementations optimized for different workloads, flashback queries, and so on. Neither Impala nor Hive has any provision for secondary-index lookups yet.