Apache HBase

From Wikipedia, the free encyclopedia

Apache HBase
Original author(s): Powerset
Developer(s): Apache Software Foundation
Initial release: 28 March 2008
Stable release:
  2.4.x: 2.4.14 / 29 August 2022[1]
  2.5.x: 2.5.3 / 5 February 2023[1]
Preview release: 3.0.0-alpha-3 / 27 June 2022[1]
Repository: GitHub Repository, Gitbox Repository
Written in: Java
Operating system: Cross-platform
Type: Distributed database
License: Apache License 2.0
Website: hbase.apache.org

HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).
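The idea of sparse storage can be illustrated with a toy in-memory model. The sketch below is not the HBase client API; the class `SparseTableSketch`, its methods, and the sample row keys are hypothetical names chosen for illustration. It assumes the Bigtable-style view of a table as a sorted map from row key to column to timestamped value, in which cells that were never written simply occupy no space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a sparse, sorted, versioned table. Illustrative only; not HBase code. */
final class SparseTableSketch {
    // row key -> column ("family:qualifier") -> timestamp -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of one cell; nothing is allocated for cells that are never written. */
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>())
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell was never written. */
    String get(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    /** Lists row keys in sorted order from startRow (inclusive) to stopRow (exclusive). */
    List<String> scanRowKeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    /** Counts the distinct cells that actually hold data. */
    int storedCells() {
        int count = 0;
        for (NavigableMap<String, NavigableMap<Long, String>> columns : rows.values()) {
            count += columns.size();
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("user#0001", "info:name", 1L, "Ada");
        table.put("user#0001", "info:name", 2L, "Ada L.");
        table.put("user#0042", "info:email", 1L, "grace@example.org");

        System.out.println(table.get("user#0001", "info:name"));        // Ada L.
        System.out.println(table.get("user#0042", "info:name"));        // null
        System.out.println(table.scanRowKeys("user#0000", "user#0010")); // [user#0001]
        System.out.println(table.storedCells());                         // 2
    }
}
```

Two rows with entirely different columns coexist here without any placeholder for the missing cells, which is the property that makes this layout suitable for sparse data. A real deployment distributes and persists the map; this sketch does neither.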

HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original Bigtable paper.[2] Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API as well as through REST, Avro or Thrift gateway APIs. HBase is a wide-column store and has been widely adopted because of its lineage with Hadoop and HDFS. It is well suited to fast read and write operations on large datasets, with high throughput and low input/output latency.
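To make the role of a Bloom filter concrete, here is a minimal textbook sketch. It is not HBase's implementation: the class `BloomFilterSketch`, the filter size, the number of hash functions, and the choice of hashes are all assumptions made for illustration. The point it shows is the general one: a Bloom filter can answer "definitely absent" without touching the stored data, at the cost of occasional false positives.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Generic Bloom filter over string keys. Illustrative only; not HBase code. */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Records a key by setting hashCount bit positions derived from it. */
    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(data, i));
        }
    }

    /** Returns false only if the key was never added; true may be a false positive. */
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th position is derived from two base hashes of the key.
    private int index(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a hash, used here as the second base hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811c9dc5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-0001");
        System.out.println(filter.mightContain("row-0001")); // true
        System.out.println(new BloomFilterSketch(1024, 3).mightContain("row-0001")); // false
    }
}
```

A store could consult such a filter before reading a file from disk and skip the read whenever `mightContain` returns false. Added keys are never reported absent, so correctness is preserved and only some unnecessary reads remain.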

HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase as well as a JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables and rows, using HBase as a storage engine.

HBase serves several data-driven websites,[3] although Facebook's Messaging Platform migrated from HBase to MyRocks in 2018.[4][5] Unlike relational and traditional databases, HBase does not support SQL scripting; instead, the equivalent logic is written in Java, much like a MapReduce application.

In the parlance of Eric Brewer's CAP theorem, HBase is a CP-type system: it favors consistency and partition tolerance over availability.

History

Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for the purposes of natural-language search. It has been a top-level Apache project since 2010.

Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018.[4]

The 2.4.x series is the current stable release line; it supersedes earlier release lines.

Use cases & production deployments

Enterprises that use HBase

The following is a list of notable enterprises that have used or are using HBase:

See also

References

  1. "Apache HBase – Apache HBase Downloads". Retrieved 27 September 2022.
  2. Chang, et al. (2006). Bigtable: A Distributed Storage System for Structured Data.
  3. "Apache HBase – Powered By Apache HBase". hbase.apache.org. Retrieved 8 April 2018.
  4. "Migrating Messenger storage to optimize performance". www.facebook.com. 26 June 2018. Retrieved 5 July 2018.
  5. Facebook: Why our 'next-gen' comms ditched MySQL. Retrieved 17 December 2010.
  6. HBaseCon (2 August 2016). "Apache HBase at Airbnb". slideshare.net. Retrieved 8 April 2018.
  7. "Near Real Time Search Indexing". 4 January 2018.
  8. "Is data locality always out of the box in Hadoop?". 10 March 2018.
  9. "Why Imgur Dropped MySQL in Favor of HBase - DZone Database". dzone.com. Retrieved 8 April 2018.
  10. "Tech Tuesday: Imgur Notifications: From MySQL to HBase - The Imgur Blog". blog.imgur.com. Retrieved 8 April 2018.
  11. Doyung Yoon. "S2Graph: A Large-Scale Graph Database with HBase".
  12. Cheolsoo Park and Ashwin Shankar. "Netflix: Integrating Spark at Petabyte Scale".
  13. Engineering, Pinterest (30 March 2018). "Improving HBase backup efficiency at Pinterest". Medium. Retrieved 14 April 2020.
  14. "Hbase at Salesforce.com".
  15. Josh Baer. "How Apache Drives Spotify's Music Recommendations".
  16. "Tuenti Group Chat: Simple, yet complex". Archived from the original on 24 November 2012. Retrieved 29 September 2015.
  17. "Tuenti Asyncthrift". GitHub. 6 November 2013.

Bibliography

External links