Apache HBase
Original author(s) | Powerset |
---|---|
Developer(s) | Apache Software Foundation |
Initial release | 28 March 2008 |
Stable release | |
Preview release | 3.0.0-alpha-3 / 27 June 2022[1] |
Repository | GitHub Repository, Gitbox Repository |
Written in | Java |
Operating system | Cross-platform |
Type | Distributed database |
License | Apache License 2.0 |
Website | hbase |
HBase is an open-source, non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).
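The idea of sparse storage can be illustrated with a toy model. The sketch below is not the HBase API; `SparseTable` and its methods are hypothetical names chosen for illustration. It assumes only what the Bigtable model implies: rows are addressed by a key, each row holds whatever columns were actually written, rows are read back in sorted key order, and a cell that was never written occupies no space.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a sparse wide-column table (not the HBase API)."""

    def __init__(self):
        # row key -> {column name -> value}; absent cells are simply not stored
        self._rows = defaultdict(dict)

    def put(self, row, column, value):
        """Store one cell; other columns of the row are left untouched."""
        self._rows[row][column] = value

    def get(self, row, column, default=None):
        """Return one cell, or `default` if it was never written."""
        return self._rows.get(row, {}).get(column, default)

    def scan(self, start=None, stop=None):
        """Yield (row, cells) in sorted row-key order for start <= row < stop."""
        for row in sorted(self._rows):
            if start is not None and row < start:
                continue
            if stop is not None and row >= stop:
                break
            yield row, dict(self._rows[row])

    def cell_count(self):
        """Number of cells actually stored across all rows."""
        return sum(len(cells) for cells in self._rows.values())


# Two rows with different columns: only three cells are stored in total.
table = SparseTable()
table.put("user#001", "info:name", "Ada")
table.put("user#002", "info:email", "bob@example.org")
table.put("user#002", "stats:logins", 7)
```

The `family:qualifier` column names mirror how HBase names columns, but the values and row keys here are made up. A real deployment persists and distributes this structure across servers, which the sketch does not attempt.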
HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original Bigtable paper.[2] Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST, Avro or Thrift gateway APIs. HBase is a wide-column store and has been widely adopted because of its lineage with Hadoop and HDFS. It is well suited to fast read and write operations on large datasets, with high throughput and low input/output latency.
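A Bloom filter answers "might this key be present?" with no false negatives and an occasional false positive, which lets a store skip reading files that cannot contain a requested key. The following is a minimal sketch of the general technique, not HBase's implementation: the class name, the bit-array size, the number of hash functions, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (illustrative; not HBase's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        """Set every bit position derived from the key."""
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bloom = BloomFilter()
bloom.add("row-42")
```

After `add("row-42")`, `might_contain("row-42")` is always true. For a key that was never added the answer is usually false, but it can be true when its bit positions happen to collide with those of stored keys, so a positive answer still requires checking the underlying data.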
HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase as well as a JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables and rows that use HBase as a storage engine.
HBase serves several data-driven websites,[3] although Facebook's messaging platform migrated from HBase to MyRocks in 2018.[4][5] Unlike relational and traditional databases, HBase does not support SQL scripting; instead, the equivalent is written in Java, in a manner similar to a MapReduce application.
In the parlance of Eric Brewer's CAP theorem, HBase is a CP type system.
History
Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for the purposes of natural-language search. It has been a top-level Apache project since 2010.
Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018.[4]
The 2.4.x series is the current stable release line; it supersedes earlier release lines.
Use cases & production deployments
Enterprises that use HBase
The following is a list of notable enterprises that have used or are using HBase:
- 23andMe
- Adobe
- Airbnb uses HBase as part of its AirStream realtime stream computation framework[6]
- Alibaba Group
- Amadeus IT Group, as its main long-term storage DB.
- Bloomberg, for time series data storage
- Facebook used HBase for its messaging platform between 2010 and 2018
- Flipkart uses HBase for its search index[7] and user insights.[8]
- Flurry
- HubSpot
- Imgur uses HBase to power its notifications system[9][10]
- Kakao[11]
- Netflix[12]
- Pinterest[13]
- Quicken Loans
- Rocket Fuel
- Salesforce.com[14]
- Sears
- Sophos, for some of their back-end systems.
- Spotify uses HBase as a base for Hadoop and machine learning jobs.[15]
- Tuenti uses HBase for its messaging platform.[16][17]
- Xiaomi
- Yahoo!
See also
- NoSQL
- Wide column store
- Bigtable
- Apache Cassandra
- Oracle NOSQL
- Hypertable
- Apache Accumulo
- MongoDB
- Project Voldemort
- Riak
- Sqoop
- Elasticsearch
- Apache Phoenix
References
- "Apache HBase – Apache HBase Downloads". Retrieved 27 September 2022.
- Chang, et al. (2006). Bigtable: A Distributed Storage System for Structured Data.
- "Apache HBase – Powered By Apache HBase". hbase.apache.org. Retrieved 8 April 2018.
- "Migrating Messenger storage to optimize performance". www.facebook.com. 26 June 2018. Retrieved 5 July 2018.
- Facebook: Why our 'next-gen' comms ditched MySQL. Retrieved 17 December 2010.
- HBaseCon (2 August 2016). "Apache HBase at Airbnb". slideshare.net. Retrieved 8 April 2018.
- "Near Real Time Search Indexing". 4 January 2018.
- "Is data locality always out of the box in Hadoop?". 10 March 2018.
- "Why Imgur Dropped MySQL in Favor of HBase - DZone Database". dzone.com. Retrieved 8 April 2018.
- "Tech Tuesday: Imgur Notifications: From MySQL to HBase - The Imgur Blog". blog.imgur.com. Retrieved 8 April 2018.
- Doyung Yoon. "S2Graph: A Large-Scale Graph Database with HBase".
- Cheolsoo Park and Ashwin Shankar. "Netflix: Integrating Spark at Petabyte Scale".
- Engineering, Pinterest (30 March 2018). "Improving HBase backup efficiency at Pinterest". Medium. Retrieved 14 April 2020.
- "Hbase at Salesforce.com".
- Josh Baer. "How Apache Drives Spotify's Music Recommendations".
- "Tuenti Group Chat: Simple, yet complex". Archived from the original on 24 November 2012. Retrieved 29 September 2015.
- "Tuenti Asyncthrift". GitHub. 6 November 2013.
Bibliography
- Dimiduk, Nick; Khurana, Amandeep (28 November 2012). HBase in Action (1st ed.). Manning Publications. p. 350. ISBN 978-1617290527.
- George, Lars (20 September 2011). HBase: The Definitive Guide (1st ed.). O'Reilly Media. p. 556. ISBN 978-1449396107.
- Jiang, Yifeng (16 August 2012). HBase Administration Cookbook (1st ed.). Packt Publishing. p. 332. ISBN 978-1849517140.