HPCC
Developer(s) | HPCC Systems, LexisNexis Risk Solutions |
---|---|
Initial release | 15-06-2011 |
Stable release | 7.4.18-1 / 13-09-2019 |
Repository | https://github /hpcc-systems |
Written in | C++, ECL |
Operating system | Linux |
License | Apache License 2.0 |
Website | hpccsystems |
HPCC (High-Performance Computing Cluster), also known as DAS (Data Analytics Supercomputer), is an open source, data-intensive computing system platform developed by LexisNexis Risk Solutions. The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data.[1] The HPCC platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie).[2] The HPCC platform also includes a data-centric declarative programming language for parallel data processing called ECL.[3]
The public release of HPCC was announced in 2011, after ten years of in-house development (according to LexisNexis). It is an alternative to Hadoop[4] and other big data platforms.[5]
System architecture
The HPCC system architecture includes two distinct cluster processing environments, Thor and Roxie, each of which can be optimized independently for its parallel data processing purpose.
The first of these platforms is called Thor, a data refinery whose overall purpose is the general processing of massive volumes of raw data of any type for any purpose. It is typically used for data cleansing and hygiene, ETL (extract, transform, load) processing of the raw data, record linking and entity resolution, large-scale ad-hoc complex analytics, and creation of keyed data and indexes to support high-performance structured queries and data warehouse applications. The data refinery name Thor is a reference to the mythical Norse god of thunder, whose large hammer is symbolic of crushing large amounts of raw data into useful information. A Thor cluster is similar in its function, execution environment, filesystem, and capabilities to the Google and Hadoop MapReduce platforms.
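The data-parallel pattern behind a batch engine of this kind can be illustrated with a small sketch. The Python below is not HPCC or ECL code; it is a minimal single-process illustration in which the function names (`partition`, `cleanse`, `run_batch`) and the choice of four simulated nodes are assumptions made for the example. Records are hash-distributed by key so that all records sharing a key land on one node, each node cleans and deduplicates its own partition independently, and the partial results are gathered at the end.

```python
import zlib
from collections import defaultdict


def partition(records, num_nodes):
    """Hash-distribute (key, value) records so equal keys share a node."""
    parts = defaultdict(list)
    for key, value in records:
        normalized = key.strip().lower()  # basic key hygiene before routing
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        node = zlib.crc32(normalized.encode("utf-8")) % num_nodes
        parts[node].append((normalized, value))
    return parts


def cleanse(part):
    """Per-node step: drop empty values and deduplicate within the partition."""
    seen = set()
    out = []
    for key, value in part:
        value = value.strip()
        if not value or (key, value) in seen:
            continue
        seen.add((key, value))
        out.append((key, value))
    return out


def run_batch(records, num_nodes=4):
    """Partition, process each partition independently, then gather."""
    parts = partition(records, num_nodes)
    cleaned = []
    for node in range(num_nodes):  # each iteration stands in for one worker node
        cleaned.extend(cleanse(parts.get(node, [])))
    return sorted(cleaned)


raw = [(" Smith ", "NY"), ("smith", "NY "), ("Jones", ""), ("jones", "TX")]
result = run_batch(raw)
```

Because duplicates of a key are always routed to the same partition, the per-node deduplication needs no communication between nodes, which is the property that lets this style of job scale out.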
Figure 2 shows a representation of a physical Thor processing cluster which functions as a batch job execution engine for scalable data-intensive computing applications. In addition to the Thor master and slave nodes, additional auxiliary and common components are needed to implement a complete HPCC processing environment.
The second of the parallel data processing platforms is called Roxie and functions as a rapid data delivery engine. This platform is designed as an online high-performance structured query and analysis platform or data warehouse, delivering the parallel data access processing requirements of online applications through Web services interfaces and supporting thousands of simultaneous queries and users with sub-second response times. Roxie utilizes a distributed indexed filesystem to provide parallel processing of queries using an optimized execution environment and filesystem for high-performance online processing. A Roxie cluster is similar in its function and capabilities to ElasticSearch and Hadoop with HBase and Hive capabilities added, and provides near-real-time, predictable query latencies. Both Thor and Roxie clusters utilize the ECL programming language for implementing applications, increasing continuity and programmer productivity.
Figure 3 shows a representation of a physical Roxie processing cluster which functions as an online query execution engine for high-performance query and data warehousing applications. A Roxie cluster includes multiple nodes with server and worker processes for processing queries; an additional auxiliary component called an ESP server which provides interfaces for external client access to the cluster; and additional common components which are shared with a Thor cluster in an HPCC environment. Although a Thor processing cluster can be implemented and used without a Roxie cluster, an HPCC environment which includes a Roxie cluster should also include a Thor cluster. The Thor cluster is used to build the distributed index files used by the Roxie cluster and to develop online queries which will be deployed with the index files to the Roxie cluster.
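The division of labor described here, with a batch cluster building index files and a query cluster serving lookups from them, can be sketched in a few lines. This Python is purely illustrative and does not reflect Roxie's actual file format or interfaces; `node_for`, `build_index`, `query`, and the two-node layout are hypothetical names and choices made for the example. The build step assigns each keyed record to a node and sorts that node's part; the query step routes a key to a single node and binary-searches its sorted keys.

```python
import zlib
from bisect import bisect_left, bisect_right


def node_for(key, num_nodes):
    """Map a key to the node that owns it (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def build_index(records, num_nodes):
    """Batch step: bucket (key, payload) records by node and sort each bucket."""
    buckets = [[] for _ in range(num_nodes)]
    for key, payload in records:
        buckets[node_for(key, num_nodes)].append((key, payload))
    index = []
    for bucket in buckets:
        bucket.sort()
        # Store keys and payloads as parallel lists so lookups can bisect on keys.
        index.append(([k for k, _ in bucket], [p for _, p in bucket]))
    return index


def query(index, key):
    """Online step: route to one node, then binary-search its sorted keys."""
    keys, payloads = index[node_for(key, len(index))]
    lo = bisect_left(keys, key)
    hi = bisect_right(keys, key)
    return payloads[lo:hi]


index = build_index([("smith", "NY"), ("jones", "TX"), ("smith", "CA")], num_nodes=2)
hits = query(index, "smith")
```

Doing the sorting once in the batch step is what keeps each online lookup logarithmic in the size of a single node's part, rather than requiring a scan.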
Software architecture
The HPCC software architecture incorporates the Thor and Roxie clusters as well as common middleware components, an external communications layer, client interfaces which provide both end-user services and system management tools, and auxiliary components to support monitoring and to facilitate loading and storing of filesystem data from external sources. Usually an HPCC environment includes only Thor clusters, or both Thor and Roxie clusters, although Roxie is occasionally used to build its own indexes. The overall HPCC software architecture is shown in Figure 4.
HPCC Systems
HPCC Systems (High Performance Computing Cluster) is part of LexisNexis Risk Solutions and was formed to promote and sell the HPCC software. In June 2011, it announced the offering of the software under an open source dual license model.[6][7][8][9]
HPCC Systems offers both a Community Edition and an Enterprise Edition. The Community Edition is free to download, includes the source code, and is released under the Apache License 2.0. The Enterprise Edition is available under a paid commercial license and includes training, support, indemnification, and additional modules. In November 2011, HPCC Systems announced the availability of its Thor Data Refinery Cluster on Amazon Web Services.[10] In January 2012, HPCC Systems announced distributed machine learning algorithms.[11]
See also
- Apache Hadoop
- Apache Spark
- Aster Data Systems
- ECL (data-centric programming language)
- ElasticSearch
- Sector/Sphere
- Machine learning
- MapReduce
References
- "Data-Intensive Technologies for Cloud Computing," by A.M. Middleton. Handbook of Cloud Computing. Springer, 2010.
- "HPCC Systems: Introduction to HPCC (High-Performance Computing Cluster)". 24 May 2011. CiteSeerX 10.1.1.456.3571.
- "ECL/HPCC: A Unified Approach to Big Data," by A.M. Middleton. Handbook of Data Intensive Computing. Springer, 2011.
- "LexisNexis Will Open-Source Its Hadoop Alternative for Handling Big Data". ReadWrite. 15 June 2011. Retrieved 20 November 2014.
- "9 Useful Open Source Big Data Tools". EnterpriseAppsToday. 11 November 2015. Retrieved 18 November 2015.
- "LexisNexis open-sources its Hadoop killer". GigaOM. 15 June 2011. Retrieved 8 November 2014.
- "LexisNexis Will Open-Source Its Hadoop Alternative for Handling Big Data". ReadWrite. 15 June 2011. Retrieved 20 November 2014.
- "HPCC A New/Old Kid In Town To Take On Hadoop". NetworkWorld. 16 June 2011. Retrieved 2 December 2014.
- "LexisNexis Joins Linux Foundation". The Linux Foundation. 17 June 2011. Retrieved 29 November 2014.
- "HPCC Announces Availability of ETL Cluster On Amazon Web Services". Cloud Computing Today. 17 December 2012. Retrieved 30 November 2014.
- "HPCC Systems Intros Machine Learning Beta". Datanami. 31 January 2012. Retrieved 29 November 2014.
External links
- Sandia sees data management challenges spiral
- Sandia National Laboratories Leverages the Data Analytics Supercomputer (DAS) by LexisNexis Risk & Information Analytics Group, Which Offers Breakthrough High Performance Computing to Address Data Management and Analysis Challenges
- Programming models for the LexisNexis High Performance Computing Cluster
- LexisNexis Data Analytics Supercomputer
- LexisNexis HPCC Systems
- Reference to the term BORPS (Billions of Records Per Second)
- LexisNexis Brings Its Data Management Magic To Bear on Scientific Data
- High Performance Computing Clusters (HPCC) and Big Data Analytics Certificate - Stand-Alone
- FAU Receives National Science Foundation Rapid Response Grant to Develop Innovative Computer Model for Ebola Spread
- CPL Online delivers added value for clients through its Big Data Platform
- HPCC Systems