Big Data in the HP Cloud

Enterprises are drowning in information: too much data and no efficient way to process it. HP Cloud provides an elastic cloud computing and storage platform for analyzing and indexing data volumes of up to hundreds of petabytes. Distributed queries run across multiple data sets and return results in near real time.

Big data characteristics

  • High volume, velocity, and variety of information assets
  • Requires cost-effective information processing due to sheer data volume
  • Massive in size, with data volumes typically over 10 terabytes and reaching 100+ petabytes
  • Typically unstructured or semi-structured data
  • Data from both internal and external sources
  • Forces traditional solutions to adapt to demands such as machine data, social data, widely varied data, and unpredictable velocity
  • Collected through various sources (enterprise, social, web transactions, etc.)

HP Cloud provides the underlying infrastructure required to process big data. We partner with third-party solution providers who enable enterprises to better configure, manage, manipulate, and analyze their data affordably.

Non-relational data platforms such as Hadoop and Cassandra analyze and index large volumes of unstructured data into smaller, more manageable sets. These smaller sets allow for more efficient collaboration and improve productivity and strategic decision making. Cassandra and Hadoop's HBase both provide high transaction rates and low-latency lookups, and both platforms allow map-reduce processing to run against the stored data when aggregation or parallel processing is required.
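
To make the shared map-reduce idea concrete, here is a minimal, framework-free sketch in plain Java: each record is mapped to a key, and values are then reduced (aggregated) per key. The sample log lines and counting logic are illustrative assumptions, not an HP Cloud or Hadoop API.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class MapReduceSketch {
        public static void main(String[] args) {
            // Unstructured input: raw log lines (illustrative sample data).
            List<String> logLines = List.of(
                "GET /index.html 200",
                "GET /index.html 404",
                "POST /login 200");

            // Map phase: extract a key (the HTTP status code) from each record.
            // Reduce phase: count occurrences per key.
            Map<String, Long> statusCounts = logLines.stream()
                .map(line -> line.substring(line.lastIndexOf(' ') + 1))
                .collect(Collectors.groupingBy(s -> s, Collectors.counting()));

            System.out.println(statusCounts); // e.g. {200=2, 404=1} (map order not guaranteed)
        }
    }

Frameworks such as Hadoop run the same map and reduce steps in parallel across a cluster, which is what lets the pattern scale to the data volumes described above.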

Big data using Cassandra

Apache Cassandra is an open-source, distributed NoSQL database whose data model derives from Google's BigTable.
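
As a minimal sketch of that data model in practice, the snippet below creates a BigTable-style table (rows partitioned by key, with columns clustered within each partition) and performs a single-partition read using the DataStax Java driver's 3.x API. The contact point, keyspace, table, and sample values are illustrative assumptions.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class CassandraSketch {
        public static void main(String[] args) {
            // Contact point is an assumption: a single local node.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

                // BigTable-style model: sensor_id is the partition key;
                // ts clusters readings in order within each partition.
                session.execute("CREATE TABLE IF NOT EXISTS demo.events ("
                    + "sensor_id text, ts timestamp, reading double, "
                    + "PRIMARY KEY (sensor_id, ts))");

                session.execute("INSERT INTO demo.events (sensor_id, ts, reading) "
                    + "VALUES (?, toTimestamp(now()), ?)", "sensor-1", 21.5);

                // Low-latency lookup: reads a single partition by key.
                ResultSet rs = session.execute(
                    "SELECT ts, reading FROM demo.events WHERE sensor_id = ?", "sensor-1");
                for (Row row : rs) {
                    System.out.println(row.getTimestamp("ts") + " -> " + row.getDouble("reading"));
                }
            }
        }
    }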

Cassandra advantages

  • High-volume, high-velocity, real-time processing (e.g., complex read-write workloads)
  • Deploys across multiple, geographically dispersed data centers, resulting in high redundancy, failover, superior backup and recovery, and optimized performance and stability
  • Industries:  Finance, Banking, Retail, Advertising, Fraud Detection, Risk Management

Big data using Hadoop

Hadoop is used for storing and processing huge data sets distributed across large clusters of commodity computers, and it excels at batch-oriented analytical workloads. The Hadoop Distributed File System (HDFS) stores massive, distributed, unstructured data sets. Data can be stored directly in HDFS, or in a semi-structured format in HBase, which allows rapid record-level data access and is modeled after Google's BigTable system. Hadoop also uses the MapReduce framework to distribute work across the cluster and to restart failed tasks automatically.
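
The classic word-count job shows how these pieces fit together: the mapper emits (word, 1) pairs, and the reducer sums the counts for each word. This is a minimal sketch against the standard org.apache.hadoop.mapreduce API; the input and output HDFS paths are supplied as command-line arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every word in the input split.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts emitted for each word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // aggregates map output locally
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged as a JAR, the job would typically be launched with something like "hadoop jar wordcount.jar WordCount /input /output", with Hadoop handling data distribution, parallel execution, and task retries.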

Hadoop advantages

  • Data consolidation:  Moves complex and relational data into a single repository
  • Flexible:  Handles structured and unstructured data, at high volume, for all types of analytic applications
  • Scalable:  Proven technology at petabyte scale
  • Cost-effective:  Data is stored inexpensively and maintained so that the raw data remains readily available
  • Open:  No lock-in, regardless of vendor choice
  • Process at the source:  Eliminates Extract, Transform, Load (ETL) bottlenecks by allowing users to mine data first and govern later
  • Industries:  Travel, e-commerce, mobile, energy, IT security, health care, image processing, infrastructure management, finance