- Big Data applications have emerged as data collection has grown tremendously, beyond the ability of commonly used software tools to capture, manage, and process within a “tolerable elapsed time.” The most fundamental challenge for Big Data applications is to explore the large volumes of data and extract useful information or knowledge for future actions. In many situations, the knowledge extraction process has to be very efficient and close to real time because storing all observed data is nearly infeasible.
- The unprecedented data volumes require an effective data analysis and prediction platform to achieve fast response and real-time classification for such Big Data.
- The challenges at Tier I focus on data access and arithmetic computing procedures. Because Big Data are often stored at different locations and data volumes may grow continuously, an effective computing platform has to take distributed large-scale data storage into account.
- The challenges at Tier II center on semantics and domain knowledge for different Big Data applications. Such information can benefit the mining process, but it also adds technical barriers to Big Data access (Tier I) and mining algorithms (Tier III).
- At Tier III, the data mining challenges concentrate on algorithm designs in tackling the difficulties raised by the Big Data volumes, distributed data distributions, and by complex and dynamic data characteristics.
- We propose a HACE theorem to model Big Data characteristics. The characteristics of HACE make it an extreme challenge to discover useful knowledge from Big Data.
- The HACE theorem suggests that the key characteristics of Big Data are 1) huge with heterogeneous and diverse data sources, 2) autonomous with distributed and decentralized control, and 3) complex and evolving in data and knowledge associations.
- To support Big Data mining, high-performance computing platforms are required, and these in turn demand systematic designs to unleash the full power of Big Data.
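
Two of the points above, that not all observed data can be stored and that data often sit at different locations, can be illustrated with a small sketch. It is not a method from the source; it assumes Welford's online update for a running mean and variance, plus the standard pairwise formula for merging two partial summaries. The names `RunningStats`, `summarize`, `site_a`, and `site_b` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass summary of a numeric stream (illustrative, hypothetical class)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Welford's update: each observation is seen once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial summaries without revisiting the raw data.
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.n if self.n else 0.0


def summarize(stream) -> RunningStats:
    """Consume an iterable once and return its summary."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


# Assumed scenario: two autonomous sites summarize their own data locally,
# and only the small summaries are exchanged and merged.
site_a = summarize([1.0, 2.0, 3.0])
site_b = summarize([4.0, 5.0])
combined = site_a.merge(site_b)
```

Each site keeps only three numbers regardless of how much data it has seen, so the raw observations never need to be stored or moved. This covers simple aggregates only; the mining algorithms discussed at Tier III would need considerably more than this.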