The Big Data era is greatly challenging our capacity to store and process data. Moreover, the Big Data field covers many different kinds of business, and each needs a different platform. Although all of these platforms are built on distributed environments, no single platform suits every business. To make the best use of resources, different platforms must be developed for different application scenarios. This thesis presents the Massive Structured Data Fast Query System, which is designed to retrieve Big Data faster and to support more query patterns. Popular key-value (KV) databases can only be queried by primary key, while popular data warehouse platforms are good only at data analysis, statistics, and data mining, all of which rely on brute-force scans of the whole table. This thesis focuses on compound queries and provides extended functionality that lets the system integrate easily with popular data warehouse systems, which can then perform data analysis, statistics, and data mining. The specific work is as follows: (1) Data organization: every table is partitioned into many tablelets, which can easily be stored on different data nodes. Data reliability is ensured by replication. When a table is queried, the system broadcasts the query to all of the table's tablelets. (2) Compound query patterns for a tablelet: the system provides an SQL-like (Structured Query Language) language as its query interface. It first analyzes the query pattern and generates a syntax tree, and then transforms the syntax tree into the final retrieval execution tree. (3) Multiple master nodes for the cluster: the system supports multiple master nodes with an election mechanism. Once the main master crashes, the election mechanism ensures that another master can quickly take over the cluster and become the new main master.
This minimizes the impact of a master failure on the business. (4) Extended functionality for MapReduce (the distributed computing platform of the Hadoop ecosystem) and Hive (the data warehouse platform of the Hadoop ecosystem): this enables the system to support data analysis, statistics, and data mining. Tests show that the Massive Structured Data Fast Query System responds quickly to compound query patterns and can also complete computing tasks through the MapReduce or Hive platform.
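To make the data organization in item (1) concrete, the following is a minimal single-process sketch of splitting a table into replicated tablelets and broadcasting a query to all of them. The abstract does not say how rows are assigned to tablelets or how replicas are placed, so hash partitioning on a chosen key and round-robin replica placement are assumptions here, and the names `Tablelet`, `partition_table`, `scan_tablelet`, and `broadcast_query` are hypothetical. A thread pool stands in for the data nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Tablelet:
    """One horizontal partition of a table plus the data nodes holding its replicas."""
    tablelet_id: int
    replicas: List[str]
    rows: List[Row] = field(default_factory=list)


def partition_table(rows: List[Row], key: str, num_tablelets: int,
                    nodes: List[str], replication: int) -> List[Tablelet]:
    """Hash-partition rows into tablelets (assumed scheme); place replicas round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many data nodes as replicas")
    tablelets = [
        # Replicas of tablelet i go to `replication` consecutive, distinct nodes.
        Tablelet(i, [nodes[(i + r) % len(nodes)] for r in range(replication)])
        for i in range(num_tablelets)
    ]
    for row in rows:
        # crc32 gives a hash that is stable across processes (Python's str hash is not).
        bucket = zlib.crc32(str(row[key]).encode()) % num_tablelets
        tablelets[bucket].rows.append(row)
    return tablelets


def scan_tablelet(tablelet: Tablelet, predicate: Callable[[Row], bool]) -> List[Row]:
    """The work a data node would do for the one tablelet it serves."""
    return [row for row in tablelet.rows if predicate(row)]


def broadcast_query(tablelets: List[Tablelet],
                    predicate: Callable[[Row], bool]) -> List[Row]:
    """Send the query to every tablelet in parallel and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda t: scan_tablelet(t, predicate), tablelets)
        return [row for part in partials for row in part]
```

For example, `partition_table(rows, "id", 4, ["n1", "n2", "n3"], 2)` spreads the rows over four tablelets, each stored on two different nodes, and `broadcast_query(tablelets, lambda r: r["v"] >= 150)` gathers matching rows from all of them. Because every tablelet is scanned independently, adding data nodes adds scan parallelism; the cost is that every query touches every tablelet, as the abstract describes.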
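Item (2) describes a two-stage pipeline: parse the query into a syntax tree, then turn that tree into an execution tree. The sketch below illustrates the idea for the condition part of a compound query only, assuming a tiny grammar of `AND`, `OR`, parentheses, and `column op literal` comparisons. The real system's SQL-like grammar and its execution tree, which likely also chooses how to retrieve rows, are not specified in the abstract. Here the "execution tree" is simply a tree of predicate closures evaluated per row. All names (`Compare`, `BoolOp`, `tokenize`, `Parser`, `to_execution_tree`) are hypothetical.

```python
import operator
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, object]


@dataclass
class Compare:  # syntax-tree leaf: column op literal
    column: str
    op: str
    value: object


@dataclass
class BoolOp:  # syntax-tree inner node: AND / OR over two subtrees
    op: str
    left: "Node"
    right: "Node"


Node = Union[Compare, BoolOp]
TOKEN_RE = re.compile(r"\s*(?:(\d+)|'([^']*)'|(>=|<=|!=|=|<|>)|(\(|\))|(\w+))")


def tokenize(text: str) -> List[Tuple[str, object]]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        num, string, op, paren, word = m.groups()
        if num is not None:
            tokens.append(("NUM", int(num)))
        elif string is not None:
            tokens.append(("STR", string))
        elif op:
            tokens.append(("OP", op))
        elif paren:
            tokens.append((paren, paren))
        elif word.upper() in ("AND", "OR"):
            tokens.append((word.upper(), word.upper()))
        else:
            tokens.append(("IDENT", word))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: expr := term (OR term)*, term := factor (AND factor)*."""

    def __init__(self, tokens: List[Tuple[str, object]]):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self, kind: str):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def parse(self) -> Node:
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError("trailing tokens")
        return node

    def expr(self) -> Node:
        node = self.term()
        while self.peek() == "OR":
            self.take("OR")
            node = BoolOp("OR", node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() == "AND":
            self.take("AND")
            node = BoolOp("AND", node, self.factor())
        return node

    def factor(self) -> Node:
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        column, op = self.take("IDENT"), self.take("OP")
        if self.peek() not in ("NUM", "STR"):
            raise SyntaxError("expected a literal")
        return Compare(column, op, self.take(self.peek()))


OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def to_execution_tree(node: Node) -> Callable[[Row], bool]:
    """Compile the syntax tree into nested closures that evaluate one row."""
    if isinstance(node, Compare):
        fn, col, val = OPS[node.op], node.column, node.value
        return lambda row: fn(row[col], val)
    left, right = to_execution_tree(node.left), to_execution_tree(node.right)
    if node.op == "AND":
        return lambda row: left(row) and right(row)
    return lambda row: left(row) or right(row)
```

Keeping the syntax tree separate from the execution tree is what leaves room for the transformation step: a real planner could rewrite or reorder nodes (for instance, evaluate an indexed comparison first) before building the executable form.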
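The abstract does not name the election mechanism in item (3). The sketch below only illustrates the failover behavior it describes, using a swapped-in heartbeat-timeout rule: when the current main master stops sending heartbeats, the live master with the highest ID takes over. A production system would more likely rely on a coordination service; `MasterGroup`, `heartbeat`, and `check` are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MasterNode:
    node_id: int
    last_heartbeat: float  # time (seconds) of the last heartbeat received


class MasterGroup:
    """Tracks candidate masters and elects a main master on failure (assumed rule)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.masters: Dict[int, MasterNode] = {}
        self.main: Optional[int] = None

    def heartbeat(self, node_id: int, now: float) -> None:
        node = self.masters.setdefault(node_id, MasterNode(node_id, now))
        node.last_heartbeat = now

    def alive(self, now: float) -> List[int]:
        return [m.node_id for m in self.masters.values()
                if now - m.last_heartbeat <= self.timeout]

    def check(self, now: float) -> Optional[int]:
        live = self.alive(now)
        if self.main not in live:
            # No main master yet, or it missed its heartbeat window: elect a new one.
            self.main = max(live) if live else None
        return self.main
```

Re-electing only when the current main master is no longer live avoids needless failovers: a recovered or newly joined master with a higher ID does not displace a healthy main master.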