Topic 1 Question 50

Professional Data Engineer

You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required. You need to analyze the data by querying against individual fields. Which three databases meet your requirements?

Choose three
- Redis
- HBase
- MySQL
- MongoDB
- Cassandra
- HDFS with Hive
User votes
Comments (17)
- BDE. Hive is not for NoSQL
  
  👍 36
  jvg637 2020/03/15
- Answer is BDE.
  A. Redis: Redis is an in-memory, non-relational key-value store. It is a great choice for implementing a highly available in-memory cache that decreases data access latency, increases throughput, and takes load off your relational or NoSQL database and application. Since the question does not ask for a cache, A is discarded.
  B. HBase: Meets the requirements.
  C. MySQL: They do not need ACID, so a relational database is not needed.
  D. MongoDB: Meets the requirements.
  E. Cassandra: Apache Cassandra is an open source NoSQL distributed database trusted by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
  F. HDFS with Hive: Hive allows users to read, write, and manage petabytes of data using SQL. Hive is built on top of Apache Hadoop, an open-source framework used to efficiently store and process large datasets. As a result, Hive is closely integrated with Hadoop and is designed to work quickly on petabytes of data. HIVE IS NOT A DATABASE.
  
  👍 31
  awssp12345 2021/07/04
- Selected answer: BDE
  A. Redis is a key-value store (and in many cases it is used as an in-memory, non-persistent cache). It is not designed for "100 TB per year" of highly available storage.
  B. HBase is similar to Google Bigtable and fits the requirements perfectly: highly available, scalable, and with very low latency.
  C. MySQL is a relational DB, designed precisely for ACID transactions and not for the stated requirements. Growth may also be an issue.
  D. MongoDB is a document DB used for high-volume data. It keeps the currently used data in RAM, so performance is usually really good. It should also fit the requirements well.
  E. Cassandra is designed precisely for highly available massive datasets, and a fine-tuned cluster may offer low read latency. Fits the requirements.
  F. HDFS with Hive is great for OLAP and data-warehouse scenarios, letting you solve MapReduce problems using an SQL subset, but the latency is usually really high (we may be talking about seconds, not milliseconds, to obtain results), so it does not comply with the requirements.
  
  👍 12
  hendrixlives 2021/12/17
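
The "100 TB per year" figure that the comments lean on can be put into rough numbers. The following is only a back-of-envelope sketch, not part of the question: it assumes decimal terabytes, a 365-day year, and a perfectly even write rate, and the replication factor of 3 is a hypothetical value chosen purely for illustration.

```python
# Back-of-envelope sizing for the telemetry workload in the question.
# Assumptions (not stated in the question): decimal terabytes,
# a 365-day year, and an even write rate with no bursts.

BYTES_PER_TB = 10**12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def sustained_ingest_mb_per_s(tb_per_year: float) -> float:
    """Average write throughput in MB/s implied by a yearly growth figure."""
    return tb_per_year * BYTES_PER_TB / SECONDS_PER_YEAR / 10**6


def raw_storage_tb(tb_per_year: float, years: int, replication_factor: int = 3) -> float:
    """Raw disk footprint in TB after `years`, with every byte stored
    `replication_factor` times (3 is a hypothetical illustrative value)."""
    return tb_per_year * years * replication_factor


if __name__ == "__main__":
    print(f"average ingest: {sustained_ingest_mb_per_s(100):.2f} MB/s")
    print(f"raw storage after 3 years: {raw_storage_tb(100, 3):.0f} TB")
```

Under these assumptions the average write rate is only a few MB/s, while the accumulated volume reaches hundreds of terabytes within a few years. That is consistent with the comments' argument that the constraint is durable, horizontally scalable storage rather than raw write speed, which is why an in-memory cache is ruled out.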