Topic 1 Question 132
You work on a data science team at a bank and are creating an ML model to predict loan default risk. You have collected and cleaned hundreds of millions of records of training data in a BigQuery table, and you now want to develop and compare multiple models on this data using TensorFlow and Vertex AI. You want to minimize any bottlenecks during the data ingestion stage while considering scalability. What should you do?
A. Use the BigQuery client library to load data into a dataframe, and use tf.data.Dataset.from_tensor_slices() to read it.
B. Export data to CSV files in Cloud Storage, and use tf.data.TextLineDataset() to read them.
C. Convert the data into TFRecords, and use tf.data.TFRecordDataset() to read them.
D. Use TensorFlow I/O's BigQuery Reader to directly read the data.
Comments (5)
- Selected Answer: D
  👍 6 · hiromi · 2022/12/22
- Selected Answer: D
  Vote for D. This allows the data to be read directly from BigQuery without first loading it into a dataframe or exporting it to files in Cloud Storage.
  👍 4 · mil_spyro · 2022/12/13
- Selected Answer: D
  D. Use TensorFlow I/O's BigQuery Reader to directly read the data.
  TensorFlow I/O's BigQuery Reader is the most efficient and scalable option here: it reads from BigQuery tables in parallel and streams rows directly into TensorFlow, supporting distributed processing while avoiding the data duplication and extra storage that intermediate file formats require. Eliminating intermediate copies reduces latency and removes the ingestion bottleneck.
  👍 2 · TNT87 · 2023/03/07
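For context, a minimal sketch of option D using the tensorflow_io BigQuery reader. The project, dataset, table, and column names below are placeholders for illustration, not values from the question:

```python
import tensorflow as tf
from tensorflow_io.bigquery import BigQueryClient

# Placeholder identifiers -- substitute your own project/dataset/table.
PROJECT_ID = "my-project"
DATASET_ID = "loans"
TABLE_ID = "training_data"

client = BigQueryClient()

# Open a read session against the BigQuery Storage API.
read_session = client.read_session(
    "projects/" + PROJECT_ID,
    PROJECT_ID,
    TABLE_ID,
    DATASET_ID,
    selected_fields=["loan_amount", "defaulted"],  # placeholder columns
    output_types=[tf.float64, tf.int64],
    requested_streams=2,  # parallel read streams; raise for more throughput
)

# Stream rows directly into a tf.data.Dataset -- no CSV export,
# no in-memory dataframe, no intermediate files in Cloud Storage.
dataset = read_session.parallel_read_rows()
dataset = dataset.map(lambda row: (row["loan_amount"], row["defaulted"]))
dataset = dataset.batch(1024).prefetch(tf.data.AUTOTUNE)
```

Because the reader pulls data over multiple parallel streams straight from BigQuery storage, it scales with table size, which is why it avoids the ingestion bottlenecks of options A–C.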