Topic 1 Question 25
Your team is building several data pipelines that contain a collection of complex tasks and dependencies that you want to execute on a schedule, in a specific order. The tasks and dependencies consist of files in Cloud Storage, Apache Spark jobs, and data in BigQuery. You need to design a system that can schedule and automate these data processing tasks using a fully managed approach. What should you do?
A. Use Cloud Scheduler to schedule the jobs to run.
B. Use Cloud Tasks to schedule and run the jobs asynchronously.
C. Create directed acyclic graphs (DAGs) in Cloud Composer. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.
D. Create directed acyclic graphs (DAGs) in Apache Airflow deployed on Google Kubernetes Engine. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.
Comments (1)
- Selected answer: C
The best fully managed option is C: use Cloud Composer with DAGs and the appropriate operators. Cloud Composer is a fully managed Apache Airflow service, built for orchestrating workflows with dependencies, and it offers built-in operators to connect to Cloud Storage, Spark (via Dataproc), and BigQuery. Option D (Airflow on GKE) is not fully managed: you would have to deploy, upgrade, and operate Airflow yourself, which adds operational overhead. Option A (Cloud Scheduler) only triggers jobs on a cron-style schedule, and Option B (Cloud Tasks) only queues work for asynchronous execution; neither tracks dependencies between tasks or enforces the order in which they run.
👍 1 · n2183712847 · 2025/02/27
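To make the dependency argument in the comment above concrete, here is a minimal sketch of what "run tasks in a specific order" means for a DAG. It uses only Python's standard library (`graphlib`), not the Airflow or Cloud Composer API, and the task names are hypothetical placeholders for a Cloud Storage check, a Spark job, and two BigQuery steps. In Cloud Composer, each node would instead be an Airflow operator or sensor, and the scheduler would resolve the order for you.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
pipeline = {
    "wait_for_gcs_file": set(),
    "run_spark_job": {"wait_for_gcs_file"},
    "load_to_bigquery": {"run_spark_job"},
    "build_bq_report": {"load_to_bigquery"},
}


def execution_order(dag):
    """Return one valid run order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_valid_dag(dag):
    """Return False if the dependencies contain a cycle."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


order = execution_order(pipeline)
print(order)
```

A plain cron trigger or task queue has no equivalent of this dependency map, which is the gap the comment points to in options A and B: something has to know that the BigQuery load must wait for the Spark job, and that a cyclic set of dependencies can never be scheduled.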