Pass the Google Cloud Certified Professional-Data-Engineer Questions and answers with Dumpstech

Viewing page 12 out of 12 pages

Viewing questions 111-120 out of questions

Questions # 111:

You are designing a pipeline that publishes application events to a Pub/Sub topic. You need to aggregate events across hourly intervals before loading the results to BigQuery for analysis. Your solution must be scalable so it can process and load large volumes of events to BigQuery. What should you do?

Options:

Create a streaming Dataflow job to continually read from the Pub/Sub topic and perform the necessary aggregations using tumbling windows

Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations

Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations

Create a Cloud Function to perform the necessary data processing that executes using the Pub/Sub trigger every time a new message is published to the topic.
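
The first option mentions tumbling windows: fixed-size, non-overlapping time intervals into which each event falls exactly once. The sketch below illustrates only that idea in plain Python; it is not a Dataflow pipeline. The event shape and the names window_start and count_per_window are assumptions made for this illustration. In Apache Beam, the SDK that Dataflow runs, the corresponding construct is a fixed window (FixedWindows) of 3600 seconds.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 3600  # one-hour tumbling (fixed, non-overlapping) windows


def window_start(event_time: datetime) -> datetime:
    """Return the start of the hourly window that contains event_time."""
    epoch = int(event_time.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def count_per_window(events):
    """Count events per (window start, event type).

    `events` is an iterable of (event_time, event_type) pairs; this shape is
    hypothetical and chosen only for the illustration.
    """
    counts = Counter()
    for event_time, event_type in events:
        counts[(window_start(event_time), event_type)] += 1
    return dict(counts)


sample_events = [
    (datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), "login"),
    (datetime(2024, 1, 1, 10, 59, 59, tzinfo=timezone.utc), "login"),
    (datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc), "login"),
]
hourly_counts = count_per_window(sample_events)
```

The first two sample events land in the 10:00 window and the third opens the 11:00 window, so no event is counted twice and none is skipped at the boundary.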

Questions # 112:

You orchestrate ETL pipelines by using Cloud Composer. One of the tasks in the Apache Airflow directed acyclic graph (DAG) relies on a third-party service. You want to be notified when the task does not succeed. What should you do?

Options:

Configure a Cloud Monitoring alert on the sla_missed metric associated with the task at risk to trigger a notification.

Assign a function with notification logic to the sla_miss_callback parameter for the operator responsible for the task at risk.

Assign a function with notification logic to the on_retry_callback parameter for the operator responsible for the task at risk.

Assign a function with notification logic to the on_failure_callback parameter for the operator responsible for the task at risk.
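
The options above refer to Airflow callback parameters, which take a function that Airflow calls with a single context dictionary. The sketch below shows what such a function might look like. It assumes the context exposes "task_instance" and "exception" entries, and it runs without Airflow by using a stand-in context; the task and DAG names and the use of print as the notification channel are hypothetical.

```python
from types import SimpleNamespace


def build_failure_message(context: dict) -> str:
    """Format a notification from an Airflow-style callback context."""
    ti = context["task_instance"]
    error = context.get("exception")
    return f"Task {ti.task_id} in DAG {ti.dag_id} failed: {error}"


def notify_on_failure(context: dict) -> None:
    """Callback body; print stands in for a real channel (email, chat, pager)."""
    print(build_failure_message(context))


# Stand-in for the context Airflow would pass (assumed shape, hypothetical names).
fake_context = {
    "task_instance": SimpleNamespace(task_id="call_third_party", dag_id="etl_pipeline"),
    "exception": RuntimeError("service unavailable"),
}
message = build_failure_message(fake_context)
```

In a DAG file, a function like this would be attached to the operator for the task at risk, for example by passing on_failure_callback=notify_on_failure when the operator is created.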

Questions # 113:

You recently deployed several data processing jobs into your Cloud Composer 2 environment. You notice that some tasks are failing in Apache Airflow. On the monitoring dashboard, you see an increase in the total workers’ memory usage, and there were worker pod evictions. You need to resolve these errors. What should you do?

Choose 2 answers

Options:

Increase the directed acyclic graph (DAG) file parsing interval.

Increase the memory available to the Airflow workers.

Increase the maximum number of workers and reduce worker concurrency.

Increase the memory available to the Airflow triggerer.

Increase the Cloud Composer 2 environment size from medium to large.

Questions # 114:

You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jobs involve executing shell scripts, running Hadoop jobs, and running queries in BigQuery. The jobs are expected to run for many minutes up to several hours. If the steps fail, they must be retried a fixed number of times. Which service should you use to manage the execution of these jobs?

Options:

Cloud Scheduler

Cloud Dataflow

Cloud Functions

Cloud Composer

Questions # 115:

You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near-real time from multiple vendors. The data may contain invalid values. What should you do?

Options:

Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the "ingestion" dataset as the training data.

Use BigQuery streaming inserts to land the data from multiple vendors where your BigQuery dataset ML model is deployed.

Create a Pub/Sub topic and send all vendor data to it. Connect a Cloud Function to the topic to process the data and store it in BigQuery.

Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.

Questions # 116:

You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?

Options:

Create a table in BigQuery, and append the new samples for CPU and memory to the table

Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second

Create a narrow table in Cloud Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second

Create a wide table in Cloud Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data.
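
The two Bigtable options differ in how the row key and columns are laid out. The sketch below contrasts them in plain Python, without the Bigtable client. The "#" separator, the timestamp format, and the function names are assumptions made for this illustration, not a prescribed schema.

```python
from datetime import datetime, timezone
from typing import Tuple


def narrow_row_key(machine_id: str, sample_time: datetime) -> bytes:
    """Narrow layout: one row per machine per second."""
    utc = sample_time.astimezone(timezone.utc)
    return f"{machine_id}#{utc.strftime('%Y%m%d%H%M%S')}".encode("utf-8")


def wide_row_key(machine_id: str, sample_time: datetime) -> Tuple[bytes, str]:
    """Wide layout: one row per machine per minute.

    Returns the row key and the column qualifier (the second within the
    minute) under which the sample value would be stored.
    """
    utc = sample_time.astimezone(timezone.utc)
    key = f"{machine_id}#{utc.strftime('%Y%m%d%H%M')}".encode("utf-8")
    return key, f"{utc.second:02d}"


sample = datetime(2024, 1, 1, 10, 5, 7, tzinfo=timezone.utc)
narrow_key = narrow_row_key("vm-42", sample)
wide_key, wide_column = wide_row_key("vm-42", sample)
```

With the narrow layout every sample creates a new row; with the wide layout the 60 samples of a minute share one row and differ only in the column qualifier.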

Questions # 117:

You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery.

How should you securely run this workload?

Options:

Restrict the Google Cloud Storage bucket so only you can see the files

Grant the Project Owner role to a service account, and run the job with it

Use a service account with the ability to read the batch files and to write to BigQuery

Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery

Questions # 118:

A TensorFlow machine learning model on Compute Engine virtual machines (n2-standard-32) takes two days to complete training. The model has custom TensorFlow operations that must run partially on a CPU. You want to reduce the training time in a cost-effective manner. What should you do?

Options:

Change the VM type to n2-highmem-32

Change the VM type to e2-standard-32

Train the model using a VM with a GPU hardware accelerator

Train the model using a VM with a TPU hardware accelerator

Questions # 119:

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

Options:

Subsample your test dataset.

Subsample your training dataset.

Increase the number of input features to your model.

Increase the number of layers in your neural network.
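
Two of the options above involve subsampling a dataset. As a minimal sketch of what subsampling means, the function below draws a reproducible random fraction of a list of examples. The function name, the seed default, and the in-memory list are assumptions for illustration; a real training pipeline would usually sample inside its data loader.

```python
import random


def subsample(examples, fraction, seed=0):
    """Return a random subset containing roughly `fraction` of `examples`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not examples:
        return []
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    k = min(len(examples), max(1, int(len(examples) * fraction)))
    return rng.sample(examples, k)


full_dataset = list(range(1000))
small_dataset = subsample(full_dataset, 0.1)
```

Fewer examples per epoch means fewer gradient steps per epoch, which is why subsampling shortens training at some cost in how much data the model sees.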

Questions # 120:

As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? Choose 2 answers.

Options:

Use Cloud Deployment Manager to automate access provision.

Introduce resource hierarchy to leverage access control policy inheritance.

Create distinct groups for various teams, and specify groups in Cloud IAM policies.

Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.

For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.
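
One of the options above mentions access control policy inheritance through a resource hierarchy. The sketch below models that idea in plain Python: the access in effect on a resource is the union of the bindings set on it and on all of its ancestors. The organization, folder, project, and group names are hypothetical, and this is a toy model rather than the Cloud IAM API.

```python
# Child -> parent links in a hypothetical resource hierarchy.
PARENTS = {
    "project-dev": "folder-analytics",
    "project-prod": "folder-analytics",
    "folder-analytics": "org",
}

# (member, role) bindings set directly on each node (hypothetical values).
BINDINGS = {
    "org": {("group:central-it@example.com", "roles/viewer")},
    "folder-analytics": {("group:analytics-team@example.com", "roles/editor")},
    "project-prod": {("group:release-managers@example.com", "roles/owner")},
}


def effective_bindings(resource: str) -> set:
    """Union of bindings on the resource and on every ancestor above it."""
    result = set()
    node = resource
    while node is not None:
        result |= BINDINGS.get(node, set())
        node = PARENTS.get(node)
    return result


dev_access = effective_bindings("project-dev")
prod_access = effective_bindings("project-prod")
```

In this model the central IT group is granted access once, at the top of the hierarchy, and every project below inherits it without a per-project policy.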
