Pass the Google Cloud Certified Professional-Data-Engineer Questions and answers with Dumpstech

Practice at least 50% of the questions to maximize your chances of passing.

Viewing page 5 out of 12 pages

Viewing questions 41-50 out of questions

Questions # 41:

You have created an external table for Apache Hive partitioned data that resides in a Cloud Storage bucket, which contains a large number of files. You notice that queries against this table are slow. You want to improve the performance of these queries. What should you do?

Options:

Migrate the Hive partitioned data objects to a multi-region Cloud Storage bucket.

Create an individual external table for each Hive partition by using a common table name prefix. Use wildcard table queries to reference the partitioned data.

Change the storage class of the Hive partitioned data objects from Coldline to Standard.

Upgrade the external table to a BigLake table. Enable metadata caching for the table.

Questions # 42:

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You’ve loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.

What should you do?

Options:

Select random samples from the tables using the RAND() function and compare the samples.

Select random samples from the tables using the HASH() function and compare the samples.

Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.

Create stratified random samples using the OVER() function and compare equivalent samples from each table.
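
One of the options above describes sorting the rows and hashing the non-timestamp columns of each table. The idea can be sketched in plain Python as follows. This is only an illustration of order-independent comparison without a join key, not Dataproc or BigQuery connector code; the function name `table_fingerprint` and the column names (`id_like`, `amount`, `loaded_at`) are hypothetical.

```python
import hashlib

def table_fingerprint(rows, exclude=()):
    """Hash a table's rows in a canonical order so row order does not matter."""
    canonical = []
    for row in rows:
        # Drop excluded columns (for example load timestamps) and fix column order.
        kept = sorted((k, repr(v)) for k, v in row.items() if k not in exclude)
        canonical.append(repr(kept))
    canonical.sort()  # sort rows so the result is independent of input order
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()

# Hypothetical sample data: same content, different row order and load times.
original = [
    {"id_like": "a", "amount": 10, "loaded_at": "2024-01-01T00:00:00"},
    {"id_like": "b", "amount": 20, "loaded_at": "2024-01-01T00:00:05"},
]
migrated = [
    {"id_like": "b", "amount": 20, "loaded_at": "2024-02-01T09:00:00"},
    {"id_like": "a", "amount": 10, "loaded_at": "2024-02-01T09:00:01"},
]

same = (table_fingerprint(original, exclude={"loaded_at"})
        == table_fingerprint(migrated, exclude={"loaded_at"}))
print(same)
```

Because every row contributes to the hash, duplicate rows and single-value differences change the fingerprint, which a comparison of random samples cannot guarantee.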

Questions # 43:

You work for an advertising company, and you’ve developed a Spark ML model to predict click-through rates at advertisement blocks. You’ve been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data center will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?

Options:

Use Cloud ML Engine for training existing Spark ML models

Rewrite your models on TensorFlow, and start using Cloud ML Engine

Use Cloud Dataproc for training existing Spark ML models, but start reading data directly from BigQuery

Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery

Questions # 44:

You recently deployed several data processing jobs into your Cloud Composer 2 environment. You notice that some tasks are failing in Apache Airflow. On the monitoring dashboard, you see an increase in the total workers’ memory usage, and there were worker pod evictions. You need to resolve these errors. What should you do?

Choose 2 answers

Options:

Increase the directed acyclic graph (DAG) file parsing interval.

Increase the memory available to the Airflow workers.

Increase the maximum number of workers and reduce worker concurrency.

Increase the memory available to the Airflow triggerer.

Increase the Cloud Composer 2 environment size from medium to large.

Questions # 45:

You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?

Options:

Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.

Consume the stream of data in Cloud Dataflow using Kafka IO. Set a fixed time window of 1 hour. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.

Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to Cloud Bigtable. Use Cloud Scheduler to run a script every hour that counts the number of rows created in Cloud Bigtable in the last hour. If that number falls below 4000, send an alert.

Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to BigQuery. Use Cloud Scheduler to run a script every five minutes that counts the number of rows created in BigQuery in the last hour. If that number falls below 4000, send an alert.
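
The difference between a sliding window and a fixed window in the options above can be sketched in plain Python. This is a minimal illustration, not Dataflow or Apache Beam code. It assumes the stream has already been reduced to message totals per 5-minute bucket; the function name `sliding_alerts` and the sample counts are hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 3600   # window length: 1 hour
SLIDE_SECONDS = 300     # a new window result every 5 minutes
THRESHOLD = 4000        # messages per second

def sliding_alerts(bucket_counts):
    """Yield (bucket_index, avg_rate, alert) once a full hour of data exists.

    bucket_counts: message totals per 5-minute bucket, oldest first.
    """
    per_window = WINDOW_SECONDS // SLIDE_SECONDS  # 12 buckets per window
    window = deque(maxlen=per_window)             # oldest bucket drops out automatically
    for i, count in enumerate(bucket_counts):
        window.append(count)
        if len(window) == per_window:
            avg_rate = sum(window) / WINDOW_SECONDS
            yield i, avg_rate, avg_rate < THRESHOLD

# Hypothetical traffic: one hour at 5000 msg/s, then 20 minutes at 1000 msg/s.
normal = 5000 * SLIDE_SECONDS
low = 1000 * SLIDE_SECONDS
counts = [normal] * 12 + [low] * 4

for index, rate, alert in sliding_alerts(counts):
    print(index, round(rate, 2), alert)
```

With these sample counts the moving average is re-evaluated at every 5-minute bucket after the first hour. A fixed 1-hour window over the same data would produce a result only once per hour.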

Questions # 46:

Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)

Options:

Create a new view over events using standard SQL

Create a new partitioned table using a standard SQL query

Create a new view over events_partitioned using standard SQL

Create a service account for the ODBC connection to use for authentication

Create a Google Cloud Identity and Access Management (Cloud IAM) role for the ODBC connection and shared “events”

Questions # 47:

You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?

Options:

Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://

Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://

Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://

Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://
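
Several options above mention changing references in scripts from hdfs:// to gs://. A minimal sketch of that rewrite is shown below. It assumes the HDFS directory layout is kept unchanged under a single bucket; the helper name `to_gcs_uri`, the bucket name `example-bucket`, and the sample paths are hypothetical.

```python
import re

def to_gcs_uri(path, bucket):
    """Rewrite an hdfs:// URI to a gs:// URI under the given bucket.

    Paths that do not start with hdfs:// are returned unchanged.
    """
    # Matches hdfs://<optional host[:port]>/<path>
    match = re.match(r"^hdfs://[^/]*/(.*)$", path)
    if not match:
        return path
    return f"gs://{bucket}/{match.group(1)}"

print(to_gcs_uri("hdfs://namenode:8020/data/events/2024", "example-bucket"))
print(to_gcs_uri("hdfs:///data/events/2024", "example-bucket"))
print(to_gcs_uri("/local/tmp/file.csv", "example-bucket"))
```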

Questions # 48:

You are administering a BigQuery on-demand environment. Your business intelligence tool is submitting hundreds of queries each day that aggregate a large (50 TB) sales history fact table at the day and month levels. These queries have a slow response time and are exceeding cost expectations. You need to decrease response time, lower query costs, and minimize maintenance. What should you do?

Options:

Build materialized views on top of the sales table to aggregate data at the day and month level.

Build authorized views on top of the sales table to aggregate data at the day and month level.

Enable BI Engine and add your sales table as a preferred table.

Create a scheduled query to build sales day and sales month aggregate tables on an hourly basis.
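
The day-level and month-level aggregation that the options above refer to amounts to grouping the fact rows once and reusing the totals, rather than rescanning the full table for every query. A minimal sketch in plain Python, not BigQuery SQL, follows. The function name `aggregate` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

def aggregate(sales):
    """Return (totals_by_day, totals_by_month) for (sale_date, amount) rows."""
    by_day = defaultdict(int)
    by_month = defaultdict(int)
    for sale_date, amount in sales:
        by_day[sale_date] += amount
        by_month[(sale_date.year, sale_date.month)] += amount
    return dict(by_day), dict(by_month)

# Hypothetical fact rows: (sale date, amount)
sales = [
    (date(2024, 1, 1), 100),
    (date(2024, 1, 1), 50),
    (date(2024, 1, 2), 25),
    (date(2024, 2, 1), 10),
]

by_day, by_month = aggregate(sales)
print(by_day[date(2024, 1, 1)])
print(by_month[(2024, 1)])
```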

Questions # 49:

You need to look at BigQuery data from a specific table multiple times a day. The underlying table you are querying is several petabytes in size, but you want to filter your data and provide simple aggregations to downstream users. You want to run queries faster and get up-to-date insights quicker. What should you do?

Options:

Run a scheduled query to pull the necessary data at specific intervals daily.

Create a materialized view based off of the query being run.

Use a cached query to accelerate time to results.

Limit the query columns being pulled in the final result.

Questions # 50:

You have several different unstructured data sources, within your on-premises data center as well as in the cloud. The data is in various formats, such as Apache Parquet and CSV. You want to centralize this data in Cloud Storage. You need to set up an object sink for your data that allows you to use your own encryption keys. You want to use a GUI-based solution. What should you do?

Options:

Use Cloud Data Fusion to move files into Cloud Storage.

Use Storage Transfer Service to move files into Cloud Storage.

Use Dataflow to move files into Cloud Storage.

Use BigQuery Data Transfer Service to move files into BigQuery.
