Pass the Google Cloud Certified Professional-Data-Engineer Questions and answers with Dumpstech

Practice at least 50% of the questions to maximize your chances of passing.

Viewing page 6 out of 12 pages

Viewing questions 51-60 out of questions

Questions # 51:

You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?

Options:

cron

Cloud Composer

Cloud Scheduler

Workflow Templates on Cloud Dataproc

Questions # 52:

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

Options:

Subsample your test dataset.

Subsample your training dataset.

Increase the number of input features to your model.

Increase the number of layers in your neural network.

Questions # 53:

You are using BigQuery with a regional dataset that includes a table with the daily sales volumes. This table is updated multiple times per day. You need to protect your sales table in case of regional failures with a recovery point objective (RPO) of less than 24 hours, while keeping costs to a minimum. What should you do?

Options:

Schedule a daily BigQuery snapshot of the table.

Schedule a daily export of the table to a Cloud Storage dual or multi-region bucket.

Schedule a daily copy of the dataset to a backup region.

Modify ETL job to load the data into both the current and another backup region.

Questions # 54:

You are designing a fault-tolerant architecture to store data in a regional BigQuery dataset. You need to ensure that your application is able to recover from a corruption event in your tables that occurred within the past seven days. You want to adopt managed services with the lowest RPO and the most cost-effective solution. What should you do?

Options:

Export the data from BigQuery into a new table that excludes the corrupted data.

Migrate your data to multi-region BigQuery buckets.

Access historical data by using time travel in BigQuery.

Create a BigQuery table snapshot on a daily basis.

Questions # 55:

You have uploaded 5 years of log data to Cloud Storage. A user reported that some data points in the log data are outside of their expected ranges, which indicates errors. You need to address this issue and be able to run the process again in the future while keeping the original data for compliance reasons. What should you do?

Options:

Import the data from Cloud Storage into BigQuery. Create a new BigQuery table, and skip the rows with errors.

Create a Compute Engine instance and create a new copy of the data in Cloud Storage. Skip the rows with errors.

Create a Cloud Dataflow workflow that reads the data from Cloud Storage, checks for values outside the expected range, sets the value to an appropriate default, and writes the updated records to a new dataset in Cloud Storage.
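
The range check described in the last option can be sketched in plain Python. This is only an illustration of the transformation step, not a Dataflow pipeline. The bounds, the default value, and the field name `value` are hypothetical, since the question does not specify them.

```python
from typing import Iterable, Iterator

# Hypothetical bounds and default; the question does not state them.
EXPECTED_MIN = 0.0
EXPECTED_MAX = 100.0
DEFAULT_VALUE = 0.0


def clean_value(value: float) -> float:
    """Return the value if it is within the expected range, else the default."""
    if EXPECTED_MIN <= value <= EXPECTED_MAX:
        return value
    return DEFAULT_VALUE


def clean_records(records: Iterable[dict], field: str = "value") -> Iterator[dict]:
    """Yield corrected copies of the records, leaving the originals unchanged."""
    for record in records:
        cleaned = dict(record)  # shallow copy so the source record is preserved
        cleaned[field] = clean_value(record[field])
        yield cleaned
```

Because the function yields copies, the input records stay as they were, which mirrors the requirement to keep the original data while writing corrected records elsewhere.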

Questions # 56:

You need to choose a database for a new project that has the following requirements:

Fully managed

Able to automatically scale up

Transactionally consistent

Able to scale up to 6 TB

Able to be queried using SQL

Which database do you choose?

Options:

Cloud SQL

Cloud Bigtable

Cloud Spanner

Cloud Datastore

Questions # 57:

You've migrated a Hadoop job from an on-premises cluster to Dataproc and Cloud Storage. Your Spark job is a complex analytical workload that consists of many shuffling operations, and initial data are Parquet files (on average 200-400 MB size each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. Your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload. What should you do?

Options:

Switch from HDDs to SSDs, and override the preemptible VMs configuration to increase the boot disk size.

Increase the size of your Parquet files to ensure they are 1 GB minimum.

Switch to TFRecords format (approx. 200 MB per file) instead of Parquet files.

Switch from HDDs to SSDs, copy initial data from Cloud Storage to Hadoop Distributed File System (HDFS), run the Spark job, and copy results back to Cloud Storage.

Questions # 58:

You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may erroneously acknowledge messages, leading to message loss. Your subscriber is not set up to retain acknowledged messages. What should you do to ensure that you can recover from errors after deployment?

Options:

Use Cloud Build for your deployment. If an error occurs after deployment, use a Seek operation to locate a timestamp logged by Cloud Build at the start of the deployment.

Create a Pub/Sub snapshot before deploying new subscriber code. Use a Seek operation to re-deliver messages that became available after the snapshot was created.

Set up the Pub/Sub emulator on your local machine. Validate the behavior of your new subscriber logic before deploying it to production.

Enable dead-lettering on the Pub/Sub topic to capture messages that aren't successfully acknowledged. If an error occurs after deployment, re-deliver any messages captured by the dead-letter queue.

Questions # 59:

You store and analyze your relational data in BigQuery on Google Cloud with all data that resides in US regions. You also have a variety of object stores across Microsoft Azure and Amazon Web Services (AWS), also in US regions. You want to query all your data in BigQuery daily with as little movement of data as possible. What should you do?

Options:

Load files from AWS and Azure to Cloud Storage with Cloud Shell gsutil rsync arguments.

Create a Dataflow pipeline to ingest files from Azure and AWS to BigQuery.

Use the BigQuery Omni functionality and BigLake tables to query files in Azure and AWS.

Use BigQuery Data Transfer Service to load files from Azure and AWS into BigQuery.

Questions # 60:

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and have asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

Options:

Increase the CPU size on your server.

Increase the size of the Google Persistent Disk on your server.

Increase your network bandwidth from your datacenter to GCP.

Increase your network bandwidth from Compute Engine to Cloud Storage.
