
Pass the Google Cloud Certified Professional-Data-Engineer Questions and answers with Dumpstech

Viewing page 9 out of 12 pages

Viewing questions 81-90

Question # 81:

When you run a pipeline that has a BigQuery source on your local machine, you keep getting permission-denied errors. What could be the reason?

Options:

Your gcloud does not have access to the BigQuery resources

BigQuery cannot be accessed from local machines

You are missing gcloud on your machine

Pipelines cannot be run locally

Question # 82:

What are the minimum permissions needed for a service account used with Google Dataproc?

Options:

Execute to Google Cloud Storage; write to Google Cloud Logging

Write to Google Cloud Storage; read from Google Cloud Logging

Execute to Google Cloud Storage; execute to Google Cloud Logging

Read and write to Google Cloud Storage; write to Google Cloud Logging

Question # 83:

Cloud Bigtable is Google's ______ Big Data database service.

Options:

Relational

MySQL

NoSQL

SQL Server

Question # 84:

Which of the following IAM roles does your Compute Engine service account require to be able to run pipeline jobs?

Options:

dataflow.worker

dataflow.compute

dataflow.developer

dataflow.viewer

Question # 85:

Which of the following statements about Legacy SQL and Standard SQL is not true?

Options:

Standard SQL is the preferred query language for BigQuery.

If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.

One difference between the two query languages is how you specify fully qualified table names (i.e., table names that include their associated project name).

You need to set a query language for each dataset, and the default is Standard SQL.
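The difference in fully qualified table names can be made concrete. Legacy SQL writes a fully qualified table as [project:dataset.table], with a colon after the project. Standard SQL writes it as `project.dataset.table`, with dots throughout and backticks around the whole name. The sketch below converts the first form into the second. It is a hypothetical helper: legacy_to_standard is not part of any BigQuery client library. It also assumes a plain project ID, so it does not handle domain-scoped project IDs such as example.com:project.

```python
import re

# Legacy SQL: [project:dataset.table]
# Standard SQL: `project.dataset.table`
# Assumes a plain project ID (no domain-scoped "example.com:project" form).
_LEGACY_REF = re.compile(r"^\[([^:\]]+):([^.\]]+)\.([^\]]+)\]$")


def legacy_to_standard(ref: str) -> str:
    """Hypothetical helper: rewrite a Legacy SQL table reference in Standard SQL form."""
    match = _LEGACY_REF.match(ref)
    if match is None:
        raise ValueError(f"not a fully qualified Legacy SQL reference: {ref!r}")
    project, dataset, table = match.groups()
    # Standard SQL separates all three parts with dots and quotes the name in backticks.
    return f"`{project}.{dataset}.{table}`"


print(legacy_to_standard("[bigquery-public-data:samples.shakespeare]"))
```

This mismatch is one reason a query written in Legacy SQL can fail when it is run as Standard SQL.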

Question # 86:

How can you get a neural network to learn about relationships between categories in a categorical feature?

Options:

Create a multi-hot column

Create a one-hot column

Create a hash bucket

Create an embedding column
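Cosine similarity shows why a one-hot column cannot express relationships between categories. Every pair of distinct one-hot vectors is orthogonal, so "sedan" is exactly as unrelated to "coupe" as it is to "truck". An embedding column instead maps each category to a dense vector whose values are learned during training, so related categories can end up close together. In the sketch below, the category names and embedding values are invented for illustration; they were not learned from data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


categories = ["sedan", "coupe", "truck"]  # hypothetical categorical feature values

# One-hot: one axis per category, so distinct categories are always orthogonal.
one_hot = {
    c: [1.0 if i == j else 0.0 for j in range(len(categories))]
    for i, c in enumerate(categories)
}

# Embedding: a dense vector per category. These values are made up; in a real
# model they would be trained so that related categories end up near each other.
embedding = {
    "sedan": [0.9, 0.1],
    "coupe": [0.8, 0.3],
    "truck": [-0.2, 0.9],
}

for a, b in [("sedan", "coupe"), ("sedan", "truck")]:
    print(a, b,
          "one-hot:", round(cosine(one_hot[a], one_hot[b]), 2),
          "embedding:", round(cosine(embedding[a], embedding[b]), 2))
```

The one-hot similarity is 0 for every pair. The embedding similarities differ, and that difference is what lets the network learn relationships between categories.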

Question # 87:

Scaling a Cloud Dataproc cluster typically involves ____.

Options:

increasing or decreasing the number of worker nodes

increasing or decreasing the number of master nodes

moving memory to run more applications on a single node

deleting applications from unused nodes periodically

Question # 88:

Your company’s on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?

Options:

Put the data into Google Cloud Storage.

Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.

Tune the Cloud Dataproc cluster so that there is just enough disk for all data.

Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.

Question # 89:

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that each row will be sent only once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included when you query the data interactively. Which query type should you use?

Options:

Include ORDER BY DESC on the timestamp column and LIMIT to 1.

Use GROUP BY on the unique ID column and timestamp column and SUM on the values.

Use the LAG window function with PARTITION BY unique ID along with WHERE LAG IS NOT NULL.

Use the ROW_NUMBER window function with PARTITION BY unique ID along with WHERE row equals 1.
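You can try the ROW_NUMBER deduplication pattern locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery, since SQLite 3.25 and later support the same ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) window syntax. The table name events, its columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT, event_ts TEXT, value INTEGER)")

# Hypothetical streamed rows: row "a" was delivered twice, e.g. by a retried insert.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01T00:00:00", 10),
        ("a", "2024-01-01T00:00:00", 10),
        ("b", "2024-01-01T00:00:05", 7),
    ],
)

# Number the rows within each unique ID (latest timestamp first) and keep row 1.
DEDUP_QUERY = """
SELECT id, event_ts, value
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY event_ts DESC) AS row_num
  FROM events
)
WHERE row_num = 1
ORDER BY id
"""

rows = conn.execute(DEDUP_QUERY).fetchall()
print(rows)
```

Only one row per unique ID survives. By contrast, grouping on id and event_ts and summing value would report 20 for id "a", which double-counts the duplicate.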

Question # 90:

Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub and then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing from the dashboard. You check the logs and confirm that all messages are being published to Cloud Pub/Sub successfully. What should you do next?

Options:

Check the dashboard application to see if it is not displaying correctly.

Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.

Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.

Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.
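Running a fixed dataset through a pipeline means feeding it a known input and comparing what comes out against what went in. The sketch below shows only that idea and does not use the Apache Beam SDK. parse_message and its drop rule are hypothetical stand-ins for one pipeline step, and the sample messages are invented.

```python
import json


def parse_message(raw: str):
    """Hypothetical pipeline step: return a parsed record, or None if it is dropped."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON is silently dropped
    # Hypothetical rule: records without an "amount" field are silently dropped too.
    if "amount" not in record:
        return None
    return record


# A fixed, known input: one message lacks a field and one is truncated JSON.
fixed_input = [
    '{"id": 1, "amount": 100}',
    '{"id": 2, "amount": 250}',
    '{"id": 3}',
    '{"id": 4, "amount": ',
]
expected_ids = {1, 2, 3, 4}

outputs = [parse_message(m) for m in fixed_input]
received_ids = {r["id"] for r in outputs if r is not None}
missing = sorted(expected_ids - received_ids)
print("missing ids:", missing)
```

Because the input is fixed, any ID missing from the output points directly at the step that dropped it.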
