Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and answers with Dumpstech

Viewing page 4 of 6: questions 31-40

Question # 31:

A data engineer needs to combine sales data from an on-premises PostgreSQL database with customer data in Azure Synapse for a comprehensive report. The goal is to avoid data duplication and ensure up-to-date information.

How should the data engineer achieve this using Databricks?

Options:

Develop custom ETL pipelines to ingest data into Databricks

Use Lakehouse Federation to query both data sources directly

Manually synchronize data from both sources into a single database

Export data from both sources to CSV files and upload them to Databricks
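Lakehouse Federation works by registering each external database in Unity Catalog as a connection plus a foreign catalog, after which both sources can be joined in a single query without copying data. The sketch below composes that DDL as Python strings that a notebook could pass to `spark.sql`. Every connection, catalog, host, database, and table name is a hypothetical placeholder, and it assumes the Databricks compute can reach the on-premises PostgreSQL server over the network.

```python
# Sketch only: all names below (connections, catalogs, hosts, databases, tables) are hypothetical.
def create_connection_sql(name: str, conn_type: str, host: str, port: int,
                          user: str, password: str) -> str:
    # Registers an external database server with Unity Catalog.
    # In practice, credentials should come from a secret scope, not literals.
    return (
        f"CREATE CONNECTION {name} TYPE {conn_type} "
        f"OPTIONS (host '{host}', port '{port}', user '{user}', password '{password}')"
    )


def create_foreign_catalog_sql(catalog: str, connection: str, database: str) -> str:
    # Mirrors one database behind the connection as a catalog that is queried in place.
    return (
        f"CREATE FOREIGN CATALOG {catalog} USING CONNECTION {connection} "
        f"OPTIONS (database '{database}')"
    )


statements = [
    create_connection_sql("pg_onprem", "postgresql", "pg.example.internal", 5432, "reader", "<password>"),
    create_foreign_catalog_sql("pg_sales", "pg_onprem", "sales"),
    # "sqldw" is the connection type used for Azure Synapse.
    create_connection_sql("synapse_conn", "sqldw", "myworkspace.sql.azuresynapse.net", 1433, "reader", "<password>"),
    create_foreign_catalog_sql("synapse_crm", "synapse_conn", "crm"),
]

# One federated query joins both sources; no data is copied into Databricks first.
report_query = """
SELECT c.customer_name, SUM(s.amount) AS total_sales
FROM pg_sales.public.sales AS s
JOIN synapse_crm.dbo.customers AS c ON s.customer_id = c.customer_id
GROUP BY c.customer_name
"""

# In a Databricks notebook (where `spark` is predefined):
#   for stmt in statements:
#       spark.sql(stmt)
#   spark.sql(report_query).show()
for stmt in statements:
    print(stmt)
```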

Question # 32:

A data engineer who is new to Python needs to create a Python function that adds two integers together and returns the sum.

Which of the following code blocks can the data engineer use to complete this task?

[Image: Python code blocks for Options A-E]

Options:

Option A

Option B

Option C

Option D

Option E
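The option code blocks for this question are images that are not reproduced in this text. For reference, a minimal Python function that meets the stated requirement; the function name `add_integers` is hypothetical.

```python
def add_integers(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b


print(add_integers(2, 3))  # 5
```

The essentials are the `def` keyword, two parameters, and `return`. A function that only calls `print` would display the sum but return `None`.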

Question # 33:

A data engineer needs to provide access to a group named manufacturing-team. The team needs privileges to create tables in the quality schema.

Which set of SQL commands will grant the group manufacturing-team permission to create tables in a schema named production, whose parent catalog is named manufacturing, using the least privileges?

[Image: SQL commands for Options A-D]

Options:

Option A

Option B

Option C

Option D
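In Unity Catalog, creating a table requires `USE CATALOG` on the parent catalog plus `USE SCHEMA` and `CREATE TABLE` on the schema; broader grants such as `ALL PRIVILEGES` go beyond least privilege. A sketch that generates those three statements for the catalog, schema, and group named in the question; the helper name is hypothetical. Because the group name contains a hyphen, it is wrapped in backticks.

```python
def least_privilege_grants(catalog: str, schema: str, principal: str) -> list[str]:
    # Backticks are needed for principal names containing characters such as "-".
    p = f"`{principal}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {p};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {p};",
        f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO {p};",
    ]


grants = least_privilege_grants("manufacturing", "production", "manufacturing-team")
for stmt in grants:
    print(stmt)
```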

Question # 34:

Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?

Options:

SELECT * FROM my_table WHERE age > 25;

UPDATE my_table WHERE age > 25;

DELETE FROM my_table WHERE age > 25;

UPDATE my_table WHERE age <= 25;

DELETE FROM my_table WHERE age <= 25;
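Delta Lake supports the standard SQL `DELETE FROM ... WHERE` statement, which removes the matching rows and commits the change as a new table version. To show only the row-filtering semantics, the sketch below swaps in Python's built-in `sqlite3` as a stand-in for a Delta table; the sample rows are made up. On Databricks the same statement would run against `my_table` through SQL or `spark.sql`.

```python
import sqlite3

# In-memory SQLite table standing in for the Delta table my_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("ana", 22), ("ben", 25), ("cy", 31), ("dee", 40)],
)

# Remove rows where age is greater than 25; rows with age <= 25 remain.
conn.execute("DELETE FROM my_table WHERE age > 25")
conn.commit()

remaining = conn.execute("SELECT name, age FROM my_table ORDER BY name").fetchall()
print(remaining)  # [('ana', 22), ('ben', 25)]
```

The other options fall short for different reasons: `SELECT` only reads, `UPDATE` needs a `SET` clause and changes values rather than removing rows, and `age <= 25` would delete the rows that should be kept.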

Question # 35:

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

Options:

pyspark.sql.types.DateType

datetime

pyspark.sql.types.TimestampType

Cron syntax

There is no way to represent and submit this information programmatically
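Databricks Jobs express schedules as Quartz cron expressions, which can be written once and reused across jobs. A minimal sketch, assuming the Jobs API 2.1 `schedule` object with its `quartz_cron_expression`, `timezone_id`, and `pause_status` fields; the cron string and time zone are illustrative.

```python
import json


def job_schedule(cron: str, timezone: str = "UTC") -> dict:
    # Shape of the `schedule` object in Databricks Jobs API 2.1 job settings.
    return {
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        }
    }


# Quartz fields: seconds minutes hours day-of-month month day-of-week
# "0 30 6 ? * MON-FRI" means 06:30 every weekday.
payload = job_schedule("0 30 6 ? * MON-FRI", "America/New_York")
print(json.dumps(payload, indent=2))
```

To copy the schedule to other Jobs, the same `schedule` object could be sent in the `new_settings` field of a `POST /api/2.1/jobs/update` request for each target job ID.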

Question # 36:

A data engineer needs to optimize the data layout and query performance for an e-commerce transactions Delta table. The table is partitioned by "purchase_date", a date column, which helps with time-based queries but does not speed up per-customer searches on "customer_id", a high-cardinality column.

The table is usually queried with filters on "customer_id" within specific date ranges. Because the matching rows are spread across many files in each partition, these queries scan entire partitions, which increases runtime and cost.

How should the data engineer optimize the data layout for efficient reads?

Options:

Alter the table to implement liquid clustering on "customer_id" while keeping the existing partitioning.

Alter the table to partition by "customer_id".

Enable Delta caching on the cluster so that frequent reads are cached for performance.

Alter the table to implement liquid clustering by "customer_id" and "purchase_date".
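Delta Lake keeps per-file minimum and maximum column statistics, and a query can skip any file whose range cannot contain the filter value. Liquid clustering helps by co-locating rows with similar clustering-key values so those ranges become narrow. The pure-Python model below is a simplified illustration of that data-skipping effect, not Delta's implementation; the data sizes and the customer_id being searched are made up.

```python
import random


def split_into_files(ids: list[int], rows_per_file: int) -> list[list[int]]:
    # Model each data file as a list of customer_id values.
    return [ids[i:i + rows_per_file] for i in range(0, len(ids), rows_per_file)]


def files_scanned(files: list[list[int]], customer_id: int) -> int:
    # A file can be skipped only when customer_id lies outside its [min, max] range.
    return sum(1 for rows in files if min(rows) <= customer_id <= max(rows))


random.seed(0)
ids = list(range(10_000))   # customer_id values within one date partition
random.shuffle(ids)         # arrival order is effectively random

unclustered = split_into_files(ids, 1_000)
clustered = split_into_files(sorted(ids), 1_000)  # similar customer_id values share a file

target = 4_242
print(files_scanned(unclustered, target))  # 10: every file's range spans the target
print(files_scanned(clustered, target))    # 1
```

Databricks documents liquid clustering as not compatible with partitioning, so clustering keys replace partition columns rather than being layered on top of them. On Databricks the keys are declared with `CLUSTER BY (...)` and applied to the data when `OPTIMIZE` runs.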

Question # 37:

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE and three datasets defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Development mode using Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.
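Development mode and continuous mode are two independent pipeline settings. A trimmed sketch of the corresponding fields in a pipeline's JSON settings, assuming the `development` and `continuous` keys; the pipeline name is hypothetical, and a real pipeline also needs fields such as `libraries`.

```python
import json

pipeline_settings = {
    "name": "sales_dlt_pipeline",  # hypothetical name
    # Development mode: the cluster is reused between updates and automatic retries are disabled.
    "development": True,
    # Continuous mode: the pipeline keeps processing new data until it is stopped.
    "continuous": True,
}
print(json.dumps(pipeline_settings, indent=2))
```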

Question # 38:

A data engineer maintains ETL pipeline code in a GitHub repository linked to their Databricks account. The data engineer wants to deploy the ETL pipeline to production as a Databricks workflow.

Which approach should the data engineer use?

Options:

Databricks Asset Bundles (DAB) + GitHub Integration

Maintain workflow_config.json and deploy it using Databricks CLI

Manually create and manage the workflow in UI

Maintain workflow_config.json and deploy it using Terraform

Question # 39:

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

Options:

They can set up an Alert with a custom template.

They can set up an Alert with a new email alert destination.

They can set up an Alert with one-time notifications.

They can set up an Alert with a new webhook alert destination.

They can set up an Alert without notifications.

Question # 40:

A data engineer is writing a script to ingest new data from cloud storage. If the source schema changes, ingestion should fail, and it should keep failing until the changes in the source can be identified and verified as intended.

Which option will meet the requirements?

Options:

addNewColumns

failOnNewColumns

rescue

none
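These four values are the documented Auto Loader schema evolution modes, set through the `cloudFiles.schemaEvolutionMode` option. A sketch that assembles the reader options with `failOnNewColumns`, which stops the stream on new columns and keeps it stopped until the schema is updated or the offending file is removed; the helper name and paths are hypothetical.

```python
# Documented behavior of each cloudFiles.schemaEvolutionMode value.
SCHEMA_EVOLUTION_MODES = {
    "addNewColumns": "stream fails, new columns are added to the schema, and a restart picks them up",
    "failOnNewColumns": "stream fails and does not restart until the schema is updated or the offending file is removed",
    "rescue": "schema never evolves; new columns go to the rescued data column and the stream keeps running",
    "none": "schema never evolves; new columns are ignored unless rescuedDataColumn is set, and the stream keeps running",
}


def autoloader_options(file_format: str, schema_location: str,
                       mode: str = "failOnNewColumns") -> dict[str, str]:
    if mode not in SCHEMA_EVOLUTION_MODES:
        raise ValueError(f"unknown schema evolution mode: {mode}")
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaEvolutionMode": mode,
    }


# Hypothetical paths. In a Databricks notebook the options would be used as:
#   spark.readStream.format("cloudFiles").options(**opts).load("/Volumes/main/raw/orders/")
opts = autoloader_options("json", "/Volumes/main/raw/_schemas/orders")
print(opts["cloudFiles.schemaEvolutionMode"])  # failOnNewColumns
```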
