
Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and answers with Dumpstech



Viewing page 5 out of 6 pages

Viewing questions 41-50 out of questions

Questions # 41:

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:

DROP TABLE IF EXISTS my_table;

After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.

Which of the following describes why all of these files were deleted?

Options:

The table was managed

The table's data was smaller than 10 GB

The table's data was larger than 10 GB

The table was external

The table did not have a location

Questions # 42:

A data engineer is developing an ETL process based on Spark SQL. The execution fails. The data engineer checks the Spark UI and sees the following errors:

[Image: Spark UI error output for Question # 42]

Which two corrective actions should the data engineer perform to resolve this issue?

Choose 2 answers.

Options:

Narrow the filters in order to collect less data in the query

Upsize the worker nodes and activate autoshuffle partitions

Upsize the driver node and deactivate autoshuffle partitions

Cache the dataset in order to boost the query performance

Fix the shuffle partitions to 50 to ensure the allocation

Questions # 43:

Which of the following describes the relationship between Bronze tables and raw data?

Options:

Bronze tables contain less data than raw data files.

Bronze tables contain more truthful data than raw data.

Bronze tables contain aggregates while raw data is unaggregated.

Bronze tables contain a less refined view of data than raw data.

Bronze tables contain raw data with a schema applied.

Questions # 44:

An organization is looking for an optimized storage layer that supports ACID transactions and schema enforcement. Which technology should the organization use?

Options:

Cloud File Storage

Unity Catalog

Data lake

Delta Lake

Questions # 45:

A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.

Which of the following approaches can the data engineer take to identify the table that is dropping the records?

Options:

They can set up separate expectations for each table when developing their DLT pipeline.

They cannot determine which table is dropping the records.

They can set up DLT to notify them via email when records are dropped.

They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.

They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.

Questions # 46:

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Options:

Worker node

JDBC data source

Databricks web application

Databricks Filesystem

Driver node

Answer: Databricks web application

Explanation:

The Databricks web application is the user interface that allows you to create and manage workspaces, clusters, notebooks, jobs, and other resources. It is hosted completely in the control plane of the classic Databricks architecture, which includes the backend services that Databricks manages in your Databricks account. The other options are part of the compute plane, which is where your data is processed by compute resources such as clusters. The compute plane is in your own cloud account and network. References: Databricks architecture overview, Security and Trust Center

QUESTION NO: 4

Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?

A. The ability to manipulate the same data using a variety of languages

B. The ability to collaborate in real time on a single notebook

C. The ability to set up alerts for query failures

D. The ability to support batch and streaming workloads

E. The ability to distribute complex data operations

Answer: D

Delta Lake is the optimized storage layer that provides the foundation for storing data and tables in the Databricks lakehouse. Delta Lake is fully compatible with Apache Spark APIs, and was developed for tight integration with Structured Streaming, allowing you to easily use a single copy of data for both batch and streaming operations and providing incremental processing at scale [1]. Delta Lake supports upserts using the merge operation, which enables you to efficiently update existing data or insert new data into your Delta tables [2]. Delta Lake also provides time travel capabilities, which allow you to query previous versions of your data or roll back to a specific point in time [3].

References: [1] What is Delta Lake? | Databricks on AWS; [2] Upsert into a table using merge | Databricks on AWS; [3] Query an older snapshot of a table (time travel) | Databricks on AWS
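The merge (upsert) and time travel behaviour described in this explanation can be illustrated with a toy in-memory model. This is only a sketch in plain Python, not the Delta Lake API: the `VersionedTable` class and its `merge` and `as_of` methods are hypothetical names invented for illustration, and the model keeps a full copy of the rows per version, which is an assumption made for simplicity rather than a description of how Delta Lake stores data.

```python
from copy import deepcopy


class VersionedTable:
    """Toy stand-in for a versioned table; NOT the Delta Lake API."""

    def __init__(self, key):
        self.key = key        # name of the column used to match rows
        self.versions = [{}]  # version 0 is an empty table

    def merge(self, updates):
        """Upsert: update rows whose key matches, insert the rest."""
        current = deepcopy(self.versions[-1])
        for row in updates:
            k = row[self.key]
            if k in current:
                current[k].update(row)  # matched: update the existing row
            else:
                current[k] = dict(row)  # not matched: insert a new row
        self.versions.append(current)   # each merge commits a new version
        return len(self.versions) - 1   # version number just written

    def as_of(self, version):
        """Time travel: return the rows as they were at a given version."""
        rows = self.versions[version].values()
        return sorted(rows, key=lambda r: r[self.key])


table = VersionedTable(key="id")
v1 = table.merge([{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}])
v2 = table.merge([{"id": 2, "city": "Quito"}, {"id": 3, "city": "Pune"}])
```

Reading `table.as_of(v1)` after the second merge still returns the rows from the first commit, which is the idea behind querying an older snapshot: a merge adds a new version instead of overwriting the previous one.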


Questions # 47:

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

They can reduce the cluster size of the SQL endpoint.

They can turn on the Auto Stop feature for the SQL endpoint.

They can set up the dashboard's SQL endpoint to be serverless.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

Questions # 48:

Which type of workloads are compatible with Auto Loader?

Options:

Streaming workloads

Machine learning workloads

Serverless workloads

Batch workloads

Questions # 49:

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

Options:

Cloud-specific integrations

Simplified governance

Ability to scale storage

Ability to scale workloads

Avoiding vendor lock-in

Questions # 50:

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Options:

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

They can turn on the Auto Stop feature for the SQL endpoint.

They can increase the cluster size of the SQL endpoint.

They can turn on the Serverless feature for the SQL endpoint.

They can increase the maximum bound of the SQL endpoint's scaling range.
