Pass the Databricks Certification Databricks-Certified-Data-Engineer-Associate Questions and answers with Dumpstech

Practice at least 50% of the questions to maximize your chances of passing.

Viewing page 1 out of 5 pages

Viewing questions 1-10 out of questions

Questions # 1:

Which of the following describes the type of workloads that are always compatible with Auto Loader?

Options:

Dashboard workloads

Streaming workloads

Machine learning workloads

Serverless workloads

Batch workloads

Questions # 2:

Which of the following benefits is provided by the array functions from Spark SQL?

Options:

An ability to work with data in a variety of types at once

An ability to work with data within certain partitions and windows

An ability to work with time-related data in specified intervals

An ability to work with complex, nested data ingested from JSON files

An ability to work with an array of tables for procedural automation

Questions # 3:

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

Questions # 4:

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Options:

There was a type mismatch between the specific schema and the inferred schema

JSON data is a text-based format

Auto Loader only works with string data

All of the fields had at least one null value

Auto Loader cannot infer the schema of ingested data

Questions # 5:

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.

They can set up the dashboard’s SQL endpoint to be serverless.

They can turn on the Auto Stop feature for the SQL endpoint.

They can reduce the cluster size of the SQL endpoint.

They can ensure the dashboard’s SQL endpoint is not one of the included query’s SQL endpoint.

Questions # 6:

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

Options:

When the location of the data needs to be changed

When the target table is an external table

When the source table can be deleted

When the target table cannot contain duplicate records

When the source is not a Delta table

Answer: When the target table cannot contain duplicate records

Explanation

The MERGE INTO command performs upserts, a combination of inserts and updates, from a source table into a target Delta table [1]. It fits cases where the target table must not contain duplicate records, for example when each row is identified by a key that has to stay unique. MERGE INTO matches source and target rows on a merge condition and takes a different action depending on whether a row is matched: it can update existing target rows with the new source values, insert source rows that do not yet exist in the target, or delete target rows that do not exist in the source [1].
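
The matching behaviour described above can be sketched in plain Python. This is only a simulation under stated assumptions, not Delta Lake's implementation: tables are lists of dicts, the merge condition is equality on a single key column, keys are unique within the source, and the names `merge_into`, `delete_unmatched`, and `MERGE_SQL` are hypothetical. The SQL string shows the general shape of the statement being simulated and is never executed.

```python
from typing import Dict, List

Row = Dict[str, object]

# Reference only (not executed): the general shape of the statement simulated below.
# The table names `target` and `source` and the column `id` are illustrative.
MERGE_SQL = """
MERGE INTO target AS t
USING source AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def merge_into(target: List[Row], source: List[Row], key: str,
               delete_unmatched: bool = False) -> List[Row]:
    """Return a new table after a simulated upsert keyed on `key`.

    Assumes each key appears at most once in `source`.
    """
    source_by_key = {row[key]: row for row in source}
    matched_keys = set()
    merged: List[Row] = []

    for row in target:
        k = row[key]
        if k in source_by_key:
            # Matched: overwrite target values with the source values.
            merged.append({**row, **source_by_key[k]})
            matched_keys.add(k)
        elif not delete_unmatched:
            # Target row with no source match is kept unchanged.
            merged.append(dict(row))
        # Otherwise the unmatched target row is dropped (simulated delete).

    for k, row in source_by_key.items():
        if k not in matched_keys:
            # Not matched: insert the new source row.
            merged.append(dict(row))
    return merged


target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
source = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
upserted = merge_into(target, source, key="id")
```

Applying the same source a second time leaves the result unchanged in this sketch, which is the property that keeps the target free of duplicate keys.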

The INSERT INTO command appends new rows to an existing table [2]. It does not update or delete existing rows in the target, so loading a record that is already present produces a duplicate. None of the other options calls for MERGE INTO: INSERT INTO can be used when data is being moved from one table to another or written into a partitioned table [2], when the target is an external table whose data lives in storage such as Amazon S3 or Azure Blob Storage [3], when the source is a temporary table or view that can be dropped afterwards [4], and when the source is not a Delta table, for example Parquet, CSV, JSON, or Avro files [5].
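
To illustrate why append-only loading cannot by itself keep a table free of duplicates, here is a minimal plain-Python sketch. It is an illustration under assumptions, not Databricks code: tables are lists of dicts, and the helper names `insert_into` and `duplicate_keys` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]


def insert_into(target: List[Row], rows: List[Row]) -> List[Row]:
    """Return a new table with `rows` appended; existing rows are never touched."""
    return [dict(r) for r in target] + [dict(r) for r in rows]


def duplicate_keys(table: List[Row], key: str) -> list:
    """Return the sorted key values that occur more than once in `table`."""
    counts = Counter(row[key] for row in table)
    return sorted(k for k, n in counts.items() if n > 1)


table = [{"id": 1, "name": "a"}]
batch = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

loaded_once = insert_into(table, batch)         # id 1 now appears twice
loaded_twice = insert_into(loaded_once, batch)  # re-running the load duplicates id 2 as well
```

In this sketch every rerun of the same batch adds more copies, whereas an upsert keyed on `id` would leave one row per key.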

Questions # 7:

A data engineer is working on a Databricks project that uses cloud storage. The data engineer wants to load several JSON files from containers in a storage account as soon as each file arrives in the storage account.

Which syntax should the data engineer follow to first load the files into a dataframe and check that it is working as expected using Python?

Options:

df = spark.readStream.format("json").load("input/path")

df = spark.readStream.format("cloud").option("json").load("/input/path")

df = spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load("/input/path")

df = spark.read.json("input/path")

Questions # 8:

A data engineer needs to parse only png files in a directory that contains files with different suffixes. Which code should the data engineer use to achieve this task?

[Image: Question # 8, code for Options A to D]

Options:

Option A

Option B

Option C

Option D

Questions # 9:

A data engineer has written a function in a Databricks Notebook to calculate the population of bacteria in a given medium.

[Image: Question # 9, the function's code]

Analysts use this function in the notebook and sometimes provide input arguments of the wrong data type, which can cause errors during execution.

Which Databricks feature will help the data engineer quickly identify if an incorrect data type has been provided as input?

Options:

The Data Engineer should add print statements to find out what the variable is.

The Databricks debugger enables breakpoints that will raise an error if the wrong data type is submitted

The Spark User interface has a debug tab that contains the variables that are used in this session.

The Databricks debugger enables the use of a variable explorer to see at a glance the value of the variables.

Questions # 10:

A data engineering project involves processing large batches of data on a daily schedule using ETL. The jobs are resource-intensive and vary in size, requiring a scalable, cost-efficient compute solution that can automatically scale based on the workload.

Which compute approach will satisfy the needs described?

Options:

Databricks SQL Serverless

Dedicated Cluster

All-Purpose Cluster

Job Cluster
