
AWS Certified Machine Learning Engineer - Associate (MLA-C01) Questions and Answers



Viewing page 3 out of 8 pages
Viewing questions 21-30
Question # 21:

A healthcare company wants to detect irregularities in patient vital signs that could indicate early signs of a medical condition. The company has an unlabeled dataset that includes patient health records, medication history, and lifestyle changes.

Which algorithm and hyperparameter should the company use to meet this requirement?

Options:

A.

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to greater than 100 to regulate tree complexity.

B.

Use the Amazon SageMaker AI k-means clustering algorithm. Set k to determine the number of clusters.

C.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to the number of training iterations.

D.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set num_trees to greater than 100.
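For context on the scenario, Random Cut Forest (RCF) is SageMaker's unsupervised anomaly-detection algorithm, where num_trees controls the size of the ensemble. A minimal local sketch of the underlying idea (scoring points by how far they deviate from the rest of the data) using only the standard library; the vital-signs values are made up for illustration:

```python
import statistics

def anomaly_scores(values):
    """Score each reading by how many standard deviations it sits from
    the mean -- a crude stand-in for the ensemble scores an unsupervised
    detector such as RCF would produce on unlabeled data."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0
    return [abs(v - mean) / stdev for v in values]

# A heart-rate series with one irregular reading (140 bpm).
heart_rates = [72, 75, 71, 74, 73, 140, 72]
scores = anomaly_scores(heart_rates)
print(scores.index(max(scores)))  # index 5: the 140 bpm reading stands out
```

The key property the question tests is that this approach needs no labels, unlike the supervised XGBoost option.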

Question # 22:

A company uses a training job on Amazon SageMaker AI to train a neural network. The job first trains a model and then evaluates the model's performance against a test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model's final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

· Change the epoch count.

· Choose an Amazon EC2 Spot Fleet.

· Change the batch size.

· Use early stopping on the training job.

· Use the SageMaker AI distributed data parallelism (SMDDP) library.

· Stop the training job.

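Several of the solutions listed above trade training time against final model quality. As one illustration, early stopping halts training once a validation metric stops improving, which shortens the training phase without hurting the best model found. A minimal sketch of the stopping logic (the loss values are made up):

```python
def train_with_early_stopping(val_losses, patience=2):
    """Stop when validation loss has not improved for `patience`
    consecutive epochs; return the number of epochs actually run."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch  # stopped early, later epochs are skipped
    return len(val_losses)

# Loss plateaus after epoch 3, so training stops at epoch 5 of 6.
print(train_with_early_stopping([0.9, 0.6, 0.5, 0.5, 0.52, 0.51]))
```

By contrast, simply stopping the job or using Spot capacity changes cost or completion, not the time needed to reach a given accuracy.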

Question # 23:

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Options:

A.

Concept drift occurred in the underlying customer data that was used for predictions.

B.

The model was not sufficiently complex to capture all the patterns in the original baseline data.

C.

The original baseline data had a data quality issue of missing values.

D.

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.
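For background on the scenario, SageMaker Model Monitor compares a live model-quality metric against a threshold recorded during the baseline analysis. The check itself is simple to sketch; the threshold and monthly scores below are illustrative, not from the question:

```python
def f1_alerts(f1_scores, baseline_threshold):
    """Flag each monitoring window whose F1 score falls below the
    threshold recorded during the baseline analysis."""
    return [i for i, f1 in enumerate(f1_scores) if f1 < baseline_threshold]

# Months of stable scores, then a significant drop -- the pattern a
# shift in the underlying customer data (concept drift) would produce.
monthly_f1 = [0.91, 0.90, 0.92, 0.89, 0.74]
print(f1_alerts(monthly_f1, baseline_threshold=0.85))  # [4]
```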

Question # 24:

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

A.

AWS::SageMaker::Model

B.

AWS::SageMaker::Endpoint

C.

AWS::SageMaker::NotebookInstance

D.

AWS::SageMaker::Pipeline
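For reference, a minimal CloudFormation sketch showing how the SageMaker resource types above relate to one another: a model is declared, referenced by an endpoint config, which in turn is referenced by the endpoint. The role ARN, image URI, and S3 path are placeholders, not real values:

```yaml
Resources:
  MyModel:
    Type: AWS::SageMaker::Model
    Properties:
      ExecutionRoleArn: arn:aws:iam::123456789012:role/SageMakerRole   # placeholder
      PrimaryContainer:
        Image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest  # placeholder
        ModelDataUrl: s3://my-bucket/model.tar.gz                      # placeholder
  MyEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      ProductionVariants:
        - ModelName: !GetAtt MyModel.ModelName
          VariantName: AllTraffic
          InstanceType: ml.m5.large
          InitialInstanceCount: 1
  MyEndpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointConfigName: !GetAtt MyEndpointConfig.EndpointConfigName
```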

Question # 25:

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Which AWS service or feature can aggregate the data from the various data sources?

Options:

A.

Amazon EMR Spark jobs

B.

Amazon Kinesis Data Streams

C.

Amazon DynamoDB

D.

AWS Lake Formation

Question # 26:

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

B.

Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

C.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

D.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.
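Option A refers to a built-in RANDOM_CUT_FOREST function for streaming anomaly detection. As a hedged sketch only, this is roughly how such a query has looked in the legacy Kinesis Data Analytics SQL dialect; the stream and column names below are assumptions, not part of the question:

```sql
-- Assign an anomaly score to each record on the input stream.
-- Stream and column names are illustrative placeholders.
CREATE OR REPLACE STREAM "ANOMALY_STREAM" (
    "price" DOUBLE,
    "ANOMALY_SCORE" DOUBLE
);

CREATE OR REPLACE PUMP "ANOMALY_PUMP" AS
INSERT INTO "ANOMALY_STREAM"
SELECT STREAM "price", "ANOMALY_SCORE"
FROM TABLE(RANDOM_CUT_FOREST(
    CURSOR(SELECT STREAM "price" FROM "SOURCE_SQL_STREAM_001")
));
```

The operational-overhead contrast the question draws is that a built-in function like this requires no separate model endpoint, Lambda function, or self-managed Kafka cluster.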

Question # 27:

An ML engineer is analyzing potential biases in a customer dataset before training an ML model. The dataset contains customer age (numeric), product reviews (text), and purchase outcomes (categorical).

Which statistical metrics should the ML engineer use to identify potential biases in the dataset before model training?

Options:

A.

Calculate the statistical mean and standard deviation of customer age distribution. Count word frequencies in product reviews.

B.

Calculate the class imbalance metric of purchase outcomes. Use product reviews to check sentiment distribution to capture bias.

C.

Calculate the class imbalance metric of purchase outcomes and the difference in proportions of labels (DPL) across customer age groups.

D.

Calculate the correlation coefficient between customer age and purchase outcomes. Calculate unique word counts in product reviews.
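The pre-training bias metrics named in the options have simple formulas. A sketch of two of them, following the definitions SageMaker Clarify documents for class imbalance (CI) and difference in proportions of labels (DPL); the facet split and labels below are made up for illustration:

```python
def class_imbalance(n_facet_a, n_facet_d):
    """CI = (n_a - n_d) / (n_a + n_d): how unevenly the dataset is
    split between facet a and facet d (e.g. two age groups)."""
    return (n_facet_a - n_facet_d) / (n_facet_a + n_facet_d)

def difference_in_proportions(labels_a, labels_d):
    """DPL = q_a - q_d: difference between the positive-outcome rates
    of the two facets."""
    q_a = sum(labels_a) / len(labels_a)
    q_d = sum(labels_d) / len(labels_d)
    return q_a - q_d

# Purchase outcomes (1 = purchased) for two illustrative age groups.
under_40 = [1, 1, 1, 0, 1, 0, 1, 1]   # 75% positive
over_40  = [1, 0, 0, 0, 1, 0, 0, 0]   # 25% positive
print(class_imbalance(len(under_40), len(over_40)))   # 0.0 (equal sizes)
print(difference_in_proportions(under_40, over_40))   # 0.5 (outcome gap)
```

A balanced facet split (CI near 0) can still hide a large outcome gap (DPL far from 0), which is why the two metrics are used together.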

Question # 28:

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns.

The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.

Which solution will meet these requirements in the MOST operationally efficient way?

Options:

A.

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

B.

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

C.

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

D.

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Question # 29:

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

Options:

A.

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

B.

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

C.

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

D.

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Question # 30:

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

Options:

A.

Use a multi-model serverless endpoint. Enable caching.

B.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.
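For context on the parameters the options mention, a serverless inference endpoint is configured through an endpoint config whose variant carries a ServerlessConfig with MemorySizeInMB and MaxConcurrency. A minimal sketch that only builds the request payload; the names and memory size are placeholders, and the actual boto3 call is left commented out:

```python
def serverless_endpoint_config(model_name, max_concurrency=1,
                               memory_mb=2048):
    """Build a create_endpoint_config request for serverless inference.
    MaxConcurrency caps concurrent invocations; capacity scales to zero
    between the nightly runs, so idle time costs nothing."""
    return {
        "EndpointConfigName": f"{model_name}-serverless",  # placeholder name
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }

config = serverless_endpoint_config("stock-model")
# boto3.client("sagemaker").create_endpoint_config(**config)  # real call
print(config["ProductionVariants"][0]["ServerlessConfig"]["MaxConcurrency"])
```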
