Pass the NVIDIA-Certified Professional NCP-AII Questions and answers with Dumpstech

Viewing page 3 out of 4 pages
Viewing questions 21-30
Question # 21:

A system engineer needs to set the vGPU scheduling behavior for all GPUs to share the scheduling equally with the default time slice length. What command should be used?

Options:

A.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"

B.

esxcli graphics module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"

C.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=FRL=0x01"

D.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x00"
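For reference when reading the options above, here is a minimal sketch of how the `RmPVMRL` registry value is composed. It assumes the encoding described in the NVIDIA vGPU software documentation: `0x00` for best effort, `0x01` for equal share with the default time slice, `0x11` for fixed share with the default time slice, and `0x00TT0001` / `0x00TT0011` when a custom time slice `TT` (hex, 1 to 30 ms) is given. The function name `rmpvmrl_value` and the policy labels are hypothetical, chosen for illustration only.

```python
# Sketch: compose the RmPVMRL value for a vGPU scheduling policy.
# Assumption: encoding as described in the NVIDIA vGPU documentation;
# the helper name and policy labels are illustrative, not an NVIDIA API.

_POLICY_BITS = {
    "best_effort": 0x00,
    "equal_share": 0x01,
    "fixed_share": 0x11,
}


def rmpvmrl_value(policy, time_slice_ms=None):
    """Return the RmPVMRL value as a hex string, e.g. '0x01'."""
    if policy not in _POLICY_BITS:
        raise ValueError(f"unknown policy: {policy!r}")
    bits = _POLICY_BITS[policy]

    # Default time slice: only the low policy byte is set.
    if time_slice_ms is None:
        return f"0x{bits:02x}"

    # A custom time slice only applies to the time-sliced schedulers.
    if policy == "best_effort":
        raise ValueError("best effort takes no time slice")
    if not 1 <= time_slice_ms <= 30:
        raise ValueError("time slice must be between 1 and 30 ms")

    # Layout 0x00TT00PP: TT = time slice in ms, PP = policy byte.
    return f"0x00{time_slice_ms:02X}{bits:04X}"


def esxcli_command(value):
    """Format the ESXi module-parameter command string for a given value."""
    return (
        'esxcli system module parameters set -m nvidia '
        f'-p "NVreg_RegistryDwords=RmPVMRL={value}"'
    )
```

Under these assumptions, `rmpvmrl_value("fixed_share", 24)` yields `0x00180011`, meaning fixed share with a 24 ms time slice.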

Question # 22:

When configuring an out-of-core HPL burn-in for a 40B matrix on 8x H100 nodes, which environment variable prevents GPU out-of-memory errors while reserving space for drivers?

Options:

A.

export HPL_OOC_SAFE_SIZE=4.0

B.

export HPL_OOC_MODE=0

C.

export HPL_OOC_NUM_STREAMS=8

D.

export HPL_OOC_MAX_GPU_MEM=90

Question # 23:

A leaf switch shows "FW Version Mismatch" alerts for transceivers after cluster expansion. Which tool validates transceiver firmware against expected versions?

Options:

A.

flint

B.

iblinkinfo

C.

mlxconfig

D.

ethtool

Question # 24:

A system administrator needs to configure a BlueField DPU and enable RShim on the baseboard management controller (BMC). Which command should be executed?

Options:

A.

ipmitool raw 0x32 0x6a 1

B.

systemctl restart rshim

C.

systemctl enable bmc-rshim.service

D.

scp <path_to_bfb> root@<bmc_ip>:/dev/rshim0/boot

Question # 25:

A cluster administrator is preparing to update the firmware on a DGX H100 system, including the GPU tray (baseboard). What is the correct sequence of steps to perform a safe and successful firmware upgrade?

Options:

A.

Update the BMC and skip the GPU tray and motherboard tray updates if the system appears healthy.

B.

Perform a cold reset, stop all GPU activity, update and reboot the BMC, update motherboard and tray components, and verify completion.

C.

Update the GPU tray first, then the motherboard tray, and reboot the BMC after all updates are complete.

D.

Stop all GPU activity, update and reboot the BMC, update motherboard and tray components, perform a cold reset, and verify completion.

Question # 26:

After upgrading to HPL-AI 2.0 on a DGX A100 cluster, a 2x performance gain is observed. Which optimization is primarily responsible for this improvement?

Options:

A.

Reduction of problem size (N) to accelerate computation.

B.

MPI-aware GPU communication that reduces CPU bottlenecks and GPU idle time.

C.

Doubling of GPU clock speeds through firmware updates and relevant configuration.

D.

Automatic NVLink bandwidth doubling via driver updates.

Question # 27:

Which of the following steps are essential components of a recommended DGX cluster installation procedure?

Pick the 2 correct responses below.

Options:

A.

Group nodes by function during initial setup and assign them to relevant categories in the cluster management tool.

B.

Configure networking by validating all interfaces on each node, ensuring proper InfiniBand and Ethernet connectivity prior to installing cluster software.

C.

Install Slurm on the head node and then configure the compute nodes’ default OS images.

D.

Complete application containerization, run distributed jobs, and skip validation of node health or storage availability.

Question # 28:

After a recent OS upgrade, you need to reinstall NVIDIA GPU and DOCA drivers to support both AI training and accelerated networking. What best practice ensures successful installation and full hardware capability?

Options:

A.

Download and install only the specific versions of GPU and DOCA drivers listed as compatible with the current OS and hardware.

B.

Apply legacy drivers for hardware released within the last two years to maintain maximum compatibility across versions.

C.

Install the latest available drivers directly from the NVIDIA website.

D.

Use the default drivers provided by the Linux distribution, unless an installation fails during system boot.

Question # 29:

A single-node stress test fails during the PCIe bandwidth validation phase. Which troubleshooting step is recommended first?

Options:

A.

Reduce PCIe Gen4 speed to Gen3 speed in BIOS settings.

B.

Reseat the GPU, then rerun the test.

C.

Disable NVLink in BIOS to isolate PCIe performance.

D.

Reinstall NVIDIA drivers using apt-get install nvidia-driver-550.

Question # 30:

After ClusterKit reports "GPU-Host latency exceeds threshold," which NVIDIA diagnostic tool should be used to isolate hardware faults?

Options:

A.

Re-run ClusterKit with --stress=gpu -Y 60 to extend test duration

B.

nvidia-smi topo -m to inspect GPU topology connections

C.

DCGM diagnostics: dcgmi diag -r 2

D.

ib_write_bw to measure InfiniBand bandwidth between nodes
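For reference, the `dcgmi diag` command named in the options takes a run level through `-r`. The sketch below only builds the argument list and does not run anything. It assumes the commonly documented levels 1 to 4 (quick, medium, long, extended); the helper name `dcgmi_diag_argv` is hypothetical, and whether level 4 is available depends on the installed DCGM version.

```python
# Sketch: build the argv for a DCGM diagnostic run at a given level.
# Assumption: run levels 1-4 as commonly documented for `dcgmi diag -r`;
# the helper name is illustrative and not part of DCGM itself.

_LEVELS = {
    1: "quick",
    2: "medium",
    3: "long",
    4: "extended",
}


def dcgmi_diag_argv(level):
    """Return the command as a list suitable for subprocess.run()."""
    if level not in _LEVELS:
        raise ValueError(f"run level must be one of {sorted(_LEVELS)}")
    return ["dcgmi", "diag", "-r", str(level)]


def describe(level):
    """Return a short human-readable label for the run level."""
    argv = dcgmi_diag_argv(level)
    return f"{' '.join(argv)}  ({_LEVELS[level]} run)"
```

On a host with DCGM installed, the list can be passed to `subprocess.run(dcgmi_diag_argv(2), check=False)`; the output format varies by DCGM version, so it is not parsed here.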
