How to Pass the Databricks Certified Associate Developer for Apache Spark Exam (Python)

Jul 17, 2023


By Manas Reddy

I recently passed the Databricks Certified Associate Developer for Apache Spark after only two weeks of preparation! In this article, I’ll walk through what is and isn’t covered in the exam, the specific exam topics, study materials, and the types of questions you’d see on the exam.

What is covered by the exam

  • Understanding the basics of the Spark architecture, including Adaptive Query Execution
  • Applying the Spark DataFrame API to complete individual data manipulation tasks, including:
      • selecting, renaming, and manipulating columns
      • filtering, dropping, sorting, and aggregating rows
      • joining, reading, writing, and partitioning DataFrames
      • working with UDFs and Spark SQL functions

What’s not covered by the exam

  • Anything related to Structured Streaming, Spark ML, or performance tuning
  • Ability to tune Apache Spark Jobs
  • Ability to create data visualizations
  • Ability to build, evaluate, deploy, and manage machine learning models
  • Understanding of data engineering and machine learning pipelines
  • Ability to set up real-time data streams

How the exam is structured

The exam consists of 60 multiple-choice questions. It is open notes, with a catch: you will have access to a PDF of the Databricks documentation, but you won’t be able to use Ctrl+F to search for specific keywords. The hyperlinks within the document, which lead to its various sections, remain functional.

The questions in the exam will be split into three distinct categories:

  • 17% of the questions will be about Spark Architecture: Conceptual Understanding. These questions cover topics such as data partitioning, processing, lazy evaluation, and transformations versus actions.
  • 11% will be about Spark Architecture: Applied Understanding, where you’ll need to apply your knowledge of storage levels, coalescing, and partitioning, among others.
  • The bulk of the exam, 72%, will focus on the Spark DataFrame API. Here, you will be tested on your ability to carry out common data manipulation tasks with the API.

Most of the exam is dedicated to the DataFrame API; therefore, the majority of your preparation time should be spent on this area.

What study materials did I use?

Additional Materials

If you have access to the Databricks Customer Academy, these two courses were super helpful!

Example Exam Questions

Below are some of the types of questions you’d see on the exam for Spark Architecture and the DataFrame API.

Spark Architecture-type questions

1.

2.

3.

4.

5.

Registering for the exam

Follow the instructions on the Databricks Certification website for the Databricks Certified Associate Developer for Apache Spark. Select the correct language (Python or Scala) when you register for the exam.

Happy learning!

Manas
