How to pass the Databricks Spark Developer Associate in Python Certification Exam
By Manas Reddy
I recently passed the Databricks Certified Associate Developer for Apache Spark exam after only two weeks of preparation! In this article, I’ll walk through what is and isn’t covered, how the exam is structured, the study materials I used, and the types of questions you can expect.
What is covered by the exam
- Understanding the basics of the Spark architecture, including Adaptive Query Execution
- Applying the Spark DataFrame API to complete individual data manipulation tasks, including:
  - selecting, renaming, and manipulating columns
  - filtering, dropping, sorting, and aggregating rows
  - joining, reading, writing, and partitioning DataFrames
  - working with UDFs and Spark SQL functions
What’s not covered by the exam
- Anything related to Structured Streaming, Spark ML, or performance tuning
- Ability to tune Apache Spark jobs
- Ability to create data visualizations
- Ability to build, evaluate, deploy, and manage machine learning models
- Understanding of data engineering and machine learning pipelines
- Ability to set up real-time data streams
How the exam is structured
The exam consists of 60 multiple-choice questions, and it is open notes with a catch: you will have access to a PDF of the Databricks documentation, but you won’t be able to use Ctrl+F to search for specific keywords. The hyperlinks within the document, which jump to its various sections, do remain functional, though.
The questions in the exam will be split into three distinct categories:
- 17% of the questions will be about Spark Architecture — Conceptual Understanding. These questions cover topics such as data partitioning, processing, lazy evaluation, transformations versus actions, and so on.
- 11% will be about Spark Architecture — Applied Understanding, where you’ll need to apply your knowledge of storage levels, coalescing, and partitioning, among others.
- The bulk of the exam, 72%, will focus on the Spark DataFrame API. Here, you will be tested on your ability to carry out common data manipulation tasks: selecting, filtering, aggregating, joining, reading, and writing data.
Most of the exam is dedicated to the DataFrame API; therefore, the majority of your preparation time should be spent on this area.
What study materials did I use?
- Read over Chapters 1–7 from Learning Spark
- Skimmed sections I, II, and IV from Spark: The Definitive Guide
- Used this blog to get an in-depth understanding of how Spark works: blog
- I recommend watching this brilliant video that explains how Spark works under the hood: Farooq_Spark
- Used this practice exam to gauge my readiness and identify weaknesses: practice-test
- Took this practice exam just before the real one as a final check: databricks_test_dumps
Additional Materials
If you have access to the Databricks Customer Academy, these two courses were super helpful!
Example Exam Questions
The exam mixes conceptual Spark Architecture questions with hands-on DataFrame API questions; both types test whether you can reason about what a given snippet of code actually does.
Registering for the exam
Follow the instructions on the Databricks Certification website for the Databricks Certified Associate Developer for Apache Spark. Select the correct language (Python or Scala) when you register for the exam.
Happy learning!
Manas