Finally, after a few months of studying, I decided to take the plunge and register for the exam. Here is how I prepared and what I found most useful.

First I took the course offered on Coursera, all 5 modules. As I had never touched a cloud platform before and had basically zero knowledge about it, it was beneficial to have access to the cloud itself and play around in the labs. The lessons are not that great and not really focused on the exam. It can't hurt to go through them once, though.

I then worked through the official practice exam (link). Its questions are very similar to those in the real exam, so it is worth repeating it until you get everything right and understand why.

I found an additional practice exam bank on www.braincert.com (Google Cloud Certified – Professional Data Engineer Practice Exams). It gave me access to 4 tests of 50 questions each, with the questions changing slightly from one run to the next. This was massively helpful, as you get a detailed explanation for each question. I learn best by doing, so I repeated those tests until I was consistently scoring 90-95% on each one.

A friend of mine passed the exam in 2021 using the following question bank, which he found very useful: https://www.examtopics.com/exams/google/professional-data-engineer/view/

Then I bought Dan Sullivan’s “Professional Data Engineer Study Guide”, which was very helpful as it focuses on the exam itself and is designed to cover every topic. It also comes with a good online question bank: 100 practice questions plus all the quizzes from the book. I found the book well designed and a good final overview of the whole course.

Helpful as the book was, I found a few questions in the official exam hard to answer with the book alone (update/insert performance on BigQuery, archiving of Datastore, cluster routing on Bigtable, Kafka Connect and mirroring, DML quotas for BigQuery).

Finally, you have to stay curious about all the topics you cover and dig deep into Google's documentation. If something is not clear, just read about it in the Google docs. The information is always up to date and clearly explained.

In the exam I was able to answer 33 of the 50 questions with complete confidence and had to think harder about the remaining 17. Most of the 33 “easy” questions had already come up in the various online test banks I had worked through, which made answering them quite straightforward.

If you take an online course, read the book and then practice the 500 questions from the different sources, you should get the certification. There is no magic here: the exam is about building and operating data pipelines, plus some ML. A good understanding of all the products involved is key, along with being able to list the characteristics and best use cases for each product (data warehouse with analytics and SQL -> BigQuery, time series data with low latency and high write performance -> Bigtable, etc.).
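One way to drill these pairings is a tiny flashcard script. This is only a sketch of how I would set it up: the names `USE_CASE_TO_PRODUCT` and `check_answer` are made up for illustration, and the table holds just the two pairings mentioned above, so you would extend it yourself as you study.

```python
# Flashcard sketch. Only the two pairings named in this post are included;
# the names below are hypothetical, add your own entries as you study.
USE_CASE_TO_PRODUCT = {
    "Data warehouse with analytics and SQL": "BigQuery",
    "Time series data with low latency and high write performance": "Bigtable",
}


def check_answer(use_case: str, answer: str) -> bool:
    """Return True when the answer names the product paired with the use case."""
    expected = USE_CASE_TO_PRODUCT[use_case]
    # Compare case-insensitively and ignore surrounding whitespace.
    return answer.strip().lower() == expected.lower()


if __name__ == "__main__":
    for use_case in USE_CASE_TO_PRODUCT:
        guess = input(f"{use_case} -> ")
        print("correct" if check_answer(use_case, guess) else "wrong")
```

Running it asks for the product behind each use case and tells you whether your answer matches.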

What worked best for me was going through all the online mock tests over and over and making sure I understood each answer and the rationale behind it. A certain dose of theory beforehand is of course mandatory.

And finally, please find attached the notes from my friend Aurélien Lefeuvre, who passed the exam in February 2021 and was kind enough to share them with me, along with some of my own notes.