Getting experience in new technologies is a lot like the chicken and the egg: no one will put you on a project until you have experience, but it’s difficult to get experience without being on a project. This was my conundrum with Big Data, and I chose to “hatch” some experience by gaining a CCDH credential. My mindset was to truly learn the technologies around Hadoop, focusing less on test requirements. Of course I also wanted to pass the test, but treating the test preparation as a guide that I occasionally strayed from made the process fun as well as rewarding.
Here is my advice on how to prepare for CCD-410:
- Start by going through all the Hortonworks tutorials. Hadoop is the same regardless of its distributor and you should take advantage of training materials from all of them. You will need to install the Hortonworks sandbox first. These tutorials give you a great introduction, as well as real experience using Hive, Pig, the CLI, etc. They also show much of the Hadoop ecosystem, which Cloudera will test you on.
- Download and install the Cloudera QuickStart VM. I installed it on VMware Player hosted on Windows 7 and it ran fine with 4 GB of RAM.
- Get Tom White’s book, “Hadoop: The Definitive Guide” and make sure to understand chapters 1-3 in particular. The rest of the book is useful, but it is pretty dry reading. Try switching to Alex Holmes’ book “Hadoop in Practice” when you feel you need a change.
- Hire a tutor to teach you MapReduce. I found an excellent one on Elance, and had about 10 sessions with him. We went into great depth on the “Wordcount” program, which is in Tom White’s book. These sessions helped me understand the Writable, Serializable and other interfaces of MapReduce, which was invaluable for passing the test. Much more of Tom White’s book also became understandable.
- If you need some help with Java programming, I found this series by thenewboston on YouTube very useful, and the presenter has a great sense of humor.
- Once you have some background with HDFS and the MapReduce API, you will be ready for these two excellent videos by Tom White. They show the old API, but the concepts are spot-on for many test questions:
- There will be questions on YARN and MRv2, and you will need to find resources to learn these important changes to Hadoop. This article from Apache explains it generally, but you should find additional resources to learn it in greater detail.
- Go through the Cloudera study guide for the exam, watch all the videos it provides, and be sure to brush up on topics you do not fully understand.
- As I went through the study guide from Cloudera, I watched many YouTube videos from Hadoop Summit 2013.
- This article, “24 Interview Questions & Answers for Hadoop MapReduce developers,” was a great study guide, and served as a good practice test.
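
If you have not seen the “Wordcount” program mentioned above yet, here is a rough sketch of the idea behind it in plain Java. This is not the version from Tom White’s book and it uses no Hadoop classes at all; the class name `WordCountSketch` and its methods are my own illustration. A real Hadoop job would instead extend `Mapper` and `Reducer` and use Writable types such as `Text` and `IntWritable` for keys and values, with the framework doing the grouping between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: simulates the map, shuffle, and reduce steps in memory.
// None of these names come from the Hadoop API.
class WordCountSketch {

    // "Map" step: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" step: group all values by key, with keys in sorted order.
    // In Hadoop the framework does this between the map and reduce phases.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum the list of counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Run the three steps in sequence over a small list of lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog and the fox");
        System.out.println(wordCount(lines));
    }
}
```

Running `main` prints one count per word, for example `the=3` and `fox=2`. Once this flow makes sense, the real Wordcount code is mostly a matter of learning which Hadoop class plays each role.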
I hope this helps you learn Hadoop and pass CCD-410.