This two-day course, created by Dean Wampler, Ph.D., is designed to teach developers how to implement data processing pipelines and analytics using Apache Spark. Developers will use hands-on exercises to learn the Spark Core, SQL/DataFrame, Streaming, and MLlib (machine learning) APIs. Developers will also learn about Spark internals and tips for improving application performance.
Additional coverage includes integration with Mesos, Hadoop, and Reactive frameworks like Akka.
- Understand how to use the Spark Scala APIs to implement various data analytics algorithms for offline (batch-mode) and event-streaming applications
- Understand Spark internals
- Understand Spark performance considerations
- Understand how to test and deploy Spark applications
- Understand the basics of integrating Spark with Mesos, Hadoop, and Akka
- Experience with Scala, such as completion of the Fast Track to Scala course
- Experience with SQL, machine learning, and other Big Data tools will be helpful, but not required.
- Developers wishing to learn how to write data-centric applications using Spark