Spark V2 for Developers
Course Description
Overview
This Spark for Developers course teaches developers and data analysts to use Spark for data analysis and to write Spark applications. It covers the latest Spark v2 features.
Objectives
- Spark Shell
- Spark internals
- Spark data structures: RDDs, DataFrames, Datasets
- Spark APIs
- Spark SQL
- Spark and Hadoop
- Spark MLlib
- Spark GraphX
- Spark Streaming
Audience
- Developers / Data analysts
Prerequisites
- Familiarity with Java, Scala, or Python (labs are in Scala and Python)
- Basic understanding of the Linux development environment (command-line navigation, running commands)
Topics
- A quick introduction to Scala
- Labs: Getting to know Scala
- Spark Basics
- Big Data, Hadoop, Spark
- What’s new in Spark v2
- Spark concepts and architecture
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs : Installing and running Spark
- Spark Shell
- Spark shell
- Spark web UIs
- Analyzing dataset – part 1
- Labs: Spark shell exploration
- RDD concepts
- Partitions
- RDD Operations / transformations
- More detailed coverage if required: RDD types, key-value pair RDDs, MapReduce on RDDs
- Labs: Unstructured data analytics using RDDs
- Partitions
- Distributed processing
- Failure handling
- Caching and Persistence
- Introduction to DataFrames / Datasets
- Programming with the DataFrame / Dataset API
- Loading structured data using DataFrames
- Labs: DataFrames, Datasets, caching
- Spark SQL concepts and overview
- Defining tables and importing datasets
- Querying data using SQL
- Handling various storage formats : JSON / Parquet / ORC
- Labs: Querying structured data using SQL; evaluating data formats
- Introduction to the Spark API
- Submitting the first program to Spark
- Debugging / logging
- Configuration properties
- Labs: Programming with the Spark API, submitting jobs
- Hadoop primer: HDFS / YARN
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
- Spark & Hive
- Machine Learning primer
- Machine Learning in Spark: MLlib / ML
- Spark ML overview (newer Spark 2 version)
- Algorithms: clustering, classification, recommendations
- Labs: Writing ML applications
- GraphX library overview
- GraphX APIs
- Labs: Processing graph data using Spark
- Streaming concepts
- Evaluating streaming platforms
- Spark Streaming library overview
- Streaming operations
- Sliding window operations
- Structured Streaming
- Continuous Streaming
- Spark & Kafka Streaming
- Labs: Writing Spark Streaming applications
- Highlights of real-world Spark use cases
Related Courses
Solr for Developers (SESR-100)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD
Machine Learning Essentials (DCSK-105)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD
Self-Paced Training Info
Learn at your own pace with anytime, anywhere training
- Same in-demand topics as instructor-led public and private classes.
- Standalone learning or supplemental reinforcement.
- e-Learning content varies by course and technology.
- View the Self-Paced version of this outline and what is included in the SPVC course.