Introduction to Spark Programming
Course Description
Overview
This course introduces the Apache Spark distributed computing engine and is suitable for developers, data analysts, architects, technical managers, and anyone who needs to use Spark in a hands-on manner. It is based on the Spark 2.x release.
The course provides a solid technical introduction to the Spark architecture and how Spark works. It covers the basic building blocks of Spark (e.g. RDDs and the distributed compute engine) as well as higher-level constructs that provide a simpler and more capable interface (e.g. DataSets/DataFrames and Spark SQL). It includes in-depth coverage of Spark SQL, DataFrames, and DataSets, which are now the preferred programming API, including possible performance issues and strategies for optimization.
The course also covers more advanced capabilities such as the use of Spark Streaming to process streaming data, and integrating with the Kafka server.
The course is very hands-on, with many labs. Participants will interact with Spark through the Spark shell (for interactive, ad-hoc processing) as well as through programs using the Spark API. After taking this course, you will be ready to work with Spark in an informed and productive manner.
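The transformation-then-action style described above can be previewed before installing Spark. The plain-Python sketch below mimics the shape of an RDD-style filter/map/reduce pipeline using lazy generators; it illustrates the programming model only and is not actual Spark API code (the sample log lines are invented for the illustration).

```python
from functools import reduce

# Sample "dataset": a few log lines standing in for a real data source.
lines = ["INFO start", "ERROR disk full", "INFO done", "ERROR timeout"]

# Transformations are lazy: building the generators does no work yet,
# mirroring how Spark defers RDD transformations until an action runs.
errors = (line for line in lines if line.startswith("ERROR"))  # like rdd.filter(...)
lengths = (len(line) for line in errors)                       # like rdd.map(...)

# The "action" forces evaluation, analogous to calling reduce() on an RDD.
total = reduce(lambda a, b: a + b, lengths)
print(total)  # 28 = len("ERROR disk full") + len("ERROR timeout")
```

Nothing runs until the final `reduce` call, which is the key idea behind the lazy-evaluation topic covered in the RDD module.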
Labs are supported in both Python and Scala.
Objectives
- Know the difference between “data-at-rest” and “data-in-motion”
- Understand what map-reduce / Hadoop is, and what it can do
- Be aware of query technologies for easily querying with Hadoop (e.g. Hive, Pig, and others)
- Understand what NoSQL databases are and what they can do
- Become familiar with the choices in the NoSQL landscape
- Understand the strengths and weaknesses of different NoSQL technologies
- Be well-informed on your choices in Big Data processing, and evaluate them for your needs
- Understand what Big Data is
Audience
- Developers, data analysts, architects, technical managers, and anyone who needs to use Spark in a hands-on manner
Prerequisites
- Working knowledge of some programming language - no Java experience needed
Topics
- Scala Introduction, Variables, Data Types, Control Flow
- The Scala Interpreter
- Collections and their Standard Methods (e.g. map())
- Functions, Methods, Function Literals
- Class, Object, Trait, case Class
- Overview, Motivations, Spark Systems
- Spark Ecosystem
- Spark vs. Hadoop
- Acquiring and Installing Spark
- The Spark Shell, SparkContext
- RDD Concepts, Lifecycle, Lazy Evaluation
- RDD Partitioning and Transformations
- Working with RDDs - Creating and Transforming (map, filter, etc.)
- Overview
- SparkSession, Loading/Saving Data, Data Formats (JSON, CSV, Parquet, text ...)
- Introducing DataFrames and DataSets (Creation and Schema Inference)
- Supported Data Formats (JSON, Text, CSV, Parquet)
- Working with the DataFrame (untyped) Query DSL (Column, Filtering, Grouping, Aggregation)
- SQL-based Queries
- Working with the DataSet (typed) API
- Mapping and Splitting (flatMap(), explode(), and split())
- DataSets vs. DataFrames vs. RDDs
- Grouping, Reducing, Joining
- Shuffling, Narrow vs. Wide Dependencies, and Performance Implications
- Exploring the Catalyst Query Optimizer (explain(), Query Plans, Issues with lambdas)
- The Tungsten Optimizer (Binary Format, Cache Awareness, Whole-Stage Code Gen)
- Caching - Concepts, Storage Type, Guidelines
- Minimizing Shuffling for Increased Performance
- Using Broadcast Variables and Accumulators
- General Performance Guidelines
- Core API, SparkSession.Builder
- Configuring and Creating a SparkSession
- Building and Running Applications - sbt/build.sbt and spark-submit
- Application Lifecycle (Driver, Executors, and Tasks)
- Cluster Managers (Standalone, YARN, Mesos)
- Logging and Debugging
- Introduction and Streaming Basics
- Spark Streaming (Spark 1.0+)
- DStreams, Receivers, Batching
- Stateless Transformation
- Windowed Transformation
- Stateful Transformation
- Structured Streaming (Spark 2+)
- Continuous Applications
- Table Paradigm, Result Table
- Steps for Structured Streaming
- Sources and Sinks
- Consuming Kafka Data
- Kafka Overview
- Structured Streaming - 'kafka' format
- Processing the Stream
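The windowed-transformation idea from the streaming topics above can be sketched in plain Python: slide a fixed-size window over a stream of per-batch counts and aggregate each window. This mimics the semantics only; the function name and its `window_size`/`slide` parameters are hypothetical stand-ins, not Spark Streaming API calls.

```python
from collections import deque

def sliding_window_sums(stream, window_size, slide):
    """Yield the sum of each full window of `window_size` batches,
    advancing by `slide` batches - a plain-Python analogue of a
    windowed reduce over micro-batches."""
    window = deque(maxlen=window_size)
    sums = []
    for i, batch in enumerate(stream):
        window.append(batch)
        # Emit once the window is full and we are on a slide boundary.
        if len(window) == window_size and (i - window_size + 1) % slide == 0:
            sums.append(sum(window))
    return sums

# Per-batch event counts arriving from a (simulated) stream.
counts = [3, 1, 4, 1, 5, 9]
print(sliding_window_sums(counts, window_size=3, slide=1))
# [8, 6, 10, 15]: sums of [3,1,4], [1,4,1], [4,1,5], [1,5,9]
```

Setting `slide` equal to `window_size` would instead produce non-overlapping (tumbling) windows, the other common case covered in the streaming module.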
Related Courses
- Spark V2 for Developers (DCSK-100)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD
- Solr for Developers (SESR-100)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD