Spark V2 for Data Analysts
Course Description
Overview
This Spark V2 for Data Analysts course is designed to introduce Apache Spark. Students will learn how Spark version 2.x fits into the Big Data ecosystem and how to use Spark for data analysis.
Objectives
- Scala primer
- Spark Shell
- Spark data structures (RDD / DataFrame / Dataset)
- Spark SQL
- Spark & Hadoop
- Spark MLlib
- Spark GraphX
Audience
- Data Analysts, Business Analysts
Prerequisites
- Analyst background (familiarity with SQL, scripting, etc.)
- Basic understanding of Linux development environment (basic command line navigation / editing files / running programs)
Topics
- A quick introduction to Scala
- Labs: Getting to know Scala
- Big Data, Hadoop, Spark
- Spark concepts and architecture
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs: Installing and running Spark
- Spark shell
- Spark web UIs
- Analyzing dataset – part 1
- Labs: Spark shell exploration
- RDD concepts
- Partitions
- RDD Operations / transformations
- Labs: Unstructured data analytics using RDDs
- Understanding the newer Dataset API
- DataFrames
- Loading structured data using DataFrames
- Caching and persistence
- Labs: DataFrames, Datasets, Caching
- Spark SQL concepts and overview
- Defining tables and importing datasets
- Querying data using SQL
- Handling various storage formats: JSON / Parquet / ORC
- Labs: querying structured data using SQL; evaluating data formats
- Hadoop Primer: HDFS / YARN
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
- Spark & Hive
- Machine Learning primer
- Machine Learning in Spark: MLlib / ML
- Spark ML overview (newer Spark 2 version)
- Algorithms: Clustering, Classifications, Recommendations
- Labs: Writing ML applications
- GraphX library overview
- GraphX APIs
- Labs: Processing graph data using Spark
Related Courses
Spark V2 for Developers (DCSK-100)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD
Solr for Developers (SESR-100)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD
Self-Paced Training Info
Learn at your own pace with anytime, anywhere training
- Same in-demand topics as instructor-led public and private classes.
- Standalone learning or supplemental reinforcement.
- e-Learning content varies by course and technology.