Intermediate Apache Spark 3
Course Description
Overview
Apache Spark is a powerful, open-source engine for big data processing. It is optimized for speed, ease of use, and sophisticated analytics. This hands-on course in Apache Spark is geared toward technical business professionals who wish to solve real-world data problems using Apache Spark.
This class is taught in Python using the Jupyter environment. It covers the latest features in Spark 3.
Objectives
- Spark ecosystem
- New features in Spark 3
- Spark shell
- Spark data structures (RDD / Dataframe / Dataset)
- Spark SQL
- Data formats and Spark
- Spark API
Audience
Prerequisites
- Familiarity with Big Data use cases and ecosystem
- Experience with data warehousing and ETL processing
- Knowledge of Python and Jupyter notebooks is preferred but not mandatory
- Familiarity with Spark basics is welcome but not required
- A lab environment will be provided to students; there is no need to install anything on the laptop
- A reasonably modern laptop with an unrestricted connection to the Internet. Laptops with overly restrictive VPNs or firewalls may not work properly
- A modern browser
Topics
- Big Data stacks, Hadoop, Spark
- Spark 3 new features
- Spark concepts and architecture
- Spark components overview
- Tour of the Databricks cloud environment
- Spark shell
- Spark web UIs
- Dataset – part 1
- Spark shell exploration
- Spark execution
- Transformations and actions
- Unstructured data analytics
- Caching overview
- Caching mechanisms available in Spark
- In-memory file systems
- Use cases and best practices
- Benchmark of caching performance
- Dataframes intro
- Structured data (JSON, CSV) using Dataframes
- Schema for Dataframes
- Dataframes, Datasets, Schema
- Spark SQL concepts and overview
- Tables and importing datasets
- Querying data using SQL
- Various storage formats: JSON / Parquet / ORC
- Adaptive Query Execution (AQE) (Spark 3 feature)
- Querying structured data using SQL; evaluating data formats
- Students will do multiple workshops along the way to reinforce the concepts
- Spark APIs in Scala / Python
- Life cycle of a Spark application
- APIs
- Spark applications on YARN
- Developing and deploying a Spark application
- Best practices for Spark programming
- Pitfalls to watch out for
- Optimizers in Spark 3
- Tuning Spark queries
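To make the "transformations and actions" topic concrete, here is a minimal plain-Python sketch. It is a toy model, not PySpark: the LazyPipeline class and its methods are hypothetical names chosen for illustration. Transformations (map, filter) only record a step and return a new pipeline; an action (collect, count) runs the whole recorded chain.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class LazyPipeline:
    """Toy, plain-Python model of lazy evaluation (hypothetical, not the Spark API).

    Transformations record steps; actions execute the recorded chain.
    """

    def __init__(self, source: Iterable, steps: Optional[List[Tuple[str, Callable]]] = None):
        self._source = list(source)
        self._steps: List[Tuple[str, Callable]] = list(steps or [])
        self.executions = 0  # number of times an action triggered computation

    def map(self, fn: Callable) -> "LazyPipeline":
        # Transformation: record the step only, nothing is computed yet
        return LazyPipeline(self._source, self._steps + [("map", fn)])

    def filter(self, pred: Callable) -> "LazyPipeline":
        # Transformation: record the step only, nothing is computed yet
        return LazyPipeline(self._source, self._steps + [("filter", pred)])

    def _run(self) -> list:
        # Replay every recorded step from the original source
        self.executions += 1
        data = self._source
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def collect(self) -> list:
        # Action: runs the whole chain and returns the results
        return self._run()

    def count(self) -> int:
        # Action: runs the whole chain again and returns the number of results
        return len(self._run())


words = LazyPipeline(["spark", "sql", "dataframe", "rdd"])
long_upper = words.filter(lambda w: len(w) > 3).map(str.upper)
print(long_upper.executions)  # 0: no work has been done yet
print(long_upper.collect())   # ['SPARK', 'DATAFRAME']
print(long_upper.count())     # 2
print(long_upper.executions)  # 2: each action recomputed the chain
```

In this toy, every action recomputes the chain from the source. That mirrors Spark's default behavior of recomputing a lineage for each action, which is the cost the caching topics in this outline address.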
Related Courses
Spark V2 for Developers (DCSK-100)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD
Solr for Developers (SESR-100)
- Duration: 3 Days
- Delivery Format: Classroom Training, Online Training
- Price: 2,100.00 USD
Self-Paced Training Info
Learn at your own pace with anytime, anywhere training
- Same in-demand topics as instructor-led public and private classes.
- Standalone learning or supplemental reinforcement.
- e-Learning content varies by course and technology.