Machine Learning With Spark
Overview
This Machine Learning with Spark course is designed to teach machine learning at scale with the popular Apache Spark framework. The course is taught using Spark and Python.
For each machine learning concept, we first discuss its foundations, applicability, and limitations. Then we explain its implementation and use, along with specific use cases. The course is roughly 50% lecture and 50% lab work.
Please note that this course does not cover in depth the math and statistics behind machine learning.
- Learn popular machine learning algorithms, their applicability, and limitations
- Practice the application of these methods in the Spark machine learning environment
- Learn practical use cases and limitations of algorithms
- Data Scientists and Software Engineers
- Working knowledge of Apache Spark.
- If students are new to Apache Spark, we can offer one day of ‘Introduction to Spark’ training
- Programming background
- Familiarity with Python would be a plus, but not required
- No machine learning knowledge is assumed
- Machine Learning landscape
- Machine Learning applications
- Understanding ML algorithms & models (supervised and unsupervised)
- Spark ML Overview
- Introduction to Jupyter notebooks
- Lab: Working with Jupyter + Python + Spark
- Lab: Spark ML utilities
- Statistics Primer
- Covariance, Correlation, Covariance Matrix
- Errors, Residuals
- Overfitting / Underfitting
- Cross-validation, bootstrapping
- Confusion Matrix
- ROC curve, Area Under Curve (AUC)
- Lab: Basic stats
- Preparing data for ML
- Extracting features, enhancing data
- Data cleanup
- Visualizing Data
- Lab: data cleanup
- Lab: visualizing data
- Simple Linear Regression
- Multiple Linear Regression
- Running LR
- Evaluating LR model performance
- Use case: House price estimates
- Understanding Logistic Regression
- Calculating Logistic Regression
- Evaluating model performance
- Use case: credit card application, college admissions
- SVM concepts and theory
- SVM with kernel
- Use case: Customer churn data
- Theory behind trees
- Classification and Regression Trees (CART)
- Random Forest concepts
- Use case: predicting loan defaults, estimating election contributions
- Use case: spam filtering
- Theory behind K-Means
- Running K-Means algorithm
- Estimating the performance
- Use case: grouping cars data, grouping shopping data
- Understanding PCA concepts
- PCA applications
- Running a PCA algorithm
- Evaluating results
- Recommender systems overview
- Collaborative Filtering concepts
- Use case: movie recommendations, music recommendations
- Best practices for scaling and optimizing Apache Spark
- Memory and processing optimizations in Spark and how to take advantage of them
- Effective transformations
- Beyond JVM
- Testing and validation
- Machine Learning Performance