Certified Data Science Practitioner™ (CDSP) (Exam DSP-210) Exam Voucher
Course Description
Overview
The Certified Data Science Practitioner™ (CDSP) is an industry-validated certification that helps professionals differentiate themselves from other job candidates by demonstrating their ability to put data science concepts into practice. Data can reveal insights and inform decisions, guiding strategy and shaping day-to-day operations. This calls for a robust workforce of professionals who can analyze, understand, manipulate, and present data within an effective and repeatable process framework. This certification validates candidates' ability to use data science principles to address business issues, apply multiple techniques to prepare and analyze data, evaluate datasets to extract valuable insights, and design a machine learning approach. It also validates the skills to design, finalize, present, implement, and monitor a model that addresses issues in any business sector.
Objectives
Audience
Prerequisites
There are no formal prerequisites to register for and schedule an exam. Successful candidates will possess the knowledge, skills, and abilities as identified in the domain objectives in this blueprint. It is also strongly recommended that candidates possess the following knowledge, skills, and abilities:
- Working knowledge of programming languages such as Python® and R
- Proficiency with a querying language
- Strong communication skills
- Proficiency with statistics and linear algebra
- Responsible handling of the ethical implications of sharing data sources
- Familiarity with data visualization
The following courses are recommended preparation:
- Introduction to Programming with Python®
- Advanced Programming Techniques with Python®
- Using Data Science Tools in Python®
- R Programming for Data Science
- DSBIZ™ (Exam DSZ-210)
- DEBIZ™ (Exam DEB-110): Data Ethics for Business Professionals or Certified Ethical Emerging Technologist™ (CEET)
Topics
- Identify project specifications, including objectives (metrics/KPIs) and stakeholder requirements
- Identify mandatory and optional deliverables
- Determine project timeline
- Identify project limitations (time, technical, resource, data, risks)
- Understand terminology
- Milestone
- POC (Proof of concept)
- MVP (Minimum Viable Product)
- Become aware of data privacy, security, and governance policies
- GDPR
- HIPAA
- California Consumer Privacy Act (CCPA)
- Obtain permission/access to stakeholder data
- Ensure appropriate voluntary disclosure and informed consent controls are in place
- Identify references relevant to the data science problem
- Optimization problem
- Forecasting problem
- Regression problem
- Classification problem
- Segmentation/Clustering problem
- Identify data sources and types
- Structured/unstructured
- Image
- Text
- Numerical
- Categorical
- Select modeling type
- Regression
- Classification
- Forecasting
- Clustering
- Optimization
- Recommender systems
- Read data (see the sketch after this sub-list)
- Write a query for a SQL database
- Write a query for a NoSQL database
- Read data from/write data to cloud storage solutions
- AWS S3
- Google Storage Buckets
- Azure Data Lake
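A minimal sketch of the read step above, using a self-contained SQLite database as a stand-in for a production SQL source; the table name, columns, and S3 bucket are hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical local SQLite database standing in for a production SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 75.5)])

# The querying step: pull only the columns the analysis needs.
orders = pd.read_sql_query("SELECT order_id, amount FROM orders", conn)
conn.close()

# Cloud storage: pandas reads s3:// paths directly when s3fs is installed.
# customers = pd.read_csv("s3://example-bucket/customers.csv")  # hypothetical bucket
```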
- Become aware of first-, second-, and third-party data sources
- Understand data collection methods
- Understand data sharing agreements, where applicable
- Explore third-party data availability
- Demographic data
- Bloomberg
- Collect open-source data
- Use APIs to collect data
- Scrape the web
- Generate data assets (see the sketch after this sub-list)
- Dummy or test data
- Randomized data
- Anonymized data
- AI-generated synthetic data
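A minimal sketch of generating a dummy, randomized dataset; the schema, value ranges, and surrogate IDs are hypothetical stand-ins for a production table.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n = 1_000

# Randomized test data mirroring a production schema, with surrogate IDs
# in place of real (identifying) customer numbers.
test_df = pd.DataFrame({
    "customer_id": rng.integers(10_000, 99_999, size=n),
    "age": rng.integers(18, 90, size=n),
    "monthly_spend": rng.normal(loc=120.0, scale=35.0, size=n).round(2),
    "segment": rng.choice(["A", "B", "C"], size=n),
})
```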
- Identify and eliminate irregularities in data (e.g., edge cases, outliers)
- Nulls
- Duplicates
- Corrupt values
- Parse the data (see the cleaning sketch after this sub-list)
- Check for corrupted data
- Correct the data format
- Deduplicate data
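A minimal cleaning sketch covering the irregularities listed above (nulls, duplicates, corrupt values); the toy dataset is hypothetical.

```python
import pandas as pd

# Hypothetical raw extract with typical irregularities.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount": ["50", "75.5", "75.5", "oops", None],
    "created_at": ["2024-01-05", "2024-01-06", "2024-01-06", "bad date", "2024-01-08"],
})

df = df.drop_duplicates()                                             # duplicates
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")           # corrupt values -> NaN
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")  # correct the format
df = df.dropna(subset=["amount", "created_at"])                       # nulls
```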
- Apply risk and bias mitigation techniques
- Understand common forms of ML bias
- Sampling bias
- Measurement bias
- Exclusion bias
- Observer bias
- Prejudicial bias
- Confirmation bias
- Bandwagoning
- Identify the sources of bias
- Sources of bias include data collection, data labeling, data transformation, data imputation, data selection, and data training methods
- Use exploratory data analysis to visualize and summarize the data, and detect outliers and anomalies
- Assess data quality by measuring and evaluating the completeness, correctness, consistency, and currency of data
- Use data auditing techniques to track and document the provenance, ownership, and usage of data, and applied data cleaning steps
- Mitigate the impact of bias
- Apply mitigation strategies such as data augmentation, sampling, normalization, encoding, validation
- Evaluate the outcomes of bias (see the metrics sketch below)
- Use methods such as confusion matrix, ROC curve, AUC score, and fairness metrics
- Monitor and improve the data cleaning process
- Establish or adhere to data governance rules, standards, and policies for data and the data cleaning process
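A minimal sketch of evaluating outcomes with a confusion matrix, AUC score, and a simple fairness metric, assuming scikit-learn; the labels, scores, and protected attribute are synthetic stand-ins.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                          # synthetic labels
y_prob = np.clip(0.6 * y_true + 0.5 * rng.random(500), 0, 1)   # synthetic model scores
y_pred = (y_prob >= 0.5).astype(int)
group = rng.choice(["A", "B"], size=500)                       # synthetic protected attribute

print(confusion_matrix(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_prob))

# One simple fairness metric: the demographic-parity gap, i.e. the difference
# in positive-prediction rates between groups.
rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
print("positive-rate gap:", abs(rates["A"] - rates["B"]))
```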
- Join data from different sources (see the sketch after this sub-list)
- Make sure a common key exists in all datasets
- Unique identifiers
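A minimal join sketch on a shared unique identifier; both tables are hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [50.0, 75.5, 20.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 4], "region": ["NA", "EU", "APAC"]})

# Inner join on the common key; use how="left" to keep all orders instead.
joined = orders.merge(customers, on="customer_id", how="inner")
```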
- Load data
- Load into DB
- Load into dataframe
- Export the cleaned dataset
- Load into visualization tool
- Make an endpoint or API
- Apply word vectorization or word tokenization (see the TF-IDF sketch after this sub-list)
- Word2vec
- TF-IDF
- GloVe
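A minimal TF-IDF sketch with scikit-learn; the two documents are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "data science puts data to work",
    "machine learning models learn from data",
]

# TF-IDF turns raw text into a sparse numeric matrix, one row per document.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))
```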
- Generate latent representations for image data
- Generate summary statistics
- Examine feature types
- Visualize distributions
- Identify outliers
- Find correlations
- Identify target feature(s)
- Identify missing values
- Make decisions about missing values (e.g., imputing method, record removal)
- Normalize, standardize, or scale data
- Apply encoding to categorical data (see the sketch after this list)
- One-hot encoding
- Target encoding
- Label encoding or Ordinal encoding
- Dummy encoding
- Effect encoding
- Binary encoding
- Base-N encoding
- Hash encoding
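A minimal sketch of two of the encodings listed above (one-hot and dummy encoding) using pandas; the column and categories are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Dummy encoding: drop one level to avoid perfect collinearity in linear models.
dummy = pd.get_dummies(df["color"], prefix="color", drop_first=True)
```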
- Split features
- Text manipulation (see the sketch after these lists)
- Split
- Trim
- Reverse
- Manipulate data
- Split names
- Extract year from title
- Convert dates to useful features
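A minimal pandas sketch of the manipulations above (splitting names, extracting a year from a title, deriving features from dates); all values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada Lovelace", "Alan Turing"],
    "title": ["Report (2021)", "Summary (2019)"],
    "date": ["2024-01-15", "2024-06-30"],
})

df[["first_name", "last_name"]] = df["name"].str.split(" ", n=1, expand=True)
df["year"] = df["title"].str.extract(r"\((\d{4})\)", expand=False).astype(int)
df["date"] = pd.to_datetime(df["date"])
df["day_of_week"] = df["date"].dt.day_name()  # dates become useful model features
```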
- Apply feature reduction methods (see the PCA sketch after this list)
- PCA
- Missing value ratio
- t-SNE
- Low-variance filter
- Random forest
- High-correlation filter
- Backward feature elimination
- SVD
- Forward feature selection
- False discovery rate
- Factor analysis
- Feature importance methods
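A minimal sketch of one of the reduction methods above, PCA with scikit-learn; the feature matrix is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))  # synthetic feature matrix

# Keep enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum().round(3))
```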
- Decide proportion of data set to use for training, testing, and (if applicable) validation
- Split data to train, test, and (if applicable) validation sets, mitigating data leakage risk
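A minimal sketch of a 70/15/15 train/validation/test split with scikit-learn; the data is synthetic, and splitting before any fitting or scaling helps mitigate leakage.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 4))    # synthetic features
y = (X[:, 0] > 0).astype(int)      # synthetic target

# 70% train, 15% validation, 15% test; splitting before any scaling or fitting
# keeps information from the test set out of training (leakage mitigation).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)
```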
- Define models to try
- Regression
- Linear regression
- Random forest
- XGBoost
- Classification
- Logistic regression
- Random forest classification
- XGBoost classifier
- Naïve Bayes
- Forecasting
- ARIMA
- Clustering
- k-means
- Density-based methods
- Hierarchical clustering
- Train the model, or pre-train or adapt transformer models
- Tune hyper-parameters, if applicable (see the sketch after this sub-list)
- Cross-validation
- Grid search
- Gradient descent
- Bayesian optimization
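A minimal sketch of grid search with cross-validation over a random forest classifier, assuming scikit-learn; the data and the parameter grid are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic target

# Exhaustive search over a small hyper-parameter grid,
# scored with 5-fold cross-validation.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```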
- Define evaluation metric
- Compare model outputs
- Confusion matrix
- Learning curve
- Select best-performing model
- Store model for operational use (see the MLflow sketch below)
- MLflow
- Kubeflow
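A minimal MLflow sketch for storing a fitted model with its parameters and a metric; the model and metric are illustrative, and a real setup would point MLflow at a tracking server.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

# Log the parameters, a metric, and the fitted model so the model can be
# versioned and reloaded later for serving.
with mlflow.start_run():
    mlflow.log_param("n_estimators", model.n_estimators)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```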
- Design A/B tests (see the sketch after this sub-list)
- Experimental design
- Design use cases
- Test creation
- Statistics
- Define success criteria for test
- Evaluate test results
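A minimal sketch of evaluating A/B test results with a chi-squared test, assuming SciPy; the conversion counts are hypothetical, and the p-value is compared against a success criterion defined before the test.

```python
from scipy.stats import chi2_contingency

# Hypothetical results: [conversions, non-conversions] per variant.
table = [
    [120, 880],  # A (control): 120 conversions out of 1,000
    [150, 850],  # B (variant): 150 conversions out of 1,000
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"p-value = {p:.4f}")  # compare with the pre-defined threshold, e.g. 0.05
```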
- Build a streamlined pipeline (using dbt, Fivetran, or similar tools)
- Implement confidentiality, integrity, and access control measures
- Put model into production
- AWS SageMaker
- Azure ML
- Docker
- Kubernetes
- Ensure model works operationally
- Monitor pipeline for performance of model over time
- MLflow
- Kubeflow
- Datadog
- Consider enterprise data strategy and data management architecture to facilitate the end-to-end integration of data pipelines and environments
- Data warehouse and ETL process
- Data lake and ETL processes
- Data mesh, micro-services, and APIs
- Data fabric, data virtualization, and low-code automation platforms
- Implement model in a basic web application for demonstration (POC implementation; see the Flask sketch after this sub-list)
- Web frameworks (Flask, Django)
- Basic HTML
- CSS
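A minimal Flask sketch of a POC prediction endpoint; the route, payload, and placeholder response are hypothetical, and a real app would load the stored model and call its predict method.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()      # e.g. {"age": 34, "monthly_spend": 120.5}
    # A real app would load the stored model and call model.predict(...) here.
    return jsonify({"input": features, "prediction": 1})  # placeholder for the POC

if __name__ == "__main__":
    app.run(debug=True)
```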
- Derive insights from findings
- Identify features that drive outcomes (e.g., explainability, interpretability, variable importance plot)
- Show model results
- Generate lift or gain chart
- Ensure transparency and explainability of model
- Use explainable methods (e.g., intrinsic and post hoc)
- Visualization
- Feature importance analysis
- Attention mechanisms
- Avoiding black-box techniques in model design
- Explainable AI (XAI) frameworks and tools (see the SHAP sketch after this list)
- SHAP
- LIME
- ELI5
- What-If Tool
- AIX360
- Skater
- Other XAI tools
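A minimal SHAP sketch for a tree model, one of the post hoc methods listed above; the dataset is synthetic, and plotting details vary across shap versions.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

# TreeExplainer attributes each prediction to per-feature contributions
# (SHAP values), a post hoc explainability method.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global feature-importance view; arguments vary across shap versions.
shap.summary_plot(shap_values, X)
```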
- Document the model lifecycle
- ML design and workflow
- Code comments
- Data dictionary
- Model cards
- Impact assessments
- Engage with diverse perspectives
- Stakeholder analysis
- User testing
- Feedback loops
- Participatory design
- Make data more accessible to a wider range of stakeholders
- Make data more understandable and actionable for nontechnical individuals
- Implement self-service data/analytics platforms
- Create a culture of data literacy
- Educate employees on how to use data effectively
- Offer support and guidance on data-related issues
- Promote transparency and collaboration around data
Related Courses
-
Certified Data Science Practitioner™ (CDSP) (Exam DSP-210)
CNX0020 - Duration: 5 Days
- Delivery Format: Classroom Training, Online Training
- Price: 3,500.00 USD
-
DSBIZ™ (Exam DSZ-210)
CNX0018 - Duration: 0.5 Day
- Delivery Format: Classroom Training, Online Training
- Price: 420.00 USD
Self-Paced Training Info
Learn at your own pace with anytime, anywhere training
- Same in-demand topics as instructor-led public and private classes.
- Standalone learning or supplemental reinforcement.
- e-Learning content varies by course and technology.
- View the Self-Paced version of this outline and what is included in the SPVC course.
- Learn more about e-Learning
Exam Voucher Terms & Conditions
- All cancellations must be made in accordance with the policies of the specific testing center that is administering your certification exam. Additionally, candidates are subject to the testing center's no-show policy when rescheduling or seeking a refund. Visit your testing center's website for more information on cancellations and no-shows.
- Vouchers for CertNexus certification exams are non-refundable, non-transferable, and non-exchangeable.
- All vouchers, including any retakes, expire 18 months from the date of purchase, unless otherwise noted.
- Any candidates who do not pass a CertNexus certification exam on their first attempt are eligible for a second attempt immediately, at no additional cost and with no waiting period before the retake. All CertNexus certification exam vouchers include one free retake.
- Retakes are only valid for the same exam and same exam version that was initially purchased and using the same voucher code. All attempts, including retakes, must occur prior to the voucher expiration date.
- For any attempt after the free retake (i.e., a third or subsequent attempt), or for any attempt after the voucher expiration date, candidates must purchase another voucher.
- While there are no time restrictions on the third attempt or any subsequent attempts thereafter, CertNexus strongly recommends a 30-day preparation period before taking the exam again.
For more information, visit the Exam Terms & Conditions.