Cloudera Machine Learning with Spark ML and MLlib

8 hours
695,00 €
Classroom or Live Virtual Class


Cloudera University’s one-day Introduction to Machine Learning with Spark ML and MLlib course teaches the key concepts of machine learning, Spark MLlib, and Spark ML.

The course covers collaborative filtering, clustering, classification, and the relationship between algorithms and data volume.

PUE is a Cloudera Strategic Partner, authorized by Cloudera to deliver official training on Cloudera technologies.

PUE is also accredited and recognized to provide consulting and mentoring services for implementing Cloudera solutions in business settings, and it brings that practical, business-oriented approach to its official courses.

Audience and prerequisites

This course is intended for software engineers who have basic Linux experience as well as experience with either Scala or Python (code examples and exercises are presented in both languages, so students can choose whichever they prefer).

Prior knowledge of Apache Spark is required; students are expected to have completed the relevant foundational material in our Developer Training for Spark and Hadoop course.


Through instructor-led discussion, as well as hands-on exercises, participants will learn topics including:

  • Data types, statistics support, feature extraction, transforming vectors, and using the StandardScaler class
  • An overview of dimensionality reduction
  • Machine learning models, regression, linear regression support, and regularization
  • Machine learning with Spark ML, including using DataFrames, transformers and estimators, an introduction to pipelines, using pipelines to generate models, and regularization


1. Machine Learning Overview

  • Introduction
  • Collaborative Filtering
  • Clustering
  • Classification
  • Relationship of Algorithms and Data Volume

2. Machine Learning with Spark MLlib

  • Introduction
  • Data Types
  • Basic Statistics
  • Feature Extraction
  • Dimensionality Reduction
  • Models
  • Regression

3. Machine Learning with Spark ML

  • Overview of Spark ML
  • DataFrames
  • Transformers and Estimators
  • Pipelines
  • Decision Tree Classifiers
  • k-Means Clustering
