Cloudera

Cloudera Administrator Training for Apache Hadoop - Virtual English

28 hours
2695 €
Live Virtual Class

23 Nov 2020 - 26 Nov 2020

Monday to Thursday, 09:00 - 17:00

Description

This course for Apache Hadoop provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, Cloudera’s training course is the best preparation for the real-world challenges faced by Hadoop administrators.

PUE is an official Cloudera Training Partner, authorized by Cloudera to deliver official training in Cloudera technologies.

PUE is also accredited to provide consulting and mentoring services for implementing Cloudera solutions in business settings, and it brings that practical, business-oriented experience to its official courses.

Audience and prerequisites

This official course is aimed at system administrators and all personnel responsible for managing Apache Hadoop clusters in production or development environments.

This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.

Objectives

At the end of the training, the participant will know:

  • Cloudera Manager features that make managing your clusters easier, such as aggregated logging, configuration management, resource management, reports, alerts, and service management
  • Configuring and deploying production-scale clusters that provide key Hadoop-related services, including YARN, HDFS, Impala, Hive, Spark, Kudu, and Kafka
  • Determining the correct hardware and infrastructure for your cluster
  • Proper cluster configuration and deployment to integrate with the data center
  • Ingesting, storing, and accessing data in HDFS, Kudu, and cloud object stores such as Amazon S3
  • Loading file-based and streaming data into the cluster using Kafka and Flume
  • Configuring automatic resource management to ensure service-level agreements are met for multiple users of a cluster
  • Best practices for preparing, tuning, and maintaining a production cluster
  • Troubleshooting, diagnosing, and solving cluster issues

Certification included

This is the official course recommended by Cloudera to prepare for the associated official certification exam. The exam, valued at 295.00 €, is included in the course price for all members of the PUE Alumni program.

Passing this exam is required to obtain the Cloudera Certified Administrator for Apache Hadoop certification. The certification is designed to verify that candidates have acquired the concepts and skills required in the following areas:

  • Install: Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.
  • Configure: Perform basic and advanced configuration needed to effectively administer a Hadoop cluster.
  • Manage: Maintain and modify the cluster to support day-to-day operations in the enterprise.
  • Secure: Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices.
  • Test: Benchmark the cluster's operational metrics and test the system configuration for operation and efficiency.
  • Troubleshoot: Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios.

Topics

Introduction

The Cloudera Enterprise Data Hub

  • Cloudera Enterprise Data Hub
  • CDH Overview
  • Cloudera Manager Overview
  • Hadoop Administrator Responsibilities

Installing Cloudera Manager and CDH

  • Cluster Installation Overview
  • Cloudera Manager Installation
  • CDH Installation
  • CDH Cluster Services

Configuring a Cloudera Cluster

  • Overview
  • Configuration Settings
  • Modifying Service Configurations
  • Configuration Files
  • Managing Role Instances
  • Adding New Services
  • Adding and Removing Hosts

Hadoop Distributed File System

  • Overview
  • HDFS Topology and Roles
  • Edit Logs and Checkpointing
  • HDFS Performance and Fault Tolerance
  • HDFS and Hadoop Security Overview
  • Web User Interfaces for HDFS
  • Using the HDFS Command Line Interface
  • Other Command Line Utilities

HDFS Data Ingest

  • Data Ingest Overview
  • File Formats
  • Ingesting Data using File Transfer or REST Interfaces
  • Importing Data from Relational Databases with Apache Sqoop
  • Ingesting Data From External Sources with Apache Flume
  • Best Practices for Importing Data

Hive and Impala

  • Apache Hive
  • Apache Impala

YARN and MapReduce

  • YARN Overview
  • Running Applications on YARN
  • Viewing YARN Applications
  • YARN Application Logs
  • MapReduce Applications
  • YARN Memory and CPU Settings

Apache Spark

  • Spark Overview
  • Spark Applications
  • How Spark Applications Run on YARN
  • Monitoring Spark Applications

Planning Your Cluster

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Virtualization Options
  • Cloud Deployment Options
  • Configuring Nodes

Advanced Cluster Configuration

  • Configuring Service Ports
  • Tuning HDFS and MapReduce
  • Enabling HDFS High Availability

Managing Resources

  • Configuring cgroups with Static Service Pools
  • The Fair Scheduler
  • Configuring Dynamic Resource Pools
  • Impala Query Scheduling

Cluster Maintenance

  • Checking HDFS Status
  • Copying Data Between Clusters
  • Rebalancing Data in HDFS
  • HDFS Directory Snapshots
  • Upgrading a Cluster

Monitoring Clusters

  • Cloudera Manager Monitoring Features
  • Health Tests
  • Events and Alerts
  • Charts and Reports
  • Monitoring Recommendations

Cluster Troubleshooting

  • Overview
  • Troubleshooting Tools
  • Misconfiguration Examples
  • Essential Points

Installing and Managing Hue

  • Overview
  • Managing and Configuring Hue
  • Hue Authentication and Authorization

Security

  • Hadoop Security Concepts
  • Hadoop Authentication Using Kerberos
  • Hadoop Authorization
  • Hadoop Encryption
  • Securing a Hadoop Cluster

Apache Kudu

  • Kudu Overview
  • Architecture
  • Installation and Configuration
  • Monitoring and Management Tools

Apache Kafka

  • What Is Apache Kafka?
  • Apache Kafka Overview
  • Apache Kafka Cluster Architecture
  • Apache Kafka Command Line Tools
  • Using Kafka with Flume

Object Storage in the Cloud

  • Object Storage
  • Connecting Hadoop to Object Storage
