Analyzing with Cloudera Data Warehouse

28 hours

1840,00 €

Classroom or Live Virtual Class

Calls

Note: The prices indicated below do not include 21% VAT.

10 Jun 2024 - 13 Jun 2024 | Analyzing with Cloudera Data Warehouse | 28 h | 1840 € | Live Virtual Class | Spanish | Monday to Thursday (09:00h - 17:00h)
16 Sep 2024 - 19 Sep 2024 | Analyzing with Cloudera Data Warehouse | 28 h | 1840 € | Live Virtual Class | Spanish | Monday to Thursday (09:00h - 17:00h)
25 Nov 2024 - 28 Nov 2024 | Analyzing with Cloudera Data Warehouse | 28 h | 1840 € | Live Virtual Class | Spanish | Monday to Thursday (09:00h - 17:00h)

Description

The Analyzing with Cloudera Data Warehouse course teaches you how to apply traditional data analytics and business intelligence skills to big data. It presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

Audience and prerequisites

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity.

Objectives

Students who successfully complete this course will be able to:

Use Apache Hive and Apache Impala to access data through queries
Identify distinctions between Hive and Impala, such as differences in syntax, data formats, and supported features
Write and execute queries that use functions, aggregate functions, and subqueries
Use joins and unions to combine datasets
Create, modify, and delete tables, views, and databases
Load data into tables and store query results
Select file formats and develop partitioning schemes for better performance
Use analytic and windowing functions to gain insight into their data
Store and query complex or nested data structures
Process and analyze semi-structured and unstructured data
Optimize and extend the capabilities of Hive and Impala
Determine whether Hive, Impala, an RDBMS, or a mix of these is the best choice for a given task
Take advantage of CDP data storage
Create databases and tables
Load data
Alter databases and tables
Work with the CDP Public Cloud Data Warehouse

Topics

Foundations for Big Data Analytics

Big Data Analytics Overview
Data Storage: HDFS
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Hive and Impala
Database Integration: Sqoop
Other Data Tools
Exercise Scenario Explanation

Introduction to Hive and Impala

What Is Hive?
What Is Impala?
Why Use Hive and Impala?
Schema and Data Storage
Comparing Hive to Traditional Databases
Use Cases

Querying with Hive and Impala

Databases and Tables
Basic Hive and Impala Query Language Syntax
Data Types
Using Hue to Execute Queries
Using Beeline (Hive’s Shell)
Using the Impala Shell

Common Operators and Built-in Functions

Operators
Scalar Functions
Aggregate Functions

Data Management

Simplifying Queries with Views
Storing Query Results

Data Storage and Performance

Partitioning Tables
Loading Data into Partitioned Tables
When to Use Partitioning
Choosing a File Format
Using Avro and Parquet File Formats

Working with Multiple Datasets

UNION and Joins
Handling NULL Values in Joins
Advanced Joins

Analytic Functions and Windowing

Using Common Analytic Functions
Other Analytic Functions
Sliding Windows

Complex Data

Complex Data with Hive
Complex Data with Impala

Analyzing Text

Using Regular Expressions with Hive and Impala
Processing Text Data with SerDes in Hive
Sentiment Analysis and n-grams in Hive

Apache Hive Optimization

Understanding Query Performance
Cost-Based Optimization and Statistics
Bucketing
ORC File Optimizations

Apache Impala Optimization

How Impala Executes Queries
Improving Impala Performance

Extending Hive and Impala

User-Defined Functions
Parameterized Queries

Choosing the Best Tool for the Job

Comparing MapReduce, Hive, Impala, and Relational Databases
Which to Choose?

CDP Public Cloud Data Warehouse

Data Warehouse Overview
Auto-Scaling
Managing Virtual Warehouses
Querying Data Using CLI and Third-Party Integration

Appendix: Apache Kudu

What Is Kudu?
Kudu Tables
Using Impala with Kudu