Cloud

Succeed in your cloud computing migration.

Infrastructure Modernization
Reliable and profitable

Reduce costs by optimizing resources: choose the storage type that best fits your data and budget, and scale up resources whenever your business needs them.

We minimize the impact of cloud migration so you can get all the benefits of infrastructure modernization:

  • Migration from a Data Warehouse to BigQuery
  • Migration of workloads and/or data from an on-premises environment to the Cloud
  • SQL database migration

High Performance Computing
Automation and resources optimization

The cloud delivers infrastructure as a service, reducing costs and routine operational tasks. At PUE, we are committed to full-stack architectures with CI/CD applied to infrastructure and development automation, as well as container-based microservice architectures that optimize resource consumption and automatically scale services based on workload demand.
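As an illustration of the demand-based scaling mentioned above, the following sketch implements the proportional rule that Kubernetes' Horizontal Pod Autoscaler documents (desired = ceil(current × currentMetric / targetMetric)). It is a minimal stand-alone example, not PUE's implementation; the metric values are hypothetical.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Compute a replica count following the HPA proportional rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to the configured min/max bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# A service at 80% average CPU with a 50% target scales from 4 to 7 pods.
print(desired_replicas(4, 80.0, 50.0))  # 7
```

The clamp to `min_replicas`/`max_replicas` is what keeps autoscaling from either shutting a service down entirely or consuming unbounded resources.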

Data Analytics
Data-driven decisions

We design and build Data Lakes that connect data from different sources, making it possible to process information, analyze data in real time and visualize up-to-date results that provide valuable input for decision making.
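The core of "connecting data from different sources" is normalizing heterogeneous records onto one common schema so they can be analyzed together. A minimal sketch of that idea follows; the source names (`crm`, `web`) and field names are hypothetical examples, not a real schema.

```python
def normalize(source: str, record: dict) -> dict:
    """Map a source-specific record onto a common data-lake schema.
    Source and field names here are hypothetical examples."""
    if source == "crm":   # e.g. a CRM export keyed by 'customer_id'
        return {"id": record["customer_id"], "amount": record["value"],
                "source": source}
    if source == "web":   # e.g. clickstream events keyed by 'uid'
        return {"id": record["uid"], "amount": record.get("spend", 0.0),
                "source": source}
    raise ValueError(f"unknown source: {source}")

lake = [normalize("crm", {"customer_id": "c1", "value": 120.0}),
        normalize("web", {"uid": "c1", "spend": 30.0})]

# With a single schema, cross-source analysis is a simple aggregation.
total_by_id = {}
for row in lake:
    total_by_id[row["id"]] = total_by_id.get(row["id"], 0.0) + row["amount"]
print(total_by_id)  # {'c1': 150.0}
```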

Machine Learning & AI
Data insights

We use your data to implement ML & AI models with the Google Cloud suite of services (Vertex AI), helping companies optimize their business, better understand their customers, prevent risks and save time with disruptive, trained models.
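In practice the models would be trained and served in Vertex AI; purely as an illustration of what "preventing risks with a trained model" means at inference time, here is a hypothetical logistic-regression scorer. The feature names and weights are made up for the example.

```python
import math

# Hypothetical coefficients a trained risk model might produce; in a real
# project these would come from a model trained and served in Vertex AI.
WEIGHTS = {"days_since_last_order": 0.04, "support_tickets": 0.3}
BIAS = -2.0

def risk_score(features: dict) -> float:
    """Logistic-regression inference: sigmoid(w . x + b), in [0, 1]."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

score = risk_score({"days_since_last_order": 60, "support_tickets": 4})
print(round(score, 3))  # 0.832
```

A score near 1 flags a customer for retention action; the business value comes from acting on the score, not from the model alone.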

Data analytics project on Google Cloud microservices infrastructure

Objectives

  • Evolve a platform from B2C to B2B.
  • Run two clusters, one production and one pre-production, with contingency capabilities in a public cloud environment based on Google Cloud.
  • Develop the functionalities of the project.
  • Support a solution covering the entire Big Data lifecycle.

Google Cloud environment

The solution is built on Google Cloud project infrastructure, primarily using the following components:

  • Compute Engine
  • Google Kubernetes Engine
  • Cloud Storage
  • Cloud SQL
  • VPC Network

Instance types

To provide two fully interchangeable environments (pre-production and production), two independent environments with identical infrastructure are proposed to fulfil contingency functions. Each environment consists of the following components:

  • 3 Master Nodes
  • 6 Worker nodes

The recommended distribution of services and instance types is:

  Services                       Master Nodes    Worker Nodes
  MapReduce, YARN, Spark, Hive   n1-highmem-8    n1-highmem-8
  HBase, SolR, Impala            n1-highmem-8    n1-highmem-8 / n1-highmem-16
  All CDH services               n1-highmem-8    n1-highmem-8 / n1-highmem-16
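The layout above can also be captured as configuration data, which makes it easy to validate or hand to provisioning tooling. The sketch below encodes the table and the two-environment contingency setup; it is an illustration, not PUE's provisioning code.

```python
# The recommended distribution from the table above, encoded as data.
DISTRIBUTION = {
    "MapReduce/YARN/Spark/Hive": {"master": ["n1-highmem-8"],
                                  "worker": ["n1-highmem-8"]},
    "HBase/SolR/Impala":         {"master": ["n1-highmem-8"],
                                  "worker": ["n1-highmem-8", "n1-highmem-16"]},
    "All CDH services":          {"master": ["n1-highmem-8"],
                                  "worker": ["n1-highmem-8", "n1-highmem-16"]},
}

def build_environment(name: str, masters: int = 3, workers: int = 6) -> dict:
    """Describe one of the two identical environments (pre and pro)."""
    return {"name": name, "masters": masters, "workers": workers}

envs = [build_environment("pre"), build_environment("pro")]
# Contingency requires the two environments to be infrastructure-identical.
assert all(e["masters"] == 3 and e["workers"] == 6 for e in envs)
print(sum(e["masters"] + e["workers"] for e in envs))  # 18 nodes in total
```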

Google Kubernetes Engine

For client processes, GKE microservices are used with the following features:

  • 2 clusters: pre-production and production.
  • 2 instance templates, one for each cluster.
  • Types of nodes running in the clusters: n1-standard-1.

Service architecture

Data analysis phase

Storage layer

  Type      Component  Description
  Main      HDFS       Distributed storage system; the core on which all other storage and processing rely
  Main      HBase      Columnar database for low-latency, scalable random access to data
  Optional  Kudu       Storage engine with performance characteristics between HBase and HDFS
  Main      Parquet    Columnar storage format on HDFS, with optional compression
  Main      Avro       Row-oriented HDFS container format that supports compression and stores the data schema as metadata
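Avro's key property in the table above is that each file is self-describing: it carries its own schema and supports compression. The sketch below mimics that idea with stdlib `json` + `gzip` as a stand-in; it is not the real Avro binary format.

```python
import gzip
import json
import os
import tempfile

def write_container(path: str, schema: dict, records: list) -> None:
    """Write a compressed file that embeds its own schema, mimicking the
    Avro idea (self-describing container + compression). JSON/gzip is a
    stand-in here, not actual Avro encoding."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump({"schema": schema, "records": records}, f)

def read_container(path: str) -> tuple:
    """Read back both the schema and the records from one file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        payload = json.load(f)
    return payload["schema"], payload["records"]

schema = {"name": "event", "fields": [{"name": "id", "type": "string"}]}
path = os.path.join(tempfile.gettempdir(), "events.json.gz")
write_container(path, schema, [{"id": "a"}, {"id": "b"}])
s, recs = read_container(path)
print(s["name"], len(recs))  # event 2
```

Because the schema travels with the data, a consumer can interpret the file without out-of-band coordination, which is what makes the format practical for long-lived data-lake storage.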

Data entry phase

  Input Element                        Access Elements                                       Description
  Syslog TCP                           Syslog TCP source in Flume, and Kafka                 Flume receives information through a TCP listening source
  Files in local/network directories   Spooling-directory source in Flume, and Kafka         Flume polls parameterised directories for new files
  Relational databases                 Sqoop                                                 Sqoop handles both full and incremental ingestion of structured SQL sources
  Web Services                         HTTP source in Flume with a JSON handler, and Kafka   A connector to web services with JSON content handling
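The "incremental ingestion" row deserves a concrete picture: Sqoop's incremental append mode imports only rows whose check column exceeds a stored last value, then advances that watermark. A minimal pure-Python sketch of the pattern over an in-memory table follows; the table and column names are hypothetical.

```python
def incremental_ingest(source_rows: list, last_value: int) -> tuple:
    """Return rows with id > last_value plus the new watermark, mirroring
    Sqoop's --incremental append --check-column id --last-value N."""
    new_rows = [r for r in source_rows if r["id"] > last_value]
    new_last = max((r["id"] for r in new_rows), default=last_value)
    return new_rows, new_last

table = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
batch, watermark = incremental_ingest(table, last_value=1)
print(len(batch), watermark)  # 2 3
```

Storing the watermark between runs is what lets repeated ingestions pick up only new rows instead of re-importing the whole source table.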

Data processing phase

  Requirement                                      Component                          Description
  Batch processing                                 Spark                              Highly effective in-memory processing of complete datasets
  Real-time processing                             Spark Streaming                    Micro-batch processing within time windows
  Filtering, transformation, bundling, enrichment  Spark Streaming, Spark SQL, HBase  Spark Streaming together with Spark SQL (which simplifies process coding) for filtering; HBase for low-latency access during transformation and enrichment
  Analytics and machine learning                   MLlib, GraphX                      Spark processes that use the machine learning and graph libraries
  Text indexing, aggregation and time series       SolR                               SolR indexes documents for fast subsequent search, including from visual environments (integration with Hue for exploitation)
  Near-real-time decision making                   Spark                              A rule engine for low-latency decision making over micro-batches
  Data compression                                 Avro and Parquet                   Native reading, writing and processing of compressed data
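Two rows of the table above, micro-batch processing within time windows and a rule engine over those batches, can be illustrated together. The sketch below shows in pure Python what Spark Streaming does conceptually: discretize a stream into fixed windows, then apply rules per batch. The events and the threshold rule are hypothetical.

```python
from collections import defaultdict

def micro_batches(events: list, window_s: int) -> dict:
    """Group (timestamp, value) events into fixed time windows, the way
    Spark Streaming discretizes a stream into micro-batches."""
    batches = defaultdict(list)
    for ts, value in events:
        batches[ts - ts % window_s].append(value)
    return dict(batches)

def apply_rules(batch: list, threshold: float) -> list:
    """A toy rule engine: flag any value above a configured threshold."""
    return [v for v in batch if v > threshold]

events = [(0, 1.0), (2, 9.5), (5, 3.0), (6, 12.0)]
batches = micro_batches(events, window_s=5)
for window in sorted(batches):
    print(window, batches[window], apply_rules(batches[window], 8.0))
```

Keeping the windows small is the trade-off behind "near real-time": decisions lag the stream by at most one window length.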

Data exploitation phase

  Requirement              Module                    Description
  Visual environment       Hue                       Hue is the unified data exploitation environment; it connects visually to all cluster components and provides an intuitive interface for carrying out the necessary processes
  SQL-like queries         Hive/Impala               Hive is the standard query engine; Impala is included as a low-latency engine to improve query response times
  Third-party integration  ODBC/Web services/Sqoop   Data can be read via JDBC/ODBC connections to Hive, via web services such as the HBase REST interface, or exported to external databases with Sqoop
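To show what the SQL-like exploitation layer looks like from a consumer's point of view, the sketch below uses stdlib `sqlite3` as a stand-in engine; against the real cluster the same query would go to Hive or Impala over JDBC/ODBC. The table and column names are hypothetical.

```python
import sqlite3

# sqlite3 stands in for a Hive/Impala endpoint; the connection mechanism
# differs (JDBC/ODBC on the cluster), but the SQL exploitation is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (source TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("crm", 120.0), ("web", 30.0), ("web", 10.0)])

rows = conn.execute(
    "SELECT source, SUM(amount) FROM events GROUP BY source ORDER BY source"
).fetchall()
print(rows)  # [('crm', 120.0), ('web', 40.0)]
```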

Share your challenges; we will help you find a solution.
Let’s talk?