Modern Data Warehousing

Course Details

Today’s businesses need distributed decision making, where every stakeholder, from a business analyst to a store manager to a frontline operator, needs to analyze data and make decisions. Those decisions involve analyzing vast amounts of data that originate from a wide variety of sources such as transactions, CRM, marketing, mobile and web.

Amazon Redshift is a fully managed cloud data warehouse that scales with high performance and throughput to analyze vast amounts of data, so that you can build powerful reports for business intelligence or derive operational analytics from your business events.

In this workshop, you will learn how to use Amazon Redshift to analyze your data and derive business intelligence. You will learn the distributed architecture of Amazon Redshift, how to bring data into the data warehouse, best practices for performance, and the managed capabilities that reduce your operational overhead. You will also learn how Redshift extends to your data lake, allowing you to analyze all the data available there.

Course Outline

DAY 1

Module 1: Overview of Amazon Redshift

  • Course Introduction
  • Introduction to Data Warehousing
  • Amazon Redshift architecture, its components and features

Module 2: Table Design Concepts

  • Deep Dive into Distribution Styles and Sort Keys
  • Understanding Data Compression
  • How to choose distribution styles and sort keys for different workloads
  • Loading data into the Cluster
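
As a preview of why the choice of distribution style matters, the following is a minimal Python sketch, not Redshift's actual implementation. It assumes a hypothetical cluster with 4 slices, uses CRC32 as a stand-in hash (Redshift's internal hash function is different), and uses made-up rows with `order_id` and `country` columns. It compares round-robin placement (analogous to EVEN distribution) with hash placement on one column (analogous to KEY distribution) and reports how unevenly rows land on slices.

```python
from collections import Counter
from zlib import crc32

NUM_SLICES = 4  # assumption: a small hypothetical cluster


def distribute_even(rows, num_slices):
    """Round-robin placement, analogous to EVEN distribution."""
    counts = Counter()
    for i, _ in enumerate(rows):
        counts[i % num_slices] += 1
    return counts


def distribute_key(rows, key, num_slices):
    """Hash placement on one column, analogous to KEY distribution.

    CRC32 is only a deterministic stand-in for the real hash function.
    """
    counts = Counter()
    for row in rows:
        counts[crc32(str(row[key]).encode()) % num_slices] += 1
    return counts


def skew_ratio(counts, num_slices):
    """Rows on the fullest slice divided by the ideal even share."""
    total = sum(counts.values())
    return max(counts.values()) / (total / num_slices)


# Made-up sample data: 900 rows for "US" and 100 rows for "CA".
rows = [{"order_id": i, "country": "US" if i % 10 else "CA"} for i in range(1000)]

for label, counts in [
    ("EVEN", distribute_even(rows, NUM_SLICES)),
    ("KEY(order_id)", distribute_key(rows, "order_id", NUM_SLICES)),
    ("KEY(country)", distribute_key(rows, "country", NUM_SLICES)),
]:
    print(label, dict(counts), "skew:", round(skew_ratio(counts, NUM_SLICES), 2))
```

In this toy data, distributing on the low-cardinality `country` column places at least 90% of the rows on a single slice, while round-robin placement spreads them evenly. Skew is only one of the trade-offs involved in choosing a distribution style; the module covers the others.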

Lab 1: Launching a Redshift cluster, loading data and running queries

Module 3: Managing your Redshift cluster

  • Choosing Redshift node types
  • Pausing, resuming and elastically resizing your cluster
  • Backups and Disaster Recovery for your cluster

DAY 2

Module 4: Managing Workloads on your Cluster

  • How to manage different workloads in your cluster
  • Automatic and Manual Workload Management
  • Short Query Acceleration and Assigning queries to queues
  • Concurrency Scaling

Module 5: Extend to your data lake using Redshift Spectrum

  • Overview of Redshift Spectrum and its architecture
  • Best practices for Redshift Spectrum performance

Lab 2: Using Redshift Spectrum to query your data lake

Module 6: Maintaining your cluster

  • Monitoring query performance and analyzing workload performance
  • Redshift Advisor
  • System tables to analyze cluster performance

Module 7: Security

  • Data Protection
  • Managing access to the cluster
  • Infrastructure & Network Security

Quiz and Course Wrap Up

  • Quiz
  • Summary of the two days
  • Further learning resources

Course Duration

2 Days

Key Takeaways

  • Understand Amazon Redshift’s distributed architecture for scale and performance
  • How Amazon Redshift integrates with data lakes
  • Processing structured and semi-structured data
  • Best practices for performance and cost

Key Services that you will learn

  • Amazon Redshift
  • Amazon Redshift Spectrum
  • AWS Glue (Data Catalog)

Prerequisites

  • Beginner-level knowledge of the AWS platform and key services such as EC2 and S3
  • Basic hands-on experience with the AWS Management Console will be a plus
  • Prior experience with data warehousing concepts or any data warehousing product will be an advantage

Intended Audience

  • Data / Database Architects, developers and engineers
  • Data Analysts / Scientists
  • Solution Architects
  • Data platform owners
