Serverless Data Lakes

Course Details

Data lakes allow an organization to collect data from a variety of sources and store it all in a central repository. Once data is available in the data lake, stakeholders can run many types of analytics, from dashboards and visualizations to real-time analytics, big data analytics and machine learning, to gain better insights and drive decisions.

AWS serverless technologies allow organizations to quickly build such data lakes without the overhead of managing complex infrastructure. In this workshop, you will learn how to use serverless technologies to quickly build data pipelines that get answers from all your data. You will also learn proven design patterns and architectures that help you scale your data lakes while keeping costs optimized.

Course Outline

DAY 1

Overview of Data Lakes

  • What Data Lakes are and their benefits
  • Data Lakes vs. Data Warehouses
  • Characteristics of a Data Lake

Module 1: Ingestion

  • Amazon Kinesis: Ingest data from real-time data sources
  • Overview of Kinesis Data Streams, Kinesis Data Firehose and Kinesis Data Analytics
  • More ways to bring data to AWS: AWS Database Migration Service, AWS Glue, AWS Storage Gateway, AWS Snowball, the AWS CLI and SDKs, and partner tools

Module 2: Collection and Storage

  • Amazon S3: Central storage for your Data Lake
  • Storage Classes and Lifecycle Policies
  • Best practices: File formats, partitioning, compression

Lab 1: Ingest real-time event data into Kinesis Data Firehose and deliver it to S3

Module 3: Building a Data Catalog

  • Why a Data Catalog?
  • Overview of AWS Glue Data Catalog
  • Integration with other AWS data services
  • Glue Crawlers and Data Sources

Lab 2: Build a Glue Data Catalog by crawling data

Quiz and Wrap Up

DAY 2

Module 4: Serverless Data Processing using AWS Glue

  • Job authoring using AWS Glue
  • Data Sources and Targets
  • Built-in Glue Transformations
  • Bring your own scripts and libraries
  • Job scheduling, execution and monitoring

Demo 1: Build and run a Glue job to process ingested data

Module 5: Serverless Analytics using Amazon Athena

  • Overview of Amazon Athena and its features
  • Athena best practices – file formats, data partitioning, compression

Lab 3: Query data in the S3 data lake using Athena

Module 6: Modern BI using Amazon QuickSight

  • Overview of Amazon QuickSight features
  • Supported data sources
  • SPICE: In-memory data store for faster analytics
  • Integrations with AWS data services

Lab 4: Visualize your data lake using QuickSight and Athena

Module 7: Securing your Data Lake on AWS

  • Shared Responsibility Model
  • S3 security best practices
  • Data Security best practices for Glue, Athena, QuickSight

Quiz and Course Wrap Up

  • Quiz
  • Summary of 3 days
  • Further learning resources

Course Duration

3 Days

Key Takeaways

  • Learn how to build Data Lakes on AWS using Serverless technologies
  • Build end-to-end data pipelines for ingestion, storage and analysis
  • Gain an in-depth understanding of best practices for scaling, performance and cost
  • Understand the benefits of centralized data lakes
  • Security and governance for data lakes

Key Services that you will learn

  • Amazon Kinesis
  • Amazon S3
  • AWS Glue
  • Amazon Athena
  • Amazon QuickSight

Prerequisites

  • Beginner-level knowledge of the AWS platform and key services such as EC2 and S3
  • Basic hands-on experience with the AWS Management Console is a plus
  • Prior experience with data analytics pipelines and data processing workflows is an added advantage

Intended Audience

  • Data Architects, developers and engineers
  • Solution Architects
  • Data platform owners who would like to learn the AWS platform

1CloudHub