Migrating from On-Premises Teradata to AWS Redshift Serverless

As data volumes grow exponentially, traditional data warehousing solutions often struggle to meet the increasing demands for scalability, performance, and advanced analytics. 

Shifting to Amazon Redshift provides organizations with the potential for superior price-performance, enhanced data processing, faster query speeds, and seamless integration with technologies like machine learning (ML) and artificial intelligence (AI). 

As organizations face increasing pressure to modernize their data infrastructure and harness the power of advanced analytics, migrating from traditional on-premises systems to cloud-based solutions has become a critical step. One key decision is transitioning from legacy platforms like Teradata to modern cloud data warehouses such as Amazon Redshift. This guide explores the motivations behind this shift and the benefits it offers. 

The migration from Teradata to Redshift enables organizations to overcome the limitations of traditional systems by improving scalability, performance, and cost-efficiency. Using AWS tools like the Schema Conversion Tool (SCT), we aim to seamlessly convert the schema, code, and data to ensure compatibility with Redshift while maintaining data accuracy and minimizing disruption to business operations. 

Why migrate from Teradata to Amazon Redshift?

Cost-Efficiency: Redshift eliminates the high operational costs associated with Teradata. While Teradata can also be used in the cloud, Redshift is more cost-effective because it is a native AWS service, offering flexible pricing and better integration with other AWS services to optimize cost for transitive workloads. 

Performance and Scalability: Redshift improved performance and scalability cater to growing data volumes and user loads effectively. 

Vendor Independence: Migration liberates organizations from vendor lock-in, allowing strategic decision-making based on their unique needs. 

As part of your modern data architecture transition strategy, the migration goal of a new Amazon Redshift based platform is to use the scalability, performance, cost-optimization, and additional lake house capabilities of Amazon Redshift, resulting in improving the existing data consumption experience.

Key Differences between Teradata and Redshift 

Feature  Teradata  Redshift 
Architecture  MPP (Massively Parallel Processing), shared-nothing (on-prem or cloud)  Cloud-native MPP (Massively Parallel Processing) (AWS) 
Scalability  Scales well but requires more planning  Elastic cloud scalability 
Storage  Row-based (traditionally)  Columnar storage 
Performance  High-performance on-prem  Optimized for cloud, with Spectrum 
Cost  Higher TCO (especially on-prem)  Generally lower with pay-as-you-go 
Administration  More hands-on  Fully managed by AWS 
Cloud Ecosystem  Can be deployed in the cloud  Deep AWS integration 

 Assessment and Planning stage: 

The main achievement of this migration is the data is for archival process, allowing users to access data with minimal usage during business hours. 

Since the project involves an archival process and pertains to ETL (Extract, Transform, Load), we plan to conduct the migration in two phases: Static Schema and Online Schema. 

This migration does not involve a Change Data Capture (CDC) method, and the AWS DMS service is not supported. Instead, the data was manually migrated using the AWS Schema Conversion Tool (SCT). 

ETL jobs are executed on both Static and Online schemas. Static schemas refer to tables that do not undergo changes, while Online schemas include tables that experience transactional changes. 

Utilize AWS SCT for schema conversion during the migration process.  

Consider using AWS S3 for efficient staging and ETL workflows.  

Develop robust data validation and performance testing to be done while comparing with the source and target database. 

Cutover of the production is separated in two phases like migrating all the Static schemas(tables, views and stored procedures)

The architecture diagram below, which outlines the flow for planning the migration from on-premises Teradata to AWS Redshift.

 

Pre-migration stage: 

Analyze the Source Environment: Assess the data volume of multi-terabyte, along with the schema complexity and workload dependencies. Identify key tables, views, stored procedures, and functions to be migrated. 

Evaluate Infrastructure Needs: Establish performance benchmarks and estimate the compute and storage requirements for AWS Redshift. Ensure that the network capacity can support smooth data transfers without bottlenecks. 

Optimize Data and Schema: Archive unused or obsolete data to reduce the migration load. Review and simplify the Teradata schema to avoid unnecessary complexity during the migration. 

Tool for Migration: Use AWS SCT for schema conversion, planning for potential manual adjustments. Consider AWS S3 for staging data and managing ETL workflows. 

Create a Migration Roadmap: Plan for comprehensive testing and validation to ensure data integrity and optimal performance post-migration. 

Migration Phase: 

The migration phase is where the actual transfer of data and schema takes place. During this stage, leveraging the right tools and strategies is crucial to ensure that the migration happens efficiently, with minimal downtime and no data loss. We migrate the data of the two phase – Static and Online schema   

Schema Conversion with AWS SCT: Use AWS SCT to convert the Teradata schema to Redshift. Manually adjust any Teradata-specific features that AWS SCT cannot automatically convert.  

For large tables we used Virtual partioning option to migrate the data using date column filter. 

Validate the converted schema to ensure accuracy. 

Data Migration: Use SCT custom ETL process to load data into Redshift. Consider staging large datasets in AWS S3 before loading into Redshift for efficient processing. 

Data Validation: Conduct validation checks to ensure data integrity and accuracy after migration. Perform data quality audits to verify that all data has been migrated correctly and completely. 

Validation Phase:

Validate (Compare row count of every table) the data in the Redshift database against the source Teradata database to ensure consistency.

Cut-over phase:

Plan a maintenance window to perform the final data migration and switch over to Amazon Redshift as the production database.

Update application connection strings and configurations to point to the new Redshift cluster.

Optimization Strategies for Teradata to Redshift Migration:

Leverage Columnar Storage and Compression: Redshift uses columnar storage, which is highly efficient for analytical queries. Take advantage of Redshift’s automatic data compression to reduce storage costs and improve query performance by minimizing I/O.

Optimize Data Distribution and Sort Keys: To enhance performance, strategically define distribution keys based on frequently joined tables and apply sort keys to optimize query patterns. This helps in reducing data movement during query execution and speeds up the process.

Utilize Redshift Spectrum for Data Lakes: For large, infrequently accessed datasets, consider using Redshift Spectrum to query data directly from Amazon S3 without having to load it into Redshift. This optimizes both storage costs and performance for big data queries.

Concurrency Scaling: To manage variable workloads efficiently, enable Redshift’s concurrency scaling feature. It automatically adds more capacity during peak times, ensuring high performance without over-provisioning resources during regular loads.

Workload Management (WLM) Tuning: Use Redshift’s workload management (WLM) to prioritize queries and optimize system resources. By defining different queues for various workloads, you can allocate system resources effectively, improving query throughput and response times.

Automatic Table Maintenance: Enable automatic table vacuuming and analyze functions to optimize query performance. Regularly vacuum and analyze your tables to remove outdated rows and update query statistics.

Monitor with Amazon CloudWatch and Redshift Advisor: Use CloudWatch for real-time monitoring of your cluster’s performance and Redshift Advisor to get automated insights and recommendations on optimizing your database setup, schema design, and workload execution.

Conclusion:

Migrating from on-premises Teradata to AWS Redshift serverless provides a transformative opportunity for organizations to modernize their data infrastructure, reduce costs, and unlock new analytical capabilities. By leveraging Redshift’s cloud-native architecture, advanced performance features, and seamless integration with AWS services, organizations can achieve superior scalability, faster insights, and more efficient data management. As you embark on this migration journey, careful planning, optimization, and continuous monitoring will ensure a successful transition that maximizes the benefits of a cloud-first data strategy.

Contact 1CloudHub today for expert guidance on automating your infrastructure, reducing costs, and enhancing performance. Let’s elevate your cloud strategy!

Written by

Sujay V

Lead Database Administrator

Rakesh Kumar K

Associate Database Administrator

Sharing is caring!

In Blog
Subscribe to our Newsletter1CloudHub