Data security management with AWS Big Data PaaS

With the explosion of data, many cloud platform providers now offer Big Data as a service, which makes the security of that data critically important. Are the platform provider's out-of-the-box security controls good enough? This post explores a scenario where the out-of-the-box platform security is augmented with custom security measures to safeguard customer data.

Data security is one of the key factors in an enterprise's decision to move to the cloud, especially for enterprises that handle their customers' data under an SLA. Today's cloud services offer standard security as the default option, which is quick and simple to implement. But when it comes to using the cloud for Big Data processing, customers' information security teams want the best custom-controlled data protection possible on top of the default mechanisms.

One of our customers, who was running Hadoop data processing on IaaS to handle high-velocity streaming data from IoT devices, engaged us to architect a PaaS-based solution.

Data Security requirement

The focus of the project was to migrate the workload and validate both functionality and performance, within a six-week timeline. The Information Security team asked that all customer data be encrypted with AES-256 using a custom managed key, and that transport encryption be applied to any access from outside the VPC. The challenge was to implement the requested data security with the least possible impact on the migration schedule.
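To make the requirement concrete, here is a minimal sketch of how it might be expressed for an S3 bucket: a bucket policy that denies any request not made over TLS and any upload not encrypted with the designated KMS key. This is an illustration, not the policy used in the project; the function name, bucket name and key ARN are hypothetical placeholders.

```python
import json


def s3_bucket_policy(bucket, kms_key_arn):
    """Build a bucket policy dict that enforces TLS and a specific KMS key.

    `bucket` and `kms_key_arn` are placeholders supplied by the caller.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not use HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that are not encrypted with the chosen key.
                "Sid": "DenyWrongEncryptionKey",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption-aws-kms-key-id": kms_key_arn
                    }
                },
            },
        ],
    }


# Hypothetical values, for illustration only.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
example_bucket_policy = s3_bucket_policy("example-data-bucket", EXAMPLE_KEY_ARN)
example_bucket_policy_json = json.dumps(example_bucket_policy, indent=2)
```

SSE-KMS encrypts objects with 256-bit AES, so pinning uploads to one key in this way covers both the AES-256 and the custom-key parts of the requirement for data stored in S3.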

Our solution approach

Our architecture used the AWS PaaS services EMR, S3, RDS and Redshift for data processing and data warehousing. All of them support encryption, and all can make encrypted data available across regions.

We addressed these requirements with an AWS KMS customer master key (CMK) that we manage ourselves, together with IAM. Encryption was enabled for the data processed in the cloud within a day, without affecting any ongoing migration activities.

[Figure: data security on AWS]

Custom Key with controlled access

AWS KMS provides an efficient and secure way to create and manage the keys that encrypt data. It also integrates with services such as Redshift, EMR, EC2, S3 and RDS.

However, a customer managed CMK gives more control than an AWS managed key.

With a customer managed CMK, we can define our own usage policies and access controls for each key, as the Information Security team requested.
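As an illustration of per-key access control, the sketch below builds a KMS key policy that separates key administration from key usage. It is an assumption about how such a policy might look, not the policy from the project; the account ID and role names are hypothetical.

```python
import json


def build_key_policy(account_id, admin_role, user_roles):
    """Build a KMS key policy dict with separate admin and usage statements.

    All arguments are placeholders: an AWS account ID, the name of the role
    allowed to administer the key, and the roles allowed to use it.
    """

    def role_arn(role):
        return f"arn:aws:iam::{account_id}:role/{role}"

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Administrators manage the key but cannot encrypt or decrypt.
                "Sid": "AllowKeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn(admin_role)},
                "Action": [
                    "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*",
                    "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*",
                    "kms:Get*", "kms:Delete*",
                    "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
                ],
                "Resource": "*",
            },
            {
                # Service roles may use the key for cryptographic operations.
                "Sid": "AllowKeyUsage",
                "Effect": "Allow",
                "Principal": {"AWS": [role_arn(r) for r in user_roles]},
                "Action": [
                    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                    "kms:GenerateDataKey*", "kms:DescribeKey",
                ],
                "Resource": "*",
            },
        ],
    }


# Hypothetical account and role names, for illustration only.
example_key_policy = build_key_policy(
    "111122223333", "KeyAdminRole", ["EmrEc2Role", "RedshiftLoadRole"]
)
example_key_policy_json = json.dumps(example_key_policy, indent=2)
```

Many setups also keep a statement for the account root principal so that the key cannot become unmanageable; whether to include one is a design choice left out of this sketch.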

Data Security with Amazon Elastic MapReduce (EMR), S3, RDS & Redshift

  1. An EMR security configuration that encrypts data at rest with the customer managed CMK: SSE-KMS for EMRFS data stored in S3, and LUKS for local disks.
  2. An IAM role for the EMR EC2 instances with access to the customer managed CMK.
  3. RDS configured to encrypt data at rest with the customer managed CMK.
  4. Redshift configured to encrypt data at rest with the customer managed CMK.
  5. An IAM role for Redshift with access to the customer managed CMK used for S3 encryption, for data load and unload.
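The first two items above can be sketched as follows: an EMR security configuration document that turns on at-rest encryption for both EMRFS (SSE-KMS) and local disks, and an IAM policy that lets a role use the key. This is a minimal sketch under assumptions; the function names and key ARN are hypothetical, and in-transit encryption is left off because it needs additional certificate settings not covered here.

```python
import json


def emr_security_configuration(kms_key_arn):
    """Build an EMR security configuration dict for at-rest encryption."""
    return {
        "EncryptionConfiguration": {
            # In-transit encryption needs certificate settings; omitted here.
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                # EMRFS data written to S3 is encrypted with SSE-KMS.
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "SSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Local volumes are encrypted with the same KMS key.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
        }
    }


def kms_usage_policy(kms_key_arn):
    """Build an IAM policy dict that allows a role to use one KMS key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                    "kms:GenerateDataKey*", "kms:DescribeKey",
                ],
                "Resource": kms_key_arn,
            }
        ],
    }


# Hypothetical key ARN, for illustration only.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
example_emr_config_json = json.dumps(emr_security_configuration(EXAMPLE_KEY_ARN))
example_role_policy_json = json.dumps(kms_usage_policy(EXAMPLE_KEY_ARN))
```

With boto3, the first JSON string could be passed as the `SecurityConfiguration` argument of the EMR client's `create_security_configuration` call, and the second attached to the EMR EC2 instance role or to the Redshift role used for load and unload.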

Data reliability with security

  1. S3 cross-region replication, with an equivalent KMS key in the destination region, since KMS keys are regional.
  2. Redshift cross-region snapshot copy, with an equivalent KMS key in the destination region.
  3. RDS cross-region read replica, with an equivalent KMS key in the destination region.
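For the S3 case, the idea of pairing replication with a key in the destination region can be sketched as a replication configuration document. This is an illustrative sketch, not the project's configuration; the rule ID, role ARN, bucket name and key ARN are hypothetical.

```python
import json


def s3_replication_configuration(replication_role_arn, dest_bucket, dest_kms_key_arn):
    """Build an S3 replication configuration dict for SSE-KMS encrypted objects.

    The destination key must live in the destination bucket's region.
    """
    return {
        "Role": replication_role_arn,
        "Rules": [
            {
                "ID": "replicate-encrypted-objects",
                "Status": "Enabled",
                "Priority": 1,
                # An empty prefix applies the rule to the whole bucket.
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Opt in to replicating objects encrypted with SSE-KMS.
                "SourceSelectionCriteria": {
                    "SseKmsEncryptedObjects": {"Status": "Enabled"}
                },
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    # Replicas are re-encrypted with the destination region's key.
                    "EncryptionConfiguration": {"ReplicaKmsKeyID": dest_kms_key_arn},
                },
            }
        ],
    }


# Hypothetical values, for illustration only.
example_replication = s3_replication_configuration(
    "arn:aws:iam::111122223333:role/ExampleReplicationRole",
    "example-data-bucket-replica",
    "arn:aws:kms:us-west-2:111122223333:key/example-replica-key-id",
)
example_replication_json = json.dumps(example_replication, indent=2)
```

With boto3, a dict of this shape could be passed as `ReplicationConfiguration` to the S3 client's `put_bucket_replication` call; versioning has to be enabled on both buckets first. Redshift snapshot copy and RDS read replicas take the destination-region key through their own settings.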

Had we implemented this on-premises or on IaaS, it would have taken more than a week and caused considerable rework. With our PaaS-based design, the built-in data security mechanisms helped us achieve the required security posture for data protection with negligible impact on the ongoing project and production operations.

Written by:
Geetha Pandiyan (LinkedIn)
Cloud Solution Architect