AWS Glue Write to S3 KMS Encrypted Bucket
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. AWS Key Management Service (KMS) provides secure, centralized management of the encryption keys used to protect your data. Writing data from AWS Glue to an S3 bucket encrypted with KMS is a common requirement in data processing pipelines. This setup ensures that your data at rest in S3 is encrypted using keys that you control, adding an extra layer of security to your data.
Table of Contents#
- Core Concepts
- AWS Glue
- Amazon S3
- AWS KMS
- S3 KMS Encryption
- Typical Usage Scenarios
- Common Practice
- Prerequisites
- Setting up an S3 KMS Encrypted Bucket
- Configuring AWS Glue to Write to the Encrypted Bucket
- Best Practices
- Conclusion
- FAQ
- References
Article#
Core Concepts#
AWS Glue#
AWS Glue is a serverless ETL service. It automatically discovers your data, stores metadata in a Data Catalog, and generates the code needed to transform your data. You can use AWS Glue to read data from various sources, perform transformations, and write the processed data to different destinations, including S3.
Amazon S3#
Amazon S3 is a highly scalable object storage service. It allows you to store and retrieve any amount of data at any time from anywhere on the web. S3 buckets are used to organize and store objects, which can be files, images, videos, etc.
AWS KMS#
AWS Key Management Service (KMS) is a managed service that makes it easy for you to create and control the encryption keys used to encrypt your data. KMS uses hardware security modules (HSMs) to protect the security of your keys. You can use KMS to generate, manage, and rotate encryption keys for use with AWS services like S3.
S3 KMS Encryption#
S3 supports several encryption options, including server-side encryption with AWS KMS keys (SSE-KMS). When you enable SSE-KMS as a bucket's default encryption, S3 uses a KMS key (formerly called a customer master key, or CMK) to protect your data at rest. Each object is encrypted with a unique data encryption key (DEK), and that DEK is itself encrypted under the KMS key and stored with the object, a pattern known as envelope encryption.
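To see how this surfaces on an individual object, the sketch below reads the encryption fields that S3 returns for a HeadObject request. The helper names `summarize_sse` and `head_and_summarize` are illustrative assumptions, not part of any AWS library; the response keys (`ServerSideEncryption`, `SSEKMSKeyId`, `BucketKeyEnabled`) are the ones boto3 returns from `head_object`.

```python
from typing import Optional


def summarize_sse(head_response: dict) -> dict:
    """Summarize the encryption fields of an S3 HeadObject-style response."""
    algorithm: Optional[str] = head_response.get("ServerSideEncryption")
    return {
        # "aws:kms" is SSE-KMS; "aws:kms:dsse" is the dual-layer variant
        "encrypted_with_kms": algorithm in ("aws:kms", "aws:kms:dsse"),
        "algorithm": algorithm,
        "kms_key_id": head_response.get("SSEKMSKeyId"),
        "bucket_key_enabled": bool(head_response.get("BucketKeyEnabled", False)),
    }


def head_and_summarize(bucket: str, key: str) -> dict:
    """Fetch object metadata from S3 and summarize it (needs AWS credentials)."""
    import boto3  # imported lazily so summarize_sse works without the SDK

    s3 = boto3.client("s3")
    return summarize_sse(s3.head_object(Bucket=bucket, Key=key))
```

Calling `head_and_summarize("your-encrypted-bucket", "path/to/data/part-0000.parquet")` after a job run (the object key here is a placeholder) is one way to confirm that an output file was encrypted with the expected KMS key.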
Typical Usage Scenarios#
- Data Security: When dealing with sensitive data such as customer personal information, financial data, or healthcare records, encrypting the data at rest in S3 using KMS provides an extra layer of security.
- Compliance Requirements: Many industries have regulatory requirements for data encryption. Writing data from AWS Glue to an S3 KMS encrypted bucket helps meet these compliance standards.
- Data Governance: Organizations may want to have more control over the encryption keys used to protect their data. KMS allows you to manage and audit the use of encryption keys.
Common Practice#
Prerequisites#
- An AWS account with appropriate permissions to create and manage AWS Glue jobs, S3 buckets, and KMS keys.
- Basic knowledge of AWS Glue, S3, and KMS services.
Setting up an S3 KMS Encrypted Bucket#
- Create a KMS Key:
  - Navigate to the AWS KMS console.
  - Click "Create key".
  - Select the key type. S3 server-side encryption requires a symmetric encryption key; asymmetric keys are not supported for SSE-KMS.
  - Provide a key alias and description for easy identification.
  - Configure the key policy to define who can use the key and for what purposes.
- Create an S3 Bucket:
  - Go to the Amazon S3 console.
  - Click "Create bucket".
  - Provide a globally unique bucket name and choose a region. Use the same region as the KMS key.
  - In the "Default encryption" section, select the AWS KMS option (SSE-KMS) and choose the KMS key you created earlier.
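The same console steps can be sketched with boto3. This is a minimal sketch under stated assumptions: the function names, the key description, and the decision to enable S3 Bucket Keys are illustrative choices rather than requirements from this article, and the key is created with the default key policy.

```python
def build_default_encryption(kms_key_arn: str, bucket_key: bool = True) -> dict:
    """Build the ServerSideEncryptionConfiguration for an SSE-KMS bucket default."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Bucket Keys reduce the number of requests S3 makes to KMS
                "BucketKeyEnabled": bucket_key,
            }
        ]
    }


def create_encrypted_bucket(bucket: str, region: str, alias: str) -> str:
    """Create a symmetric KMS key, an alias, and a bucket that defaults to it."""
    import boto3  # imported lazily so the builder above works without the SDK

    kms = boto3.client("kms", region_name=region)
    s3 = boto3.client("s3", region_name=region)

    key = kms.create_key(
        Description="Key for Glue output bucket",  # placeholder description
        KeyUsage="ENCRYPT_DECRYPT",
        KeySpec="SYMMETRIC_DEFAULT",
    )["KeyMetadata"]
    kms.create_alias(AliasName=f"alias/{alias}", TargetKeyId=key["KeyId"])

    # us-east-1 rejects an explicit LocationConstraint
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )

    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=build_default_encryption(key["Arn"]),
    )
    return key["Arn"]
```

Keeping the configuration builder separate from the AWS calls makes it easy to inspect or unit-test the encryption settings before applying them to a real bucket.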
Configuring AWS Glue to Write to the Encrypted Bucket#
- Create an AWS Glue Job:
  - Navigate to the AWS Glue console.
  - Click "Jobs" and then "Add job".
  - Provide a job name, IAM role (make sure the role has permissions to access the S3 bucket and the KMS key), and specify the data source and transformation steps.
- Modify the Job Script:
  - In the job script (e.g., PySpark), you can use the following code to write data to the S3 bucket:
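```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Set up the Spark and Glue contexts for the job
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init("your_job_name", {})

# Sample DataFrame standing in for your transformed data
df = spark.createDataFrame([(1, "John"), (2, "Jane")], ["id", "name"])

# Write Parquet files to the encrypted bucket; S3 applies the bucket's
# default encryption (SSE-KMS) to each object as it is uploaded
s3_path = "s3://your-encrypted-bucket/path/to/data"
df.write.parquet(s3_path)

job.commit()
```
- Permissions:
  - The IAM role associated with the AWS Glue job should have at least the following permissions. Spark-based jobs may also need `s3:GetObject`, `s3:DeleteObject`, and `s3:ListBucket` on the bucket to stage and commit their output: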
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": "arn:aws:s3:::your-encrypted-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:your-region:your-account-id:key/your-kms-key-id"
    }
  ]
}
```
Best Practices#
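Beyond the bucket's default encryption, it can help to have the Glue job itself request SSE-KMS through a Glue security configuration, so that the intended key is stated explicitly on the job. The sketch below is one hedged way to set that up with boto3. The function names, the arguments passed in, and the choice of Glue version 4.0 are assumptions for illustration, not values taken from this article.

```python
def build_s3_encryption_config(kms_key_arn: str) -> dict:
    """Build an EncryptionConfiguration asking Glue to write S3 output with SSE-KMS."""
    return {
        "S3Encryption": [
            {"S3EncryptionMode": "SSE-KMS", "KmsKeyArn": kms_key_arn}
        ]
    }


def create_job_with_security_config(
    job_name: str,
    role_arn: str,
    script_location: str,
    kms_key_arn: str,
    config_name: str,
) -> None:
    """Create a security configuration and a Glue job that uses it."""
    import boto3  # imported lazily so the builder above works without the SDK

    glue = boto3.client("glue")
    glue.create_security_configuration(
        Name=config_name,
        EncryptionConfiguration=build_s3_encryption_config(kms_key_arn),
    )
    glue.create_job(
        Name=job_name,
        Role=role_arn,
        Command={
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,  # s3:// path to the job script
            "PythonVersion": "3",
        },
        GlueVersion="4.0",  # assumed version; use the one your jobs target
        SecurityConfiguration=config_name,
    )
```

The remaining practices apply whether or not a security configuration is used: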
- Key Rotation: Regularly rotate your KMS keys to enhance security. You can configure automatic key rotation in the AWS KMS console.
- Monitoring and Auditing: Use AWS CloudTrail to monitor and audit the use of KMS keys and S3 bucket operations. This helps detect any unauthorized access or misuse of encryption keys.
- Least Privilege Principle: Apply the principle of least privilege when configuring IAM roles for AWS Glue jobs. Only grant the minimum permissions necessary for the job to access the S3 bucket and use the KMS key.
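Two of these practices can be checked in code. The following is a minimal sketch, not an official tool: `ensure_key_rotation` turns on automatic rotation when it is off, and `find_broad_statements` flags IAM statements that use a bare `*` for actions or resources. Both function names are hypothetical, and the KMS client is passed in as a parameter so the logic can be exercised with a stand-in object.

```python
def ensure_key_rotation(kms_client, key_id: str) -> bool:
    """Enable automatic rotation if it is off; return True if a change was made."""
    status = kms_client.get_key_rotation_status(KeyId=key_id)
    if status.get("KeyRotationEnabled"):
        return False
    kms_client.enable_key_rotation(KeyId=key_id)
    return True


def find_broad_statements(policy: dict) -> list:
    """Return Allow statements whose Action or Resource is a bare '*'."""
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        resources = stmt.get("Resource", [])
        actions = stmt.get("Action", [])
        # IAM allows either a single string or a list; normalize to lists
        resources = [resources] if isinstance(resources, str) else resources
        actions = [actions] if isinstance(actions, str) else actions
        if "*" in resources or "*" in actions:
            flagged.append(stmt)
    return flagged
```

With real credentials, `ensure_key_rotation(boto3.client("kms"), "your-kms-key-id")` would apply the rotation check to an actual key. The policy check is intentionally narrow: it only catches a bare wildcard, so prefixed patterns such as `kms:ReEncrypt*` are not flagged.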
Conclusion#
Writing data from AWS Glue to an S3 bucket encrypted with KMS strengthens the security of your data at rest. With the core concepts, typical usage scenarios, common practices, and best practices covered above, you can implement this setup in your data processing pipelines to protect sensitive data and help meet regulatory requirements.
FAQ#
Q: Can I use an existing KMS key for S3 bucket encryption? A: Yes, you can use an existing KMS key when creating or modifying an S3 bucket's default encryption settings.
Q: What if the KMS key is disabled or deleted? A: While the key is disabled, S3 cannot use it, so writes of new objects fail and existing objects encrypted with it cannot be read until the key is re-enabled. Deletion is more serious: KMS enforces a waiting period of 7 to 30 days during which the deletion can be cancelled, but once the key is deleted, objects encrypted under it can no longer be decrypted or recovered.
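A job or a pre-flight script can check the key's state before writing. The sketch below interprets the response of the KMS DescribeKey call; the helper names `key_is_usable` and `check_key` are assumptions for illustration.

```python
def key_is_usable(describe_key_response: dict) -> bool:
    """Return True only when the key is in the Enabled state."""
    metadata = describe_key_response.get("KeyMetadata", {})
    # Other states include "Disabled" and "PendingDeletion"
    return metadata.get("KeyState") == "Enabled"


def check_key(key_id: str) -> bool:
    """Look up a key in KMS and report whether it is usable (needs AWS credentials)."""
    import boto3  # imported lazily so key_is_usable works without the SDK

    kms = boto3.client("kms")
    return key_is_usable(kms.describe_key(KeyId=key_id))
```

The role running this check needs `kms:DescribeKey` on the key, which the example policy in this article already grants.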
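Q: Do I need to pay extra for using KMS with S3? A: Yes. AWS KMS charges a monthly fee for each customer managed key and a per-request fee for API calls, including the data key and decrypt requests that S3 makes on your behalf. Enabling S3 Bucket Keys can reduce the number of KMS requests and therefore the request cost.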
References#
- AWS Glue Documentation: https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html
- Amazon S3 Documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
- AWS KMS Documentation: https://docs.aws.amazon.com/kms/latest/developerguide/overview.html