How to lower costs by automatically deleting and recreating HSMs

You can use AWS CloudHSM to help manage your encryption keys on FIPS 140-2 Level 3 validated hardware security modules (HSMs). AWS recommends running a high-availability production architecture with at least two CloudHSM HSMs in different Availability Zones. Although many workloads must be available 24/7, quality assurance or development environments typically do not have this requirement.

In this post, we show you how to automate the deletion and recreation of HSMs when you do not have a requirement for high availability. Using this approach of deleting HSMs and restoring them from backups on a predefined schedule can help lower your monthly CloudHSM costs. For more information on the CloudHSM backup process, see the CloudHSM cluster backup documentation.
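To get a rough sense of the potential savings, the arithmetic can be sketched as follows. This is only an illustration: it assumes HSMs are billed for the hours they exist, and it uses the weekday 7:30 to 19:30 UTC window that appears later in this post as its schedule. The function names are hypothetical, and no prices are assumed.

```python
# Hypothetical schedule: HSMs exist 07:30-19:30 UTC, Monday through Friday.
HOURS_PER_WEEK = 7 * 24


def weekly_hsm_hours(hours_per_day: float, days_per_week: int) -> float:
    """Hours per week during which one HSM exists under the schedule."""
    return hours_per_day * days_per_week


def fraction_saved(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of always-on HSM hours avoided by following the schedule."""
    return 1 - weekly_hsm_hours(hours_per_day, days_per_week) / HOURS_PER_WEEK


if __name__ == "__main__":
    running = weekly_hsm_hours(12, 5)
    print(f"{running:.0f} of {HOURS_PER_WEEK} hours per week")
    print(f"about {fraction_saved(12, 5):.0%} fewer HSM hours")
```

Under these assumptions, each HSM exists for 60 of the 168 hours in a week. Your actual savings depend on your own schedule and on current CloudHSM pricing.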

Prerequisites

Solution overview

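For this solution, you use AWS Lambda, Amazon CloudWatch Events, Amazon DynamoDB, and Amazon Simple Notification Service (Amazon SNS) to automate the process of deleting and restoring HSMs running non-production workloads. An AWS CloudFormation template deploys these resources.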

Figure 1: Architectural diagram


Here’s how the process works, as shown in Figure 1:

  1. At the scheduled time (we are using 7:30 PM UTC in this example), a CloudWatch Events rule triggers the DeleteHSM Lambda function.
  2. The DeleteHSM Lambda function stores the HSM metadata, such as IP address and Availability Zone, in a DynamoDB table, deletes the HSMs from the cluster, and sends an email notification.
  3. At the scheduled time (we are using 7:30 AM UTC in this example), another CloudWatch Events rule triggers the AddHSM Lambda function.
  4. The AddHSM Lambda function retrieves the HSM metadata from the DynamoDB table, creates the HSMs in the cluster with the same IP addresses and Availability Zones, and sends an email notification.

Note: In this solution, we use the same IP address when creating a new HSM, so you don’t need to modify the configuration files for the CloudHSM client instance connecting to the HSM.
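The metadata round trip in steps 2 and 4 can be sketched as two small mapping functions. This is a minimal illustration, not the deployed code: the function names and the sample values are hypothetical, while the field names (EniIp, AvailabilityZone, ClusterId, IpAddress) follow the Lambda functions shown later in this post.

```python
def hsm_to_item(hsm: dict) -> dict:
    """Keep only the fields needed to recreate an HSM later (step 2)."""
    return {
        "ClusterId": hsm["ClusterId"],
        "AvailabilityZone": hsm["AvailabilityZone"],
        "IpAddress": hsm["EniIp"],
    }


def item_to_create_args(item: dict) -> dict:
    """Turn a stored item into keyword arguments for create_hsm (step 4)."""
    return {
        "ClusterId": item["ClusterId"],
        "AvailabilityZone": item["AvailabilityZone"],
        "IpAddress": item["IpAddress"],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    hsm = {
        "ClusterId": "cluster-example",
        "AvailabilityZone": "us-west-1a",
        "EniIp": "10.0.0.10",
        "State": "ACTIVE",
    }
    print(item_to_create_args(hsm_to_item(hsm)))
```

Because the stored IpAddress is the EniIp of the deleted HSM, the recreated HSM is requested at the same address, which is why the client configuration does not need to change.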

Deployment steps

  1. To open the CloudFormation template, select the Launch Stack button below.

    Select button to launch stack

  2. Give your stack a name.
  3. Under Parameters, enter values for the following parameters based on your requirements:
    • ClusterId: The ID of the existing CloudHSM cluster that you wish to use.
    • CreateTime: The time when your HSMs should be created in UTC. This parameter must be a valid cron expression. The CreateTime shown in Figure 2 is 7:30 UTC Mon-Fri.
    • DeleteTime: The time when your HSMs should be deleted in UTC. This parameter must be a valid cron expression. The DeleteTime shown in Figure 2 is 19:30 UTC Mon-Fri.
    • EmailAddress: The email address that will be subscribed to the SNS topic for creation and deletion events for your HSMs.
    Figure 2: Specify the parameters in CloudHSM


  4. On the Specify stack details page, select Next, and then, on the Configure stack options page, select Next.
  5. On the Review page, check the box that says I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create stack, as shown in Figure 3.
    Figure 3: Check the box to acknowledge the conditions


  6. After you create the stack, the CloudFormation template automatically creates an SNS topic that notifies you when HSMs are deleted or created in your cluster. To receive these alerts, you must confirm your subscription to the topic: in the confirmation email sent to the address you entered for EmailAddress, select Confirm subscription, as shown in Figure 4.
    Figure 4: Select ‘Confirm subscription’ to subscribe to the SNS topic

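For the CreateTime and DeleteTime parameters in step 3, the following sketch builds a weekday schedule in the six-field cron format used by CloudWatch Events rules. The helper name is hypothetical, and whether the template expects the surrounding cron(...) wrapper or only the six fields is an assumption to check against the template itself.

```python
def weekday_schedule(hour: int, minute: int) -> str:
    """Build a CloudWatch Events cron expression for Mon-Fri at hour:minute UTC."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minutes, hours, day-of-month, month, day-of-week, year.
    # Day-of-month is '?' because day-of-week is specified.
    return f"cron({minute} {hour} ? * MON-FRI *)"


if __name__ == "__main__":
    print("CreateTime:", weekday_schedule(7, 30))
    print("DeleteTime:", weekday_schedule(19, 30))
```

Note that hours are expressed in 24-hour UTC time, so 7:30 PM corresponds to an hour value of 19.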

AWS resources deployed by the CloudFormation stack

When stack creation is complete, the template will have deployed the following AWS resources:

  1. An AWS Identity and Access Management (IAM) role named ClusterExecutionRole for the Lambda functions to use on each invocation. The IAM role uses the managed policy AWSLambdaBasicExecutionRole and an inline policy called CloudHSMPermissions. The managed policy provides the Lambda execution role permission to write CloudWatch Logs. The inline policy grants the role permission to delete an HSM, create an HSM, publish to an SNS topic, create a DynamoDB table, put items into the table, and retrieve items from the table. To learn more about CloudHSM permissions, see Predefined AWS Managed Policies for AWS CloudHSM. The CloudHSMPermissions inline policy is shown below.

    Note: The resource names in the policy shown below are examples. The template will update them to match resources in your account when the solution is deployed.

    
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DynamoDBPermissions",
          "Effect": "Allow",
          "Action": [
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:Query"
          ],
          "Resource": [
            "arn:aws:dynamodb:us-west-1:111122223333:table/DynamoDBTable-blogpost",
            "arn:aws:dynamodb:us-west-1:111122223333:table/DynamoDBTable-blogpost/*"
          ]
        },
        {
          "Sid": "CloudHSMPerClusterPermissions",
          "Effect": "Allow",
          "Action": [
            "cloudhsm:DeleteHsm",
            "cloudhsm:CreateHsm"
          ],
          "Resource": "arn:aws:cloudhsm:us-west-1:111122223333:cluster/cluster-id"
        },
        {
          "Sid": "DescribeCreatePermissions",
          "Effect": "Allow",
          "Action": [
            "cloudhsm:DescribeClusters",
            "ec2:DeleteNetworkInterface",
            "ec2:CreateNetworkInterface",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:CreateSecurityGroup",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeSubnets",
            "ec2:DescribeSecurityGroups"
          ],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:RequestedRegion": "us-west-1"
            }
          }
        },
        {
          "Sid": "SNSPermission",
          "Effect": "Allow",
          "Action": "sns:Publish",
          "Resource": "arn:aws:sns:us-west-1:111122223333:SNSTopicName"
        }
      ]
    }
    

  2. Two CloudWatch Events rules: one rule for creating the HSMs and one rule for deleting the HSMs. These rules are triggered at the times that you specified during creation of the stack.
  3. Two Lambda functions: AddHSM and DeleteHSM. You’ll learn more about what each function does in the following section.
  4. An SNS topic that notifies you when HSMs have been created or deleted.
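To illustrate how the example resource names in the CloudHSMPermissions policy map to your own account, the sketch below builds the three resource-scoped ARNs from their parts. The function name is hypothetical, and the actual template most likely performs this substitution within CloudFormation rather than in Python.

```python
def policy_resource_arns(region, account_id, cluster_id, table_name, topic_name):
    """Return the resource ARNs for each resource-scoped policy statement."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        # The table itself plus anything beneath it, as in the policy above.
        "DynamoDBPermissions": [table_arn, f"{table_arn}/*"],
        "CloudHSMPerClusterPermissions": (
            f"arn:aws:cloudhsm:{region}:{account_id}:cluster/{cluster_id}"
        ),
        "SNSPermission": f"arn:aws:sns:{region}:{account_id}:{topic_name}",
    }


if __name__ == "__main__":
    arns = policy_resource_arns(
        "us-west-1", "111122223333", "cluster-id",
        "DynamoDBTable-blogpost", "SNSTopicName",
    )
    for sid, resource in arns.items():
        print(sid, resource)
```

Scoping CreateHsm and DeleteHsm to a single cluster ARN means the Lambda functions cannot modify other clusters in the account.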

Detailed walkthrough of the Lambda functions

We’ll walk you through the code in the Lambda functions that get triggered to delete and recreate the HSMs at scheduled intervals.

Delete CloudHSMs Lambda function

At the scheduled time (7:30 PM UTC in this example), the ScheduledRuleDeleteHSM CloudWatch Events rule triggers the DeleteHSM Lambda function.

As shown below, the DeleteHSM Lambda function first reads the cluster ID, table name, and SNS topic ARN from environment variables, which are based on the parameters you entered when creating the CloudFormation stack. The first time the DeleteHSM Lambda function is triggered, it also creates a DynamoDB table.

The code then makes a DescribeClusters API call with the cluster ID provided in the CloudFormation stack to retrieve the HSM details, such as the IP address and Availability Zone of each HSM. These items are saved to the DynamoDB table for later use, and the HSMs in the cluster are deleted. All recipients subscribed to the SNS topic receive an SNS notification showing the details of the HSMs deleted from the cluster.


import boto3, sys, time
from datetime import datetime
from botocore.exceptions import ClientError
from os import environ

# Set by the CloudFormation stack as Lambda environment variables.
ClusterId = environ.get('ClusterId')
TableName = environ.get('TableName')
TopicARN = environ.get('TopicARN')

def lambda_handler(event, context):
    cloudhsm_client = boto3.client('cloudhsmv2')
    dynamodb_resource = boto3.resource('dynamodb')

    # First pass: save the metadata of every ACTIVE HSM to DynamoDB.
    try:
        response = cloudhsm_client.describe_clusters(Filters={'clusterIds': [ClusterId]})
        for item in response['Clusters'][0]['Hsms']:
            if item['State'] == 'ACTIVE' and item['ClusterId'] == ClusterId:
                table = dynamodb_resource.Table(TableName)
                table.put_item(Item={
                    'ClusterId': item['ClusterId'],
                    'AvailabilityZone': item['AvailabilityZone'],
                    'IpAddress': item['EniIp'],
                })
                print(item['AvailabilityZone'])
                print(item['ClusterId'])
                print(item['EniIp'])
                print(item['State'])
            else:
                print('HSMs in Cluster {} not in ACTIVE State'.format(ClusterId))
    except Exception as e:
        # Stop before deleting anything if the metadata could not be saved.
        print(e)
        sys.exit(1)

    time.sleep(5)

    # Second pass: delete each ACTIVE HSM and send a notification.
    response = cloudhsm_client.describe_clusters(Filters={'clusterIds': [ClusterId]})
    for item in response['Clusters'][0]['Hsms']:
        if item['State'] == 'ACTIVE' and item['ClusterId'] == ClusterId:
            print('Deleting HSMs {0} in Cluster {1}'.format(item['EniIp'], item['ClusterId']))
            response = cloudhsm_client.delete_hsm(ClusterId=item['ClusterId'], EniIp=item['EniIp'])
            try:
                sns_client = boto3.client('sns')
                message_subject = '[{timestamp}] Deleting HSMs From Cluster!'.format(timestamp=datetime.now().strftime('%b/%d/%Y %H:%M'))
                message_body = 'These are the details of the Deleted HSM:\n\nCluster ID: {ClusterId}\n\nHSM IP: {EniIp}\n\nAvailability Zone: {AvailabilityZone}\n\n'.format(**item)
                print('Sending SNS notification...')
                sns_response = sns_client.publish(TopicArn=TopicARN, Message=message_body, Subject=message_subject)
            except Exception as e:
                print('Exception: %s' % e)
        else:
            print('HSMs in Cluster {} not in Active State'.format(ClusterId))
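If you want to exercise the save-then-delete logic without touching a real cluster, one option is to pass the clients in as arguments and substitute stand-ins. The sketch below is a simplified variant, not the deployed function: delete_active_hsms, FakeCloudHsm, and FakeTable are hypothetical names, the sample values are placeholders, and unlike the function above it saves and deletes each HSM in a single pass and omits the SNS notification.

```python
def delete_active_hsms(cloudhsm_client, table, cluster_id):
    """Save metadata for each ACTIVE HSM, then delete it. Returns deleted IPs."""
    response = cloudhsm_client.describe_clusters(Filters={"clusterIds": [cluster_id]})
    deleted = []
    for hsm in response["Clusters"][0]["Hsms"]:
        if hsm["State"] != "ACTIVE":
            continue  # Skip HSMs that are being created or deleted.
        # Persist the metadata first so a failed write never loses an HSM.
        table.put_item(Item={
            "ClusterId": cluster_id,
            "AvailabilityZone": hsm["AvailabilityZone"],
            "IpAddress": hsm["EniIp"],
        })
        cloudhsm_client.delete_hsm(ClusterId=cluster_id, EniIp=hsm["EniIp"])
        deleted.append(hsm["EniIp"])
    return deleted


class FakeCloudHsm:
    """Stand-in exposing only the two client methods used above."""

    def __init__(self, hsms):
        self.hsms = hsms
        self.deleted = []

    def describe_clusters(self, Filters):
        return {"Clusters": [{"Hsms": list(self.hsms)}]}

    def delete_hsm(self, ClusterId, EniIp):
        self.deleted.append(EniIp)
        return {"HsmId": "hsm-example"}


class FakeTable:
    """Stand-in for a DynamoDB Table resource that records put_item calls."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)


if __name__ == "__main__":
    client = FakeCloudHsm([
        {"AvailabilityZone": "us-west-1a", "EniIp": "10.0.0.10", "State": "ACTIVE"},
        {"AvailabilityZone": "us-west-1b", "EniIp": "10.0.1.10", "State": "DELETE_IN_PROGRESS"},
    ])
    table = FakeTable()
    print(delete_active_hsms(client, table, "cluster-example"))
    print(table.items)
```

In a real Lambda function, you would pass boto3.client('cloudhsmv2') and a DynamoDB Table resource in place of the stand-ins.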

Create CloudHSMs Lambda function

At the scheduled time (7:30 AM UTC in this example), the AddHSM Lambda function is triggered by the CloudWatch Events rule ScheduledRuleAddHSM to add the HSMs back to the cluster. The code shown below retrieves the HSM details, including the IP address and Availability Zone of each HSM, from the DynamoDB table. Next, the HSMs are created with the same IP addresses in the same Availability Zones. Because the same IP addresses are used, you don't need to make configuration changes on the CloudHSM client instances connecting to the HSMs. The CloudHSM daemon installed on the client instance reconnects automatically to the HSMs as soon as they become active. All recipients subscribed to the SNS topic receive an SNS notification showing the details of the HSMs created.


import boto3
from datetime import datetime
from os import environ
from boto3.dynamodb.conditions import Key

# Set by the CloudFormation stack as Lambda environment variables.
ClusterId = environ.get('ClusterId')
TableName = environ.get('TableName')
TopicARN = environ.get('TopicARN')

def lambda_handler(event, context):
    # Retrieve the HSM metadata saved by the DeleteHSM function.
    dynamodb_resource = boto3.resource('dynamodb')
    table = dynamodb_resource.Table(TableName)
    resp = table.query(KeyConditionExpression=Key('ClusterId').eq(ClusterId))

    for item in resp['Items']:
        try:
            # Recreate the HSM with the same IP address and Availability Zone.
            cloudhsm_client = boto3.client('cloudhsmv2')
            response = cloudhsm_client.create_hsm(ClusterId=ClusterId, AvailabilityZone=item['AvailabilityZone'], IpAddress=item['IpAddress'])
            try:
                sns_client = boto3.client('sns')
                message_subject = '[{timestamp}] Adding HSMs To Cluster!'.format(timestamp=datetime.now().strftime('%b/%d/%Y %H:%M'))
                message_body = 'These are the details of the Newly Created HSM:\n\nCluster ID: {ClusterId}\n\nHSM IP: {IpAddress}\n\nAvailability Zone: {AvailabilityZone}\n\n'.format(**item)
                print('Sending SNS notification...')
                sns_response = sns_client.publish(TopicArn=TopicARN, Message=message_body, Subject=message_subject)
            except Exception as e:
                print('Exception: %s' % e)
        except Exception as e:
            # Log the cause and continue with the remaining HSMs.
            print('Failure Adding HSM to Cluster {}: {}'.format(ClusterId, e))
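Because each HSM is created in its own try block, one failed request does not stop the others. If you want the function to report which HSMs could not be recreated, one possible variation collects the outcomes instead of only printing them. This is a sketch under assumptions: recreate_hsms and FlakyCloudHsm are hypothetical names, the sample values are placeholders, and the stand-in client exists only to simulate a failure.

```python
def recreate_hsms(cloudhsm_client, items, cluster_id):
    """Try to create one HSM per stored item; return (created, failed) lists."""
    created, failed = [], []
    for item in items:
        try:
            cloudhsm_client.create_hsm(
                ClusterId=cluster_id,
                AvailabilityZone=item["AvailabilityZone"],
                IpAddress=item["IpAddress"],
            )
            created.append(item["IpAddress"])
        except Exception as error:
            # Record the failure and continue with the remaining items.
            failed.append((item["IpAddress"], str(error)))
    return created, failed


class FlakyCloudHsm:
    """Stand-in client that rejects one specific IP address."""

    def __init__(self, bad_ip):
        self.bad_ip = bad_ip

    def create_hsm(self, ClusterId, AvailabilityZone, IpAddress):
        if IpAddress == self.bad_ip:
            raise RuntimeError("simulated failure")
        return {"Hsm": {"EniIp": IpAddress}}


if __name__ == "__main__":
    items = [
        {"AvailabilityZone": "us-west-1a", "IpAddress": "10.0.0.10"},
        {"AvailabilityZone": "us-west-1b", "IpAddress": "10.0.1.10"},
    ]
    created, failed = recreate_hsms(FlakyCloudHsm("10.0.1.10"), items, "cluster-example")
    print("created:", created)
    print("failed:", failed)
```

The failed list could then be included in the SNS notification so that a missing HSM is noticed before the environment is needed.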

Note: If you also decide to schedule the stop and start of your client instances, you must ensure that the CloudHSM client daemon starts automatically when the instances boot so that the connection to your CloudHSM cluster resumes.

Conclusion

In this post, you learned an approach to help lower your monthly CloudHSM costs for environments that don’t need to be running 24/7. You learned how to achieve this cost savings by using scheduled CloudWatch Events rules to trigger Lambda functions that delete and recreate the CloudHSMs in your cluster on a specified schedule without modifying client configuration files.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS CloudHSM forum or contact AWS Support.


Author

David Ogunmola

David is a Security Engineer at AWS. He enjoys the culture at Amazon because it aligns with his dedication to lifelong learning. He holds an MS in Cyber Security from the University of Nebraska. Outside of work, he loves watching soccer and experiencing new cultures.

Author

Gabriel Santamaria

Gabriel is a Senior Technical Account Manager at AWS. He holds an MS in Information Technology from George Mason University, as well as the AWS Solutions Architect Professional, DevOps Professional, and Security Specialist certifications. In his free time he enjoys spending time with his family catching up on the latest TV shows and is an avid fan of board games.