Parameter Store at Edge

Continuing the Serverless GraphQL at Edge journey, one of the optimizations to make is keeping configuration variables in Systems Manager (SSM) Parameter Store.  A parameter can only be created in a specific region, but with Lambda@Edge the function may run in a region other than the one where the parameter lives.  Retrieving the parameter from another region adds latency.

That is why a replication mechanism is needed, so the parameter can be read from within the same region where the Lambda@Edge function runs.

The Different Attempts

Several attempts were made before arriving at a good solution for SSM Parameter Store replication.  The advantages and disadvantages of each attempt are explained below.

EventBridge

The first attempt was to use EventBridge.  Whenever a parameter is created, updated, or deleted in the source region, the event would trigger a replication to the other regions.  This plays to EventBridge's strengths: it is event-driven, the SSM Parameter Store can be used as the event source, and an event pattern can be defined in the rule so it only fires when that pattern is matched.  The problem is knowing which regions the parameter should be replicated to.  Without that knowledge, rules would have to be set up for every region, which defeats the purpose of replicating on the fly.
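
For illustration, here is a minimal sketch of what such a rule could look like in CDK, assuming a hypothetical replication Lambda (replicateFn) is defined elsewhere; this is not taken from the project:

```typescript
import { Construct } from 'constructs';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Reacts to Parameter Store changes in the source region and hands them
// to a replication Lambda.  The open question remains: which regions
// should that Lambda replicate to?
export function addSsmChangeRule(scope: Construct, replicateFn: lambda.IFunction): events.Rule {
  const rule = new events.Rule(scope, 'SsmParameterChangeRule', {
    eventPattern: {
      source: ['aws.ssm'],
      detailType: ['Parameter Store Change'],
      detail: {
        operation: ['Create', 'Update', 'Delete'],
      },
    },
  });
  rule.addTarget(new targets.LambdaFunction(replicateFn));
  return rule;
}
```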

Viewer Request

To know which region to replicate to, another Lambda@Edge function can be attached as the Viewer Request handler.  Its responsibility is to create the necessary SSM parameter within the region when it does not yet exist.  The advantage of this approach is that the replication logic is separated from the Origin Request handler, which can then focus on the core functionality.  Unfortunately, it adds unnecessary overhead that results in additional latency, especially during a cold start.  Below is the result of the observation:

| | Viewer Request | Origin Request | Latency (user receives the response) |
| --- | --- | --- | --- |
| Cold Start (Parameter Store not yet in the edge location) | ~1.7s | ~1.7s | ~5s |
| Warmed Up | ~500ms | ~500ms | ~1.1s |

Here’s a sample Lambda@Edge handler that works with serverless-offline-edge-lambda: ssm-creation-edge-viewer
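
For illustration, a minimal sketch of what such a Viewer Request handler could look like (this is not the linked sample; the parameter name and source region are assumptions):

```typescript
import { SSMClient, GetParameterCommand, PutParameterCommand, ParameterNotFound } from '@aws-sdk/client-ssm';
import type { CloudFrontRequestEvent, CloudFrontRequest } from 'aws-lambda';

const SOURCE_REGION = 'us-east-1';            // assumed source region
const PARAMETER_NAME = '/graphql-bff/config'; // assumed parameter name

// Viewer Request handler: make sure the parameter exists in the edge region,
// then pass the request through untouched.
export const handler = async (event: CloudFrontRequestEvent): Promise<CloudFrontRequest> => {
  const edgeRegion = process.env.AWS_REGION ?? SOURCE_REGION;
  const edgeSsm = new SSMClient({ region: edgeRegion });

  try {
    await edgeSsm.send(new GetParameterCommand({ Name: PARAMETER_NAME }));
  } catch (err) {
    if (!(err instanceof ParameterNotFound)) throw err;

    // Not in the edge region yet: copy it over from the source region.
    const sourceSsm = new SSMClient({ region: SOURCE_REGION });
    const source = await sourceSsm.send(
      new GetParameterCommand({ Name: PARAMETER_NAME, WithDecryption: true })
    );
    await edgeSsm.send(
      new PutParameterCommand({
        Name: PARAMETER_NAME,
        Value: source.Parameter?.Value ?? '',
        Type: 'String',
        Overwrite: true,
      })
    );
  }

  // A Viewer Request handler must return the request for CloudFront to continue.
  return event.Records[0].cf.request;
};
```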

Based on this result, especially during a cold start, it seems that CloudFront introduces overhead when there are handlers for several events (Viewer Request and Origin Request).  Also, these handlers are executed sequentially, so their latencies add up.

Because of this, it is best to keep the logic within the Origin Request handler alone.  Below is the observation result:

| | Origin Request | Latency (user receives the response) | Init |
| --- | --- | --- | --- |
| Cold Start (Parameter Store not yet in the edge location) | ~2.6s | ~3.7s | ~750ms |
| Cold Start (Parameter Store is in the edge location) | ~2s | ~3s | ~700ms |
| Warmed Up | ~500ms | ~550ms | NA |

Comparing the two observations, there is a big difference in latency.  Overhead is eliminated when only the Origin Request handler manages the parameter retrieval from the source region and its creation within the edge location.  Once warmed up, the latency is mostly Origin Request processing time.  Although there is not much gain during a cold start between having the parameter already in the edge region and fetching it from the source region, a few hundred milliseconds of improvement is already good and might be useful later.

Parameter Store Cleanup

One thing to think about is not polluting the edge regions with unused parameters, since these are not managed in the infrastructure code (CDK).  SSM Parameter Store does offer an expiration policy, but that feature is only available in the Advanced Tier, which is not free.  Sticking to the Standard Tier, the only way is to introduce a scheduled task or job for Parameter Store cleanup.
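
For comparison, this is roughly what the Advanced Tier expiration policy would look like when writing a parameter (a sketch only; the parameter name is an assumption, and the project stays on the Standard Tier):

```typescript
import { SSMClient, PutParameterCommand } from '@aws-sdk/client-ssm';

// Advanced Tier only: attach an expiration policy so SSM deletes the
// parameter automatically after the given timestamp.
async function putExpiringParameter(): Promise<void> {
  const ssm = new SSMClient({});
  const expiresAt = new Date(Date.now() + 12 * 60 * 60 * 1000).toISOString();

  await ssm.send(
    new PutParameterCommand({
      Name: '/graphql-bff/config', // assumed parameter name
      Value: 'some-config-value',
      Type: 'String',
      Tier: 'Advanced',
      Overwrite: true,
      Policies: JSON.stringify([
        { Type: 'Expiration', Version: '1.0', Attributes: { Timestamp: expiresAt } },
      ]),
    })
  );
}
```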

There are two options for creating a scheduled job: a CloudWatch Rule and an EventBridge Rule.

To give further details, EventBridge is a serverless event bus built on top of the CloudWatch Events API, with additional features such as the ability to create custom event buses.  So, in terms of creating a scheduled job, there is no difference between a CloudWatch Rule and an EventBridge Rule.

Here’s a sample EventBridge Rule declaration in infrastructure code (CDK): EventBridge Scheduler declaration

The sample solution uses an EventBridge rule that is scheduled to execute, for example, every 12 hours.  Cleaning up at a fixed interval ensures that these parameters are deleted once they are no longer used or needed, and it also allows the edge region to pick up an up-to-date value of the parameter on the next request.
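
A minimal sketch of such a scheduled rule in CDK, assuming a cleanup Lambda (cleanupFn) is defined elsewhere in the stack (this is not the linked declaration):

```typescript
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Triggers the cleanup Lambda every 12 hours.
export function addCleanupSchedule(scope: Construct, cleanupFn: lambda.IFunction): events.Rule {
  const rule = new events.Rule(scope, 'ParameterCleanupSchedule', {
    schedule: events.Schedule.rate(Duration.hours(12)),
  });
  rule.addTarget(new targets.LambdaFunction(cleanupFn));
  return rule;
}
```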

The Final Solution

Based on the experiments done, the following components are needed to replicate the SSM parameters in the edge region:

  • Logic in the Lambda@Edge (the Origin Request handler) that will (sketched after this list):
    • check if the parameter already exists in the edge region
    • if it does not exist, retrieve the parameter from the source region
    • save the parameter in the edge region’s SSM
    • send an entry to SQS for cleanup later
  • A scheduled job using EventBridge that will trigger a Lambda function

The Lambda function will retrieve all the messages from the SQS queue and delete the corresponding SSM parameters in the edge regions.
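
Putting the edge-side pieces together, here is a minimal sketch of a replication helper that the Origin Request handler could call.  The parameter name, source region, and queue URL are assumptions, and since Lambda@Edge does not support custom environment variables, such values would typically be baked in at build time:

```typescript
import { SSMClient, GetParameterCommand, PutParameterCommand, ParameterNotFound } from '@aws-sdk/client-ssm';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const SOURCE_REGION = 'us-east-1'; // assumed source region
const CLEANUP_QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/param-cleanup'; // assumed queue

// Returns the parameter value, replicating it into the edge region on a miss
// and recording the copy in SQS so the scheduled job can delete it later.
export async function getEdgeParameter(name: string): Promise<string> {
  const edgeRegion = process.env.AWS_REGION ?? SOURCE_REGION;
  const edgeSsm = new SSMClient({ region: edgeRegion });

  // 1. Check if the parameter already exists in the edge region.
  try {
    const existing = await edgeSsm.send(new GetParameterCommand({ Name: name, WithDecryption: true }));
    return existing.Parameter?.Value ?? '';
  } catch (err) {
    if (!(err instanceof ParameterNotFound)) throw err;
  }

  // 2. It does not: retrieve the parameter from the source region.
  const sourceSsm = new SSMClient({ region: SOURCE_REGION });
  const source = await sourceSsm.send(new GetParameterCommand({ Name: name, WithDecryption: true }));
  const value = source.Parameter?.Value ?? '';

  // 3. Save it in the edge region's SSM.
  await edgeSsm.send(new PutParameterCommand({ Name: name, Value: value, Type: 'String', Overwrite: true }));

  // 4. Send an entry to SQS so the cleanup job knows where the replica lives.
  const sqs = new SQSClient({ region: SOURCE_REGION });
  await sqs.send(new SendMessageCommand({
    QueueUrl: CLEANUP_QUEUE_URL,
    MessageBody: JSON.stringify({ name, region: edgeRegion }),
  }));

  return value;
}
```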
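
And a sketch of the cleanup Lambda triggered by the EventBridge schedule, assuming each SQS message carries the parameter name and the edge region it was copied to (the queue URL is again an assumption):

```typescript
import { SQSClient, ReceiveMessageCommand, DeleteMessageBatchCommand } from '@aws-sdk/client-sqs';
import { SSMClient, DeleteParameterCommand, ParameterNotFound } from '@aws-sdk/client-ssm';

const CLEANUP_QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/param-cleanup'; // assumed queue

export const handler = async (): Promise<void> => {
  const sqs = new SQSClient({});

  // Drain the queue in batches of up to 10 messages (the SQS maximum).
  while (true) {
    const { Messages } = await sqs.send(new ReceiveMessageCommand({
      QueueUrl: CLEANUP_QUEUE_URL,
      MaxNumberOfMessages: 10,
    }));
    if (!Messages || Messages.length === 0) break;

    for (const message of Messages) {
      const { name, region } = JSON.parse(message.Body ?? '{}');
      const ssm = new SSMClient({ region });
      try {
        await ssm.send(new DeleteParameterCommand({ Name: name }));
      } catch (err) {
        // Already deleted: nothing left to do for this entry.
        if (!(err instanceof ParameterNotFound)) throw err;
      }
    }

    // Remove the processed messages from the queue.
    await sqs.send(new DeleteMessageBatchCommand({
      QueueUrl: CLEANUP_QUEUE_URL,
      Entries: Messages.map((m, i) => ({ Id: String(i), ReceiptHandle: m.ReceiptHandle! })),
    }));
  }
};
```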

Below is what the overall infrastructure looks like:

SSM Parameter Store Replication at Edge

Here is the version of the codebase: graphql-as-bff-v1.1.0
