This is a Serverless application that scans a given DynamoDB table and inserts every item into a Kinesis Stream. You can then process the Kinesis stream, allowing you to perform an operation on all existing items in a DynamoDB table.
It was inspired by a tweet from Eric Hammond:
"We really want to run some code against every item in a DynamoDB table.
Surely there's a sample project somewhere that scans a DynamoDB table, feeds records into a Kinesis Data Stream, which triggers an AWS Lambda function?
We can scale DynamoDB and Kinesis manually. https://t.co/ZyAiLfLpWh"
-- Eric Hammond (@esh), February 5, 2019
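The Lambda function at the end of that pipeline, the one triggered by the Kinesis stream, is yours to write. As a rough sketch, a Python consumer might decode each record as shown here. It assumes the scanner writes each DynamoDB item to the stream as a JSON document, which this README does not specify, and `process_item` is a hypothetical placeholder for your own per-item logic.

```python
import base64
import json


def process_item(item):
    """Hypothetical per-item operation; replace with your own logic."""
    return item


def handler(event, context):
    """Decode each Kinesis record in the batch and process the item it carries."""
    processed = []
    for record in event["Records"]:
        # Lambda delivers the Kinesis payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the scanner wrote each DynamoDB item as a JSON document.
        item = json.loads(payload)
        processed.append(process_item(item))
    return {"processed": len(processed)}
```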
This project uses the Serverless Framework to deploy a Lambda function and associated AWS resources.
To use it, follow these steps:
- Install the Framework and create your service:

      # Make sure you have the Serverless Framework installed
      $ npm install -g serverless
      $ sls create --template-url https://github.com/alexdebrie/serverless-dynamodb-scanner --path serverless-dynamodb-scanner
      $ cd serverless-dynamodb-scanner
- Update the configuration in serverless.yml. Add the ARN of the DynamoDB table you want to scan and the ARN of the Kinesis stream where you want the items inserted:

      # serverless.yml
      custom:
        dynamodbTableArn: 'arn:aws:dynamodb:us-east-1:123456789012:table/my_table'
        kinesisStreamArn: 'arn:aws:kinesis:us-east-1:123456789012:stream/my-stream'
      ...
- Deploy your service:

      $ sls deploy
- When you're ready, kick off your scan by invoking the function:

      $ sls invoke -f scanner
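The ARNs configured above carry the table and stream names in their final field. Whether this project parses them this way is not stated here; as a minimal sketch, a function that needs the bare names could split an ARN like this. `resource_name_from_arn` is a hypothetical helper, not part of the project.

```python
def resource_name_from_arn(arn):
    """Return the resource name from an ARN such as
    arn:aws:dynamodb:us-east-1:123456789012:table/my_table."""
    # An ARN has six colon-separated fields; the last is "<type>/<name>" here.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    _resource_type, _, name = parts[5].partition("/")
    if not name:
        raise ValueError(f"ARN has no resource name: {arn!r}")
    return name
```

With the sample values from serverless.yml, this gives `my_table` for the table ARN and `my-stream` for the stream ARN.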
The basic workflow is as follows:
A diagram that's missing a few steps ¯\_(ツ)_/¯
- Inside the Lambda function, check AWS SSM for a LastEvaluatedKey parameter to send with our Scan call. If no parameter exists in SSM, we're just starting the scan.
- Make a Scan call to our DynamoDB table.
- Insert the items returned from our Scan into our Kinesis Stream via a PutRecords call.
- If the Scan call did not return a LastEvaluatedKey, our Scan is done! We can exit the function.
- If the Scan did return a LastEvaluatedKey, store the value in SSM.
- Do a time check: if our function has less than 15 seconds of execution time left, we'll invoke another instance of our function and exit the loop for this one. Otherwise, we loop back and make the next Scan call. Our next function will pick up where our scan left off by using the LastEvaluatedKey in SSM.
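The workflow above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's actual handler: the client arguments are assumed to be boto3 clients (for example `boto3.client("dynamodb")`), `PARAM_NAME` is a hypothetical SSM parameter name, items are assumed to be JSON-serializable (no binary attributes), and the random partition key is just one way to spread records across shards.

```python
import json
import uuid

PARAM_NAME = "/dynamodb-scanner/last-evaluated-key"  # hypothetical SSM parameter name
MIN_REMAINING_MS = 15_000  # hand off when under 15 seconds remain
MAX_PUT_RECORDS = 500      # PutRecords accepts at most 500 records per call


def load_last_key(ssm):
    """Step 1: return the saved LastEvaluatedKey, or None if the scan is just starting."""
    try:
        response = ssm.get_parameter(Name=PARAM_NAME)
    except ssm.exceptions.ParameterNotFound:
        return None
    return json.loads(response["Parameter"]["Value"])


def put_items(kinesis, stream_name, items):
    """Step 3: write items to the stream in batches that PutRecords will accept."""
    for start in range(0, len(items), MAX_PUT_RECORDS):
        records = [
            {"Data": json.dumps(item).encode("utf-8"), "PartitionKey": uuid.uuid4().hex}
            for item in items[start:start + MAX_PUT_RECORDS]
        ]
        # A fuller version would check FailedRecordCount in the response and retry.
        kinesis.put_records(StreamName=stream_name, Records=records)


def run_scan(dynamodb, kinesis, ssm, lambda_client, context, table_name, stream_name):
    """Loop over Scan pages until the table is exhausted or time runs short."""
    last_key = load_last_key(ssm)
    while True:
        kwargs = {"TableName": table_name}
        if last_key is not None:
            kwargs["ExclusiveStartKey"] = last_key
        page = dynamodb.scan(**kwargs)                          # step 2
        put_items(kinesis, stream_name, page.get("Items", []))  # step 3

        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return "done"                                       # step 4
        ssm.put_parameter(Name=PARAM_NAME, Value=json.dumps(last_key),
                          Type="String", Overwrite=True)        # step 5

        if context.get_remaining_time_in_millis() < MIN_REMAINING_MS:  # step 6
            # Asynchronously start the next instance, then stop this one.
            lambda_client.invoke(FunctionName=context.function_name,
                                 InvocationType="Event")
            return "handed off"
```

Passing the clients in as arguments keeps the loop easy to exercise with stand-in objects before pointing it at a real table.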