How to transfer a DynamoDB table from one account to another?

This can be done with AWS Glue and IAM roles. Let’s assume we are transferring from account A to account B.

Define a role in A that adds B as a trusted entity allowed to read from DynamoDB.

Create a role, choose AWS account > Another AWS account as the trusted entity, and enter account B's ID. The resulting trust policy is similar to this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<ACCOUNT B ID 12 numbers>:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {}
        }
    ]
}
  • Add AmazonDynamoDBFullAccess as the permissions policy.
  • Name this role DynamoDBCrossAccessRole.

This role says that account B can read from account A’s DynamoDB.
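If you prefer to script this step instead of clicking through the console, a minimal boto3 sketch might look like the following. The account ID is a placeholder; run it with account A credentials.

import json
import boto3

ACCOUNT_B_ID = "111122223333"  # placeholder for account B's 12-digit ID

# Trust policy allowing principals in account B to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_B_ID}:root"},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="DynamoDBCrossAccessRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.attach_role_policy(
    RoleName="DynamoDBCrossAccessRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
)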

Create a policy in B that allows Glue to assume this role from A

{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::<ACCOUNT A ID>:role/DynamoDBCrossAccessRole"
    }
}
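Before wiring this into Glue, you can sanity-check the trust relationship from account B by assuming the role and reading the source table’s metadata. This is a rough sketch; the account ID and table name are placeholders, and it assumes your account B credentials are allowed to call sts:AssumeRole.

import boto3

ACCOUNT_A_ID = "444455556666"        # placeholder
FROM_TABLE_NAME = "my-source-table"  # placeholder

# Assume DynamoDBCrossAccessRole in account A
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn=f"arn:aws:iam::{ACCOUNT_A_ID}:role/DynamoDBCrossAccessRole",
    RoleSessionName="cross-account-check",
)["Credentials"]

# Use the temporary credentials to read table metadata in account A
dynamodb = boto3.client(
    "dynamodb",
    region_name="us-east-1",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(dynamodb.describe_table(TableName=FROM_TABLE_NAME)["Table"]["ItemCount"])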

Create a role for the Glue job

Create role > AWS Service > search and select Glue.

  • Add the AmazonDynamoDBFullAccess policy
  • Add the policy created in the previous step

This role will be used by the Glue job. It allows reading from account A and writing to account B.
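The same role can also be created with boto3. The sketch below follows the steps above; the role name GlueDynamoDBTransferRole is hypothetical, and the account ID is a placeholder. Run it with account B credentials.

import json
import boto3

ACCOUNT_A_ID = "444455556666"  # placeholder

# Trust policy so the Glue service can assume this role
glue_trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The policy from the previous step: allow assuming the role in account A
assume_cross_account = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": f"arn:aws:iam::{ACCOUNT_A_ID}:role/DynamoDBCrossAccessRole",
    },
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="GlueDynamoDBTransferRole",  # hypothetical name
    AssumeRolePolicyDocument=json.dumps(glue_trust),
)
iam.attach_role_policy(
    RoleName="GlueDynamoDBTransferRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
)
iam.put_role_policy(
    RoleName="GlueDynamoDBTransferRole",
    PolicyName="AssumeDynamoDBCrossAccessRole",
    PolicyDocument=json.dumps(assume_cross_account),
)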

Create the Glue job and run it

Use this script as the Glue job:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table in account A by assuming DynamoDBCrossAccessRole
dyf = glue_context.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.region": "us-east-1",
        "dynamodb.input.tableName": "<FROM_TABLE_NAME>",
        "dynamodb.sts.roleArn": "arn:aws:iam::<FROM_ACCOUNT_ID>:role/DynamoDBCrossAccessRole",
    },
)
dyf.show()

# Write into the destination table in account B (the account running the job)
glue_context.write_dynamic_frame_from_options(
    frame=dyf,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.region": "us-east-1",
        "dynamodb.output.tableName": "<TO_TABLE_NAME>",
        "dynamodb.throughput.write.percent": "1.0",
    },
)

job.commit()

Run the job

Create an empty table in account B with the same key schema as the table in account A.
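A minimal boto3 sketch for creating that empty destination table, assuming a simple string partition key named id (match whatever keys your source table actually uses):

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")
dynamodb.create_table(
    TableName="<TO_TABLE_NAME>",
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],  # placeholder key
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",  # On-demand capacity, as recommended below
)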

Run the job and wait.
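If you would rather create and start the job from code than from the console, a rough boto3 sketch follows. The job name, script location, and role name (the hypothetical GlueDynamoDBTransferRole from the sketch above, or whatever you named your Glue role) are placeholders.

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Register the job, pointing at the script uploaded to S3
glue.create_job(
    Name="dynamodb-cross-account-copy",       # placeholder job name
    Role="GlueDynamoDBTransferRole",           # the Glue role created earlier
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-scripts/copy_table.py",  # placeholder path
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
)

# Start a run and print its id so you can track it in the console
run = glue.start_job_run(JobName="dynamodb-cross-account-copy")
print(run["JobRunId"])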

This job takes a long time for bigger tables. I think it is limited by the scan speed of 25 rows per read; because of this, the write speed never goes above 20 WCU.

A 70 MB table with 2 million 35-byte items took 24 hours for me.

Be sure to set the capacity mode of both the input and output tables to On-demand.

A faster way would be to export the table to S3 and import it with a Glue job, which took only 20 minutes for the same table.
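One way to kick off that export is DynamoDB’s native export to S3, sketched below with boto3. This assumes point-in-time recovery is enabled on the source table; the table ARN and bucket name are placeholders, and the Glue import job is not shown.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Start an asynchronous export of the source table to S3
export = dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:<FROM_ACCOUNT_ID>:table/<FROM_TABLE_NAME>",
    S3Bucket="my-dynamodb-exports",  # placeholder bucket
    ExportFormat="DYNAMODB_JSON",
)
print(export["ExportDescription"]["ExportArn"])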

Warning!

Glue is a paid service, and a job that runs for 24 hours incurs considerable cost. For me, it cost about $150.

More info

AWS has a tutorial on the same topic: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-dynamo-db-cross-account.html. It also covers other use cases, such as cross-region transfers.