This can be done with AWS Glue and IAM roles. Let’s assume we are transferring from account A to account B.
Define a role in account A that adds account B as a trusted entity allowed to read from DynamoDB.
Create the role with trusted entity type AWS account > Another AWS account, and enter account B's ID. The resulting trust policy is similar to this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<ACCOUNT B ID 12 numbers>:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {}
        }
    ]
}
- Add AmazonDynamoDBFullAccess as the permission.
- Name this role DynamoDBCrossAccessRole.
This role says that account B can read from account A's DynamoDB tables.
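If you prefer scripting to the console, the same role can be created with boto3; a minimal sketch, run with account A credentials (the trust policy mirrors the one above):

import json
import boto3

iam = boto3.client("iam")  # uses account A credentials

# Trust policy letting account B assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<ACCOUNT B ID>:root"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName="DynamoDBCrossAccessRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)
iam.attach_role_policy(
    RoleName="DynamoDBCrossAccessRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess"
)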
Create a policy in account B that allows Glue to assume this role in account A:
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::<ACCOUNT A ID>:role/DynamoDBCrossAccessRole"
    }
}
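This policy can also be created from code with account B credentials; a minimal sketch (the policy name AssumeDynamoDBCrossAccessRole is my own, hypothetical choice):

import json
import boto3

iam = boto3.client("iam")  # uses account B credentials

policy_doc = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::<ACCOUNT A ID>:role/DynamoDBCrossAccessRole"
    }
}

# Hypothetical policy name; attach the resulting policy to the Glue role in the next step.
iam.create_policy(
    PolicyName="AssumeDynamoDBCrossAccessRole",
    PolicyDocument=json.dumps(policy_doc)
)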
Create a role for the Glue job
Create role > AWS Service > search for and select Glue.
- Add the AmazonDynamoDBFullAccess policy.
- Add the policy created in step 2.
This role will be used by the Glue job. It allows reading from account A and writing to account B.
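Before creating the Glue job, you can sanity-check the cross-account setup by assuming the role from account B and reading one item from the source table; a minimal sketch, assuming the same placeholders as above:

import boto3

# Assume the cross-account role using account B credentials.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::<ACCOUNT A ID>:role/DynamoDBCrossAccessRole",
    RoleSessionName="cross-account-test"
)["Credentials"]

# Read from account A's table with the temporary credentials.
dynamodb = boto3.client(
    "dynamodb",
    region_name="us-east-1",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"]
)
print(dynamodb.scan(TableName="<FROM_TABLE_NAME>", Limit=1))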
Create the Glue job and run it
Use this script as the Glue job:
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the table in account A; Glue assumes the cross-account role for the scan.
dyf = glue_context.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.region": "us-east-1",
        "dynamodb.input.tableName": "<FROM_TABLE_NAME>",
        "dynamodb.sts.roleArn": "arn:aws:iam::<FROM_ACCOUNT_ID>:role/DynamoDBCrossAccessRole"
    }
)
dyf.show()

# Write to the table in account B, where the job runs.
glue_context.write_dynamic_frame_from_options(
    frame=dyf,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.region": "us-east-1",
        "dynamodb.output.tableName": "<TO_TABLE_NAME>",
        "dynamodb.throughput.write.percent": "1.0"
    }
)
job.commit()
Run the job
Create an empty table in account B with the same schema as the table in account A.
Run the job and wait.
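The destination table can also be created from code; a minimal sketch, assuming a simple schema with a single string hash key named id (hypothetical; match your source table's actual key schema):

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")  # account B credentials

dynamodb.create_table(
    TableName="<TO_TABLE_NAME>",
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST"  # On-demand capacity, as recommended below
)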
This job takes a long time for bigger tables. I think it is limited by the scan speed of 25 rows per read; because of this, the write speed never goes above 20 WCU.
A 70 MB table with 2M entries of about 35 bytes each took 24 hours for me.
Be sure to set the Capacity mode of both the input and output tables to On-demand.
A faster way is to export the table to S3 and import it with a Glue job, which took only 20 minutes for the same table.
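For reference, the export itself is a single API call; a minimal sketch, assuming point-in-time recovery is enabled on the source table and <EXPORT_BUCKET> is a placeholder bucket name:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Requires point-in-time recovery (PITR) to be enabled on the source table.
dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:<FROM_ACCOUNT_ID>:table/<FROM_TABLE_NAME>",
    S3Bucket="<EXPORT_BUCKET>",
    ExportFormat="DYNAMODB_JSON"
)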
Warning!
Glue is a paid service, and a job that runs for 24 hours incurs considerable costs. For me, that 24h job cost $150.
More info
AWS tutorial about the same topic: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-dynamo-db-cross-account.html. It includes some other use cases like cross-region transfers.