Deleting and recreating a DynamoDB table daily with AWS Lambda (Python), with auto scaling

2018/12/05 2018/12/08

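This article is an entry in the AWS #2 Advent Calendar 2018.

As I also wrote in "Scraping night after night with Selenium, Headless Chrome, and AWS Lambda", I am developing an Alexa skill on the architecture shown above, and the source data is collected from a certain website.
The latest information as of each day is written to DynamoDB.
A full refresh is good enough, so I append a date string to the table name and rebuild the table every day.
Step Functions enforces the order: scraping starts only after the rebuild has finished.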

Creating the DynamoDB table itself needed no research and was done quickly, but enabling auto scaling for the capacity units took a little digging, so I am writing it down here.

Step Functions

For reference, here is the JSON for the Step Functions state machine.
It just runs the steps in sequence, so it is simple.

{
  "StartAt": "table_delete",
  "States": {
    "table_delete": {
      "Type": "Task",
      "Next": "table_create",
      "Resource": "arn:aws:lambda:ap-northeast-1:123456789012:function:table_delete"
    },
    "table_create": {
      "Type": "Task",
      "Next": "1_course",
      "Resource": "arn:aws:lambda:ap-northeast-1:123456789012:function:table_create"
    },
    "1_course": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-1:123456789012:function:1_course",
      "End": true
    }
  }
}
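
As a quick sanity check on the definition above, the following sketch follows the StartAt and Next links and lists the states in the order they would run. The helper name execution_order is made up for this illustration, and it only handles a linear chain of Task states like this one (no Choice or Parallel states).

```python
import json

# The state machine definition from the article, loaded from a JSON string.
DEFINITION = json.loads("""
{
  "StartAt": "table_delete",
  "States": {
    "table_delete": {
      "Type": "Task",
      "Next": "table_create",
      "Resource": "arn:aws:lambda:ap-northeast-1:123456789012:function:table_delete"
    },
    "table_create": {
      "Type": "Task",
      "Next": "1_course",
      "Resource": "arn:aws:lambda:ap-northeast-1:123456789012:function:table_create"
    },
    "1_course": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-1:123456789012:function:1_course",
      "End": true
    }
  }
}
""")


def execution_order(definition):
    """Follow StartAt/Next links and return the state names in run order."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        if name in order:
            raise ValueError("cycle detected at state %r" % name)
        order.append(name)
        state = definition["States"][name]
        # A state either ends the execution or names its successor.
        name = None if state.get("End") else state.get("Next")
    return order


print(execution_order(DEFINITION))
# ['table_delete', 'table_create', '1_course']
```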

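Now let's look at the Lambda code, starting with creation rather than deletion.
A table is deleted two days after it is created, so it is easier to follow if we first see what gets created.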

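Creating the table (with auto scaling)

At the time of writing, creating a DynamoDB table in the Management Console enables auto scaling by default.
However, creating a table through boto3, the SDK used from Lambda, does not do that for you.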

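This Lambda function sets its time zone to JST through an environment variable.
Specifically, the Lambda environment variable has the key TZ and the value Asia/Tokyo.

The Step Functions state machine is started as a CloudWatch Events target at 23:00 JST.
CloudWatch Events schedules are evaluated in UTC, which is 9 hours behind JST, so the cron expression is cron(0 14 * * ? *).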
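
The time zone arithmetic can be checked with a small sketch. The helper name daily_cron_utc is hypothetical; it converts a daily JST time to UTC and formats it in the six-field form that CloudWatch Events uses (minutes, hours, day of month, month, day of week, year).

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time, no daylight saving


def daily_cron_utc(hour_jst, minute=0):
    """Return a daily CloudWatch Events cron expression (UTC) for a JST time."""
    # The calendar date is arbitrary; only the time of day matters here.
    local = datetime(2018, 12, 5, hour_jst, minute, tzinfo=JST)
    utc = local.astimezone(timezone.utc)
    # Fields: minutes hours day-of-month month day-of-week year
    return "cron({m} {h} * * ? *)".format(m=utc.minute, h=utc.hour)


print(daily_cron_utc(23))
# cron(0 14 * * ? *)
```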

With that in mind, let's go through the code.

First, decide the name of the table to create.
The name includes tomorrow's date.

from datetime import datetime, timedelta

# TZ=Asia/Tokyo is set, so datetime.now() returns JST.
table_name = 'course_detail{tomorrow}'.format(
    tomorrow=(datetime.now() + timedelta(days=1)).strftime('%Y%m%d')
)
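
The snippet above depends on the TZ environment variable. As an alternative sketch, the date can be computed with an explicit JST offset so the result does not depend on the process time zone. The function name table_name_for and its days_offset parameter are assumptions made for this example, not part of the original Lambda.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))


def table_name_for(now, days_offset):
    """Build 'course_detailYYYYMMDD' for the JST date days_offset days from now."""
    target = now.astimezone(JST) + timedelta(days=days_offset)
    return "course_detail{date}".format(date=target.strftime("%Y%m%d"))


# 14:00 UTC on 2018-12-05 is 23:00 JST on the same day.
run_time = datetime(2018, 12, 5, 14, 0, tzinfo=timezone.utc)
print(table_name_for(run_time, 1))   # course_detail20181206 (table to create)
print(table_name_for(run_time, -1))  # course_detail20181204 (table to delete)
```

In the Lambda itself, now would be datetime.now(timezone.utc).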

Then create the DynamoDB table.

import boto3

dynamodb = boto3.client('dynamodb')
response = dynamodb.create_table(
    TableName=table_name,
    AttributeDefinitions=[
        {
            'AttributeName': 'course_code',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'row_no',
            'AttributeType': 'N'
        },
        {
            'AttributeName': 'venue',
            'AttributeType': 'S'
        },
    ],
    KeySchema=[
        {
            'AttributeName': 'course_code',
            'KeyType': 'HASH'
        },
        {
            'AttributeName': 'row_no',
            'KeyType': 'RANGE'
        }
    ],
    LocalSecondaryIndexes=[
        {
            'IndexName': 'venue_index',
            'KeySchema': [
                {
                    'AttributeName': 'course_code',
                    'KeyType': 'HASH'
                },
                {
                    'AttributeName': 'venue',
                    'KeyType': 'RANGE'
                }
            ],
            'Projection': {
                'ProjectionType': 'INCLUDE',
                'NonKeyAttributes': [
                    'date'
                ]
            }
        }
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 1,
        'WriteCapacityUnits': 100
    }
)

# Block until the table has been created.
waiter = dynamodb.get_waiter('table_exists')
waiter.wait(
    TableName=table_name
)

This creates a DynamoDB table with a local secondary index.
The waiter blocks until table creation has completed.
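
To make the waiting step less of a black box, here is a rough hand-written equivalent: poll describe_table until TableStatus becomes ACTIVE. The function wait_until_active is a hypothetical helper, and the defaults of 20 seconds and 25 attempts are my understanding of the waiter's defaults, so treat them as assumptions. With the real waiter, the same values can be overridden through WaiterConfig={'Delay': ..., 'MaxAttempts': ...}.

```python
import time


def wait_until_active(client, table_name, delay=20, max_attempts=25,
                      sleep=time.sleep):
    """Poll describe_table until the table is ACTIVE.

    Returns the number of describe_table calls that were needed.
    """
    for attempt in range(max_attempts):
        try:
            response = client.describe_table(TableName=table_name)
            status = response["Table"]["TableStatus"]
        except client.exceptions.ResourceNotFoundException:
            status = None  # table is not visible yet; keep polling
        if status == "ACTIVE":
            return attempt + 1
        if attempt < max_attempts - 1:
            sleep(delay)
    raise TimeoutError("table %s did not become ACTIVE" % table_name)


# Usage with the client from the article (requires AWS credentials):
# wait_until_active(dynamodb, table_name)
```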

Next, set up auto scaling on this table for RCU and WCU separately.

autoscaling_client = boto3.client('application-autoscaling')
response = autoscaling_client.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/{table_name}'.format(
        table_name=table_name
    ),
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    MinCapacity=1,
    MaxCapacity=100,
    RoleARN=ROLE_ARN
)

response = autoscaling_client.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/{table_name}'.format(
        table_name=table_name
    ),
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    MinCapacity=1,
    MaxCapacity=100,
    RoleARN=ROLE_ARN
)

Configuring auto scaling required the application-autoscaling client.
The read and write sides are registered separately, using dynamodb:table:ReadCapacityUnits and dynamodb:table:WriteCapacityUnits.
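
Because the two register_scalable_target calls differ only in ScalableDimension, they could also be driven from one loop. The sketch below builds the keyword arguments for both dimensions; scalable_target_requests is a hypothetical helper, and the role ARN in the example is a placeholder.

```python
DIMENSIONS = (
    "dynamodb:table:ReadCapacityUnits",
    "dynamodb:table:WriteCapacityUnits",
)


def scalable_target_requests(table_name, role_arn,
                             min_capacity=1, max_capacity=100):
    """Return one register_scalable_target keyword dict per dimension."""
    return [
        {
            "ServiceNamespace": "dynamodb",
            "ResourceId": "table/{table_name}".format(table_name=table_name),
            "ScalableDimension": dimension,
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
            "RoleARN": role_arn,
        }
        for dimension in DIMENSIONS
    ]


# Placeholder ARN, for illustration only.
requests = scalable_target_requests(
    "course_detail20181206", "arn:aws:iam::123456789012:role/example"
)
print([request["ScalableDimension"] for request in requests])

# Usage with the client from the article (requires AWS credentials):
# for request in scalable_target_requests(table_name, ROLE_ARN):
#     autoscaling_client.register_scalable_target(**request)
```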

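You might think that is all, but it is not.
At this point only the range to scale within has been defined; nothing says when to scale.
In other words, no scaling policy and no CloudWatch alarms have been set up yet.

That is configured next.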

percent_of_use_to_aim_for = 70.0
scale_out_cooldown_in_seconds = 60
scale_in_cooldown_in_seconds = 60

autoscaling_client.put_scaling_policy(
    ServiceNamespace='dynamodb',
    ResourceId='table/{table_name}'.format(
        table_name=table_name
    ),
    PolicyType='TargetTrackingScaling',
    PolicyName='{table_name}ReadCapacity'.format(
        table_name=table_name
    ),
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': percent_of_use_to_aim_for,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
        },
        'ScaleOutCooldown': scale_out_cooldown_in_seconds,
        'ScaleInCooldown': scale_in_cooldown_in_seconds
    }
)

autoscaling_client.put_scaling_policy(
    ServiceNamespace='dynamodb',
    ResourceId='table/{table_name}'.format(
        table_name=table_name
    ),
    PolicyType='TargetTrackingScaling',
    PolicyName='{table_name}WriteCapacity'.format(
        table_name=table_name
    ),
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': percent_of_use_to_aim_for,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization'
        },
        'ScaleOutCooldown': scale_out_cooldown_in_seconds,
        'ScaleInCooldown': scale_in_cooldown_in_seconds
    }
)

This uses put_scaling_policy on the application-autoscaling client.

With this, DynamoDB auto scaling is configured and the CloudWatch alarms have been created as well.
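
For a feel of what a 70% target means, here is a deliberately simplified model: the provisioned capacity that would put a given consumption at the target utilization, clamped to the registered minimum and maximum. The real service reacts through CloudWatch alarms and cooldown periods, so this is only back-of-the-envelope arithmetic, and capacity_for_target is a name invented for the example.

```python
def capacity_for_target(consumed_units, target_percent=70,
                        min_capacity=1, max_capacity=100):
    """Capacity at which consumed_units sits at target_percent utilization."""
    # Ceiling division with integers avoids floating point rounding.
    needed = -(-consumed_units * 100 // target_percent)
    return max(min_capacity, min(max_capacity, needed))


print(capacity_for_target(35))  # 50  (35 / 50 = 70%)
print(capacity_for_target(10))  # 15
print(capacity_for_target(0))   # 1   (clamped to MinCapacity)
print(capacity_for_target(90))  # 100 (clamped to MaxCapacity)
```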

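Deleting the table (and the CloudWatch alarms)

Next, the code that deletes the table.
If only the DynamoDB table is deleted and the alarms are left behind, they incur unnecessary cost.
So the CloudWatch alarms are deleted too, to make sure nothing is left over.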

table_name = 'course_detail{yesterday}'.format(
    yesterday=(datetime.now() - timedelta(days=1)).strftime('%Y%m%d')
)

Tables are kept for two days, so the table carrying yesterday's date is the one deleted.
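
The retention rule can be illustrated with a short sketch that lists which tables should exist once a given night's run has finished. It assumes the job has run on every previous night; the function names are made up for this example.

```python
from datetime import date, timedelta


def table_name(day):
    """Table name carrying the given date."""
    return "course_detail" + day.strftime("%Y%m%d")


def tables_after_run(run_day):
    """Tables expected to exist after the nightly run on run_day."""
    one_day = timedelta(days=1)
    # Before the run: yesterday's and today's tables, from earlier runs.
    existing = {table_name(run_day - one_day), table_name(run_day)}
    existing.discard(table_name(run_day - one_day))  # table_delete step
    existing.add(table_name(run_day + one_day))      # table_create step
    return sorted(existing)


print(tables_after_run(date(2018, 12, 5)))
# ['course_detail20181205', 'course_detail20181206']
```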

dynamodb = boto3.client('dynamodb')
response = dynamodb.delete_table(
    TableName=table_name
)

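This deletes the table.
The scalable targets that had been registered for the table were deleted along with it.
But that alone leaves the CloudWatch alarms behind.

So delete the auto scaling policies.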

autoscaling_client = boto3.client('application-autoscaling')
response = autoscaling_client.delete_scaling_policy(
    PolicyName='{table_name}ReadCapacity'.format(
        table_name=table_name
    ),
    ServiceNamespace='dynamodb',
    ResourceId='table/{table_name}'.format(
        table_name=table_name
    ),
    ScalableDimension='dynamodb:table:ReadCapacityUnits'
)

response = autoscaling_client.delete_scaling_policy(
    PolicyName='{table_name}WriteCapacity'.format(
        table_name=table_name
    ),
    ServiceNamespace='dynamodb',
    ResourceId='table/{table_name}'.format(
        table_name=table_name
    ),
    ScalableDimension='dynamodb:table:WriteCapacityUnits'
)
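
If the job is ever re-run for a day whose policies are already gone, delete_scaling_policy raises ObjectNotFoundException. A possible variation, sketched here, loops over both dimensions and skips policies that no longer exist. The helper delete_policies is hypothetical; the policy names follow the naming used when the policies were created.

```python
def delete_policies(autoscaling_client, table_name):
    """Delete both scaling policies; return the names actually deleted."""
    deleted = []
    for suffix, dimension in (
        ("ReadCapacity", "dynamodb:table:ReadCapacityUnits"),
        ("WriteCapacity", "dynamodb:table:WriteCapacityUnits"),
    ):
        policy_name = table_name + suffix
        try:
            autoscaling_client.delete_scaling_policy(
                PolicyName=policy_name,
                ServiceNamespace="dynamodb",
                ResourceId="table/" + table_name,
                ScalableDimension=dimension,
            )
        except autoscaling_client.exceptions.ObjectNotFoundException:
            continue  # nothing to delete for this dimension
        deleted.append(policy_name)
    return deleted


# Usage with the client from the article (requires AWS credentials):
# delete_policies(autoscaling_client, table_name)
```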