Wednesday, December 28, 2016

DynamoDB restore using Data Pipeline with a newer EMR version

We faced an interesting issue here.

We had a task to restore a DynamoDB table from a backup stored in S3. The backup had been taken with AWS Data Pipeline. We had been doing restores this way for a long time, but this time the EMR cluster could not be provisioned and failed during bootstrapping.
Digging into it, we found that the AWS DynamoDB import template for Data Pipeline uses the default subnet of the default VPC. We don't have an Internet Gateway on this VPC, and since EMR needs one to work, the pipeline was failing with an "internal error".

We started talking with AWS support. After some discussion, we switched the cluster to our private subnet. EMR provisioning failed again, because the DynamoDB restore template provided by AWS uses AMI version 3.9.0, which does not support private subnets.

So we decided to replace AMI version 3.9.0 with the release label "emr-4.5.0", which we have been using for all our EMR clusters so far. It failed again, this time with the error:
Unable to create resource for @EmrClusterForLoad_2016-12-28T20:33:21 due to: The supplied bootstrap action(s): 'bootstrap-action.7237c1e1-31de-4c02-ae68-c546dd581732' are not supported by release 'emr-4.5.0'. (Service: AmazonElasticMapReduce; Status Code: 400; Error Code: ValidationException; Request ID: e8be350e-cd3c-11e6-8e60-cb10b4c3228c)

That is, the bootstrap action in the template provided by AWS is not supported by EMR release label emr-4.5.0. To overcome the problem, we had to modify the EmrCluster bootstrap action in the pipeline definition, which was:
s3://#{myDDBRegion}.elasticmapreduce/bootstrap-actions/configure-hadoop, --mapred-key-value,mapreduce.map.speculative=false

This bootstrap action works on AMI 3.9.0 but not on release label emr-4.5.0. For emr-4.5.0, the same setting has to be supplied as configuration properties instead, as follows:
--
        {
            "configuration": {
                "ref": "EmrConfigurationId_XXWNE"
            },
            "releaseLabel": "emr-4.5.0",
            "type": "EmrCluster",
            ...
        },
        {
            "property": {
                "ref": "PropertyId_3ghq7"
            },
            "type": "EmrConfiguration",
            "id": "EmrConfigurationId_XXWNE",
            "classification": "mapred-site",
            "name": "DefaultEmrConfiguration1"
        },
        {
            "key": "mapreduce.map.speculative",
            "type": "Property",
            "id": "PropertyId_3ghq7",
            "value": "false",
            "name": "DefaultProperty1"
        },
--
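
The translation from the old bootstrap-action string to the new objects is mechanical, so it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_to_objects`, the generated ids and names, and the one-entry flag table are made up for this example and are not part of Data Pipeline. Only the `--mapred-key-value` to `mapred-site` mapping comes from the change described above.

```python
import json

# Assumption: only the flag used in this post is mapped. Other flags raise KeyError.
FLAG_TO_CLASSIFICATION = {"--mapred-key-value": "mapred-site"}


def bootstrap_to_objects(bootstrap_action):
    """Turn 'script, --flag,key=value[, ...]' into EmrConfiguration and Property objects."""
    parts = [p.strip() for p in bootstrap_action.split(",")]
    args = parts[1:]  # parts[0] is the S3 path of the configure-hadoop script
    if len(args) % 2:
        raise ValueError("expected flag,key=value pairs after the script path")

    configs = {}  # classification -> EmrConfiguration object
    props = []    # Property objects, in input order
    for n, (flag, pair) in enumerate(zip(args[0::2], args[1::2]), start=1):
        classification = FLAG_TO_CLASSIFICATION[flag]
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {pair!r}")

        prop_id = f"PropertyId_{n}"  # hypothetical id scheme
        props.append({"id": prop_id, "name": prop_id, "type": "Property",
                      "key": key, "value": value})

        if classification not in configs:
            config_id = f"EmrConfigurationId_{len(configs) + 1}"  # hypothetical id scheme
            configs[classification] = {"id": config_id, "name": config_id,
                                       "type": "EmrConfiguration",
                                       "classification": classification,
                                       "property": []}
        configs[classification]["property"].append({"ref": prop_id})

    return list(configs.values()) + props


if __name__ == "__main__":
    old = ("s3://#{myDDBRegion}.elasticmapreduce/bootstrap-actions/configure-hadoop, "
           "--mapred-key-value,mapreduce.map.speculative=false")
    print(json.dumps(bootstrap_to_objects(old), indent=2))
```

The EmrCluster object still has to reference the generated EmrConfiguration id through its "configuration" field, and the "bootstrapAction" field has to be removed by hand.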

We then exported the pipeline definition and added the configuration above. The final pipeline definition looked like this:

{
  "objects": [
    {
      "property": [
        {
          "ref": "PropertyId_3ghq7"
        }
      ],
      "name": "DefaultEmrConfiguration1",
      "id": "EmrConfigurationId_XXWNE",
      "type": "EmrConfiguration",
      "classification": "mapred-site"
    },
    {
      "output": {
        "ref": "DDBDestinationTable"
      },
      "input": {
        "ref": "S3InputDataNode"
      },
      "maximumRetries": "1",
      "name": "TableLoadActivity",
      "step": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbImport,#{input.directoryPath},#{output.tableName},#{output.writeThroughputPercent}",
      "runsOn": {
        "ref": "EmrClusterForLoad"
      },
      "id": "TableLoadActivity",
      "type": "EmrActivity",
      "resizeClusterBeforeRunning": "true"
    },
    {
      "subnetId": "subnet-xxxxxxx",
      "name": "EmrClusterForLoad",
      "coreInstanceCount": "1",
      "coreInstanceType": "m3.xlarge",
      "releaseLabel": "emr-4.5.0",
      "id": "EmrClusterForLoad",
      "masterInstanceType": "m3.xlarge",
      "region": "#{myDDBRegion}",
      "type": "EmrCluster",
      "terminateAfter": "23 Hours",
      "configuration": {
        "ref": "EmrConfigurationId_XXWNE"
      }
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "PSS-BDP-QA-DataPipelineDefaultResourceRole",
      "pipelineLogUri": "s3://pss-bdp-qa-logfiles/datapipeline-logs/PSS-BDP-SQA-Dynamodb-Import-1/",
      "role": "PSS-BDP-DataPipelineDefaultRole",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "writeThroughputPercent": "#{myDDBWriteThroughputRatio}",
      "name": "DDBDestinationTable",
      "id": "DDBDestinationTable",
      "type": "DynamoDBDataNode",
      "tableName": "#{myDDBTableName}"
    },
    {
      "directoryPath": "#{myInputS3Loc}",
      "name": "S3InputDataNode",
      "id": "S3InputDataNode",
      "type": "S3DataNode"
    },
    {
      "key": "mapreduce.map.speculative",
      "type": "Property",
      "id": "PropertyId_3ghq7",
      "value": "false",
      "name": "DefaultProperty1"
    }
  ],
  "parameters": [
    {
      "description": "Input S3 folder",
      "id": "myInputS3Loc",
      "type": "AWS::S3::ObjectKey"
    },
    {
      "description": "Target DynamoDB table name",
      "id": "myDDBTableName",
      "type": "String"
    },
    {
      "default": "0.25",
      "watermark": "Enter value between 0.1-1.0",
      "description": "DynamoDB write throughput ratio",
      "id": "myDDBWriteThroughputRatio",
      "type": "Double"
    },
    {
      "default": "us-east-1",
      "watermark": "us-east-1",
      "description": "Region of the DynamoDB table",
      "id": "myDDBRegion",
      "type": "String"
    }
  ],
  "values": {
    "myDDBRegion": "us-east-1",
    "myDDBTableName": "TABLE_TEST",
    "myDDBWriteThroughputRatio": "1",
    "myInputS3Loc": "s3://my-dynamobackup/TABLE_TEST_201609/2016-12-22-22-55-57"
  }
}
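
When a definition like this is edited by hand, it is easy to leave a "ref" pointing at an id that does not exist, for example a mistyped EmrConfigurationId_XXWNE. A small check such as the sketch here can catch that before the definition is imported back. It assumes the definition is a JSON document with a top-level "objects" list like the one above; `find_dangling_refs` is a hypothetical helper name, not an AWS API.

```python
def find_dangling_refs(definition):
    """Return (owner_id, missing_ref) pairs for refs that match no object id."""
    objects = definition.get("objects", [])
    known_ids = {obj["id"] for obj in objects}
    dangling = []

    def walk(node, owner):
        # Refs can be nested in dicts ("configuration") or lists ("property").
        if isinstance(node, dict):
            ref = node.get("ref")
            if isinstance(ref, str) and ref not in known_ids:
                dangling.append((owner, ref))
            for value in node.values():
                walk(value, owner)
        elif isinstance(node, list):
            for item in node:
                walk(item, owner)

    for obj in objects:
        walk(obj, obj["id"])
    return dangling


if __name__ == "__main__":
    # Tiny stand-in for an exported definition, with one deliberately broken ref.
    sample = {"objects": [
        {"id": "EmrClusterForLoad",
         "configuration": {"ref": "EmrConfigurationId_XXWNE"}},
        {"id": "EmrConfigurationId_XXWNE",
         "property": [{"ref": "PropertyId_typo"}]},
        {"id": "PropertyId_3ghq7"},
    ]}
    for owner, ref in find_dangling_refs(sample):
        print(f"{owner} references missing id {ref}")
```

To check a real export, the same function can be run on the result of `json.load` for the exported file.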
