Wednesday, January 25, 2017

AWS Athena to read Dynamodb backup data

15 comments
Our data scientists wanted to read some data from Dynamodb. In order to do this, we had to increase the Read Capacity of the table to a high value and it was cost consuming. So we architected a system using Athena where data scientist could run their query whenever they want.

We used AWS datapipeline template to backup the dynamodb table. So, we had our data archived into S3 in multiple json formatted files. Here Athena came into picture. Athena is a new tool created by AWS which runs presto underneath. So we created a table in Athena using the Dynamodb backup location and started exploring data:

CREATE EXTERNAL TABLE IF NOT EXISTS default.test ( key string, sortkey string, data string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://bucket-name/DynamodbDBBackup/2016-12-23-20-28-20/'

It created a table in Athena. It's time to run query.

select * from default.test limit 3;
Sample Data:

keysortkeydetails
{"s":"key1"}{"s":"sortkey1"}{"s":"{""keys"":[""abc""],""usage"":32,""temparature"":89,""size"":21}
{"s":"key2"}{"s":"sortkey1"}{"s":"{""keys"":[""abc""],""usage"":32,""temparature"":67,""size"":42}
{"s":"key1"}{"s":"sortkey2"}{"s":"{""keys"":[""abc""],""usage"":21,""temparature"":69,""size"":45}


Now we had to extract these data in such a way that we would get only values for key and sortkey and from details attribute we had to extract only usage.

As I said, Athena is a service for presto, we used presto function to extract data from json. There is a function is presto described at presto functions:
json_extract(jsonjson_path)

json_path expression can use the dot-notation and it takes $ as root path. More details are available at JsonPath

Using the above formula, we wrote our query:

SELECT cast(json_extract(key,'$.s') as varchar) as key ,
cast(json_extract(sortkey,'$.s') as varchar) as sortkey
, json_extract(replace(cast(json_extract(details,'$.s') as varchar),'""','"'),'$.usage') as usage
FROM test
limit 10;​

And that gave us the data as:


keysortkeyusage
key1sortkey132
key2sortkey132
key1sortkey221

It was really fun to use Athena.




15 Responses so far

  1. Unknown says:

    what was the performance difference between the DynamoDB vs Athena + S3?

  2. The actual time and effort taken to create this wonderful article were really great, I really appreciate that...
    Best Online Software Training Institute | AWS Training

  3. Hii… The Post is a Wonderful Opportunity for Beginners of Python Course. Take time To look at our Website.
    aws training in hyderabad

  4. very valuable information available in the blog.For AWS information visit this site.aws online training | aws training in hyderabad | aws online training in hyderabad

  5. Mumbai Escorts provides Independent Mumbai Escorts and stylish high class Mumbai

  6. Unknown says:

    Kolkata Escorts Service To Make Your Sex Life Memorabel

    Our escort and call girls in Kolkata are expertly qualified and known for their
    beauty and supportive characteristics.These girls can provide you with actual escort
    and call girl like sensation or first evening encounter.They can provide you with
    100% actual sweetheart as well sensation and you can have greatest fun.
    kolkata Escorts

  7. zara says:

    The strategy you have posted on this technology helped me to get into the AWS Training in Hyderabad

  8. Unknown says:


    Nice post. It is very useful and informative post.

    CEH Training In Hyderbad

  9. Amazon has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery and other functionality to help businesses scale and grow.For more information visit : aws online training

  10. Such a very useful article. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. aws certification training is one of the best equipping skill in 2019.

  11. Unknown says:

    I finally found great post here.I will get back here. I just added your blog to my bookmark sites. thanks.Quality posts
    amazon web services training is the crucial to invite the visitors to visit the web page, that's what this web page is providing.

Leave a Reply

Labels