Amazon DynamoDB, the AWS NoSQL database, has been gaining a lot of popularity these days for its capabilities. However, before it is used in production, a proper analysis needs to be done. Especially if you have spent most of your time working with relational databases, it's important to be more than 100% sure before moving to a NoSQL database.

If you are a beginner with AWS, I recommend reading the cloud articles below first.

Amazon DynamoDB – A Cloud Database?

Amazon DynamoDB is a fully managed NoSQL database service that promises single-digit millisecond performance for any amount of data.

I needed to do a benchmark analysis for our NoSQL use case. I thought we could bring up a small DynamoDB instance and benchmark its performance against a conventional MySQL database for our use case. However, I was absolutely wrong.

After signing in to the AWS console, we realized there is no concept of a physical or virtual instance in DynamoDB. It's a database service that spreads the data and traffic for your tables over a sufficient number of servers to handle your throughput and storage requirements.

Amazon DynamoDB Key Features

Amazon DynamoDB can be run locally in a development environment. This is great for developers: they can develop, debug, and write unit tests without spending a penny on the remote service. See: Download and Running DynamoDB.


Amazon DynamoDB supports storing, querying, and updating documents. A row is equivalent to a document.

It's schema-less: Amazon DynamoDB has a flexible database schema. The data items in a table need not have the same attributes, or even the same number of attributes. Multiple data types (strings, numbers, binary data, and sets) add richness to the data model.
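The flexible schema is easy to picture in plain Ruby. The attribute names below are hypothetical (not from any real table); only the primary-key attributes must be present on every item:

```ruby
# Two items in the same hypothetical table: only the primary-key attributes
# (user_id, list_id) are mandatory; the remaining attributes can differ per item.
item_a = { "user_id" => 1, "list_id" => 1, "status" => "active", "v1" => "a", "v2" => "b" }
item_b = { "user_id" => 1, "list_id" => 2, "status" => "done",   "v1" => "a", "v3" => "c" }

# Attributes common to both items:
puts (item_a.keys & item_b.keys).inspect  # ["user_id", "list_id", "status", "v1"]
```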

Amazon DynamoDB gives you the flexibility to query on any attribute (column) using global and local secondary indexes. Secondary indexes are indexes whose hash or hash-and-range keys can be different from the keys on which the table's primary index is based.
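As a sketch (not from the original post), here is how a table with a hash-and-range primary key plus a global secondary index could be declared using the AWS SDK for Ruby parameter style. The table and index names are hypothetical, and the hash is only built here, not sent to AWS:

```ruby
# Hypothetical create_table parameters in AWS SDK for Ruby style.
# The GSI keys (list_id, status) differ from the table's primary key.
create_table_params = {
  table_name: "lists",
  attribute_definitions: [
    { attribute_name: "user_id", attribute_type: "N" },
    { attribute_name: "list_id", attribute_type: "N" },
    { attribute_name: "status",  attribute_type: "S" }
  ],
  key_schema: [                                     # primary index
    { attribute_name: "user_id", key_type: "HASH" },
    { attribute_name: "list_id", key_type: "RANGE" }
  ],
  global_secondary_indexes: [
    {
      index_name: "list_id-status-index",
      key_schema: [
        { attribute_name: "list_id", key_type: "HASH" },
        { attribute_name: "status",  key_type: "RANGE" }
      ],
      projection: { projection_type: "ALL" },       # copy all attributes into the index
      provisioned_throughput: { read_capacity_units: 5, write_capacity_units: 5 }
    }
  ],
  provisioned_throughput: { read_capacity_units: 5, write_capacity_units: 5 }
}

# With the aws-sdk gem installed, this hash would be passed to:
#   Aws::DynamoDB::Client.new(region: "eu-west-1").create_table(create_table_params)
puts create_table_params[:global_secondary_indexes].first[:index_name]
```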

Amazon DynamoDB integrates with AWS Lambda to provide triggers. Using triggers, you can automatically execute a custom Lambda function when item-level changes in a DynamoDB table are detected. See: Getting Started with AWS Lambda.

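A minimal sketch of what such a trigger could look like with the Ruby Lambda runtime. The event shape follows DynamoDB Streams records; the handler itself is hypothetical and is exercised locally with a sample event:

```ruby
# Hypothetical handler for a DynamoDB Streams trigger (Ruby Lambda runtime).
# Collects the new item images for inserts and modifications.
def handler(event:, context:)
  event["Records"].map do |record|
    case record["eventName"]
    when "INSERT", "MODIFY"
      record.dig("dynamodb", "NewImage")
    end
  end.compact
end

# Exercising it locally with a stream-shaped sample event:
sample_event = {
  "Records" => [
    { "eventName" => "INSERT",
      "dynamodb" => { "NewImage" => { "user_id" => { "N" => "1" } } } },
    { "eventName" => "REMOVE", "dynamodb" => {} }
  ]
}
puts handler(event: sample_event, context: nil).size  # 1
```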

Amazon Redshift integrates with Amazon DynamoDB, providing advanced business intelligence capabilities and a powerful SQL-based interface. When you copy data from a DynamoDB table into Amazon Redshift, you can perform complex data analysis queries on that data, including joins with other tables in your Amazon Redshift cluster. You can learn more about Amazon Redshift from: Amazon Redshift – Working with JSON Data, and Amazon Redshift User Management Queries.


The Amazon DynamoDB cloud database integrates with Elasticsearch using the Amazon DynamoDB Logstash plugin. With this integration, you can easily search DynamoDB content such as messages, locations, tags, and keywords. It can be used for use cases like product search on an e-commerce website.

Amazon DynamoDB supports cross-region replication that automatically replicates DynamoDB tables across multiple AWS regions.

Limitations in Amazon DynamoDB



- You can't query an item without a where clause containing the primary key or one of the secondary indexes. Scan can be used in this case, but Scan is slow and not recommended, as per the Amazon DynamoDB docs.
- Secondary indexes by default do not allow selecting columns that are not part of the index. To enable this, we need to either project these columns into the index (which duplicates them on disk alongside the index) or run a second query after getting the primary key from the first query. However, this is not a big concern, since 25 GB of disk is free every month.
- You can define up to 5 local secondary indexes and 5 global secondary indexes per table. This could be a limitation for a complex, query-intensive table where various types of queries need to run.
- As of now, new indexes cannot be added after the table has been created. This means that to modify indexes you need to create a new table, which can be a management headache, so choose your indexes wisely. This limitation only applies to Local Secondary Indexes (LSIs), not to Global Secondary Indexes (GSIs).
- As of now, existing indexes cannot be deleted or modified after the table has been created. The Amazon DynamoDB import/export features will be useful if you have to do this. Again, this limitation only applies to LSIs, not to GSIs.
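To make the first limitation concrete, here is a sketch of the two request shapes in AWS SDK for Ruby parameter style. Table and attribute names are illustrative, and only the parameter hashes are built, nothing is sent:

```ruby
# Query: must name the table key (or an index key) in the key condition.
query_params = {
  table_name: "lists",
  key_condition_expression: "user_id = :uid AND list_id = :lid",
  expression_attribute_values: { ":uid" => 42, ":lid" => 7 }
}

# Scan: no key required, but DynamoDB reads the whole table and only
# filters afterwards, which is why it is slow and discouraged.
scan_params = {
  table_name: "lists",
  filter_expression: "#st = :s",
  expression_attribute_names: { "#st" => "status" },  # "status" is a DynamoDB reserved word
  expression_attribute_values: { ":s" => "active" }
}

puts query_params[:key_condition_expression]
```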

Data Creation for Benchmarking

I needed to benchmark Amazon DynamoDB queries with the use case below.

Create a lists table with static attributes: user_id, list_id, status.

Primary index attributes are user_id & list_id, with Projection => "All Attributes".

Data set:

- Create a document set having 1+ million items.
- Use a proper hash-and-range primary index.
- Add a secondary index on list_id, status.
- There can also be dynamic attributes like V1, V2, V3. For example, the V1 and V2 attributes are present for list_id 1, while V1 and V3 are present for list_id 2.

Use cases to benchmark:

- Batch write in batches of 25 records.
- One-by-one write operations.
- Query on the list_id, status attributes and fetch all attributes; a list can have up to 0.1 million records.
- Query on the list_id, status attributes and fetch only the user_id attribute; again, a list can have up to 0.1 million records.
- Scan query operation on list_id.

For the above benchmarking use case I created 0.67 million items/rows in the table. The table was created in the EU (Ireland) region. For data creation we used the Rails Faker and Fabricate gems to generate random values.

When the script started, it was running very slowly. At first I thought DynamoDB itself might be slow, or that it was API latency: each write API call was taking around 200 ms. By that calculation the script would take around 5 hours to complete, which was not at all acceptable. Reading the docs further, we came to know the real power of Amazon DynamoDB:

- Read throughput: number of item reads per second × 4 KB item size.
- Write throughput: number of item writes per second × 1 KB item size.

You can increase or decrease read/write throughput at any time. If your application's read or write requests exceed the provisioned throughput for a table, those requests might be throttled. It's important to keep monitoring throughput from the dashboard and adjust it until it matches the production requirement.

Initially I kept 1 read throughput unit and 1 write throughput unit, and the data-creation script ran very slowly. After increasing write throughput to 100, the script for creating benchmark data finished in under 1 hour. This was great. We then ran various queries as per the benchmark use cases.
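The batch loading described above can be sketched in plain Ruby (no network calls; the table name and item shape are illustrative). DynamoDB's BatchWriteItem accepts at most 25 put requests per call, which is where the "batch of 25" comes from:

```ruby
# Plain-Ruby sketch of the loader: group the generated rows into
# BatchWriteItem payloads of at most 25 put requests each.
BATCH_SIZE = 25  # hard limit for BatchWriteItem

# Stand-in for the 0.67 million Faker/Fabricate-generated rows.
items = (1..670_000).map do |i|
  { "user_id" => i, "list_id" => i % 1000, "status" => "active" }
end

batch_requests = items.each_slice(BATCH_SIZE).map do |slice|
  { request_items: { "lists" => slice.map { |item| { put_request: { item: item } } } } }
end

puts batch_requests.size                                 # 26800 calls instead of 670000
puts batch_requests.first[:request_items]["lists"].size  # 25
```

With the aws-sdk gem, each element of `batch_requests` would be passed to `client.batch_write_item`; one network round trip then covers 25 writes instead of one.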
Benchmark Results

Use Case | Read Throughput | Write Throughput | Result
Insert 1 by 1 | - | 100 | 6 ms
Batch Insert (25) | - | 100 | 1.7 ms
Query primary index (fetch all attributes) | 100 | - | 1 ms
Scan on non-primary index | 100 | - | 580 ms
Query secondary index (fetch all attributes) | 100 | - | 44 ms

The benchmark results were quite positive, and we were convinced to use DynamoDB for our use case. After finalizing Amazon DynamoDB, we researched more about its pricing and availability for production usage.

DynamoDB Pricing



For:

- 1 million reads per day (about 12 reads/second, i.e. 1,000,000 / 86,400 seconds)
- 1 million writes per day (about 12 writes/second)
- 100k read requests from streams

the price is $0.25/day ($7.50/month), which is reasonable.

Availability

The service runs across Amazon's proven, high-availability data centers. It replicates data across three facilities in an AWS region to provide fault tolerance in the event of a server failure or Availability Zone outage. Amazon DynamoDB handles the database management and administration; you simply store and request your data. Automatic replication and failover provide built-in fault tolerance, high availability, and data durability.

Database Backup using Snapshot & Streams

Below are ways to take a daily backup of a DynamoDB table:

- Use the AWS console to manually trigger the export process, which internally spawns and uses AWS Data Pipeline and AWS EMR. You will be charged for this.
- Set up custom instances of Data Pipeline and EMR and write cron jobs to take snapshots using them.
- Using Scheduled Tasks in Lambda, invoke a Node.js snippet that dumps the entire table content to a CSV file on S3.
- Streams give us the ability to capture changes to items stored in a DynamoDB table.

Amazon DynamoDB Redshift Integration for Data Backup


copy favoritemovies from 'dynamodb://ProductCatalog'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;

Using this simple COPY command, we can copy the data from DynamoDB to Redshift. But it has a few limitations:

- Amazon DynamoDB attributes that do not match a column in the Amazon Redshift table are discarded. This means that every time we add a new attribute, we have to come back and alter the Redshift table schema.
- Only Amazon DynamoDB attributes with scalar STRING and NUMBER data types are supported. The complex Amazon DynamoDB BINARY and SET data types are not supported.
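The first limitation can be pictured with a small Ruby sketch (column and attribute names are made up): any item attribute without a matching Redshift column is silently dropped by COPY.

```ruby
# Simulating the COPY behavior described above: DynamoDB attributes with
# no matching Redshift column are silently discarded.
redshift_columns = ["user_id", "list_id", "status"]
item = { "user_id" => 1, "list_id" => 2, "status" => "active", "v9" => "new attribute" }

loaded = item.select { |k, _| redshift_columns.include?(k) }
puts loaded.keys.inspect  # ["user_id", "list_id", "status"] - "v9" is dropped
```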

I asked a question on Reddit about the first limitation, which seems like a very common use case.

So far I haven't received any relevant answer; I will update this post if I do.