DynamoDB - quick review of concept and tutorial (hands-on with Python)

Concept

TL;DR: A DynamoDB database consists of tables, and the items in a table are roughly “key-value” pairs. The main content (the attributes, i.e. the “value” side of key-value) is JSON-like.

Table

  • A table holds items.
  • Each item has “attributes”.
  • Every item must have a “partition key” attribute. The partition key alone, or together with a sort key, forms the item's primary key.
  • A table can optionally define a “sort key” attribute; if it does, every item must have it too.
  • “We can think of the partition as a folder/bucket which contains items. And the sort key orders the items within the folder/bucket.”
  • On the sort key, we can use several kinds of conditions, such as ==, <, begins with, etc.
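
As a rough mental model (not how DynamoDB is implemented internally), a table can be pictured as a dictionary of partitions, each keeping its items in sort-key order. The `ToyTable` class and its method names below are hypothetical and only illustrate the folder/bucket picture and the kinds of conditions a sort key allows.

```python
from bisect import bisect_left
from collections import defaultdict


class ToyTable:
    """Toy in-memory model: partition key -> items ordered by sort key."""

    def __init__(self):
        # Each partition is a list of (sort_key, attributes) kept sorted.
        self._partitions = defaultdict(list)

    def put(self, pk, sk, attrs):
        rows = self._partitions[pk]
        i = bisect_left(rows, (sk,))
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, attrs)        # same full key: overwrite the item
        else:
            rows.insert(i, (sk, attrs))  # keep the partition sorted

    def query(self, pk, condition=lambda sk: True):
        """Sort keys in one partition that satisfy the condition, in order."""
        return [sk for sk, _ in self._partitions[pk] if condition(sk)]


table = ToyTable()
for year, title in [(2013, "Rush"), (2013, "Prisoners"), (2010, "The Clinic")]:
    table.put(year, title, {})

print(table.query(2013))                                 # ['Prisoners', 'Rush']
print(table.query(2013, lambda sk: sk.startswith("P")))  # ['Prisoners']
print(table.query(2013, lambda sk: sk < "Q"))            # ['Prisoners']
```

Because each partition is already sorted, a real store can seek to the matching range instead of filtering every item as this toy version does.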

Partition keys

  • The partition key is used to build an unordered hash index.
  • Every table in DynamoDB has its own key space.
  • The hash of the partition key determines where in that key space an item lives.
  • When scaling, DynamoDB chops up the key space and distributes it across multiple physical devices.
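
The idea of hashing a partition key to pick a storage location can be sketched as follows. This is only an illustration: DynamoDB's real hash function and partition layout are internal, so SHA-256, the modulo step, and the function name `partition_for` are all stand-in assumptions.

```python
import hashlib


def partition_for(partition_key, num_partitions):
    """Map a partition key value to a partition number in [0, num_partitions)."""
    # Hash the key so that nearby values (2012, 2013, ...) spread out evenly.
    digest = hashlib.sha256(str(partition_key).encode("utf-8")).digest()
    # Assumption: modulo stands in for "which slice of the key space".
    return int.from_bytes(digest[:8], "big") % num_partitions


for year in (2004, 2010, 2013):
    print(year, "->", partition_for(year, 4))
```

The same key always lands in the same partition, which is why a lookup by partition key does not need to search the whole table.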

Sort Key

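  • Specifying the partition key takes you into one partition; the sort key then selects items within it.
    • To get a single item (GetItem) from a table that has a sort key, you need to specify both the partition key and the sort key. A Query needs only the partition key, plus an optional condition on the sort key.
  • Fact: partitions are three-way replicated, so a read shortly after a write may briefly return stale data. Reads are eventually consistent by default; strongly consistent reads can be requested.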
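
A Query with a sort-key condition might look like the sketch below, assuming a table keyed by `year` (partition key) and `title` (sort key) like the one built in the hands-on part. The helper names `query_titles` and `local_movies_table` are made up for this example, and the sketch reads only the first page of results.

```python
def query_titles(table, year, title_prefix):
    """Return titles in the `year` partition that begin with `title_prefix`."""
    response = table.query(
        # `year` is a DynamoDB reserved word, so it is aliased as #y.
        KeyConditionExpression="#y = :y AND begins_with(title, :prefix)",
        ExpressionAttributeNames={"#y": "year"},
        ExpressionAttributeValues={":y": year, ":prefix": title_prefix},
    )
    # Only the first page is read here; larger result sets are paginated.
    return [item["title"] for item in response["Items"]]


def local_movies_table():
    """Table handle for a local DynamoDB endpoint (assumed to be running)."""
    import boto3  # imported lazily so query_titles can be tried without boto3

    dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")
    return dynamodb.Table("Movies")
```

With the local container running and the sample data loaded, `query_titles(local_movies_table(), 2013, "The")` would return the 2013 titles starting with "The".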

LSI and GSI (Local and Global Secondary Indexes)

LSI

  • Re-sorts the data within each partition.
  • Same partition key as the table, different sort key: re-sort, not re-group.

GSI

  • Uses an alternate partition key and/or sort key.
  • You can think of a GSI as a clone of the primary table with a different partition key.
  • The index spans all partition keys of the base table.
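
The difference between re-sorting (LSI) and re-grouping (GSI) can be illustrated with a small in-memory sketch. The `build_index` helper is hypothetical, and the `rating` and `genre` values are made up for the example; this is a mental model, not how DynamoDB maintains indexes.

```python
from collections import defaultdict


def build_index(items, partition_attr, sort_attr):
    """Group items by partition_attr and order each group by sort_attr."""
    index = defaultdict(list)
    for item in items:
        # Items missing an index key attribute are simply not indexed.
        if partition_attr in item and sort_attr in item:
            index[item[partition_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[sort_attr])
    return dict(index)


# Made-up sample rows; `rating` and `genre` are illustrative only.
movies = [
    {"year": 2013, "title": "Rush", "rating": 2, "genre": "Drama"},
    {"year": 2013, "title": "Prisoners", "rating": 3, "genre": "Crime"},
    {"year": 2010, "title": "The Clinic", "rating": 1, "genre": "Crime"},
]

base = build_index(movies, "year", "title")   # table: year + title
lsi = build_index(movies, "year", "rating")   # LSI: same partition key, new sort key
gsi = build_index(movies, "genre", "title")   # GSI: new partition key

print([m["title"] for m in base[2013]])  # ['Prisoners', 'Rush']
print([m["title"] for m in lsi[2013]])   # ['Rush', 'Prisoners']
print({g: [m["title"] for m in ms] for g, ms in gsi.items()})
```

In the LSI view the 2013 partition still holds the same two movies, only in a different order; in the GSI view movies from different years end up in the same group.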

Hands-on

Create local DynamoDB environment

https://hub.docker.com/r/amazon/dynamodb-local/

docker run \
  --rm -d \
  -p 8000:8000 \
  amazon/dynamodb-local
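
One practical detail when pointing boto3 at the local container: boto3 still expects a region and credentials to be configured, even though nothing is sent to AWS. A minimal sketch, assuming placeholder values are acceptable to the local emulator (the helper names, the region string, and the "dummy" keys are all arbitrary choices for this example):

```python
def local_dynamodb_kwargs(port=8000):
    """Keyword arguments for boto3.resource('dynamodb', ...) aimed at the local container."""
    return {
        "endpoint_url": f"http://localhost:{port}",
        "region_name": "us-west-2",        # placeholder region
        "aws_access_key_id": "dummy",      # placeholder credentials
        "aws_secret_access_key": "dummy",
    }


def list_local_tables(port=8000):
    """Return the table names known to the local endpoint (requires boto3)."""
    import boto3  # imported lazily; only needed when actually connecting

    dynamodb = boto3.resource("dynamodb", **local_dynamodb_kwargs(port))
    return [table.name for table in dynamodb.tables.all()]
```

Calling `list_local_tables()` right after starting the container is a quick way to check that the endpoint answers; on a fresh container it should return an empty list.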

Create a table

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.01.html

MoviesCreateTable.py:

#!/usr/bin/env python3

import boto3

def create_movie_table(dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.create_table(
        TableName='Movies',
        KeySchema=[
            {
                'AttributeName': 'year',
                'KeyType': 'HASH'  # Partition key
            },
            {
                'AttributeName': 'title',
                'KeyType': 'RANGE'  # Sort key
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'year',
                'AttributeType': 'N'
            },
            {
                'AttributeName': 'title',
                'AttributeType': 'S'
            },

        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 10,
            'WriteCapacityUnits': 10
        }
    )
    return table


if __name__ == '__main__':
    movie_table = create_movie_table()
    print("Table status:", movie_table.table_status)

$ ./MoviesCreateTable.py
Table status: ACTIVE

Load sample data

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.02.html#GettingStarted.Python.02.01

curl https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/samples/moviedata.zip -O

# Extracts one large JSON file, moviedata.json.
unzip moviedata.zip

MoviesLoadData.py:

#!/usr/bin/env python3

from decimal import Decimal
import json
import boto3


def load_movies(movies, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.Table('Movies')
    for movie in movies:
        year = int(movie['year'])
        title = movie['title']
        print("Adding movie:", year, title)
        table.put_item(Item=movie)


if __name__ == '__main__':
    with open("moviedata.json") as json_file:
        movie_list = json.load(json_file, parse_float=Decimal)
    load_movies(movie_list)

$ ./MoviesLoadData.py
Adding movie: 2013 Rush
Adding movie: 2013 Prisoners
Adding movie: 2013 The Hunger Games: Catching Fire
...
Adding movie: 2010 The Clinic
Adding movie: 2004 Little Black Book
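
Writing one item per request works, but it is slow for a large file. As an alternative sketch, the boto3 Table resource offers `batch_writer()`, which buffers puts and sends them as batched write requests. The function name `load_movies_batched` is made up here, and `table` is assumed to be a handle such as `dynamodb.Table('Movies')`.

```python
def load_movies_batched(movies, table):
    """Write all movies through table.batch_writer(); return how many were queued."""
    count = 0
    # The context manager flushes any remaining buffered items on exit.
    with table.batch_writer() as batch:
        for movie in movies:
            batch.put_item(Item=movie)
            count += 1
    return count
```

It can replace the loop in `load_movies` above, for example `load_movies_batched(movie_list, dynamodb.Table('Movies'))`.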

Create an item

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.03.html

MoviesItemOps01.py:

#!/usr/bin/env python3

from pprint import pprint
import boto3


def put_movie(title, year, plot, rating, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.Table('Movies')
    response = table.put_item(
        Item={
            'year': year,
            'title': title,
            'info': {
                'plot': plot,
                'rating': rating
            }
        }
    )
    return response


if __name__ == '__main__':
    movie_resp = put_movie("The Big New Movie", 2015,
                           "Nothing happens at all.", 0)
    print("Put movie succeeded:")
    pprint(movie_resp, sort_dicts=False)

$ ./MoviesItemOps01.py
Put movie succeeded:
{'ResponseMetadata': {'RequestId': 'e2263a48-99a9-4c09-b18d-0d0ea3de395f',
                      'HTTPStatusCode': 200,
                      'HTTPHeaders': {'date': 'Sat, 27 Feb 2021 22:17:11 GMT',
                                      'content-type': 'application/x-amz-json-1.0',
                                      'x-amz-crc32': '2745614147',
                                      'x-amzn-requestid': 'e2263a48-99a9-4c09-b18d-0d0ea3de395f',
                                      'content-length': '2',
                                      'server': 'Jetty(9.4.18.v20190429)'},
                      'RetryAttempts': 0}}

Read the item

MoviesItemOps02.py:

#!/usr/bin/env python3

from pprint import pprint
import boto3
from botocore.exceptions import ClientError

def get_movie(title, year, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.Table('Movies')

    try:
        response = table.get_item(Key={'year': year, 'title': title})
    except ClientError as e:
        print(e.response['Error']['Message'])
        return None
    # 'Item' is absent from the response when no item matches the key.
    return response.get('Item')


if __name__ == '__main__':
    movie = get_movie("The Big New Movie", 2015)
    if movie:
        print("Get movie succeeded:")
        pprint(movie, sort_dicts=False)

$ ./MoviesItemOps02.py
Get movie succeeded:
{'title': 'The Big New Movie',
 'year': Decimal('2015'),
 'info': {'rating': Decimal('0'), 'plot': 'Nothing happens at all.'}}
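
Note that numbers come back as `Decimal`, which `json.dumps` cannot serialize directly. A small helper like the following (the name `plain` is made up for this example) can convert an item to ordinary Python numbers. Converting to `float` can lose precision, so this is a convenience sketch rather than a general solution.

```python
from decimal import Decimal


def plain(value):
    """Recursively convert Decimal values to int or float; leave the rest as-is."""
    if isinstance(value, Decimal):
        # Whole numbers become int, everything else becomes float.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {k: plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [plain(v) for v in value]
    return value


movie = {
    'title': 'The Big New Movie',
    'year': Decimal('2015'),
    'info': {'rating': Decimal('0'), 'plot': 'Nothing happens at all.'},
}
print(plain(movie))
```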

Security

https://en.wikipedia.org/wiki/Data_at_rest

Data at rest in information technology means data that is housed physically on computer data storage in any digital form (e.g. cloud storage, file hosting services, databases, data warehouses, spreadsheets, archives, tapes, off-site or cloud backups, mobile devices etc.).

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EncryptionAtRest.html

All user data stored in Amazon DynamoDB is fully encrypted at rest.

My References