DynamoDB - a quick review of the concepts and a hands-on tutorial with Python
Concept
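TL;DR: A database consists of tables, and the items in a table behave roughly like “key-value” pairs. The key is the item's primary key; the main content (the attributes, i.e. the “value” of the key-value pair) looks like a JSON document.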
Table
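- A table holds items.
- Each item has “attributes”.
- Every item must have a “partition key” attribute. The partition key, alone or together with a sort key, forms the item's primary key.
- A table can optionally define a “sort key” attribute. If it does, every item must have one, and the primary key is the pair (partition key, sort key).
- “We can think of the partition as a folder/bucket which contains items. And the sort key orders the items within the folder/bucket.”
- On the sort key, we can apply a range of conditions within a partition: =, <, <=, >, >=, between, and begins_with.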
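The folder/bucket picture above can be sketched in plain Python. This is only a toy in-memory model, not how DynamoDB is implemented; the class `ToyTable` and its methods are hypothetical names made up for this illustration.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a table: partition key -> items kept in sort-key order."""

    def __init__(self):
        # Each "folder" is a list of (sort_key, item) pairs.
        self._partitions = defaultdict(list)

    def put(self, partition_key, sort_key, item):
        folder = self._partitions[partition_key]
        # A put with an existing (partition key, sort key) pair overwrites the item.
        folder[:] = [(sk, it) for sk, it in folder if sk != sort_key]
        folder.append((sort_key, item))
        folder.sort(key=lambda pair: pair[0])

    def query(self, partition_key, condition=lambda sort_key: True):
        """Return the items of one partition, in sort-key order, that pass `condition`."""
        folder = self._partitions.get(partition_key, [])
        return [item for sort_key, item in folder if condition(sort_key)]


table = ToyTable()
for year, title in [(2013, "Rush"), (2013, "Prisoners"),
                    (2013, "The Hunger Games: Catching Fire"), (2010, "The Clinic")]:
    table.put(year, title, {"year": year, "title": title})

# All items in the 2013 "folder", ordered by title.
titles_2013 = [m["title"] for m in table.query(2013)]
# begins_with-style condition on the sort key.
the_titles = [m["title"] for m in table.query(2013, lambda t: t.startswith("The"))]
# "<"-style condition on the sort key.
before_r = [m["title"] for m in table.query(2013, lambda t: t < "R")]
```

Here `titles_2013` comes back ordered by title, while the two conditions only ever look inside the 2013 partition; nothing from 2010 can match.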
Partition keys
- The partition key is used to build an unordered hash index.
- Every table in DynamoDB has its own key space.
- The hash of the partition key determines where the item lives in that key space.
- When scaling, DynamoDB chops the key space into ranges and distributes them across multiple physical devices.
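As a rough illustration of the points above, the sketch below hashes a partition key to a position in a key space and splits that space into equal ranges. DynamoDB's real hash function and splitting policy are internal; the use of MD5, the 64-bit key space, and the names `key_space_position` and `partition_index` are all assumptions of this sketch.

```python
import hashlib

KEY_SPACE = 2 ** 64  # size of the toy key space (an assumption of this sketch)


def key_space_position(partition_key):
    """Hash a partition key to a stable position in [0, KEY_SPACE)."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_index(partition_key, num_partitions):
    """Pick the equal-width range of the key space that the hashed key falls into."""
    return key_space_position(partition_key) * num_partitions // KEY_SPACE


# The same key always lands in the same range; neighbouring keys usually do not,
# which is why the hash index is "unordered".
placement = {year: partition_index(year, 4) for year in range(2010, 2016)}
```

Splitting one range in two (more partitions) only moves the keys whose hash falls in that range, which is the intuition behind scaling by chopping up the key space.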
Sort Key
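- Specifying the partition key takes you into one partition; the sort key then lets you select items within it.
- To get a single item (GetItem), you need to specify the full primary key: the partition key, plus the sort key if the table has one. A Query needs only the partition key and can narrow the result with a condition on the sort key.
- Fact: Partitions are three-way replicated. -> Reads are eventually consistent by default; a strongly consistent read costs a little extra latency.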
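To make the difference between the two kinds of read concrete, here is a minimal sketch that builds the arguments for boto3's `Table.get_item` and `Table.query` against the Movies table used later in this post (partition key `year`, sort key `title`). The helper names are hypothetical, and `run_examples` assumes boto3 is installed and a DynamoDB Local endpoint is listening on port 8000.

```python
def get_item_kwargs(year, title, strongly_consistent=False):
    """Arguments for Table.get_item: the full primary key is required."""
    return {
        "Key": {"year": year, "title": title},
        # False (the default) means an eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }


def query_kwargs(year, title_prefix=None):
    """Arguments for Table.query: partition key required, sort-key condition optional."""
    kwargs = {
        "KeyConditionExpression": "#yr = :yr",
        # "year" is a DynamoDB reserved word, so it is referenced via a placeholder.
        "ExpressionAttributeNames": {"#yr": "year"},
        "ExpressionAttributeValues": {":yr": year},
    }
    if title_prefix is not None:
        kwargs["KeyConditionExpression"] += " AND begins_with(title, :prefix)"
        kwargs["ExpressionAttributeValues"][":prefix"] = title_prefix
    return kwargs


def run_examples(endpoint_url="http://localhost:8000"):
    """Run both reads against a local endpoint (needs boto3 and a running server)."""
    import boto3  # imported lazily so the builders above work without boto3

    table = boto3.resource("dynamodb", endpoint_url=endpoint_url).Table("Movies")
    item = table.get_item(**get_item_kwargs(2013, "Rush", True)).get("Item")
    items = table.query(**query_kwargs(2013, "The"))["Items"]
    return item, items
```

Building the arguments separately from the call keeps the key logic easy to inspect; the same conditions can also be written with `boto3.dynamodb.conditions.Key` instead of expression strings.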
LSI and GSI (Local Secondary Index and Global Secondary Index)
LSI
- Re-sorts the data within each partition, using a different sort key.
- Same partition key as the table -> re-sort, not re-group.
GSI
- Uses an alternate partition key and/or sort key.
- You can imagine that a GSI creates a clone of the primary table with a different partition key.
- The index spans all partition keys of the base table.
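A sketch of how the two index types might be declared with `create_table` is shown below. The table name `MoviesIndexed`, the index names, and the top-level attributes `rating` and `genre` are hypothetical (in the sample data used later, the rating is nested under `info`, and index keys must be top-level scalar attributes). The parameter names themselves are the standard `create_table` ones.

```python
def movies_table_with_indexes():
    """Build create_table arguments with one LSI and one GSI (illustrative names)."""
    throughput = {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10}
    return {
        "TableName": "MoviesIndexed",  # hypothetical table name
        "KeySchema": [
            {"AttributeName": "year", "KeyType": "HASH"},
            {"AttributeName": "title", "KeyType": "RANGE"},
        ],
        # Every attribute used as a table or index key must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "year", "AttributeType": "N"},
            {"AttributeName": "title", "AttributeType": "S"},
            {"AttributeName": "rating", "AttributeType": "N"},  # hypothetical
            {"AttributeName": "genre", "AttributeType": "S"},   # hypothetical
        ],
        # LSI: same partition key as the table, different sort key (re-sort).
        "LocalSecondaryIndexes": [{
            "IndexName": "YearRatingIndex",
            "KeySchema": [
                {"AttributeName": "year", "KeyType": "HASH"},
                {"AttributeName": "rating", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: different partition key (re-group), with its own throughput.
        "GlobalSecondaryIndexes": [{
            "IndexName": "GenreTitleIndex",
            "KeySchema": [
                {"AttributeName": "genre", "KeyType": "HASH"},
                {"AttributeName": "title", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": throughput,
        }],
        "ProvisionedThroughput": throughput,
    }
```

With a boto3 resource this would be passed as `dynamodb.create_table(**movies_table_with_indexes())`. Note that an LSI can only be declared when the table is created, whereas a GSI can also be added later.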
Hands-on
Create local DynamoDB environment
https://hub.docker.com/r/amazon/dynamodb-local/
docker run \
  --rm -d \
  -p 8000:8000 \
  amazon/dynamodb-local
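Because the container starts in the background (`-d`), a script that runs immediately afterwards may find the port not yet open. A small standard-library sketch for waiting on it follows; the helper name `wait_for_port` and its defaults are assumptions of this example, and it only checks that something accepts TCP connections, not that the service is DynamoDB Local.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` before the scripts below makes a first run less likely to fail with a connection error.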
Create a table
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.01.html
MoviesCreateTable.py:
#!/usr/bin/env python3
import boto3


def create_movie_table(dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.create_table(
        TableName='Movies',
        KeySchema=[
            {
                'AttributeName': 'year',
                'KeyType': 'HASH'  # Partition key
            },
            {
                'AttributeName': 'title',
                'KeyType': 'RANGE'  # Sort key
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'year',
                'AttributeType': 'N'
            },
            {
                'AttributeName': 'title',
                'AttributeType': 'S'
            },
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 10,
            'WriteCapacityUnits': 10
        }
    )
    return table


if __name__ == '__main__':
    movie_table = create_movie_table()
    print("Table status:", movie_table.table_status)
$ ./MoviesCreateTable.py
Table status: ACTIVE
Load sample data
curl https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/samples/moviedata.zip -O
## large json file moviedata.json will be extracted.
unzip moviedata.zip
MoviesLoadData.py:
#!/usr/bin/env python3
from decimal import Decimal
import json
import boto3


def load_movies(movies, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.Table('Movies')
    for movie in movies:
        year = int(movie['year'])
        title = movie['title']
        print("Adding movie:", year, title)
        table.put_item(Item=movie)


if __name__ == '__main__':
    with open("moviedata.json") as json_file:
        # boto3 does not accept Python floats, so parse them as Decimal.
        movie_list = json.load(json_file, parse_float=Decimal)
    load_movies(movie_list)
$ ./MoviesLoadData.py
Adding movie: 2013 Rush
Adding movie: 2013 Prisoners
Adding movie: 2013 The Hunger Games: Catching Fire
...
Adding movie: 2010 The Clinic
Adding movie: 2004 Little Black Book
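The loader above sends one request per movie. As a possible variation, boto3's `Table.batch_writer()` buffers puts and sends them in batches. The sketch below assumes `table` is a boto3 `Table` resource (or anything with the same `batch_writer()` interface); the function names are hypothetical.

```python
import json
from decimal import Decimal


def parse_movies(json_text):
    """Parse movie JSON, turning floats into Decimal as boto3 requires."""
    return json.loads(json_text, parse_float=Decimal)


def load_movies_batched(table, movies):
    """Write all movies through table.batch_writer() and return how many were queued."""
    count = 0
    # The batch writer flushes automatically, including on leaving the block.
    with table.batch_writer() as batch:
        for movie in movies:
            batch.put_item(Item=movie)
            count += 1
    return count
```

Usage would mirror the script above: open `moviedata.json`, pass its text to `parse_movies`, and call `load_movies_batched(dynamodb.Table('Movies'), movies)`.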
Create an item
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.03.html
MoviesItemOps01.py:
#!/usr/bin/env python3
from pprint import pprint
import boto3


def put_movie(title, year, plot, rating, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.Table('Movies')
    response = table.put_item(
        Item={
            'year': year,
            'title': title,
            'info': {
                'plot': plot,
                'rating': rating
            }
        }
    )
    return response


if __name__ == '__main__':
    movie_resp = put_movie("The Big New Movie", 2015,
                           "Nothing happens at all.", 0)
    print("Put movie succeeded:")
    pprint(movie_resp, sort_dicts=False)
$ ./MoviesItemOps01.py
Put movie succeeded:
{'ResponseMetadata': {'RequestId': 'e2263a48-99a9-4c09-b18d-0d0ea3de395f',
                      'HTTPStatusCode': 200,
                      'HTTPHeaders': {'date': 'Sat, 27 Feb 2021 22:17:11 GMT',
                                      'content-type': 'application/x-amz-json-1.0',
                                      'x-amz-crc32': '2745614147',
                                      'x-amzn-requestid': 'e2263a48-99a9-4c09-b18d-0d0ea3de395f',
                                      'content-length': '2',
                                      'server': 'Jetty(9.4.18.v20190429)'},
                      'RetryAttempts': 0}}
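Note that `put_item` silently replaces any existing item with the same primary key. If that is not wanted, a condition expression can guard the write. The sketch below is one way to do it; the helper names are hypothetical, and `table` is assumed to be the boto3 `Table` resource for Movies.

```python
def conditional_put_kwargs(title, year, plot, rating):
    """Arguments for Table.put_item that only succeed if the key is not yet present."""
    return {
        "Item": {
            "year": year,
            "title": title,
            "info": {"plot": plot, "rating": rating},
        },
        # The condition is evaluated against the existing item with the same
        # primary key, so checking the partition key attribute is enough.
        "ConditionExpression": "attribute_not_exists(#yr)",
        # "year" is a DynamoDB reserved word, hence the placeholder.
        "ExpressionAttributeNames": {"#yr": "year"},
    }


def put_movie_if_absent(table, title, year, plot, rating):
    """Return True if the item was written, False if it already existed."""
    from botocore.exceptions import ClientError  # lazy import; needs botocore

    try:
        table.put_item(**conditional_put_kwargs(title, year, plot, rating))
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False
        raise
    return True
```

Run twice with the same title and year, the second call would be expected to return False instead of overwriting the first item.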
Read the item
MoviesItemOps02.py:
#!/usr/bin/env python3
from pprint import pprint
import boto3
from botocore.exceptions import ClientError


def get_movie(title, year, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="http://localhost:8000")

    table = dynamodb.Table('Movies')
    try:
        response = table.get_item(Key={'year': year, 'title': title})
    except ClientError as e:
        print(e.response['Error']['Message'])
        return None
    else:
        # 'Item' is absent from the response when no item matches the key.
        return response.get('Item')


if __name__ == '__main__':
    movie = get_movie("The Big New Movie", 2015)
    if movie:
        print("Get movie succeeded:")
        pprint(movie, sort_dicts=False)
$ ./MoviesItemOps02.py
Get movie succeeded:
{'title': 'The Big New Movie',
 'year': Decimal('2015'),
 'info': {'rating': Decimal('0'), 'plot': 'Nothing happens at all.'}}
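As the output shows, boto3 returns DynamoDB numbers as `Decimal`, which `json.dumps` cannot serialize directly. A minimal sketch of one way to convert an item back to plain Python numbers is below; `plain_numbers` is a hypothetical helper, and converting non-integral values to `float` can lose precision.

```python
import json
from decimal import Decimal


def plain_numbers(value):
    """Recursively replace Decimal with int (if integral) or float."""
    if isinstance(value, Decimal):
        if value == value.to_integral_value():
            return int(value)
        return float(value)  # may lose precision for long decimals
    if isinstance(value, dict):
        return {key: plain_numbers(val) for key, val in value.items()}
    if isinstance(value, list):
        return [plain_numbers(val) for val in value]
    return value


movie = {
    'title': 'The Big New Movie',
    'year': Decimal('2015'),
    'info': {'rating': Decimal('0'), 'plot': 'Nothing happens at all.'},
}
as_json = json.dumps(plain_numbers(movie))
```

After the conversion, `year` and `rating` are ordinary integers and the item serializes without a custom encoder.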
Security
https://en.wikipedia.org/wiki/Data_at_rest
Data at rest in information technology means data that is housed physically on computer data storage in any digital form (e.g. cloud storage, file hosting services, databases, data warehouses, spreadsheets, archives, tapes, off-site or cloud backups, mobile devices etc.).
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EncryptionAtRest.html
All user data stored in Amazon DynamoDB is fully encrypted at rest.