You are viewing the preview version of this book

DynamoDB

DynamoDB is the obvious choice for AppSync, as together they provide a truly serverless setup. DynamoDB tables can scale up and down easily, and AWS also offers per-request pricing (called on-demand capacity). This makes it possible for the API to scale to any load and also to go back to zero when there is no traffic.

Also, GraphQL's concept of many individual resolvers usually maps to DynamoDB tables easily. DynamoDB is a NoSQL database that does not support complex queries but has no problem handling lots of simple ones, so it works well with AppSync.

In this chapter we'll look into how to write resolvers that fetch and store data in DynamoDB, and what the best practices are for adapting the data model to efficient GraphQL queries.

Configuring the data source

When you create a DynamoDB data source, you need to define a table and a role. The former defines which table AppSync sends the requests to (unless the request itself names the tables, as transactions do, which we'll see later), and the latter gives AppSync permission to perform the operations.

DynamoDB data source configured with a table and an IAM role

With the IAM role setting, you can give fine-grained permissions for each data source. For example, if you don't add write permissions then AppSync won't be able to change data in the table.

One role or multiple?

With a data source per table, you can either use a separate role for each of them or a single role with all the permissions AppSync needs.

So, which approach is better?

I found that using separate roles adds too much verbosity for very little benefit. Since it brings no security benefit, I usually go with a single role per AppSync API.

AppSync permission model for DynamoDB

Operations

DynamoDB resolvers implement the same operations (GetItem, Query, Scan, DeleteItem, UpdateItem, etc.) as the DynamoDB API in the AWS SDKs, and while the structure is different, they work the same way. This means all the usual constraints apply: you can only get an item with its full key, a query needs a partition key, and scanning is expensive. Moreover, you can use indices (local and global) the same way as with the SDKs.

To find out what structure each operation expects, the best resource is the official documentation. It contains examples for each section.

Note that you can't copy-paste DynamoDB code from other languages, as the resolvers require a different structure. For example, this is the input of a PutItemCommand in the JavaScript SDK v3:

{
  TableName: COUNTS_TABLE,
  Item: {
    type: {S: "users"},
    count: {N: "0"},
  },
  ConditionExpression: "attribute_not_exists(#pk)",
  ExpressionAttributeNames: {"#pk": "type"},
}

Converted to an AppSync resolver (there is no TableName, as the data source defines the table):

{
  "version": "2018-05-29",
  "operation": "PutItem",
  "key": {
    "type": {"S": "users"}
  },
  "attributeValues": {
    "count": {"N": "0"}
  },
  "condition": {
    "expression": "attribute_not_exists(#pk)",
    "expressionNames": {
      "#pk": "type"
    }
  }
}
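To make the mapping concrete, here is a hypothetical TypeScript helper. toAppSyncPutItem is not part of any AWS library. It turns a PutItem input in the SDK shape into the resolver structure. It is a sketch that assumes you pass in the names of the table's key attributes. Those attributes go to key and every other attribute goes to attributeValues. TableName is dropped because AppSync takes the table from the data source, and the condition fields are renamed.

```typescript
// Hypothetical helper, not an AWS API: maps a PutItem input in the SDK v3
// shape to an AppSync PutItem request mapping template.
type AttributeValue = { [type: string]: unknown };

interface SdkPutItemInput {
  TableName: string;
  Item: Record<string, AttributeValue>;
  ConditionExpression?: string;
  ExpressionAttributeNames?: Record<string, string>;
  ExpressionAttributeValues?: Record<string, AttributeValue>;
}

interface AppSyncCondition {
  expression: string;
  expressionNames?: Record<string, string>;
  expressionValues?: Record<string, AttributeValue>;
}

interface AppSyncPutItem {
  version: "2018-05-29";
  operation: "PutItem";
  key: Record<string, AttributeValue>;
  attributeValues: Record<string, AttributeValue>;
  condition?: AppSyncCondition;
}

function toAppSyncPutItem(input: SdkPutItemInput, keyNames: string[]): AppSyncPutItem {
  const key: Record<string, AttributeValue> = {};
  const attributeValues: Record<string, AttributeValue> = {};
  // Key attributes go to "key", all other attributes to "attributeValues".
  for (const name of Object.keys(input.Item)) {
    if (keyNames.indexOf(name) !== -1) {
      key[name] = input.Item[name];
    } else {
      attributeValues[name] = input.Item[name];
    }
  }
  // TableName is not copied: AppSync takes the table from the data source.
  const template: AppSyncPutItem = { version: "2018-05-29", operation: "PutItem", key, attributeValues };
  if (input.ConditionExpression !== undefined) {
    const condition: AppSyncCondition = { expression: input.ConditionExpression };
    if (input.ExpressionAttributeNames) condition.expressionNames = input.ExpressionAttributeNames;
    if (input.ExpressionAttributeValues) condition.expressionValues = input.ExpressionAttributeValues;
    template.condition = condition;
  }
  return template;
}

const sdkInput: SdkPutItemInput = {
  TableName: "counts", // placeholder for COUNTS_TABLE
  Item: { type: { S: "users" }, count: { N: "0" } },
  ConditionExpression: "attribute_not_exists(#pk)",
  ExpressionAttributeNames: { "#pk": "type" },
};

console.log(JSON.stringify(toAppSyncPutItem(sdkInput, ["type"]), null, 2));
```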
Expression names and values

If you read the documentation you'll see that the expressionNames and the expressionValues are present in multiple places:

  • "update"
  • "query"
  • "filter"
  • "condition"

While it's not obvious, these are merged into a single map. This means that if you have both an update and a condition block with the same expression names, one will overwrite the other. Make sure you use different names in different blocks.
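A minimal TypeScript model of this pitfall follows. It assumes the blocks' expressionNames end up in one map and that the later block wins on a collision. Which block wins is an assumption here; the point is that one silently replaces the other. mergeExpressionNames is a hypothetical function, not AppSync code.

```typescript
type Names = Record<string, string>;

// Hypothetical model: the expressionNames of all blocks end up in one map,
// and a later block silently replaces an earlier entry with the same key.
function mergeExpressionNames(...blocks: Names[]): Names {
  return blocks.reduce((merged, block) => ({ ...merged, ...block }), {} as Names);
}

// Both blocks reuse the "#attr" placeholder for different attributes.
const updateNames: Names = { "#attr": "name" };
const conditionNames: Names = { "#attr": "status" };

const merged = mergeExpressionNames(updateNames, conditionNames);
console.log(merged["#attr"]); // status: the update block now points at the wrong attribute

// Distinct placeholders per block keep both mappings intact.
const safe = mergeExpressionNames({ "#updateAttr": "name" }, { "#conditionAttr": "status" });
console.log(safe["#updateAttr"], safe["#conditionAttr"]); // name status
```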

Error handling is a common theme in all DynamoDB operations and it follows the usual AppSync way: when an operation fails, the response mapping template will get the error in the $ctx.error value. By default, the resolver should throw an error when the operation fails:

#if($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end
...
Getting elements

(Official docs)

The GetItem operation retrieves an element defined by its full key. For example, to implement a resolver for the field groupById(id: String!): Group, use:

{
  "version": "2018-05-29",
  "operation": "GetItem",
  "key": {
    "id": {"S": $util.toJson($ctx.args.id)}
  },
  "consistentRead": true
}

The resolver gets the id argument as $ctx.args.id and constructs the input for the data source. The table has only the id defined as a key, so specifying a value for that is enough.

GetItem operation

The $util.toJson call is important even though things usually work without it. It converts the argument value to a JSON string, making sure that every special character is safely escaped. VTL templates are built by string concatenation, so any user input that goes into the result unescaped opens the way to injection attacks.
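A small TypeScript sketch shows why this matters. It assumes JSON.stringify behaves like $util.toJson for a string argument, since both produce a quoted, escaped JSON string. It also uses a hypothetical malicious id. Plain interpolation lets the input close the string and add attributes of its own. This matters most in templates like PutItem, where those extra attributes would be written to the table.

```typescript
// Plain interpolation: what putting $ctx.args.id into a template without
// $util.toJson amounts to.
function naiveKey(id: string): string {
  return `{"id": {"S": "${id}"}}`;
}

// JSON.stringify quotes and escapes the value, playing the role of $util.toJson.
function escapedKey(id: string): string {
  return `{"id": {"S": ${JSON.stringify(id)}}}`;
}

// Hypothetical malicious input that closes the string and adds its own attributes.
const attack = `x"}, "admin": {"BOOL": true}, "y": {"S": "z`;

const naive = JSON.parse(naiveKey(attack));
const escaped = JSON.parse(escapedKey(attack));

console.log(Object.keys(naive).join(", "));   // id, admin, y
console.log(Object.keys(escaped).join(", ")); // id
```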

Then notice the DynamoDB type system: "id": {"S": "123"}. The data source uses the same typed format as the DynamoDB API elsewhere, so it's always "<property>": {"<type>": "<value>"}, and even numbers are sent as strings: {"N": "15"}.

AppSync provides a few utility functions, such as $util.dynamodb.toDynamoDBJson, that convert native values into this typed format, but I find myself opting out of them. A similar helper, $util.transform.toDynamoDBFilterExpression, exists for filter expressions too.
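To make the typed format concrete, here is a simplified TypeScript model of attribute values and a converter to native JSON. It is similar in spirit to what happens to $ctx.result. This is an illustration, not AppSync's implementation. It handles only the S, N, BOOL, NULL, L and M types, and it converts numbers with Number, which can lose precision for very large values.

```typescript
// Simplified subset of DynamoDB attribute values (no sets, no binary).
type AttributeValue =
  | { S: string }
  | { N: string }
  | { BOOL: boolean }
  | { NULL: true }
  | { L: AttributeValue[] }
  | { M: { [name: string]: AttributeValue } };

type Native = string | number | boolean | null | Native[] | { [name: string]: Native };

// Converts one typed value to native JSON, recursing into lists and maps.
function toNative(value: AttributeValue): Native {
  if ("S" in value) return value.S;
  if ("N" in value) return Number(value.N); // numbers are sent as strings
  if ("BOOL" in value) return value.BOOL;
  if ("NULL" in value) return null;
  if ("L" in value) return value.L.map(toNative);
  const result: { [name: string]: Native } = {};
  for (const name of Object.keys(value.M)) {
    result[name] = toNative(value.M[name]);
  }
  return result;
}

const item: AttributeValue = {
  M: {
    id: { S: "123" },
    count: { N: "15" },
    tags: { L: [{ S: "a" }, { S: "b" }] },
  },
};

console.log(JSON.stringify(toNative(item))); // {"id":"123","count":15,"tags":["a","b"]}
```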

Finally, "consistentRead": true requests a strongly consistent read, meaning the result will always be the most up-to-date version of the item. While it is more expensive, it lets you opt out of DynamoDB's default eventual consistency.

Item keys

As DynamoDB is a key-value store, you need to specify the full key of the item: the partition key and, if the table has one, the sort key.

The result is a JSON object with the properties of the item converted to native JSON types. This means that, apart from error handling, if it matches the GraphQL schema it can be returned as-is.

#if($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end
$util.toJson($ctx.result)
Running queries

(Official docs)

To get a list of items from DynamoDB, you can use a Query: you specify the partition key and get back a list of results ordered by the sort key. As is common in DynamoDB data modeling, you need to plan in advance: if your API needs to send a query that gets elements for a given partition key, a table or an index needs to be in place to support that. We'll cover this aspect a bit more in the Data modeling chapter.

The data source supports queries on indices (local and global) as well as on the table itself. In the query expression you need to define the partition key and, optionally, a condition on the range (sort) key.

For example, if users belong to groups and there is a global secondary index (GSI) with the partition key as the group ID, this resolver mapping template returns users belonging to a given group:

{
  "version": "2018-05-29",
  "operation": "Query",
  "index": "groupId",
  "query": {
    "expression": "#groupId = :groupId",
    "expressionNames": {
      "#groupId": "groupId"
    },
    "expressionValues": {
      ":groupId": {"S": $util.toJson($ctx.source.id)}
    }
  }
}
Query operation

The consistentRead is also a valid argument for queries, but only for the table itself or a local secondary index (LSI). This is again a limitation of DynamoDB.

Also, scanIndexForward defines whether the items are read in ascending or descending order of the sort key. Finally, limit defines the maximum number of items the query can return.
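The semantics of these arguments can be modeled with an in-memory TypeScript sketch. The schema, with groupId as partition key and createdAt as sort key, and the query function are hypothetical, and pagination is left out entirely. The function selects one partition and optionally applies a BETWEEN condition on the sort key. It then orders by the sort key according to scanIndexForward and applies limit.

```typescript
// Hypothetical schema: groupId is the partition key, createdAt the sort key.
interface User {
  groupId: string;
  createdAt: number;
  name: string;
}

interface QueryOptions {
  groupId: string;                   // partition key value (required)
  createdBetween?: [number, number]; // optional BETWEEN on the sort key
  scanIndexForward?: boolean;        // ascending by default, as in DynamoDB
  limit?: number;
}

// In-memory model of Query: one partition, optional sort key condition,
// ordered by the sort key, then cut at `limit`. Pagination is not modeled.
function query(table: User[], opts: QueryOptions): User[] {
  const forward = opts.scanIndexForward ?? true;
  const range = opts.createdBetween;
  const matching = table
    .filter((u) => u.groupId === opts.groupId)
    .filter((u) => range === undefined || (u.createdAt >= range[0] && u.createdAt <= range[1]))
    .sort((a, b) => (forward ? a.createdAt - b.createdAt : b.createdAt - a.createdAt));
  return opts.limit === undefined ? matching : matching.slice(0, opts.limit);
}

const table: User[] = [
  { groupId: "g1", createdAt: 3, name: "Carol" },
  { groupId: "g2", createdAt: 1, name: "Dave" },
  { groupId: "g1", createdAt: 1, name: "Alice" },
  { groupId: "g1", createdAt: 2, name: "Bob" },
];

// The two most recent users of group g1.
const newest = query(table, { groupId: "g1", scanIndexForward: false, limit: 2 });
console.log(newest.map((u) => u.name).join(", ")); // Carol, Bob
```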

Limit

Limit sets the maximum number of items, but there is no guarantee that the result will contain that many elements, even if the table has that many items.
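One common cause of short pages is that DynamoDB applies the limit to the items it evaluates, and a filter expression runs only afterwards on those items. A single page is also capped at 1 MB of data read. The TypeScript sketch below models only the limit-then-filter part. readPage is hypothetical and uses an array index as a stand-in for the opaque nextToken.

```typescript
interface Page<T> {
  items: T[];
  nextToken: number | null; // an index standing in for the opaque token
}

// Models one page of a Query or Scan with a filter expression: up to `limit`
// items are evaluated first, then the filter runs on them. The 1 MB cap and
// the exact rules for when DynamoDB returns a token are not modeled.
function readPage<T>(all: T[], limit: number, filter: (item: T) => boolean, start = 0): Page<T> {
  const evaluated = all.slice(start, start + limit);
  const next = start + evaluated.length;
  return {
    items: evaluated.filter(filter),
    nextToken: next < all.length ? next : null,
  };
}

const values = [1, 3, 5, 7, 8, 10, 11, 12];
const isEven = (n: number) => n % 2 === 0;

const first = readPage(values, 3, isEven);
console.log(JSON.stringify(first)); // {"items":[],"nextToken":3}

if (first.nextToken !== null) {
  // Continue from the token: this page does contain matches.
  const second = readPage(values, 3, isEven, first.nextToken);
  console.log(JSON.stringify(second)); // {"items":[8,10],"nextToken":6}
}
```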

The result is an object with an items and a nextToken field. The former contains the result items, while the latter is important for pagination, which we'll see in the Pagination chapter. Here, you can return this structure unchanged, or you can map the result properties to other names:

#if($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end
{
  "users": $util.toJson($ctx.result.items),
  "nextToken": $util.toJson($ctx.result.nextToken)
}
Query vs Scan

The other operation that returns a list is the Scan. The difference is that Scan evaluates all items in the table, while Query reads only those with a given partition key. While it's not a problem for small tables, it becomes extremely costly and slow to run scans when the table has many items.

Try to store your data in a way that you only need queries, never scans.
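A rough TypeScript cost model of the difference follows, on a hypothetical table of 1,000 partitions with 10 items each. It only counts items read. Scan reads the whole table even when a filter keeps just one partition's items, while Query reads that partition alone. The in-memory filter in queryCost merely stands in for the key lookup a real table performs.

```typescript
interface Row {
  pk: string;
  sk: number;
}

interface Cost {
  matched: number;
  read: number;
}

// Scan reads every item in the table; the filter runs only after the read.
function scanCost(rows: Row[], pk: string): Cost {
  const matched = rows.filter((r) => r.pk === pk).length;
  return { matched, read: rows.length };
}

// Query reads only the requested partition. The filter here just stands in
// for the key lookup a real table does, so only matching rows count as read.
function queryCost(rows: Row[], pk: string): Cost {
  const matched = rows.filter((r) => r.pk === pk).length;
  return { matched, read: matched };
}

// Hypothetical table: 1,000 partitions with 10 items each.
const rows: Row[] = [];
for (let p = 0; p < 1000; p++) {
  for (let s = 0; s < 10; s++) {
    rows.push({ pk: `group#${p}`, sk: s });
  }
}

console.log(JSON.stringify(scanCost(rows, "group#42")));  // {"matched":10,"read":10000}
console.log(JSON.stringify(queryCost(rows, "group#42"))); // {"matched":10,"read":10}
```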

Pagination

Pagination is an important topic in DynamoDB, as every operation that returns a list of items returns it in pages. This means the AppSync resolver might not get all the results of a query, only some, along with a continuation token called nextToken. If that token is null, it is guaranteed that the query has no more results. If it is non-null, then sending it along with the same query might return more items. To get all the elements for a query or a scan, send the same operation over and over again, passing the previous token, until the nextToken becomes null.

Paginating DynamoDB queries with nextToken
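The loop described above can be sketched in TypeScript on the client side. fetchPage is a hypothetical function standing in for whatever sends the operation, for example a call to a GraphQL field that accepts a nextToken. The fake data source deliberately includes an empty page with a non-null token, because the loop must not stop on an empty page.

```typescript
interface Page<T> {
  items: T[];
  nextToken: string | null;
}

// Hypothetical function that sends one operation with the given token.
type FetchPage<T> = (nextToken: string | null) => Promise<Page<T>>;

// Repeats the same operation, passing along the previous token, until the
// token comes back null. An empty page with a non-null token is not the end.
async function fetchAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let token: string | null = null;
  do {
    const page: Page<T> = await fetchPage(token);
    all.push(...page.items);
    token = page.nextToken;
  } while (token !== null);
  return all;
}

// Fake data source with three pages, the middle one empty.
const pages: Record<string, Page<string>> = {
  start: { items: ["a", "b"], nextToken: "t1" },
  t1: { items: [], nextToken: "t2" },
  t2: { items: ["c"], nextToken: null },
};
const fakeSource: FetchPage<string> = async (token) => pages[token ?? "start"];

fetchAll(fakeSource).then((items) => console.log(items.join(", "))); // a, b, c
```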

This is called cursor-based pagination as a cursor value (the nextToken) is used to get the next page of items. In the SQL world the so-called offset-based pagination is more prevalent, where you define an OFFSET that skips the first n items. Both have advantages and disadvantages, and the main reason why DynamoDB opted for cursors is that all operations run fast no matter which page you fetch.
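A simplified TypeScript cost model of the two styles follows, on a sorted list of 100 hypothetical keys. It assumes an offset query still has to walk past every skipped item, while a cursor lets the database seek straight to where the previous page ended. Here the cursor is simply the last key returned, whereas a real nextToken is opaque.

```typescript
// 100 hypothetical sorted keys: 0, 1, ..., 99.
const keys: number[] = Array.from({ length: 100 }, (_, i) => i);

// Offset: the database walks past every skipped item before returning a page.
function offsetPage(offset: number, size: number): { page: number[]; touched: number } {
  const page = keys.slice(offset, offset + size);
  return { page, touched: Math.min(offset + size, keys.length) };
}

// Cursor: the token remembers where the previous page ended, so the database
// can seek there directly. Here the cursor is simply the last key returned.
function cursorPage(after: number | null, size: number): { page: number[]; next: number | null; touched: number } {
  let start = 0;
  if (after !== null) {
    const cursor: number = after;
    // findIndex only stands in for an index seek and is not counted as work.
    const found = keys.findIndex((k) => k > cursor);
    start = found === -1 ? keys.length : found;
  }
  const page = keys.slice(start, start + size);
  const hasMore = page.length > 0 && start + size < keys.length;
  return { page, next: hasMore ? page[page.length - 1] : null, touched: page.length };
}

// Cost of the tenth page of 10 items in each style.
console.log(offsetPage(90, 10).touched); // 100
console.log(cursorPage(89, 10).touched); // 10
```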

DynamoDB pagination has some strange properties that might feel overly restrictive at first but are the direct consequences of one fundamental idea: all operations must be fast. Because of this, the database guarantees only a few things in terms of pagination.

If you use limit, the number of returned items will never be more than the limit specified. On the other hand, a page can have fewer items than the limit, even if the table has more results. In an extreme case, a query can return zero items but a non-null nextToken, and only the next page starts returning data. Moreover, a client might need to fetch several pages that all contain zero elements. The only guarantees here are that if there are more items then the nextToken will be non-null, and that it eventually becomes null.

These limitations are inherent to DynamoDB and have nothing to do with AppSync. But as an API developer, it's good to know about them so that consumers of the API can have the right expectations.

Limit and nextToken

The nextToken defines where DynamoDB starts searching for elements. Think of it as a "start from here" marker: DynamoDB has no concept of previously sent queries.

Because of this, the limit only affects the current operation. If you want to get 10 items and the first query returns 3, set the limit to 7 for the next page.
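Because the limit applies only to the current request, collecting a fixed number of items means asking each time for only what is still missing. Below is a TypeScript sketch of that on the client. It uses a hypothetical fetchPage(limit, nextToken) function and a fake source that returns at most 3 items per page.

```typescript
interface Page<T> {
  items: T[];
  nextToken: string | null;
}

// Hypothetical function that fetches one page of at most `limit` items.
type FetchPage<T> = (limit: number, nextToken: string | null) => Promise<Page<T>>;

// Collects up to `wanted` items. The limit applies to a single request only,
// so each request asks just for the items that are still missing.
async function fetchUpTo<T>(
  fetchPage: FetchPage<T>,
  wanted: number,
): Promise<{ items: T[]; nextToken: string | null }> {
  const items: T[] = [];
  let token: string | null = null;
  do {
    const page: Page<T> = await fetchPage(wanted - items.length, token);
    items.push(...page.items);
    token = page.nextToken;
  } while (token !== null && items.length < wanted);
  return { items, nextToken: token };
}

// Fake source over 20 items that never returns more than 3 items per page.
const data = Array.from({ length: 20 }, (_, i) => `user${i}`);
const requestedLimits: number[] = [];
const fakeSource: FetchPage<string> = async (limit, token) => {
  requestedLimits.push(limit);
  const start = token === null ? 0 : Number(token);
  const end = Math.min(start + Math.min(limit, 3), data.length);
  return { items: data.slice(start, end), nextToken: end < data.length ? String(end) : null };
};

fetchUpTo(fakeSource, 10).then(({ items, nextToken }) => {
  console.log(items.length, nextToken, requestedLimits.join(",")); // 10 10 10,7,4,1
});
```

It also returns the last nextToken, so the caller can continue from where it stopped instead of starting over.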

The AppSync resolver can only send one query, which means it can fetch only a single page. Purely with the DynamoDB data source you can't implement "fetch all items"-type functionality. While you could put a Lambda function between AppSync and the database to implement this, I don't recommend it. AppSync resolvers are meant to return a small amount of data and should run fast. Writing them in a way that might require an unbounded amount of time (for paging through a potentially large number of items) will eventually hit a limit, such as AppSync's request execution timeout.

It is a best practice to expose DynamoDB pagination through the GraphQL schema and not hide it behind some abstraction. This way the clients can decide how to fetch pages and when to stop.

To implement this, whenever the API returns a list based on a DynamoDB query or scan, add a nextToken argument to the field and return a wrapper type that exposes the result's nextToken:

type User {
  id: ID!
  name: String!
}

type PaginatedUsers {
  users: [User!]!
  nextToken: String
}

type Group {
  id: ID!
  name: String!
  users(count: Int, nextToken: String): PaginatedUsers!
}

The resolver can then pass the argument to the query, optionally along with the count as the limit:

{
  "version": "2018-05-29",
  "operation": "Query",
  "index": "groupId",
  "query": {
    "expression": "#groupId = :groupId",
    "expressionNames": {
      "#groupId": "groupId"
    },
    "expressionValues": {
      ":groupId": {"S": $util.toJson($ctx.source.id)}
    }
  },
  "limit": $util.toJson($ctx.args.count),
  "nextToken": $util.toJson($ctx.args.nextToken)
}

Then the response mapping template can extract the items and the token from the data source's response:

#if($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end
{
  "users": $util.toJson($ctx.result.items),
  "nextToken": $util.toJson($ctx.result.nextToken)
}
DynamoDB pagination in GraphQL

There is more, but you've reached the end of this preview