You are viewing the preview version of this book
Click here for the full version.


(Example code)

RDS is an AWS-managed SQL database that supports various engines and access modes. It eases the operational burden of running, monitoring, updating, and backing up databases. While it's complex to configure and has quite some edge cases, it's the preferred way to run an SQL-based database in the AWS cloud.

SQL allows complex queries, which is preferable when you don't have defined access patterns and you don't expect extreme load. It allows you to structure data in a logical way without worrying about query performance, an aspect that makes working with DynamoDB a challenge.

By default, RDS supports the traditional, connection-based way to send queries. This was designed with a few fixed servers sending queries to the database in mind, which is not the case in a serverless architecture. And since AppSync is serverless, it can not easily use this model.

Additionally, RDS supports the Data API. This is a connection-less way to run SQL statements that fits into the serverless model. This works by encoding all the information in all the requests sent to the API so there is no need to maintain a persistent connection to the database.

AppSync supports only the Data API, and in extension, only supports engines that support the Data API. At the time of writing, this included Aurora Serverless v1, but not Aurora Serverless v2.

Connection-based vs Data API
RDS data source vs RDS Proxy

As usual, you can go back to a Lambda function any time using the Lambda data source and send requests to RDS in code. Lambda can use the same Data API as AppSync but in addition it can take advantage of the RDS Proxy.

RDS Proxy manages the connection and is not tied to the lifecycle of the Lambda instances, in effect bridging the worlds of serverless with non-serverless. Since it does not rely on the Data API, you can use any engine you'd like and it offers a slightly better performance.

The catch is the price tag. Make sure you understand how much the RDS Procy adds to the database costs if you plan to go down that road.

RDS configuration

To use the RDS data source, enable the Data API for the cluster:

Enable Data API for the cluster

Every RDS cluster has a Master password that grants access to it. In the case of the Data API, this password has to be stored in Secrets Manager and the identity who is sending the request has to has access to retrieve it. This adds some fixed costs ($0.40 / month) as well as some extra complexity.

Secrets Manager does not impose any structure on the stored data, but to be usable with the Data API it needs to be in a specific JSON-based format. The exact format is in the documentation, but a sufficient one for mysql is like this:

  "engine": "mysql",
  "host": "<cluster endpoint>",
  "username": "<username>",
  "password": "<password>"
RDS password stored in Secrets Manager

While Secrets Manager enables seamless rotation of the password, in this case it only adds complexity. But nevertheless, this is a requirement from AWS.

RDS data source architecture

Data source configuration

As usual, the data source needs an IAM Role that AppSync can use. This is defined in the Role's trust policy:

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "Service": ""
      "Action": "sts:AssumeRole"

Then the role needs permissions to do two things. First, to read the secret value from Secrets Manager, and second, to send queries to the RDS cluster data endpoint. In JSON:

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Action": "rds-data:ExecuteStatement",
      "Resource": "arn:aws:rds:..."
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:..."
Why both?

These two permissions seem redundant. The rds-data:ExecuteStatement uses IAM credentials to grant access to the Data API, but it also needs a password. On the other hand, secretsmanager:GetSecretValue allows fetching that password that will be sent to the Data API along with the AWS signature.

Why not just use one authentication type? Well, this is how AWS designed it.

RDS data source permissions structure

When the Role is ready and has the necessary permissions, the only thing left is to add the data source. It needs the ARN of the cluster, the ARN of the secret storing the password, the IAM Role, and finally the database name.

Relational database data source configuration

Reading data

Now that we have an RDS cluster compatible with the RDS data source set up, let's implement a simple resolver that reads a an object from the database!

We'll use two tables: User and UserGroup. The query we'll implement gets a the id of the UserGroup and returns the full object.

The tables in SQL:

  id varchar(255) PRIMARY KEY,
  name TEXT,
  groupId varchar(255)

  id varchar(255) PRIMARY KEY,
  name TEXT

  FOREIGN KEY (groupId) REFERENCES UserGroup(id);

Then the GraphQL schema:

type Group {
  id: ID!
  name: String!
  users: [User!]!

type Query {
  groupById(id: String!): Group # We'll implement this

To send an SQL query to the data source, define the SQL statement in the statements array and the variable in the variableMap object:

  "version": "2018-05-29",
  "statements": [
    "SELECT * FROM UserGroup WHERE id = :ID"
  "variableMap": {
    ":ID": $util.toJson($"'", "''").replace("\", "\\"))
SQL injection vulnerability

The variableMap is vulnerable to SQL injection. See the next chapter for details.

Notice that the statements is an array, which hints about some possibilities. Indeed, it is possible to define multiple statements there, but the limit is rather low: maximum 2 statements. The only use-case it was designed is to send an INSERT INTO followed by a SELECT as we'll see in the next chapter. Furthermore, I couldn't find any information whether the statements run in a transaction or not so it's safer to assume they are not.

The placeholders in the statements use the :<name> format and the variableMap needs to have a key for every variable.

Input sanitization

This structure closely resembles prepared statements which is a best practice to avoid SQL injection attacks. Injection-type attacks are a huge problem: they are currently #3 on OWASP's top 10 list (down from #1 in the last installment). And injection vulnerabilities in SQL is particularly damaging as they may give an attacker read and possibly write access to the data.

The root cause of SQL injection is that the SQL code and the data are sent together in a string (example from Wikipedia):

SELECT * FROM users WHERE name = 'user1'

Here, the code is the SELECT ... part, and the data is the name of the user (user1).

When the data comes from a source that is not trusted and is not properly sanitized, it allows to "break out of the data" into the code. For example, if the username is a';DROP TABLE users; SELECT * FROM userinfo WHERE 't' = 't, then the final SQL query becomes:

SELECT * FROM users WHERE name = 'a';DROP TABLE users;SELECT * FROM userinfo WHERE 't' = 't';

There is more, but you've reached the end of this preview
Read this and all other chapters in full and get lifetime access to:
  • all future updates
  • full web-based access
  • PDF and Epub versions