Lambda

(Official docs)

The Lambda data source calls the configured Lambda function and returns its result. This is the most versatile data source, as a Lambda function can do any processing, such as getting items from databases, calling other functions, and even interacting with third-party resources. Because of this, it's a best practice to implement complex functionality with a Lambda data source.

Configuring the data source

To add the data source, you'll need two things: the Lambda function that AppSync will call and an IAM role that gives AppSync permission to call it. As usual, the role's trust policy needs to allow the AppSync service to assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "appsync.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then it needs to allow the lambda:InvokeFunction action in its permissions policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "arn:aws:lambda:...:function:..."
      ]
    }
  ]
}

Notice that there are two layers of permissions working here. First, AppSync uses a role to call the Lambda function. But then the Lambda function uses another role to access resources in the account, such as databases or other functions.

[Figure: Lambda data source permissions. The AppSync API uses an IAM role (trusted service: appsync.amazonaws.com, sts:AssumeRole) that allows lambda:InvokeFunction on the Lambda function. The Lambda function uses a second IAM role (trusted service: lambda.amazonaws.com, sts:AssumeRole) that allows dynamodb:Query on the DynamoDB table.]

Note

Use environment variables to pass configuration to the Lambda function, such as DynamoDB table names, S3 bucket names, and other values that don't change between invocations.

Event object

In the request mapping template you can define what the Lambda function gets in the event object. For example, this template passes the field arguments and a constant text to the function:

{
  "version": "2018-05-29",
  "operation": "Invoke",
  "payload": {
    "arguments": $util.toJson($ctx.arguments),
    "extra_data": "something else"
  }
}

For a field, such as field(fieldArg: String), the event object will be:

{
  "arguments":{
    "fieldArg":"arg1"
  },
  "extra_data":"something else"
}

This gives a flexible structure: the Lambda function can do one specific thing and the AppSync resolver calls it with the appropriate arguments. For example, you might need to create Cognito users in various parts of the API. For this, you can write one function that gets the email address and other required parameters and creates the user. Then the mapping templates can adapt the GraphQL arguments to the Lambda function's input. This is especially useful for mutations.

Batching

The Lambda function runs once for every value, which can result in several calls for a single query. This usually happens when one of the parent fields in the query returns a list.

For example, when the users are managed by Cognito, the email field requires a call to the service. When this query runs:

query {
  project {
    users {
      email
    }
  }
}

AppSync first resolves the project field, then moves to the users. This returns a list, so the email resolver runs for every single user in the result list. If a project has 100 users, that means there are 100 Lambda calls.

Batching is a performance boost in cases like this one: instead of calling the function for every single item, AppSync groups the requests into batches and sends a list.

See more in the official docs.

Direct Lambda resolver

(Official docs)

Lambda is a special data source as the request and the response mapping templates are optional. If the request template is missing, the whole context object is passed to the Lambda function as the event.

To extract the various parts, it's best practice to destructure the incoming object:

exports.handler = async (event, context) => {
  // "arguments" is renamed as it can't be a variable name in strict mode
  const {arguments: args, prev, stash, identity, source, info} = event;

  // ...
};

The various fields:

  • arguments: the arguments passed to the field
  • prev: in case of a pipeline resolver, this is the result of the previous step
  • stash: in a pipeline resolver, this is the stash object shared between the steps
  • identity: contains information about the calling user
  • source: the parent object (if any)
  • info: information about the GraphQL query for this field

For example, a unit (non-pipeline) resolver might get these values:

{
  "arguments": {
    "email": "user@example.com"
  },
  "identity": {
    "accountId": "...",
    "..."
  },
  "source": {
    "projectId": "abc123"
  },
  "info": {
    "fieldName": "searchUser",
    "parentTypeName": "Project",
    "selectionSetGraphQL": "{\n  id\n}",
    "selectionSetList": ["id"]
  },
  "prev": null,
  "stash": {}
}

Direct Lambda resolver or template?

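Using a mapping template makes the function more reusable, as the template can adapt each field's arguments to the function's input. Direct Lambda resolvers, on the other hand, need less configuration and are easier to reason about.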

Result object

Whatever the function returns will be available in the response mapping template at $ctx.result. Here, you can do arbitrary transformations, such as adding or removing fields in objects, or calculating values based on the response.

As with the request, the response mapping template is optional. If it is missing, the resolver returns whatever the Lambda function returned. This is equivalent to $util.toJson($ctx.result) (apart from error handling).

Errors

If the Lambda function throws an error, it will be available in the $ctx.error value. As a best practice, the response template should check whether there was an error while resolving the field and rethrow it:

#if($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end

For example, when the function throws an Error:

exports.handler = async (event, context) => {
  throw new Error("ErrorMsg");
};

Rethrowing it will provide a response similar to this one:

{
  "data": {
    "item": {
      "field2": null
    }
  },
  "errors": [
    {
      "path": [
        "item",
        "field2"
      ],
      "data": null,
      "errorType": "Lambda:Unhandled",
      "errorInfo": null,
      "locations": [
        {
          "line": 3,
          "column": 5,
          "sourceName": null
        }
      ],
      "message": "ErrorMsg"
    }
  ]
}

Use only one Lambda function?

As we've seen in this chapter, the Lambda data source is extremely versatile. Without a request mapping template, it gets all the information AppSync has about the field. This prompts a question: is it possible to handle all resolvers with a single Lambda function?

This is certainly possible, as the info object identifies which field the Lambda function needs to resolve, so the function can decide how to handle each request.

[Figure: One Lambda function for all fields. The AppSync API has a single Lambda data source; its Lambda function accesses the DynamoDB table and the Cognito user pool.]

There are several advantages to this approach. First, Lambda cold starts are rarer as more requests invoke the same function. This makes response times more consistent.

Second, with a single function there are fewer AppSync resources that you need to deploy. I've found that for a medium-sized schema the resolvers and the data sources can span hundreds of resources, which can slow down deployment and eventually hit CloudFormation's limit of 500 resources.

Finally, using a Lambda function to back the whole API gives you familiar tools and greater control over how things are working. One of AppSync's greatest weaknesses is its reliance on a lot of unfamiliar concepts. Starting out with a single Lambda function and a schema while still getting all the benefits of GraphQL is an easy way to get up to speed with AppSync.

The other end of the spectrum is to write several Lambda functions and also take advantage of other data sources and the pipelining feature of AppSync. For example, a DynamoDB data source with appropriate resolvers can read data from tables, a Lambda resolver can interact with Cognito and another one with SNS.

[Figure: Using granular data sources. The AppSync API has a DynamoDB data source for the DynamoDB table and two Lambda data sources, each with its own Lambda function: one for the Cognito user pool and one for SNS.]

While this requires a lot more resources working together to provide the same functionality, it is the "native AppSync way". This provides clearer separation of each part of the API and only calls the appropriate data source required for the resolver.

Granular resolvers help with the "noisy neighbour" problem: while Lambda can scale to a heavy load, it is still limited. If one part of the API calls its function too many times, other parts that don't rely on that function can still work.

Also, many small parts provide better visibility. If all requests call the same function, it's hard to see what is happening inside. Logging and monitoring the underlying components is easier when they are small.

Finally, since permissions are tied to data sources, having multiple of them allows the implementation of least privilege. Instead of one Lambda function that has full access to everything, small and limited components provide defense in depth.

Tip

While it's tempting to add a single Lambda function and handle everything there, it is easier to operate an API that relies more on the resolvers infrastructure AppSync provides.
