Queries and Mutations

Queries in GraphQL are based on two things: an entry point and traversals. The entry point is the initial object (or objects) returned by a field of the Query or the Mutation type. This provides a "foothold" in the object graph.

Then the query can define what other fields and objects it needs in the response. This makes GraphQL queries powerful: the client can define what it needs and it gets exactly that.

For example, a client might want to show the social graph for a user, so it fetches the friends:

query {
  user(id: "user1") {
    name
    friends {
      name
    }
  }
}

A different client might want to display an admin panel that lists all users with their email addresses and permissions:

query {
  allUsers {
    name
    email
    permissions {
      name
    }
  }
}

Both queries above use the same object graph, but they get different data. This gives GraphQL queries a lot of flexibility and, if used well, provides a faster user experience.

Note

It's all too easy to query everything and then select what is needed on the client side. But this wastes the flexibility of queries and goes against GraphQL best practices.

But queries are a lot more powerful than just selecting what fields the client needs. GraphQL provides a lot of features to fine-tune their behavior. In this chapter we'll look into the most useful ones and how to use them.

Fields

(Official docs)

What gives GraphQL queries the most versatility is the ability to select the fields they need in the response. A type might have many fields, but if the client needs just a few of them, omitting the unneeded ones consumes fewer resources.

For example, a User object has a username, a bio, and an email address:

type User {
  username: String!
  email: String
  bio: String
}

type Query {
  user(username: String!): User
}

If the client does not need the email address, it can choose not to ask for that:

# query
query MyQuery {
  user(username: "user1") {
    username
    bio
  }
}

And the result:

{
  "data": {
    "user": {
      "username": "user1",
      "bio": "Lorem ipsum"
    }
  }
}
It's not just about the bandwidth

While most fields map directly to columns in a database, this is not always the case, and the client should not assume that a field is "free" on the backend. For example, the User object in the database might contain only the username but not the email address. In a common scenario, returning the email means sending a call to Cognito (or a similar user directory) and extracting the value from the response.

Because of this, clients should query only for data that they will actually use.

Fields can also have object types defined in the schema. Selecting a field that is not a scalar is how a query moves from object to object in the graph. For example, showing the friends of a user means selecting the friends field and then the fields needed for each friend:

# schema
type User {
  username: String!
  email: String
  bio: String
  friends: [User!]!
}

type Query {
  user(username: String!): User
}
# query
query MyQuery {
  user(username: "user1") {
    username
    friends {
      username
    }
  }
}

And the result:

{
  "data": {
    "user": {
      "username": "user1",
      "friends": [
        {
          "username": "user2"
        },
        {
          "username": "user3"
        }
      ]
    }
  }
}

Arguments

(Official docs)

As we've seen in the Arguments chapter, fields of any type can define what arguments they need. These can be top-level fields, such as those on the Query or the Mutation type, but any field of any type can have arguments too.

A client query must provide all required arguments and may provide any optional ones. When a query passes no arguments to a field, it omits the parentheses.

For example, a query in the schema might define an optional argument, and the client can choose whether to provide it or not:

# schema
type Query {
  # No ! after String, so this is an optional argument
  allUsers(search: String): [User]
}

Both of these queries are valid:

query Query1 {
  allUsers {
    username
    email
  }
}

query Query2 {
  allUsers(search: "admin") {
    username
    email
  }
}

Arguments on nested fields work the same way. Here, the username argument is required, while after is optional:

# schema
type Ticket {
  text: String!
  # POSIX time
  created: Int!
}

type User {
  username: String!
  # the argument is optional
  tickets(after: Int): [Ticket]
}

type Query {
  # the argument is required
  user(username: String!): User
}

Defining arguments for a nested field works the same as for the query itself:

query Query1 {
  user(username: "user1") {
    username
    tickets(after: 1639907490) {
      text
      created
    }
  }
}

query Query2 {
  user(username: "user1") {
    username
    # no argument here
    tickets {
      text
      created
    }
  }
}

Variables

(Official docs)

Since queries are text-based and arguments are usually user-supplied (for example, from a search field on a website), it's a bad idea to build queries with string concatenation. Instead, GraphQL provides a way to separate the fixed part of the query from the variable part.

In this query, the variable part is the text argument of the search query:

query SearchQuery {
  search(text: "users") {
    text
    location
  }
}

With string concatenation it's all too easy to end up with something like this:

// don't do this

// search is the user-provided value
const graphQLQuery = `query SearchQuery {
  search(text: "${search}") {
    text
    location
  }
}`;

The problem here is that the value can contain characters that "break away" from the query argument. For example, if a user searches for something" (with a trailing double quote), then the resulting query is going to be:

query SearchQuery {
  search(text: "something"") {
    text
    location
  }
}

With the extra double quote, the query is no longer valid, and the user gets an error.

To declare a variable, give it a name (starting with $) and a type in the operation definition. Then use the variable name wherever the value is needed:

query SearchQuery($text: String) {
  search(text: $text) {
    text
    location
  }
}

Then pass the variable values separately. How to do that depends on the client library you are using.

For example, in AppSync the variables are passed as a separate JSON object:

(Figure: Variables are defined separately for an AWS AppSync query)
Note

Unlike SQL injection, string concatenation in a GraphQL query is usually not a security vulnerability. This is because the concatenation happens on the client side, and a malicious actor can already send any query they wish.

The exception is when the dynamic part comes from a party other than the one sending the request. An example is a Lambda function that builds a query from the parameters of an HTTP request. In that case, not using variables can open security vulnerabilities.
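The Lambda case above can be made safe the same way as any client: keep the query text constant and send the user-supplied value as a variable. Below is a minimal sketch. It assumes the endpoint accepts the common GraphQL-over-HTTP JSON body with query and variables keys, so check what your API expects. buildSearchRequestBody is a hypothetical name.

```javascript
// The query text is a constant: user input never becomes part of it.
const SEARCH_QUERY = `query SearchQuery($text: String) {
  search(text: $text) {
    text
    location
  }
}`;

// Hypothetical helper: builds the JSON body of a GraphQL POST request.
// The user-supplied value travels in "variables", and JSON.stringify
// takes care of escaping characters such as double quotes.
function buildSearchRequestBody(search) {
  return JSON.stringify({
    query: SEARCH_QUERY,
    variables: { text: search },
  });
}

// A value that breaks the concatenated query is harmless here
const body = buildSearchRequestBody('something"');
console.log(body);
```

The resulting string can then be sent as the body of a POST request, together with whatever authorization headers the API requires.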

Aliases

(Official docs)

A query can select the same field multiple times under different names, which is especially useful when you want to pass different arguments. For example, a user's tickets might be in one of 3 states: OPEN, IN_PROGRESS, and DONE. You might want to return the top 3 tickets for each state.

# schema
enum STATUS {
    OPEN
    IN_PROGRESS
    DONE
}

type Ticket {
  id: ID!
  status: STATUS!
  description: String
}

type User {
  username: String!
  tickets(status: STATUS, limit: Int): [Ticket!]!
}

type Query {
  user(username: String!): User
}

The syntax is aliasname: field:

# query
query {
  user(username: "user1") {
    open_tickets: tickets(status: OPEN, limit: 3) {
      id
    }
    in_progress_tickets: tickets(
      status: IN_PROGRESS,
      limit: 3
    ) {
      id
    }
    done_tickets: tickets(status: DONE, limit: 3) {
      id
    }
  }
}

The result contains the tickets 3 times, under the keys open_tickets, in_progress_tickets, and done_tickets:

{
  "data": {
    "user": {
      "open_tickets": [
        {"id": "1"}, // ...
      ],
      "in_progress_tickets": [
        // ...
      ],
      "done_tickets": [
        // ...
      ]
    }
  }
}
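Because the statuses come from a fixed enum, a client could also generate such an aliased query instead of writing it by hand. Below is a minimal sketch under that assumption. The function name buildTicketsByStatusQuery and the $username and $limit variables are hypothetical. Only the enum values are written into the query text; the user-dependent values stay in variables.

```javascript
// The values of the STATUS enum from the schema above.
const STATUSES = ["OPEN", "IN_PROGRESS", "DONE"];

// Hypothetical helper: returns the text of an aliased query with one
// tickets field per status. Only fixed enum values are written into the
// query text; the user-dependent values are passed as variables.
function buildTicketsByStatusQuery(statuses) {
  const fields = statuses.map((status) =>
    [
      `    ${status.toLowerCase()}_tickets: tickets(status: ${status}, limit: $limit) {`,
      "      id",
      "    }",
    ].join("\n")
  );
  return [
    "query TicketsByStatus($username: String!, $limit: Int) {",
    "  user(username: $username) {",
    ...fields,
    "  }",
    "}",
  ].join("\n");
}

console.log(buildTicketsByStatusQuery(STATUSES));
```

The query would then be sent with variables such as {"username": "user1", "limit": 3}.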
Note

In the above example, what is the benefit of including the tickets 3 times, effectively making 3 database queries, instead of getting a batch of tickets and sorting them on the client side?

Imagine there are a lot of new tickets coming into the system. If you want to show 3 IN_PROGRESS tickets, how many tickets should the client request? Maybe there are 1000 OPEN tickets, which means the client needs to send multiple (maybe many) queries to find 3 that are IN_PROGRESS. Worse still, if the project is new and there are no IN_PROGRESS tickets (or fewer than three), the client needs to fetch all tickets to determine what to show.

The above query ensures that the response contains only what the client needs.

Type safety

(Official docs)

As the schema defines what types a query can include and the fields of each type, the GraphQL backend can do extensive validation even before it starts building the response. While this is not enough to catch all invalid data coming into the backend, it helps a lot with both security and accidental errors.

For example, queries with fields that are not in the schema raise a validation error, as do queries with missing or extra arguments. Moreover, passing an argument of the wrong type also triggers an error.

# schema
type User {
  id: ID
  name: String
}

type Query {
  user(id: ID!): User
}
# query
query Query {
  # Validation error of type FieldUndefined:
  # Field 'missing' in type 'User'
  # is undefined @ 'user/missing'
  user(id: "user1@example.com") {
    id
    name
    missing
  }
  # Validation error of type WrongType:
  # argument 'id' with value
  # 'BooleanValue{value=true}' is not
  # a valid 'ID' @ 'user'
  bad_arg: user(id: true) {
    id
  }
  # Validation error of type MissingFieldArgument:
  # Missing field argument id @ 'user'
  missing_arg: user {
    id
  }
}
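To make these three checks concrete, here is a deliberately simplified sketch that validates one top-level field against the schema above. The hand-encoded schema object and the validateSelection function are hypothetical. The error messages only imitate the ones shown. Real GraphQL servers validate the whole parsed document against many more rules from the specification.

```javascript
// Hand-encoded version of the schema above (a simplification: real
// servers work from the parsed schema, not from an object like this).
const schema = {
  Query: {
    user: { type: "User", args: { id: { type: "ID", required: true } } },
  },
  User: { id: { type: "ID" }, name: { type: "String" } },
};

// Input checks for the scalar types used here. An ID input accepts
// strings and integers.
const isValidInput = {
  ID: (v) => typeof v === "string" || Number.isInteger(v),
  String: (v) => typeof v === "string",
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Validates one top-level Query field and returns a list of errors.
// args: argument values; selected: names of the selected subfields.
function validateSelection(fieldName, args, selected) {
  if (!has(schema.Query, fieldName)) {
    return [`Field '${fieldName}' in type 'Query' is undefined`];
  }
  const field = schema.Query[fieldName];
  const errors = [];
  // Required arguments that are missing
  for (const [name, def] of Object.entries(field.args)) {
    if (def.required && !has(args, name)) {
      errors.push(`Missing field argument ${name}`);
    }
  }
  // Extra arguments and arguments of the wrong type
  for (const [name, value] of Object.entries(args)) {
    if (!has(field.args, name)) {
      errors.push(`Unknown argument '${name}'`);
    } else if (!isValidInput[field.args[name].type](value)) {
      errors.push(`Argument '${name}' is not a valid '${field.args[name].type}'`);
    }
  }
  // Selected fields that the returned type does not define
  for (const sub of selected) {
    if (!has(schema[field.type], sub)) {
      errors.push(`Field '${sub}' in type '${field.type}' is undefined`);
    }
  }
  return errors;
}

console.log(validateSelection("user", { id: "user1@example.com" }, ["id", "name", "missing"]));
console.log(validateSelection("user", { id: true }, ["id"]));
console.log(validateSelection("user", {}, ["id"]));
```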

Inline fragments

(Official docs)

Inline fragments help with queries that return unions or interfaces. These don't define a single concrete type, but instead multiple types (unions) or an abstract type (interfaces). With inline fragments, the query can define what fields it needs for each concrete type.

Inline fragments for interfaces

For an interface, the system can have multiple user types with different fields:

# schema
interface User {
  username: String!
  email: String
}

type AdminUser implements User {
  username: String!
  email: String
  permissions: [String!]!
}

type NormalUser implements User {
  username: String!
  email: String
  nickname: String
}

type Query {
  allUsers: [User!]!
}

A query with inline fragments can define what fields it needs if the type is an AdminUser and what fields if it's a NormalUser. The syntax is ... on <type> {fields}:

# query
query MyQuery {
  allUsers {
    username
    email
    ... on AdminUser {
      permissions
    }
    ... on NormalUser {
      nickname
    }
  }
}

The result does not include the type names, but the fields depend on the user type. Each object contains the common fields (username and email) and also the fields specific to its type (either permissions or nickname):

{
  "data": {
    "allUsers": [
      {
        "username": "user1",
        "email": "user1@example.com",
        "nickname": "Bob"
      },
      {
        "username": "admin1",
        "email": "admin1@example.com",
        "permissions": ["system"]
      }
    ]
  }
}

Inline fragments for union types

The same logic applies to unions too, as they also don't have a single type.

# schema
type User {
  username: String!
  email: String
}

type Ticket {
  text: String!
}

union SearchResult = User | Ticket

type Query {
  search(query: String!): SearchResult
}

The same structure is used here as for the interfaces:

# query
query MyQuery {
  search(query: "test") {
    ... on User {
      username
      email
    }
    ... on Ticket {
      text
    }
  }
}

__typename

(Official docs)

The __typename is a meta field that a query can select on any object type, and its value is the name of that object's concrete type. It is most useful when a client needs to handle the result differently depending on the object type.

For example, the getCurrentUser can return a NormalUser or an AdminUser:

# schema
type AdminUser {
  username: String!
  email: String
}

type NormalUser {
  username: String!
  email: String
}

union CurrentUser = AdminUser | NormalUser

type Query {
  getCurrentUser: CurrentUser
}

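The query can include the __typename for the result. Since getCurrentUser returns a union, the other fields are selected with inline fragments, one for each member type:

# query
query MyQuery {
  getCurrentUser {
    __typename
    ... on AdminUser {
      username
      email
    }
    ... on NormalUser {
      username
      email
    }
  }
}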

The response then shows whether the user is an AdminUser or a NormalUser. For example, the client can show the admin control panel to an AdminUser and redirect a NormalUser to a landing page:

{
  "data": {
    "getCurrentUser": {
      "__typename": "AdminUser",
      "username": "admin1",
      "email": "admin1@example.com"
    }
  }
}

In this example, the two types don't differ in their fields, so a client can't decide based on what properties are present in the response. But their __typename will always be different.
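On the client, this typically ends up as a switch on __typename. Below is a minimal sketch, assuming the response shape shown above. The function name routeForCurrentUser and the route paths are hypothetical.

```javascript
// Hypothetical client-side handler: picks where to go based on the
// __typename of the getCurrentUser result. The routes are made up.
function routeForCurrentUser(response) {
  const user = response.data.getCurrentUser;
  if (user === null) {
    return "/login"; // the field is nullable, so handle a missing user
  }
  switch (user.__typename) {
    case "AdminUser":
      return "/admin";
    case "NormalUser":
      return "/home";
    default:
      // A type added to the union later should not be silently
      // treated as one of the known types.
      throw new Error(`Unexpected type: ${user.__typename}`);
  }
}

const response = {
  data: {
    getCurrentUser: {
      __typename: "AdminUser",
      username: "admin1",
      email: "admin1@example.com",
    },
  },
};
console.log(routeForCurrentUser(response)); // prints /admin
```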

Note

You could also include this information in the schema, for example as a field in each user type:

type AdminUser {
  # ...
  admin: Boolean!
}

type NormalUser {
  # ...
  admin: Boolean!
}

But the __typename meta field provides a cleaner alternative, without any changes to the schema.

Since the __typename meta field can be requested on every object type, it also provides a way for tools to generate strongly-typed objects. For example, a TypeScript library can take the result JSON and return a typed object graph.

Mutations

(Official docs)

So far, all the examples showed how to get data from a GraphQL API. Mutations, on the other hand, are how a client can change it, such as creating a new user or closing a ticket in a ticketing system.

Mutations use the same structure as queries; you just need to declare them under the Mutation type instead of the Query type. A mutation can get arguments and return data the same way as a query.

For example, let's add a mutation to add a new user to the database! For this, we need to add a field to the Mutation type:

# schema
type Mutation {
  createUser(username: String!, email: String): User
}

And to call it, the client query starts with mutation:

# query
mutation MyMutation {
  createUser(username: "user4") {
    username
    friends {
      username
    }
  }
}

Under the hood, the mutation does two things. First, it makes the modification based on the arguments. Second, it returns an object or a scalar, which behaves like the result of a normal query. In the above example, createUser returns a User, so the mutation can select what fields it needs and even move on to other objects.

Notice that a mutation is almost the same as a query. The only differences are the mutation keyword in the client request and the type in the schema. Otherwise, the structure is the same. Nothing prevents a query from changing data, and nothing forces a mutation to change it.

It is still a best practice to keep strictly read-only operations separate from read-write operations. First, the specification treats them differently in some respects, as we'll see in the next chapter. Also, additional functionality, for example caching, might behave differently depending on the operation type.

Tip

Make sure to define read-only operations as queries, and read-write operations as mutations.

Multiple operations in a single request

Aliases allow a single field to be present multiple times in a query, possibly with different arguments. The same mechanism works for top-level query and mutation fields too, which makes it possible to eliminate even more unnecessary round trips.

To get not one but two users from the API, define two fields with the necessary arguments and alias them:

# query
query {
  user1: user(id: "user1") {
    username
    friends {
      name
    }
  }
  user2: user(id: "user2") {
    username
    email
  }
}

The result will contain both users, under the keys user1 and user2, respectively. Notice that the two fields don't need to have the same selection. In the example, the first also fetches the friends, while the second returns the email address instead.

The same works for mutations too:

# query
mutation MyMutation {
  user4: createUser(username: "user4") {
    username
    friends {
      username
    }
  }
  user5: createUser(username: "user5") {
    username
    bio
  }
}
Note

You can't mix query and mutation fields in a single operation: all top-level fields must belong to the same operation type.

There is a difference between queries and mutations, though. When a single query operation contains multiple top-level fields, the server may resolve them in parallel, so the request takes the least amount of time. But when a request contains multiple mutation fields, they are executed serially, in the order they appear. Otherwise, the individual operations could race against each other.
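Below is a simplified sketch of the difference, assuming each top-level field is resolved by an async function. executeRootFields is a hypothetical name and not how AppSync is implemented. It only illustrates the two strategies: the specification allows query fields to run in parallel and requires mutation fields to run serially.

```javascript
// fields: array of { alias, resolve } where resolve() returns a Promise.
async function executeRootFields(operationType, fields) {
  const data = {};
  if (operationType === "query") {
    // Query fields are independent: start them all, then wait for all
    const results = await Promise.all(fields.map((f) => f.resolve()));
    fields.forEach((f, i) => {
      data[f.alias] = results[i];
    });
  } else if (operationType === "mutation") {
    // Mutation fields run one after another, in document order,
    // so each one sees the effects of the previous ones
    for (const f of fields) {
      data[f.alias] = await f.resolve();
    }
  } else {
    throw new Error(`Unsupported operation type: ${operationType}`);
  }
  return data;
}

// Demo: two createUser mutations writing to an in-memory "database"
const users = [];
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const createUser = (username, ms) => async () => {
  await delay(ms); // simulate a write that takes `ms` milliseconds
  users.push(username);
  return { username };
};

executeRootFields("mutation", [
  { alias: "user4", resolve: createUser("user4", 20) },
  { alias: "user5", resolve: createUser("user5", 0) },
]).then((data) => {
  console.log(users); // [ 'user4', 'user5' ]: serial, so document order
  console.log(data);
});
```

With the "query" strategy, the faster user5 write would finish first instead.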

Query cost

Finally, let's talk about how much a query can cost in terms of bandwidth, processing, and other related costs!

In a REST-like API, we can usually calculate it easily. For example, if the /user/<id> endpoint fetches a user object from the database, the total cost is 1 database query plus the size of the result object. Then if the client makes 100 such requests, we can just multiply the costs by 100. This makes it easy to cap resource usage per client: some rate limiting provides an efficient guardrail against overuse.

But with the query structure of GraphQL, estimating load becomes much more complicated. This is most apparent when a query includes collections. For example, let's query a user's friends:

query {
  user(id: "user1") {
    username
    friends {
      username
    }
  }
}

What is the cost of this request? It depends on how many friends the user has, and the average can be quite different from the extremes. This is why pagination is extremely important in GraphQL.

Unfortunately, pagination solves some problems but not all. Let's say no collection can return more than 100 items. If a user has a lot of friends, it takes multiple requests to fetch all of them, which keeps each response bounded. But when there are multiple layers of collections, it's still not enough. For example, let's get the friends' articles too:

query {
  user(id: "user1") {
    username
    friends {
      username
      articles {
        title
      }
    }
  }
}

While a single collection can have no more than 100 items, nesting multiplies the maximum number of elements. In this case, 100 friends, each having 100 articles, yields 100 * 100 = 10,000 articles in the response in the worst case. And with 3 layers, the worst case jumps to 1,000,000 items.
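The multiplication is easy to sketch. The helper below is hypothetical and only counts objects. The 10,000 in the text is the innermost level, while the response also contains the 100 friends themselves.

```javascript
// Hypothetical helper: worst-case number of objects returned at each
// nesting level when every collection is capped at a page size.
// pageSizes[i] is the cap of the collection at nesting level i.
function worstCaseItems(pageSizes) {
  const perLevel = [];
  let itemsAtLevel = 1; // the single root object, e.g. the user
  for (const size of pageSizes) {
    itemsAtLevel *= size; // every parent can have `size` children
    perLevel.push(itemsAtLevel);
  }
  const total = perLevel.reduce((sum, n) => sum + n, 0);
  return { perLevel, total };
}

// friends (max 100) -> articles (max 100)
console.log(worstCaseItems([100, 100]));
// { perLevel: [ 100, 10000 ], total: 10100 }

// a third layer of collections
console.log(worstCaseItems([100, 100, 100]).perLevel[2]); // 1000000
```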

And it's not just about collections but also about individual fields. In practice, most fields are "free": the object comes from a database query, and whether the response includes them or not is just a matter of saving bandwidth. But some fields might involve a separate process. A good example is fetching the email address for a user from a Cognito user pool. If the query includes that field, the GraphQL server needs to send an HTTP request to Cognito to fetch the value. Combine that with 10,000 users, and it can easily trip the rate limits of connected systems.

This is a common problem in GraphQL, and there are different partial solutions for it. AppSync, for example, has a hard upper limit on execution time and response size. This prevents some problems but not all.

Other solutions try to estimate the complexity of a query before the API executes it. A research paper from IBM provides great insight into the problems, while a simpler approach from Spotify is also worth reading. There are also tools with different features that you can add to your GraphQL backend.
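Below is a minimal sketch of the general idea of estimating cost before execution. It is not the IBM or Spotify approach. It assumes the query has already been parsed into a simple, hypothetical tree of { name, list, children } nodes. It also assumes a maximum page size of 100, a cost of 1 per object, and an extra (made-up) cost of 10 for the email field, which needs a call to Cognito. The function name estimateCost and the budget value are hypothetical.

```javascript
const PAGE_SIZE = 100; // assumed cap on every collection
// Assumed extra cost of fields that need a call to another service
const FIELD_COSTS = new Map([["email", 10]]);

// node: { name, list?: boolean, children?: node[] } (hypothetical shape)
// Cost of one object = 1 + cost of its selected fields; a list field
// multiplies the cost of one item by PAGE_SIZE.
function estimateCost(node) {
  const children = node.children ?? [];
  const isObject = children.length > 0;
  const ownCost = (FIELD_COSTS.get(node.name) ?? 0) + (isObject ? 1 : 0);
  const childCost = children.reduce((sum, c) => sum + estimateCost(c), 0);
  const multiplier = node.list ? PAGE_SIZE : 1;
  return multiplier * (ownCost + childCost);
}

// user(id: "user1") { username friends { username email articles { title } } }
const query = {
  name: "user",
  children: [
    { name: "username" },
    {
      name: "friends",
      list: true,
      children: [
        { name: "username" },
        { name: "email" },
        { name: "articles", list: true, children: [{ name: "title" }] },
      ],
    },
  ],
};

const MAX_COST = 5000; // assumed budget per request
const cost = estimateCost(query);
console.log(cost); // 11101
if (cost > MAX_COST) {
  console.log("Query rejected: estimated cost is over the budget");
}
```

The estimate is pessimistic by design: it assumes every collection is full, which is what a guardrail needs to bound.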

On the other hand, the flexibility of GraphQL queries makes this problem even harder. For example, AWS AppSync provides resolvers with the list of fields selected in the query, called selectionSetList:

# query
query GetPost($postId: ID!) {
  getPost(id: $postId) {
    postId
    title
    content
    author {
      authorId
      name
    }
  }
}

The selectionSetList contains the list of all fields:

{
  "selectionSetList": [
    "postId",
    "title",
    "content",
    "author",
    "author/authorId",
    "author/name"
  ]
}

But fields selected with inline fragments on an interface or union type are not included:

query {
  node(id: "post1") {
    id
    ... on Post {
      title
    }
  }
}
{
  "selectionSetList": [
    "id"
  ]
}

In this case, if a protection mechanism relies on the selectionSetList to detect expensive fields, an attacker might circumvent it by requesting those fields inside inline fragments.

Example

Run the queries

Deploy the schema in your account and run these queries live. You'll need an AWS account, the AWS CLI, and Terraform.

See the GitHub repository for deployment instructions and more info: https://github.com/sashee/graphql-example

Let's see how to run queries and mutations on the data model we defined in the last chapter!

(Figure: Data model. Ticket: id: ID!, title: String!, description: String!, owner: User, severity: CRITICAL | NORMAL, attachments: [Attachment!]!. User: id: ID!, name: String!. Attachment: id: ID!, url: String!, implemented by Image (content_type: String!) and File (size: Int!).)

Listing Tickets

First, let's use the Query that retrieves the tickets in the system:

query MyQuery {
  getTickets {
    # return these fields for each Ticket
    description
    id
    owner {
      # if there is an owner, return these fields for the User
      id
      name
    }
    severity
    title
  }
}

This returns a list of Tickets with the defined fields:

{
  "data": {
    "getTickets": [
      {
        "description": "Description 2",
        "id": "ticket2",
        "owner": null,
        "severity": "CRITICAL",
        "title": "Ticket 2"
      },
      {
        "description": "Description 1",
        "id": "ticket1",
        "owner": {
          "id": "user1",
          "name": "user1"
        },
        "severity": "NORMAL",
        "title": "Ticket 1"
      }
    ]
  }
}

This query defines the fields that the response contains, including nested fields such as owner.id and owner.name. When the owner is non-null, GraphQL returns the id and the name of the user; otherwise, the owner is null. This is possible because the schema defines owner as an optional (nullable) field:

type Ticket {
  # ...
  owner: User
}

The getTickets query returns a list, and the result is a JSON array. This is because the schema defines the type of that field as a list:

type Query {
  # must contain a list (might have 0 elements)
  # and each element is a Ticket
  getTickets: [Ticket!]!
  # ...
}

Notice that in the query there is no difference between selecting a list and selecting a single object. You define the fields you want in the result, and the response JSON contains either an array of objects or a single object, depending on the schema.

Let's query the Attachments for the Tickets too!

query MyQuery {
  getTickets {
    id
    attachments {
      # every attachment has an id and a url
      id
      url
      ... on File {
        # if it is a File, also add the size
        size
      }
      ... on Image {
        # if it is an Image, add the content_type
        content_type
      }
    }
  }
}

This returns the ticket IDs and the attachments for each ticket:

{
  "data": {
    "getTickets": [
      {
        "id": "ticket2",
        "attachments": []
      },
      {
        "id": "ticket1",
        "attachments": [
          {
            "id": "file1",
            "url": "example.com/file.doc",
            "size": 1500
          },
          {
            "id": "image1",
            "url": "example.com/image1.jpg",
            "content_type": "image/jpg"
          }
        ]
      }
    ]
  }
}

The above query selects the attachments field of the Ticket, which is a list of Attachments. As Attachment is an interface, you can directly select only the common fields (id and url). To get the fields defined for the File and the Image types, the query uses inline fragments. This makes it possible to handle the different possible types in a single query.

To help with abstract types, the query can also request the __typename:

query MyQuery {
  getTickets {
    id
    attachments {
      id
      url
      __typename
    }
  }
}

This returns an extra field for each item, indicating whether it's a File or an Image:

{
  "data": {
    "getTickets": [
      {
        "id": "ticket2",
        "attachments": []
      },
      {
        "id": "ticket1",
        "attachments": [
          {
            "id": "file1",
            "url": "example.com/file.doc",
            "__typename": "File"
          },
          {
            "id": "image1",
            "url": "example.com/image1.jpg",
            "__typename": "Image"
          }
        ]
      }
    ]
  }
}

Searching

The other query the schema provides is the search. It needs a query argument and returns a Ticket or a User (or null):

# need to send the value of the query separately
query MyQuery($query: String!) {
  search(query: $query) {
    ... on User {
      # if the result is a User
      id
      name
    }
    ... on Ticket {
      # if the result is a Ticket
      id
      title
      description
    }
  }
}

Calling it with a Ticket ID:

{
  "query": "ticket1"
}

Returns the Ticket object:

{
  "data": {
    "search": {
      "id": "ticket1",
      "title": "Ticket 1",
      "description": "Description 1"
    }
  }
}

Similarly, searching for a User works using the same query:

{
  "query": "user1"
}

Returns a User:

{
  "data": {
    "search": {
      "id": "user1",
      "name": "user1"
    }
  }
}

This query passes the search term as an argument (search(query: $query)) whose value comes from a variable.

Then to distinguish between a User and a Ticket, the query uses inline fragments. The __typename also works for union types:

query MyQuery($query: String!) {
  search(query: $query) {
    __typename
    ... on User {
      id
      name
    }
    ... on Ticket {
      id
      title
      description
    }
  }
}
{
  "data": {
    "search": {
      "__typename": "User",
      "id": "user1",
      "name": "user1"
    }
  }
}

A search returns a single result, but with aliases a query can include the same field multiple times (this works for fields of the Query type too):

query MyQuery($query1: String!, $query2: String!) {
  # first query
  result1: search(query: $query1) {
    ... on User {
      name
    }
    ... on Ticket {
      title
    }
  }
  # second query
  result2: search(query: $query2) {
    ... on User {
      name
    }
    ... on Ticket {
      title
    }
  }
}

With values for query1 and query2:

{
  "query1": "user1",
  "query2": "ticket1"
}

The result JSON contains the results for both searches:

{
  "data": {
    "result1": {
      "name": "user1"
    },
    "result2": {
      "title": "Ticket 1"
    }
  }
}

Adding tickets

Let's move on to Mutations and see how to add a new Ticket!

mutation MyMutation(
  $description: String!,
  $severity: SEVERITY!,
  $title: String!
) {
  # add a Ticket with these arguments
  addTicket(details: {
    description: $description,
    severity: $severity,
    title: $title
  }) {
    # and return these fields
    id
    owner {
      id
      name
    }
    description
    title
    severity
  }
}

Notice the three variables that the mutation needs to send:

{
  "description": "Test description",
  "title": "Test ticket",
  "severity": "NORMAL"
}

The result is the new Ticket with the fields defined in the mutation:

{
  "data": {
    "addTicket": {
      "id": "a10f012f-8bd0-4aaa-a4e3-d0ed85322790",
      "owner": null,
      "description": "Test description",
      "title": "Test ticket",
      "severity": "NORMAL"
    }
  }
}

Notice that the owner is null, as it is an optional argument and the mutation did not specify it. To also provide that value:

# the owner is optional
mutation MyMutation(
  $description: String!,
  $severity: SEVERITY!,
  $title: String!,
  $owner: ID
) {
  # also define the owner
  addTicket(details: {
      description: $description,
      severity: $severity,
      title: $title
    }, owner: $owner) {
    id
    owner {
      id
      name
    }
    description
    title
    severity
  }
}

With the variables:

{
  "description": "Test description",
  "title": "Test ticket",
  "severity": "NORMAL",
  "owner": "user1"
}

The created Ticket contains an owner:

{
  "data": {
    "addTicket": {
      "id": "f41010b0-5782-493c-96be-f905b31c77d8",
      "owner": {
        "id": "user1",
        "name": "user1"
      },
      "description": "Test description",
      "title": "Test ticket",
      "severity": "NORMAL"
    }
  }
}
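When calling this mutation from code, you can simply leave the optional owner out of the variables object when it is not known. This results in a Ticket without an owner, as in the first example. Below is a minimal sketch. The addTicketVariables helper is hypothetical, and the severity values are taken from the data model above.

```javascript
// Values of the SEVERITY enum, as listed in the data model above
const SEVERITIES = ["CRITICAL", "NORMAL"];

// Hypothetical helper: builds the variables for the addTicket mutation.
// $owner is declared as an optional ID, so it is only included when
// the caller provides it; omitting it leaves the owner null.
function addTicketVariables({ description, title, severity, owner }) {
  if (!SEVERITIES.includes(severity)) {
    // Catch typos before sending; the server would reject them anyway
    throw new Error(`Invalid severity: ${severity}`);
  }
  const variables = { description, title, severity };
  if (owner !== undefined) {
    variables.owner = owner;
  }
  return variables;
}

console.log(addTicketVariables({
  description: "Test description",
  title: "Test ticket",
  severity: "NORMAL",
  owner: "user1",
}));
```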