RFC: Data masking with fragments

_This work is part of the [Data masking milestone](https://github.com/apollographql/apollo-client/milestone/38). Follow the milestone for progress on this feature._

## Background

When building complex applications today, Apollo Client users reach for `useQuery` to get data into their components. This is no surprise, since the Apollo Client documentation considers this a [best practice](https://www.apollographql.com/docs/react/data/operation-best-practices#query-only-the-data-you-need-where-you-need-it).

We are taking an initiative this year to change the recommendation to instead prefer fragment composition and [fragment colocation](https://www.apollographql.com/docs/react/data/fragments#colocating-fragments) as the standard pattern for building apps with Apollo Client. While the out-of-the-box experience with this pattern works, as recommended by our documentation, there are a few shortcomings of the existing solution:

* It’s easy to introduce implicit coupling on fragment data from query components[^1]. This makes the app more prone to breakage as child components are refactored and data requirements change.
* Cache writes rerender at the query component level, making fine-grained updates for fragment components difficult[^2].

To alleviate these shortcomings, we are introducing data masking into Apollo Client. Data masking was popularized by [Relay](https://relay.dev/docs/principles-and-architecture/thinking-in-relay/#data-masking) and is useful to avoid implicit dependencies between components.

## What is data masking?

Data masking gives a component access only to the fields that the component itself requested, which prevents implicit coupling between components. For query components, this includes all fields in a query not included in a GraphQL fragment. For fragment components, this includes the fields defined in the fragment definition.

Take the following as an example:

```js
import { gql, useQuery } from "@apollo/client";

const query = gql`
  query {
    user {
      id
      name
      ...UserFields
    }
  }
`;

function App() {
  const { data } = useQuery(query)
  
  // loading state omitted for brevity
  
  return (
    <div>
      {data.user.name}
      <User user={data.user} />
    </div>
  )
}

function User({ user }) {
  // ...
}

User.fragment = {
  user: gql`
    fragment UserFields on User {
      age
      birthdate
    }
  `
}
```

In Apollo Client today, all `user` data is available to `<App />`. This means that `<App />` can access fields such as `age` and `birthdate`, even though these fields were requested by the fragment in `<User />`. This creates an implicit coupling between `<App />` and `<User />`. For example, if `<App />` consumed `user.age` and `<User />` was refactored to remove `age` from the fragment, `<App />` would break since `age` would no longer be loaded by the query.

Data masking solves this by making fields accessible only to the component that declared them. In this example, `<App />` would only have access to `user.id` and `user.name`, but not `user.age` and `user.birthdate` since these were part of the fragment. `<User />` would have access to `user.age` and `user.birthdate`, but not `user.id` or `user.name` since these fields are not part of the fragment.

The same applies to fragments that include nested fragments. One fragment cannot access data defined in a nested fragment. Take the following example:

```js
const UserFragment = gql`
  fragment UserFragment on User {
    id
    name
    ...UserProfileFragment
  }
`

const UserProfileFragment = gql`
  fragment UserProfileFragment on User {
    age
    birthdate
  }
`
```

Here `UserFragment` can access `id` and `name`, but not `age` and `birthdate`. `UserProfileFragment` can access `age` and `birthdate`, but not `id` and `name` since these fields are not declared in the fragment.
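To make the rule concrete, here is a minimal sketch of masking over plain objects. It is not Apollo Client's implementation: the `maskFields` helper and the hand-written field lists are hypothetical stand-ins for what a real implementation would derive from the parsed GraphQL document.

```js
// Hypothetical helper: keep only the fields a component declared itself.
// `ownFields` stands in for the selection set minus any fragment spreads.
function maskFields(data, ownFields) {
  const masked = {};
  for (const field of ownFields) {
    if (field in data) masked[field] = data[field];
  }
  return masked;
}

// Full result for a `User`, as it might come back from the network.
const fullUser = { id: 1, name: "Ada", age: 36, birthdate: "1815-12-10" };

// Fields declared directly in each fragment from the example above.
const userFragmentFields = ["id", "name"];
const userProfileFragmentFields = ["age", "birthdate"];

const userView = maskFields(fullUser, userFragmentFields);
const profileView = maskFields(fullUser, userProfileFragmentFields);

console.log(userView); // { id: 1, name: 'Ada' }
console.log(profileView); // { age: 36, birthdate: '1815-12-10' }
```

Each view is built from the same underlying object, so masking changes what a component can see, not what the query fetches.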

## Usage

With data masking enabled, you will issue queries the same as you do today. This is typically done with the `useQuery` or the `useSuspenseQuery` hook if you've adopted Suspense. The difference is that you will not be able to access fields defined in fragments from these components. 

Instead, fields declared in fragments will be accessed through the [`useFragment`](https://www.apollographql.com/docs/react/api/react/hooks#usefragment) hook*. To provide a nice developer experience, we plan to update the [`from` option](https://www.apollographql.com/docs/react/api/react/hooks#from) in `useFragment` to support passing the entire parent object that contains the fragment as the value to this option. See the "Example" section below for a full code sample.

> *Depending on technical feasibility and backwards compatibility, we may need to introduce a separate hook. This will be part of the exploratory work of this feature.

### `@defer`

As part of this work, we will integrate `useFragment` with `@defer` and detect when the fragment is in-flight. This will integrate with [React Suspense](https://react.dev/reference/react/Suspense) and cause the component to suspend until the fragment data has loaded.

#### Usage with non-suspenseful hooks

`useFragment` will not be limited to usage with suspenseful hooks such as `useSuspenseQuery`, however. Some users may not yet be compatible with Suspense or may primarily use `useQuery` to power their apps. Enabling Suspense in these situations would be unwise and induce frustration.

To avoid this, we plan to make `useFragment` aware of the hook that produced the query to conditionally suspend the component. This means that Suspense will only be available when the query is produced by a suspenseful hook such as `useSuspenseQuery`.
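One way to picture this behavior is a read function that throws a promise (the convention Suspense-enabled libraries rely on) only when the parent query came from a suspenseful hook. This is a sketch under assumptions, not the planned implementation: `readDeferredFragment`, `producedBySuspenseHook`, and the `fragmentState` shape are hypothetical names chosen for illustration.

```js
// Hypothetical sketch: decide whether to suspend for a deferred fragment.
function readDeferredFragment(fragmentState, { producedBySuspenseHook }) {
  if (fragmentState.status === "fulfilled") {
    return { data: fragmentState.data, complete: true };
  }
  if (producedBySuspenseHook) {
    // Suspense-enabled libraries signal "not ready" by throwing a promise.
    throw fragmentState.promise;
  }
  // Non-suspenseful parent (e.g. `useQuery`): report incomplete data instead.
  return { data: {}, complete: false };
}

const pendingFragment = { status: "pending", promise: new Promise(() => {}) };

// Parent query issued with `useQuery`: no suspension, incomplete result.
const nonSuspenseResult = readDeferredFragment(pendingFragment, {
  producedBySuspenseHook: false,
});
console.log(nonSuspenseResult.complete); // false

// Parent query issued with `useSuspenseQuery`: the pending promise is thrown.
let thrownValue;
try {
  readDeferredFragment(pendingFragment, { producedBySuspenseHook: true });
} catch (value) {
  thrownValue = value;
}
console.log(thrownValue === pendingFragment.promise); // true
```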

#### Fetch policies with cached data

Users have the ability to leverage [fetch policies](https://www.apollographql.com/docs/react/data/queries/#supported-fetch-policies) to determine how to use cached data when consuming a query. For example, you can bypass the cache and force a network request with `network-only`, or read from the cache while fetching from the network with `cache-and-network`.

`useFragment` should be aware of this to mimic the query hook behavior when determining whether to return a cached result or suspend. For example, when a deferred fragment is used within a `network-only` query, the hook should suspend until the fragment is fulfilled, regardless of whether there is data in the cache. When used with a `cache-and-network` query, `useFragment` should provide the cached data and rerender when the network request finishes.
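The two cases described above can be written down as a small decision function. This is only a sketch: `resolveDeferredFragment` and its return labels are hypothetical, and the behavior for `cache-and-network` without cached data is an assumption, since this RFC does not specify it.

```js
// Hypothetical decision table covering only the two policies described above.
function resolveDeferredFragment({ fetchPolicy, hasCachedData, fragmentPending }) {
  // Once the deferred chunk has arrived there is nothing left to wait for.
  if (!fragmentPending) return "return-network-data";

  switch (fetchPolicy) {
    case "network-only":
      // Cached data is ignored until the deferred chunk arrives.
      return "suspend";
    case "cache-and-network":
      // Show cached data now; a later render delivers the network result.
      // Assumption: with nothing cached, fall back to suspending.
      return hasCachedData ? "return-cached-then-rerender" : "suspend";
    default:
      // Other fetch policies are not specified in this RFC.
      return "unspecified";
  }
}

console.log(
  resolveDeferredFragment({
    fetchPolicy: "network-only",
    hasCachedData: true,
    fragmentPending: true,
  })
); // suspend
```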

### Non-React frameworks/libraries

The Apollo Client ecosystem is not limited to just React users. Libraries such as [Apollo Angular](https://the-guild.dev/graphql/apollo-angular/docs) and [Vue Apollo](https://v4.apollo.vuejs.org/) provide view bindings for their respective view libraries. We plan to provide the foundation for these libraries to adopt this feature as well. This extends to users that use Apollo Client's core APIs.

We will be layering much of this work into Apollo's core query APIs, such as [`watchQuery`](https://www.apollographql.com/docs/react/api/core/ApolloClient#watchquery) and v3.10's [`watchFragment`](https://github.com/apollographql/apollo-client/pull/11465).

## Render performance

Adopting data masking will include more benefits than just programming best practices. It will also provide performance benefits. 

Today, cache writes to fields in a fragment definition cause the query component to re-render, regardless of whether that component consumes the data from the fragment or not. Depending on the depth of the component tree mounted beneath the query component, this may have significant performance implications. Many users avoid this by introducing additional query hooks in components further down the component tree. This however comes at the cost of additional network requests.

You can avoid the render performance implications today with the use of the [`@nonreactive` directive](https://www.apollographql.com/docs/react/data/directives#nonreactive) combined with `useFragment`[^3]. While this works well, it requires manual intervention.

With data masking enabled, this performance benefit will be an out-of-the-box feature. Because `useFragment` will be required to read data out of a fragment, we will target re-renders on the fragment components directly when there are cache writes to fields in the fragment.
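The idea of targeted re-renders can be illustrated with a toy store in which each subscriber declares the fields it reads. This is a simplified sketch, not how the Apollo Client cache is implemented; `createStore` and the `renders` counter are hypothetical.

```js
// Hypothetical mini store: subscribers declare which fields they read.
function createStore(initial) {
  let state = { ...initial };
  const subscribers = [];
  return {
    subscribe(fields, onChange) {
      subscribers.push({ fields, onChange });
    },
    write(patch) {
      state = { ...state, ...patch };
      const changed = Object.keys(patch);
      for (const { fields, onChange } of subscribers) {
        // Notify only subscribers whose fields overlap the written fields.
        if (fields.some((field) => changed.includes(field))) onChange(state);
      }
    },
  };
}

const store = createStore({ id: 1, name: "Ada", age: 36 });
const renders = { App: 0, User: 0 };

store.subscribe(["id", "name"], () => renders.App++); // query component
store.subscribe(["age", "birthdate"], () => renders.User++); // fragment component

// A write to a fragment field notifies the fragment component only.
store.write({ age: 37 });
console.log(renders); // { App: 0, User: 1 }
```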

## Enabling this feature

Once this feature is introduced, it cannot be enabled automatically since this would constitute a breaking change. Instead, we will need to allow an opt-in to this feature. We plan to allow this in 2 ways:

### Globally

We will allow data masking to be opted into globally by introducing a new option to `ApolloClient`. Enabling this option would automatically turn on data masking for every query that uses the client instance. 

```js
new ApolloClient({ 
  // This name is subject to change
  dataMasking: true
})
```

We will recommend that new users and applications opt into this feature immediately as the default. Smaller apps that have the capacity to migrate in an afternoon should also consider enabling this feature. We plan to make this the default in future major versions of Apollo Client with the eventual goal to deprecate this option and make this standard behavior.

### Incrementally

We understand that large apps cannot stop everything and adopt this feature in its entirety. To allow large apps to migrate over time, we will allow this feature to be opted into incrementally.

We will do so by first requiring that users enable data masking globally, then allowing users to opt-out of data masking per named fragment. While this approach seems counterintuitive to the goal of an incremental migration, this approach has some advantages:

* New queries are automatically masked. This removes the risk of forgetting to enable data masking on a query newly introduced to the application.
* Over time, you are removing code to adopt data masking rather than adding code. This makes it easy to spot which queries in your application have not yet adopted the feature.

To make this approach feasible at scale, we will provide some out-of-the-box tools to handle the up-front work for you. For more information, see the “Migration tools” section below.

#### `@unmask`

We plan to add support for a new client-only directive `@unmask` that marks a named fragment as unmasked. 

```graphql
query MyQuery {
  user {
    id
    ...UserFields @unmask
  }
}
```

Named fragments marked with `@unmask` will behave as they do today, allowing access to all fields, including those defined in fragments. We are making this a directive used on named fragments to make it easier for you to migrate specific subtrees that consume data from named fragments.

##### `@unmask` migration mode

While `@unmask` works suitably on its own, it can be difficult to determine at any given time how many of your query components consume fields from named fragments that would normally be masked. We want to provide an obvious way to identify areas of your code where you've accessed a field that would normally be masked. We're introducing a migration mode for `@unmask` that will warn in development when a would-be masked field is accessed in your components. To enable migration mode, you'll set the `mode` argument:

```graphql
query MyQuery {
  user {
    id
    ...UserFields @unmask(mode: "migrate")
  }
}
```

Once you no longer see warnings in your code, it should be safe to remove the `@unmask` directive to start masking data for fields defined in the named fragment.
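A development-time warning of this kind could be built on a JavaScript `Proxy`. The sketch below is an assumption about one possible mechanism, not the planned implementation; `withMigrationWarnings` and its parameters are hypothetical names.

```js
// Hypothetical sketch: warn when a would-be-masked field is read.
function withMigrationWarnings(data, maskedFields, warn = console.warn) {
  return new Proxy(data, {
    get(target, key, receiver) {
      if (typeof key === "string" && maskedFields.includes(key)) {
        warn(`Accessing "${key}", which would be masked without @unmask.`);
      }
      return Reflect.get(target, key, receiver);
    },
  });
}

const collectedWarnings = [];
const migratingUser = withMigrationWarnings(
  { id: 1, age: 36 },
  ["age"], // fields that belong to `UserFields`
  (message) => collectedWarnings.push(message)
);

migratingUser.id; // no warning: declared by the query itself
migratingUser.age; // warning: declared by the fragment
console.log(collectedWarnings.length); // 1
```

The data is still returned unchanged, so existing components keep working while the warnings point at the accesses that need to move behind `useFragment`.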

## Migration tools

We will provide utilities that will ease migration in large apps to make it feasible to adopt this feature.

1. Codemod

We will provide a codemod that will crawl through the application and apply the `@unmask` directive to every named fragment. This will handle queries used in `gql` tags and `.graphql`/`.gql` files. By default this will apply the `@unmask` directive in migrate mode, but there will be an option not to apply migrate mode in case you want to avoid the warnings.

2. ESLint plugin

We will provide an ESLint plugin with a rule that will warn for queries that contain `@unmask` directives used in migrate mode. This provides a more automated way to see areas of the codebase that have not yet adopted data masking. This also makes it possible to ban usage of the directive once data masking is fully adopted.
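As a rough illustration of the codemod's core transformation, the sketch below rewrites a GraphQL document string with a regular expression. It is a hypothetical simplification: `addUnmask` and its `migrate` option are invented for this example, and a production codemod would more likely parse the document into an AST rather than rely on pattern matching.

```js
// Hypothetical codemod core: add `@unmask` to named fragment spreads.
// Skips inline fragments (`... on Type`) and spreads that are directly
// followed by an existing `@unmask` directive.
const NAMED_SPREAD = /\.\.\.\s*(?!on\b)([_A-Za-z]\w*)\b(?!\s*@unmask\b)/g;

function addUnmask(source, { migrate = true } = {}) {
  const directive = migrate ? '@unmask(mode: "migrate")' : "@unmask";
  return source.replace(NAMED_SPREAD, (_match, name) => `...${name} ${directive}`);
}

const before = `query { user { id ...UserFields ... on Admin { role } ...Other @unmask } }`;
const after = addUnmask(before);

console.log(after);
// query { user { id ...UserFields @unmask(mode: "migrate") ... on Admin { role } ...Other @unmask } }
```

Because spreads that already carry `@unmask` are skipped, running the transformation twice yields the same output as running it once.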

## Example

```js
import { gql, useQuery, useFragment } from "@apollo/client";

const USER_QUERY = gql`
  query UserQuery {
    user {
      id
      name
      ...UserInfoFields
      ...UserAvatarFields
    }
  }
`

function App() {
  const { data } = useQuery(USER_QUERY);

  // loading state omitted for brevity

  return (
    <div>
      {data.user.name}
      <UserInfo user={data.user} />
      <UserAvatar user={data.user} />
    </div>
  );
}
 

const USER_INFO_FRAGMENT = gql`
  fragment UserInfoFields on User {
    age
    birthdate
  }
`;

function UserInfo({ user }) {
  const { data } = useFragment({ 
    fragment: USER_INFO_FRAGMENT,
    from: user
  })

  return (
    <div>
      {data.age} - {data.birthdate}
    </div>
  ) 
}

const USER_AVATAR_FRAGMENT = gql`
  fragment UserAvatarFields on User {
    avatar {
      url
    }
  }
`;

function UserAvatar({ user }) {
  const { data } = useFragment({ 
    fragment: USER_AVATAR_FRAGMENT,
    from: user
  });

  return <img src={data.avatar.url} />
}
```

## Open questions

**Should we allow this for queries that use `no-cache` fetch policies?**

`useFragment` is a cache API. It allows you to selectively read data out of the cache and re-render when that data changes. This gets tricky when used with `no-cache` queries. We'll either need to add support to `useFragment` to allow it to be used without the cache, or we'll need to prevent usage with `no-cache` queries.

Depending on our final decision, this may warrant the introduction of a new hook to distinguish `useFragment` as a cache-only API.

**Should this feature apply to `cache.readQuery` and `cache.readFragment`?**

Cache APIs, such as `cache.readQuery` and `cache.readFragment` can be thought of as selectors for data in the cache. Data masking makes less sense here if you just need to read some arbitrary data out of the cache, especially since the queries/fragments you provide to these APIs do not actually have to be a query that was previously executed on the network. These APIs do not cause network requests and do not play a role in re-rendering your components.

Instead, we may prefer to build this into the client layer between the cache and the end usage. For example, using `client.watchQuery` and `client.readQuery` would be data-masked. You'd pair this with `client.watchFragment` and/or `client.readFragment` to read masked fragment data from these APIs. The inherent risk with this approach is that it may cause confusion since the distinction between `client.readQuery` and `client.cache.readQuery` may not be apparent.

**Can we allow fragment selections on non-normalized data?**

Due to the way fragments work with the cache, `useFragment` is only able to read normalized data out of the cache. The `from` option creates the cache key used to look up the entity in the cache. With the planned update to the `from` option, should it be possible to read non-normalized data via `useFragment`?

The downside to allowing this is potential confusion on when this is allowed. Allowing non-normalized data to be selected when `from` originates from a query may make it feel like you can do this with any random fragment.
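The difficulty with non-normalized data can be seen in a simplified sketch of how a cache key might be derived. This approximates the common `__typename:id` convention and is not the library's actual code; `toCacheId` is a hypothetical name.

```js
// Simplified approximation of entity identification: `__typename:id`.
// Objects without both fields have no key and so cannot be looked up.
function toCacheId(object) {
  const { __typename, id } = object;
  if (__typename === undefined || id === undefined) return undefined;
  return `${__typename}:${id}`;
}

console.log(toCacheId({ __typename: "User", id: 1 })); // User:1
console.log(toCacheId({ __typename: "Avatar", url: "/a.png" })); // undefined
```

An object like the `Avatar` above has nothing to key on, which is why reading it through a fragment would need a different lookup path than the one `from` uses today.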

**Will it be possible to pass the parent objects provided to `from` in React context, reactive vars, etc.?**

It should theoretically be possible to pass around the parent objects that would normally be passed to child props in React context or other means of transporting values. I'm capturing this as an open question to make sure we are thinking about this while developing the feature.

**How do cache writes work with data masking?**

With data masking enabled, cache reads and writes are no longer symmetrical. We will need to explore ways to make `cache.writeQuery` make sense with this new paradigm.

[^1]: Query components meaning components that initiate a GraphQL network request via a query hook, such as `useQuery`.
[^2]: This can be avoided with the combination of [`@nonreactive`](https://www.apollographql.com/docs/react/data/directives#nonreactive) and [`useFragment`](https://www.apollographql.com/docs/react/api/react/hooks#usefragment) today, but is more subject to error as it requires you to manually add `@nonreactive` in the appropriate places.
[^3]: For a more in-depth look at this feature, see @alessbell's blog post titled ["Don’t Overreact! Introducing Apollo Client’s new `@nonreactive` directive and `useFragment` hook"](https://www.apollographql.com/blog/introducing-apollo-clients-nonreactive-directive-and-usefragment-hook).
