Commit 092c3b9

fix: support Content-Signal
Fixes #487
1 parent 0dd311b commit 092c3b9

3 files changed

+52
-5
lines changed

docs/content/2.guides/1.robots-txt.md

Lines changed: 17 additions & 3 deletions
@@ -60,13 +60,17 @@ The following rules are parsed from your `robots.txt` file:
 - `Disallow` - An array of paths to disallow for the user-agent.
 - `Allow` - An array of paths to allow for the user-agent.
 - `Sitemap` - An array of sitemap URLs to include in the generated sitemap.
-- `Content-Usage` - Content Signals for expressing AI usage preferences (see [Content Signals](#content-signals) below).
+- `Content-Usage` / `Content-Signal` - Directives for expressing AI usage preferences (see [Content Signals](#content-signals) below).
 
 This parsed data will be shown for environments that are `indexable`.
 
 ## Content Signals
 
-Content Signals allow you to express preferences about how AI systems should interact with your content using the `Content-Usage` directive.
+Content Signals allow you to express preferences about how AI systems should interact with your content. Both `Content-Usage` and `Content-Signal` directives are supported:
+
+### Content-Usage (IETF Standard)
+
+The `Content-Usage` directive follows the [IETF AI Preferences specification](https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/):
 
 ```txt [robots.txt]
 User-agent: *
@@ -76,7 +80,17 @@ Content-Usage: /public/ train-ai=y
 Content-Usage: /restricted/ ai=n train-ai=n
 ```
 
-See the emerging [IETF AI Preferences specification](https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/) for more details.
+### Content-Signal (Cloudflare Implementation)
+
+The `Content-Signal` directive is [Cloudflare's implementation](https://blog.cloudflare.com/content-signals-policy/), widely deployed across millions of domains:
+
+```txt [robots.txt]
+User-agent: *
+Allow: /
+Content-Signal: ai-train=no, search=yes, ai-input=yes
+```
+
+Both directives are parsed identically and output as `Content-Usage` in the generated robots.txt. Use whichever format matches your preferences or existing tooling.
 
 ## Conflicting `public/robots.txt`
 
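
The documented behaviour (both directive names are accepted, and output always uses `Content-Usage`) can be illustrated with a small standalone sketch. This is not the module's `parseRobotsTxt` or `generateRobotsTxt`; `collectAiPreferences` and `emitAiPreferences` are hypothetical, simplified stand-ins that track only the AI preference lines.

```typescript
// Hypothetical, simplified stand-in for the module's parser:
// it only tracks AI preference lines and ignores every other rule.
type AiPreferenceGroup = { contentUsage: string[] }

function collectAiPreferences(robotsTxt: string): AiPreferenceGroup {
  const group: AiPreferenceGroup = { contentUsage: [] }
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.trim()
    const sep = line.indexOf(':')
    if (sep === -1)
      continue
    const rule = line.slice(0, sep).trim().toLowerCase()
    const val = line.slice(sep + 1).trim()
    // Both spellings land in the same array, mirroring the shared `case` labels in the commit.
    // Skipping empty values is a choice of this sketch, not something the commit states.
    if ((rule === 'content-usage' || rule === 'content-signal') && val)
      group.contentUsage.push(val)
  }
  return group
}

function emitAiPreferences(group: AiPreferenceGroup): string[] {
  // Output always uses the Content-Usage name; the value itself is passed through unchanged.
  return group.contentUsage.map(v => `Content-Usage: ${v}`)
}

const input = [
  'User-agent: *',
  'Content-Usage: /public/ train-ai=y',
  'Content-Signal: ai-train=no, search=yes, ai-input=yes',
].join('\n')

const output = emitAiPreferences(collectAiPreferences(input))
```

Because the value is stored as an opaque string, a `Content-Signal` line is re-emitted under the `Content-Usage` name with its original comma-separated value intact.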

src/util.ts

Lines changed: 5 additions & 1 deletion
@@ -30,6 +30,7 @@ export { AiBots, NonHelpfulBots }
 * - disallow: a URL path that may not be crawled.
 * - sitemap: the complete URL of a sitemap.
 * - host: the host name of the site, this is optional non-standard directive.
+* - content-usage / content-signal: AI content usage preferences (IETF spec / Cloudflare implementation).
 *
 * @see https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt
 * @see https://github.com/google/robotstxt/blob/86d5836ba2d5a0b6b938ab49501be0e09d9c276c/robots.cc#L714C1-L720C2
@@ -110,6 +111,7 @@ export function parseRobotsTxt(s: string): ParsedRobotsTxt {
 }
 break
 case 'content-usage':
+case 'content-signal':
 currentGroup.contentUsage = currentGroup.contentUsage || []
 currentGroup.contentUsage.push(val)
 break
@@ -315,7 +317,9 @@ export function generateRobotsTxt({ groups, sitemaps }: { groups: RobotsGroupRes
 for (const cleanParam of group.cleanParam || [])
 lines.push(`Clean-param: ${cleanParam}`)
 
-// content signals / AI preferences (see https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/)
+// content signals / AI preferences
+// Both Content-Usage (IETF) and Content-Signal (Cloudflare) are accepted as input, output as Content-Usage
+// See: https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/
 for (const contentUsage of group.contentUsage || [])
 lines.push(`Content-Usage: ${contentUsage}`)
 
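
The parser change stores each directive value as a raw string and does not interpret it. A consumer that needs the individual signals could split the value itself. The following is a minimal sketch, assuming the Cloudflare-style comma-separated `key=value` format shown in the docs example; `splitContentSignal` is a hypothetical helper and is not part of this commit.

```typescript
// Hypothetical helper, not part of the commit: splits "k=v, k=v" into a lookup table.
function splitContentSignal(value: string): Record<string, string> {
  const signals: Record<string, string> = {}
  for (const part of value.split(',')) {
    const eq = part.indexOf('=')
    if (eq === -1)
      continue // ignore fragments without an "="
    const name = part.slice(0, eq).trim()
    if (!name)
      continue // ignore fragments without a key
    signals[name] = part.slice(eq + 1).trim()
  }
  return signals
}

const signals = splitContentSignal('ai-train=no, search=yes, ai-input=yes')
```

This sketch does not handle the path-prefixed, space-separated form used in the `Content-Usage` examples (such as `/public/ train-ai=y`); that form would need its own handling.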

test/unit/robotsTxtParser.test.ts

Lines changed: 30 additions & 1 deletion
@@ -430,8 +430,37 @@ Content-Usage: /restricted/ ai=n train-ai=n
 User-Agent: *
 Content-Usage: invalid-preference
 Content-Usage: invalid-path ai=n
-Content-Usage:
+Content-Usage:
 `
 expect(parseRobotsTxt(robotsTxt).errors).toEqual([])
 })
+
+it('content-signal directive parsing', () => {
+const robotsTxt = `
+User-Agent: *
+Allow: /
+Content-Signal: ai-train=no, search=yes, ai-input=yes
+`
+expect(parseRobotsTxt(robotsTxt)).toMatchInlineSnapshot(`
+{
+"errors": [],
+"groups": [
+{
+"allow": [
+"/",
+],
+"comment": [],
+"contentUsage": [
+"ai-train=no, search=yes, ai-input=yes",
+],
+"disallow": [],
+"userAgent": [
+"*",
+],
+},
+],
+"sitemaps": [],
+}
+`)
+})
 })
