Merged
5 changes: 5 additions & 0 deletions .github/workflows/dev-build.yml
@@ -158,6 +158,11 @@ jobs:
# it goes nowhere.
export BUILD_GOOGLE_ANALYTICS_ACCOUNT=UA-00000000-0

# Make sure every built page always has
# '<meta name="robots" content="noindex, nofollow">' no matter what
# kind of document it is.
export BUILD_ALWAYS_NO_ROBOTS=true

yarn build

# TODO: When the deployer is available this is where we
5 changes: 5 additions & 0 deletions .github/workflows/stage-build.yml
@@ -166,6 +166,11 @@ jobs:
# origin domain isn't what that account expects.
export BUILD_GOOGLE_ANALYTICS_ACCOUNT=UA-36116321-5

# Make sure every built page always has
# '<meta name="robots" content="noindex, nofollow">' no matter what
# kind of document it is.
export BUILD_ALWAYS_NO_ROBOTS=true

yarn build

du -sh client/build
6 changes: 6 additions & 0 deletions build/constants.js
@@ -54,6 +54,11 @@ const FIX_FLAWS_VERBOSE = JSON.parse(
process.env.BUILD_FIX_FLAWS_VERBOSE || "true"
);

// See explanation in docs/envvars.md
const ALWAYS_NO_ROBOTS = JSON.parse(
process.env.BUILD_ALWAYS_NO_ROBOTS || "false"
);

module.exports = {
BUILD_OUT_ROOT,
DEFAULT_FLAW_LEVELS,
@@ -67,4 +72,5 @@ module.exports = {
FIX_FLAWS,
FIX_FLAWS_DRY_RUN,
FIX_FLAWS_VERBOSE,
ALWAYS_NO_ROBOTS,
};
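The constant above relies on `JSON.parse` to turn the environment string into a boolean. As a minimal sketch of that pattern, the hypothetical helper below (`parseBooleanEnv` is not part of the PR, which inlines the call) adds a type check, since `JSON.parse` would also accept values such as `"1"` or `"null"` without complaint.

```javascript
// Hypothetical helper for illustration; build/constants.js inlines JSON.parse.
function parseBooleanEnv(env, name, defaultValue) {
  // Fall back to the default when the variable is unset or empty,
  // mirroring `process.env.BUILD_ALWAYS_NO_ROBOTS || "false"`.
  const raw = env[name] || String(defaultValue);
  // JSON.parse throws a SyntaxError on inputs such as "yes" or "True".
  const value = JSON.parse(raw);
  if (typeof value !== "boolean") {
    throw new TypeError(`${name} must be "true" or "false", got ${raw}`);
  }
  return value;
}

// Usage: pass process.env in a real build; plain objects are used here.
const stage = parseBooleanEnv(
  { BUILD_ALWAYS_NO_ROBOTS: "true" },
  "BUILD_ALWAYS_NO_ROBOTS",
  false
);
const local = parseBooleanEnv({}, "BUILD_ALWAYS_NO_ROBOTS", false);
```

Whether the stricter check is worth it is a judgment call: the inlined version treats any truthy parsed value as "on", which may be acceptable for a build-only flag.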
14 changes: 14 additions & 0 deletions docs/envvars.md
@@ -157,6 +157,20 @@ the Google Analytics script tag it will use
`<script src="https://www.google-analytics.com/anaytics_debug.js"></script>`
instead, which triggers additional console logging that is useful for developers.

### `BUILD_ALWAYS_NO_ROBOTS`

**Default: `false`**

This forces `<meta name="robots" content="noindex, nofollow">` into the HTML of
every built page, no matter what kind of document it is. For example, none of the
documents on our stage or dev builds should be indexed, so those builds set
`BUILD_ALWAYS_NO_ROBOTS` to `true`.

We use this to make absolutely sure that no dev or stage build ever gets into
the Google index. Thankfully, we have _always_ used a canonical URL
(`<link rel="canonical" href="https://developer.mozilla.org/$uri">`) as a "second
line of defense" for dev/stage URLs that are public.

## Server

### `SERVER_PORT`
13 changes: 8 additions & 5 deletions ssr/render.js
@@ -8,6 +8,7 @@ import cheerio from "./monkeypatched-cheerio";
import {
GOOGLE_ANALYTICS_ACCOUNT,
GOOGLE_ANALYTICS_DEBUG,
ALWAYS_NO_ROBOTS,
} from "../build/constants";

// When there are multiple options for a given language, this gives the
@@ -176,11 +177,13 @@ export default function render(
$('meta[name="description"]').attr("content", pageDescription);
}

- if ((doc && !doc.noIndexing) || pageNotFound) {
- $('<meta name="robots" content="noindex, nofollow">').insertAfter(
- $("meta").eq(-1)
- );
- }
+ const robotsContent =
+ ALWAYS_NO_ROBOTS || (doc && doc.noIndexing) || pageNotFound
+ ? "noindex, nofollow"
+ : "index, follow";
+ $(`<meta name="robots" content="${robotsContent}">`).insertAfter(
+ $("meta").eq(-1)
+ );

if (!pageNotFound) {
$('link[rel="canonical"]').attr("href", canonicalURL);
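The new expression in this hunk depends on the conditional operator binding more loosely than `||`, so all three conditions are combined before the `?` is evaluated. The sketch below pulls that decision into a standalone function to make the grouping explicit; the name `robotsContent` and its options object are assumptions made for illustration and do not appear in `ssr/render.js`.

```javascript
// Hypothetical extraction of the decision made in ssr/render.js.
function robotsContent({ alwaysNoRobots, doc, pageNotFound }) {
  // Any one of the three conditions is enough to block indexing.
  const noIndex =
    Boolean(alwaysNoRobots) ||
    Boolean(doc && doc.noIndexing) ||
    Boolean(pageNotFound);
  return noIndex ? "noindex, nofollow" : "index, follow";
}

// A normal document on a production build stays indexable.
const prodDoc = robotsContent({
  alwaysNoRobots: false,
  doc: { noIndexing: false },
  pageNotFound: false,
});

// The same document on a stage or dev build is always blocked.
const stageDoc = robotsContent({
  alwaysNoRobots: true,
  doc: { noIndexing: false },
  pageNotFound: false,
});

// A 404 page has no document and is blocked regardless of the flag.
const notFound = robotsContent({
  alwaysNoRobots: false,
  doc: null,
  pageNotFound: true,
});
```

Note that the removed condition tested `!doc.noIndexing`, while the added one tests `doc.noIndexing`; the sketch follows the added version.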
3 changes: 3 additions & 0 deletions testing/tests/index.test.js
@@ -127,6 +129,8 @@ test("content built foo page", () => {
expect($("link[rel=canonical]").attr("href")).toBe(
`https://developer.mozilla.org${doc.mdn_url}`
);

expect($('meta[name="robots"]').attr("content")).toBe("index, follow");
});

test("summary extracted correctly by span class", () => {
@@ -720,6 +722,7 @@ test("404 page", () => {
const $ = cheerio.load(html);
expect($("title").text()).toContain("Page not found");
expect($("h1").text()).toContain("Page not found");
expect($('meta[name="robots"]').attr("content")).toBe("noindex, nofollow");
});

test("bcd table extraction followed by h3", () => {
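The added assertions cover the default build and the 404 page, but not a build with `BUILD_ALWAYS_NO_ROBOTS=true`. As a rough sketch of how stage or dev output could be spot-checked without cheerio, the hypothetical helper below extracts the robots value with a regular expression. It assumes the tag is written exactly as `ssr/render.js` emits it, and the sample HTML string is invented for illustration.

```javascript
// Dependency-free stand-in for the cheerio lookup used in the PR's tests.
// Assumption: the tag has the exact shape emitted by ssr/render.js
// (name attribute first, double-quoted values).
function robotsMetaContent(html) {
  const match = html.match(/<meta name="robots" content="([^"]*)">/);
  return match ? match[1] : null;
}

// Invented sample standing in for a page built with the flag enabled.
const stagePage =
  '<head><meta charset="utf-8">' +
  '<meta name="robots" content="noindex, nofollow"></head>';

if (robotsMetaContent(stagePage) !== "noindex, nofollow") {
  throw new Error("stage page must not be indexable");
}
```

A regular expression is brittle against attribute reordering, so the cheerio selector used in the existing tests remains the more robust choice inside the test suite itself.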