Commit 36da64a

feat: Modify image file names
------------------

Breaking change: Previously, image file names were a hash of all or part of the image URL. To provide more stability and future-proofing, the default format is now `{page-slug}.{notion-block-id}`. (sillsdev#82)

Users can opt in to the old format with `--image-file-name-format legacy`.

------------------

Feature: If desired instead, users can specify `--image-file-name-format content-hash` to use a hash of the image content as the file name. (sillsdev#76)

------------------

Potential breaking change for plugins: The exported type IDocuNotionContext changed from

```
export type IDocuNotionContext = {
  layoutStrategy: LayoutStrategy;
  options: DocuNotionOptions;
  getBlockChildren: IGetBlockChildrenFn;
  notionToMarkdown: NotionToMarkdown;
  directoryContainingMarkdown: string;
  relativeFilePathToFolderContainingPage: string;
  convertNotionLinkToLocalDocusaurusLink: (url: string) => string | undefined;
  pages: NotionPage[];
  counts: ICounts;
  imports: string[];
};
```

to

```
export type IDocuNotionContext = {
  layoutStrategy: LayoutStrategy;
  options: DocuNotionOptions;
  getBlockChildren: IGetBlockChildrenFn;
  notionToMarkdown: NotionToMarkdown;
  pageInfo: IDocuNotionContextPageInfo;
  convertNotionLinkToLocalDocusaurusLink: (url: string) => string | undefined;
  pages: NotionPage[];
  counts: ICounts;
  imports: string[];
};
```

where `IDocuNotionContextPageInfo` is

```
export type IDocuNotionContextPageInfo = {
  directoryContainingMarkdown: string;
  relativeFilePathToFolderContainingPage: string;
  slug: string;
};
```
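For plugin authors, the practical effect of the type change is that the two path fields are now read through `pageInfo`. The following is a minimal sketch of that migration, not code from the repository: `ContextSubset` and `describePage` are hypothetical names, and the type shown is a trimmed-down local copy of the fields listed in the commit message.

```typescript
// Local copy of the new page-info type, as given in the commit message.
type IDocuNotionContextPageInfo = {
  directoryContainingMarkdown: string;
  relativeFilePathToFolderContainingPage: string;
  slug: string;
};

// Hypothetical subset of IDocuNotionContext; the real type has more fields.
type ContextSubset = { pageInfo: IDocuNotionContextPageInfo };

// Hypothetical plugin helper. Before this commit it would have read
// context.directoryContainingMarkdown directly; now it goes through pageInfo.
function describePage(context: ContextSubset): string {
  const { directoryContainingMarkdown, slug } = context.pageInfo;
  return `${directoryContainingMarkdown} (slug: ${slug})`;
}

// Illustrative values only.
const sampleContext: ContextSubset = {
  pageInfo: {
    directoryContainingMarkdown: "./docs/intro",
    relativeFilePathToFolderContainingPage: "intro",
    slug: "/getting-started",
  },
};

console.log(describePage(sampleContext));
```

Plugins that never touched the two moved fields should need no change under this reading of the commit message.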
1 parent bff12fd commit 36da64a

13 files changed: +279 -114 lines changed

.vscode/settings.json

Lines changed: 6 additions & 0 deletions
@@ -11,11 +11,17 @@
     "Greenshot",
     "imgur",
     "kanban",
+    "sillsdev",
     "unlocalized"
   ],
   "workbench.colorCustomizations": {
     "statusBar.background": "#d649ca",
     "statusBar.noFolderBackground": "#d649ca",
     "statussBar.prominentBackground": "#d649ca"
+  },
+  "markdownlint.config": {
+    "MD025":false,
+    "MD033":false,
+    "MD040":false
   }
 }

README.md

Lines changed: 22 additions & 19 deletions
@@ -6,7 +6,7 @@ Example Site: https://sillsdev.github.io/docu-notion-sample-site/
 
 # Instructions
 
-## 1. Set up your documentation site.
+## 1. Set up your documentation site
 
 First, prepare your markdown-based static file system like [Docusaurus](https://docusaurus.io/). For a shortcut with github actions, search, and deployment to github pages, you can just copy [this template](https://github.com/sillsdev/docu-notion-sample-site).
 
@@ -27,15 +27,15 @@ Go to the page that will be the root of your site. This page should have, as dir
 
 <img width="318" alt="image" src="https://github.com/sillsdev/docu-notion/assets/8448/810c6dca-f9ab-4370-93b4-dc1479332af7">
 
-## 5. Add your pages under your Outline page.
+## 5. Add your pages under your Outline page
 
 Currently, docu-notion expects that each page has only one of the following: sub-pages, links to other pages, or normal content. Do not mix them. You can add content pages directly here, but then you won't be able to make use of the workflow features. If those matter to you, instead make new pages under the "Database" and then link to them in your outline pages.
 
 ## 6. Pull your pages
 
-First, determine the id of your root page by clicking "Share" and looking at the url it gives you. E.g.
-https://www.notion.so/hattonjohn/My-Docs-0456aa5842946bdbea3a4f37c97a0e5
-means that the id is "0456aa5842946PRETEND4f37c97a0e5".
+First, determine the ID of your root page by clicking "Share" and looking at the url it gives you. E.g.
+`https://www.notion.so/hattonjohn/My-Docs-0456aa5842946PRETEND4f37c97a0e5`
+means that the ID is `0456aa5842946PRETEND4f37c97a0e5`.
 
 Try it out:
 
@@ -114,26 +114,27 @@ NOTE: if you just localize an image, it will not get picked up. You also must lo
 
 # Automated builds with Github Actions
 
-Here is a working Github Action script to copy and customize: https://github.com/BloomBooks/bloom-docs/blob/master/.github/workflows/release.yml
+Here is a [working Github Action script to copy and customize](https://github.com/BloomBooks/bloom-docs/blob/master/.github/workflows/release.yml).
 
 # Command line
 
-Usage: docu-notion -n <token> -r <root> [options]
+Usage: `docu-notion -n <token> -r <root> [options]`
 
 Options:
 
 | flag | required? | description |
 | ------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| -n, --notion-token <string> | required | notion api token, which looks like `secret_3bc1b50XFYb15123RHF243x43450XFY33250XFYa343` |
-| -r, --root-page <string> | required | The 31 character ID of the page which is the root of your docs page in notion. The code will look like `9120ec9960244ead80fa2ef4bc1bba25`. This page must have a child page named 'Outline' |
-| -m, --markdown-output-path <string> | | Root of the hierarchy for md files. WARNING: node-pull-mdx will delete files from this directory. Note also that if it finds localized images, it will create an i18n/ directory as a sibling. (default: "./docs") |
-| -t, --status-tag <string> | | Database pages without a Notion page property 'status' matching this will be ignored. Use '\*' to ignore status altogether. (default: `Publish`) |
-| --locales <codes> | | Comma-separated list of iso 639-2 codes, the same list as in docusaurus.config.js, minus the primary (i.e. 'en'). This is needed for image localization. (default: []) |
-| -l, --log-level <level> | | Log level (choices: `info`, `verbose`, `debug`) |
-| -i, --img-output-path <string> | | Path to directory where images will be stored. If this is not included, images will be placed in the same directory as the document that uses them, which then allows for localization of screenshots. |
-| -p, --img-prefix-in-markdown <string> | | When referencing an image from markdown, prefix with this path instead of the full img-output-path. Should be used only in conjunction with --img-output-path. |
-| --require-slugs | | If set, docu-notion will fail if any pages it would otherwise publish are missing a slug in Notion. |
-| -h, --help | | display help for command |
+| `-n, --notion-token <string>` | required | notion api token, which looks like `secret_3bc1b50XFYb15123RHF243x43450XFY33250XFYa343` |
+| `-r, --root-page <string>` | required | The 31 character ID of the page which is the root of your docs page in notion. The code will look like `9120ec9960244ead80fa2ef4bc1bba25`. This page must have a child page named 'Outline' |
+| `-m, --markdown-output-path <string>` | | Root of the hierarchy for md files. WARNING: node-pull-mdx will delete files from this directory. Note also that if it finds localized images, it will create an i18n/ directory as a sibling. (default: `./docs`) |
+| `-t, --status-tag <string>` | | Database pages without a Notion page property 'status' matching this will be ignored. Use '\*' to ignore status altogether. (default: `Publish`) |
+| `--locales <codes>` | | Comma-separated list of iso 639-2 codes, the same list as in docusaurus.config.js, minus the primary (i.e. 'en'). This is needed for image localization. (default: `[]`) |
+| `-l, --log-level <level>` | | Log level (choices: `info`, `verbose`, `debug`) |
+| `-i, --img-output-path <string>` | | Path to directory where images will be stored. If this is not included, images will be placed in the same directory as the document that uses them, which then allows for localization of screenshots. |
+| `-p, --img-prefix-in-markdown <string>` | | When referencing an image from markdown, prefix with this path instead of the full img-output-path. Should be used only in conjunction with --img-output-path. |
+| `--require-slugs` | | If set, docu-notion will fail if any pages it would otherwise publish are missing a slug in Notion. |
+| `--image-file-name-format <format>` | | choices:<ul><li>`default`: {page slug (if any)}.{image block ID}</li><li>`content-hash`: Use a hash of the image content.</li><li>`legacy`: Use the legacy (before v0.16) method of determining file names. Set this to maintain backward compatibility.</li></ul>All formats will use the original file extension. |
+| `-h, --help` | | display help for command |
 
 # Plugins
 
@@ -155,8 +156,10 @@ The default admonition type, if no matching icon is found, is "note".
 # Known Workarounds
 
 ### Start a numbered list at a number other than 1
+
 In Notion, make sure the block is "Text," not "Numbered List".
+
 - But make sure the number does NOT have a space in front of it. This can/will cause issues with sub-list items.
 - One way to get Notion to let you do this:
-- Create a numbered list item where the text duplicates the number you want. Convert that numbered list item to "Text."
-- i.e. Type "1. 1. Item one." Notion makes the first "1." into a number in a list. When you convert back to "Text," you're left with plain text "1. Item one."
+- Create a numbered list item where the text duplicates the number you want. Convert that numbered list item to "Text."
+- i.e. Type "1. 1. Item one." Notion makes the first "1." into a number in a list. When you convert back to "Text," you're left with plain text "1. Item one."

package.json

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,8 @@
1717
"pull-sample-site": "npm run ts -- -n $DOCU_NOTION_INTEGRATION_TOKEN -r $DOCU_NOTION_SAMPLE_ROOT_PAGE --log-level debug",
1818
"// test with a semi-stable/public site:": "",
1919
"pull-sample": "npm run ts -- -n $DOCU_NOTION_INTEGRATION_TOKEN -r $DOCU_NOTION_SAMPLE_ROOT_PAGE -m ./sample --locales en,es,fr,de --log-level verbose",
20-
"pull-sample-with-paths": "npm run ts -- -n $DOCU_NOTION_INTEGRATION_TOKEN -r $DOCU_NOTION_SAMPLE_ROOT_PAGE -m ./sample --img-output-path ./sample_img"
20+
"pull-sample-with-paths": "npm run ts -- -n $DOCU_NOTION_INTEGRATION_TOKEN -r $DOCU_NOTION_SAMPLE_ROOT_PAGE -m ./sample --img-output-path ./sample_img",
21+
"lint": "eslint . --ext .ts"
2122
},
2223
"//file-type": "have to use this version before they switched to ESM, which gives a compile error related to require()",
2324
"//chalk@4": "also ESM related problem",

src/MakeImagePersistencePlan.ts

Lines changed: 52 additions & 11 deletions
@@ -2,9 +2,13 @@ import { ImageSet } from "./images";
 import * as Path from "path";
 import { error } from "./log";
 import { exit } from "process";
+import crypto from "crypto";
+import { DocuNotionOptions } from "./pull";
 
 export function makeImagePersistencePlan(
+  options: DocuNotionOptions,
   imageSet: ImageSet,
+  imageBlockId: string,
   imageOutputRootPath: string,
   imagePrefix: string
 ): void {
@@ -23,23 +27,55 @@ export function makeImagePersistencePlan(
     }
   }
 
-  // Since most images come from pasting screenshots, there isn't normally a filename. That's fine, we just make a hash of the url
-  // Images that are stored by notion come to us with a complex url that changes over time, so we pick out the UUID that doesn't change. Example:
-  // https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20220516%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600&X-Amz-Signature=f215704094fcc884d37073b0b108cf6d1c9da9b7d57a898da38bc30c30b4c4b5&X-Amz-SignedHeaders=host&x-id=GetObject
-  // But around Sept 2023, they changed the url to be something like:
-  // https://prod-files-secure.s3.us-west-2.amazonaws.com/d9a2b712-cf69-4bd6-9d65-87a4ceeacca2/d1bcdc8c-b065-4e40-9a11-392aabeb220e/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20230915%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20230915T161258Z&X-Amz-Expires=3600&X-Amz-Signature=28fca48e65fba86d539c3c4b7676fce1fa0857aa194f7b33dd4a468ecca6ab24&X-Amz-SignedHeaders=host&x-id=GetObject
-  // The thing we want is the last UUID before the ?
+  if (options.imageFileNameFormat === "legacy") {
+    // Original behavior and comment:
+    // Since most images come from pasting screenshots, there isn't normally a filename. That's fine, we just make a hash of the url
+    // Images that are stored by notion come to us with a complex url that changes over time, so we pick out the UUID that doesn't change. Example:
+    // https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20220516%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600&X-Amz-Signature=f215704094fcc884d37073b0b108cf6d1c9da9b7d57a898da38bc30c30b4c4b5&X-Amz-SignedHeaders=host&x-id=GetObject
+    // But around Sept 2023, they changed the url to be something like:
+    // https://prod-files-secure.s3.us-west-2.amazonaws.com/d9a2b712-cf69-4bd6-9d65-87a4ceeacca2/d1bcdc8c-b065-4e40-9a11-392aabeb220e/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20230915%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20230915T161258Z&X-Amz-Expires=3600&X-Amz-Signature=28fca48e65fba86d539c3c4b7676fce1fa0857aa194f7b33dd4a468ecca6ab24&X-Amz-SignedHeaders=host&x-id=GetObject
+    // The thing we want is the last UUID before the ?
+    const thingToHash = findLastUuid(urlBeforeQuery) ?? urlBeforeQuery;
 
-  const thingToHash = findLastUuid(urlBeforeQuery) ?? urlBeforeQuery;
+    const hash = hashOfString(thingToHash);
+    imageSet.outputFileName = `${hash}.${imageFileExtension}`;
+  } else if (options.imageFileNameFormat === "content-hash") {
+    // This was requested by a user: https://github.com/sillsdev/docu-notion/issues/76.
+    // We chose not to include it in the default file name because we want to maintain
+    // as much stability in the file name as feasible for an image localization workflow.
+    // However, particularly in a workflow which is not concerned with localization,
+    // this could be a good option. One benefit is that the image only needs to exist once
+    // in the file system regardless of how many times it is used in the site.
+    const imageHash = hashOfBufferContent(imageSet.primaryBuffer!);
+    imageSet.outputFileName = `${imageHash}.${imageFileExtension}`;
+  } else {
+    // We decided not to do this for the default format because it means
+    // instability for the file name in Crowdin, which causes loss of localizations.
+    // If we decide to include it in the future, we should add a unit test.
+    // const imageFileName = Path.basename(urlBeforeQuery);
+    // const imageFileNameWithoutExtension = Path.parse(imageFileName).name;
+    // const originalFileNamePart = ["untitled", "unnamed"].includes(
+    // imageFileNameWithoutExtension.toLocaleLowerCase()
+    // )
+    // ? ""
+    // : `${imageFileNameWithoutExtension.substring(0, 50)}.`;
 
-  const hash = hashOfString(thingToHash);
-  imageSet.outputFileName = `${hash}.${imageFileExtension}`;
+    // Format is page slug (if there is one) followed by the image block ID from Notion.
+    // The image block ID will remain stable as long as any changes to the image are done
+    // using the Replace feature. Also, image blocks can be moved using the Move To feature.
+    // We decided to include the page slug for easier workflow during localization, particularly in Crowdin.
+    // The block ID is a unique GUID and thus provides a unique file name.
+    const pageSlugPart = imageSet.pageInfo?.slug
+      ? `${imageSet.pageInfo.slug.replace(/^\//, "")}.`
+      : "";
+    imageSet.outputFileName = `${pageSlugPart}${imageBlockId}.${imageFileExtension}`;
+  }
 
   imageSet.primaryFileOutputPath = Path.posix.join(
     imageOutputRootPath?.length > 0
       ? imageOutputRootPath
-      : imageSet.pathToParentDocument!,
-    imageSet.outputFileName
+      : imageSet.pageInfo!.directoryContainingMarkdown,
+    decodeURI(imageSet.outputFileName)
   );
 
   if (imageOutputRootPath && imageSet.localizedUrls.length) {
@@ -73,3 +109,8 @@ export function hashOfString(s: string): number {
 
   return Math.abs(hash);
 }
+
+function hashOfBufferContent(buffer: Buffer): string {
+  const hash = crypto.createHash("sha256").update(buffer).digest("hex");
+  return hash.slice(0, 20);
+}
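
To make the two new naming schemes concrete, here is a minimal standalone sketch of what the `default` and `content-hash` branches above appear to compute. It is not the repository's code: `ImageNameInput`, `defaultName`, and `contentHashName` are hypothetical names, and the sample slug, block ID, and bytes are made up for illustration. The `legacy` branch is left out because the body of `hashOfString` is not shown in this diff.

```typescript
import { createHash } from "crypto";

// Hypothetical input shape for this sketch; docu-notion itself works on an ImageSet.
type ImageNameInput = {
  slug?: string; // page slug, possibly with a leading "/"
  blockId: string; // Notion image block ID
  content: Buffer; // downloaded image bytes
  extension: string; // original file extension, without the dot
};

// Follows the default branch: "{page-slug}.{block-id}.{ext}", with the slug
// part omitted when the page has no slug.
function defaultName(input: ImageNameInput): string {
  const slugPart = input.slug ? `${input.slug.replace(/^\//, "")}.` : "";
  return `${slugPart}${input.blockId}.${input.extension}`;
}

// Follows the "content-hash" branch: the first 20 hex characters of the
// SHA-256 digest of the image bytes, plus the extension.
function contentHashName(input: ImageNameInput): string {
  const hash = createHash("sha256").update(input.content).digest("hex");
  return `${hash.slice(0, 20)}.${input.extension}`;
}

// Illustrative values only.
const example: ImageNameInput = {
  slug: "/getting-started",
  blockId: "d1bcdc8c-b065-4e40-9a11-392aabeb220e",
  content: Buffer.from("pretend image bytes"),
  extension: "png",
};

console.log(defaultName(example)); // getting-started.d1bcdc8c-b065-4e40-9a11-392aabeb220e.png
console.log(contentHashName(example)); // 20 hex characters followed by ".png"
```

The trade-off noted in the code comments shows up here: the default name changes only if the block or slug changes, while the content-hash name changes whenever the image bytes change but is shared by identical images.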
