Commit ce12474

docs(documentation): Use content addressable storage
1 parent 3da477d commit ce12474

File tree

1 file changed

+35
-81
lines changed

rfcs/0017-incremental-build.md

Lines changed: 35 additions & 81 deletions
Original file line number | Diff line number | Diff line change
@@ -46,11 +46,11 @@ This shall enable the following workflow:
4646

4747
1. **Action:** Build is started
4848
1. Task A, Task B and Task C are executed in sequence, writing their results into individual writer stages.
49-
1. *The writer stages and `cache-info.json` are serialized onto disk.*
49+
1. *Task outputs are written to a content-addressable store and the `cache-info.json` metadata is serialized to disk.*
5050
1. Build finishes and the resources of all writer stages and the source reader are combined and written into the target output directory.
5151
* Resources present in later writer stages (and higher versions) are preferred over competing resources with the same path.
5252
1. **Action:** A source file is modified and a new build is triggered
53-
1. *The `cache-info.json` is read from disk and the writer stages are imported into the project instance.*
53+
1. *The `cache-info.json` is read from disk, allowing the build to access cached content from the content-addressable store.*
5454
1. The build determines which tasks need to be executed using the imported cache and information about the modified source file.
5555
* In this example, it is determined that Task A and Task C need to be executed since they requested the modified resource in their previous execution.
5656
1. Task A is executed. The output is written into a new **version** (v2) of the associated writer stage.
@@ -59,18 +59,20 @@ This shall enable the following workflow:
5959
* Task A can't access v1 of its writer stage. It can only access the combined resources of all previous writer stages.
6060
1. The `Project Build Cache` determines whether the resources produced in this latest execution of Task A are relevant for Task B. If yes, the content of those resources is compared to the cached content of the resources Task B has received during its last execution. In this example, the output of Task A is not relevant for Task B and it is skipped.
6161
1. Task C is called and has access to both versions (v1 and v2) of the writer stage of Task A, allowing it to access all resources produced in all previous executions of Task A.
62-
1. *Writer stages and `cache-info.json` are serialized onto disk.*
62+
1. *Task outputs are written to the content-addressable store and the `cache-info.json` is updated.*
6363
1. The build finishes. The combined resources of all writer stages and the source reader are written to the target output directory.
6464

6565
![Diagram illustrating an initial and a successive build leveraging the build cache](./resources/0017-incremental-build/Build_With_Cache.png)
6666

6767
### Cache Creation
6868

69-
The build cache shall be serialized onto disk in order to use it in successive UI5 Tooling executions. A standardized directory should be used for this, so that UI5 Tooling can automatically find and use the cache.
69+
The build cache shall be serialized onto disk in order to use it in successive UI5 Tooling executions. This will be done using a **Content-Addressable Store (CAS)** model, which separates file content from metadata. This ensures that each unique piece of content is stored only once on disk, greatly reducing disk space usage and improving I/O performance.
7070

71-
Every project has its own cache. This allows for reuse of a project's cache across multiple consuming projects. For example, the `sap.ui.core` library could be built once and the build cache can then be reused in the build of multiple applications.
71+
Every project has its own cache metadata. This allows for reuse of a project's cache across multiple consuming projects. For example, the `sap.ui.core` library could be built once and the build cache can then be reused in the build of multiple applications.
7272

73-
The cache consists of a `cache-info.json` file with the below data structure and multiple directories with the serialized writer stages.
73+
The cache consists of two main parts:
74+
1. A global **object store (the CAS)** where all file contents are stored, named by a hash of their content.
75+
2. A per-project `cache-info.json` file which acts as a lightweight **metadata index**, mapping logical file paths to their content hashes in the object store.
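
The two parts can be illustrated with a minimal sketch. It is not the UI5 Tooling implementation: it assumes SHA-256 as the content hash (the algorithm named for `sourceMetadata`), and `putObject`, the temporary cache root and the sample paths are hypothetical names chosen for this example. Real SHA-256 hex digests are 64 characters long.

```js
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: store content in the CAS directory under the
// hex-encoded SHA-256 hash of that content and return the hash.
function putObject(casDir, content) {
  const hash = crypto.createHash("sha256").update(content).digest("hex");
  const objectPath = path.join(casDir, hash);
  // Identical content maps to the same file name, so it is written only once
  if (!fs.existsSync(objectPath)) {
    fs.writeFileSync(objectPath, content);
  }
  return hash;
}

// Demo: a throwaway cache root (assumption: the layout mirrors ".ui5-cache/cas")
const cacheRoot = fs.mkdtempSync(path.join(os.tmpdir(), "ui5-cache-"));
const casDir = path.join(cacheRoot, "cas");
fs.mkdirSync(casDir, {recursive: true});

// Metadata index: logical path -> content hash
const index = {
  "/resources/a.js": putObject(casDir, "console.log(1);"),
  "/resources/b.js": putObject(casDir, "console.log(1);"), // same content as a.js
  "/resources/c.js": putObject(casDir, "console.log(2);"),
};

// Three logical paths, but only two objects on disk
const storedObjects = fs.readdirSync(casDir);
console.log(index, storedObjects.length);
```

Because the file name is derived from the content, deduplication needs no extra bookkeeping: writing the same bytes twice resolves to the same object.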
7476

7577
#### cache-info.json
7678

@@ -91,25 +93,19 @@ The cache consists of a `cache-info.json` file with the below data structure and
9193
"pathsRead": [],
9294
"patterns": []
9395
},
94-
"resourcesRead": {
95-
"/resources/project/namespace/Component.js": {
96-
"sha256": "d41d8cd98f00b204e9899998ecf8427e",
97-
"lastModified": 1734005532120
98-
}
96+
"inputs": {
97+
// Map of logical paths read to their content hashes
98+
"/resources/project/namespace/Component.js": "d41d8cd98f00b204e9899998ecf8427e"
9999
},
100-
"resourcesWritten": {
101-
"/resources/project/namespace/Component.js": {
102-
"sha256": "c1c77edc5c689a471b12fe8ba79c51d1",
103-
"lastModified": 1734005532120
104-
}
100+
"outputs": {
101+
// Map of logical paths written to their content hashes
102+
"/resources/project/namespace/Component.js": "c1c77edc5c689a471b12fe8ba79c51d1"
105103
}
106104
}
107105
}],
108106
"sourceMetadata": {
109-
"/resources/project/namespace/Component.js": {
110-
"sha256": "d41d8cd98f00b204e9800998ecf8427e",
111-
"lastModified": 1734005532120
112-
}
107+
// Map of source paths to their content hashes
108+
"/resources/project/namespace/Component.js": "d41d8cd98f00b204e9800998ecf8427e"
113109
}
114110
}
115111
````
@@ -121,83 +117,42 @@ The cache key can be used to identify the cache. It shall be based on the projec
121117

122118
**taskCache**
123119

124-
An array of objects, each representing a task that was executed during the build. The object contains the name of the task, the project resources that were read and written by the task, and the resources that were read from the project's dependencies. If the task used glob patterns to read resources, those patterns are stored instead of the resolved paths so that the pattern can later be matched against newly created resources that might invalidate the task.
125-
126-
For each resource that has been read or written, the SHA256 hash of the content and the timestamp of last modification are stored. This allows the UI5 Tooling to determine whether the resource has changed since the last build and whether the task cache is still valid.
120+
An array of objects, each representing a task that was executed during the build. The object contains the name of the task and its resource requests. `inputs` maps the logical path of resources read by the task to their content hash, and `outputs` does the same for resources written by the task. This hash acts as a pointer to the actual file content in the shared CAS object store. If the task used glob patterns to read resources, those patterns are stored so that they can be matched against newly created resources.
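
As a rough sketch of how the `inputs` map could drive task selection, the function below marks a task for re-execution when any path it read previously is among the changed paths. The `taskName` property, the function name `tasksToRerun` and the sample data are assumptions made for this example; glob `patterns` are deliberately left out.

```js
// Hypothetical helper: return the names of all cached tasks that read at
// least one of the changed resources during their previous execution.
function tasksToRerun(taskCache, changedPaths) {
  const changed = new Set(changedPaths);
  return taskCache
    .filter((task) => Object.keys(task.inputs).some((resourcePath) => changed.has(resourcePath)))
    .map((task) => task.taskName);
}

// Demo data (hashes shortened to placeholders)
const taskCache = [
  {taskName: "taskA", inputs: {"/resources/app/Component.js": "hash-1"}, outputs: {}},
  {taskName: "taskB", inputs: {"/resources/app/manifest.json": "hash-2"}, outputs: {}},
  {taskName: "taskC", inputs: {"/resources/app/Component.js": "hash-1"}, outputs: {}},
];

const rerun = tasksToRerun(taskCache, ["/resources/app/Component.js"]);
console.log(rerun);
```

This mirrors the workflow example, where Task A and Task C requested the modified resource and Task B did not.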
127121

128122
**sourceMetadata**
129123

130-
For each *source* file of the project, the SHA256 hash of the content and the timestamp of last modification are stored. This allows the UI5 Tooling to determine whether the source files have changed since the last build.
124+
For each *source* file of the project, this object maps the logical path to the SHA256 hash of its content. This allows the UI5 Tooling to quickly determine whether source files have changed since the last build.
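
A minimal sketch of such a comparison follows. It assumes the current sources are available as an in-memory map of path to content; `diffSources` and the sample files are hypothetical and only illustrate the idea.

```js
const crypto = require("node:crypto");

function sha256(content) {
  return crypto.createHash("sha256").update(content).digest("hex");
}

// Hypothetical helper: classify source paths by comparing the cached
// path -> hash map against the hashes of the current contents.
function diffSources(sourceMetadata, currentSources) {
  const changes = {modified: [], added: [], removed: []};
  for (const [resourcePath, content] of Object.entries(currentSources)) {
    const cachedHash = sourceMetadata[resourcePath];
    if (cachedHash === undefined) {
      changes.added.push(resourcePath);
    } else if (cachedHash !== sha256(content)) {
      changes.modified.push(resourcePath);
    }
  }
  for (const resourcePath of Object.keys(sourceMetadata)) {
    if (!Object.hasOwn(currentSources, resourcePath)) {
      changes.removed.push(resourcePath);
    }
  }
  return changes;
}

// Demo: metadata recorded by a previous build
const sourceMetadata = {
  "/resources/app/Component.js": sha256("old component"),
  "/resources/app/manifest.json": sha256("{}"),
  "/resources/app/Obsolete.js": sha256("obsolete"),
};
// Current state of the sources
const currentSources = {
  "/resources/app/Component.js": "new component",
  "/resources/app/manifest.json": "{}",
  "/resources/app/New.js": "new file",
};

const changes = diffSources(sourceMetadata, currentSources);
console.log(changes);
```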
131125

132126
#### Cache directory structure
133127

128+
The directory structure is flat. A global `cas/` directory stores each unique file content from all builds once, while project-specific directories contain only their lightweight metadata.
129+
134130
```
135131
.ui5-cache
132+
├── cas/ <-- Global Content-Addressable Store (shared across all projects)
133+
│ ├── c1c77edc5c689a471b12fe8ba79c51d1 (Content of one file)
134+
│ ├── d41d8cd98f00b204e9899998ecf8427e (Content of another file)
135+
│ └── ... (all other unique file contents)
136+
136137
├── openui5-sample-app-0.5.0-bb0a3262d093fcb9acf16
137-
│ ├── cache-info.json
138-
│ └── taskCache
139-
│ ├── 0-escapeNonAsciiCharacters
140-
│ │ └── resources
141-
│ ├── 3-minify
142-
│ │ └── resources
143-
│ ├── 4-enhanceManifest
144-
│ │ └── resources
145-
│ └── 6-generateComponentPreload
146-
│ └── resources
138+
│ └── cache-info.json
147139
├── sap.m-1.132.0-SNAPSHOT-bb0a3262d093fcb9acf16
148-
│ ├── cache-info.json
149-
│ └── taskCache
150-
│ ├── 0-escapeNonAsciiCharacters
151-
│ │ └── test-resources
152-
│ ├── 1-replaceCopyright
153-
│ │ ├── resources
154-
│ │ └── test-resources
155-
│ ├── 2-replaceVersion
156-
│ │ └── resources
157-
│ ├── 4-minify
158-
│ │ └── resources
159-
│ ├── 5-generateLibraryManifest
160-
│ │ └── resources
161-
│ ├── 7-generateLibraryPreload
162-
│ │ └── resources
163-
│ └── 8-buildThemes
164-
│ └── resources
140+
│ └── cache-info.json
165141
└── sap.ui.core-1.132.0-SNAPSHOT-bb0a3262d093fcb9acf16
166-
├── cache-info.json
167-
└── taskCache
168-
├── 0-escapeNonAsciiCharacters
169-
│ └── test-resources
170-
├── 1-replaceCopyright
171-
│ ├── resources
172-
│ └── test-resources
173-
├── 2-replaceVersion
174-
│ ├── resources
175-
│ └── test-resources
176-
├── 3-replaceBuildtime
177-
│ └── resources
178-
├── 4-minify
179-
│ └── resources
180-
├── 5-generateLibraryManifest
181-
│ └── resources
182-
├── 7-generateLibraryPreload
183-
│ └── resources
184-
├── 8-generateBundle
185-
│ └── resources
186-
└── 9-buildThemes
187-
└── resources
142+
└── cache-info.json
188143
```
189144

190-
The directories inside `taskCache/` shall each represent a writer stage, prefixed by an integer number reflecting the order of creation in the build. The directories contain all resources that have been *written* by the task associated with that stage.
145+
All unique file contents from all projects and their builds are stored **once** in the global `cas` directory, named by their content hash. This automatic deduplication leads to significant disk space savings.
191146

192147
![Diagram illustrating the creation of a build cache](./resources/0017-incremental-build/Create_Cache.png)
193148

194149
### Cache Import
195150

196-
Before building a project, UI5 Tooling shall scan for a cache directory with the respective cache key and import the cache if one is found.
151+
Before building a project, UI5 Tooling shall scan for a cache directory with the respective cache key and import the cache if one is found.
197152

198-
The import process mainly populates the `Build Task Cache` instances with the information from the `cache-info.json` file and creates readers for the individual `taskCache` directories (representing the writers of each task's previous execution). Those readers are then set as the initial version (v1) writer stages in the corresponding `Project` instance.
153+
The import process is lightweight: it only reads the `cache-info.json` file to populate the `Build Task Cache` instances with their metadata. When the build process needs to access a cached resource, it uses the metadata map to find the content hash and reads the corresponding file directly from the global `cas` store.
199154

200-
This allows executing individual tasks and providing them with the results of all tasks that would normally have been executed before them. Also, the task can decide to only process a few changed resources while the build result will still contain all resources that were written by any of the task's previous executions.
155+
This allows executing individual tasks and providing them with the results of all preceding tasks without the overhead of creating numerous file system readers or managing physical copies of files for each build stage.
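
The lookup described above could look roughly like the following sketch. It assumes that `taskCache` entries are ordered by execution and that the output of a later task wins over an earlier one, in line with the preference for later writer stages. `resolveHash`, `readCachedResource`, `taskName` and the demo data are hypothetical.

```js
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Find the content hash for a logical path, preferring later tasks
function resolveHash(taskCache, logicalPath) {
  for (let i = taskCache.length - 1; i >= 0; i--) {
    const hash = taskCache[i].outputs[logicalPath];
    if (hash !== undefined) {
      return hash;
    }
  }
  return undefined;
}

// Read the content lazily from the CAS; nothing is copied at import time
function readCachedResource(casDir, taskCache, logicalPath) {
  const hash = resolveHash(taskCache, logicalPath);
  if (hash === undefined) {
    return undefined;
  }
  return fs.readFileSync(path.join(casDir, hash), "utf8");
}

// Demo setup: a temporary CAS with a helper that stores content by its hash
const casDir = fs.mkdtempSync(path.join(os.tmpdir(), "ui5-cas-"));
function store(content) {
  const hash = crypto.createHash("sha256").update(content).digest("hex");
  fs.writeFileSync(path.join(casDir, hash), content);
  return hash;
}

const taskCache = [
  {taskName: "replaceVersion", outputs: {"/resources/app/Component.js": store("original")}},
  {taskName: "minify", outputs: {"/resources/app/Component.js": store("minified")}},
];

const content = readCachedResource(casDir, taskCache, "/resources/app/Component.js");
const missing = readCachedResource(casDir, taskCache, "/resources/app/Unknown.js");
console.log(content, missing);
```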
201156

202157
![Diagram illustrating the import of a build cache](./resources/0017-incremental-build/Import_Cache.png)
203158

@@ -229,7 +184,7 @@ The UI5 Tooling server shall integrate the incremental build as a mean to pre-pr
229184

230185
Middleware like `serveThemes` (used for compiling LESS resources to CSS) would become obsolete with this, since the `buildThemes` task will be executed instead.
231186

232-
If any project (root or dependency) configures custom tasks, those tasks are executed in the server as well. This makes it possible to easily integrate projects with custom tasks as dependencies.
187+
If any project (root or dependency) defines custom tasks, those tasks are executed in the server as well. This makes it possible to easily integrate projects with custom tasks as dependencies.
233188

234189
Since executing a full build requires more time than the on-the-fly processing of resources currently implemented in the UI5 Tooling server, users shall be able to disable individual tasks that are not necessarily needed during development. This can be done using CLI parameters as well as ui5.yaml configuration.
235190

@@ -256,7 +211,7 @@ All of this should be communicated in the UI5 Tooling documentation and in blog
256211
* Projects might have to adapt their configurations
257212
* Custom tasks might need to be adapted. Before they could only access the sources of a project. With this change, they will access the build result instead. Access to the sources is still possible but requires the use of a dedicated API
258213
* UI5 Tooling standard tasks need to be adapted to use the new cache API. Especially the bundling tasks currently have no concept for partially re-creating bundles. However, this is an essential requirement to achieve fast incremental builds.
259-
* The project build cache might become very large and consume a lot of disk space. On systems with restricted disk space or slow I/O operations, this could lead to a worse performance.
214+
* While the content-addressable cache is highly efficient at deduplication, the central cache can still grow very large over time. A robust purging mechanism is critical for managing disk space.
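
The RFC leaves cache purging as an open topic. Purely as an illustration of one possible approach, the sketch below removes every object in `cas/` that is not referenced by the `inputs` or `outputs` of any project's `cache-info.json` (a mark-and-sweep pass). The function names and the demo data are hypothetical, and a real mechanism would also need to consider concurrent builds and cache age.

```js
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Mark: collect all hashes referenced by any project's cache-info.json
function collectReferencedHashes(cacheRoot) {
  const referenced = new Set();
  for (const entry of fs.readdirSync(cacheRoot, {withFileTypes: true})) {
    if (!entry.isDirectory() || entry.name === "cas") {
      continue;
    }
    const infoPath = path.join(cacheRoot, entry.name, "cache-info.json");
    if (!fs.existsSync(infoPath)) {
      continue;
    }
    const info = JSON.parse(fs.readFileSync(infoPath, "utf8"));
    for (const task of info.taskCache ?? []) {
      for (const hash of Object.values(task.inputs ?? {})) referenced.add(hash);
      for (const hash of Object.values(task.outputs ?? {})) referenced.add(hash);
    }
  }
  return referenced;
}

// Sweep: delete CAS objects that no metadata index points to
function purgeUnreferenced(cacheRoot) {
  const casDir = path.join(cacheRoot, "cas");
  const referenced = collectReferencedHashes(cacheRoot);
  const removed = [];
  for (const name of fs.readdirSync(casDir)) {
    if (!referenced.has(name)) {
      fs.rmSync(path.join(casDir, name));
      removed.push(name);
    }
  }
  return removed;
}

// Demo: three objects, of which one project still references two
const cacheRoot = fs.mkdtempSync(path.join(os.tmpdir(), "ui5-cache-"));
fs.mkdirSync(path.join(cacheRoot, "cas"));
for (const name of ["aaa", "bbb", "ccc"]) {
  fs.writeFileSync(path.join(cacheRoot, "cas", name), name);
}
fs.mkdirSync(path.join(cacheRoot, "sample-app-1.0.0"));
fs.writeFileSync(
  path.join(cacheRoot, "sample-app-1.0.0", "cache-info.json"),
  JSON.stringify({taskCache: [{inputs: {"/a.js": "aaa"}, outputs: {"/a.js": "bbb"}}]})
);

const removed = purgeUnreferenced(cacheRoot);
const remaining = fs.readdirSync(path.join(cacheRoot, "cas")).sort();
console.log(removed, remaining);
```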
260215

261216
## Alternatives
262217

@@ -267,10 +222,9 @@ An alternative to using the incremental build in the UI5 Tooling server would be
267222
* Measure performance in BAS. Find out whether this approach results in acceptable performance.
268223
* How to distinguish projects with build cache from pre-built projects (with project manifest)
269224
* Cache related topics
270-
* Clarify cache key
225+
* Clarify cache key
271226
* Current POC: project version + dependency versions + build config + UI5 Tooling module versions
272227
* Include resource tags in cache
273-
* Compress cache to reduce memory pressure
274228
* Allow tasks to store additional information in the cache
275229
* Cache Purging
276230
* Some tasks might be relevant for the server only (e.g. code coverage), come up with a way to configure that
