Commit 637bcdf

Add TerminusDB internals pages
1 parent a3094ed commit 637bcdf

5 files changed

+733
-2
lines changed

Lines changed: 393 additions & 0 deletions
@@ -0,0 +1,393 @@
---
title: Document Unfolding and Cycle Detection Reference
nextjs:
  metadata:
    title: Document Unfolding and Cycle Detection Reference
    description: Understanding @unfoldable annotation, cycle detection, and performance characteristics of document traversal in TerminusDB
    keywords: TerminusDB, @unfoldable, document unfolding, cycle detection, self-referencing documents, performance
    openGraph:
      images: https://assets.terminusdb.com/docs/technical-documentation-terminuscms-og.png
    alternates:
      canonical: https://terminusdb.org/docs/document-unfolding-reference/
media: []
---
14+
15+
TerminusDB provides automatic document unfolding for linked documents marked with the `@unfoldable` schema annotation. This reference guide explains how unfolding works, how cycle detection prevents infinite recursion, and what the performance characteristics of the implementation are.
16+
17+
--> Valid as of the 11.2 release.
18+
19+
## What is Document Unfolding?
20+
21+
Document unfolding is the process of automatically expanding referenced documents when retrieving data through the Document API, GraphQL, or WOQL. When a class is marked with `@unfoldable: []`, any references to documents of that class are automatically expanded inline instead of returning just an ID reference.
22+
23+
### Example Schema
24+
```json
{
  "@type": "Class",
  "@id": "Person",
  "@unfoldable": [],
  "name": "xsd:string",
  "friend": {
    "@type": "Set",
    "@class": "Person"
  }
}
```
37+
38+
### Unfolded vs Non-Unfolded Results
39+
40+
**Without `@unfoldable` (Reference Only):**
```json
{
  "@id": "Person/Alice",
  "@type": "Person",
  "name": "Alice",
  "friend": ["Person/Bob"] // ID strings only, not expanded
}
```
49+
50+
**With `@unfoldable` (Automatically Expanded):**
```json
{
  "@id": "Person/Alice",
  "@type": "Person",
  "name": "Alice",
  "friend": [
    {
      "@id": "Person/Bob",
      "@type": "Person",
      "name": "Bob"
    }
  ]
}
```
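Client code that reads such results has to cope with both shapes of a link field. The following is a minimal Python sketch of one way to do that; `reference_ids` is a hypothetical helper written for this illustration, not part of any TerminusDB client library.

```python
from typing import Any, List


def reference_ids(value: Any) -> List[str]:
    """Return the document IDs held in a link field, whatever its shape.

    Accepts a bare ID string, an unfolded document (dict with "@id"),
    or a list mixing both, as a Set field may contain.
    """
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    ids = []
    for item in items:
        if isinstance(item, str):      # plain ID reference
            ids.append(item)
        elif isinstance(item, dict):   # unfolded document
            ids.append(item["@id"])
    return ids


folded = {"@id": "Person/Alice", "friend": ["Person/Bob"]}
unfolded = {
    "@id": "Person/Alice",
    "friend": [{"@id": "Person/Bob", "@type": "Person", "name": "Bob"}],
}

print(reference_ids(folded["friend"]))    # ['Person/Bob']
print(reference_ids(unfolded["friend"]))  # ['Person/Bob']
```

Normalizing to IDs first keeps downstream code independent of whether a class is marked `@unfoldable` or a cycle cut the expansion short.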
65+
66+
## Cycle Detection
67+
68+
When documents reference themselves directly or indirectly, TerminusDB's cycle detection mechanism prevents infinite recursion while ensuring all nodes are properly rendered.
69+
70+
### How Cycle Detection Works
71+
72+
The unfolding implementation uses a **path stack** to track the current traversal from root to the current node. When a document ID is encountered that's already in the current path, a cycle is detected:
73+
74+
1. **Path Stack Maintained**: As traversal descends into children, document IDs are pushed onto the stack
75+
2. **Cycle Check**: Before expanding a document, check if its ID is already in the current path
76+
3. **ID Reference Returned**: If a cycle is detected, return just the `@id` string instead of expanding
77+
4. **Backtrack**: When returning from a child, pop its ID from the stack
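The four steps above can be sketched in a few lines of Python. This is an illustration of the described behavior under simplifying assumptions, not the TerminusDB implementation: the in-memory `store`, the `unfold` function, and the rule "a string value that is a key in the store is a link to an unfoldable document" are all hypothetical.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical in-memory store: document ID -> stored document (links held as IDs).
Store = Dict[str, Dict[str, Any]]


def unfold(doc_id: str, store: Store, path: Optional[List[str]] = None) -> Any:
    path = [] if path is None else path
    # Step 2: cycle check against the current root-to-node path only.
    if doc_id in path:
        return doc_id  # Step 3: return the bare ID instead of expanding.
    path.append(doc_id)  # Step 1: push on descent.
    try:
        def expand(value: Any) -> Any:
            # Assumption: any string that names a stored document is a link.
            if isinstance(value, str) and value in store:
                return unfold(value, store, path)
            return value

        result: Dict[str, Any] = {}
        for key, value in store[doc_id].items():
            if key.startswith("@"):
                result[key] = value  # keep @id / @type untouched
            elif isinstance(value, list):
                result[key] = [expand(item) for item in value]
            else:
                result[key] = expand(value)
        return result
    finally:
        path.pop()  # Step 4: backtrack when leaving the node.


store: Store = {
    "Node/A": {"@id": "Node/A", "@type": "Node", "name": "Node A", "next": "Node/B"},
    "Node/B": {"@id": "Node/B", "@type": "Node", "name": "Node B", "next": "Node/A"},
}

print(json.dumps(unfold("Node/A", store), indent=2))
```

With this store, `Node/B` is expanded inside `Node/A`, and the link from `Node/B` back to `Node/A` comes out as the string `"Node/A"` because `Node/A` is still on the path.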
78+
79+
### Cycle Detection Behavior Examples
80+
81+
#### Direct Self-Reference
82+
83+
**Schema:**
```json
{
  "@type": "Class",
  "@id": "LinguisticObject",
  "@unfoldable": [],
  "name": "xsd:string",
  "partOf": {
    "@type": "Set",
    "@class": "LinguisticObject"
  }
}
```
96+
97+
**Data:**
```json
{
  "@id": "LinguisticObject/self",
  "@type": "LinguisticObject",
  "name": "Self Referencing",
  "partOf": ["LinguisticObject/self"] // Points to itself
}
```
106+
107+
**Result:**
```json
{
  "@id": "LinguisticObject/self",
  "@type": "LinguisticObject",
  "name": "Self Referencing",
  "partOf": ["LinguisticObject/self"] // ID string, not expanded
}
```
116+
117+
#### Circular Reference Chain (A→B→A)
118+
119+
**Data:**
```json
[
  {
    "@id": "Node/A",
    "@type": "Node",
    "name": "Node A",
    "next": "Node/B"
  },
  {
    "@id": "Node/B",
    "@type": "Node",
    "name": "Node B",
    "next": "Node/A" // Back to A
  }
]
```
136+
137+
**Result (retrieving Node/A):**
```json
{
  "@id": "Node/A",
  "@type": "Node",
  "name": "Node A",
  "next": {
    "@id": "Node/B",
    "@type": "Node",
    "name": "Node B",
    "next": "Node/A" // Cycle detected, ID string returned
  }
}
```
151+
152+
#### Multiple Circular Paths
153+
154+
For complex graphs with multiple interconnected cycles, each path is tracked independently. Nodes are expanded until they appear again in the current traversal path.
155+
156+
**Graph:**
```
A → B → C → A (cycle)
A → D → A (cycle)
B → D
```
162+
163+
The cycle detection ensures no node is expanded more than once per path, preventing infinite recursion while rendering all reachable nodes.
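To make the per-path behavior concrete, here is a small Python sketch that applies the path-stack rule to the graph above. It is a model of the description in this section, not TerminusDB code; the `expand` function, the single `next` field, and the adjacency-list encoding are assumptions made for the example.

```python
import json
from typing import Dict, List, Union

Graph = Dict[str, List[str]]
Tree = Union[str, Dict[str, object]]


def expand(node: str, graph: Graph, path: List[str]) -> Tree:
    if node in path:          # already on the current path: emit the ID only
        return node
    path.append(node)
    children = [expand(child, graph, path) for child in graph[node]]
    path.pop()                # leave the node, so other paths may expand it again
    return {"@id": node, "next": children}


# A -> B -> C -> A, A -> D -> A, B -> D
graph: Graph = {"A": ["B", "D"], "B": ["C", "D"], "C": ["A"], "D": ["A"]}
tree = expand("A", graph, [])
print(json.dumps(tree, indent=2))
```

In this model `D` is expanded twice, once under `B` and once directly under `A`, because the two occurrences lie on different paths. Every link back to `A` is emitted as the bare ID `"A"`.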
164+
165+
### Deep Nested Structures
166+
167+
For long chains (e.g., 100+ nodes without cycles), TerminusDB traverses the entire structure:
168+
```json
{
  "@id": "ChainNode/0",
  "value": 0,
  "next": {
    "@id": "ChainNode/1",
    "value": 1,
    "next": {
      "@id": "ChainNode/2",
      "value": 2,
      // ... continues for all 100 nodes
    }
  }
}
```
184+
185+
## Work Limit Protection
186+
187+
To prevent excessive resource consumption during document unfolding, TerminusDB implements a work limit that caps the total number of operations during traversal.
188+
189+
### Configuration
190+
191+
**Environment Variable:** `TERMINUSDB_DOC_WORK_LIMIT`
192+
193+
**Default:** 500,000 operations
194+
195+
**Setting Custom Limit:**
```bash
# Linux/macOS
export TERMINUSDB_DOC_WORK_LIMIT=1000000

# Docker
docker run -e TERMINUSDB_DOC_WORK_LIMIT=1000000 terminusdb/terminusdb-server:latest

# Kubernetes: the lines below are YAML for the container spec of a
# Deployment or Pod, not a shell command
# env:
#   - name: TERMINUSDB_DOC_WORK_LIMIT
#     value: "1000000"
```
208+
209+
### When Work Limit is Exceeded
210+
211+
If document traversal exceeds the work limit:
212+
213+
1. **Traversal Terminates**: Document retrieval stops
214+
2. **Error Returned**: Returns `DocRetrievalError::LimitExceeded`
215+
3. **No Partial Results**: The request fails as a whole; no partial data is returned
216+
4. **Document IRI Included**: Error message includes the document IRI that triggered the limit
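The following Python sketch models this failure mode. It is an illustration only: what TerminusDB counts as one "operation" is not specified here, so the sketch assumes one unit of work per node visit, and the `LimitExceeded` exception, `unfold_with_limit` function, and error message format are hypothetical stand-ins for `DocRetrievalError::LimitExceeded`.

```python
from typing import Any, Dict, List

Graph = Dict[str, List[str]]


class LimitExceeded(Exception):
    """Hypothetical stand-in for DocRetrievalError::LimitExceeded."""

    def __init__(self, doc_id: str, limit: int) -> None:
        super().__init__(f"work limit of {limit} exceeded while unfolding {doc_id}")
        self.doc_id = doc_id
        self.limit = limit


def unfold_with_limit(root: str, graph: Graph, limit: int = 500_000) -> Any:
    work = 0  # assumption: one unit of work per node visit

    def visit(node: str, path: List[str]) -> Any:
        nonlocal work
        work += 1
        if work > limit:
            # Abort the whole traversal; nothing built so far is returned.
            raise LimitExceeded(node, limit)
        if node in path:
            return node
        path.append(node)
        children = [visit(child, path) for child in graph[node]]
        path.pop()
        return {"@id": node, "next": children}

    return visit(root, [])


# A ten-node chain: ChainNode/0 -> ChainNode/1 -> ... -> ChainNode/9
chain: Graph = {f"ChainNode/{i}": [f"ChainNode/{i + 1}"] for i in range(9)}
chain["ChainNode/9"] = []

full = unfold_with_limit("ChainNode/0", chain)  # 10 visits, well under the default
try:
    unfold_with_limit("ChainNode/0", chain, limit=5)
except LimitExceeded as error:
    print(error)  # work limit of 5 exceeded while unfolding ChainNode/5
```

Raising an exception rather than returning a truncated tree mirrors the all-or-nothing behavior listed above: the caller gets either the complete document or an error naming a document IRI.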
217+
218+
**Recommended Limits by Use Case:**
219+
| Use Case | Recommended Limit | Rationale |
|----------|-------------------|-----------|
| Simple documents | 100,000 | Below the default; suits shallow documents |
| Complex hierarchies | 500,000 (default) | Balanced performance/safety |
| Large knowledge graphs | 1,000,000 - 5,000,000 | Deep traversals needed |
| Real-time APIs | 50,000 - 100,000 | Prioritize response time |
226+
227+
## Performance Characteristics
228+
229+
### Path Stack Implementation
230+
231+
TerminusDB uses a **Vec-based path stack** for cycle detection, which is optimal for this use case:
232+
233+
**Why Vec (not HashSet):**
234+
- **Path stack semantics**: The `visited` collection tracks the current DFS path, not all visited nodes
235+
- **Small size**: Path depth is typically 10-50 nodes, not thousands
236+
- **Cache-friendly**: Sequential access pattern
237+
- **Stack mirroring**: Push/pop operations naturally mirror traversal stack
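The "path stack semantics" point is about behavior as much as speed. The Python sketch below contrasts the two strategies on a diamond-shaped graph with no cycles. A Python list stands in for the Vec and a set for a HashSet of all visited nodes; both functions are hypothetical models, not TerminusDB code.

```python
from typing import Dict, List, Set, Union

Graph = Dict[str, List[str]]
Tree = Union[str, Dict[str, object]]


def expand_with_path(node: str, graph: Graph, path: List[str]) -> Tree:
    """Path stack: a node is cut off only if it is an ancestor of itself."""
    if node in path:
        return node
    path.append(node)
    children = [expand_with_path(child, graph, path) for child in graph[node]]
    path.pop()
    return {"@id": node, "next": children}


def expand_with_seen(node: str, graph: Graph, seen: Set[str]) -> Tree:
    """Global visited set: a node is cut off after its first expansion anywhere."""
    if node in seen:
        return node
    seen.add(node)
    return {"@id": node, "next": [expand_with_seen(c, graph, seen) for c in graph[node]]}


# Diamond without cycles: A -> B -> D and A -> C -> D
diamond: Graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

by_path = expand_with_path("A", diamond, [])
by_seen = expand_with_seen("A", diamond, set())

print(by_path["next"][1])  # {'@id': 'C', 'next': [{'@id': 'D', 'next': []}]}
print(by_seen["next"][1])  # {'@id': 'C', 'next': ['D']}
```

With a global visited set, `D` under `C` is reduced to an ID even though no cycle exists. The path stack expands `D` on both branches, which matches the rule that only nodes already on the current path are returned as references.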
238+
239+
Benchmarks show the Vec-based path stack running at roughly twice the speed of a HashSet on both small and large documents.
240+
241+
**Empirical Results:**
242+
- For path depth < 100: Vec is faster than HashSet (no hash overhead)
243+
- For path depth > 100: Difference is negligible in practice
244+
- Real-world path depths: typically 10-50 nodes
245+
246+
### Schema Design Recommendations
247+
248+
**1. Limit Depth:**
```json
{
  "@type": "Class",
  "@id": "Category",
  "@unfoldable": [],
  "name": "xsd:string",
  "parent": {
    "@type": "Optional",
    "@class": "Category" // Parent-child hierarchy
  },
  "subcategories": {
    "@type": "Set",
    "@class": "SubCategory" // Use different class for children
  }
}
```
265+
266+
**2. Separate Unfoldable and Non-Unfoldable Relationships:**
```json
{
  "@type": "Class",
  "@id": "Person",
  "@unfoldable": [],
  "name": "xsd:string",
  "profile": {
    "@type": "Optional",
    "@class": "Profile" // Profile is @unfoldable
  },
  "posts": {
    "@type": "Set",
    "@class": "Post" // Post is NOT @unfoldable (too many)
  }
}
```
283+
284+
**3. Use Optional or Set/Cardinality for Potentially Circular References:**
```json
{
  "@type": "Class",
  "@id": "Node",
  "@unfoldable": [],
  "next": {
    "@type": "Optional", // Allows termination, similar to Set/Cardinality
    "@class": "Node"
  }
}
```
296+
297+
## Troubleshooting
298+
299+
### Document Retrieval Returns Just IDs
300+
301+
**Symptom:** Expected nested objects, got ID strings
302+
303+
**Cause:** Cycle detected or class not marked `@unfoldable`
304+
305+
**Solution:**
306+
1. Verify class has `@unfoldable: []` annotation
307+
2. Check if circular reference exists (expected behavior)
308+
3. Review schema for proper unfoldable annotations
309+
310+
### Work Limit Exceeded Errors
311+
312+
**Symptom:** `DocRetrievalError::LimitExceeded` during retrieval
313+
314+
**Cause:** Document graph too large or deeply nested
315+
316+
**Solutions:**
317+
1. **Increase limit**: Set `TERMINUSDB_DOC_WORK_LIMIT` environment variable
318+
2. **Reduce unfoldable depth**: Mark fewer classes as `@unfoldable`
319+
3. **Break circular references**: Ensure proper data structure
320+
4. **Use pagination**: Fetch large collections separately
321+
322+
### Performance Degradation
323+
324+
**Symptom:** Slow document retrieval
325+
326+
**Cause:** Large unfoldable graphs
327+
328+
**Solutions:**
329+
1. **Profile query**: Check path depth and node count
330+
2. **Reduce unfoldable scope**: Only unfold necessary relationships
331+
332+
## API Examples
333+
334+
### Document API
335+
```bash
# Retrieve with automatic unfolding (default)
# Content-Type is set explicitly; with -d, curl would otherwise send
# application/x-www-form-urlencoded
curl -X GET "http://localhost:6363/api/document/admin/mydb" \
  -H "Authorization: Basic YWRtaW46cm9vdA==" \
  -H "Content-Type: application/json" \
  -d '{"graph_type": "instance", "id": "Person/Alice", "as_list": true}'
```
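The same request can be prepared from Python with only the standard library. This is a sketch that assumes the endpoint, JSON body, and credentials of the curl example (its Basic token decodes to `admin:root`); `build_document_request` is a hypothetical helper, and the snippet only constructs the request so it can be inspected without a running server.

```python
import base64
import json
import urllib.request


def build_document_request(
    base_url: str, org: str, db: str, doc_id: str, user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a GET request for one document."""
    body = json.dumps(
        {"graph_type": "instance", "id": doc_id, "as_list": True}
    ).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/document/{org}/{db}",
        data=body,
        method="GET",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


request = build_document_request(
    "http://localhost:6363", "admin", "mydb", "Person/Alice", "admin", "root"
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     documents = json.load(response)
```

Whether links in the response arrive as nested objects or as ID strings then depends on the `@unfoldable` annotations and on cycle detection, as described earlier.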
342+
343+
### GraphQL
344+
```graphql
# Unfolding happens automatically for @unfoldable classes
query {
  Person {
    name
    friend { # Automatically expanded
      name
      friend { # Nested expansion
        name
      }
    }
  }
}
```
359+
360+
### WOQL
361+
```javascript
// Using WOQL to read documents with unfolding
WOQL.read_document("Person/Alice", "v:Doc")
```
366+
367+
## Related Documentation
368+
369+
- [Schema Reference Guide](/docs/schema-reference-guide) - Complete schema annotation reference
370+
- [Document API Reference](/docs/document-insertion) - HTTP API for documents
371+
- [GraphQL Reference](/docs/graphql-query-reference) - GraphQL query syntax
372+
- [Path Queries](/docs/path-query-reference-guide) - Advanced path traversal
373+
374+
## Summary
375+
376+
**Key Takeaways:**
377+
- `@unfoldable` automatically expands linked documents
378+
- Cycle detection prevents infinite recursion using path stack
379+
- Vec-based implementation is optimal for path-bounded traversal
380+
- `TERMINUSDB_DOC_WORK_LIMIT` protects against excessive operations
381+
- ID references returned when cycles detected (not an error)
382+
- Path depth typically 10-50 nodes (not total document count)
383+
384+
**Performance Notes:**
385+
- Vec path stack: O(d) lookup where d = depth (typically < 50)
386+
- Work limit default: 500,000 operations
387+
- Memory overhead: 8 bytes per path depth level
388+
- Cache-friendly sequential access pattern
389+
390+
---
391+
392+
**Last Updated:** October 31, 2025
393+
**Applies to:** TerminusDB 11.2+
