Commit 925dd1c
authored
feat(conversion): extract pages in conversion step (#260)
Because
- The service layer is ready for adding the following metadata:
- Page length to file
- Page delimiters to converted file
- Page references to chunk
This commit
- Adds utility functions for page extraction
- Returns one string per page on conversion pipeline
- Parses conversion results from pipeline
- Adds page length check in integration-tests1 parent 56f0368 commit 925dd1c
File tree
12 files changed
+317
-68
lines changed- integration-test
- proto/artifact/artifact/v1alpha
- pkg/service
- preset/pipelines/indexing-advanced-convert-doc
- v1.3.2
- v1.4.0
12 files changed
+317
-68
lines changedLines changed: 35 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
415 | 415 | | |
416 | 416 | | |
417 | 417 | | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
| 435 | + | |
| 436 | + | |
| 437 | + | |
| 438 | + | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
| 442 | + | |
418 | 443 | | |
419 | 444 | | |
420 | 445 | | |
| |||
484 | 509 | | |
485 | 510 | | |
486 | 511 | | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
487 | 516 | | |
488 | 517 | | |
489 | 518 | | |
| |||
526 | 555 | | |
527 | 556 | | |
528 | 557 | | |
529 | | - | |
530 | | - | |
| 558 | + | |
| 559 | + | |
531 | 560 | | |
532 | | - | |
533 | | - | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
534 | 565 | | |
535 | 566 | | |
536 | 567 | | |
| |||
Lines changed: 0 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
380 | 380 | | |
381 | 381 | | |
382 | 382 | | |
383 | | - | |
384 | 383 | | |
385 | 384 | | |
386 | 385 | | |
| |||
395 | 394 | | |
396 | 395 | | |
397 | 396 | | |
398 | | - | |
399 | 397 | | |
400 | 398 | | |
401 | 399 | | |
| |||
Lines changed: 26 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
| 5 | + | |
5 | 6 | | |
6 | 7 | | |
7 | 8 | | |
| |||
34 | 35 | | |
35 | 36 | | |
36 | 37 | | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
37 | 46 | | |
38 | 47 | | |
39 | 48 | | |
40 | 49 | | |
41 | | - | |
42 | | - | |
43 | | - | |
44 | | - | |
45 | 50 | | |
46 | 51 | | |
47 | 52 | | |
| |||
50 | 55 | | |
51 | 56 | | |
52 | 57 | | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
53 | 75 | | |
54 | 76 | | |
55 | 77 | | |
| |||
Lines changed: 0 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
12 | | - | |
13 | 12 | | |
14 | 13 | | |
15 | 14 | | |
| |||
40 | 39 | | |
41 | 40 | | |
42 | 41 | | |
43 | | - | |
44 | 42 | | |
45 | 43 | | |
46 | 44 | | |
| |||
59 | 57 | | |
60 | 58 | | |
61 | 59 | | |
62 | | - | |
63 | 60 | | |
64 | 61 | | |
65 | 62 | | |
| |||
70 | 67 | | |
71 | 68 | | |
72 | 69 | | |
73 | | - | |
74 | 70 | | |
75 | 71 | | |
76 | 72 | | |
| |||
82 | 78 | | |
83 | 79 | | |
84 | 80 | | |
85 | | - | |
86 | 81 | | |
87 | 82 | | |
88 | 83 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
340 | 340 | | |
341 | 341 | | |
342 | 342 | | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
| 350 | + | |
| 351 | + | |
| 352 | + | |
| 353 | + | |
| 354 | + | |
| 355 | + | |
343 | 356 | | |
344 | 357 | | |
345 | 358 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| 6 | + | |
6 | 7 | | |
7 | 8 | | |
8 | 9 | | |
| |||
39 | 40 | | |
40 | 41 | | |
41 | 42 | | |
42 | | - | |
43 | | - | |
44 | | - | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
45 | 47 | | |
46 | | - | |
| 48 | + | |
47 | 49 | | |
48 | 50 | | |
49 | 51 | | |
50 | | - | |
| 52 | + | |
51 | 53 | | |
52 | 54 | | |
| 55 | + | |
| 56 | + | |
53 | 57 | | |
54 | | - | |
55 | | - | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
56 | 65 | | |
57 | | - | |
58 | | - | |
59 | | - | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
60 | 79 | | |
61 | 80 | | |
62 | | - | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
63 | 90 | | |
64 | 91 | | |
65 | 92 | | |
| |||
107 | 134 | | |
108 | 135 | | |
109 | 136 | | |
110 | | - | |
111 | | - | |
112 | | - | |
113 | | - | |
114 | | - | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
115 | 143 | | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
116 | 148 | | |
117 | 149 | | |
118 | 150 | | |
| |||
130 | 162 | | |
131 | 163 | | |
132 | 164 | | |
133 | | - | |
| 165 | + | |
134 | 166 | | |
135 | 167 | | |
136 | 168 | | |
137 | 169 | | |
138 | | - | |
| 170 | + | |
139 | 171 | | |
140 | 172 | | |
141 | 173 | | |
142 | 174 | | |
143 | | - | |
144 | | - | |
145 | | - | |
146 | | - | |
147 | | - | |
148 | | - | |
149 | | - | |
150 | | - | |
151 | | - | |
152 | | - | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
153 | 178 | | |
154 | 179 | | |
155 | 180 | | |
156 | 181 | | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
0 commit comments