
Commit 4c1f508

feat(component,data,artifact): update instillartifact component to latest protobuf (#1139)
### Because

- The artifact-backend protobuf definitions have been updated to use "knowledge-base" terminology instead of "catalog", "chunk-type" instead of "content-type", and changed API structures (CreateFileRequest now takes a File object instead of individual fields)
- API endpoints were renamed (GetObjectURL → GetObjectURLAdmin, GetObject → GetObjectAdmin)
- NextPageToken field changed from string to *string (optional pointer)
- Various enum values and field names were updated in the latest protobuf generation

### This commit

- Updated all instillartifact component tasks to use new field names: KnowledgeBaseId, Id, ChunkType, File object structure
- Changed API calls in external.go from GetObjectURL/GetObject to GetObjectURLAdmin/GetObjectAdmin
- Fixed pointer type handling for NextPageToken in pagination logic
- Updated enum values (File_FILE_TYPE_PDF → File_TYPE_PDF, File_PROCESS_STATUS_* → FileProcessStatus_FILE_PROCESS_STATUS_*)
- Changed from FileIds array to Filter string in ListFilesRequest
- Regenerated component mocks to match updated protobuf definitions
- Updated all test expectations to match new enum string representations (TYPE_PDF, UNIT_PAGE)
- Fixed task definitions to use TASK_SEARCH instead of TASK_RETRIEVE, TASK_CREATE_FILE instead of TASK_UPLOAD_FILE, etc.
1 parent 3344938 commit 4c1f508

36 files changed, with 7519 additions and 10681 deletions.

.github/workflows/integration-test.yml

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ jobs:
       - name: Launch Instill Core CE (commit hash)
         working-directory: instill-core
         run: |
-          make compose-dev EDITION=docker-ce:test ENV_SECRETS_COMPONENT=.env.secrets.component.test PIPELINE_BACKEND_VERSION=${{ env.COMMIT_SHORT_SHA }}
+          make compose-dev EDITION=docker-ce:test CI=true PIPELINE_BACKEND_VERSION=${{ env.COMMIT_SHORT_SHA }}
 
       - name: Run integration-test
         working-directory: pipeline-backend

go.mod

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ require (
 	github.com/h2non/filetype v1.1.3
 	github.com/iancoleman/strcase v0.3.0
 	github.com/influxdata/influxdb-client-go/v2 v2.14.0
-	github.com/instill-ai/protogen-go v0.3.3-alpha.0.20250910144745-40a99f482d4b
+	github.com/instill-ai/protogen-go v0.3.3-alpha.0.20251103134954-719cc9a5a07d
 	github.com/instill-ai/usage-client v0.4.0
 	github.com/instill-ai/x v0.10.0-alpha
 	github.com/itchyny/gojq v0.12.17

go.sum

Lines changed: 2 additions & 2 deletions
@@ -460,8 +460,8 @@ github.com/influxdata/influxdb-client-go/v2 v2.14.0 h1:AjbBfJuq+QoaXNcrova8smSjw
 github.com/influxdata/influxdb-client-go/v2 v2.14.0/go.mod h1:Ahpm3QXKMJslpXl3IftVLVezreAUtBOTZssDrjZEFHI=
 github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf h1:7JTmneyiNEwVBOHSjoMxiWAqB992atOeepeFYegn5RU=
 github.com/influxdata/line-protocol v0.0.0-20210922203350-b1ad95c89adf/go.mod h1:xaLFMmpvUxqXtVkUJfg9QmT88cDaCJ3ZKgdZ78oO8Qo=
-github.com/instill-ai/protogen-go v0.3.3-alpha.0.20250910144745-40a99f482d4b h1:N09H+4itA8AeNvstkqRP925zAGNirsr+qfaySooEJ8Q=
-github.com/instill-ai/protogen-go v0.3.3-alpha.0.20250910144745-40a99f482d4b/go.mod h1:bCnBosofpaUxKBuTTJM3/I3thAK37kvfBnKByjnLsl4=
+github.com/instill-ai/protogen-go v0.3.3-alpha.0.20251103134954-719cc9a5a07d h1:Ir3ak+T9Cf87aDdzRKBpBEOHXzdVDMdM8IOv/2A9m7U=
+github.com/instill-ai/protogen-go v0.3.3-alpha.0.20251103134954-719cc9a5a07d/go.mod h1:bCnBosofpaUxKBuTTJM3/I3thAK37kvfBnKByjnLsl4=
 github.com/instill-ai/usage-client v0.4.0 h1:xf1hAlO4a8lZwZzz9bprZOJqU3ghIcIsavUUB7UURyg=
 github.com/instill-ai/usage-client v0.4.0/go.mod h1:zZ9LRoXps2u63ARYPAbR2YvqTib3dWJLObZn+9YqhF0=
 github.com/instill-ai/x v0.10.0-alpha h1:I83WJc+21J+IgI4aJDn755ON/BX4cDvKCVVguI77r14=

pkg/component/ai/fireworksai/v0/fireworks_client_interface_mock_test.go

Lines changed: 1 addition & 1 deletion
Generated file; the diff is not rendered by default.

pkg/component/ai/groq/v0/groq_client_interface_mock_test.go

Lines changed: 1 addition & 1 deletion
Generated file; the diff is not rendered by default.

pkg/component/ai/ollama/v0/ollama_client_interface_mock.gen.go

Lines changed: 1 addition & 1 deletion
Generated file; the diff is not rendered by default.

pkg/component/application/freshdesk/v0/freshdesk_interface_mock_test.go

Lines changed: 1 addition & 1 deletion
Generated file; the diff is not rendered by default.

pkg/component/data/instillartifact/v0/README.mdx

Lines changed: 35 additions & 117 deletions
@@ -7,14 +7,13 @@ description: "Learn about how to set up a Instill Artifact component https://git
 
 The Instill Artifact component is a data component that allows users to access files and perform RAG-based search and retrieval through catalogs in the Instill Core platform.
 It can carry out the following tasks:
-- [Upload File](#upload-file)
-- [Upload Files](#upload-files)
+- [Create File](#create-file)
+- [Create Files](#create-files)
 - [Get Files Metadata](#get-files-metadata)
 - [Get Chunks Metadata](#get-chunks-metadata)
 - [Get File in Markdown](#get-file-in-markdown)
 - [Match File Status](#match-file-status)
-- [Retrieve](#retrieve)
-- [Ask](#ask)
+- [Search](#search)
 
 To use Artifact Component, you will need to set up the OpenAI API key for self-hosted deployment of Instill Core.
 You can do this by setting the `OPENAI_API_KEY` environment variable.
@@ -32,27 +31,27 @@ The component definition and tasks are defined in the [definition.yaml](https://
 
 ## Supported Tasks
 
-### Upload File
+### Create File
 
 Upload and process the files into chunks into Catalog.
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
 | Input | Field ID | Type | Description |
 | :--- | :--- | :--- | :--- |
-| Task ID (required) | `task` | string | `TASK_UPLOAD_FILE` |
-| [Options](#upload-file-options) (required) | `options` | object | Choose to upload the files to existing catalog or create a new catalog. |
+| Task ID (required) | `task` | string | `TASK_CREATE_FILE` |
+| [Options](#create-file-options) (required) | `options` | object | Choose to upload the files to existing catalog or create a new catalog. |
 
 </div>
 
 <details>
 <summary>The <code>options</code> Object </summary>
 
-<h4 id="upload-file-options">Options</h4>
+<h4 id="create-file-options">Options</h4>
 
 `options` must fulfill one of the following schemas:
 
-<h5 id="upload-file-existing-catalog"><code>Existing Catalog</code></h5>
+<h5 id="create-file-existing-catalog"><code>Existing Catalog</code></h5>
 
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
@@ -66,7 +65,7 @@ Upload and process the files into chunks into Catalog.
 | Option | `option` | string | Must be `"existing catalog"` |
 </div>
 
-<h5 id="upload-file-create-new-catalog"><code>Create New Catalog</code></h5>
+<h5 id="create-file-create-new-catalog"><code>Create New Catalog</code></h5>
 
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
@@ -86,16 +85,16 @@ Upload and process the files into chunks into Catalog.
 
 | Output | Field ID | Type | Description |
 | :--- | :--- | :--- | :--- |
-| [File](#upload-file-file) | `file` | object | Result of uploading file into catalog. |
+| [File](#create-file-file) | `file` | object | Result of uploading file into catalog. |
 | Status | `status` | boolean | The status of trigger file processing, if succeeded, return true. |
 
 </div>
 
 
 <details>
-<summary> Output Objects in Upload File</summary>
+<summary> Output Objects in Create File</summary>
 
-<h4 id="upload-file-file">File</h4>
+<h4 id="create-file-file">File</h4>
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
@@ -113,27 +112,27 @@ Upload and process the files into chunks into Catalog.
 </details>
 
 
-### Upload Files
+### Create Files
 
 Upload and process the files into chunks into Catalog.
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
 | Input | Field ID | Type | Description |
 | :--- | :--- | :--- | :--- |
-| Task ID (required) | `task` | string | `TASK_UPLOAD_FILES` |
-| [Options](#upload-files-options) (required) | `options` | object | Choose to upload the files to existing catalog or create a new catalog. |
+| Task ID (required) | `task` | string | `TASK_CREATE_FILES` |
+| [Options](#create-files-options) (required) | `options` | object | Choose to upload the files to existing catalog or create a new catalog. |
 
 </div>
 
 <details>
 <summary>The <code>options</code> Object </summary>
 
-<h4 id="upload-files-options">Options</h4>
+<h4 id="create-files-options">Options</h4>
 
 `options` must fulfill one of the following schemas:
 
-<h5 id="upload-files-existing-catalog"><code>Existing Catalog</code></h5>
+<h5 id="create-files-existing-catalog"><code>Existing Catalog</code></h5>
 
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
@@ -147,7 +146,7 @@ Upload and process the files into chunks into Catalog.
 | Option | `option` | string | Must be `"existing catalog"` |
 </div>
 
-<h5 id="upload-files-create-new-catalog"><code>Create New Catalog</code></h5>
+<h5 id="create-files-create-new-catalog"><code>Create New Catalog</code></h5>
 
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
@@ -167,16 +166,16 @@ Upload and process the files into chunks into Catalog.
 
 | Output | Field ID | Type | Description |
 | :--- | :--- | :--- | :--- |
-| [Files](#upload-files-files) | `files` | array[object] | Files metadata in catalog. |
+| [Files](#create-files-files) | `files` | array[object] | Files metadata in catalog. |
 | Status | `status` | boolean | The status of trigger file processing, if ALL succeeded, return true. |
 
 </div>
 
 
 <details>
-<summary> Output Objects in Upload Files</summary>
+<summary> Output Objects in Create Files</summary>
 
-<h4 id="upload-files-files">Files</h4>
+<h4 id="create-files-files">Files</h4>
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
@@ -336,102 +335,22 @@ Check if the specified file's processing status is done.
 
 </div>
 
-### Retrieve
+### Search
 
-search the chunks in the catalog.
+search the chunks in the knowledge base.
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
 | Input | Field ID | Type | Description |
 | :--- | :--- | :--- | :--- |
-| Task ID (required) | `task` | string | `TASK_RETRIEVE` |
-| Catalog ID (required) | `catalog-id` | string | Catalog ID that you input to search files in the Catalog. |
+| Task ID (required) | `task` | string | `TASK_SEARCH` |
+| Knowledge Base ID (required) | `knowledge-base-id` | string | Knowledge Base ID that you input to search files in the Knowledge Base. |
 | Namespace (required) | `namespace` | string | Fill in your namespace, you can get namespace through the tab of switching namespace. |
 | Text Prompt (required) | `text-prompt` | string | The prompt string to search the chunks. |
 | Top K | `top-k` | integer | The number of top chunks to return. The range is from 1~20, and default is 5. |
 | File UIDs | `file-uids` | array[string] | Optional list of file UIDs to filter the results by. The elements of the list must be UUID-formatted strings. |
 | File Media Type | `file-media-type` | string | The media type to filter, empty for all. <br/><details><summary><strong>Enum values</strong></summary><ul><li>`document`</li><li>`image`</li><li>`audio`</li><li>`video`</li></ul></details> |
-| Content Type | `content-type` | string | The content type to filter, empty for all. <br/><details><summary><strong>Enum values</strong></summary><ul><li>`chunk`</li><li>`summary`</li><li>`augmented`</li></ul></details> |
-
-</div>
-
-
-<div class="markdown-col-no-wrap" data-col-1 data-col-2>
-
-| Output | Field ID | Type | Description |
-| :--- | :--- | :--- | :--- |
-| [Chunks](#retrieve-chunks) | `chunks` | array[object] | Chunks data from smart search. |
-
-</div>
-
-
-<details>
-<summary> Output Objects in Retrieve</summary>
-
-<h4 id="retrieve-chunks">Chunks</h4>
-
-<div class="markdown-col-no-wrap" data-col-1 data-col-2>
-
-| Field | Field ID | Type | Note |
-| :--- | :--- | :--- | :--- |
-| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
-| [Chunk Reference](#retrieve-chunk-reference) | `reference` | object | Reference to the position of the chunk within the original file. |
-| Similarity | `similarity-score` | number | The similarity score of the chunk. |
-| Source File Name | `source-file-name` | string | The name of the source file. |
-| Source File UID | `source-file-uid` | string | The UID of the source file. |
-| Text Content | `text-content` | string | The text content of the chunk. |
-
-</div>
-
-<h4 id="retrieve-chunk-reference">Chunk Reference</h4>
-
-<div class="markdown-col-no-wrap" data-col-1 data-col-2>
-
-| Field | Field ID | Type | Note |
-| :--- | :--- | :--- | :--- |
-| [File Position](#retrieve-file-position) | `end` | object | Position within a file as coordinates ina specific unit. The number of dimensions of the coordinates depends on the unit type. |
-| [File Position](#retrieve-file-position) | `start` | object | Position within a file as coordinates ina specific unit. The number of dimensions of the coordinates depends on the unit type. |
-
-</div>
-
-<h4 id="retrieve-file-position">File Position</h4>
-
-<div class="markdown-col-no-wrap" data-col-1 data-col-2>
-
-| Field | Field ID | Type | Note |
-| :--- | :--- | :--- | :--- |
-| Coordinates | `coordinates` | array | Position value. |
-| Unit | `unit` | string | Unit of measurement for a position within a file. <br/><details><summary><strong>Enum values</strong></summary><ul><li>`UNIT_CHARACTER`</li><li>`UNIT_PAGE`</li><li>`UNIT_TIME_MS`</li><li>`UNIT_PIXEL`</li></ul></details> |
-
-</div>
-
-<h4 id="retrieve-file-position">File Position</h4>
-
-<div class="markdown-col-no-wrap" data-col-1 data-col-2>
-
-| Field | Field ID | Type | Note |
-| :--- | :--- | :--- | :--- |
-| Coordinates | `coordinates` | array | Position value. |
-| Unit | `unit` | string | Unit of measurement for a position within a file. <br/><details><summary><strong>Enum values</strong></summary><ul><li>`UNIT_CHARACTER`</li><li>`UNIT_PAGE`</li><li>`UNIT_TIME_MS`</li><li>`UNIT_PIXEL`</li></ul></details> |
-
-</div>
-</details>
-
-
-### Ask
-
-Reply the questions based on the files in the catalog.
-
-<div class="markdown-col-no-wrap" data-col-1 data-col-2>
-
-| Input | Field ID | Type | Description |
-| :--- | :--- | :--- | :--- |
-| Task ID (required) | `task` | string | `TASK_ASK` |
-| Catalog ID (required) | `catalog-id` | string | Catalog ID that you input to search files in the Catalog. |
-| Namespace (required) | `namespace` | string | Fill in your namespace, you can get namespace through the tab of switching namespace. |
-| Question (required) | `question` | string | The question to reply. |
-| Top K | `top-k` | integer | The number of top answers to return. The range is from 1~20, and default is 5. |
-| File UIDs | `file-uids` | array[string] | Optional list of file UIDs to filter the results by. The elements of the list must be UUID-formatted strings. |
+| Chunk Type | `chunk-type` | string | The chunk type to filter, empty for all. <br/><details><summary><strong>Enum values</strong></summary><ul><li>`content`</li><li>`summary`</li><li>`augmented`</li></ul></details> |
 
 </div>
 
@@ -440,42 +359,41 @@ Reply the questions based on the files in the catalog.
 
 | Output | Field ID | Type | Description |
 | :--- | :--- | :--- | :--- |
-| Answer | `answer` | string | Answers data from smart search. |
-| [Chunks](#ask-chunks) (optional) | `chunks` | array[object] | Chunks data to answer question. |
+| [Chunks](#search-chunks) | `chunks` | array[object] | Chunks data from smart search. |
 
 </div>
 
 
 <details>
-<summary> Output Objects in Ask</summary>
+<summary> Output Objects in Search</summary>
 
-<h4 id="ask-chunks">Chunks</h4>
+<h4 id="search-chunks">Chunks</h4>
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
 | Field | Field ID | Type | Note |
 | :--- | :--- | :--- | :--- |
 | Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
-| [Chunk Reference](#ask-chunk-reference) | `reference` | object | Reference to the position of the chunk within the original file. |
+| [Chunk Reference](#search-chunk-reference) | `reference` | object | Reference to the position of the chunk within the original file. |
 | Similarity | `similarity-score` | number | The similarity score of the chunk. |
 | Source File Name | `source-file-name` | string | The name of the source file. |
 | Source File UID | `source-file-uid` | string | The UID of the source file. |
 | Text Content | `text-content` | string | The text content of the chunk. |
 
 </div>
 
-<h4 id="ask-chunk-reference">Chunk Reference</h4>
+<h4 id="search-chunk-reference">Chunk Reference</h4>
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
 | Field | Field ID | Type | Note |
 | :--- | :--- | :--- | :--- |
-| [File Position](#ask-file-position) | `end` | object | Position within a file as coordinates ina specific unit. The number of dimensions of the coordinates depends on the unit type. |
-| [File Position](#ask-file-position) | `start` | object | Position within a file as coordinates ina specific unit. The number of dimensions of the coordinates depends on the unit type. |
+| [File Position](#search-file-position) | `end` | object | Position within a file as coordinates ina specific unit. The number of dimensions of the coordinates depends on the unit type. |
+| [File Position](#search-file-position) | `start` | object | Position within a file as coordinates ina specific unit. The number of dimensions of the coordinates depends on the unit type. |
 
 </div>
 
-<h4 id="ask-file-position">File Position</h4>
+<h4 id="search-file-position">File Position</h4>
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
@@ -486,7 +404,7 @@ Reply the questions based on the files in the catalog.
 
 </div>
 
-<h4 id="ask-file-position">File Position</h4>
+<h4 id="search-file-position">File Position</h4>
 
 <div class="markdown-col-no-wrap" data-col-1 data-col-2>
 
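To make the renamed Search inputs concrete, here is a rough sketch of assembling and checking a `TASK_SEARCH` input in Go. The JSON keys, the 1 to 20 range with a default of 5 for `top-k`, and the `chunk-type` enum values come from the README table in this diff; the `searchInput` struct and its `validate` method are hypothetical and are not part of the component, which may handle defaults and validation differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// searchInput is a hypothetical struct; only the JSON keys come from the README.
type searchInput struct {
	Task            string   `json:"task"`
	KnowledgeBaseID string   `json:"knowledge-base-id"`
	Namespace       string   `json:"namespace"`
	TextPrompt      string   `json:"text-prompt"`
	TopK            int      `json:"top-k,omitempty"`
	FileUIDs        []string `json:"file-uids,omitempty"`
	FileMediaType   string   `json:"file-media-type,omitempty"`
	ChunkType       string   `json:"chunk-type,omitempty"`
}

// validate checks the required fields, applies the documented top-k default,
// and rejects chunk-type values outside the documented enum.
func (in *searchInput) validate() error {
	if in.KnowledgeBaseID == "" || in.Namespace == "" || in.TextPrompt == "" {
		return errors.New("knowledge-base-id, namespace and text-prompt are required")
	}
	if in.TopK == 0 {
		in.TopK = 5 // documented default
	}
	if in.TopK < 1 || in.TopK > 20 {
		return fmt.Errorf("top-k must be between 1 and 20, got %d", in.TopK)
	}
	switch in.ChunkType {
	case "", "content", "summary", "augmented":
		return nil
	default:
		return fmt.Errorf("unknown chunk-type %q", in.ChunkType)
	}
}

func main() {
	in := searchInput{
		Task:            "TASK_SEARCH",
		KnowledgeBaseID: "my-kb",        // placeholder ID
		Namespace:       "my-namespace", // placeholder namespace
		TextPrompt:      "refund policy",
		ChunkType:       "content",
	}
	if err := in.validate(); err != nil {
		fmt.Println("invalid input:", err)
		return
	}
	out, err := json.MarshalIndent(in, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Note that the old `content-type` value `chunk` is rejected here, since the new `chunk-type` enum lists `content` in its place.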
pkg/component/data/instillartifact/v0/config/definition.yaml

Lines changed: 3 additions & 4 deletions
@@ -1,12 +1,11 @@
 availableTasks:
-- TASK_UPLOAD_FILE
-- TASK_UPLOAD_FILES
+- TASK_CREATE_FILE
+- TASK_CREATE_FILES
 - TASK_GET_FILES_METADATA
 - TASK_GET_CHUNKS_METADATA
 - TASK_GET_FILE_IN_MARKDOWN
 - TASK_MATCH_FILE_STATUS
-- TASK_RETRIEVE
-- TASK_ASK
+- TASK_SEARCH
 documentationUrl: https://docs.instill-ai.com/docs/instill-artifact
 icon: assets/instill-artifact.svg
 id: instill-artifact
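Pipelines that still reference the old task IDs would need updating after this change. As an illustration only, the renames visible in this diff could be captured in a small lookup like the one below; `renamedTasks` and `migrateTask` are hypothetical helpers, not part of pipeline-backend, and `TASK_ASK` is treated as removed because the diff shows no replacement for it.

```go
package main

import "fmt"

// renamedTasks maps task IDs dropped by this commit to their replacements,
// as listed in the commit message and the definition.yaml diff.
var renamedTasks = map[string]string{
	"TASK_UPLOAD_FILE":  "TASK_CREATE_FILE",
	"TASK_UPLOAD_FILES": "TASK_CREATE_FILES",
	"TASK_RETRIEVE":     "TASK_SEARCH",
}

// migrateTask is a hypothetical helper that returns the current ID for a task.
// TASK_ASK yields an error because the diff removes it without a replacement.
func migrateTask(old string) (string, error) {
	if old == "TASK_ASK" {
		return "", fmt.Errorf("%s was removed and has no replacement", old)
	}
	if renamed, ok := renamedTasks[old]; ok {
		return renamed, nil
	}
	return old, nil // task IDs that did not change pass through
}

func main() {
	for _, id := range []string{"TASK_UPLOAD_FILE", "TASK_RETRIEVE", "TASK_MATCH_FILE_STATUS", "TASK_ASK"} {
		updated, err := migrateTask(id)
		if err != nil {
			fmt.Println(id, "->", "error:", err)
			continue
		}
		fmt.Println(id, "->", updated)
	}
}
```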
