
Commit b0919d9

feat(gooddata-sdk): add CSV upload support for GDSTORAGE data sources
Add five new methods to CatalogDataSourceService:

- staging_upload — upload a CSV to the staging area
- analyze_csv — detect columns, types, and parse config
- import_csv — import a staged CSV into a GDSTORAGE data source
- delete_csv_files — remove files from a data source
- upload_csv — end-to-end convenience wrapper

Also add the CatalogDataSourceGdStorage entity model with _NoCredentials support (GDSTORAGE sources require no authentication), export the new symbol, add the `result` spec to the Makefile download target, and add documentation pages for all new methods.

jira: TRIVIAL
risk: low
1 parent 05798a6 commit b0919d9

File tree

10 files changed: +350 −0 lines changed

Makefile

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ download:
 	$(call download_client,scan)
 	$(call download_client,"export")
 	$(call download_client,automation)
+	$(call download_client,result)
 
 .PHONY: type-check
 type-check:

docs/content/en/latest/data/data-source/_index.md

Lines changed: 8 additions & 0 deletions
@@ -37,6 +37,14 @@ See [Connect Data](https://www.gooddata.com/docs/cloud/connect-data/) to learn h
 * [scan_schemata](./scan_schemata/)
 * [scan_sql](./scan_sql/)
 
+### CSV Upload Methods
+
+* [staging_upload](./staging_upload/)
+* [analyze_csv](./analyze_csv/)
+* [import_csv](./import_csv/)
+* [delete_csv_files](./delete_csv_files/)
+* [upload_csv](./upload_csv/)
+
 
 ## Example
 

docs/content/en/latest/data/data-source/analyze_csv/_index.md

Lines changed: 37 additions & 0 deletions (new file)

---
title: "analyze_csv"
linkTitle: "analyze_csv"
weight: 191
superheading: "catalog_data_source."
---

``analyze_csv(location: str)``

Analyzes an uploaded CSV file in the staging area. Returns column metadata, detected types, preview data, and a config object that can be passed to import_csv.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="location" p_type="string" >}}
Location string returned by staging_upload.
{{< /parameter >}}

{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}

{{< parameter p_type="AnalyzeCsvResponse" >}}
Analysis result with columns, preview data, and config.
{{< /parameter >}}

{{% /parameters-block %}}

## Example

```python
# Analyze a previously uploaded CSV file
analysis = sdk.catalog_data_source.analyze_csv(location="staging/some-location")
for col in analysis["columns"]:
    print(f"{col['name']}: {col['type']}")
```
docs/content/en/latest/data/data-source/delete_csv_files/_index.md

Lines changed: 37 additions & 0 deletions (new file)

---
title: "delete_csv_files"
linkTitle: "delete_csv_files"
weight: 193
superheading: "catalog_data_source."
---

``delete_csv_files(data_source_id: str, file_names: list[str])``

Deletes files from a GDSTORAGE data source.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="data_source_id" p_type="string" >}}
Data source identification string.
{{< /parameter >}}

{{< parameter p_name="file_names" p_type="list[string]" >}}
List of file names to delete.
{{< /parameter >}}

{{% /parameters-block %}}

{{% parameters-block title="Returns" None="yes"%}}
{{% /parameters-block %}}

## Example

```python
# Delete specific files from a GDSTORAGE data source
sdk.catalog_data_source.delete_csv_files(
    data_source_id="my-gdstorage-ds",
    file_names=["my_table.csv"],
)
```
docs/content/en/latest/data/data-source/import_csv/_index.md

Lines changed: 49 additions & 0 deletions (new file)

---
title: "import_csv"
linkTitle: "import_csv"
weight: 192
superheading: "catalog_data_source."
---

``import_csv(data_source_id: str, table_name: str, location: str, config: Optional[dict] = None)``

Imports a CSV file from the staging area into a GDSTORAGE data source.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="data_source_id" p_type="string" >}}
Data source identification string.
{{< /parameter >}}

{{< parameter p_name="table_name" p_type="string" >}}
Name for the table to create or replace.
{{< /parameter >}}

{{< parameter p_name="location" p_type="string" >}}
Location string returned by staging_upload.
{{< /parameter >}}

{{< parameter p_name="config" p_type="Optional[dict]" >}}
Source config dict, typically from analyze_csv response. Optional.
{{< /parameter >}}

{{% /parameters-block %}}

{{% parameters-block title="Returns" None="yes"%}}
{{% /parameters-block %}}

## Example

```python
# Import a CSV into a GDSTORAGE data source using config from analysis
analysis = sdk.catalog_data_source.analyze_csv(location=location)
config = analysis.to_dict().get("config")
sdk.catalog_data_source.import_csv(
    data_source_id="my-gdstorage-ds",
    table_name="my_table",
    location=location,
    config=config,
)
```
docs/content/en/latest/data/data-source/staging_upload/_index.md

Lines changed: 37 additions & 0 deletions (new file)

---
title: "staging_upload"
linkTitle: "staging_upload"
weight: 190
superheading: "catalog_data_source."
---

``staging_upload(csv_file: Path)``

Uploads a CSV file to the staging area and returns a location string that can be used in subsequent calls to analyze_csv and import_csv.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="csv_file" p_type="Path" >}}
Path to the CSV file to upload.
{{< /parameter >}}

{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}

{{< parameter p_type="string" >}}
Location string referencing the uploaded file in staging.
{{< /parameter >}}

{{% /parameters-block %}}

## Example

```python
from pathlib import Path

# Upload a CSV file to staging
location = sdk.catalog_data_source.staging_upload(csv_file=Path("data.csv"))
```
docs/content/en/latest/data/data-source/upload_csv/_index.md

Lines changed: 44 additions & 0 deletions (new file)

---
title: "upload_csv"
linkTitle: "upload_csv"
weight: 194
superheading: "catalog_data_source."
---

``upload_csv(data_source_id: str, csv_file: Path, table_name: str)``

Convenience method that uploads a CSV file and imports it into a GDSTORAGE data source in a single call. Orchestrates the full flow: staging_upload → analyze_csv → import_csv → register_upload_notification.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="data_source_id" p_type="string" >}}
Data source identification string for a GDSTORAGE data source.
{{< /parameter >}}

{{< parameter p_name="csv_file" p_type="Path" >}}
Path to the CSV file to upload.
{{< /parameter >}}

{{< parameter p_name="table_name" p_type="string" >}}
Name for the table to create or replace in the data source.
{{< /parameter >}}

{{% /parameters-block %}}

{{% parameters-block title="Returns" None="yes"%}}
{{% /parameters-block %}}

## Example

```python
from pathlib import Path

# Upload a CSV file end-to-end in a single call
sdk.catalog_data_source.upload_csv(
    data_source_id="my-gdstorage-ds",
    csv_file=Path("data.csv"),
    table_name="my_table",
)
```

packages/gooddata-sdk/src/gooddata_sdk/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@
     CatalogDataSource,
     CatalogDataSourceBigQuery,
     CatalogDataSourceDatabricks,
+    CatalogDataSourceGdStorage,
     CatalogDataSourceGreenplum,
     CatalogDataSourceMariaDb,
     CatalogDataSourceMotherDuck,

packages/gooddata-sdk/src/gooddata_sdk/catalog/data_source/entity_model/data_source.py

Lines changed: 22 additions & 0 deletions
@@ -296,3 +296,25 @@ class MotherDuckAttributes(DatabaseAttributes):
 class CatalogDataSourceMotherDuck(CatalogDataSource):
     _URL_TMPL: ClassVar[str] = "jdbc:duckdb:md:{db_name}"
     type: str = "MOTHERDUCK"
+
+
+class _NoCredentials(Credentials):
+    """Placeholder credentials for data sources that do not require authentication."""
+
+    def to_api_args(self) -> dict[str, Any]:
+        return {}
+
+    @classmethod
+    def is_part_of_api(cls, entity: dict[str, Any]) -> bool:
+        return True
+
+    @classmethod
+    def from_api(cls, entity: dict[str, Any]) -> _NoCredentials:
+        return cls()
+
+
+@define(kw_only=True, eq=False)
+class CatalogDataSourceGdStorage(CatalogDataSource):
+    type: str = "GDSTORAGE"
+    schema: str = ""
+    credentials: Credentials = field(factory=_NoCredentials, repr=False)
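The _NoCredentials pattern — a credentials object that contributes nothing to the API payload and accepts any entity when parsing — can be illustrated in isolation. This is a minimal, self-contained sketch; the `Credentials` base here is a simplified stand-in, not the SDK's actual class:

```python
from typing import Any


class Credentials:
    """Simplified stand-in for the SDK's Credentials base class (hypothetical)."""

    def to_api_args(self) -> dict[str, Any]:
        raise NotImplementedError


class NoCredentials(Credentials):
    """Placeholder credentials for sources that need no authentication."""

    def to_api_args(self) -> dict[str, Any]:
        # Contributes no keys to the entity payload sent to the API.
        return {}

    @classmethod
    def from_api(cls, entity: dict[str, Any]) -> "NoCredentials":
        # Any API entity matches, since there is nothing to parse.
        return cls()


# Merging the credentials into a payload leaves the payload unchanged.
payload = {"type": "GDSTORAGE", **NoCredentials().to_api_args()}
```

This keeps the `credentials` attribute mandatory on the base entity while letting GDSTORAGE sources round-trip through the API without any secret material.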

packages/gooddata-sdk/src/gooddata_sdk/catalog/data_source/service.py

Lines changed: 114 additions & 0 deletions
@@ -7,6 +7,16 @@
 
 from gooddata_api_client.exceptions import NotFoundException
 
+# CSV upload related imports from generated API client
+from gooddata_api_client.model.analyze_csv_request import AnalyzeCsvRequest
+from gooddata_api_client.model.analyze_csv_request_item import AnalyzeCsvRequestItem
+from gooddata_api_client.model.analyze_csv_response import AnalyzeCsvResponse
+from gooddata_api_client.model.delete_files_request import DeleteFilesRequest
+from gooddata_api_client.model.import_csv_request import ImportCsvRequest
+from gooddata_api_client.model.import_csv_request_table import ImportCsvRequestTable
+from gooddata_api_client.model.import_csv_request_table_source import ImportCsvRequestTableSource
+from gooddata_api_client.model.import_csv_request_table_source_config import ImportCsvRequestTableSourceConfig
+
 from gooddata_sdk.catalog.catalog_service_base import CatalogServiceBase
 from gooddata_sdk.catalog.data_source.action_model.requests.ldm_request import (
     CatalogGenerateLdmRequest,
@@ -504,6 +514,110 @@ def test_data_sources_connection(
                 message.append(f"Test connection for data source id {k} ended with the following error {v}.")
             raise ValueError("\n".join(message))
 
+    # CSV upload methods are listed below
+
+    def staging_upload(self, csv_file: Path) -> str:
+        """Upload a CSV file to the staging area.
+
+        Args:
+            csv_file (Path):
+                Path to the CSV file to upload.
+
+        Returns:
+            str:
+                Location string referencing the uploaded file in staging.
+        """
+        with open(csv_file, "rb") as f:
+            response = self._actions_api.staging_upload(file=f)
+        return response["location"]
+
+    def analyze_csv(self, location: str) -> AnalyzeCsvResponse:
+        """Analyze an uploaded CSV file in the staging area.
+
+        Returns column metadata, detected types, and a config object
+        that can be passed directly to import_csv.
+
+        Args:
+            location (str):
+                Location string returned by staging_upload.
+
+        Returns:
+            AnalyzeCsvResponse:
+                Analysis result with columns, preview data, and config.
+        """
+        request = AnalyzeCsvRequest(analyze_requests=[AnalyzeCsvRequestItem(location=location)])
+        responses = self._actions_api.analyze_csv(request, _check_return_type=False)
+        return responses[0]
+
+    def import_csv(
+        self,
+        data_source_id: str,
+        table_name: str,
+        location: str,
+        config: dict[str, Any] | None = None,
+    ) -> None:
+        """Import a CSV file from staging into a GDSTORAGE data source.
+
+        Args:
+            data_source_id (str):
+                Data source identification string.
+            table_name (str):
+                Name for the table to create or replace.
+            location (str):
+                Location string returned by staging_upload.
+            config (Optional[dict[str, Any]]):
+                Source config dict, typically from analyze_csv response.
+                Passed as ImportCsvRequestTableSourceConfig kwargs.
+
+        Returns:
+            None
+        """
+        source_kwargs: dict[str, Any] = {"location": location}
+        if config:
+            source_kwargs["config"] = ImportCsvRequestTableSourceConfig(**config)
+        source = ImportCsvRequestTableSource(**source_kwargs)
+        table = ImportCsvRequestTable(name=table_name, source=source)
+        request = ImportCsvRequest(tables=[table])
+        self._actions_api.import_csv(data_source_id, request)
+
+    def delete_csv_files(self, data_source_id: str, file_names: list[str]) -> None:
+        """Delete files from a GDSTORAGE data source.
+
+        Args:
+            data_source_id (str):
+                Data source identification string.
+            file_names (list[str]):
+                List of file names to delete.
+
+        Returns:
+            None
+        """
+        request = DeleteFilesRequest(file_names=file_names)
+        self._actions_api.delete_files(data_source_id, request)
+
+    def upload_csv(self, data_source_id: str, csv_file: Path, table_name: str) -> None:
+        """Upload a CSV file and import it into a GDSTORAGE data source.
+
+        Convenience method that orchestrates the full flow:
+        staging_upload → analyze_csv → import_csv → register_upload_notification.
+
+        Args:
+            data_source_id (str):
+                Data source identification string for a GDSTORAGE data source.
+            csv_file (Path):
+                Path to the CSV file to upload.
+            table_name (str):
+                Name for the table to create or replace in the data source.
+
+        Returns:
+            None
+        """
+        location = self.staging_upload(csv_file)
+        analysis = self.analyze_csv(location)
+        config = analysis.to_dict().get("config")
+        self.import_csv(data_source_id, table_name, location, config=config)
+        self.register_upload_notification(data_source_id)
+
     # Help methods are listed below
 
     @staticmethod
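The upload_csv orchestration above can be exercised end-to-end against a stubbed actions API. This is a sketch only: `StubActionsApi` and `upload_csv_flow` are hypothetical stand-ins for the generated client and the service method, written to make the call sequence (stage → analyze → import → notify) testable in isolation:

```python
from pathlib import Path
from typing import Any


class StubActionsApi:
    """Hypothetical stand-in that records the calls the upload flow makes."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def staging_upload(self, file: Any) -> dict[str, str]:
        self.calls.append("staging_upload")
        return {"location": "staging/abc123"}

    def analyze_csv(self, request: dict[str, Any]) -> list[dict[str, Any]]:
        self.calls.append("analyze_csv")
        return [{"config": {"delimiter": ","}}]

    def import_csv(self, data_source_id: str, request: dict[str, Any]) -> None:
        self.calls.append("import_csv")

    def register_upload_notification(self, data_source_id: str) -> None:
        self.calls.append("register_upload_notification")


def upload_csv_flow(api: StubActionsApi, data_source_id: str, csv_file: Path, table_name: str) -> None:
    # Same shape as the service method: stage the file, analyze it,
    # import with the detected config, then notify about the upload.
    with open(csv_file, "rb") as f:
        location = api.staging_upload(file=f)["location"]
    analysis = api.analyze_csv({"location": location})[0]
    config = analysis.get("config")
    api.import_csv(data_source_id, {"table": table_name, "location": location, "config": config})
    api.register_upload_notification(data_source_id)
```

Recording the call order this way is a common pattern for verifying orchestration methods without touching the network.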

0 commit comments
