
[Bug]: support for S3-compatible services other than AWS #71

Open
0xhanh opened this issue Oct 8, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@0xhanh

0xhanh commented Oct 8, 2024

Steps to reproduce

if let s3::Region::Custom { endpoint, .. } = &s3_settings.region {
    if endpoint.starts_with("https://") || endpoint.starts_with("http://") {
        storage_options.insert("AWS_ENDPOINT_URL".to_string(), endpoint.to_string());
    } else {
        storage_options.insert(
            "AWS_ENDPOINT_URL".to_string(),
            format!("https://{endpoint}"),
        );
    }
    // TODO: add support for S3-compatible services other than AWS
    storage_options.insert("AWS_ALLOW_HTTP".to_string(), "True".to_string());
    storage_options.insert("AWS_STORAGE_ALLOW_HTTP".to_string(), "True".to_string());
} else {
    storage_options.insert("AWS_REGION".to_string(), s3_settings.region.to_string());
}
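For clarity, the endpoint-handling logic in the Rust snippet above can be sketched in Python. `build_storage_options` is a hypothetical helper for illustration only; the key names mirror the Rust code:

```python
def build_storage_options(endpoint=None, region=None):
    """Sketch of the Rust snippet's logic: map a custom S3 endpoint or an
    AWS region to object-store configuration keys (hypothetical helper)."""
    storage_options = {}
    if endpoint is not None:
        # Custom (S3-compatible) endpoint: keep an explicit scheme,
        # defaulting to https:// when none is given.
        if endpoint.startswith(("https://", "http://")):
            storage_options["AWS_ENDPOINT_URL"] = endpoint
        else:
            storage_options["AWS_ENDPOINT_URL"] = f"https://{endpoint}"
        # Allow plain-HTTP endpoints such as a local MinIO instance.
        storage_options["AWS_ALLOW_HTTP"] = "True"
        storage_options["AWS_STORAGE_ALLOW_HTTP"] = "True"
    else:
        # Plain AWS: only the region is needed.
        storage_options["AWS_REGION"] = str(region)
    return storage_options
```

With this mapping, an `http://localhost:9090` MinIO endpoint is passed through unchanged and the two `*_ALLOW_HTTP` flags are set, which is exactly what the fixed run below relies on.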

Relevant log output

No log

What did you expect to happen?

def read_results():
    table = pw.io.deltalake.read(
        base_path + "timezone_unified",
        schema=InputStreamSchema,
        autocommit_duration_ms=100,
        s3_connection_settings=s3_connection_settings,
    )
    pw.io.csv.write(table, "./results.csv")
    pw.run(monitoring_level=pw.MonitoringLevel.NONE)

Version

0.15.0

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

@0xhanh 0xhanh added the bug Something isn't working label Oct 8, 2024
@dxtrous
Member

dxtrous commented Oct 8, 2024

Hi @0xhanh, thanks for the report. While there is indeed a big internal TODO around cleaning up internal variable names starting with AWS_..., have you checked that what you need actually does not work with a non-AWS S3 setup, despite the misleading names? If so, could you describe the setup you are using?


@0xhanh
Author

0xhanh commented Oct 9, 2024

I tested with a non-AWS setup: a local MinIO instance. I ran the example from examples/projects/kafka-alternatives/minio-ETL against the local MinIO environment.

base.py:

import os

from dotenv import load_dotenv

import pathway as pw

load_dotenv()

bucket = "lab1"
base_path = f"s3://{bucket}/"
str_repr = "%Y-%m-%d %H:%M:%S.%f %z"

s3_connection_settings = pw.io.minio.MinIOSettings(
    bucket_name=bucket,
    access_key=os.environ["MINIO_S3_ACCESS_KEY"],
    secret_access_key=os.environ["MINIO_S3_SECRET_ACCESS_KEY"],
    endpoint="http://localhost:9090",
    region="us-east-1",
)

custom_settings = pw.io.s3.AwsS3Settings(
    endpoint="http://localhost:9090",
    bucket_name=bucket,
    access_key=os.environ["MINIO_S3_ACCESS_KEY"],
    secret_access_key=os.environ["MINIO_S3_SECRET_ACCESS_KEY"],
    with_path_style=True,
    region="us-east-1",
)
$ python read-results.py 
[2024-10-09T02:22:27]:INFO:Preparing Pathway computation
[2024-10-09T02:22:27]:INFO:Telemetry enabled
[2024-10-09T02:22:27]:WARNING:{"AWS_BUCKET_NAME": "lab1", "AWS_ACCESS_KEY_ID": "minio", "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "False", "AWS_S3_ALLOW_UNSAFE_RENAME": "True", "AWS_SECRET_ACCESS_KEY": "minio123", "AWS_ENDPOINT_URL": "http://localhost:9090"}
[2024-10-09T02:22:27]:INFO:Using Static credential provider
thread 'pathway:work-0' panicked at src/engine/report_error.rs:83:50:
OSError: Failed to connect to DeltaLake: Failed to read delta log object: Generic S3 error: Error after 0 retries in 3.668µs, max_retries:10, retry_timeout:180s, source:builder error for url (http://localhost:9090/lab1/timezone_unified/_delta_log/_last_checkpoint)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@0xhanh
Author

0xhanh commented Oct 9, 2024

Run again within a venv with the fix applied:

 python read-results.py 
[2024-10-09T09:34:50]:INFO:Preparing Pathway computation
[2024-10-09T09:34:50]:WARNING:{"AWS_ALLOW_HTTP": "True", "AWS_BUCKET_NAME": "lab1", "AWS_STORAGE_ALLOW_HTTP": "True", "AWS_ACCESS_KEY_ID": "minio", "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "False", "AWS_ENDPOINT_URL": "http://localhost:9090", "AWS_SECRET_ACCESS_KEY": "minio123", "AWS_S3_ALLOW_UNSAFE_RENAME": "True"}
[2024-10-09T09:34:50]:INFO:Using Static credential provider
[2024-10-09T09:34:50]:INFO:DeltaTableReader-0: 0 entries (1 minibatch(es)) have been sent to the engine
[2024-10-09T09:34:50]:INFO:FileWriter-0: Done writing 0 entries, time 1728441290884. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-10-09T09:34:55]:INFO:DeltaTableReader-0: 71 entries (51 minibatch(es)) have been sent to the engine
[2024-10-09T09:34:55]:INFO:FileWriter-0: Done writing 71 entries, time 1728441295984. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-10-09T09:35:01]:INFO:DeltaTableReader-0: 0 entries (51 minibatch(es)) have been sent to the engine
[2024-10-09T09:35:01]:INFO:FileWriter-0: Done writing 0 entries, time 1728441301084. Current batch writes took: 0 ms. All writes so far took: 0 ms.

@0xhanh 0xhanh changed the title [Bug]: TODO: add support for S3-compatible services other than AWS [Bug]: support for S3-compatible services other than AWS Oct 9, 2024
@dxtrous
Member

dxtrous commented Oct 9, 2024

@0xhanh I am not sure how to read your second comment - did you manage to fix the issue? If so, how?

@0xhanh
Author

0xhanh commented Oct 10, 2024

I tried fixing it and it seems to work now; see the code snippet in the issue description at the top.

Projects
None yet
Development

No branches or pull requests

3 participants