refactor: Added a generic FileStream (still in active development!) #2654

Open: edgarrmondragon wants to merge 41 commits into main from 2648-feat-add-a-generic-filestream-interface

edgarrmondragon (Collaborator) commented Sep 6, 2024

edgarrmondragon linked an issue Sep 6, 2024 that may be closed by this pull request

codspeed-hq bot commented Sep 6, 2024

CodSpeed Performance Report

Merging #2654 will not alter performance

Comparing 2648-feat-add-a-generic-filestream-interface (4628468) with main (e997deb)

Summary

✅ 6 untouched benchmarks

codecov bot commented Sep 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@f1e8114). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2654   +/-   ##
=======================================
  Coverage        ?   90.46%           
=======================================
  Files           ?       62           
  Lines           ?     4992           
  Branches        ?      974           
=======================================
  Hits            ?     4516           
  Misses          ?      330           
  Partials        ?      146           

edgarrmondragon force-pushed the 2648-feat-add-a-generic-filestream-interface branch 4 times, most recently from d3d86fe to 4dfdd17 on September 10, 2024 03:40
edgarrmondragon force-pushed the 2648-feat-add-a-generic-filestream-interface branch from 4dfdd17 to bd5c138 on September 10, 2024 03:56
edgarrmondragon self-assigned this Sep 10, 2024
edgarrmondragon added this to the v0.41.0 milestone Sep 16, 2024
edgarrmondragon marked this pull request as ready for review September 23, 2024 15:01
edgarrmondragon requested a review from a team as a code owner September 23, 2024 15:01
edgarrmondragon changed the title from "feat: Added a generic FileStream" to "refactor: Added a generic FileStream" Sep 23, 2024
edgarrmondragon changed the title from "refactor: Added a generic FileStream" to "refactor: Added a generic FileStream (still in active development!)" Sep 23, 2024
edgarrmondragon (Collaborator, Author) left a comment

This is ready for review. Everything here is subject to change (naming conventions, implementation, abstractions), so feel free to comment on any of it.

Comment on lines +185 to +186
# https://github.com/boto/boto3/issues/3889
# "ignore:datetime\\.datetime\\.utcnow\\(\\) is deprecated:DeprecationWarning:botocore",

Suggested change (delete these two commented-out lines):
# https://github.com/boto/boto3/issues/3889
# "ignore:datetime\\.datetime\\.utcnow\\(\\) is deprecated:DeprecationWarning:botocore",

I suggest reviewing this module in split view. All that's left after the refactor are the get_schema and read_file implementations.

I suggest reviewing this module in split view.

CSV-specific settings were added and custom discovery was removed in favor of the default implementation.
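
As a rough illustration of what a CSV stream reduced to `get_schema` and `read_file` could look like, here is a minimal standard-library sketch. The two function names mirror the ones mentioned in this review, but the signatures, the `delimiter` parameter, and the all-strings schema are assumptions made for illustration, not the code in this PR.

```python
import csv
import io
import typing as t


def get_schema(file: t.TextIO, delimiter: str = ",") -> dict[str, t.Any]:
    """Build a JSON-schema-like dict from the CSV header.

    Assumption: every column is typed as a string.
    """
    reader = csv.DictReader(file, delimiter=delimiter)
    fieldnames = reader.fieldnames or []  # reading fieldnames consumes the header row
    return {
        "type": "object",
        "properties": {name: {"type": "string"} for name in fieldnames},
    }


def read_file(file: t.TextIO, delimiter: str = ",") -> t.Iterator[dict[str, str]]:
    """Yield one record (a dict keyed by column name) per CSV row."""
    yield from csv.DictReader(file, delimiter=delimiter)


# Hypothetical tab-delimited sample, matching a "delimiter": "\t" setting.
sample = "id\tname\n1\talice\n2\tbob\n"
schema = get_schema(io.StringIO(sample), delimiter="\t")
records = list(read_file(io.StringIO(sample), delimiter="\t"))
```

With this shape, discovery only needs the header row, which is one way a default discovery implementation could replace custom logic.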

Comment on lines +68 to +71
@property
def partitions(self) -> list[dict[str, t.Any]]:
    """Return the list of partitions for this stream."""
    return self._partitions

Using partitions allows us to track state for each individual file in merge mode.
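
To make the idea of per-file state concrete, here is a simplified sketch that builds one partition per file and keeps a separate bookmark for each. The `filename` context key, the `last_modified` bookmark, and both helper functions are hypothetical; the state layout is deliberately simplified and is not the SDK's actual state format.

```python
import typing as t


def build_partitions(filenames: list[str]) -> list[dict[str, t.Any]]:
    """Return one partition context per file (hypothetical `filename` key)."""
    return [{"filename": name} for name in sorted(filenames)]


def record_progress(
    state: list[dict[str, t.Any]],
    partition: dict[str, t.Any],
    last_modified: str,
) -> None:
    """Store or update the bookmark that belongs to a single partition."""
    for entry in state:
        if entry["context"] == partition:
            entry["last_modified"] = last_modified
            return
    state.append({"context": partition, "last_modified": last_modified})


partitions = build_partitions(["b.csv", "a.csv"])
state: list[dict[str, t.Any]] = []
record_progress(state, partitions[0], "2024-09-01T00:00:00Z")
record_progress(state, partitions[1], "2024-09-02T00:00:00Z")
# Re-syncing the first file only touches that file's bookmark.
record_progress(state, partitions[0], "2024-09-03T00:00:00Z")
```

Because each bookmark is keyed by its partition context, updating one file's progress leaves the bookmarks of the other files unchanged.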

@edgarrmondragon

FTP and SFTP are working now with the following configurations:

FTP

{
    "filesystem": "ftp",
    "path": "fixtures/csv",
    "read_mode": "one_stream_per_file",
    "delimiter": "\t",
    "ftp": {
        "host": "127.0.0.1",
        "port": 21,
        "username": "my_ftp_user",
        "password": "my_ftp_password"
    }
}

SFTP

{
    "filesystem": "sftp",
    "path": "fixtures/csv",
    "read_mode": "one_stream_per_file",
    "delimiter": "\t",
    "sftp": {
        "host": "127.0.0.1",
        "port": 2022,
        "username": "my_ftp_user",
        "password": "my_ftp_password"
    }
}
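
Both configurations share the same shape: the `filesystem` value names a protocol, and a block under the same key holds that protocol's connection options. A minimal sketch of resolving that block is shown below. The `filesystem_options` helper is hypothetical, and handing the result to an fsspec-style filesystem factory is an assumption about how the options could be consumed, not a description of this PR's implementation.

```python
import typing as t


def filesystem_options(config: dict[str, t.Any]) -> tuple[str, dict[str, t.Any]]:
    """Return the protocol name and a copy of the options nested under its key."""
    protocol = config["filesystem"]
    return protocol, dict(config.get(protocol, {}))


# Same values as the SFTP example above, written as a Python dict.
config = {
    "filesystem": "sftp",
    "path": "fixtures/csv",
    "read_mode": "one_stream_per_file",
    "delimiter": "\t",
    "sftp": {
        "host": "127.0.0.1",
        "port": 2022,
        "username": "my_ftp_user",
        "password": "my_ftp_password",
    },
}

protocol, options = filesystem_options(config)
```

Keeping the connection options in a protocol-named block means shared settings such as `path`, `read_mode`, and `delimiter` stay the same when switching between FTP and SFTP.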

Successfully merging this pull request may close these issues.

feat: Add a generic FileStream interface