Migrate ORC Writer to pylibcudf #17310

Merged: 25 commits, Nov 26, 2024
4e2f60c
[WIP] Migrate ORC Writer to pylibcudf
Matt711 Nov 13, 2024
3648886
Merge branch 'branch-24.12' into pylibcudf-io-orc
Matt711 Nov 15, 2024
763b870
add orc chunked writer
Matt711 Nov 18, 2024
120ec07
check style
Matt711 Nov 18, 2024
db110c2
Merge branch 'branch-24.12' into pylibcudf-io-orc
Matt711 Nov 19, 2024
9f20820
add a test
Matt711 Nov 19, 2024
b904747
clean up, address review
Matt711 Nov 19, 2024
0819020
Merge branch 'branch-24.12' into pylibcudf-io-orc
Matt711 Nov 19, 2024
5fc0eec
use a pointer instead
Matt711 Nov 19, 2024
866fdc2
add doc strings
Matt711 Nov 20, 2024
734bbcd
Merge branch 'branch-24.12' into pylibcudf-io-orc
Matt711 Nov 20, 2024
d094bcf
Merge branch 'pylibcudf-io-orc' of github.com:Matt711/cudf into pylib…
Matt711 Nov 20, 2024
50e5e16
skip test if pandas version < 2.2.3
Matt711 Nov 20, 2024
52b1c77
address review
Matt711 Nov 20, 2024
fd4c6cd
address review
Matt711 Nov 21, 2024
960b7d4
merge conflict and add gc test
Matt711 Nov 21, 2024
826f8c8
merge conflict
Matt711 Nov 25, 2024
3bd27a7
clean up
Matt711 Nov 25, 2024
31853a8
try a different approach in gc test
Matt711 Nov 25, 2024
b2a154a
enable/disable gc in test
Matt711 Nov 25, 2024
8ae013f
address review
Matt711 Nov 26, 2024
1cf3078
Merge branch 'branch-25.02' into pylibcudf-io-orc
Matt711 Nov 26, 2024
b51c926
Update python/pylibcudf/pylibcudf/tests/io/test_types.py
Matt711 Nov 26, 2024
49953e5
Merge branch 'branch-25.02' into pylibcudf-io-orc
Matt711 Nov 26, 2024
5b12ef5
address review
Matt711 Nov 26, 2024
[WIP] Migrate ORC Writer to pylibcudf
Matt711 committed Nov 13, 2024
commit 4e2f60c734e8fb2a3b38e9a41af52111ceea7915
27 changes: 26 additions & 1 deletion python/pylibcudf/pylibcudf/io/orc.pxd
@@ -4,12 +4,21 @@ from libcpp cimport bool
 from libcpp.optional cimport optional
 from libcpp.string cimport string
 from libcpp.vector cimport vector
-from pylibcudf.io.types cimport SourceInfo, TableWithMetadata
+from pylibcudf.io.types cimport (
+    SourceInfo,
+    TableWithMetadata,
+    CompressionType,
+    StatisticsFreq,
+)
 from pylibcudf.libcudf.io.orc_metadata cimport (
     column_statistics,
     parsed_orc_statistics,
     statistics_type,
 )
+from pylibcudf.libcudf.io.orc cimport (
+    orc_writer_options,
+    orc_writer_options_builder,
+)
 from pylibcudf.libcudf.types cimport size_type
 from pylibcudf.types cimport DataType
 
@@ -48,3 +57,19 @@ cdef class ParsedOrcStatistics:
 cpdef ParsedOrcStatistics read_parsed_orc_statistics(
     SourceInfo source_info
 )
+
+
+cdef class OrcWriterOptions:
+    cdef orc_writer_options c_obj
+
+    @staticmethod
+    cdef OrcWriterOptionsBuilder builder(SinkInfo sink, Table table)
+
+
+cdef class OrcWriterOptionsBuilder:
+    cdef orc_writer_options_builder c_obj
+    cpdef OrcWriterOptionsBuilder compression(self, CompressionType comp)
+    cpdef OrcWriterOptionsBuilder enable_statistics(self, StatisticsFreq val)
+    cpdef OrcWriterOptionsBuilder key_value_metadata(self, object kvm)
+    cpdef OrcWriterOptionsBuilder metadata(self, TableWithMetadata meta)
+    cpdef OrcWriterOptions build(self)
28 changes: 27 additions & 1 deletion python/pylibcudf/pylibcudf/io/orc.pyi
@@ -2,7 +2,14 @@
 
 from typing import Any
 
-from pylibcudf.io.types import SourceInfo, TableWithMetadata
+from pylibcudf.io.types import (
+    CompressionType,
+    SinkInfo,
+    SourceInfo,
+    StatisticsFreq,
+    TableWithMetadata,
+)
+from pylibcudf.table import Table
 from pylibcudf.types import DataType
 
 def read_orc(
@@ -39,3 +46,22 @@ class ParsedOrcStatistics:
 def read_parsed_orc_statistics(
     source_info: SourceInfo,
 ) -> ParsedOrcStatistics: ...
+
+class OrcWriterOptions:
+    def __init__(self): ...
+    @staticmethod
+    def builder(sink: SinkInfo, table: Table) -> OrcWriterOptionsBuilder: ...
+
+class OrcWriterOptionsBuilder:
+    def __init__(self): ...
+    def compression(
+        self, comp: CompressionType
+    ) -> OrcWriterOptionsBuilder: ...
+    def enable_statistics(
+        self, val: StatisticsFreq
+    ) -> OrcWriterOptionsBuilder: ...
+    def key_value_metadata(self, kvm: object) -> OrcWriterOptionsBuilder: ...
+    def metadata(self, meta: TableWithMetadata) -> OrcWriterOptionsBuilder: ...
+    def build(self) -> OrcWriterOptions: ...
+
+def write_orc(options: OrcWriterOptions) -> None: ...
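
Note on the call shape these stubs imply: every builder setter returns the builder, so the options can be assembled in one chained expression and finished with build(). The sketch below illustrates only that pattern, using pure-Python stand-ins so it runs without a GPU. The classes mirror the names in the stub but are not pylibcudf's implementation; the enum members, the string sink, the dict table, and the default values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class CompressionType(Enum):
    """Stand-in enum; the member names here are assumptions."""
    NONE = 0
    SNAPPY = 1


class StatisticsFreq(Enum):
    """Stand-in enum; the member names here are assumptions."""
    STATISTICS_NONE = 0
    STATISTICS_ROWGROUP = 1


@dataclass
class OrcWriterOptions:
    # A str sink and a dict table stand in for SinkInfo and Table.
    sink: str
    table: dict
    compression: CompressionType = CompressionType.NONE
    statistics: StatisticsFreq = StatisticsFreq.STATISTICS_NONE
    key_value_metadata: dict = field(default_factory=dict)

    @staticmethod
    def builder(sink, table):
        # Entry point, as in the stub: the builder starts from a sink and a table.
        return OrcWriterOptionsBuilder(OrcWriterOptions(sink, table))


class OrcWriterOptionsBuilder:
    def __init__(self, options):
        self._options = options

    def compression(self, comp):
        self._options.compression = comp
        return self  # returning self is what makes chaining possible

    def enable_statistics(self, val):
        self._options.statistics = val
        return self

    def key_value_metadata(self, kvm):
        self._options.key_value_metadata = dict(kvm)
        return self

    def build(self):
        # Hand back the finished options object.
        return self._options


options = (
    OrcWriterOptions.builder("out.orc", {"a": [1, 2, 3]})
    .compression(CompressionType.SNAPPY)
    .enable_statistics(StatisticsFreq.STATISTICS_ROWGROUP)
    .key_value_metadata({"author": "example"})
    .build()
)
```

With the real module, the resulting options object would then be passed to write_orc(options), which per the stub takes no other arguments.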