Backport #1690 to release-1.4 #1794

Merged 5 commits on Oct 18, 2023
1 change: 1 addition & 0 deletions .github/workflows/python-ci-minimal.yml
@@ -29,6 +29,7 @@ jobs:
python_version: ${{ matrix.python-version }}
cc: ${{ matrix.cc }}
cxx: ${{ matrix.cxx }}
is_mac: ${{ contains(matrix.os, 'macos') }}
report_codecov: ${{ matrix.python-version == '3.10' }}
run_lint: ${{ matrix.python-version == '3.10' }}
secrets: inherit
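
The `is_mac` input above relies on the GitHub Actions `contains()` expression, which for string operands performs a case-insensitive substring check. A rough Python emulation is sketched below; `gha_contains` and the runner labels are hypothetical names used only for illustration, not part of this PR.

```python
def gha_contains(search: str, item: str) -> bool:
    """Approximate GitHub Actions contains() for two string operands.

    Assumption: both operands are strings; the array form is not modeled.
    """
    return item.lower() in search.lower()


# Hypothetical matrix.os values, chosen only for illustration.
for os_label in ["macos-latest", "macOS-latest", "ubuntu-latest"]:
    print(os_label, gha_contains(os_label, "macos"))
```

Because the check ignores case, a label spelled `macOS-latest` would still set `is_mac` to true.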
37 changes: 33 additions & 4 deletions .github/workflows/r-ci.yml
@@ -9,10 +9,12 @@ on:
branches:
- main
- 'release-*'
workflow_dispatch:

env:
COVERAGE_FLAGS: "r"
COVERAGE_TOKEN: ${{ secrets.CODECOV_TOKEN }}
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

jobs:
ci:
@@ -37,11 +39,28 @@ jobs:
- name: Bootstrap
run: cd apis/r && tools/r-ci.sh bootstrap

- name: Install 'old' tiledb-r to satisfy its dependencies
run: cd apis/r && tools/r-ci.sh tiledb-r
- name: Install BioConductor package SingleCellExperiment
run: cd apis/r && tools/r-ci.sh install_bioc SingleCellExperiment

- name: Install new tiledb-r 0.20.2
run: cd apis/r && Rscript -e 'if (Sys.info()[["sysname"]] == "Linux") bspm::disable(); install.packages("tiledb", repos = c("https://tiledb-inc.r-universe.dev", "https://cloud.r-project.org"))'
# The next two stanzas are necessary given propagation delay for binaries at
# https://r2u.stat.illinois.edu/ which may lack binaries (for short periods of usually a day)
# when sources have been updated. The default installation could switch to installation from
# sources --- but that would require installing all build dependencies. To see what the most
# recent binary is run e.g. docker run --rm -ti rocker/r2u bash -c 'apt update -qq &&
# apt-cache show r-cran-tiledb' Using the r-universe builds (as below) is a suitable fallback
# as they update more frequently than CRAN.

# Uncomment these next two stanzas as needed whenever we've just released a new tiledb-r for
# which source is available but binaries are not yet:

#- name: Install r-universe build of tiledb-r (macOS)
# if: ${{ matrix.os == 'macOS-latest' }}
# run: cd apis/r && Rscript -e "install.packages('tiledb', repos = c('https://eddelbuettel.r-universe.dev', 'https://cloud.r-project.org'))"

# docker run --rm -ti rocker/r2u Rscript -e 'install.packages("tiledb")'
#- name: Install r-universe build of tiledb-r (linux)
# if: ${{ matrix.os != 'macOS-latest' }}
# run: cd apis/r && Rscript -e "options(bspm.version.check=TRUE); install.packages('tiledb', repos = c('https://eddelbuettel.r-universe.dev/bin/linux/jammy/4.3/', 'https://cloud.r-project.org'))"

- name: Dependencies
run: cd apis/r && tools/r-ci.sh install_all
@@ -58,6 +77,12 @@ jobs:
#- name: Call ldconfig
# if: ${{ matrix.os == 'ubuntu-latest' }}
# run: sudo ldconfig
#
- name: Update Packages
run: Rscript -e 'update.packages(ask=FALSE)'

- name: Pin TileDB-R
run: Rscript -e 'remotes::install_github(repo="TileDB-Inc/TileDB-R@0.20.3")'

- name: Test
if: ${{ matrix.covr == 'no' }}
@@ -67,6 +92,10 @@ jobs:
run: cat $HOME/work/TileDB-SOMA/TileDB-SOMA/apis/r/tiledbsoma.Rcheck/00install.out
if: failure()

- name: View Test Output
run: cat $HOME/work/TileDB-SOMA/TileDB-SOMA/apis/r/tiledbsoma.Rcheck/00check.log
if: failure()

- name: Coverage
if: ${{ matrix.os == 'ubuntu-latest' && matrix.covr == 'yes' }}
run: cd apis/r && tools/r-ci.sh coverage
37 changes: 31 additions & 6 deletions .github/workflows/r-python-interop-testing.yml
@@ -2,14 +2,20 @@ name: TileDB-SOMA R-Python interop testing

on:
pull_request:
paths:
- "apis/python/**"
- "apis/r/**"
- "apis/system/**"
# TODO: leave this enabled for pre-merge signal for now. At some point we may want to go back to
# only having this signal post-merge.
#paths:
# - "apis/python/**"
# - "apis/r/**"
# - "apis/system/**"
push:
branches:
- main
- 'release-*'
workflow_dispatch:

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

jobs:
ci:
@@ -38,6 +44,17 @@ jobs:
- name: MkVars
run: mkdir ~/.R && echo "CXX17FLAGS=-Wno-deprecated-declarations -Wno-deprecated" > ~/.R/Makevars

#- name: Install r-universe build of tiledb-r (macOS)
# if: ${{ matrix.os == 'macOS-latest' }}
# run: cd apis/r && Rscript -e "install.packages('tiledb', repos = c('https://eddelbuettel.r-universe.dev', 'https://cloud.r-project.org'))"
#
#- name: Install r-universe build of tiledb-r (linux)
# if: ${{ matrix.os != 'macOS-latest' }}
# run: cd apis/r && Rscript -e "options(bspm.version.check=TRUE); install.packages('tiledb', repos = c('https://eddelbuettel.r-universe.dev/bin/linux/jammy/4.3/', 'https://cloud.r-project.org'))"

- name: Pin TileDB-R
run: Rscript -e 'remotes::install_github(repo="TileDB-Inc/TileDB-R@0.20.3")'

- name: Build and install libtiledbsoma
run: sudo scripts/bld --prefix=/usr/local && sudo ldconfig

@@ -48,6 +65,9 @@ jobs:
FILE=$(ls -1t *.tar.gz | head -n 1)
R CMD INSTALL $FILE

- name: Show R package versions
run: Rscript -e 'tiledbsoma::show_package_versions()'

- name: Install testing prereqs
run: python -m pip -v install -U pip pytest-cov 'typeguard<3.0' types-setuptools

@@ -61,8 +81,13 @@ jobs:
- name: Install tiledbsoma
run: python -m pip -v install -e apis/python

- name: Show package versions
run: python scripts/show-versions.py
- name: Show Python package versions
run: |
python -c 'import tiledbsoma; tiledbsoma.show_package_versions()'
python scripts/show-versions.py

- name: Update Packages
run: Rscript -e 'update.packages(ask=FALSE)'

- name: Interop Tests
run: python -m pytest apis/system/tests/
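
The "Show ... package versions" steps above print the installed versions before the interop tests run, which helps when diagnosing CI failures. The internals of `tiledbsoma.show_package_versions()` and `scripts/show-versions.py` are not shown in this diff, so the block below is only a generic stand-in built on the standard library; `collect_versions` and the package list are hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable


def collect_versions(names: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions: Dict[str, str] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Illustrative package list; adjust to whatever the environment should contain.
    for name, version in collect_versions(
        ["tiledbsoma", "tiledb", "pyarrow", "numpy"]
    ).items():
        print(f"{name:12s} {version}")
```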
17 changes: 10 additions & 7 deletions apis/python/src/tiledbsoma/_dataframe.py
@@ -6,9 +6,10 @@
"""
Implementation of a SOMA DataFrame
"""
from typing import Any, Optional, Sequence, Tuple, Type, Union, cast
from typing import Any, Dict, Optional, Sequence, Tuple, Type, Union, cast

import numpy as np
import pandas as pd
import pyarrow as pa
import somacore
import tiledb
@@ -379,9 +380,9 @@ def write(
"""
_util.check_type("values", values, (pa.Table,))

del platform_config # unused
dim_cols_map = {}
attr_cols_map = {}
dim_cols_map: Dict[str, pd.DataFrame] = {}
attr_cols_map: Dict[str, pd.DataFrame] = {}

dim_names_set = self.index_column_names
n = None

@@ -403,14 +404,17 @@ def write(
dim_cols_list = [dim_cols_map[name] for name in self.index_column_names]
dim_cols_tuple = tuple(dim_cols_list)
self._handle.writer[dim_cols_tuple] = attr_cols_map
self._consolidate_and_vacuum_fragment_metadata()
tiledb_create_options = TileDBCreateOptions.from_platform_config(
platform_config
)
if tiledb_create_options.consolidate_and_vacuum:
self._consolidate_and_vacuum()

return self

def _set_reader_coord(
self, sr: clib.SOMAArray, dim_idx: int, dim: tiledb.Dim, coord: object
) -> bool:

if coord is None:
return True # No constraint; select all in this dimension

@@ -548,7 +552,6 @@ def _set_reader_coord_by_py_seq_or_np_array(
def _set_reader_coord_by_numeric_slice(
self, sr: clib.SOMAArray, dim_idx: int, dim: tiledb.Dim, coord: Slice[Any]
) -> bool:

try:
lo_hi = _util.slice_to_numeric_range(coord, dim.domain)
except _util.NonNumericDimensionError:
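
In the `write` hunk above, columns are split into `dim_cols_map` and `attr_cols_map`, and the dimension columns are then passed as a tuple ordered by `index_column_names`. The middle of that method is elided in this diff, so the following is only a sketch of the partitioning idea using plain Python lists; `split_columns` and its length check are assumptions, not the PR's code.

```python
from typing import Any, Dict, List, Sequence, Tuple


def split_columns(
    columns: Dict[str, List[Any]], index_column_names: Sequence[str]
) -> Tuple[Tuple[List[Any], ...], Dict[str, List[Any]]]:
    """Partition columns into an ordered tuple of dimensions and a dict of attributes."""
    dim_names = set(index_column_names)
    dim_cols: Dict[str, List[Any]] = {}
    attr_cols: Dict[str, List[Any]] = {}
    n = None  # common column length, fixed by the first column seen

    for name, values in columns.items():
        if n is None:
            n = len(values)
        elif len(values) != n:
            raise ValueError(f"column {name!r} has length {len(values)}, expected {n}")
        target = dim_cols if name in dim_names else attr_cols
        target[name] = values

    missing = [name for name in index_column_names if name not in dim_cols]
    if missing:
        raise KeyError(f"missing index columns: {missing}")

    # Dimensions are ordered by index_column_names, not by input column order.
    return tuple(dim_cols[name] for name in index_column_names), attr_cols
```

Ordering the tuple by `index_column_names` keeps the coordinates aligned with the array's dimension order regardless of how the input table orders its columns.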
7 changes: 5 additions & 2 deletions apis/python/src/tiledbsoma/_dense_nd_array.py
@@ -172,9 +172,12 @@ def write(
"""
_util.check_type("values", values, (pa.Tensor,))

del platform_config # Currently unused.
self._handle.writer[coords] = values.to_numpy()
self._consolidate_and_vacuum_fragment_metadata()
tiledb_create_options = TileDBCreateOptions.from_platform_config(
platform_config
)
if tiledb_create_options.consolidate_and_vacuum:
self._consolidate_and_vacuum()
return self

@classmethod
22 changes: 18 additions & 4 deletions apis/python/src/tiledbsoma/_sparse_nd_array.py
@@ -182,14 +182,20 @@ def write(
Lifecycle:
Experimental.
"""
del platform_config # Currently unused.

arr = self._handle.writer
tiledb_create_options = TileDBCreateOptions.from_platform_config(
platform_config
)

if isinstance(values, pa.SparseCOOTensor):
data, coords = values.to_numpy()
arr[tuple(c for c in coords.T)] = data
self._consolidate_and_vacuum_fragment_metadata()

if tiledb_create_options.consolidate_and_vacuum:
# Consolidate non-bulk data
self._consolidate_and_vacuum()

return self

if isinstance(values, (pa.SparseCSCMatrix, pa.SparseCSRMatrix)):
@@ -200,7 +206,11 @@ def write(
# TODO: the ``to_scipy`` function is not zero copy. Need to explore zero-copy options.
sp = values.to_scipy().tocoo()
arr[sp.row, sp.col] = sp.data
self._consolidate_and_vacuum_fragment_metadata()

if tiledb_create_options.consolidate_and_vacuum:
# Consolidate non-bulk data
self._consolidate_and_vacuum()

return self

if isinstance(values, pa.Table):
@@ -211,7 +221,11 @@ def write(
for n in range(coord_tbl.num_columns)
)
arr[coords] = data
self._consolidate_and_vacuum_fragment_metadata()

if tiledb_create_options.consolidate_and_vacuum:
# Consolidate non-bulk data
self._consolidate_and_vacuum()

return self

raise TypeError(
29 changes: 25 additions & 4 deletions apis/python/src/tiledbsoma/_tiledb_array.py
@@ -6,7 +6,7 @@
import ctypes
import os
import sys
from typing import Any, Dict, Optional, Sequence, Tuple
from typing import Any, Dict, List, Optional, Sequence, Tuple

import pyarrow as pa
import tiledb
@@ -194,20 +194,41 @@ def _create_internal(
cls._set_create_metadata(handle)
return handle

def _consolidate_and_vacuum_fragment_metadata(self) -> None:
def _consolidate_and_vacuum(
self, modes: List[str] = ["fragment_meta", "commits"]
) -> None:
"""
This post-ingestion helper consolidates and vacuums fragment metadata and commit files --
this is quick to do, and positively impacts query performance. It does _not_ consolidate
bulk array data, which is more time-consuming and should be done at the user's opt-in
discretion.
"""

for mode in ["fragment_meta", "commits"]:
for mode in modes:
self._consolidate(modes=[mode])
self._vacuum(modes=[mode])

def _consolidate(self, modes: List[str] = ["fragment_meta", "commits"]) -> None:
"""
This post-ingestion helper consolidates by default fragment metadata and commit files --
this is quick to do, and positively impacts query performance.
"""

for mode in modes:
cfg = self._ctx.config()
cfg["sm.consolidation.mode"] = mode
cfg["sm.vacuum.mode"] = mode
ctx = tiledb.Ctx(cfg)

tiledb.consolidate(self.uri, ctx=ctx)

def _vacuum(self, modes: List[str] = ["fragment_meta", "commits"]) -> None:
"""
This post-ingestion helper vacuums by default fragment metadata and commit files. Vacuuming is not multi-process safe and requires coordination that nothing is currently reading the files that will be vacuumed.
"""

for mode in modes:
cfg = self._ctx.config()
cfg["sm.vacuum.mode"] = mode
ctx = tiledb.Ctx(cfg)

tiledb.vacuum(self.uri, ctx=ctx)
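
The refactored `_consolidate_and_vacuum` above handles one mode at a time, consolidating and then vacuuming it before moving to the next mode. The sketch below shows only that ordering. It uses stand-in callables instead of `tiledb.consolidate` and `tiledb.vacuum`, and it swaps the PR's list default for an immutable tuple, which is a choice made for this sketch (a list default is shared across calls in Python, which is harmless as long as it is never mutated).

```python
from typing import Callable, List, Sequence, Tuple

DEFAULT_MODES: Tuple[str, ...] = ("fragment_meta", "commits")


def consolidate_and_vacuum(
    consolidate: Callable[[str], None],
    vacuum: Callable[[str], None],
    modes: Sequence[str] = DEFAULT_MODES,
) -> None:
    """Consolidate, then vacuum, each mode in turn."""
    for mode in modes:
        consolidate(mode)
        vacuum(mode)


# Stand-in callables that only record the call order; no TileDB involved.
calls: List[Tuple[str, str]] = []
consolidate_and_vacuum(
    lambda mode: calls.append(("consolidate", mode)),
    lambda mode: calls.append(("vacuum", mode)),
)
print(calls)
```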
3 changes: 3 additions & 0 deletions apis/python/src/tiledbsoma/options/_tiledb_create_options.py
@@ -143,6 +143,9 @@ class TileDBCreateOptions:
attrs: Mapping[str, _ColumnConfig] = attrs_.field(
factory=dict, converter=_normalize_columns
)
consolidate_and_vacuum: bool = attrs_.field(
validator=vld.instance_of(bool), default=False
)

@classmethod
def from_platform_config(
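
With the new `consolidate_and_vacuum` field defaulting to `False`, the `write` methods in this PR no longer consolidate and vacuum after every write; callers must opt in through `platform_config`. The sketch below models that gate with a standard-library dataclass rather than the real attrs-based `TileDBCreateOptions`. The nested `{"tiledb": {"create": {...}}}` layout is an assumption about how `from_platform_config` reads its input, and `CreateOptionsSketch` and `write_sketch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class CreateOptionsSketch:
    """Simplified stand-in for TileDBCreateOptions with a single field."""

    consolidate_and_vacuum: bool = False

    def __post_init__(self) -> None:
        # Mirrors the instance_of(bool) validator shown in the diff.
        if not isinstance(self.consolidate_and_vacuum, bool):
            raise TypeError("consolidate_and_vacuum must be a bool")

    @classmethod
    def from_platform_config(
        cls, platform_config: Optional[Mapping[str, Any]] = None
    ) -> "CreateOptionsSketch":
        # Assumed layout: {"tiledb": {"create": {"consolidate_and_vacuum": ...}}}
        create = ((platform_config or {}).get("tiledb") or {}).get("create") or {}
        return cls(consolidate_and_vacuum=create.get("consolidate_and_vacuum", False))


def write_sketch(platform_config: Optional[Mapping[str, Any]] = None) -> List[str]:
    """Return the steps a write would perform under the given configuration."""
    steps = ["write"]
    options = CreateOptionsSketch.from_platform_config(platform_config)
    if options.consolidate_and_vacuum:
        steps.append("consolidate_and_vacuum")
    return steps


print(write_sketch())
print(write_sketch({"tiledb": {"create": {"consolidate_and_vacuum": True}}}))
```

Keeping the default at `False` matters because, as the `_vacuum` docstring in this PR notes, vacuuming is not multi-process safe.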
Binary file added test/soco.tgz
Binary file not shown.