@jmarshall

The declarations in pysam/libchtslib.pxd have mostly not been updated since pysam's early days. This PR brings them up to date with HTSlib 1.21 (which pysam currently wraps) and rearranges them to match the order of the C declarations in htslib/*.h, for ease of future updates.

jmarshall added 7 commits July 2, 2025 17:15
…/*.h

This reformatted function documentation duplicates that in htslib/*.h
and is not easily updated when the original documentation changes.
Moreover it'll be easier to keep pysam/libchtslib.pxd updated for
htslib/*.h API additions when it solely contains declarations.
Tidy up whitespace, canonicalise pointer declarations as `int *p`,
reformat onto a single line where appropriate. Fix `bed` typos
for `beg`, spell out parameter names as per htslib/*.h declarations.

Prefix struct/union/enum with cdef or ctypedef as appropriate.
This PyFile_AsFile declaration from Python.h is long-since unused.
Keep hfile.h, bgzf.h, hts.h, sam.h first as other headers require types
declared therein, but alphabetise the remaining header sections.
Add `const`; apply bam_hdr_t to sam_hdr_t renaming; rewrite most SAM_hdr
typedef usage; use samFile typedef; use array parameter forms.
Some BGZF, htsFile, hts_itr_t, and sam_hdr_t struct fields are omitted
from these Cython declarations. This commit updates the types of existing
fields but generally does not add new or previously omitted fields, as we
aim in future to make these structs (and some others) properly opaque.

Mark deprecated HTSlib API functions as deprecated. Some may be removed
at a later date.

Fix bam_cigar_gen() arguments (`l` is the length) and use `int` for
CIGAR opcodes in these functions.
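
For readers unfamiliar with the packing that bam_cigar_gen() performs, the following is a minimal Python sketch of it. The shift, mask, and opcode string mirror the constants in htslib/sam.h; the Python function names are hypothetical stand-ins for the C macros and are not part of pysam.

```python
# Constants mirrored from htslib/sam.h.
BAM_CIGAR_SHIFT = 4
BAM_CIGAR_MASK = 0xF
BAM_CIGAR_STR = "MIDNSHP=XB"


def cigar_gen(length: int, op: int) -> int:
    """Pack an operation length and an integer opcode into one CIGAR element.

    Hypothetical Python analogue of bam_cigar_gen(l, o): the first
    argument is the length, the second is the opcode.
    """
    return (length << BAM_CIGAR_SHIFT) | op


def cigar_op(element: int) -> int:
    """Extract the integer opcode from a packed CIGAR element."""
    return element & BAM_CIGAR_MASK


def cigar_oplen(element: int) -> int:
    """Extract the operation length from a packed CIGAR element."""
    return element >> BAM_CIGAR_SHIFT


# 76M: length 76, opcode 0 (index of 'M' in BAM_CIGAR_STR).
packed = cigar_gen(76, BAM_CIGAR_STR.index("M"))
print(cigar_oplen(packed), BAM_CIGAR_STR[cigar_op(packed)])
```

Swapping the two arguments would produce a different packed value, which is why the argument order matters in the declaration.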

Expose only the HTSlib 1.12+ cram_block_method enumerator items.
We intend to make these structs opaque in future: their fields are
present in the C declarations for implementation reasons but should
not be used by user code, so there is no reason why the corresponding
Cython declarations should not be incomplete structs.

Cython needs named types for some fields that can be anonymous structs
in C. Rename some of these pysam-local structs so that they follow
hts_opt_val_union's pattern and avoid polluting the namespace.

jmarshall merged commit ea43c08 into pysam-developers:master Nov 15, 2025
13 checks passed
jmarshall deleted the pxd branch November 15, 2025 08:59
nh13 added a commit to nh13/pysam that referenced this pull request Nov 22, 2025
…ents

Add new IteratorColumnRecords class that generates pileup columns from a
collection of AlignedSegment objects using htslib's push-based pileup API
(bam_plp_push/bam_plp64_next).

Key features:
- Accepts any iterable of AlignedSegments (requires coordinate-sorted order)
- Supports optional reference sequence (fastafile parameter)
- Includes add_reference(), has_reference(), and seq_len property
- Configurable min_base_quality parameter
- Uses 64-bit position types (hts_pos_t) for extended chromosome support

Implementation notes:
- Uses bam_plp_push/bam_plp64_next instead of callback-based approach
- Records consumed during initialization for push-based API
- Includes required NULL push to signal end-of-input
- Leverages 64-bit APIs (bam_plp64_next, faidx_fetch_seq64) introduced in PR pysam-developers#1362
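
The push-based pattern described in these notes can be sketched in pure Python. This is a toy model under stated assumptions, not htslib's implementation and not the code from this commit: reads are hypothetical half-open `(start, end)` tuples, and `PushPileup` and `pileup_columns` are illustrative names. Pushing `None` stands in for the NULL push that signals end-of-input. The driver interleaves pushes with column retrieval for brevity, whereas the commit consumes records during initialization.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional, Tuple

Read = Tuple[int, int]    # hypothetical stand-in for an alignment: [start, end)
Column = Tuple[int, int]  # (position, depth)


class PushPileup:
    """Toy push-based pileup over coordinate-sorted reads."""

    def __init__(self) -> None:
        self._pending: Deque[Read] = deque()  # pushed, not yet overlapping pos
        self._active: List[Read] = []         # reads overlapping pos
        self._pos: Optional[int] = None       # next column to emit
        self._last_start = -1
        self._eof = False

    def push(self, read: Optional[Read]) -> None:
        """Add one read, or pass None to signal end-of-input."""
        if read is None:
            self._eof = True
            return
        if self._eof:
            raise ValueError("push after end-of-input")
        start, end = read
        if end <= start:
            raise ValueError("read must cover at least one position")
        if start < self._last_start:
            raise ValueError("reads must be coordinate-sorted")
        self._last_start = start
        self._pending.append(read)

    def next(self) -> Optional[Column]:
        """Return the next column, or None if more input is needed or none remains."""
        if self._pos is not None:
            # Drop reads that end at or before the current position.
            self._active = [r for r in self._active if r[1] > self._pos]
        if not self._active:
            if not self._pending:
                return None
            # Jump over uncovered positions to the next read start.
            first = self._pending[0][0]
            self._pos = first if self._pos is None else max(self._pos, first)
        while self._pending and self._pending[0][0] <= self._pos:
            self._active.append(self._pending.popleft())
        # A column is complete only once no later read can start at or before it.
        if not self._eof and self._last_start <= self._pos:
            return None
        column = (self._pos, len(self._active))
        self._pos += 1
        return column


def pileup_columns(reads: Iterable[Read]) -> Iterator[Column]:
    """Drive PushPileup: push each read, drain columns, then push None."""
    plp = PushPileup()
    for read in reads:
        plp.push(read)
        while (col := plp.next()) is not None:
            yield col
    plp.push(None)  # required end-of-input signal
    while (col := plp.next()) is not None:
        yield col


print(list(pileup_columns([(0, 3), (1, 4), (6, 8)])))
```

Without the final `None` push, the columns covered by the last reads would never be emitted, because the model cannot know that no further read will start at those positions.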

Testing:
- 12 new tests covering reference support, edge cases, and parameters
- Documented known limitation: minor depth differences vs samtools mpileup
  due to push-based vs pull-based filtering differences

Changes:
- Add IteratorColumnRecords class in pysam/libcalignmentfile.pyx
- Add type definitions for bam_plp_* functions in pysam/libchtslib.pxd
- Add type stub in pysam/libcalignmentfile.pyi
- Add parameterized to test dependencies for parameterized tests
- Update CI workflows to install parameterized package

Closes pysam-developers#1352
nh13 added a commit to nh13/pysam that referenced this pull request Nov 22, 2025
…ents

Add new IteratorColumnRecords class that generates pileup columns from a
collection of AlignedSegment objects using htslib's push-based pileup API
(bam_plp_push/bam_plp64_next).

Key features:
- Accepts any iterable of AlignedSegments (requires coordinate-sorted order)
- Supports optional reference sequence (fastafile parameter)
- Includes add_reference(), has_reference(), and seq_len property
- Configurable min_base_quality parameter
- Uses 64-bit position types (hts_pos_t) for extended chromosome support

Implementation notes:
- Uses bam_plp_push/bam_plp64_next instead of callback-based approach
- Records consumed during initialization for push-based API
- Includes required NULL push to signal end-of-input
- Leverages 64-bit APIs (bam_plp64_next, faidx_fetch_seq64) from PR pysam-developers#1362
- Uses opaque bam_plp_s struct (no direct field access needed)

Testing:
- 12 new tests covering reference support, edge cases, and parameters
- Documented known limitation: minor depth differences vs samtools mpileup
  due to push-based vs pull-based filtering differences

Changes:
- Add IteratorColumnRecords class in pysam/libcalignmentfile.pyx
- Update to use 64-bit pileup APIs (bam_plp64_next, faidx_fetch_seq64)
- Add type stub in pysam/libcalignmentfile.pyi
- Add parameterized to test dependencies for parameterized tests
- Update CI workflows to install parameterized package

Closes pysam-developers#1352