Implement `splitcode_demux_fastqs` entry point for splitcode-based demultiplexing of Illumina DRAGEN paired FASTQ files

### Summary
Add two new entry points in `illumina.py`:
1. **`illumina_metadata`**: Generate metadata JSONs from RunInfo.xml and SampleSheet (run once per sequencing run)
2. **`splitcode_demux_fastqs`**: Perform splitcode-driven demultiplexing from paired DRAGEN FASTQs using a custom third inline barcode (run in parallel, once per FASTQ pair)

This separation enables efficient parallel processing by generating shared metadata once, then running multiple demux jobs simultaneously.

---

### New Entry Point #1: `illumina_metadata`

#### Purpose
Generate metadata JSON files from Illumina run metadata files **without processing reads**. Run once per sequencing run to create metadata that's shared across all parallel demux jobs.

#### Inputs
1. **RunInfo.xml** (required): Illumina run metadata
2. **SampleSheet.csv** (required): Illumina/DRAGEN samplesheet
3. **Lane number** (required): Lane to process
4. **Sequencing center** (optional): Default "Broad"

#### Outputs
1. **run_info.json**: Run metadata (flowcell, dates, read structure, instrument info)
2. **meta_by_sample.json**: Sample metadata indexed by sample name
3. **meta_by_filename.json**: Sample metadata indexed by filename/library ID
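
To illustrate how the two sample-metadata outputs relate, here is a minimal sketch that indexes the same rows once by sample name and once by library ID. The helper `index_sample_metadata` and the field names (`sample`, `library_id`, `barcode_1`, `barcode_2`) are hypothetical and not taken from `illumina.py`; the real JSON schema may differ.

```python
import json


def index_sample_metadata(rows):
    """Index the same sample rows two ways (field names are assumptions)."""
    meta_by_sample = {row["sample"]: row for row in rows}
    meta_by_filename = {row["library_id"]: row for row in rows}
    return meta_by_sample, meta_by_filename


# Made-up rows, for illustration only.
rows = [
    {"sample": "TestSample1", "library_id": "TestSample1.l1",
     "barcode_1": "ACGTACGT", "barcode_2": "TGCATGCA"},
    {"sample": "TestSample2", "library_id": "TestSample2.l1",
     "barcode_1": "ACGTACGT", "barcode_2": "TGCATGCA"},
]

by_sample, by_filename = index_sample_metadata(rows)
print(json.dumps(by_sample, indent=2, sort_keys=True))
```

Both dictionaries point at the same row data, so either file can be used to look up a sample depending on which key a downstream step has at hand.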

#### Implementation Notes
- Reuses existing `build_run_info_json()` utility function
- Extracts duplicated metadata generation logic from `illumina_demux` and `splitcode_demux`
- No read processing - pure metadata extraction

**Status**: ✅ **COMPLETE** - All 7 tests passing

---

### New Entry Point #2: `splitcode_demux_fastqs`

#### Purpose
Perform splitcode-based demultiplexing directly from a single paired DRAGEN FASTQ file set, using a custom third inline barcode scheme. Designed to run in parallel across multiple FASTQ pairs.

#### Inputs
1. **Paired FASTQ files** (R1/R2): Exactly one pair from DRAGEN output
2. **Custom 3-barcode samplesheet**: TSV format defining third inline barcode sequences
   - Maps composite (index1 + index2 + inline) barcode → sample name
   - May include rows with empty `barcode_3` (2-barcode samples that bypass splitcode)
3. **Output directory**: Where to write BAM files and metrics
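
As a rough sketch of how the custom samplesheet might be read, the following splits rows by whether `barcode_3` is populated. Only the `barcode_3` column name comes from this issue; `sample`, `barcode_1`, `barcode_2`, the helper name, and the outer barcode values are assumptions made for illustration.

```python
import csv
import io


def split_rows_by_inline_barcode(tsv_text):
    """Split samplesheet rows into (3-barcode rows, 2-barcode rows)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    three_bc, two_bc = [], []
    for row in reader:
        # An empty or whitespace-only barcode_3 marks a 2-barcode sample.
        if (row.get("barcode_3") or "").strip():
            three_bc.append(row)
        else:
            two_bc.append(row)
    return three_bc, two_bc


# Made-up samplesheet content; the outer barcodes are placeholders.
lines = [
    ["sample", "barcode_1", "barcode_2", "barcode_3"],
    ["TestSample1", "ACGTACGT", "TGCATGCA", "AAAAAAAA"],
    ["TestSample2", "ACGTACGT", "TGCATGCA", "CCCCCCCC"],
    ["TestSampleNoSplitcode", "GGCCGGCC", "AATTAATT", ""],
]
tsv_text = "\n".join("\t".join(fields) for fields in lines)

three_bc, two_bc = split_rows_by_inline_barcode(tsv_text)
print(len(three_bc), "three-barcode rows;", len(two_bc), "two-barcode rows")
```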

**Note**: RunInfo.xml and Illumina SampleSheet.csv are **NOT** required - metadata JSONs are generated separately via `illumina_metadata`.

#### Processing for 3-barcode samples (barcode_3 present):
1. Parse FASTQ filenames to extract pool/sample metadata
2. Extract outer barcodes (index1+index2) from DRAGEN FASTQ headers
3. Filter samplesheet to matching outer barcodes
4. Generate splitcode configuration from inline barcode definitions
5. Run splitcode demultiplexing
6. Convert splitcode output to per-sample unaligned BAMs
7. Generate demux metrics
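
Steps 2 and 3 can be sketched as follows, assuming the standard Illumina FASTQ header layout in which the comment field ends with `INDEX1+INDEX2`. The function names and the `barcode_1`/`barcode_2` column names are hypothetical, and the sketch does not handle index2 orientation (which can differ between instruments), so treat it as an illustration and not as the actual implementation.

```python
def outer_barcodes_from_header(header):
    """Return (index1, index2) from an Illumina-style FASTQ header line."""
    # The comment after the first space looks like "1:N:0:INDEX1+INDEX2".
    comment = header.rstrip().split(" ", 1)[1]
    index_field = comment.split(":")[3]
    index1, _, index2 = index_field.partition("+")
    return index1, index2


def filter_rows_by_outer_barcodes(rows, index1, index2):
    """Keep only samplesheet rows whose outer barcodes match this FASTQ pair."""
    return [
        row for row in rows
        if row["barcode_1"] == index1 and row["barcode_2"] == index2
    ]


# Made-up header and rows, for illustration only.
header = "@A00123:45:TESTFC01:1:1101:1000:2000 1:N:0:ACGTACGT+TGCATGCA"
rows = [
    {"sample": "TestSample1", "barcode_1": "ACGTACGT",
     "barcode_2": "TGCATGCA", "barcode_3": "AAAAAAAA"},
    {"sample": "Other", "barcode_1": "GGCCGGCC",
     "barcode_2": "AATTAATT", "barcode_3": "CCCCCCCC"},
]

index1, index2 = outer_barcodes_from_header(header)
matching = filter_rows_by_outer_barcodes(rows, index1, index2)
print(index1, index2, [row["sample"] for row in matching])
```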

#### Processing for 2-barcode samples (barcode_3 empty):
1. Skip splitcode demultiplexing entirely
2. Perform direct FASTQ → BAM conversion
3. Output exactly one BAM file (the pool itself)
4. Generate metrics
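
One way to choose between the two processing paths, once the samplesheet has been filtered to the rows matching a FASTQ pair, is sketched below. The helper name and the decision to raise on mixed or ambiguous rows are assumptions for illustration; the issue does not specify how such cases are handled.

```python
def choose_demux_path(rows):
    """Return "splitcode" or "direct" for the rows matching one FASTQ pair."""
    if not rows:
        raise ValueError("no samplesheet rows match this FASTQ pair")
    has_inline = [bool((row.get("barcode_3") or "").strip()) for row in rows]
    if all(has_inline):
        # Every matching sample carries an inline barcode: run splitcode.
        return "splitcode"
    if len(rows) == 1:
        # A single row with an empty barcode_3: the pool is the sample.
        return "direct"
    # Assumption: mixed or repeated 2-barcode rows are treated as an error.
    raise ValueError("ambiguous mix of 2-barcode and 3-barcode rows")


print(choose_demux_path([{"sample": "TestSampleNoSplitcode", "barcode_3": ""}]))
print(choose_demux_path([{"sample": "TestSample1", "barcode_3": "AAAAAAAA"}]))
```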

#### Outputs
1. **Per-sample unaligned BAMs**: One BAM per resolved sample
2. **demux_metrics.json**: Read counts per sample, unmatched reads, etc.
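
For a sense of what `demux_metrics.json` could contain, here is a small sketch that summarizes per-sample read counts. The key names (`total_reads`, `unmatched_reads`, `fraction_matched`, `samples`) and the helper are hypothetical; the counts mirror the TestPool1 test data described later in this issue.

```python
import json


def build_demux_metrics(read_counts, unmatched_reads):
    """Summarize per-sample read counts (key names are assumptions)."""
    total = sum(read_counts.values()) + unmatched_reads
    return {
        "total_reads": total,
        "unmatched_reads": unmatched_reads,
        "fraction_matched": (total - unmatched_reads) / total if total else 0.0,
        "samples": {
            name: {"reads": count} for name, count in sorted(read_counts.items())
        },
    }


metrics = build_demux_metrics(
    {"TestSample1": 100, "TestSample2": 75, "TestSample3": 50, "TestSampleEmpty": 0},
    unmatched_reads=25,
)
print(json.dumps(metrics, indent=2))
```

Samples with zero reads are kept in the output so that an empty sample is distinguishable from a sample missing from the samplesheet.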

**Note**: Does **NOT** output:
- ~~barcodes_common.txt~~ (removed from spec - use `illumina_demux` for comprehensive barcode reporting)
- ~~barcodes_outliers.txt~~ (removed from spec - use `illumina_demux` for comprehensive barcode reporting)
- run_info.json, meta_by_sample.json, meta_by_filename.json (use `illumina_metadata` instead)

**Status**: ✅ **COMPLETE** - All 9 tests passing

---

### Typical Workflow

```bash
# Step 1: Generate metadata once per run
illumina_metadata \
  --runinfo RunInfo.xml \
  --samplesheet SampleSheet.csv \
  --lane 1 \
  --out_runinfo run_info.json \
  --out_meta_by_sample meta_by_sample.json \
  --out_meta_by_filename meta_by_filename.json

# Step 2: Run demux in parallel for each pool
for pool in Pool1 Pool2 Pool3 Pool4; do
  splitcode_demux_fastqs \
    --inFastq1 "${pool}_R1.fastq.gz" \
    --inFastq2 "${pool}_R2.fastq.gz" \
    --sampleSheet samples_3bc.tsv \
    --outDir "demux_out/${pool}" &
done
wait
```
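
The same fan-out can be driven from Python when per-job failures need to be surfaced; a bare `wait` in bash returns zero even if a background job failed. The sketch below reuses the flags from the shell example above; the helper names `build_demux_command` and `run_all`, and the injectable `runner` argument, are assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_demux_command(pool, samplesheet="samples_3bc.tsv", out_root="demux_out"):
    """Build the argument list for one pool, mirroring the shell example."""
    return [
        "splitcode_demux_fastqs",
        "--inFastq1", f"{pool}_R1.fastq.gz",
        "--inFastq2", f"{pool}_R2.fastq.gz",
        "--sampleSheet", samplesheet,
        "--outDir", f"{out_root}/{pool}",
    ]


def run_all(pools, max_workers=4, runner=subprocess.run):
    """Run one demux job per pool in parallel and return the results."""
    commands = [build_demux_command(pool) for pool in pools]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # check=True makes a non-zero exit status raise CalledProcessError.
        return list(executor.map(lambda cmd: runner(cmd, check=True), commands))


# Show the command for one pool without executing anything.
print(" ".join(build_demux_command("Pool1")))
```

Calling `run_all(["Pool1", "Pool2", "Pool3", "Pool4"])` would launch the four jobs; threads are sufficient here because each worker only waits on an external process.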

---

### Refactoring Benefits

1. **Eliminates duplication**: Before this refactor, `illumina_demux` and `splitcode_demux` each carried their own identical copy of the run_info.json generation code
2. **Uses existing code**: Leverages already-implemented `build_run_info_json()` utility
3. **Enables parallelization**: Metadata generated once, then many demux jobs run simultaneously
4. **Simplifies interface**: `splitcode_demux_fastqs` has fewer required inputs
5. **Clear separation of concerns**: Metadata extraction vs read processing

---

### Implementation Status

#### ✅ Phase 1: Shared Utilities - COMPLETE
- `parse_illumina_fastq_filename()` - 15 tests passing
- `build_run_info_json()` - 5 tests passing
- `normalize_barcode()` - 11 tests passing
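
For readers unfamiliar with the naming convention, here is a minimal sketch of filename parsing that assumes the standard Illumina pattern `<sample>_S<n>_L<lane>_<read>_001.fastq.gz`. It is not the actual `parse_illumina_fastq_filename()` implementation, which may accept other layouts and return different fields; `parse_fastq_name` and its return keys are hypothetical.

```python
import re

# Standard Illumina FASTQ naming, e.g. "TestPool1_S1_L001_R1_001.fastq.gz".
_FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq(?:\.gz)?$"
)


def parse_fastq_name(filename):
    """Split an Illumina-style FASTQ filename into its components."""
    match = _FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an Illumina-style FASTQ filename: {filename!r}")
    info = match.groupdict()
    info["sample_number"] = int(info["sample_number"])
    info["lane"] = int(info["lane"])
    return info


print(parse_fastq_name("TestPool1_S1_L001_R1_001.fastq.gz"))
```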

#### ✅ Phase 2: Test Infrastructure - COMPLETE
- `TestIlluminaMetadata` test class created - 7 tests
- `TestSplitcodeDemuxFastqs` test class created - 9 tests
- Test data files created (RunInfo.xml, SampleSheet.csv, FASTQs)

#### ✅ Phase 3: Implementation - COMPLETE
- `illumina_metadata()` implemented - **7/7 tests passing**
- `splitcode_demux_fastqs()` implemented - **9/9 tests passing**
- Refactored `illumina_demux` to use `build_run_info_json()`
- Refactored `splitcode_demux` to use `build_run_info_json()`

#### ⬜ Phase 4: Documentation & Validation - TODO
- Update command-line documentation
- Final validation with CI
- Code review

---

### Test Data

**For `illumina_metadata`**:
- Synthetic RunInfo.xml with flowcell TESTFC01
- Synthetic SampleSheet.csv in DRAGEN format (3 pools)
- Validates output JSON schemas match existing demux outputs

**For `splitcode_demux_fastqs`**:
- **TestPool1** (3-barcode sample):
  - 100 reads: AAAAAAAA (TestSample1)
  - 75 reads: CCCCCCCC (TestSample2)
  - 50 reads: GGGGTTTT (TestSample3)
  - 0 reads: TTTTGGGG (TestSampleEmpty)
  - 25 reads: Outlier barcodes (GGAATTTT, CCCCAAAA, ATATAGAG)
  
- **TestPool3** (2-barcode sample):
  - 80 reads: No inline barcode (TestSampleNoSplitcode)
  - Tests bypass of splitcode for 2-barcode samples
