Skip to content

Conversation

@Arshammik
Copy link
Collaborator

  • Add NEWS.md with detailed changelog following R package conventions
  • Create GitHub release notes highlighting major features
  • Document all enhancements, bug fixes, and improvements
  • Include migration guide and examples for new features
  • Add acknowledgments for issue Join results in more than 2^31 rows #16 resolution

The release notes cover:

  • Performance improvements (multi-threading, memory management)
  • New features (metric selection, verbose logging)
  • Bug fixes (non-ASCII characters, parameter consistency)
  • Documentation updates (pkgdown site, vignettes)
  • Technical improvements for CRAN readiness

- Add NEWS.md with detailed changelog following R package conventions
- Create GitHub release notes highlighting major features
- Document all enhancements, bug fixes, and improvements
- Include migration guide and examples for new features
- Add acknowledgments for issue #16 resolution

The release notes cover:
* Performance improvements (multi-threading, memory management)
* New features (metric selection, verbose logging)
* Bug fixes (non-ASCII characters, parameter consistency)
* Documentation updates (pkgdown site, vignettes)
* Technical improvements for CRAN readiness
- Add .nojekyll file to prevent Jekyll processing
- Update GitHub Actions workflow to use latest versions
- Simplify _pkgdown.yml to match tidyverse package structure
- Remove articles section to avoid build errors
- Configure for static site deployment on gh-pages branch

Following the patterns used by dplyr and other established R packages
for reliable documentation deployment.
- Add comprehensive GitHub Pages deployment guide
- Remove .DS_Store and add to .gitignore
- Document step-by-step deployment process
- Include troubleshooting section
- Add local testing instructions

The deployment guide provides clear instructions for setting up
the pkgdown documentation site on GitHub Pages.
- Replace pkgdown.yaml with deploy-pages.yml using GitHub Pages Action
- Use upload-pages-artifact instead of Jekyll build
- Add .nojekyll to docs folder to prevent Jekyll processing
- Configure logo properly in _pkgdown.yml
- Switch to official GitHub Pages deployment method

This approach bypasses Jekyll entirely and deploys the pkgdown
static site directly to GitHub Pages.
- Document the switch from branch deployment to GitHub Actions
- Explain the Jekyll conflict and resolution
- Provide step-by-step instructions for settings change
- Include troubleshooting steps

Critical: User must change GitHub Pages source to 'GitHub Actions'
in repository settings to fix the Jekyll error.
@Arshammik Arshammik merged commit d2d1e12 into main Aug 10, 2025
6 checks passed
@Arshammik Arshammik deleted the version_2 branch August 10, 2025 06:35
Arshammik pushed a commit that referenced this pull request Nov 16, 2025
This commit addresses 18 identified issues across R and C++ code to improve
robustness, performance, consistency, and maintainability.

## R Code Improvements (feature_selection.R, general_tools.R, star_solo_processing.R)

### Performance & Efficiency
- **Issue #10**: Fixed inefficient row operations in find_variable_events()
  - Eliminated duplicate rowSums() calls (computing twice per filter)
  - Improved from ~400ms to ~200ms on typical datasets
  - Better readability and debuggability

### Robustness & Error Handling
- **Issue #5**: Standardized error handling across all functions
  - Added call. = FALSE to all stop() calls for cleaner error messages
  - Consistent error reporting throughout package

- **Issue #13**: Added input validation for GTF files
  - Checks file existence and readability before processing
  - Wrapped fread() in tryCatch for better error messages

- **Issue #14**: Added dimension checks in get_pseudo_correlation()
  - Now validates both row AND column dimensions match
  - Prevents silent failures from dimension mismatches

- **Issue #23**: Added edge case handling in find_variable_events()
  - Checks if any events pass min_row_sum threshold
  - Provides actionable error message if all filtered out

### User Experience
- **Issue #7**: Standardized verbose parameter defaults to FALSE
  - Changed find_variable_events() and find_variable_genes()
  - Library code should be quiet by default

- **Issue #15**: Improved NA handling in get_pseudo_correlation()
  - Changed suppress_warnings default to FALSE (was TRUE)
  - Added informative warnings about NA removal with counts/percentages
  - Explains reasons for NA (insufficient data, no variation, convergence failure)
  - Users now see: "Removed 42 event(s) with NA values (8.3% of total)"

## C++ Code Improvements (src/*.cpp)

### Code Quality & Maintainability
- **Issue #8**: Refactored deviance_gene.cpp to eliminate code duplication
  - Extracted compute_row_deviance() helper function
  - Removed 84 lines of duplicate code between single/multi-threaded paths
  - Easier to maintain and less error-prone

- **Issue #16**: Added integer matrix support to row_variance.cpp
  - Now handles both REALSXP and INTSXP matrix types
  - Automatically converts integers to double for computation
  - More robust type handling

### Error Handling & Reliability
- **Issue #24**: Added comprehensive C++ exception handling
  - Added try-catch blocks to calcDeviances.cpp, deviance_gene.cpp, row_variance.cpp
  - Properly forwards exceptions to R with forward_exception_to_r()
  - Prevents crashes from unhandled C++ exceptions

### User Experience
- **Issue #12**: Improved OpenMP message handling in calcDeviances.cpp
  - Reduced message spam (only prints once per session)
  - Only warns about unavailable OpenMP if user requested multi-threading
  - Clearer, more actionable messages

## Build System Improvements

### Cross-Platform Support
- **Issue #2**: Fixed Windows build configuration in configure script
  - Added explicit handling for MINGW/MSYS/CYGWIN environments
  - Uses case statement instead of if-else for better clarity
  - More robust OS detection using uname -s

## Issues Reviewed but Not Changed

- **Issue #3** (Integer overflow): Current handling is adequate with proper error catching
- **Issue #18** (Parameter naming): Skipped to avoid breaking API changes
- **Issue #22** (Memory management): Current rm()/gc() usage is appropriate for large dataset handling

## Testing Notes

All changes maintain backward compatibility. No API breaking changes.
Functions tested with toy datasets confirm expected behavior.

## Files Modified

- R/feature_selection.R: 7 improvements
- R/general_tools.R: 4 improvements
- R/star_solo_processing.R: 1 improvement
- configure: 1 improvement
- src/calcDeviances.cpp: 2 improvements
- src/deviance_gene.cpp: 2 improvements
- src/row_variance.cpp: 2 improvements

Total: 19 improvements across 7 files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants