
test: add 60 unit tests for automl, codegen, utils, and VW modules #2498

Open
BrendanWalsh wants to merge 3 commits into master from brwals/tests-round2

@BrendanWalsh commented Feb 13, 2026

Related Issues/PRs

Follow-up to #2497 — continued coverage improvement.

What changes were proposed in this pull request?

Adds 16 new test files with 118 unit tests covering previously untested source files across the core, VW, and LightGBM modules:

| Area | Files Tested | Tests |
|------|--------------|-------|
| AutoML | HyperparamBuilder, DefaultHyperparams, EvaluationUtils, ParamSpace | 40 |
| Codegen | GenerationUtils, CodegenConfig, DefaultParamInfo | 25 |
| Utils | ModelEquality, JarLoadingUtils, OsUtils | 8 |
| HTTP | RESTHelpers (retry logic) | 5 |
| Stages | Trie, Cacher, UnicodeNormalize, TextPreprocessor | 22 |
| VW | VectorUtils | 9 |
| LightGBM | DatasetUtils (countCardinality, getArrayType, validateGroupColumn) | 11 |
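For context on the Codegen row, the kind of behavior the GenerationUtils tests pin down for `camelToSnake` can be sketched as follows. This is a hypothetical reimplementation (`NamingSketch` is not a SynapseML class), assuming the helper inserts an underscore at each lowercase-or-digit to uppercase boundary and then lowercases the result; the real function may handle acronyms differently.

```scala
import java.util.Locale

object NamingSketch {
  // Hypothetical: matches a lowercase letter or digit followed by an uppercase letter.
  private val Boundary = "([a-z0-9])([A-Z])".r

  /** Insert "_" at each lower/digit-to-upper boundary, then lowercase everything. */
  def camelToSnake(name: String): String =
    Boundary.replaceAllIn(name, "$1_$2").toLowerCase(Locale.ROOT)
}
```

Under these assumptions, `NamingSketch.camelToSnake("numIterations")` yields `num_iterations`, and a name with no uppercase letters passes through unchanged.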

How was this patch tested?

All 118 tests compile and pass locally. The suite mixes pure unit tests (no SparkSession) with lightweight Spark tests that use small in-memory DataFrames. No external service dependencies.
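As an illustration of the pure-unit-test style (no SparkSession), a check like the ModelEquality `jaccardSimilarity` test reduces to plain assertions on a function such as this minimal sketch. `SimilaritySketch` is hypothetical and uses the textbook definition |A ∩ B| / |A ∪ B|; treating two empty sets as identical (similarity 1.0) is an assumption, not documented SynapseML behavior.

```scala
object SimilaritySketch {
  /** Jaccard similarity |A ∩ B| / |A ∪ B|.
    * Assumption: two empty sets are considered identical (returns 1.0). */
  def jaccardSimilarity[T](a: Set[T], b: Set[T]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else a.intersect(b).size.toDouble / a.union(b).size
}
```

For example, `Set(1, 2, 3)` and `Set(2, 3, 4)` share two of four distinct elements, giving 0.5.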

Does this PR introduce any user-facing change?

No — test-only change.

New Dependencies

None.

Add 7 new test files with 60 tests covering previously untested source files:

- AutoML: HyperparamBuilder (IntRange, DoubleRange, FloatRange, LongRange,
  DiscreteHyperParam, HyperParamUtils), DefaultHyperparams (all 6 classifier
  defaults), EvaluationUtils (metric-operator mapping), ParamSpace (GridSpace,
  RandomSpace, Dist)
- Codegen: GenerationUtils (indent, camelToSnake)
- Utils: ModelEquality (jaccardSimilarity)
- VW: VectorUtils (sortAndDistinct with collision handling)

All tests are pure unit tests with no external service dependencies.
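The VW item above mentions `sortAndDistinct` with collision handling. A minimal sketch of that idea, assuming parallel index/value arrays from hashed features where colliding indices are merged by summing their values, follows. `VectorSketch` is hypothetical; the real VectorUtils may resolve collisions differently (for example, keeping one value or exposing a flag).

```scala
object VectorSketch {
  /** Hypothetical: sort parallel (index, value) arrays by index and merge
    * duplicate indices (hash collisions) by summing their values. */
  def sortAndDistinct(indices: Array[Int], values: Array[Double]): (Array[Int], Array[Double]) = {
    require(indices.length == values.length, "indices and values must have the same length")
    val merged = indices.zip(values)
      .groupBy(_._1)                                            // collect entries per index
      .map { case (idx, pairs) => (idx, pairs.map(_._2).sum) } // sum colliding values
      .toArray
      .sortBy(_._1)                                             // ascending index order
    (merged.map(_._1), merged.map(_._2))
  }
}
```

With indices `(5, 2, 5, 0)` and values `(1.0, 2.0, 3.0, 4.0)`, the two entries at index 5 collide and are summed, producing indices `(0, 2, 5)` and values `(4.0, 2.0, 4.0)`.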
@github-actions

Hey @BrendanWalsh 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

Add 8 new test files:

- Codegen: CodegenConfig (directory derivation, constants),
  DefaultParamInfo (param type mapping for all 16+ param types)
- HTTP: RESTHelpers retry logic (success, retry-then-succeed, exhaustion)
- Stages: Trie data structure (put/get/mapText/overlapping keys),
  Cacher (cache/disable), UnicodeNormalize (NFKD/case),
  TextPreprocessor (substring replacement via Trie)
- LightGBM: DatasetUtils (countCardinality, getArrayType,
  validateGroupColumn)
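The RESTHelpers bullet lists three retry cases: immediate success, retry-then-succeed, and exhaustion. A minimal sketch of that control flow, assuming retries are driven by a list of backoff delays in milliseconds and the last failure is rethrown once the list is exhausted, is shown here. `RetrySketch.retry` is hypothetical and is not the RESTHelpers API.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetrySketch {
  /** Hypothetical: run `f`; on a non-fatal failure, sleep for the next backoff
    * and try again. When no backoffs remain, rethrow the last exception. */
  @tailrec
  def retry[T](backoffsMs: List[Long])(f: => T): T =
    Try(f) match {
      case Success(v) => v
      case Failure(_) if backoffsMs.nonEmpty =>
        Thread.sleep(backoffsMs.head)
        retry(backoffsMs.tail)(f)
      case Failure(e) => throw e
    }
}
```

With N backoffs, `f` runs at most N + 1 times. Tests can pass zero-length delays such as `List(0L, 0L)` so the retry path is exercised without slowing the suite.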

Pure unit tests run without SparkSession. Spark-dependent tests
(Cacher, UnicodeNormalize, TextPreprocessor) use small in-memory
DataFrames.
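The Trie tests (put/get/mapText with overlapping keys) exercise the structure that TextPreprocessor uses for substring replacement. A minimal character-trie sketch follows, assuming `mapText` scans left to right and replaces the longest matching key at each position. `TrieSketch` and its signatures are hypothetical and may differ from the SynapseML class.

```scala
import scala.collection.mutable

/** Hypothetical character trie mapping keys to replacement strings. */
final class TrieSketch {
  private final class Node {
    val children = mutable.Map.empty[Char, Node]
    var value: Option[String] = None
  }
  private val root = new Node

  /** Associate `key` with replacement `value`; returns this trie for chaining. */
  def put(key: String, value: String): TrieSketch = {
    var node = root
    key.foreach { c => node = node.children.getOrElseUpdate(c, new Node) }
    node.value = Some(value)
    this
  }

  /** Exact-key lookup; prefixes that are not themselves keys return None. */
  def get(key: String): Option[String] = {
    var node: Option[Node] = Some(root)
    key.foreach(c => node = node.flatMap(_.children.get(c)))
    node.flatMap(_.value)
  }

  /** Replace the longest matching key at each position, scanning left to right. */
  def mapText(text: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < text.length) {
      var node = root
      var j = i
      var best: Option[(Int, String)] = None // (end index, replacement)
      while (j < text.length && node.children.contains(text(j))) {
        node = node.children(text(j))
        j += 1
        node.value.foreach(v => best = Some((j, v)))
      }
      best match {
        case Some((end, v)) => out.append(v); i = end
        case None           => out.append(text(i)); i += 1
      }
    }
    out.toString
  }
}
```

With overlapping keys "New" and "New York", longest-match replacement maps "New York is New" to "NY is N". This is the design choice the overlapping-key test case would guard, under the stated assumption.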
@BrendanWalsh requested a review from svotaw as a code owner February 13, 2026 22:41