Automatically compare two implementations of the same problem with property-based testing and performance benchmarks.
AB is an Elixir library that makes it effortless to verify that two implementations of the same function behave identically, while also comparing their performance characteristics. Perfect for refactoring, algorithm optimization, and A/B testing different approaches.
When you have two implementations of the same function:
- Refactoring - Ensure your optimized version produces identical results
- Algorithm comparison - Compare different algorithms solving the same problem
- Migration - Verify new code matches legacy behavior exactly
- Learning - Understand tradeoffs between different approaches
AB automatically generates property tests from your typespecs and runs comprehensive comparisons.
✅ Automatic property test generation from function typespecs
✅ Side-by-side comparison of two implementations
✅ Performance benchmarking with detailed statistics
✅ Invalid input testing to verify error handling
✅ Type consistency validation between specs and implementations
✅ Mix task for testing standalone Elixir files
✅ Zero boilerplate - just add macros to your tests
Add ab to your mix.exs dependencies:
```elixir
def deps do
  [
    {:ab, "~> 0.1.0"}
  ]
end
```

Define two implementations of the same function:

```elixir
defmodule Math do
  # Implementation A: iterative
  @spec factorial_iterative(non_neg_integer()) :: pos_integer()
  def factorial_iterative(n), do: factorial_iter(n, 1)

  defp factorial_iter(0, acc), do: acc
  defp factorial_iter(n, acc), do: factorial_iter(n - 1, n * acc)

  # Implementation B: recursive
  @spec factorial_recursive(non_neg_integer()) :: pos_integer()
  def factorial_recursive(0), do: 1
  def factorial_recursive(n), do: n * factorial_recursive(n - 1)
end
```

Then compare them in a test module:

```elixir
defmodule MathTest do
  use ExUnit.Case
  use ExUnitProperties
  import AB

  # Automatically test both implementations produce identical results
  compare_test {Math, :factorial_iterative}, {Math, :factorial_recursive}

  # Benchmark performance differences
  benchmark_test {Math, :factorial_iterative}, {Math, :factorial_recursive}

  # Test each implementation matches its typespec
  property_test Math, :factorial_iterative
  property_test Math, :factorial_recursive
end
```

That's it! AB will:
- Generate random test data matching your typespec
- Verify both functions produce identical outputs
- Compare performance with detailed statistics
- Validate outputs match the declared return type
Generates property tests checking that two implementations produce identical results:

```elixir
# Basic comparison
compare_test {ModuleA, :function}, {ModuleB, :function}

# With verbose logging
compare_test {ModuleA, :function}, {ModuleB, :function}, verbose: true
```

The macro will:
- Extract and compare typespecs (must be identical)
- Generate test data matching the input types
- Run both functions on the same inputs
- Assert outputs are identical
- Validate outputs match the return type
Example output:
```
property factorial_iterative and factorial_recursive produce identical results
✓ 100 successful comparison runs
✓ factorial_iterative and factorial_recursive produce identical results (1.2ms)
```
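The core of a comparison test can be sketched in plain Elixir: draw random inputs, apply both functions, and stop at the first input where the results differ. This is an illustration only, not how AB is implemented (AB derives its inputs from the typespec using StreamData). `CompareSketch`, the deliberately broken `factorial_buggy/1`, and the fixed input range `0..30` are hypothetical choices made for the example.

```elixir
defmodule Math do
  # The two implementations from the quick-start example
  def factorial_iterative(n), do: factorial_iter(n, 1)
  defp factorial_iter(0, acc), do: acc
  defp factorial_iter(n, acc), do: factorial_iter(n - 1, n * acc)

  def factorial_recursive(0), do: 1
  def factorial_recursive(n), do: n * factorial_recursive(n - 1)

  # Hypothetical buggy variant: the base case returns 0 instead of 1,
  # so every result collapses to 0
  def factorial_buggy(0), do: 0
  def factorial_buggy(n), do: n * factorial_buggy(n - 1)
end

defmodule CompareSketch do
  # Hypothetical helper, not AB's API: apply both functions to `runs`
  # random integers drawn from 0..max_n and stop at the first input on
  # which the results differ.
  def compare(fun_a, fun_b, runs \\ 100, max_n \\ 30) do
    Enum.reduce_while(1..runs, {:ok, runs}, fn _, acc ->
      n = :rand.uniform(max_n + 1) - 1
      a = fun_a.(n)
      b = fun_b.(n)

      # === is strict: 1 and 1.0 would count as different results
      if a === b, do: {:cont, acc}, else: {:halt, {:mismatch, n, a, b}}
    end)
  end
end

CompareSketch.compare(&Math.factorial_iterative/1, &Math.factorial_recursive/1)
# => {:ok, 100}
```

Comparing `factorial_iterative/1` against `factorial_buggy/1` instead returns `{:mismatch, n, a, 0}`, where `n` is whichever random input was drawn first. That failing input is what a comparison test needs to report.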
Generates benchmarks comparing two implementations:

```elixir
# Basic benchmark
benchmark_test {ModuleA, :function}, {ModuleB, :function}

# Custom timing
benchmark_test {ModuleA, :function}, {ModuleB, :function},
  time: 5,        # 5 seconds of benchmarking
  memory_time: 2  # 2 seconds of memory profiling
```

Example output:
```
=== Benchmarking Math.factorial_iterative vs Math.factorial_recursive ===

Name                            ips    average  deviation   median   99th %
Math.factorial_iterative     1.23 M    0.81 μs   ±612.45%  0.75 μs  1.12 μs
Math.factorial_recursive     0.98 M    1.02 μs   ±587.32%  0.96 μs  1.35 μs

Comparison:
Math.factorial_iterative     1.23 M
Math.factorial_recursive     0.98 M - 1.26x slower +0.21 μs
```
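The comparison line follows from the table: 1.23 M / 0.98 M ≈ 1.26x slower, and 1.02 μs - 0.81 μs = 0.21 μs extra per call. Benchee handles warmup, repeated sampling, and the statistics. The rough sketch below only illustrates the underlying idea with Erlang's `:timer.tc/1` and is no substitute for a real benchmark. `TimingSketch` and its default run count are hypothetical.

```elixir
defmodule TimingSketch do
  # Hypothetical helper: average wall-clock time in microseconds of
  # calling `fun` with `arg`, over `runs` calls. There is no warmup and
  # no outlier handling, so results are much noisier than Benchee's.
  def average_us(fun, arg, runs \\ 10_000) do
    {total_us, _result} =
      :timer.tc(fn ->
        Enum.each(1..runs, fn _ -> fun.(arg) end)
      end)

    total_us / runs
  end

  # The "Nx slower" figure: how many times longer the slower function
  # takes per call (equivalently, ips_fast / ips_slow).
  def slowdown(avg_fast_us, avg_slow_us), do: avg_slow_us / avg_fast_us
end
```

For example, `TimingSketch.slowdown(0.81, 1.02)` returns roughly 1.26, which matches the ratio in the output above.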
Automatically generates property tests from function typespecs:

```elixir
# Basic property test
property_test MyModule, :my_function

# With verbose logging
property_test MyModule, :my_function, verbose: true
```

The macro will:
- Parse the function's `@spec` declaration
- Generate appropriate test data for all input types
- Call the function with generated inputs
- Validate outputs match the declared return type
- Test type consistency between `@type` and `@spec`
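Validating an output against a declared return type means checking the shape of the value, not only that the call succeeded. The sketch below hand-codes that check for a few types. The tuple-based type representation (`:pos_integer`, `{:list, t}`, `{:map, required_keys}`) and the `ReturnTypeSketch` module are hypothetical and are not AB's internal format. The map case mirrors AB's documented behavior: only required keys must be present, and extra keys are ignored.

```elixir
defmodule ReturnTypeSketch do
  # Hypothetical type representation, loosely modelled on typespecs:
  #   :integer, :pos_integer, :non_neg_integer, :binary, :boolean,
  #   {:list, elem_type}, {:map, [required_key]}
  def matches?(v, :integer), do: is_integer(v)
  def matches?(v, :pos_integer), do: is_integer(v) and v > 0
  def matches?(v, :non_neg_integer), do: is_integer(v) and v >= 0
  def matches?(v, :binary), do: is_binary(v)
  def matches?(v, :boolean), do: is_boolean(v)

  # Every element must match the element type
  def matches?(v, {:list, elem_type}) when is_list(v),
    do: Enum.all?(v, &matches?(&1, elem_type))

  # Only the required keys must be present; extra keys are ignored
  def matches?(v, {:map, required_keys}) when is_map(v),
    do: Enum.all?(required_keys, &Map.has_key?(v, &1))

  # Anything else (wrong container, unknown type) fails
  def matches?(_v, _type), do: false
end
```

With this check, a `factorial` that returned `0` would be caught, because `ReturnTypeSketch.matches?(0, :pos_integer)` is `false`, while `120` passes.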
Supported types:
- Basic: `integer()`, `float()`, `number()`, `boolean()`, `atom()`, `binary()`, `bitstring()`, `String.t()`, `charlist()`, `nil`, `iodata()`, `no_return()`
- Collections: `list(type)`, `tuple()` and tuple types such as `{type1, type2}`, `map()`, `keyword()`, `keyword(type)`
- Maps: `%{key => value}`, `%{required(:key) => type}`, `%{optional(:key) => type}` (optional fields don't cause validation failures)
- Functions: `(arg_type -> return_type)`, `(arg1, arg2 -> return)`, `(-> return)` for callbacks and higher-order functions
- Ranges: `0..100`, `pos_integer()`, `non_neg_integer()`, `neg_integer()`
- Structs: custom struct types with `@type t :: %__MODULE__{...}`
- Union types: `integer() | String.t()`
- Literals: specific atom or integer values (e.g., `:ok`, `42`)
- Generic: `any()`, `term()`
- Complex: nested structures, remote types

Validated against real-world code: AB successfully parses all typespecs in libraries such as Jason.
Important notes:
- Maps: Optional fields and extra keys are properly handled - only required fields must be present.
- Functions: Generated functions have correct arity and return correct types, but are "constant functions" that ignore their arguments. This still validates that tested functions accept and call function arguments correctly, but doesn't verify the lambda's internal logic.
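The note about generated functions can be illustrated with a hand-written generator. For a declared type such as `(integer() -> boolean())`, a constant function of the right arity is enough to check that the code under test accepts and calls it, even though the function ignores its arguments. `ConstFunSketch.constant_fun/2` is a hypothetical helper covering arities 0 to 3, not AB's API.

```elixir
defmodule ConstFunSketch do
  # Hypothetical helper: a function of the requested arity that ignores
  # its arguments and always returns `value` (a value of the declared
  # return type, e.g. a boolean for `(integer() -> boolean())`).
  def constant_fun(0, value), do: fn -> value end
  def constant_fun(1, value), do: fn _ -> value end
  def constant_fun(2, value), do: fn _, _ -> value end
  def constant_fun(3, value), do: fn _, _, _ -> value end
end

# Stand-in for a `(integer() -> boolean())` argument to Enum.filter/2
pred = ConstFunSketch.constant_fun(1, true)
Enum.filter([1, 2, 3], pred)
# => [1, 2, 3]
```

This exercises the fact that `Enum.filter/2` calls its predicate with one argument and uses the boolean it returns. It says nothing about how a real predicate would behave, which is exactly the limitation described above.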
Tests that functions properly reject invalid inputs:

```elixir
# Test invalid input handling
robust_test MyModule, :my_function

# With verbose logging
robust_test MyModule, :my_function, verbose: true
```

This generates inputs that don't match the typespec and verifies the function either:
- Raises an appropriate exception
- Has guards that prevent type mismatches
Great for ensuring functions fail gracefully rather than producing garbage output.
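The pass/fail criterion can be sketched as follows: call the function with a value outside its spec, and count it as properly rejected only if an exception is raised. `RejectSketch.rejects?/2` and the `Guarded` and `Sloppy` modules are hypothetical illustrations, not AB's implementation.

```elixir
defmodule RejectSketch do
  # Hypothetical helper: true if calling `fun` with `input` raises,
  # false if it returns a value. Only raised exceptions are caught;
  # throws and exits are not handled here.
  def rejects?(fun, input) do
    try do
      fun.(input)
      false
    rescue
      _ -> true
    end
  end
end

defmodule Guarded do
  # The guard turns a type mismatch into a FunctionClauseError
  @spec double(integer()) :: integer()
  def double(n) when is_integer(n), do: n * 2
end

defmodule Sloppy do
  # No guard: a string input "succeeds" and returns garbage
  @spec wrap(integer()) :: [integer()]
  def wrap(n), do: [n]
end
```

Here `RejectSketch.rejects?(&Guarded.double/1, "oops")` and `RejectSketch.rejects?(&Enum.sum/1, :not_a_list)` both return `true`, while `RejectSketch.rejects?(&Sloppy.wrap/1, "oops")` returns `false` because `Sloppy.wrap/1` silently accepts the bad input.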
A complete example that uses all four macros:

```elixir
defmodule Sum do
  # Implementation A: Enum.sum
  @spec sum_builtin([integer()]) :: integer()
  def sum_builtin(list), do: Enum.sum(list)

  # Implementation B: manual recursion
  @spec sum_recursive([integer()]) :: integer()
  def sum_recursive([]), do: 0
  def sum_recursive([head | tail]), do: head + sum_recursive(tail)
end

defmodule SumTest do
  use ExUnit.Case
  use ExUnitProperties
  import AB

  describe "Sum implementations" do
    # Verify both produce identical results
    compare_test {Sum, :sum_builtin}, {Sum, :sum_recursive}

    # Compare performance
    benchmark_test {Sum, :sum_builtin}, {Sum, :sum_recursive}

    # Validate each against typespec
    property_test Sum, :sum_builtin
    property_test Sum, :sum_recursive

    # Test error handling
    robust_test Sum, :sum_builtin
    robust_test Sum, :sum_recursive
  end
end
```

Output:
```
SumTest
Sum implementations
property sum_builtin and sum_recursive produce identical results
✓ 100 successful comparison runs
✓ sum_builtin and sum_recursive produce identical results (1.8ms)
property sum_builtin satisfies its typespec
✓ 100 successful property test runs
✓ sum_builtin satisfies its typespec (2.1ms)
✓ sum_builtin type consistency validation (0.1ms)
property sum_recursive satisfies its typespec
✓ 100 successful property test runs
✓ sum_recursive satisfies its typespec (2.4ms)
✓ sum_recursive type consistency validation (0.1ms)
property sum_builtin properly rejects invalid input
✓ 100 successful invalid input test runs
✓ sum_builtin properly rejects invalid input (124.3ms)
property sum_recursive properly rejects invalid input
✓ 100 successful invalid input test runs
✓ sum_recursive properly rejects invalid input (127.8ms)
test benchmark sum_builtin vs sum_recursive
=== Benchmarking Sum.sum_builtin vs Sum.sum_recursive ===
Name                  ips    average  deviation
Sum.sum_builtin    1.45 M    0.69 μs   ±652.34%
Sum.sum_recursive  0.87 M    1.15 μs   ±723.12%
Comparison:
Sum.sum_builtin    1.45 M
Sum.sum_recursive  0.87 M - 1.67x slower +0.46 μs
✓ benchmark sum_builtin vs sum_recursive (7503.5ms)
Finished in 7.9 seconds
8 properties, 1 test, 0 failures
```
For manual testing and custom scenarios:
Extract typespec information:

```elixir
{:ok, {input_types, output_type}} =
  AB.get_function_spec(MyModule, :my_function)
```

Compare two type specifications:

```elixir
AB.types_equivalent?(type1, type2)
# => true | false
```

Get detailed type information from a value:

```elixir
AB.infer_result_type([1, 2, 3])
# => "list(integer())"

AB.infer_result_type(%{name: "Alice", age: 30})
# => "%{age: integer(), name: binary()}"

AB.infer_result_type({:ok, true})
# => "{atom(), boolean()}"

AB.infer_result_type([])
# => "list(term())"  # unknown element type

AB.infer_result_type([1, "a"])
# => "list(term())"  # inconsistent types
```

Typical use cases:

```elixir
# Compare old vs new implementation
compare_test {Parser, :parse_legacy}, {Parser, :parse_optimized}
benchmark_test {Parser, :parse_legacy}, {Parser, :parse_optimized}

# Test different search algorithms
compare_test {Search, :binary_search}, {Search, :interpolation_search}

# Compare JSON encoding libraries
compare_test {Encoder, :encode_with_jason}, {Encoder, :encode_with_poison}
```

You can test standalone Elixir files without setting up a full test suite using the `mix ab.test` task:

```shell
# Test a single file
mix ab.test path/to/file.ex

# Test with verbose output
mix ab.test path/to/file.ex --verbose
```

The task will:
- Compile the specified Elixir file
- Extract the module from the compiled code
- Find all exported functions with typespecs
- Run AB.property_test on each function
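The first three steps can be approximated with public Elixir APIs. `Code.compile_string/1` (or `Code.compile_file/1` for a path) returns the compiled `{module, bytecode}` pairs, and `module.__info__(:functions)` lists each module's exported functions. The sketch stops there. Filtering down to functions that actually declare a `@spec` requires reading the module's typespecs and is omitted. `FileSketch` is a hypothetical name, and this is not the task's real implementation.

```elixir
defmodule FileSketch do
  # Hypothetical helper: compile Elixir source and return, for each module
  # it defines, the sorted list of exported {name, arity} pairs.
  # Private (defp) functions are not exported, so they are not listed.
  def exported_functions(source) do
    for {module, _bytecode} <- Code.compile_string(source) do
      {module, Enum.sort(module.__info__(:functions))}
    end
  end
end

source = """
defmodule MyModule do
  @spec add(integer(), integer()) :: integer()
  def add(a, b), do: do_add(a, b)
  defp do_add(a, b), do: a + b
end
"""

FileSketch.exported_functions(source)
# => [{MyModule, [add: 2]}]
```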
Example:

```
$ mix ab.test lib/my_module.ex
Testing file: lib/my_module.ex
Running property tests for module: MyModule
Found 3 functions with typespecs
=== Running property tests for 3 functions ===
Testing MyModule.add
✓ 100 successful property test runs
Testing MyModule.multiply
✓ 100 successful property test runs
Testing MyModule.divide
✓ 100 successful property test runs
✓ All property tests completed
```

Dependencies:
- stream_data - Property-based testing and data generation
- benchee - Performance benchmarking
- ex_unit - Elixir's built-in test framework
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
MIT License - see LICENSE file for details
Built with ❤️ using:
- StreamData by Andrea Leopardi
- Benchee by Tobias Pfeiffer
- Inspired by QuickCheck and property-based testing
Start comparing your implementations today! 🚀