Uncertain[T] - Working with Probabilistic Data

Uncertain-tee is a Monte Carlo simulation-based, correlation-preserving library for working with uncertain data, represented as an Uncertain[T].

Uncertain[T] helps you work with data that isn't exact, like measurements with error, user behavior predictions, or any value that has uncertainty. Instead of just working with single values, you work with distributions of possible values.

val frontend = Uncertain.triangular(5, 10, 20)  // days
val backend = Uncertain.triangular(8, 15, 30)
val testing = Uncertain.triangular(3, 5, 10)

val total = frontend + backend + testing

val onTimeProb = (total < 30).probability(sampleCount = 100_000)
val percentiles = Quantiles.percentiles(total, sampleCount = 100_000)

println(s"50% chance: ${percentiles.percentile(50)} days")
println(s"90% chance: ${percentiles.percentile(90)} days")
println(s"Probability of finishing in 30 days: $onTimeProb")
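To make the Monte Carlo idea concrete, here is a minimal standard-library sketch of the same estimate. It is not the library's implementation: the `ProjectEstimateSketch` object and its helpers are hypothetical, and it assumes the triangular arguments are ordered (min, mode, max).

```scala
import scala.util.Random

object ProjectEstimateSketch {
  // Inverse-CDF sampling of a triangular distribution.
  // Assumption: arguments are ordered (min, mode, max).
  def triangular(min: Double, mode: Double, max: Double)(rng: Random): Double = {
    val u = rng.nextDouble()
    val cut = (mode - min) / (max - min)
    if (u < cut) min + math.sqrt(u * (max - min) * (mode - min))
    else max - math.sqrt((1 - u) * (max - min) * (max - mode))
  }

  // One simulated project: the sum of three independently drawn task durations.
  def sampleTotal(rng: Random): Double =
    triangular(5, 10, 20)(rng) + triangular(8, 15, 30)(rng) + triangular(3, 5, 10)(rng)

  // Fraction of simulated projects that finish before the deadline.
  def probabilityUnder(deadline: Double, sampleCount: Int, rng: Random): Double = {
    val hits = Iterator.fill(sampleCount)(sampleTotal(rng)).count(_ < deadline)
    hits.toDouble / sampleCount
  }

  // Empirical percentile (nearest-rank method) of the simulated totals.
  def percentile(p: Int, sampleCount: Int, rng: Random): Double = {
    val sorted = Array.fill(sampleCount)(sampleTotal(rng)).sorted
    val rank = math.ceil(p / 100.0 * sampleCount).toInt - 1
    sorted(math.min(sampleCount - 1, math.max(0, rank)))
  }
}
```

Calling `ProjectEstimateSketch.probabilityUnder(30.0, 100000, new Random())` gives the kind of number the example prints. Uncertain[T] wraps this sampling loop so you do not write it by hand.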

When coding with uncertainty, you don't say "the user will click the button"; you say "there's a 75% chance the user will click the button" and write code that handles that uncertainty, without hand-rolling a big block of statistics code.

The library's primary guarantee is correlation-preserving operations, which make combining, calculating with, and composing Uncertain[T] instances safe and correct. The core idea is the monadic Uncertain[T] (it provides a constructor, .map, and .flatMap), which uses a memoized computation graph internally to preserve correlation. This makes it very flexible and allows composition via for-comprehensions, leading to very legible code despite a complex statistical domain.

Installation

🦺🚧🏗️- library is still under construction/testing, not yet released to the wild.

Build Tool	Instruction
sbt	`libraryDependencies += "mostly" %% "uncertain-tee" % "NOT_YET_RELEASED"`
mill	`ivy"mostly::uncertain-tee:NOT_YET_RELEASED"`
scala-cli	`//> using dep "mostly::uncertain-tee:NOT_YET_RELEASED"`

Quick Start


import mostly.uncertaintee.Uncertain
import mostly.uncertaintee.syntax.*

// Create an uncertain speed with some measurement error
val speed = Uncertain.normal(65.0, 5.0) // mean=65 mph, std dev=5 mph

// Check if we're probably speeding (speed limit is 60)
if (speed.gt(60).isProbable()) {
  println("You're probably speeding")
}

// Get a confidence interval
val (low, high) = speed.confidenceInterval(0.95)
println(s"95% confident speed is between $low and $high mph")

Creating Uncertain Values

The library provides several ways to create uncertain values:

// From common distributions
val temperature = Uncertain.normal(72.0, 3.0) // Normal distribution
val diceRoll = Uncertain.uniform(1, 7) // Uniform between 1-6 
val coinFlip = Uncertain.bernoulli(0.5) // True/false with 50% chance

// From observed data
val userRatings = List(4, 5, 3, 5, 4, 5)
val nextRating = Uncertain.empirical(userRatings)

// From explicit probabilities
val weather = Uncertain.categorical(Map(
  "Sunny" -> 0.7,
  "Cloudy" -> 0.2,
  "Rain" -> 0.1
))
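For intuition about what sampling from explicit probabilities involves, the sketch below draws one outcome by walking cumulative weights. It is a standalone illustration using only the standard library; `CategoricalSketch` is a hypothetical name and this may differ from how Uncertain.categorical is implemented.

```scala
import scala.util.Random

object CategoricalSketch {
  // Draw one outcome: pick a uniform point in [0, total) and return the
  // first outcome whose cumulative weight passes that point.
  def sample[A](outcomes: Seq[(A, Double)], rng: Random): A = {
    require(outcomes.nonEmpty, "need at least one outcome")
    require(outcomes.forall(_._2 >= 0), "weights must be non-negative")
    val total = outcomes.map(_._2).sum
    require(total > 0, "weights must not all be zero")

    val target = rng.nextDouble() * total
    val cumulative = outcomes.scanLeft(0.0)(_ + _._2).tail
    val idx = cumulative.indexWhere(target < _)
    // Floating-point rounding can leave idx at -1; fall back to the last outcome.
    outcomes(if (idx >= 0) idx else outcomes.size - 1)._1
  }
}
```

For example, `CategoricalSketch.sample(Seq("Sunny" -> 0.7, "Cloudy" -> 0.2, "Rain" -> 0.1), new Random())` returns "Sunny" about 70% of the time.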

Working with Uncertain Values

You can perform operations on uncertain values just like regular values:

val height = Uncertain.normal(5.8, 0.2) // feet
val width = Uncertain.normal(3.2, 0.1) // feet

// Calculate area (uncertainty propagates automatically)
val area = height * width

// Compare values
val isLargeRoom = area.gt(15.0)

Making Decisions with Uncertainty

The library provides statistical tests to help make decisions:

val conversionRate = Uncertain.normal(0.12, 0.02)

// Is conversion rate probably above 10%?
if (conversionRate.gt(0.10).isProbable()) {
  println("Conversion rate looks good")
}

// Are we 90% confident it's above 10%?
if (conversionRate.gt(0.10).probability(exceeds = 0.9)) {
  println("High confidence in good conversion rate")
}

Transforming and Chaining with .map and .flatMap

Besides the standard math operators, you can use .map and .flatMap for more complex operations, which lets you build probabilistic workflows in a very Scala-native way.

Transforming values with .map

Use .map to apply a simple function to the value inside an Uncertain container. It transforms the value, but keeps the underlying uncertainty.

// Let's say we have speed in miles per hour
val speedMph = Uncertain.uniform(50, 70)

// We can convert it to kilometers per hour
val speedKph = speedMph.map(_ * 1.60934)

println(speedKph.sample()) // A random value between 80.467 and 112.6538

You can .map to any type. For instance, you can classify a numeric value into a String:

val temperatureF = Uncertain.normal(75, 10) // 75°F ± 10°

val comfortLevel = temperatureF.map { temp =>
  if (temp > 85) "Hot"
  else if (temp > 65) "Comfortable"
  else "Cold"
}

// See the distribution of comfort levels
println(comfortLevel.histogram(1000))
// Might output: Map(Comfortable -> 684, Hot -> 157, Cold -> 159)

Or combine multiple uncertain values into a case class using a for-comprehension:

final case class Rectangle(width: Double, height: Double) {
  def area: Double = width * height
}

val uncertainWidth = Uncertain.normal(10, 1)
val uncertainHeight = Uncertain.normal(5, 0.5)

// Create an uncertain Rectangle
val uncertainRectangle: Uncertain[Rectangle] =
  for {
    width <- uncertainWidth
    height <- uncertainHeight
  } yield Rectangle(width, height)

val uncertainArea = uncertainRectangle.map(_.area)
println(s"Expected area: ${uncertainArea.expectedValue()}")
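To see why a for-comprehension works here, the sketch below defines a tiny stand-in type with only .map and .flatMap. `Dist` is hypothetical and deliberately simpler than Uncertain[T] (it has no memoization, so it does not preserve correlation); it only shows the monadic shape that for-comprehensions rely on.

```scala
import scala.util.Random

// Hypothetical stand-in for Uncertain[T]: a value is just a sampling function.
final case class Dist[A](sample: Random => A) {
  def map[B](f: A => B): Dist[B] = Dist(rng => f(sample(rng)))
  def flatMap[B](f: A => Dist[B]): Dist[B] = Dist(rng => f(sample(rng)).sample(rng))

  // Monte Carlo estimate of the mean from n samples.
  def mean(n: Int, rng: Random)(implicit num: Numeric[A]): Double =
    Iterator.fill(n)(num.toDouble(sample(rng))).sum / n
}

object Dist {
  def normal(mean: Double, stdDev: Double): Dist[Double] =
    Dist(rng => mean + stdDev * rng.nextGaussian())
}

object RectangleSketch {
  final case class Rectangle(width: Double, height: Double) {
    def area: Double = width * height
  }

  // The for-comprehension desugars to flatMap followed by map.
  val rectangle: Dist[Rectangle] =
    for {
      width  <- Dist.normal(10, 1)
      height <- Dist.normal(5, 0.5)
    } yield Rectangle(width, height)

  val area: Dist[Double] = rectangle.map(_.area)
}
```

Because width and height are drawn independently in this sketch, `RectangleSketch.area.mean(50000, new Random())` should land close to 10 × 5 = 50.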

Chaining operations with `.flatMap`

Use .flatMap when the next uncertain value depends on the outcome of a previous one. This is useful for creating conditional logic where the outcome of one probabilistic event determines the next one.

For example, let's model event attendance, which depends on the weather:

// First, model the weather as a probabilistic event
val weatherIsGood = Uncertain.bernoulli(0.7) // 70% chance of good weather

// Then, model attendance based on the weather
val attendance = weatherIsGood.flatMap { isGood =>
  if (isGood) {
    Uncertain.normal(100, 10) // If weather is good, expect 100 ± 10 people
  } else {
    Uncertain.normal(60, 15) // If bad, expect 60 ± 15 people
  }
}
println(s"Expected attendance: ${attendance.expectedValue()}")

Here, .flatMap chains the two uncertain events (weatherIsGood and attendance) together.
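As a sanity check on what the sampled estimate should converge to, the law of total expectation gives the exact mean for this model (a worked calculation, not library output):

```latex
\begin{align*}
\mathbb{E}[\text{attendance}]
  &= P(\text{good}) \cdot \mathbb{E}[\text{attendance} \mid \text{good}]
   + P(\text{bad}) \cdot \mathbb{E}[\text{attendance} \mid \text{bad}] \\
  &= 0.7 \cdot 100 + 0.3 \cdot 60 \\
  &= 88
\end{align*}
```

A sampled expectedValue() should therefore hover around 88, with small run-to-run variation.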

Measurement with Error

// Sensor reading with known error
val sensorReading = Uncertain.normal(23.5, 0.8) // reading = 23.5, error (std dev) = 0.8

// Check if it's within acceptable range
val isAcceptable = sensorReading > 20.0 && sensorReading < 25.0
if (isAcceptable.isProbable()) {
  println("Sensor reading is probably acceptable")
}

A/B Testing

val controlConversion = Uncertain.normal(0.08, 0.01) // 8% ± 1%
val testConversion = Uncertain.normal(0.11, 0.015) // 11% ± 1.5%

val improvement = testConversion - controlConversion
if ((improvement > 0.0).probability(exceeds = 0.95)) {
  println("Test variant is significantly better")
}
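For reference, this particular comparison can also be worked out by hand, assuming the two conversion rates are independent (they are separate Uncertain instances) and writing normals as (mean, standard deviation), as the examples do:

```latex
\begin{align*}
\text{improvement} &\sim \mathcal{N}\!\left(0.11 - 0.08,\ \sqrt{0.015^2 + 0.01^2}\right)
  \approx \mathcal{N}(0.03,\ 0.018) \\
P(\text{improvement} > 0) &= \Phi\!\left(\frac{0.03}{0.018}\right)
  \approx \Phi(1.66) \approx 0.952
\end{align*}
```

The true probability is only just above the 0.95 threshold, so a sampling-based test on these exact numbers is a close call and may need many samples to decide.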

Risk Assessment

val serverLoad = Uncertain.normal(0.65, 0.15) // 65% ± 15%
val criticalThreshold = 0.9

val riskOfOverload = serverLoad > criticalThreshold
println(s"Risk of overload: ${riskOfOverload.expectedValue() * 100}%")

Understanding Correlation

One of the library's key features is preserving correlation. This means that x - x always equals zero:

val x = Uncertain.normal(10, 2)

// These behave differently:
val uncorrelated = Uncertain.normal(10, 2) - Uncertain.normal(10, 2) // Not always zero, as each normal distribution is randomly sampled
val correlated = x - x // Always exactly zero since x is memoized

println(uncorrelated.sample()) // Might be 1.5, -0.8, 2.1, etc.
println(correlated.sample()) // Always 0.0

This matters when you use the same uncertain value multiple times in a calculation.
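The sketch below shows one way memoization can preserve correlation: each leaf value is drawn at most once per sample and cached for the rest of that evaluation. It is a simplified illustration with hypothetical names (`CorrelationSketch`, `Context`, `Leaf`), not the library's actual computation graph.

```scala
import scala.collection.mutable
import scala.util.Random

object CorrelationSketch {
  // Per-sample cache: each leaf is drawn at most once per evaluation.
  final class Context(val rng: Random) {
    val cache: mutable.Map[Leaf, Double] = mutable.Map.empty
  }

  sealed trait Expr {
    def eval(ctx: Context): Double
    def -(that: Expr): Expr = Minus(this, that)
    // A fresh Context per call, so every sample starts with an empty cache.
    def sample(rng: Random): Double = eval(new Context(rng))
  }

  // A plain class compares by reference, so two separate normal(10, 2)
  // values are distinct cache keys and stay independent.
  final class Leaf(draw: Random => Double) extends Expr {
    def eval(ctx: Context): Double = ctx.cache.getOrElseUpdate(this, draw(ctx.rng))
  }

  final case class Minus(left: Expr, right: Expr) extends Expr {
    def eval(ctx: Context): Double = left.eval(ctx) - right.eval(ctx)
  }

  def normal(mean: Double, stdDev: Double): Expr =
    new Leaf(rng => mean + stdDev * rng.nextGaussian())
}
```

With `val x = CorrelationSketch.normal(10, 2)`, sampling `x - x` reuses the cached draw for both operands and yields 0.0, while `normal(10, 2) - normal(10, 2)` draws two separate values.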

Mixture Models

// Combine multiple distributions
val peakHours = Uncertain.normal(50, 5) // 50 users ± 5
val offHours = Uncertain.normal(15, 3) // 15 users ± 3

val userCount = Uncertain.mixture(Map(
  peakHours -> 0.3, // 30% of time it's peak hours
  offHours -> 0.7 // 70% of time it's off hours
))
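Conceptually, sampling a mixture means first picking a component by weight and then sampling that component. A minimal two-component sketch using only the standard library follows; `MixtureSketch` and its helpers are hypothetical, and Uncertain.mixture itself may work differently.

```scala
import scala.util.Random

object MixtureSketch {
  type Sampler = Random => Double

  def normal(mean: Double, stdDev: Double): Sampler =
    rng => mean + stdDev * rng.nextGaussian()

  // Two-component mixture: choose `first` with probability firstWeight,
  // otherwise `second`, then sample the chosen component.
  def mixture(first: Sampler, firstWeight: Double, second: Sampler): Sampler =
    rng => if (rng.nextDouble() < firstWeight) first(rng) else second(rng)

  // 30% peak hours (50 ± 5 users), 70% off hours (15 ± 3 users).
  val userCount: Sampler = mixture(normal(50, 5), 0.3, normal(15, 3))
}
```

The long-run average of `MixtureSketch.userCount` is 0.3 × 50 + 0.7 × 15 = 25.5 users, even though individual samples cluster around 50 or 15 rather than around 25.5.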

Performance Notes

The library uses smart sampling techniques:

- Lazy evaluation: Computations happen only when you request samples
- Automatic sample sizing: Statistical tests use only as many samples as needed
- Efficient hypothesis testing: Uses Sequential Probability Ratio Test (SPRT) instead of fixed large sample sizes
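To illustrate why sequential testing can stop early, here is a sketch of Wald's SPRT for a Bernoulli probability. The names, defaults, and the indifference region (threshold ± epsilon) are assumptions made for this example; the library's own parameters and stopping rules may differ.

```scala
object SprtSketch {
  // The decision plus how many samples were consumed to reach it.
  final case class Decision(exceeds: Boolean, samplesUsed: Int)

  // Tests H0: p = threshold - epsilon against H1: p = threshold + epsilon,
  // where p is the probability that draw() returns true.
  def exceedsThreshold(
      draw: () => Boolean,
      threshold: Double,
      epsilon: Double = 0.05,
      alpha: Double = 0.05, // tolerated false-positive rate
      beta: Double = 0.05,  // tolerated false-negative rate
      maxSamples: Int = 10000
  ): Decision = {
    require(maxSamples > 0, "maxSamples must be positive")
    val p0 = math.max(1e-9, threshold - epsilon)
    val p1 = math.min(1 - 1e-9, threshold + epsilon)
    val upper = math.log((1 - beta) / alpha) // reaching this accepts H1
    val lower = math.log(beta / (1 - alpha)) // reaching this accepts H0
    val stepTrue = math.log(p1 / p0)
    val stepFalse = math.log((1 - p1) / (1 - p0))

    var llr = 0.0 // running log-likelihood ratio
    var n = 0
    var successes = 0
    while (n < maxSamples && llr < upper && llr > lower) {
      val hit = draw()
      if (hit) successes += 1
      llr += (if (hit) stepTrue else stepFalse)
      n += 1
    }

    if (llr >= upper) Decision(exceeds = true, n)
    else if (llr <= lower) Decision(exceeds = false, n)
    // Sample budget exhausted: fall back to the plain frequency estimate.
    else Decision(successes.toDouble / n > threshold, n)
  }
}
```

When the true probability is far from the threshold, the log-likelihood ratio drifts quickly to one boundary and the loop ends after a handful of samples; near the threshold it takes longer, up to the `maxSamples` cap.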

Continuous vs Discrete Distributions

val continuous = Uncertain.normal(5, 1)
val discrete = Uncertain.categorical(Map(1 -> 0.3, 2 -> 0.7))

// Mode works well for discrete:
println(discrete.mode()) // Meaningful result

// Mode is less useful for continuous:
println(continuous.mode()) // Probably not very meaningful

Testing

// Use fixed seeds for reproducible tests
val testValue = Uncertain.normal(10, 1)(Random(42))
// Results will be consistent across test runs
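The reason a fixed seed gives reproducible tests can be shown with scala.util.Random alone: two generators created with the same seed produce the same sequence. `SeedSketch` is a hypothetical helper for illustration and does not use the library.

```scala
import scala.util.Random

object SeedSketch {
  // Draw n values from a normal(mean, stdDev) using the supplied generator.
  def normalSamples(mean: Double, stdDev: Double, n: Int, rng: Random): List[Double] =
    List.fill(n)(mean + stdDev * rng.nextGaussian())
}
```

Two calls such as `SeedSketch.normalSamples(10, 1, 100, new Random(42))` return identical lists, which is what makes assertions on sampled values stable across test runs.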

API Reference (‼️ out of date, library still WIP ‼️)

All functionality here is available via the following two imports:

import mostly.uncertaintee.Uncertain // brings in the core data type
import mostly.uncertaintee.syntax.* // brings in all syntax below

`Uncertain[T]` Methods

These are the fundamental methods available on any Uncertain[T] instance.

Method	Description	Example Use
`u.sample()`	Retrieves a single random sample from the distribution.	`val singleRoll = diceRoll.sample()`
`u.map(f)`	Transforms the value within the `Uncertain` container using a function.	`val speedKph = speedMph.map(_ * 1.609)`
`u.flatMap(f)`	Chains dependent uncertain computations together.	`val attendance = weatherIsGood.flatMap(isGood => ...)`
`u.filter(p)`	Filters the uncertain value, returning an `Uncertain[Option[T]]`.	`val validSpeed = speed.filter(s => s > 0 && s < 130)`
`u.iterator`	Returns an infinite iterator of samples from the distribution.	`val firstTenSamples = speed.iterator.take(10).toList`
`u.take(n)`	Collects a specified number of samples into a list.	`val samples = speed.take(1000)`

Constructing an `Uncertain[T]` instance

Method	Description	Example Use
`Uncertain[T](sampler)`	Creates an `Uncertain` from a custom sampling function.	`val custom = Uncertain(() => math.random() * 10)`
`Uncertain.point(value)`	Creates an `Uncertain` value that is always the same constant value.	`val fixedValue = Uncertain.point(42)`
`Uncertain.normal(mean, stdDev)`	Creates a normal (Gaussian) distribution.	`val temp = Uncertain.normal(72, 3)`
`Uncertain.uniform(min, max)`	Creates a uniform distribution.	`val diceRoll = Uncertain.uniform(1, 7)`
`Uncertain.bernoulli(probability)`	Creates a true/false distribution with a given probability of `true`.	`val coinFlip = Uncertain.bernoulli(0.5)`
`Uncertain.empirical(data)`	Creates a distribution by sampling from a collection of observed data.	`val nextRating = Uncertain.empirical(List(4, 5, 3, 5))`
`Uncertain.categorical(outcomes)`	Creates a distribution from a map of outcomes to their probabilities.	`val weather = Uncertain.categorical(Map("Sunny" -> 0.7, "Rainy" -> 0.3))`
`Uncertain.mixture(components)`	Creates a mixture of different uncertain distributions with specified weights.	`val userCount = Uncertain.mixture(Map(peakHours -> 0.3, offHours -> 0.7))`
`Uncertain.sequence(uncertains)`	Converts a `List[Uncertain[T]]` into a single `Uncertain[List[T]]`.	`val allRolls = Uncertain.sequence(List(die1, die2))`

Operations on Uncertain

Arithmetic Operations

These arithmetic operators are brought in specifically by import mostly.uncertaintee.syntax.arithmetic.*.

Method	Description	Example Use
`+`	Adds two uncertain values or an uncertain value and a constant.	`val total = Uncertain.normal(10, 1) + Uncertain.normal(5, 2)`
`-`	Subtracts two uncertain values or a constant from an uncertain value.	`val difference = Uncertain.normal(10, 1) - 5.0`
`*`	Multiplies two uncertain values or an uncertain value by a constant.	`val area = uncertainWidth * uncertainHeight`
`/`	Divides an uncertain value by another or by a constant.	`val ratio = Uncertain.normal(100, 5) / 10.0`

Boolean and Logical Operations

These methods enable logical operations and statistical hypothesis testing on Uncertain[Boolean] values. They are brought in specifically by import mostly.uncertaintee.syntax.boolean.*.

Method	Description	Example Use
`unary_!`	Performs a logical NOT on an uncertain boolean.	`val isFailure = !isSuccess`
`&&`	Performs a logical AND between two uncertain booleans.	`val bothTrue = Uncertain.bernoulli(0.8) && Uncertain.bernoulli(0.5)`
`||`	Performs a logical OR between two uncertain booleans.	`val atLeastOneTrue = a || b`
`probability`	Tests if the probability of the uncertain boolean being true exceeds a given threshold.	`val isConfident = conversionRate.gt(0.10).probability(exceeds = 0.95)`
`isProbable`	A shorthand to check if the probability of being true is greater than 50%.	`if (isProfitable.isProbable()) { ... }`

Comparison Operations

These methods provide comparison operators for Uncertain values. They are brought in specifically by import mostly.uncertaintee.syntax.comparison.*.

Method	Description	Example Use
`===`	Compares two uncertain values for equality on a sample-by-sample basis.	`val areSame = dieRoll1 === dieRoll2`
`!==`	Compares two uncertain values for inequality.	`val areDifferent = dieRoll1 !== dieRoll2`
`>`	Performs a greater-than comparison.	`val isSpeeding = speed > 60.0`
`<`	Performs a less-than comparison.	`val isBelowFreezing = temp < 0.0`
`>=`	Performs a greater-than-or-equal-to comparison.	`val hasEnough = stock >= orderSize`
`<=`	Performs a less-than-or-equal-to comparison.	`val withinLimit = weight <= 100.0`

Functional Operations

These methods provide powerful, functional programming-style operators. They are brought in specifically by import mostly.uncertaintee.syntax.functional.*.

Method	Description	Example Use
`product`	Combines two uncertain values into an uncertain pair `Uncertain[(T, B)]`.	`val stats = height.product(weight)`
`zipWith`	Combines two uncertain values using a provided function.	`val bmi = height.zipWith(weight)(calculateBmi)`
`collect`	Filters and maps an uncertain value using a partial function, returning `Uncertain[Option[B]]`.	`val sqrtOfPositives = dist.collect { case x if x > 0 => sqrt(x) }`
`flatten`	Flattens a nested `Uncertain[Uncertain[T]]` into a single `Uncertain[T]`.	`val finalDist = chosenModel.flatten`
`mapN`	Applies a function to the results of multiple uncertain values.	`val rect = (width, height).mapN(Rectangle.apply)`

Option Operations

These methods help manage Uncertain[Option[T]] values. They are brought in specifically by import mostly.uncertaintee.syntax.option.*.

Method	Description	Example Use
`orElse`	Provides a fallback `Uncertain` value to use when a sample is `None`.	`val finalSpeed = validSpeed.orElse(fallbackModel)`
`getOrElse`	Provides a constant default value to use when a sample is `None`.	`val finalTemp = plausibleTemp.getOrElse(15.0)`

Statistical Operations

These methods are for performing statistical analysis on Uncertain values. They are brought in specifically by import mostly.uncertaintee.syntax.statistical.*.

Method	Description	Example Use
`mode`	Finds the most frequently occurring value in a set of samples.	`val mostLikelyOutcome = diceRoll.mode()`
`histogram`	Creates a map showing the frequency of each sampled value.	`val frequencies = diceRoll.histogram(1000)`
`entropy`	Estimates the information entropy (randomness) of the distribution in bits.	`val randomness = fairCoin.entropy()`
`expectedValue` / `mean`	Estimates the average value of the distribution by sampling.	`val avgSpeed = speed.mean()`
`populationStandardDeviation`	Estimates the standard deviation of the population.	`val popStdDev = distribution.populationStandardDeviation()`
`standardDeviation`	Estimates the sample standard deviation using Bessel's correction.	`val sampleStdDev = distribution.standardDeviation()`
`confidenceInterval`	Estimates an interval that contains the true value with a given confidence.	`val (low, high) = speed.confidenceInterval(0.95)`
`cdf`	Estimates the Cumulative Distribution Function (the probability that a sample is ≤ a given value).	`val prob = speed.cdf(60.0)`
`probabilityOfSuccess`	For `Uncertain[Option[T]]`, calculates the probability of it being a `Some`.	`val successRate = filteredValue.probabilityOfSuccess()`
`probabilityOfFailure`	For `Uncertain[Option[T]]`, calculates the probability of it being a `None`.	`val failureRate = filteredValue.probabilityOfFailure()`

Learn more

The Uncertain[T] pattern originates from the research paper Uncertain<T>: A First-Order Type for Uncertain Data (https://www.microsoft.com/en-us/research/publication/uncertaint-a-first-order-type-for-uncertain-data-2/).
You can learn more from this excellent blog post about the pattern: https://nshipster.com/uncertainty
- Its author, @Mattt, has a Swift implementation of the Uncertain[T] pattern here: https://github.com/mattt/Uncertain
Feel like contributing? Read the Code of Conduct first 🙏

Cats Support

The cats-support library exposes the following typeclasses for Uncertain[T]: Functor, Applicative, Monad, and Monoid.

Use these instances by including the support module, and the following import:

import mostly.uncertaintee.cats.instances.given

Build Tool	Instruction
sbt	`libraryDependencies += "mostly" %% "uncertain-tee" % "NOT_YET_RELEASED"`
mill	`ivy"mostly::uncertain-tee:NOT_YET_RELEASED"`
scala-cli	`//> using dep "mostly::uncertain-tee:NOT_YET_RELEASED"`
