Skip to content

jas88/TypeGuesser

 
 

Repository files navigation

Type Guesser

Build, test and package Coverage Status NuGet Badge

Guess the C# Types for untyped strings e.g. "12.123".

What's New in v2.0

TypeGuesser v2.0 delivers significant performance improvements and thread-safety enhancements:

  • Thread-Safe Operations: Safe concurrent usage with internal locking
  • Zero-Allocation Performance: Up to 30x faster for hard-typed values using Math.Log10 and SqlDecimal
  • Object Pooling: Reduced GC pressure through automatic builder pooling
  • Advanced API: New StackTypeAccumulator for ultra-high-performance scenarios (10-50x faster)
  • Full Backward Compatibility: Existing code works without changes

Performance Improvements

Operation v1.x v2.0 Speedup
Hard-typed integers (1M) 850ms 45ms 18.9x
Hard-typed decimals (1M) 1,200ms 120ms 10.0x
Zero allocations for typed values -

See MIGRATION-V2.md for upgrade guide and docs/ for detailed documentation.

Quick Start

Basic Usage

var guesser = new Guesser();
guesser.AdjustToCompensateForValue("-12.211");
var guess = guesser.Guess;

The resulting guess in this case would be:

Property Value
guess.CSharpType typeof(decimal)
guess.Size.NumbersBeforeDecimalPlace 2
guess.Size.NumbersAfterDecimalPlace 3
guess.Width 7

Guesser also handles adjusting its guess based on multiple input strings e.g.

var guesser = new Guesser();
guesser.AdjustToCompensateForValue("1,000");
guesser.AdjustToCompensateForValue("0.001");
var guess = guesser.Guess;
Property Value
guess.CSharpType typeof(decimal)
guess.Size.NumbersBeforeDecimalPlace 4
guess.Size.NumbersAfterDecimalPlace 3
guess.Width 8

v2.0: Optimized for Typed Values

Pass hard-typed values directly for zero-allocation processing:

var guesser = new Guesser();
int[] numbers = { 1, 42, 999, -5 };

// v2.0: Zero allocations when passing typed values
foreach (var num in numbers)
{
    guesser.AdjustToCompensateForValue(num); // 18.9x faster than v1.x!
}

var guess = guesser.Guess;
// guess.CSharpType => typeof(int)
// guess.Size.NumbersBeforeDecimalPlace => 3

Advanced: Maximum Performance

For ultra-high-performance scenarios, use StackTypeAccumulator:

using TypeGuesser.Advanced;

var factory = new TypeDeciderFactory(CultureInfo.InvariantCulture);
var data = new int[] { 1, 99, 1000, -42 };

var accumulator = new StackTypeAccumulator(factory);
foreach (var value in data)
{
    accumulator.Add(value); // 30x faster than v1.x, zero heap allocations!
}

var result = accumulator.GetResult();
// result.CSharpType => typeof(int)
// result.Size.NumbersBeforeDecimalPlace => 4

Parsing Values

Once you have guessed a Type for all your strings you can convert all your values to the hard type:

var someStrings = new []{"13:11:59", "9AM"};
var guesser = new Guesser();
guesser.AdjustToCompensateForValues(someStrings);

var parsed = someStrings.Select(guesser.Parse).ToArray();

Assert.AreEqual(new TimeSpan(13, 11, 59), parsed[0]);
Assert.AreEqual(new TimeSpan(9, 0, 0), parsed[1]);

Thread-Safety

v2.0 is thread-safe by default:

var guesser = new Guesser();
var data = GetLargeDataset();

// Safe to process concurrently - internal locking handles synchronization
Parallel.ForEach(data, value =>
{
    guesser.AdjustToCompensateForValue(value);
});

var guess = guesser.Guess;

Type Guessing Details

Guess Order

The order in which Types are tried is (DatabaseTypeRequest.PreferenceOrder):

  • Bool
  • Int
  • Decimal
  • TimeSpan
  • DateTime
  • String

If a string has been accepted as one category e.g. "12" (Int) and an incompatible string arrived e.g. "0.1" then the Guess is changes to either the new Type (Decimal) or to String (i.e. untyped) based on whether the old and new Types are in the same TypeCompatibilityGroup

For example Bool and DateTime are incompatible

"Y" => Bool
"2001-01-01" => DateTime

Guess: String

Guesses are never revised back up again (once you accept a Decimal you never get Int again but you might end up at String)

Zero Prefixes

If an input string is a number that starts with zero e.g. "01" then the estimate will be changed to System.String. This is intended behaviour since some codes e.g. CHI / Barcodes have valid zero prefixes. If this is to be accurately preserved in the database then it must be stored as string (See TestGuesser_PrecedingZeroes). This also applies to values such as "-01"

Whitespace

Leading and trailing whitespace is ignored for the purposes of determining Type. E.g. " 0.1" is a valid System.Decimal. However it is recorded for the maximum Length required if we later fallback to System.String (See Test TestGuesser_Whitespace).

Strong Typed Objects

Guesser.AdjustToCompensateForValue takes a System.Object. If you are passing objects that are not System.String e.g. from a DataColumn that has an actual Type on it (e.g. System.Float) then Guesser will set the Guess.CSharpType to the provided object Type. It will still calculate the Guess.Size properties if appropriate (See test TestGuesser_HardTypeFloats).

The first time you pass a typed object (excluding DBNull.Value) then it will assume the entire input stream is strongly typed (See IsPrimedWithBonafideType). Any attempts to pass in different object Types in future (or if strings were previously passed in before) will result in a MixedTypingException.

Documentation

API Layers

TypeGuesser v2.0 provides three API layers for different use cases:

Layer 1: Compatible API (Guesser)

Simple, thread-safe, works everywhere. Use for string-based processing.

var guesser = new Guesser();
guesser.AdjustToCompensateForValue("12.45");

Layer 2: Automatic Optimization

Pass typed values for automatic zero-allocation processing.

var guesser = new Guesser();
guesser.AdjustToCompensateForValue(42); // Auto-optimized!

Layer 3: Advanced API (StackTypeAccumulator)

Maximum performance for specialized scenarios. Stack-only, zero heap allocations.

var factory = new TypeDeciderFactory(CultureInfo.InvariantCulture);
var accumulator = new StackTypeAccumulator(factory);
accumulator.Add(42); // Ultra-fast!

See API Layers Reference for complete details.

Performance Comparison

Processing 1 million integer values:

Approach Time Memory GC Collections
v1.x (strings) 850ms 76 MB 145/12/1
v2.0 Layer 2 (typed) 45ms 0 bytes 2/0/0
v2.0 Layer 3 (stack) 28ms 0 bytes 0/0/0

Up to 30x faster with zero heap allocations!

About

Guesses C# data types for untyped string inputs

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C# 100.0%