Add support for DECIMAL types to ArgumentTypeFuzzer
Summary:
ArgumentTypeFuzzer serves two purposes.

Given a generic function signature, generate a random valid return type.
Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider the signature of the map_keys function:

  map(K, V) -> array(K)

This signature has two type variables: K and V. It does not fully specify the return type, only that it must be an array. Many return types are valid, but all of them have the form array(K). When asked to generate a valid return type for this signature, ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind the type variable: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.
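
The binding step described above can be pictured with a toy model. The sketch below is not Velox code: ToyType, toString and bindVariables are hypothetical names used only for illustration. It walks the signature's return type and a concrete return type in lockstep and records what each type variable must be.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy type tree (hypothetical): a base name plus child types,
// e.g. array(bigint) is {"array", {{"bigint", {}}}}.
struct ToyType {
  std::string name;
  std::vector<ToyType> children;
};

// Renders a toy type as text, e.g. "map(bigint, varchar)".
std::string toString(const ToyType& type) {
  if (type.children.empty()) {
    return type.name;
  }
  std::string out = type.name + "(";
  for (std::size_t i = 0; i < type.children.size(); ++i) {
    if (i > 0) {
      out += ", ";
    }
    out += toString(type.children[i]);
  }
  return out + ")";
}

// Matches 'formal' (which may mention type variables) against 'actual'
// (a concrete type). Records each variable's binding in 'bindings'.
// Returns false if the two types cannot be reconciled.
bool bindVariables(
    const ToyType& formal,
    const ToyType& actual,
    const std::set<std::string>& variables,
    std::map<std::string, ToyType>& bindings) {
  if (variables.count(formal.name) > 0) {
    auto it = bindings.find(formal.name);
    if (it == bindings.end()) {
      bindings.emplace(formal.name, actual);
      return true;
    }
    // A variable that is already bound must match its earlier binding.
    return toString(it->second) == toString(actual);
  }
  if (formal.name != actual.name ||
      formal.children.size() != actual.children.size()) {
    return false;
  }
  for (std::size_t i = 0; i < formal.children.size(); ++i) {
    if (!bindVariables(
            formal.children[i], actual.children[i], variables, bindings)) {
      return false;
    }
  }
  return true;
}
```

With the formal type array(K) and the concrete type array(bigint), bindVariables binds K to bigint and leaves V unbound, so a fuzzer built along these lines would be free to pick any type for V.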

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call the fuzzReturnType() method.

   ArgumentTypeFuzzer fuzzer(signature, rng);
   auto returnType = fuzzer.fuzzReturnType();

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type, and a random number generator, then call the fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng);
if (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider the signature of the least function:

  (decimal(p, s),...) -> decimal(p, s)

This signature has two integer variables: p and s. The return type is not fully specified; it can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType() needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If the return type is fixed, say to decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types, all of which are decimal(10, 7).

Consider a slightly different function, between:

  (decimal(p, s), decimal(p, s)) -> boolean

This signature also has two integer variables: p and s. The return type is fully specified, though. Hence, ArgumentTypeFuzzer::fuzzReturnType() should always return ‘boolean’. However, when the return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments, as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as is {decimal(18, 15), decimal(18, 15)}.
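
The two cases above (return type decimal(p, s) versus boolean) differ only in whether the return type pins down p and s. A minimal sketch of that reverse binding follows. It is not the ReverseSignatureBinder implementation, and FormalDecimal, isLiteral, bindOne and bindDecimal are hypothetical names introduced for illustration.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of a formal decimal type: precision and scale are
// either integer literals ("10") or names of integer variables ("p").
struct FormalDecimal {
  std::string precision;
  std::string scale;
};

// True if 's' is non-empty and consists only of digits.
bool isLiteral(const std::string& s) {
  if (s.empty()) {
    return false;
  }
  for (unsigned char c : s) {
    if (!std::isdigit(c)) {
      return false;
    }
  }
  return true;
}

// Matches one formal parameter against a concrete value. A literal must be
// equal to the value; a variable is bound to it, or checked against an
// earlier binding.
bool bindOne(
    const std::string& formal,
    int actual,
    std::map<std::string, int>& bindings) {
  if (isLiteral(formal)) {
    return std::stoi(formal) == actual;
  }
  auto result = bindings.emplace(formal, actual);
  return result.second || result.first->second == actual;
}

// Binds the integer variables of 'formal' from a concrete
// (precision, scale) pair. Returns false on a conflict.
bool bindDecimal(
    const FormalDecimal& formal,
    std::pair<int, int> actual,
    std::map<std::string, int>& bindings) {
  return bindOne(formal.precision, actual.first, bindings) &&
      bindOne(formal.scale, actual.second, bindings);
}
```

For least, binding the formal return type decimal(p, s) against decimal(10, 7) yields p=10 and s=7, which then fixes every argument type. For between, the return type is boolean, so nothing is bound and p and s remain free to be chosen at random.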

Let’s also look at the signature of the ‘floor’ function:

  (decimal(p, s)) -> decimal(rp, 0)

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

  rp = min(38, p - s + min(s, 1))
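
As a worked example of this formula, the small function below evaluates it directly. The name floorResultPrecision is an assumption made for illustration and is not part of Velox. For decimal(5, 2) the formula gives min(38, 5 - 2 + min(2, 1)) = 4, so the result type is decimal(4, 0). The extra digit is plausibly there because rounding down a negative value such as -999.99 gives -1000, which needs four digits.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kMaxPrecision = 38;

// Evaluates rp = min(38, p - s + min(s, 1)) for an input decimal(p, s).
// Assumes 1 <= p <= 38 and 0 <= s <= p.
int floorResultPrecision(int p, int s) {
  assert(p >= 1 && p <= kMaxPrecision);
  assert(s >= 0 && s <= p);
  return std::min(kMaxPrecision, p - s + std::min(s, 1));
}
```

When s is 0 the input is already integral and the formula reduces to rp = p.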

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). 

If we fix the return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values for p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

Note that ArgumentTypeFuzzer::fuzzReturnType() may also need to ensure that the generated ‘rp’ admits some ‘p’ and ‘s’ for which the formula above holds. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better not to support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.
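
The existence claim above can be checked by brute force, because there are only a few hundred valid (p, s) pairs. The sketch below is an illustration under stated assumptions, not something the commit implements: hasPrecisionAndScale, floorConstraint and hypotheticalConstraint are made-up names, and the second constraint is invented purely to show a case where some rp has no solution.

```cpp
#include <algorithm>
#include <functional>

constexpr int kMaxPrecision = 38;

// Returns true if some valid decimal(p, s), with 1 <= p <= 38 and
// 0 <= s <= p, satisfies constraint(p, s) == rp.
bool hasPrecisionAndScale(
    int rp,
    const std::function<int(int, int)>& constraint) {
  for (int p = 1; p <= kMaxPrecision; ++p) {
    for (int s = 0; s <= p; ++s) {
      if (constraint(p, s) == rp) {
        return true;
      }
    }
  }
  return false;
}

// The constraint on 'rp' in the 'floor' signature.
int floorConstraint(int p, int s) {
  return std::min(kMaxPrecision, p - s + std::min(s, 1));
}

// A made-up constraint (not from any real signature) whose smallest
// possible value is 2, so rp = 1 has no solution.
int hypotheticalConstraint(int p, int s) {
  return std::min(kMaxPrecision, p + s + 1);
}
```

For floorConstraint every rp in [1, 38] has a solution, consistent with p = rp, s = 0. For hypotheticalConstraint, rp = 1 has none, which is the kind of case a return-type fuzzer would have to avoid generating.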

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in the [1, 38] range. A variable that represents a scale must have a value in the [0, precision] range.

The fuzzer needs a way to determine which variable represents a precision and which represents a scale, and for each scale variable it needs to find the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the argument types and the return type to figure out what each variable represents. When encountering a decimal(p, s) type, the fuzzer determines that p is a precision and s is a scale. When encountering a decimal(p, 5) type, the fuzzer determines that p is a precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is a scale that must be in the [0, 10] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.
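
A standalone sketch of the value selection described above follows. It is a simplified illustration rather than the method itself: pickPrecisionScale and uniform are hypothetical helpers, and it draws from std::uniform_int_distribution with explicit bounds where the diff below uses a modulo on a random integer.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <random>
#include <utility>

constexpr int kMaxPrecision = 38;

// Draws an integer uniformly from [lo, hi].
int uniform(std::mt19937& rng, int lo, int hi) {
  return std::uniform_int_distribution<int>(lo, hi)(rng);
}

// Fills in whichever of precision 'p' and scale 's' is not already fixed,
// respecting 1 <= p <= 38 and 0 <= s <= p. Assumes any fixed values are
// themselves valid.
std::pair<int, int> pickPrecisionScale(
    std::optional<int> p,
    std::optional<int> s,
    std::mt19937& rng) {
  if (!p.has_value()) {
    // Precision must be at least 1 and at least the scale, if that is fixed.
    const int minPrecision = std::max(1, s.value_or(0));
    assert(minPrecision <= kMaxPrecision);
    p = uniform(rng, minPrecision, kMaxPrecision);
  }
  if (!s.has_value()) {
    // Scale can be anything from 0 up to the precision.
    s = uniform(rng, 0, *p);
  }
  return {*p, *s};
}
```

Calling pickPrecisionScale(std::nullopt, 5, rng) corresponds to decimal(p, 5) and yields a precision in [5, 38]. Calling pickPrecisionScale(10, std::nullopt, rng) corresponds to decimal(10, s) and yields a scale in [0, 10].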

Differential Revision: D55772808
mbasmanova authored and facebook-github-bot committed Apr 5, 2024
1 parent 776ab24 commit 31eeb1f
Showing 4 changed files with 315 additions and 6 deletions.
6 changes: 6 additions & 0 deletions velox/expression/ReverseSignatureBinder.h
@@ -43,6 +43,12 @@ class ReverseSignatureBinder : private SignatureBinderBase {
return typeVariablesBindings_;
}

/// Return the integer bindings produced by 'tryBind'. This function should be
/// called after 'tryBind' and only if 'tryBind' returns true.
const std::unordered_map<std::string, int>& integerBindings() const {
return integerVariablesBindings_;
}

private:
/// Return whether there is a constraint on an integer variable in type
/// signature.
201 changes: 201 additions & 0 deletions velox/expression/tests/ArgumentTypeFuzzerTest.cpp
@@ -403,4 +403,205 @@ TEST_F(ArgumentTypeFuzzerTest, orderableConstraint) {
}
}

TEST_F(ArgumentTypeFuzzerTest, fuzzDecimalArgumentTypes) {
auto fuzzArgumentTypes = [](const exec::FunctionSignature& signature,
const TypePtr& returnType) {
std::mt19937 seed{0};
ArgumentTypeFuzzer fuzzer{signature, returnType, seed};
bool ok = fuzzer.fuzzArgumentTypes(kMaxVariadicArgs);
VELOX_CHECK(
ok,
"Signature: {}, Return type: {}",
signature.toString(),
returnType->toString());
return fuzzer.argumentTypes();
};

// Argument type must match return type.
auto signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("decimal(p,s)")
.argumentType("decimal(p,s)")
.build();

auto argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
ASSERT_EQ(1, argTypes.size());
EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString());

// Argument type can be any decimal.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("boolean")
.argumentType("decimal(p,s)")
.build();

argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
ASSERT_EQ(1, argTypes.size());
EXPECT_TRUE(argTypes[0]->isDecimal());

// Argument type can be any decimal with scale 30.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.returnType("boolean")
.argumentType("decimal(p,30)")
.build();

argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
ASSERT_EQ(1, argTypes.size());
EXPECT_TRUE(argTypes[0]->isDecimal());
EXPECT_EQ(30, getDecimalPrecisionScale(*argTypes[0]).second);

// Another way to specify fixed scale.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s", "3")
.returnType("boolean")
.argumentType("decimal(p,s)")
.build();

argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
ASSERT_EQ(1, argTypes.size());
EXPECT_TRUE(argTypes[0]->isDecimal());
EXPECT_EQ(3, getDecimalPrecisionScale(*argTypes[0]).second);

// Argument type can be any decimal with precision 3.
signature = exec::FunctionSignatureBuilder()
.integerVariable("s")
.returnType("boolean")
.argumentType("decimal(3,s)")
.build();

argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
ASSERT_EQ(1, argTypes.size());
EXPECT_TRUE(argTypes[0]->isDecimal());
EXPECT_EQ(3, getDecimalPrecisionScale(*argTypes[0]).first);

// Another way to specify fixed precision.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p", "30")
.integerVariable("s")
.returnType("boolean")
.argumentType("decimal(p,s)")
.build();

argTypes = fuzzArgumentTypes(*signature, BOOLEAN());
ASSERT_EQ(1, argTypes.size());
EXPECT_TRUE(argTypes[0]->isDecimal());
EXPECT_EQ(30, getDecimalPrecisionScale(*argTypes[0]).first);

// Multiple arguments. All must be the same as return type.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("decimal(p,s)")
.argumentType("decimal(p,s)")
.argumentType("decimal(p,s)")
.argumentType("decimal(p,s)")
.build();

argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
ASSERT_EQ(3, argTypes.size());
EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString());
EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[1]->toString());
EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[2]->toString());

// Multiple arguments. Some have fixed precision, scale or both.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("decimal(p,s)")
.argumentType("decimal(p,s)")
.argumentType("decimal(p,10)")
.argumentType("decimal(12,s)")
.argumentType("decimal(2,1)")
.build();

argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
ASSERT_EQ(4, argTypes.size());
EXPECT_EQ(DECIMAL(10, 7)->toString(), argTypes[0]->toString());
EXPECT_EQ(DECIMAL(10, 10)->toString(), argTypes[1]->toString());
EXPECT_EQ(DECIMAL(12, 7)->toString(), argTypes[2]->toString());
EXPECT_EQ(DECIMAL(2, 1)->toString(), argTypes[3]->toString());
}

TEST_F(ArgumentTypeFuzzerTest, fuzzDecimalReturnType) {
auto fuzzReturnType = [](const exec::FunctionSignature& signature) {
std::mt19937 seed{0};
ArgumentTypeFuzzer fuzzer{signature, seed};
return fuzzer.fuzzReturnType();
};

// Return type can be any decimal.
auto signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("decimal(p,s)")
.argumentType("decimal(p,s)")
.build();

auto returnType = fuzzReturnType(*signature);
EXPECT_TRUE(returnType->isDecimal());

// Return type can be any decimal with scale 3.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("decimal(p,3)")
.argumentType("decimal(p,s)")
.build();

returnType = fuzzReturnType(*signature);
EXPECT_TRUE(returnType->isDecimal());
EXPECT_EQ(3, getDecimalPrecisionScale(*returnType).second);

// Another way to specify that scale must be 3.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s", "3")
.returnType("decimal(p,s)")
.argumentType("decimal(p,s)")
.build();

returnType = fuzzReturnType(*signature);
EXPECT_TRUE(returnType->isDecimal());
EXPECT_EQ(3, getDecimalPrecisionScale(*returnType).second);

// Return type can be any decimal with precision 22.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("decimal(22,s)")
.argumentType("decimal(p,s)")
.build();

returnType = fuzzReturnType(*signature);
EXPECT_TRUE(returnType->isDecimal());
EXPECT_EQ(22, getDecimalPrecisionScale(*returnType).first);

// Another way to specify that precision must be 22.
signature = exec::FunctionSignatureBuilder()
.integerVariable("p", "22")
.integerVariable("s")
.returnType("decimal(p,s)")
.argumentType("decimal(p,s)")
.build();

returnType = fuzzReturnType(*signature);
EXPECT_TRUE(returnType->isDecimal());
EXPECT_EQ(22, getDecimalPrecisionScale(*returnType).first);

// Return type can only be DECIMAL(10, 7).
signature = exec::FunctionSignatureBuilder()
.integerVariable("p")
.integerVariable("s")
.returnType("decimal(10,7)")
.argumentType("decimal(p,s)")
.build();

returnType = fuzzReturnType(*signature);
EXPECT_EQ(DECIMAL(10, 7)->toString(), returnType->toString());
}

} // namespace facebook::velox::test
110 changes: 104 additions & 6 deletions velox/expression/tests/utils/ArgumentTypeFuzzer.cpp
@@ -27,6 +27,9 @@
namespace facebook::velox::test {

std::string typeToBaseName(const TypePtr& type) {
if (type->isDecimal()) {
return "decimal";
}
return boost::algorithm::to_lower_copy(std::string{type->kindName()});
}

@@ -35,6 +38,91 @@ std::optional<TypeKind> baseNameToTypeKind(const std::string& typeName) {
return tryMapNameToTypeKind(kindName);
}

namespace {

bool isDecimalBaseName(const std::string& typeName) {
auto normalized = boost::algorithm::to_lower_copy(typeName);

return normalized == "decimal";
}

/// Returns true only if 'str' contains digits.
bool isPositiveInteger(const std::string& str) {
return !str.empty() &&
std::find_if(str.begin(), str.end(), [](unsigned char c) {
return !std::isdigit(c);
}) == str.end();
}

int32_t rand(FuzzerGenerator& rng) {
return boost::random::uniform_int_distribution<int32_t>()(rng);
}
} // namespace

void ArgumentTypeFuzzer::determineUnboundedIntegerVariables(
const exec::TypeSignature& type) {
if (!isDecimalBaseName(type.baseName())) {
return;
}

VELOX_CHECK_EQ(2, type.parameters().size())

const auto& precision = type.parameters()[0].baseName();
const auto& scale = type.parameters()[1].baseName();

// Bind 'name' variable, if not already bound, using 'constant' constraint
// ('name'='123'). Return bound value if 'name' is already bound or was
// successfully bound to a constant value. Return std::nullopt otherwise.
auto tryFixedBinding = [&](const auto& name) -> std::optional<int> {
auto it = variables().find(name);
if (it == variables().end()) {
return std::stoi(name);
}

if (integerBindings_.count(name) > 0) {
return integerBindings_[name];
}

if (isPositiveInteger(it->second.constraint())) {
const auto value = std::stoi(it->second.constraint());
integerBindings_[name] = value;
return value;
}

return std::nullopt;
};

std::optional<int> p = tryFixedBinding(precision);
std::optional<int> s = tryFixedBinding(scale);

if (p.has_value() && s.has_value()) {
return;
}

if (s.has_value()) {
p = std::max(1, s.value());
if (p < LongDecimalType::kMaxPrecision) {
p = p.value() +
rand(rng_) % (LongDecimalType::kMaxPrecision - p.value() + 1);
}

integerBindings_[precision] = p.value();
return;
}

if (p.has_value()) {
s = rand(rng_) % (p.value() + 1);
integerBindings_[scale] = s.value();
return;
}

p = 1 + rand(rng_) % (LongDecimalType::kMaxPrecision);
s = rand(rng_) % (p.value() + 1);

integerBindings_[precision] = p.value();
integerBindings_[scale] = s.value();
}

void ArgumentTypeFuzzer::determineUnboundedTypeVariables() {
for (auto& [variableName, variableInfo] : variables()) {
if (!variableInfo.isTypeParameter()) {
@@ -68,26 +156,32 @@ bool ArgumentTypeFuzzer::fuzzArgumentTypes(uint32_t maxVariadicArgs) {
const auto& formalArgs = signature_.argumentTypes();
auto formalArgsCnt = formalArgs.size();

std::unordered_map<std::string, int> integerBindings;

if (returnType_) {
exec::ReverseSignatureBinder binder{signature_, returnType_};
if (!binder.tryBind()) {
return false;
}
bindings_ = binder.bindings();
integerBindings_ = binder.integerBindings();
} else {
for (const auto& [name, _] : signature_.variables()) {
bindings_.insert({name, nullptr});
}
}

determineUnboundedTypeVariables();
for (const auto& argType : signature_.argumentTypes()) {
determineUnboundedIntegerVariables(argType);
}
for (auto i = 0; i < formalArgsCnt; i++) {
TypePtr actualArg;
if (formalArgs[i].baseName() == "any") {
actualArg = randType();
} else {
actualArg = exec::SignatureBinder::tryResolveType(
- formalArgs[i], variables(), bindings_);
+ formalArgs[i], variables(), bindings_, integerBindings_);
VELOX_CHECK(actualArg != nullptr);
}
argumentTypes_.push_back(actualArg);
@@ -114,15 +208,19 @@ TypePtr ArgumentTypeFuzzer::fuzzReturnType() {
"Only fuzzing uninitialized return type is allowed.");

determineUnboundedTypeVariables();
- if (signature_.returnType().baseName() == "any") {
+ determineUnboundedIntegerVariables(signature_.returnType());
+
+ const auto& returnType = signature_.returnType();
+
+ if (returnType.baseName() == "any") {
returnType_ = randType();
- return returnType_;
} else {
returnType_ = exec::SignatureBinder::tryResolveType(
- signature_.returnType(), variables(), bindings_);
- VELOX_CHECK_NE(returnType_, nullptr);
- return returnType_;
+ returnType, variables(), bindings_, integerBindings_);
}
+
+ VELOX_CHECK_NOT_NULL(returnType_);
+ return returnType_;
}

} // namespace facebook::velox::test
4 changes: 4 additions & 0 deletions velox/expression/tests/utils/ArgumentTypeFuzzer.h
@@ -69,6 +69,8 @@ class ArgumentTypeFuzzer {
/// randomly generated type.
void determineUnboundedTypeVariables();

void determineUnboundedIntegerVariables(const exec::TypeSignature& type);

TypePtr randType();

/// Generates an orderable random type, including structs, and arrays.
@@ -83,6 +85,8 @@ class ArgumentTypeFuzzer {
/// Bindings between type variables and their actual types.
std::unordered_map<std::string, TypePtr> bindings_;

std::unordered_map<std::string, int> integerBindings_;

/// RNG to generate random types for unbounded type variables when necessary.
std::mt19937& rng_;
};