Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for DECIMAL types to ArgumentTypeFuzzer #9373

Closed

Conversation

mbasmanova
Copy link
Contributor

@mbasmanova mbasmanova commented Apr 5, 2024

Summary:
ArgumentTypeFuzzer serves two purposes.

Given a generic function signature, generate a random valid return type.
Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function:

map(K, V) -> array(K). 

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind the type variable: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
If (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

(decimal(p, s),...) -> decimal(p, s)

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

(decimal(p, s), decimal(p, s)) -> boolean

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}.

Let’s also look at a the signature of the ‘floor’ function:

(decimal(p, s)) -> decimal(rp, 0)

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

rp  = min(38, p - s + min(s, 1))

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2).

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Differential Revision: D55772808

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 5, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

Copy link

netlify bot commented Apr 5, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 9fe0e14
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/661499ed3f66f50008217958

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Apr 5, 2024
…r#9373)

Summary:

ArgumentTypeFuzzer serves two purposes.

Given a generic function signature, generate a random valid return type.
Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function: 

  map(K, V) -> array(K). 

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind the type variable: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

   ArgumentTypeFuzzer fuzzer(signature, rng)
   auto returnType = fuzzer.fuzzReturnType()

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
If (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

 	(decimal(p, s),...) -> decimal(p, s)

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

  (decimal(p, s), decimal(p, s)) -> boolean

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. 

Let’s also look at a the signature of the ‘floor’ function:

  (decimal(p, s)) -> decimal(rp, 0)

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

  rp  = min(38, p - s + min(s, 1))

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). 

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Differential Revision: D55772808
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

@mbasmanova
Copy link
Contributor Author

@rui-mo Rui, I tried to extract ArgumentTypeFuzzer changes needed to support DECIMAL types in the fuzzer. If would be great if you could take a look and share your thoughts? Does this approach make sense to you? Can we do something better? Looking forward to hearing from you.

An e2e sketch is available in #9358 for reference.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

Copy link
Contributor

@bikramSingh91 bikramSingh91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just have a few nits. Thank you for the detailed summary, was really helpful!

@@ -43,6 +43,12 @@ class ReverseSignatureBinder : private SignatureBinderBase {
return typeVariablesBindings_;
}

/// Return the integer bindings produced by 'tryBind'. This function should be
/// called after 'tryBind' and only if 'tryBind' returns true.
const std::unordered_map<std::string, int>& integerBindings() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it make sense to add booleans that capture this state and enforce this condition to ensure the client does not mis-use it?

This function should be called after 'tryBind' and only if 'tryBind' returns true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bikramSingh91 Good idea. Added tryBindSucceeded_.

EXPECT_EQ(22, getDecimalPrecisionScale(*returnType).first);

// Return type can only be DECIMAL(10, 7).
signature = exec::FunctionSignatureBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test case where there are multiple variables for precision or scale like decimal(p1,s1), decimal(p2,s2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bikramSingh91 Happy to add more test cases, but would you clarify a bit more of what kind of test you have in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sanity check that it handles cases where the signature has more than one precision and scale integer variables like:

  auto signature = exec::FunctionSignatureBuilder()
                       .integerVariable("p1")
                       .integerVariable("s1")
                       .integerVariable("p2")
                       .integerVariable("s2")
                       .returnType("boolean")
                       .argumentType("decimal(p1,s1)")
                       .argumentType("decimal(p2,s2)")
                       .build();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bikramSingh91 Got it. Added

  // Multiple pairs of precision and scale variables.
  signature = exec::FunctionSignatureBuilder()
                  .integerVariable("p1")
                  .integerVariable("s1")
                  .integerVariable("p2")
                  .integerVariable("s2")
                  .returnType("bigint")
                  .argumentType("decimal(p1,s1)")
                  .argumentType("decimal(p2,s2)")
                  .argumentType("decimal(p1,s1)")
                  .argumentType("decimal(p2,s2)")
                  .build();
  argTypes = fuzzArgumentTypes(*signature, BIGINT());
  ASSERT_EQ(4, argTypes.size());
  EXPECT_TRUE(argTypes[0]->isDecimal());
  EXPECT_TRUE(argTypes[1]->isDecimal());
  EXPECT_EQ(argTypes[0]->toString(), argTypes[2]->toString());
  EXPECT_EQ(argTypes[1]->toString(), argTypes[3]->toString());

}

if (isPositiveInteger(it->second.constraint())) {
const auto value = std::stoi(it->second.constraint());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to throw incase there are constraints other than positive constants? This would ensure someone does not unknowingly misuses it.
Should we also mention this limitation in the header file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bikramSingh91 I'm not sure I understand. In most cases constraints are not constant, but rather expressions: s3 = max(s1, s2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not designed to fuzz arguments if a return type is provided and there are complex constraints, it is possible that a client can misuse it by providing a return type then calling fuzzArgumentTypes where L88 would return false and eventually return std::nullopt at L94 and p and s would be randomly assigned values. The client will never know that the complex constraint might not hold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuzzArgumentTypes uses ReverseSignatureBinder to bind type and integer variables using return type. ReverseSignatureBinder::tryBind returns false if the signature has constraints.

    exec::ReverseSignatureBinder binder{signature_, returnType_};
    if (!binder.tryBind()) {
      return false;
    }


bool ReverseSignatureBinder::tryBind() {
  if (hasConstrainedIntegerVariable(signature_.returnType())) {
    return false;
  }
  return SignatureBinderBase::tryBind(signature_.returnType(), returnType_);
}

returnType_ = randType();
return returnType_;
} else {
returnType_ = exec::SignatureBinder::tryResolveType(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, does tryResolveType apply the integer constraints when it generates the return type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova Masha, thanks! This approach avoids hard-coding, and is better from my view. Just added several minor comments.

auto tryFixedBinding = [&](const auto& name) -> std::optional<int> {
auto it = variables().find(name);
if (it == variables().end()) {
return std::stoi(name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check isPositiveInteger for name to make sure it is integer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added check:

      VELOX_CHECK(
          isPositiveInteger(name),
          "Precision and scale of a decimal type must refer to a variable "
          "or specify a position integer constant: {}",
          name)

}

if (s.has_value()) {
p = std::max(1, s.value());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 1 could be replaced with kMinPrecision.

.integerVariable("p")
.integerVariable("s")
.returnType("decimal(p,s)")
.argumentType("decimal(p,s)")
Copy link
Collaborator

@rui-mo rui-mo Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add test case for varargs like the signature used by greatest and least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Added

  signature = exec::FunctionSignatureBuilder()
                  .integerVariable("p")
                  .integerVariable("s")
                  .returnType("decimal(p,s)")
                  .argumentType("decimal(p,s)")
                  .argumentType("decimal(p,s)")
                  .variableArity()
                  .build();
  argTypes = fuzzArgumentTypes(*signature, DECIMAL(10, 7));
  ASSERT_LE(1, argTypes.size());
  for (const auto& argType : argTypes) {
    EXPECT_EQ(DECIMAL(10, 7)->toString(), argType->toString());
  }

@@ -68,26 +156,32 @@ bool ArgumentTypeFuzzer::fuzzArgumentTypes(uint32_t maxVariadicArgs) {
const auto& formalArgs = signature_.argumentTypes();
auto formalArgsCnt = formalArgs.size();

std::unordered_map<std::string, int> integerBindings;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this variable is not in use.

@@ -69,6 +69,8 @@ class ArgumentTypeFuzzer {
/// randomly generated type.
void determineUnboundedTypeVariables();

void determineUnboundedIntegerVariables(const exec::TypeSignature& type);
Copy link
Collaborator

@rui-mo rui-mo Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it seems only decimal type is supported. Do we need to document this in the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added comments:

  // Bind integer variables used in specified decimal(p,s) type signature to
  // randomly generated values if not already bound.
  // Noop if 'type' is not a decimal type signature.
  void determineUnboundedIntegerVariables(const exec::TypeSignature& type);

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Apr 8, 2024
…r#9373)

Summary:

ArgumentTypeFuzzer serves two purposes.

Given a generic function signature, generate a random valid return type.
Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function: 

```
  map(K, V) -> array(K). 
```
This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind the type variable: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

```
ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()
```

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
If (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

```
(decimal(p, s),...) -> decimal(p, s)
```

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

```
(decimal(p, s), decimal(p, s)) -> boolean
```

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. 

Let’s also look at a the signature of the ‘floor’ function:

```
(decimal(p, s)) -> decimal(rp, 0)
```

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

```
rp  = min(38, p - s + min(s, 1))
```

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). 

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Reviewed By: bikramSingh91

Differential Revision: D55772808
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

@mbasmanova
Copy link
Contributor Author

@rui-mo @bikramSingh91 Rui, Bikram, thank you for reviewing. I updated the code to address your comments. Would you like to take another look?

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Apr 8, 2024
…r#9373)

Summary:

ArgumentTypeFuzzer serves two purposes.

1. Given a generic function signature, generate a random valid return type.
2. Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function: 

```
  map(K, V) -> array(K)
```

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

```
ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()
```

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
if (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature.

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

```
(decimal(p, s),...) -> decimal(p, s)
```

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

```
(decimal(p, s), decimal(p, s)) -> boolean
```

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. 

Let’s also look at a the signature of the ‘floor’ function:

```
(decimal(p, s)) -> decimal(rp, 0)
```

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

```
rp  = min(38, p - s + min(s, 1))
```

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). 

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Reviewed By: bikramSingh91

Differential Revision: D55772808
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Apr 8, 2024
…r#9373)

Summary:

ArgumentTypeFuzzer serves two purposes.

1. Given a generic function signature, generate a random valid return type.
2. Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function: 

```
  map(K, V) -> array(K)
```

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

```
ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()
```

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
if (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature.

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

```
(decimal(p, s),...) -> decimal(p, s)
```

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

```
(decimal(p, s), decimal(p, s)) -> boolean
```

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. 

Let’s also look at a the signature of the ‘floor’ function:

```
(decimal(p, s)) -> decimal(rp, 0)
```

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

```
rp  = min(38, p - s + min(s, 1))
```

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). 

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Reviewed By: bikramSingh91

Differential Revision: D55772808
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Apr 9, 2024
…r#9373)

Summary:

ArgumentTypeFuzzer serves two purposes.

1. Given a generic function signature, generate a random valid return type.
2. Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function: 

```
  map(K, V) -> array(K)
```

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

```
ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()
```

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
if (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature.

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

```
(decimal(p, s),...) -> decimal(p, s)
```

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

```
(decimal(p, s), decimal(p, s)) -> boolean
```

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. 

Let’s also look at a the signature of the ‘floor’ function:

```
(decimal(p, s)) -> decimal(rp, 0)
```

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

```
rp  = min(38, p - s + min(s, 1))
```

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). 

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Reviewed By: bikramSingh91

Differential Revision: D55772808
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

…r#9373)

Summary:

ArgumentTypeFuzzer serves two purposes.

1. Given a generic function signature, generate a random valid return type.
2. Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function: 

```
  map(K, V) -> array(K)
```

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

```
ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()
```

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
if (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature.

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

```
(decimal(p, s),...) -> decimal(p, s)
```

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

```
(decimal(p, s), decimal(p, s)) -> boolean
```

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}. 

Let’s also look at a the signature of the ‘floor’ function:

```
(decimal(p, s)) -> decimal(rp, 0)
```

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

```
rp  = min(38, p - s + min(s, 1))
```

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2). 

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Reviewed By: bikramSingh91

Differential Revision: D55772808
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D55772808

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Masha, thanks. Just added two nits.

}

/// Returns true only if 'str' contains digits.
bool isPositiveInteger(const std::string& str) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there are several functions defined as isPositiveInteger and they are the same. Do we consider extracting it out for reuse? If yes, I'm glad to work on this change in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo Rui, indeed. I think we now have 2 places where we copy-paste isPositiveInteger. It would be nice to refactor. Thank you for helping with that.

/// Return whether there is a constraint on an integer variable in type
/// signature.
// Return whether there is a constraint on an integer variable in type
// signature.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is still needed.

ArgumentTypeFuzzer::fuzzArgumentTypes uses ReverseSignatureBinder to bind variables based on return type. This works well for functions like least and greatest where return type re-uses argument types as is. However, this doesn't work when return type is computed from argument types using formulas. In this case, we want ArgumentTypeFuzzer::fuzzArgumentTypes to "fail" to fuzz arguments and fallback to custom argument type generators.

Does this make sense? Do you have something else in mind?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not put it clearly. I was trying to say the change from /// to // in the comment. Thanks for the explain!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo Sorry, I misunderstood. /// should be used for public APIs. // for private methods. This could be a separate change though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I understand this change now.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in b719046.

Copy link

Conbench analyzed the 1 benchmark run on commit b719046d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…r#9373)

Summary:
Pull Request resolved: facebookincubator#9373

ArgumentTypeFuzzer serves two purposes.

1. Given a generic function signature, generate a random valid return type.
2. Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function:

```
  map(K, V) -> array(K)
```

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

```
ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()
```

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
if (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature.

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

```
(decimal(p, s),...) -> decimal(p, s)
```

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

```
(decimal(p, s), decimal(p, s)) -> boolean
```

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}.

Let’s also look at a the signature of the ‘floor’ function:

```
(decimal(p, s)) -> decimal(rp, 0)
```

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

```
rp  = min(38, p - s + min(s, 1))
```

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2).

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Reviewed By: bikramSingh91

Differential Revision: D55772808

fbshipit-source-id: 708f202fc7270aaeaa59e28175aea5147b4a7981
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants