Description
This is:
- [ ] a bug report
- [X] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)
Preparation of Excel function implementations allowing them to be used in "array formulae".
Typically in MS Excel, arguments passed to a function are simple "scalar" values, e.g.
=ABS(-3)
But MS Excel does allow for that argument passed to a function also to be a matrix/array of values:
=ABS({-3, 2.5, -1; 0, -1, -2})
This is still considered a single argument, but the argument is a matrix rather than a simple "scalar". In this case, MS Excel will return a matrix with the same dimensions, where the function has been applied to each individual "scalar" value in that matrix. In the above case, this will result in a response of
{3, 2.5, 1; 0, 1, 2}
That result may then be processed by other Excel functions in the formula (e.g. MAX()
) to reduce it to a single value; or it may be displayed across multiple cells.
Currently, if we write a cell with the formula =MAX(ABS({-3, 2.5, -1; 0, -1, -12}))
, the Calculation Engine will make the call to ABS()
with the array of arguments, but will discard all but the first value before determining the absolute of that value, which gives a result of 3
(the absolute value of the first array argument of -3
); so the subsequent call to MAX()
will only receive the value 3
, giving a final (incorrect) result of 3
.
Evaluation Log:
Array
(
[0] => Testing cache value for cell Formula Test!A1
[1] => Evaluating formula for cell Formula Test!A1
[2] => Formula for cell Formula Test!A1 is MAX(ABS({-3, 2.5, -1; 0, -1, -12}))
[3] => Formula Test!A1 => Evaluating Negation of 3
[4] => Formula Test!A1 => Evaluation Result is an integer number with a value of -3
[5] => Formula Test!A1 => Evaluating Negation of 1
[6] => Formula Test!A1 => Evaluation Result is an integer number with a value of -1
[7] => Formula Test!A1 => Evaluating Negation of 1
[8] => Formula Test!A1 => Evaluation Result is an integer number with a value of -1
[9] => Formula Test!A1 => Evaluating Negation of 12
[10] => Formula Test!A1 => Evaluation Result is an integer number with a value of -12
[11] => Formula Test!A1 => Evaluating Function ABS() with 1 argument
[12] => Formula Test!A1 => Evaluating ABS( { -3, 2.5, -1; 0, -1, -12 } )
[13] => Formula Test!A1 => Evaluation Result for ABS() function call is an integer number with a value of 3
[14] => Formula Test!A1 => Evaluating Function MAX() with 1 argument
[15] => Formula Test!A1 => Evaluating MAX( 3 )
[16] => Formula Test!A1 => Evaluation Result for MAX() function call is an integer number with a value of 3
)
With these changes, ABS()
evaluates every entry in the array, and returns an array of absolute values that is then passed to MAX()
, so the call to max has the full set of values (3, 2., 1, 1, 12) and can identify the correct maximum value of 12
.
Evaluation Log:
Array
(
[0] => Testing cache value for cell Formula Test!A1
[1] => Evaluating formula for cell Formula Test!A1
[2] => Formula for cell Formula Test!A1 is MAX(ABS({-3, 2.5, -1; 0, -1, -12}))
[3] => Formula Test!A1 => Evaluating Negation of 3
[4] => Formula Test!A1 => Evaluation Result is an integer number with a value of -3
[5] => Formula Test!A1 => Evaluating Negation of 1
[6] => Formula Test!A1 => Evaluation Result is an integer number with a value of -1
[7] => Formula Test!A1 => Evaluating Negation of 1
[8] => Formula Test!A1 => Evaluation Result is an integer number with a value of -1
[9] => Formula Test!A1 => Evaluating Negation of 12
[10] => Formula Test!A1 => Evaluation Result is an integer number with a value of -12
[11] => Formula Test!A1 => Evaluating Function ABS() with 1 argument
[12] => Formula Test!A1 => Evaluating ABS( { -3, 2.5, -1; 0, -1, -12 } )
[13] => Formula Test!A1 => Evaluation Result for ABS() function call is a matrix with a value of { 3, 2.5, 1; 0, 1, 12 }
[14] => Formula Test!A1 => Evaluating Function MAX() with 1 argument
[15] => Formula Test!A1 => Evaluating MAX( { 3, 2.5, 1; 0, 1, 12 } )
[16] => Formula Test!A1 => Evaluation Result for MAX() function call is an integer number with a value of 12
)
This series of changes will ensure that arrays are returned by function calls when appropriate (rather than single values), and are then passed correctly through the Calculation Engine call stack as arrays rather than single scalar values. In the case of our formula MAX(ABS({-3, 2.5, -1; 0, -1, -12}))
, it will allow a correct result of 12
rather than an incorrect result of 3
to be returned.
It will not change the way that a cell handles the result of a matrix being returned to the getCalculatedValue()
call; it will still be reduced to the value of the first entry in that matrix. Handling for that change is ongoing as part of PR #2539, which includes a BC break, meaning that it will only be released with PhpSpreadsheet 2.0.0.
However, in addition to resolving returns from the Excel function implementations so that values aren't lost in a chain of calls, this is also preparatory work for that final PR #2539 handling of array results. This change ensures that a call passing an array to an Excel function results in the return of an array from that function so that the final cell-level handling in PR #2539 receives an array when it should.
Example of an existing function (The Math/Trig ABS() function)
This is relatively straightforward to implement for functions that only accept a single argument: the following shows the changes required for a typical function.
Current implementation:
class Absolute
{
/**
* ABS.
*
* Returns the result of builtin function abs after validating args.
*
* @param mixed $number Should be numeric,
*
* @return float|int|string Rounded number
*/
public static function evaluate($number)
{
try {
$number = Helpers::validateNumericNullBool($number);
} catch (Exception $e) {
return $e->getMessage();
}
return abs($number);
}
}
Updated (array function enabled implementation)
class Absolute
{
use ArrayEnabled;
/**
* ABS.
*
* Returns the result of builtin function abs after validating args.
*
* @param mixed $number Should be numeric, or can be an array of numbers
*
* @return array|float|int|string rounded number
* If an array of numbers is passed as the argument, then the returned result will also be an array
* with the same dimensions
*/
public static function evaluate($number)
{
if (is_array($number)) {
return self::evaluateSingleArgumentArray([self::class, __FUNCTION__], $number);
}
try {
$number = Helpers::validateNumericNullBool($number);
} catch (Exception $e) {
return $e->getMessage();
}
return abs($number);
}
}
Changes to the unit tests to verify correct behaviour when functions are enabled for use in array formulae
/**
* @dataProvider providerAbsArray
*/
public function testAbsoluteArray(array $expectedResult, string $array): void
{
$calculation = Calculation::getInstance();
$formula = "=ABS({$array})";
$result = $calculation->_calculateFormulaValue($formula);
self::assertEqualsWithDelta($expectedResult, $result, 1.0e-14);
}
public function providerAbsArray(): array
{
return [
'row vector' => [[[1, 0, 1]], '{-1, 0, 1}'],
'column vector' => [[[1], [0], [1]], '{-1; 0; 1}'],
'matrix' => [[[1, 0], [1, 12]], '{-1, 0; 1, -12}'],
];
}
Example of an existing function that accepts multiple arguments (The Math/Trig ATAN2() function)... (comments deleted to reduce code noise)
Current implementation:
class Tangent
{
/**
* ATAN2.
*
* Excel Function:
* ATAN2(xCoordinate,yCoordinate)
*
* @param mixed $xCoordinate should be float, the x-coordinate of the point
* @param mixed $yCoordinate should be float, the y-coordinate of the point
*
* @return float|string the inverse tangent of the specified x- and y-coordinates, or a string containing an error
*/
public static function atan2($xCoordinate, $yCoordinate)
{
try {
$xCoordinate = Helpers::validateNumericNullBool($xCoordinate);
$yCoordinate = Helpers::validateNumericNullBool($yCoordinate);
} catch (Exception $e) {
return $e->getMessage();
}
if (($xCoordinate == 0) && ($yCoordinate == 0)) {
return Functions::DIV0();
}
return atan2($yCoordinate, $xCoordinate);
}
}
Updated (array function enabled implementation)
class Tangent
{
use ArrayEnabled;
/**
* ATAN2.
*
* Excel Function:
* ATAN2(xCoordinate,yCoordinate)
*
* @param mixed $xCoordinate should be float, the x-coordinate of the point, or can be an array of numbers
* @param mixed $yCoordinate should be float, the y-coordinate of the point, or can be an array of numbers
*
* @return array|float|string
* The inverse tangent of the specified x- and y-coordinates, or a string containing an error
* If an array of numbers is passed as one of the arguments, then the returned result will also be an array
* with the same dimensions
*/
public static function atan2($xCoordinate, $yCoordinate)
{
if (is_array($xCoordinate) || is_array($yCoordinate)) {
return self::evaluateArrayArguments([self::class, __FUNCTION__], $xCoordinate, $yCoordinate);
}
try {
$xCoordinate = Helpers::validateNumericNullBool($xCoordinate);
$yCoordinate = Helpers::validateNumericNullBool($yCoordinate);
} catch (Exception $e) {
return $e->getMessage();
}
if (($xCoordinate == 0) && ($yCoordinate == 0)) {
return Functions::DIV0();
}
return atan2($yCoordinate, $xCoordinate);
}
}
Changes to the unit tests to verify correct behaviour when functions are enabled for use in array formulae
/**
* @dataProvider providerAtan2Array
*/
public function testAtan2Array(array $expectedResult, string $argument1, string $argument2): void
{
$calculation = Calculation::getInstance();
$formula = "=ATAN2({$argument1},{$argument2})";
$result = $calculation->_calculateFormulaValue($formula);
self::assertEqualsWithDelta($expectedResult, $result, 1.0e-14);
}
public function providerAtan2Array(): array
{
return [
'first argument row vector' => [
[[1.81577498992176, 1.17600520709514]],
'{-0.75, 1.25}',
'3',
],
'first argument column vector' => [
[[1.17600520709514], [0.98279372324733]],
'{1.25; 2}',
'3',
],
'first argument matrix' => [
[[2.03444393579570, 1.48765509490646], [1.57079632679490, 1.24904577239825]],
'{-1.5, 0.25; 0, 1}',
'3',
],
'second argument row vector' => [
[[-0.24497866312686, 0.39479111969976]],
'3',
'{-0.75, 1.25}',
],
'second argument column vector' => [
[[0.39479111969976], [0.58800260354757]],
'3',
'{1.25; 2}',
],
'second argument matrix' => [
[[-0.46364760900081, 0.08314123188844], [0.0, 0.32175055439664]],
'3',
'{-1.5, 0.25; 0, 1}',
],
'A row and a column vector' => [
[
[-2.21429743558818, 2.81984209919315, 2.55359005004223, 1.92956699706547],
[-1.69515132134166, 2.03444393579570, 1.81577498992176, 1.63321513679085],
[-1.01219701145133, 0.38050637711237, 0.67474094222355, 1.26791145841993],
[-0.51914611424652, 0.14189705460416, 0.27829965900511, 0.85196632717327],
],
'{-1.5; -0.25; 1.25; 3.5}',
'{-2, 0.5, 1, 4}',
],
'Two row vectors' => [
[[-2.21429743558818, 2.03444393579570, 0.67474094222355, 0.85196632717327]],
'{-1.5, -0.25, 1.25, 3.5}',
'{-2, 0.5, 1, 4}',
],
'Two column vectors' => [
[[-2.21429743558818], [2.03444393579570], [0.67474094222355], [0.85196632717327]],
'{-1.5; -0.25; 1.25; 3.5}',
'{-2; 0.5; 1; 4}',
],
];
}
The function test for ROUND()
contains a lot more variations on different combinations and sizes of arrays.
Functions that accept arguments using the splat operator, or more than 2 arguments, need to be assessed on a case by case basis. However, for the example of WORKDAYS()
, which accepts two "static" arguments, then uses the splat operator to accept additional arguments, the evaluateArrayArgumentsSubset()
method allows the call to indicate the static arguments that could be arrays, but to process the array of trailing arguments accepted by the method "normally":
public static function date($startDate, $endDays, ...$dateArgs)
{
if (is_array($startDate) || is_array($endDays)) {
return self::evaluateArrayArgumentsSubset(
[self::class, __FUNCTION__],
2,
$startDate,
$endDays,
...$dateArgs
);
}
...
}
We only check to see if the "static" argument values are arrays; and we call evaluateArrayArgumentsSubset()
with the additional limit
argument, that tells the code logic that only the first two arguments should be processed for the purposes of array testing.
Why is this required?
Besides fixing some basic array formulae when passed through the call stack in the Calculation engine (as described above):
One of the planned new features for PhpSpreadsheet 2.0 is support for array formulae, including the new array functions like SEQUENCE()
, SORT()
, FILTER()
, etc; and also the new Spill
and Single
operators.
While the PhpSpreadsheet operators already support Excel matrix/array handling, most of the function implementations don't yet; so this is preparation work in anticipation of providing full support for array formulae in version 2.0.
This is linked to the work ongoing in PR #2539 on branch CalculationEngine-Array-Formulae-Initial-Work
These changes to the function implementations can be done ahead in the current codebase ahead of the 2.0 release. The existing Calculation Engine will simply discard all but the very first scalar value from any matrix of values returned by an Excel function: that is already existing behaviour; but it may provide correct results when using arrays as arguments when the current implementation does not (as in the case of our =MAX(ABS({-3, 2.5, -1; 0, -1, -12}))
example).
Examples of Excel array functions, and the proposed support/implementation can be found in the documentation