
Add flatten function #112

Open · oscarbenjamin wants to merge 1 commit into main
Conversation

oscarbenjamin (Owner)

CC @certik

Adds a flatten method that can be used for common simplifications involving Add, Mul, Pow and Integer:

In [19]: from protosym.simplecas import x, y, sin

In [20]: e = x + x + y*y + 2*sin(x) + 3*sin(x)

In [21]: e
Out[21]: ((((x + x) + (y*y)) + (2*sin(x))) + (3*sin(x)))

In [22]: e.flatten()
Out[22]: ((2*x) + y**2 + (5*sin(x)))

Although I want to have a flatten function, I don't want it to be used by default. Some of these simplifications are unambiguously good but others are not, which is why this sort of thing needs to be controlled.

The code implementing flatten is not great. I just put it together quickly while I think of a better way of structuring it.

The flatten function is used by default inside diff but can also be disabled:

In [23]: sin(x).diff(x, 4)
Out[23]: sin(x)

In [24]: sin(x).diff(x, 4, flatten=False)
Out[24]: (-1*(-1*sin(x)))

I don't think I want this function to be used by default in diff. A much more limited version of this would be good because the current behaviour on main gives incorrect derivatives after a few rounds of differentiation:

In [2]: from protosym.simplecas import x, y, sin

In [3]: (x**2).diff(x)
Out[3]: (2*x**(2 + -1))

In [4]: (x**2).diff(x, 2)
Out[4]: (2*((2 + -1)*x**((2 + -1) + -1)))

In [5]: (x**2).diff(x, 3)
Out[5]: (2*((2 + -1)*(((2 + -1) + -1)*x**(((2 + -1) + -1) + -1))))

All that is needed to fix that though is flattening an Add of only integers (constant folding) and applying Pow(x, 1) -> x (which is an unambiguously good simplification). Probably just a specialised derivative rule for diff(x**2, x) -> Mul(2, x) could do that. I think that I will try a more limited version of this PR that only has those two rules: constant folding and handling x**2 better in differentiation.
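Roughly, those two rules on their own could look something like the sketch below. This is only an illustration over a hypothetical (head, args) tuple representation of expressions with plain Python ints for integers; it is not the actual protosym tree API.

def fold_rules(expr):
    """Apply constant folding in Add and the Pow(x, 1) -> x rule, bottom up."""
    if not isinstance(expr, tuple):
        return expr  # an atom: a symbol name or an int
    head, *args = expr
    args = [fold_rules(a) for a in args]
    if head == "Add":
        # Constant folding: combine all integer arguments into one integer.
        const = sum(a for a in args if isinstance(a, int))
        rest = [a for a in args if not isinstance(a, int)]
        if not rest:
            return const
        args = rest + ([const] if const != 0 else [])
        if len(args) == 1:
            return args[0]
    elif head == "Pow" and args[1] == 1:
        # Pow(x, 1) -> x is an unambiguously good simplification.
        return args[0]
    return (head, *args)

# Example: ("Pow", "x", ("Add", 2, -1)) folds to ("Pow", "x", 1) and then to "x".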

This PR would fix that but also does too much:

In [2]: (x**2).diff(x)
Out[2]: (2*x)

In [3]: (x**2).diff(x, 2)
Out[3]: 2

In [4]: (x**2).diff(x, 3)
Out[4]: 0

This makes this particular benchmark faster, e.g. on main (without this PR) we have (not using rust_protosym):

In [4]: from protosym.simplecas import x, y, sin

In [5]: %time ok = (x**10*sin(x)).diff(x, 100)
CPU times: user 3.08 s, sys: 2.16 ms, total: 3.08 s
Wall time: 3.08 s

In [6]: ok.count_ops_tree()
Out[6]: 3462587614523408610188244805484543

In [7]: ok.count_ops_graph()
Out[7]: 10506

However with this PR we have (not using rust_protosym):

In [2]: %time ok = (x**10*sin(x)).diff(x, 100) # this can be made faster
CPU times: user 217 ms, sys: 0 ns, total: 217 ms
Wall time: 216 ms

In [3]: ok.count_ops_graph()
Out[3]: 43

In [4]: ok.count_ops_tree()
Out[4]: 72

In [5]: ok
Out[5]: ((-62815650955529472000*sin(x)) + (-6902818786321920000*x*cos(x)) + (337637875417920000*x**2*sin(x)) + (9681372771840000*x**3*cos(x)) + (-180238322880000*x**4*sin(x)) + (-2276694604800*x**5*cos(x)) + (19762974000*x**6*sin(x)) + (116424000*x**7*cos(x)) + (-445500*x**8*sin(x)) + (-1000*x**9*cos(x)) + (x**10*sin(x)))

In this example flattening is helpful because all derivatives of this expression can be reduced down. However this is not actually a common case in practice because it is a high order derivative of a simple expression that happens to simplify. The best algorithm for a case like this would be something specialised to high-order derivatives. In practice what we want from an efficient implementation of diff is mostly to be able to efficiently compute 1st and 2nd order derivatives of large expressions and in most cases the sort of reduction seen for high derivatives of x**10*sin(x) will not be seen.

The reason that I use benchmarks like sin(sin(sin(sin(sin(sin(x)))))).diff(x, 10) is that I know that the derivatives cannot simplify by much and so after a few iterations we will necessarily find ourselves differentiating large expressions. What I want to benchmark is how performance scales for those large expressions.

The problem with the approach in this PR is that some of the rules applied by flatten increase the size of the expression graph:

In [9]: e = sin(sin(sin(sin(sin(sin(x))))))

In [10]: e.diff(x, 5, flatten=True).count_ops_graph()
Out[10]: 563

In [11]: e.diff(x, 5, flatten=False).count_ops_graph()
Out[11]: 303

In [12]: e.diff(x, 5, flatten=True).count_ops_tree()
Out[12]: 21398

In [13]: e.diff(x, 5, flatten=False).count_ops_tree()
Out[13]: 79165

You can see here that the expression tree representation is made smaller by flattening but the expression graph representation is made larger. We generally want to work with the graph though because it is always smaller than the tree. These differences can all become more extreme:

In [17]: e.diff(x, 10, flatten=False).count_ops_graph()
Out[17]: 1948

In [18]: e.diff(x, 10, flatten=True).count_ops_graph()
Out[18]: 89637

In [11]: e.diff(x, 10, flatten=False).count_ops_tree()
Out[11]: 3623428786

In [12]: e.diff(x, 10, flatten=True).count_ops_tree()
Out[12]: 3883153

There are various improvements that can be made to the flattening here. SymEngine's flattening actually computes a smaller derivative expression. The difference is most likely handling things like a*b + b*a -> 2*a*b which I haven't implemented here yet. These are the numbers for SymEngine's derivative of the above expression (after converting to protosym via SymPy which might have altered the expression):

In [25]: e_sym = sympy.sympify(e_se)

In [26]: from protosym.simplecas import x, y, sin, Expr

In [27]: e_proto = Expr.from_sympy(e_sym)

In [28]: e_proto.count_ops_tree()
Out[28]: 375582

In [29]: e_proto.count_ops_graph()
Out[29]: 8697

We can see here that while SymEngine's derivative has a smaller tree representation it still has a larger graph representation than the derivative computed by protosym without flattening. What I want to do here is to shrink that graph representation by identifying precisely the right simplification rules and apply only those rules.
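For reference, the a*b + b*a -> 2*a*b collection mentioned above is essentially a dict from a canonicalised term to an integer coefficient. A rough sketch over a hypothetical tuple representation of terms (not protosym's API):

from collections import defaultdict

def collect_terms(add_args):
    """Collect repeated terms in an Add so that a*b + b*a becomes 2*a*b."""
    coeffs = defaultdict(int)
    for term in add_args:
        if isinstance(term, tuple) and term[0] == "Mul" and isinstance(term[1], int):
            coeff, factors = term[1], term[2:]
        else:
            coeff, factors = 1, (term,)
        key = tuple(sorted(factors, key=repr))  # order-insensitive key
        coeffs[key] += coeff
    return [("Mul", c, *k) if len(k) > 1 or c != 1 else k[0]
            for k, c in coeffs.items() if c != 0]

# collect_terms([("Mul", 1, "a", "b"), ("Mul", 1, "b", "a")]) -> [("Mul", 2, "a", "b")]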

certik commented Sep 19, 2023

Excellent. Make sure to benchmark it with rust as well and compare against symengine.

What I recommend doing is to construct a series of representative benchmarks. I don't know if evaluating to a single number is actually useful; I think these large expressions are used to generate code (either C or via LLVM) that then evaluates the expression at runtime for many different "x". So then you want it as small as possible.

oscarbenjamin (Owner, Author)

I think these large expressions are used to generate code (either C or via LLVM) that then evaluate the expression at runtime for many different "x". So then you want it as small as possible.

Exactly but it is the graph size that matters, not the tree size because the graph size is the size of the LLVM code. You can do this already with protosym using llvmlite:

In [1]: from protosym.simplecas import x, sin, lambdify

In [2]: e = sin(sin(sin(sin(sin(sin(x))))))

In [3]: %time ed = e.diff(x, 10, flatten=False)
CPU times: user 68.9 ms, sys: 3.11 ms, total: 72 ms
Wall time: 70.6 ms

In [4]: %time f = lambdify([x], e) # uses llvmlite
CPU times: user 38.8 ms, sys: 13.9 ms, total: 52.8 ms
Wall time: 51 ms

In [5]: f(1)
Out[5]: 0.5540163907556296

In [6]: %timeit f(1.0)
420 ns ± 0.893 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Here is the same with SymEngine and its LLVM backend:

In [1]: from symengine import sin, symbols, lambdify

In [2]: x = symbols('x')

In [3]: e = sin(sin(sin(sin(sin(sin(x))))))

In [5]: %time ed = e.diff(x, 10)
CPU times: user 233 ms, sys: 12.8 ms, total: 246 ms
Wall time: 245 ms

In [7]: %time f = lambdify([x], [e])
CPU times: user 7.27 ms, sys: 3.07 ms, total: 10.3 ms
Wall time: 10.3 ms

In [8]: f(1)
Out[8]: array(0.55401639)

In [9]: %timeit f(1.0)
7.35 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

So the protosym LLVM code times at about 0.4 microseconds vs the SymEngine one at about 7 microseconds. In this case the protosym-generated code is roughly 17x faster.

certik commented Sep 19, 2023

That's a good benchmark. It runs too fast; I don't know if SymEngine has some overhead from passing in numpy arrays, but this kind of benchmark is very good, we probably just need to make the expression longer. What is a graph and a tree? I don't understand the difference.

oscarbenjamin (Owner, Author)

I don't know if SymEngine has some overhead with passing in numpy arrays

I don't think that is the issue. It is just that the LLVM code is different. I don't know how to see the code generated by SymEngine but you can see the protosym generated code easily:

In [1]: from protosym.simplecas import x, sin, cos

In [2]: e = sin(sin(x)).diff(x, 2)

In [3]: e
Out[3]: ((((-1*sin(sin(x)))*cos(x))*cos(x)) + (cos(sin(x))*(-1*sin(x))))

In [4]: print(e.to_llvm_ir([x]))

; ModuleID = "mod1"
target triple = "unknown-unknown-unknown"
target datalayout = ""

declare double    @llvm.pow.f64(double %Val1, double %Val2)
declare double    @llvm.sin.f64(double %Val)
declare double    @llvm.cos.f64(double %Val)

define double @"jit_func1"(double %"x")
{
%".0" = call double @llvm.sin.f64(double %"x")
%".1" = call double @llvm.sin.f64(double %".0")
%".2" = fmul double 0xbff0000000000000, %".1"
%".3" = call double @llvm.cos.f64(double %"x")
%".4" = fmul double %".2", %".3"
%".5" = fmul double %".4", %".3"
%".6" = call double @llvm.cos.f64(double %".0")
%".7" = fmul double 0xbff0000000000000, %".0"
%".8" = fmul double %".6", %".7"
%".9" = fadd double %".5", %".8"
ret double %".9"
}

Notice that although the expression e looks like it has 7 calls to sin and cos, the LLVM code only calls sin and cos 4 times in total. The code is in CSE (common subexpression elimination) form. The expression graph is just a representation of this CSE form:

In [5]: from protosym.core.tree import forward_graph

In [6]: g = forward_graph(e.rep)

In [7]: for a in g.atoms: print(a)
-1
x

In [8]: for func, indices in g.operations: print(func, indices)
sin [1]
sin [2]
Mul [0, 3]
cos [1]
Mul [4, 5]
Mul [6, 5]
cos [2]
Mul [0, 2]
Mul [8, 9]
Add [7, 10]

You can see that this is the same as the LLVM code (which is generated directly from this g). This representation is of a directed acyclic graph rather than a tree. The representation is built from the bottom up starting from the atoms and then proceeding to the top-level expression one unique subexpression at a time.

Almost all operations in protosym are based on this forward graph representation. The main thing that the Rust backend does to speed things up is just to compute this graph more quickly. With the graph even numerical evaluation in Python can be faster.

We can query the size of the graph representation vs the tree representation:

In [9]: e.count_ops_graph()
Out[9]: 12

In [10]: e.count_ops_tree()
Out[10]: 20

The difference between these two numbers is the CSE saving. Evaluating the expression recursively from the top would require 20 operations (counting each atom as an operation as well). Evaluating the forward graph requires 12 operations including the atoms.
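To make the evaluation order concrete, here is a rough sketch of evaluating such a forward graph bottom up. It is a hypothetical helper (assuming the atoms are plain numbers or symbol names supplied in a dict), not protosym's implementation, but the walk is the same: one value per node, each operation referring back to earlier entries by index, so every repeated subexpression is computed exactly once.

import math

def eval_forward_graph(atoms, operations, values):
    funcs = {"Add": lambda *a: sum(a), "Mul": lambda *a: math.prod(a),
             "sin": math.sin, "cos": math.cos}
    stack = [values.get(a, a) for a in atoms]  # atom values (numbers pass through)
    for func, indices in operations:
        stack.append(funcs[func](*(stack[i] for i in indices)))
    return stack[-1]  # the value of the top-level expression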

Protosym computes symbolic derivatives using this forward graph representation and the forward accumulation algorithm that is sometimes used for computing numeric derivatives in automatic differentiation:
https://en.wikipedia.org/wiki/Automatic_differentiation#Forward_and_reverse_accumulation
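protosym does this symbolically, but the idea is easy to see numerically: propagate a (value, derivative) pair through the same graph in one bottom-up pass. A rough numeric sketch (a hypothetical helper, assuming binary Mul nodes as in the graph above):

import math

def forward_diff(atoms, operations, values, wrt):
    pairs = []
    for a in atoms:
        v = values.get(a, a)                      # numeric value of the atom
        pairs.append((v, 1.0 if a == wrt else 0.0))
    for func, indices in operations:
        args = [pairs[i] for i in indices]
        if func == "sin":
            (v, d), = args
            pairs.append((math.sin(v), math.cos(v) * d))
        elif func == "cos":
            (v, d), = args
            pairs.append((math.cos(v), -math.sin(v) * d))
        elif func == "Add":
            pairs.append((sum(v for v, _ in args), sum(d for _, d in args)))
        elif func == "Mul":                       # product rule, binary Mul
            (v1, d1), (v2, d2) = args
            pairs.append((v1 * v2, d1 * v2 + v1 * d2))
    return pairs[-1]  # (value, derivative) of the top-level expression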

The graph is never larger than the corresponding tree and it can be a lot smaller for large expressions that have many repeating subexpressions. We always want to use this graph representation for numerical evaluation, differentiation, substitution etc. In protosym the only situation where the tree representation is used is when building expressions like sin(sin(x)).

The automatic simplifications that SymPy and SymEngine perform often look like they make the expression simpler if you focus only on the tree form, but we should also be looking at what they do to the graph form, because that is what we should be using for many computationally intensive operations.

certik commented Sep 19, 2023

I see, so a graph is a tree after CSE (and possibly some other simplifications) has been performed on it?

SymEngine also has CSE, so you can try it and compare it.

We should be benchmarking it on some expression from PyDy. I think nested trig functions like sin(sin(x)) are usually quite rare, aren't they?

oscarbenjamin (Owner, Author)

I think nested trig functions like sin(sin(x)) are usually quite rare, aren't they

They are, but you can think of these functions as placeholders for any operation. The only functions in protosym are sin and cos, so all demonstrations use those! I haven't added more functions because you can easily add your own, and this is a key point of the design:

In [24]: from protosym.simplecas import Function, diff, a, x

In [25]: tan = Function('tan')

In [26]: diff[tan(a), a] = 1 + tan(a)**2

In [27]: tan(x**2).diff(x)
Out[27]: ((1 + tan(x**2)**2)*(2*x**(2 + -1)))

In [28]: %time ok = tan(tan(tan(x))).diff(x, 10)
CPU times: user 95.6 ms, sys: 2.58 ms, total: 98.2 ms
Wall time: 95.5 ms

Adding your own functions is important because the intention is that protosym would not define any functions but would rather just be a framework that can be built on top of. E.g. SymPy could depend on protosym but could then define all the functions that it wants in the sympy codebase. It is important to understand here that the protosym.simplecas part of protosym is only intended to be a demonstration of how to use protosym.core. In the future protosym.simplecas would perhaps not be there. That sort of code should all be in SymPy or in something else that depends on protosym. It is not intended that protosym itself would ever be something that resembles SymPy or SymEngine.

Here is an example with just polynomial expressions, which is probably closer to the PyDy examples. In this case the polynomials are too large to be expanded the way that Flint/polys would expand them (timings without rust_protosym):

In [2]: from protosym.simplecas import Function, diff, a, x

In [3]: def make_expression(x, n):
   ...:     e = x**2 + x
   ...:     for _ in range(n):
   ...:         e = e**2 + e
   ...:     return e
   ...: 

In [4]: make_expression(x, 1)
Out[4]: ((x**2 + x)**2 + (x**2 + x))

In [5]: make_expression(x, 2)
Out[5]: (((x**2 + x)**2 + (x**2 + x))**2 + ((x**2 + x)**2 + (x**2 + x)))

In [6]: make_expression(x, 3)
Out[6]: ((((x**2 + x)**2 + (x**2 + x))**2 + ((x**2 + x)**2 + (x**2 + x)))**2 + (((x**2 + x)**2 + (x**2 + x))**2 + ((x**2 + x)**2 + (x**2 + x))))

In [7]: e = make_expression(x, 100)

In [8]: e.count_ops_graph()
Out[8]: 204

In [9]: e.count_ops_tree()
Out[9]: 10141204801825835211973625643005

In [10]: %time ed = e.diff(x)
CPU times: user 10.2 ms, sys: 7.94 ms, total: 18.1 ms
Wall time: 18.1 ms

In [11]: %time ed.eval_f64({x:.001})
CPU times: user 3.31 ms, sys: 0 ns, total: 3.31 ms
Wall time: 3.32 ms
Out[11]: 1.2368849025211697

In [11]: from symengine import symbols

In [12]: x = symbols('x')

In [13]: e = make_expression(x, 100)

In [14]: %time ed = e.diff(x)
CPU times: user 14.2 ms, sys: 0 ns, total: 14.2 ms
Wall time: 14.7 ms

In [16]: %time ed.subs({x:.001})
CPU times: user 14.2 ms, sys: 3.36 ms, total: 17.6 ms
Wall time: 18 ms
Out[16]: 1.23688490252117

This is a contrived, extreme benchmark that favours CSE-style simplification, but it is also not completely unrealistic for the situations that arise in PyDy. Those are not situations where any simplification really helps: basically a large expression is built with symbolic arithmetic and it has products and powers that cannot be expanded. The expression always has large repeating subexpressions. They want to differentiate those large expressions and then evaluate numerically or do codegen.

Of course here I used make_expression(x, 100) but we can always do make_expression(x, 1000) etc:

In [12]: %time e = make_expression(x, 1000)
CPU times: user 91.6 ms, sys: 0 ns, total: 91.6 ms
Wall time: 89 ms

In [13]: %time e = make_expression(x, 10000)
CPU times: user 463 ms, sys: 3.3 ms, total: 466 ms
Wall time: 464 ms

In [14]: from symengine import symbols

In [15]: x = symbols('x')

In [16]: %time e = make_expression(x, 1000)
CPU times: user 110 ms, sys: 12.1 ms, total: 122 ms
Wall time: 120 ms

In [17]: %time e = make_expression(x, 10000)
^C---------------------------------------------------------------------------
KeyboardInterrupt 

I interrupted that last step because it seemed slow and I didn't want my computer to crash (as has happened previously when testing examples like this).

certik commented Sep 20, 2023

Yes, it's a good benchmark for custom functions like that.

Note that in LPython, I think we'll eventually support all functions, kind of like Mathematica does it (eventually thousands). For the PyDy use case, isn't it rare to need to create your own function with custom rules?

When I started SymPy, this was the first thing I did, to make it easy, but I think it's very rarely used. I never used it myself.

oscarbenjamin (Owner, Author)

For the PyDy use case, isn't it rare to need to create your own function with custom rules?

PyDy does not need to create its own functions but SymPy does. If SymPy depends on something else then it is problematic if all of the functions and the rules for those functions are defined somewhere else without SymPy having any control over them.

I never used it myself.

You might not, but SymPy's use as a library within the scientific Python ecosystem is significantly limited by not being easily extensible like this. There would be many more libraries building on top of SymPy if it was easier to do this and if those new expression types were fast: if SymPy does not make your custom symbolic expressions faster than writing your own implementation of Basic then you can just write your own. Plenty of libraries do that rather than building on top of SymPy or SymEngine because they are just not useful in that situation (SymPy is too slow and SymEngine is not extensible). This is why the Julia people decided that neither SymPy nor SymEngine was suitable for what they needed.

oscarbenjamin (Owner, Author)

PyDy does not need to create its own functions but SymPy does.

Actually this isn't true. PyDy does need to create its own functions for codegen etc. The functions would not be like sin or cos but rather things like LinearSolve(M, b) etc. Ideally it would be easy for them to do this to control their numeric workflow but it is not.

certik commented Sep 20, 2023

SymPy is extensible, you can create your own class. You probably mean custom simplification rules?

Regarding Julia, they wanted more flexibility, but when they stopped using SymEngine, they gave up performance, as my benchmarks show. You can definitely trade performance for flexibility. But the goal of SymEngine is maximum performance.

oscarbenjamin (Owner, Author)

SymPy is extensible, you can create your own class. You probably mean custom simplification rules?

Firstly you can't control what the existing types like Add do. Their behaviour is all hard-coded. Not being able to control what Add does then means that you can't easily control what f(x) + f(y) does for your own custom types either.

In the examples above where the graph size increases because of flatten, I think the main cause is actually rules like Add(Add(x, y), z) -> Add(x, y, z). This has the effect of reducing the number of common subexpressions: there might be many occurrences of Add(x, y), but after flattening we have lots of different subexpressions like Add(x, y, z), Add(x, y, t) etc.
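Here is a toy illustration of that effect, using tuples as stand-ins for expression nodes and the total number of argument slots over the set of distinct nodes as a crude proxy for graph size (not the same as count_ops_graph, it just shows the loss of sharing):

def total_args(nodes):
    return sum(len(n) - 1 for n in nodes)  # -1 for the head

k = 10  # ten sums that all contain the common part x + y
nested = {("Add", "x", "y")} | {("Add", ("Add", "x", "y"), f"v{i}") for i in range(k)}
flat = {("Add", "x", "y", f"v{i}") for i in range(k)}

print(total_args(nested))  # 2 + 2*k = 22: Add(x, y) is stored once and shared
print(total_args(flat))    # 3*k = 30: every flat sum repeats the x, y arguments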

Secondly the way to extend SymPy is far too difficult. Few libraries attempt it and in fact few SymPy developers can get it right either. The guide here is good but what it shows is that it is just much more difficult than it should be:
https://docs.sympy.org/latest/guides/custom-functions.html

Regarding Julia, they wanted more flexibility, but when they stopped using SymEngine, they gave up performance, as my benchmarks show. You can definitely trade performance for flexibility. But the goal of SymEngine is maximum performance.

They gave up performance on some things in favour of flexibility, but also to focus on performance in other areas. They are now building many more useful things on top of their symbolics than are built on top of SymEngine and SymPy, e.g.:
https://github.com/SciML/ModelingToolkit.jl

The question is: in what situations does it actually make sense to do the particular things that SymEngine would make faster?

If the goal is numerical evaluation then we want something like LLVM.

If the goal is solving linear equations symbolically then we want something like Flint.

If the goal is solving linear equations numerically then we want BLAS.

Fast differentiation is important and is where SymEngine can speed up a calculation that might otherwise be slow with SymPy. In the context of the mechanics module the situation is basically that we want to solve a system of linear equations, differentiate and then eventually evaluate numerically (actually it is more complicated but these are the basic steps).

The large expressions to be differentiated are generated by solving the system of linear equations symbolically. The differentiation is slow just because solving the system symbolically generates absurdly large expressions whose derivatives become even more absurdly large expressions. Actually though, when the end goal is numerical evaluation it would be better not to solve the equations symbolically: evaluate the matrices numerically and then use BLAS to solve the linear equations numerically. Doing it this way is both faster and more accurate. The question is then how you get the derivatives, and the answer is implicit differentiation.

We have these equations and we want x'(t):

M(t) x(t) = b(t)

So we solve and then differentiate:

x'(t) = (inv(M(t)) * b(t)).diff(t)

This is slow with SymPy and SymEngine makes it a lot faster but it is still quite slow and both SymPy and SymEngine generate absurdly large expressions when doing this. Computing inv(M(t)) * b(t) makes very large expressions and then differentiating them makes them explode even more.

Actually though this is not the right way to do the calculation. Instead we should use implicit differentiation

M'(t) x(t) + M(t) x'(t) = b'(t)

Now the combined equations become

[M(t)     0] [x(t) ]   [b(t) ]
[M'(t) M(t)] [x'(t)] = [b'(t)]

The expressions in this equation are all much smaller than in diff(inv(M(t)) * b(t), t) and the derivatives in this case often tend to simplify the (approximately polynomial) expressions rather than making them more complicated. The numeric part now is that we should numerically evaluate these matrices with LLVM and use BLAS to solve the linear equations and then probably use a numeric ODE solver or something. What this means is that the part that SymEngine was making faster (algebra and differentiation) is not so significant any more in this workflow.
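For illustration, once M(t), M'(t), b(t) and b'(t) can be evaluated numerically (e.g. by LLVM-compiled functions), the block system above is just a numerical linear solve. A minimal numpy sketch, where Mf, Mpf, bf and bpf are hypothetical compiled callables returning arrays:

import numpy as np

def solve_x_and_xprime(Mf, Mpf, bf, bpf, t):
    M, Mp = np.asarray(Mf(t)), np.asarray(Mpf(t))
    b, bp = np.asarray(bf(t)), np.asarray(bpf(t))
    n = M.shape[0]
    # Assemble the block system  [M 0; M' M] [x; x'] = [b; b']
    A = np.block([[M, np.zeros((n, n))], [Mp, M]])
    sol = np.linalg.solve(A, np.concatenate([b, bp]))  # LAPACK/BLAS under the hood
    return sol[:n], sol[n:]                            # x(t), x'(t)

In practice one would solve M x = b first and then M x' = b' - M' x, reusing the factorisation of M, but the block form matches the equations above.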

What the mechanics module needs is a way to represent the linear solve part implicitly:

x = LinearSolve(M, b)
xp = x.diff(t)  # This would produce a LinearSolveDiff object or something

We need this to be implicit rather than computing the solution in explicit symbolic form because when we codegen this the LinearSolve should become a call to BLAS.

It turns out that doing the calculation this way means that the expressions are all small, and what you really want is to have unevaluated things like LinearSolve that represent an operation without any symbolic evaluation. You don't want the symbolic evaluation because the whole point is that you are symbolically representing a numeric calculation that will be done later, after codegen. An example of where the automatic symbolic evaluation becomes problematic is this (here Matrix and Inverse are SymPy's, with x, y, z, t as SymPy symbols):

In [21]: M = Matrix([[x, y], [z, t]])

In [22]: Minv = Inverse(M)

In [23]: Minv
Out[23]: 
      -1
⎡x  y⎤  
⎢    ⎥  
⎣z  t⎦  

In [24]: 1 * Minv
Out[24]: 
⎡    t         -y    ⎤
⎢─────────  ─────────⎥
⎢t⋅x - y⋅z  t⋅x - y⋅z⎥
⎢                    ⎥
⎢   -z          x    ⎥
⎢─────────  ─────────⎥
⎣t⋅x - y⋅z  t⋅x - y⋅z⎦

This is exactly what you didn't want: the inverse was supposed to be computed numerically later not symbolically now. Note that this is happening with the matrix expressions module that was designed to have more restricted automatic evaluation than the rest of SymPy but it still ultimately has the same problem. In the calculation above we never want Inverse to "evaluate". Its purpose is to remain there until codegen.

On the other hand sometimes it does make sense to solve the linear equations symbolically because your end goal is to do symbolic things rather than numeric things. In that case though you need to compute the sort of symbolic inverse that Flint/DomainMatrix computes because you need the expressions to be structured rather than unstructured:

In [26]: from symengine import symbols, Matrix

In [27]: x, y, z, t = symbols('x, y, z, t')

In [28]: M = Matrix([[x, y], [z, t]])

In [29]: M.inv()
Out[29]: 
[(1 + y*z/(x*(t - y*z/x)))/x, -y/(x*(t - y*z/x))]
[-z/(x*(t - y*z/x)), (t - y*z/x)**(-1)]

With DomainMatrix that is:

In [30]: from sympy import symbols, Matrix

In [31]: x, y, z, t = symbols('x, y, z, t')

In [32]: M = Matrix([[x, y], [z, t]])

In [33]: M.to_DM()
Out[33]: DomainMatrix({0: {0: x, 1: y}, 1: {0: z, 1: t}}, (2, 2), ZZ[x,y,z,t])

In [34]: Mnum, den = M.to_DM().inv_den()

In [35]: Mnum.to_Matrix()
Out[35]: 
⎡t   -y⎤
⎢      ⎥
⎣-z  x ⎦

In [36]: den.as_expr()
Out[36]: t⋅x - y⋅z

Note here that this structured inverse will be in a much nicer form for differentiation: every expression is a ratio of two expanded polynomials. Obtaining the inverse in this structured form is more useful for subsequent symbolic calculations, and it also guarantees to avoid division by zero and detects whether the matrix is invertible, etc. To compute this sort of structured symbolic inverse quickly, what is needed is DomainMatrix/Flint rather than expression trees.

So if all of these things are being done in the best possible way (LLVM, BLAS, Flint, unevaluated expressions, ...) then the things that SymEngine accelerates become less relevant in the overall workflow. I think that is basically what the Julia folks have realised and why they are not trying to match SymEngine's speed for algebra and differentiation and are instead focusing on other things.

certik commented Sep 20, 2023

Yes, there is nothing wrong with giving up performance for flexibility. As long as we are clear on this and communicate this clearly.

There is also nothing wrong with optimizing the library for a different set of benchmarks than SymEngine is. Again, not a problem, as long as we are completely clear about that.

So why don't you create a set of relevant benchmarks that you think we should optimize for?

oscarbenjamin (Owner, Author)

There is also nothing wrong with optimizing the library for a different set of benchmarks than SymEngine is.

But we can also still use SymEngine for the things where it is faster so we don't need to make that tradeoff. What we really need is that it should be fast to convert from a SymPy expression to SymEngine and back because then it would be easy to speed up a particular calculation.

For example, in the mechanics module it would make sense to convert to SymEngine to compute a large derivative, but you would need to be able to quickly convert from SymPy to SymEngine and back for it to be worthwhile.

I considered the same thing with protosym e.g. (without rust_protosym):

In [9]: from sympy import symbols, cos, sin

In [10]: %time e = sin(sin(sin(sin(x))))
CPU times: user 4.21 ms, sys: 9 µs, total: 4.22 ms
Wall time: 4.2 ms

In [11]: %time ok = e.diff(x, 4)
CPU times: user 340 ms, sys: 73 µs, total: 340 ms
Wall time: 338 ms

In [12]: from protosym.simplecas import Expr

In [13]: %time e_proto = Expr.from_sympy(e)
CPU times: user 485 µs, sys: 18 µs, total: 503 µs
Wall time: 518 µs

In [14]: %time x_proto = Expr.from_sympy(x)
CPU times: user 202 µs, sys: 0 ns, total: 202 µs
Wall time: 216 µs

In [15]: %time ed_proto = e_proto.diff(x_proto, 4)
CPU times: user 6.31 ms, sys: 0 ns, total: 6.31 ms
Wall time: 6.26 ms

In [16]: %time ed_sympy = ed_proto.to_sympy()
CPU times: user 125 ms, sys: 2 µs, total: 125 ms
Wall time: 123 ms

So in this example SymPy takes 338 milliseconds to compute a derivative. Converting to protosym takes effectively no time in comparison, and differentiating in protosym takes 6 milliseconds (and would be faster if rust_protosym were used). The problem is that converting back to a SymPy expression takes about 120 milliseconds. If that last step were fast then it would be possible to make all sorts of things fast by using SymEngine or anything else in particular places. As long as converting back to a SymPy expression is slow, that will always be the speed-limiting factor.

So why is it slow to convert back to a SymPy expression? It is because just creating SymPy expressions is slow, due to SymPy's automatic evaluation code. It is more noticeable in cases like this, where a calculation is done outside of SymPy, because when you convert back you are creating lots of expressions that aren't in SymPy's cache (otherwise this slowness would be spread out over time and might seem to be part of diff: actually it is just __new__).

If SymPy had a non-evaluating expression type then that conversion step could be made much faster. Many things could be made much faster if there was just some way to create SymPy expressions quickly when evaluation is not needed. The expression in the example above is not that complicated:

In [21]: ed_proto.count_ops_graph()
Out[21]: 106

We only need to create 106 expression nodes and even in Python it should be possible to create a node much more quickly than 1 millisecond (i.e. 1 million CPU instructions).
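As an aside, recent SymPy versions do expose evaluate=False and an evaluate(False) context manager, which skip the automatic simplification, although they still go through the normal constructor machinery, so they only partly address this construction cost:

from sympy import Add, Symbol
from sympy.core.parameters import evaluate

x = Symbol('x')

# evaluate=False skips the automatic Add/Mul/Pow simplification for one call:
e1 = Add(x, x, evaluate=False)   # stays as x + x rather than becoming 2*x

# The evaluate(False) context manager does the same for a whole block:
with evaluate(False):
    e2 = x + x + 1 + 1           # also left unevaluated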

certik commented Sep 20, 2023

I think we need to be able to construct SymPy expressions and disable automatic evaluation --- obviously we are then responsible to ensure they are in the correct form (so we can't create x+x, but should create 2*x instead).

oscarbenjamin (Owner, Author)

obviously we are then responsible to ensure they are in the correct form

If SymPy had an unevaluated expression type then it would be possible to convert to that without worrying about this.
