Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: Add dimensionality to enrich type system and strengthen compile-type checking #57973

Closed
czetie opened this issue Jan 24, 2023 · 64 comments
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Milestone

Comments

@czetie
Copy link

czetie commented Jan 24, 2023

Author background

  • Would you consider yourself a novice, intermediate, or experienced Go programmer?
    Novice with Go, experienced with other languages esp. C

  • What other languages do you have experience with?
    Fortran, Pascal, Algol, C, Rust, Python, PL/M, various assembler

Related proposals

  • Has this idea, or one like it, been proposed before?
    Not to my knowledge or that of other Go programmers I spoke with

  • Does this affect error handling?
    No

  • Is this about generics?
    No

Proposal

  • What is the proposed change?
    Golang does not allow mathematical operations between operands of different types, even if they have compatible underlying types. This provides highly useful protection for addition and subtraction and is helpful in avoiding a certain class of logical errors. For example, it is almost certainly a logical error to attempt to calculate speed + elapsedTime, (assuming their types have been appropriately defined as float64, for example). However, this is too restrictive for multiplication and division: distance = speed * elapsedTime is a perfectly reasonable calculation. In order to perform this calculation today, the programmer must reduce everything to compatible types, typically the underlying types, this sacrificing the benefits of type checking. For example:
type distance float64
type time float64
type velocity float64

var v velocity
var d distance = 100
var t time = 2

v = d / t                                                  //will not compile, incompatible types
v = velocity(float64(d) / float64(t))        //will compile, but throws away type checking of d and t

A small enhancement to the rules for type mixing in expressions would allow useful constructs as mentioned above such as distance = speed * elapsedTime, as well as other more complex calculations, while still preventing mistakes such as distance = speed + elapsedTime. The characteristic of these calculations is that they mix values of different "dimension", in this case velocity, which in turn can be defined as having dimension Length/Time; and Time. And the resulting value has dimension different from either, i.e. Length in this case. While adding values of different dimension makes no sense, multiplying or dividing them can be perfectly meaningful and extremely useful.

This proposal would allow the developer to specify, by using the type system, dimensions and valid combinations of dimensions together with the new dimension that they result in.

  • Who does this proposal help, and why?
    This proposal would help any programmer working with data that has dimensions, e.g. physical quantities of Length, Time, Distance, or combinations of them, i.e. essentially all scientific and engineering data, by providing an added compile-time check that only quantities of the correct dimension are being mixed.

It could also help programmers in commercial applications where dimension might be defined more abstractly than in physics. For example, to estimate project cost we multiply "number of FTEs" by "fully loaded FTE cost" to get a result in dollars (or appropriate currency). We should not be able to use any old number defined in dollars in this calculation, but only one defined as an FTE cost. Or we might estimate project duration by dividing "number of story points" by "story points per FTE per day" and multiplying by "number of FTEs" to get a result with dimension measured in Days. Other examples of common classes of commercial mistake that can be caught with appropriate definitions include mixing quantities in different currencies; or multiplying by a fraction (0.0 to 1.0) instead of a percentage (0.0 to 100.0).

The more complex the calculation, the more valuable the checking of dimension. And to reiterate, this checking is lost today even if the programmer defines types, because in order to write calculations that mix types, they must be reduced to the underlying type, e.g. float64.

  • Please describe as precisely as possible the change to the language.
    This proposal would ADD a new form of type definition as an extension to existing syntax to define a type as the product or quotient of other types.

It would also MODIFY the restriction on mixing types in arithmetic expressions: it would still be forbidden to mix types in addition and subtraction, but it would be permitted to mix types in multiplication and division provide they have compatible underlying types.

  • What would change in the language spec?
  1. Multiplication or division of values and variables of different types would be allowed, provided their underlying types are compatible.
  2. The syntax for defining types would be extended to define types in terms of the product or quotient of other types, e.g.
type distance float64
type time float64
type mass float64
type velocity (distance / time)
type acceleration (velocity / time) 
type force (mass * acceleration)
type energy (mass * velocity * velocity)

Such type definitions could be compounded to any (reasonable) degree necessary for the domain of the application.

  • Please also describe the change informally, as in a class teaching Go.
    Consider the following code fragment that attempts to calculate a velocity from the division of distance by time.
type distance float64
type time float64
type velocity float64

var v velocity
var d distance = 100
var t time = 2

v = d / t		//will not compile

The compiler will reject the assignment because the quotient of d and t has a different type from v, even though they have the same underlying types. Specifically, d / t has type (or dimension) distance / time, which is not the same as velocity- at least as far as the compiler knows! We could make it compile by the following explicit type conversions:

v = velocity(float64(d) / float64(t))

However, now we have sacrificed the type checking that we had hoped for by creating types for distance and time; d and t could mistakenly be variables of any type whose underlying type is float64.

A better approach which allows us to retain the type checking is to define the type velocity not as float64 but as the quotient of types distance and time:

type distance float64
type time float64
type velocity (distance / time)

var v velocity
var d distance = 100
var t time = 2

v = d / t		//will compile, retains type checking

Of course, this example is very simple; real scientific and engineering formulas will often have many more variables and types.

  • Is this change backward compatible?
    Yes. This does not remove or change any existing syntax; nor change the semantics of any existing compilable code. It extends the allowed forms for calculations (type mixing) and for specifying types in terms of other types.

  • Orthogonality: how does this change interact or overlap with existing features?
    The concept of dimension is complementary to existing syntax for types, but as far as other language features are concerned, e.g. parameter passing, the end result behaves just like existing types.

  • Is the goal of this change a performance improvement?
    No.

Costs

  • Would this change make Go easier or harder to learn, and why?
    Probably minimal change. One small addition to the syntax for types is added, and a potential confusion about mixing types is removed.

  • What is the cost of this proposal? (Every language change has a cost).
    Modest. Existing code for determining the underlying types of variables and expressions and for converting between types, particularly to and from arithmetic types, will be relevant and can probably be reused. Additional code will be required in the code for parsing arithmetic expressions (see below under proposed implementation).

  • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
    Compiler only.

  • What is the compile time cost?
    Modest. Additional checking for type compatibility in multiplications and divisions involving defined types whereas today those expressions are quickly rejected as errors.

  • What is the run time cost?
    Nil. After compile-time checking, the code reduces to the underlying types like any other calculation.

  • Can you describe a possible implementation?
    The compiler builds a dependency tree of types defined in terms of other types all the way down to underlying types. When it encounters an expression that multiplies or divides types it traverses the tree down to one level above underlying types, and verifies that dimension matches on both sides. For example, with the definitions

type distance float64
type time float64
type mass float64
type velocity (distance / time)
type acceleration (velocity / time) 
type force (mass * acceleration)
type energy (mass * velocity * velocity)

a variable of type energy has dimension (mass * velocity * velocity) which expands to (mass * (distance / time) * (distance / time)). The compiler will also expand the types of the variables to ensure that they have the same base dimensions, e.g. (force * distance) expands to (mass * acceleration * distance) and finally to (mass * distance / time / time * distance). Note that in order to see that these two are the same dimension, they must be rearranged into a common canonical form. However, this can be as simple as a list of base dimensions and for each a list of their indexes, e.g.
mass^1
distance^2
time^-2

Therefore, verifying compatibility requires only traversing the dependency tree to the bottom and accumulating a count of the index of each base dimension.

  • Do you have a prototype? (This is not required.)
    No.

Long prose description of proposal:
Enhancement request_ Dimensionality of Types.docx

@gopherbot gopherbot added this to the Proposal milestone Jan 24, 2023
@ianlancetaylor ianlancetaylor added LanguageChange Suggested changes to the Go language v2 An incompatible library change labels Jan 24, 2023
@ianlancetaylor
Copy link
Contributor

Historically Go has encouraged programming by writing code rather than coding by designing types. This change seems to make the type system more complicated in order to simplify the actual code. That's not a path that Go usually takes.

I'm also not completely clear on how this works. If velocity is distance divided by time, then distance is time multiplied by velocity. If I write type distance float64; type time float64; type velocity (distance / time) then will the compiler permit distance to be divided by time but will not permit velocity multiplied by time?

@czetie
Copy link
Author

czetie commented Jan 24, 2023

  1. The complication of the type system is minimal, and can be completely ignored by those who don't want to use it.

  2. In terms of the Go philosophy I don't see this as any less idiomatic than types in general; and certainly less so than, say, the introduction of generics; or the protections afforded by interfaces. If this is the philosophy, one might as well argue against simple types (i.e. "aliasing" of underlying types) entirely, since it creates more problems than value. In any case, I would argue that there is always value in indicating intentions to the compiler and letting it help avert mistakes, especially when (a) there is no down side as using the construct is entirely optional and (b) there is no runtime cost. I am a big proponent of simple language features that remove mental burden from the developer allowing them to focus on logic rather than, say, details of the type conversion rules.

  3. What is the programmatic alternative here? As far as I can see, the only way to do this without this enhancement is to cast everything down to float64, which eliminates the value of creating types in the first place; significantly complicates the code; obfuscates the programmer's intention for the reader (and isn't it a general principle that code is read many more times than it is written?). Am I missing something? If there were a way to achieve the same thing purely in code that I'm missing, I would be much more sympathetic to the argument that this is an unnecessary complication.

Perhaps the clarity and usefulness of this would be more clear in a more complicated, real-world example involving more variables and dimensions?

  1. I am a huge fan of the type system in Golang, which in combination with interfaces and the direction generics are going in is IMO extremely powerful while avoiding the complexity traps other languages have fallen into. And I think it can be leveraged very powerfully as a complement to coding rather than an alternative.

  2. Yes, you could also write d = v * t (with appropriate var declarations, obviously). The compiler would determine that indexes of the dimensions of v are
    distance 1
    time -1
    and of t it is
    time 1
    leaving just distance, which matches d. And the same would apply to more complex calculations with more dimensions, e.g. energy or power.

@DeedleFake
Copy link

I'm unclear how often something like would be useful. Velocity, time, and distance are pretty obvious, but in a program, how often can types be related like this? Can this handle more complicated relationships that map to more programming-related problems, rather than physics? In other words, can you give an example of something that is more likely to come up in an actual project?

@golightlyb
Copy link
Contributor

I would encapsulate the type conversions in a function like so:

func calcVelocity(d distance, t time) velocity {
   return velocity(float64(d) / float64(t))
}

Is that insufficient?

@czetie
Copy link
Author

czetie commented Jan 24, 2023

First, thank you for your engagement on this and testing whether the idea is robust.

That's as close as I could get too. (I tried doing the equivalent with an anonymous function, but for some reason couldn't get the typing to pass the compiler...).

The main difference here is that the compiler is no longer checking dimensions for us, because we have not told it explicitly that there is a relationship between velocity, distance, and time. So if we made a coding error in the function (unlikely in this simple example, more likely in realistic examples), such as mistakenly writing

return velocity(float64(t) / float64(d))

the compiler would not be able to detect it.

So I think I would say that this is considerably more verbose than being able to write v = d / t, and a lot less clear to the reader than having to jump off to read a function definition, and loses the dimension checking. It also requires a separate function for every distinct formula we want to write this way. Finally, it adds the overhead of a function call (I have no idea how efficient that is in Golang?) - unless the compiler is smart enough to inline the function, which it might do for sufficiently simple functions.

So one way to think of this is that we are in a sense "inlining" the type checking that function parameters perform and putting it into the type checking of the variables directly in the formula; and in addition we gain the dimensional analysis that guards against incorrect formulas.

Would it help if I were to show some more extended examples?

@czetie
Copy link
Author

czetie commented Jan 24, 2023

@DeedleFake I have spent the last decade primarily working with people doing scientific and engineering projects so I admit I have a kind of population bias. However, with those people dimensional analysis is something they do all the time as verification of their formulas; so it would be very valuable to that population for the compiler to verify their formulas.

However, I do think lots of other domains have the potential to benefit although in less obvious ways than physics dimensions - although it might take a bit of a mental shift to realize that their variables have dimensions. For example, I have also been working with people in distributed storage-defined software and here they have many integers being juggled in calculation such as buffer size (bytes); bytes per block; blocks per network packet; overhead (bytes); queue lengths; etc. Consequently, it is easy and far from unknown to mix up what different integers represent when calculating buffer sizes, overhead allowances, etc. So for one simple example, if we want to compute how many bytes fit in a network packet, we calculate bytes per block * blocks per network packet, which very naturally has units of bytes per network packet.

Another example comes from own experience in a factory automation system, one of the gnarliest I dealt with as field engineer. At a canned food factory there was a production line subsystem that counted cans off the line, and a warehouse management subsystem that counted them into the warehouse. However, the latter subsystem was crashing periodically with an integer overflow. Eventually I found the problem at the interface between the two subsystems: the warehouse subsystem was expecting to receive a count of the number of pallets, which it then multiplied by 16 to get the number of cases of product in stock. However, the programmer of the production line system was not passing the number of pallets, but rather the number of cases, which when multiplied by 16 caused the crash. Judicious use of definitions (dimensions) such as countCansPerCase, countCasesPerPallet, countCases, and countPallets could have prevented the confusion. Lots of errors of this class happen at the interface between subsystems (e.g. the Mars mission that was lost because of a confusion between Metric and Customary units).

For another domain example, consider project tracking/planning software. You might have lots of integer values such as number of FTEs; story points for each story; story points per day for each programmer; burn down rates; etc. As integers they can be combined in many ways, most of them meaningless. By defining dimensions such as FTEs, story points, days, etc. we can ensure that we are only combining them in meaningful ways.

@czetie
Copy link
Author

czetie commented Jan 24, 2023

@DeedleFake P.S. if you have an existing codebase I could look at and see how it might work with this proposal, I am willing to do so. I think that is a fairer test than my making up an arbitrary or hypothetical example.

@OwlSaver
Copy link

I really like this idea. It brings something that we naturally do every day and makes it a part of the language. In banking and insurance domains there are many dimensions that would only help to ensure the calculations are correct. It seems to me that this is a mechanical check that is actually very hard for humans to do. But it would be easy for a machine to do.

Make it a basic feature of function parameters. Then it can easily be applied to the special case of operators.

@atdiar
Copy link

atdiar commented Jan 25, 2023

Dimension checking is something that gets drilled into us when dabbling in physics but from a comp.sci point of view, how would someone implement this?

All I can think of seems to rely on operator overloading at its basis (which is not allowed in Go for also good reasons) .
Is this what this proposal could be reduced to? Anyone has an idea?

Edit: more specifically, a subset of operation overloading restricted to subtypes.
Basically, being able to further specify the operation + as a monomorphizable quasi-method plus(u, v Number) Number for instance. plus would be using a switch case and would have to be extendable for the various relations between defined subtypes of Number... Don't know if it makes sense or if it is really possible, especially when wanting to avoid whole-program analysis.

@apparentlymart
Copy link

I am familiar with this sort of dimensional-analysis-driven programming in some high-level languages, but I don't think any general-purpose language I've used has included it as a first-class feature.

Instead, I've typically seen it implemented by starting in a language that offers operator overloading and then using a library which allows constructing quantity types which include suitable overloads to make the arithmetic operators work correctly with them while also propagating the dimensions (and sometimes also converting units).

For example:

Of course I don't claim that the languages I'm familiar with are the only languages worth considering, but I would consider the three languages I used for the examples above to all have some intended use-cases in common with Go. (If you are thinking of some specific other languages that do have dimensions and quantities as first-class language primitives rather than as libraries, I'd love to learn about them!)

Assuming that it were valuable to model dimensions and quantities in Go, might it be more appropriate to follow the lead of these other languages and implement operator overloading as the language-level primitive, and then leave it to libraries to deal with the specific domain of engineering and scientific applications that would benefit most from dimensions and quantities?

@ianlancetaylor
Copy link
Contributor

The proposal here is not for operator overloading. It is to extend the type system to permit certain type combinations that are currently not permitted. Right now, the language does not permit dividing a variable of type distance by a variable of type time. With this proposal, it would permit dividing variables of those types, and the result would be type velocity.

@apparentlymart
Copy link

apparentlymart commented Jan 25, 2023

I understand the proposal as allowing to define relationships specifically between types whose underlying types support the / and * operators, and nothing else.

Go today does allow multiplying two values of a single named type whose underlying type supports multiplication. For example:

type velocity float64
v1 := velocity(1.0)
v2 := velocity(2.0)
result := v1 * v2
fmt.Printf("Result is %#v, a %T", v1*v2, result)
Result is 2, a main.velocity

In a dimensional analysis system I think this is an incorrect result, because the result should be (velocity * velocity), assuming such a thing were possible to represent in Go.

I'm not sure if this one hole where the types don't work out correctly invalidates benefits of making new expressions valid, but it does give me pause.

Particularly when considering one of the examples in the proposal which already includes a "velocity squared":

type energy (mass * velocity * velocity)

This seems to suggest that products of the same type are considered for dimensional analysis when combined with another type, but what about this one?

type velocitySquared (velocity * velocity)

Would a declaration like the above change the result of my example above to be Result is 2, a main.velocitySquared?

@apparentlymart
Copy link

apparentlymart commented Jan 25, 2023

The last example in my previous comment -- where defining velocitySquared changed the meaning of velocity * velocity -- also makes me realize that as currently proposed this seems to require global analysis rather than package-scoped analysis.

By that I mean that if type Velocity were in one package and type VelocitySquared were in another package, the Go compiler might need to take into account the definition of VelocitySquared whenever it is dealing with operands of type Velocity.

Although it was the self-product case that made me think of this, I think the same seems to be true for the other examples. If type Distance and type Time are in a different package than type Velocity then the same problem would arise.

It's also possible that two different packages might define different types involving (Distance / Time), making it unclear which definition applies to a particular expression and therefore what the result type of an expression ought to be.

Perhaps it would be reasonable to limit this only to relationships between types in the same package, but I don't think anything else in Go behaves like that.


This also reminds me a little of a feature of the Rust programming language that some developers find frustrating (based on the number of grumpy questions I can find about it online): in order to call a method of a trait on a type that implements that trait, you must import the trait itself into scope even though the expression relying on it doesn't include the trait name anywhere in it.

Rust traits also deal with the problem of potentially-conflicting definitions using the so-called "orphan rules", which (roughly) require that the relationship between a type and a trait must be defined in either the crate that defined the type xor the crate that defined the trait. That means that the compiler only needs to consider two possible locations for a definition (so no global analysis) and the program is invalid if the two locations conflict with one another.

Of course this dimensional analysis problem is far less general than Rust traits, but it does seem to still have the same question of which behaviors are in scope in which locations, and of potentially-conflicting definitions in different packages affecting the same type. I'm not sure if Rust's solutions to this problem really make sense for Go.

@fardream
Copy link

Not specific to go, I don't quite like this kind of types in general.

  1. The type is missing the unit - use the example of velocity/distance/time, what is the unit of those values. What does var v Velocity = 1 mean? Do we need to go a step further to do thing like type DistanceInMeter float64 etc?
  2. Very often in numerical simulations that I worked on, I want to know what the underlying type is (for example, to call either float32 or float64 version of some library code).
  3. I often reuse the memory of the same underlying type - for example, a float64 array may be used to store data for length, then updated for speed.

Because of 1) and 3), I actually have never used this kind of types for numerical simulation.

Also, not to nitpick, velocity generally also includes the direction information - and that's 3 values for a 3D velocity, plus the coordinate system. I am unsure how to properly design a system to account for those.

@golightlyb
Copy link
Contributor

golightlyb commented Jan 25, 2023

@apparentlymart I believe the result would have to be explicitly typed:

type velocity float64
type velocitySquared float64
var result velocitySquared
v1 := velocity(1.0)
v2 := velocity(2.0)
result = v1 * v2
velocityDoubled := velocity(2.0) * v1 // type velocity
result2 := velocitySquared(v1 * v2)

@czetie thanks. I can see the benefit of the compiler checking the formula is correct.

For non-trivial formulas I would cover it by a test, but I can see how it makes a library package easier to use correctly.

I wonder if there could be some proof of concept linter that could parse a comment to start with and validate a program for correctness.

type velocity float64 // Dimension: distance / time

This could help work out things like what was raised by @fardream before commiting to a language change

@czetie
Copy link
Author

czetie commented Jan 25, 2023

Dimension checking is something that gets drilled into us when dabbling in physics but from a comp.sci point of view, how would someone implement this?

No overloading required (other than what already exists for * and /). In its simplest form it just requires lifting the restriction against mixing types in multiplication and division provided that their underlying types are numeric (while retaining it for + and - because it strongly makes sense there).

@czetie
Copy link
Author

czetie commented Jan 25, 2023

@fardream

velocity generally also includes the direction information

Actually I agree with you, but I did not want to complicate the simple examples.

  1. Do we need to go a step further to do thing like type DistanceInMeter float64 etc?

Again, yes, and real world dimensional analysis typically does this, but I didn't want to complicate the examples here. (Remember the infamous lost Mars Lander due to a Newtons vs. customary units mixup?).

Regarding the float32/float64 versions of libraries, again this makes sense. If I were designing a language from scratch I would allow function parameter overloading so that functions could be written with the same names and different parameters (32 and 64 bit versions) and the compiler would distinguish them as distinct functions, calling the matching function based on the types of the parameters. In other words, functions are distinguished not merely by name but their entire signature: name, return type, and parameter type. I suspect that may not be very Go-like, but as is probably clear by now, I like making the compiler do the work!

For the memory reuse, not to sound flippant, but that sounds more like Fortran than Go :-) which of course is what a lot of people in your kind of domain are using. I would suggest that some kind of union type would be appropriate?

@czetie
Copy link
Author

czetie commented Jan 25, 2023

Perhaps it would be reasonable to limit this only to relationships between types in the same package, but I don't think anything else in Go behaves like that.

Very good point. I also suspect that for some applications development teams will want to have a common, shared set of definitions for common dimensions and relationships (much like good old header files in C). Perhaps something that mirrors the Public/private distinction for functions would be appropriate, so that any given developer can make use of team-shared dimensionality as well as private definitions of their own? FWIW I like the idea of reusing the existing mechanism for the Public/private distinction.

@czetie
Copy link
Author

czetie commented Jan 25, 2023

@golightlyb

I wonder if there could be some proof of concept linter that could parse a comment to start with and validate a program for correctness.

type velocity float64 // Dimension: distance / time

This could help work out things like what was raised by @fardream before commiting to a language change

This is a very interesting idea. To make this work you would still need to lift the restriction on mixing different types in division or multiplication, or equivalently implicitly casting down [am I saying that right?] to the underlying types to allow it to compile, but obviously that is a much smaller work effort than implementing the analyzer.

@czetie
Copy link
Author

czetie commented Jan 25, 2023

@golightlyb

For non-trivial formulas I would cover it by a test, but I can see how it makes a library package easier to use correctly.

Yes, and I suspect you would still want tests anyway to validate the calculations. My former customers in the scientific realm used to run massive amounts of tests whenever we changed our software to ensure that previous results still came out the same, only faster.

One way to think of this is that it is essentially "folding in" a certain amount of test - which is a huge help to people who, unlike you, are less than diligent in creating or maintaining tests. (I remember seeing some astonishingly high numbers for the amount of tests that are dead code, obsolete, or trivial "return true" stubs because people see it as a burden to write.)

@bcmills
Copy link
Contributor

bcmills commented Jan 25, 2023

#20757 is closely related.

@bcmills
Copy link
Contributor

bcmills commented Jan 25, 2023

@ianlancetaylor, I think the type-inference problem gets simpler (in some sense) if we allow variables to carry arbitrary combinations of units, and only collapse down to individual types when assigning to a variable of that type: an expression of type distance / time can stand on its own, and only needs to be collapsed to velocity when assigned to a variable of that type.

@bcmills
Copy link
Contributor

bcmills commented Jan 25, 2023

That said, I think the velocity example is something of a category error. “Distance” and “time” are dimensions, not units — even in a system of natural units the meaning of dimensional numbers depends on exactly which system of units is in use. (For example, the existing time.Duration as a unit is properly understood as “nanoseconds”, not “abstract duration”.)

A better motivating example might be the SI base units and their derived units: Seconds, not Time; Meters, not Distance. In such a system, some meaningful units don't have widely-used names: we measure velocity in SI as “meters per second”, not simply “velocity units”.

In such a system, I think it would make more sense to separate the notion of “units” from “numeric representation”: it is more meaningful to speak of “an int64 variable storing a quantity of meters” than of “a variable of type meters”. (To me, units are more like a special kind of type parameter than like types in their own right.)

@czetie
Copy link
Author

czetie commented Jan 25, 2023

@bcmills Interestingly, in my early drafts of this idea I started from the same kind of description that you have here. And yes, strictly speaking dimension is orthogonal to type in the usual sense. I even thought about having a keyword of dimension or dim, so you might write something like:

dim meters
var poolLength, poolWidth int32 meters
var poolArea int64 meters*meters

poolArea = poolLength * poolWidth

I also had thoughts about, as mentioned upthread, about the distinction between dimensions (length, time, etc.) and units (meters, seconds, etc.). After noodling with various designs for the syntax for a while I decided that for strictly practical purposes it was less disruptive to the language (and therefore less to learn) to collapse everything into the type concept. The question would be, is it likely that in a given project we would want to have more than one variable type (int32, int64) to have type meters? If so they should be separate. However, this is very much something where arguments can be made on both sides and may the best argument win.

I also had in mind the kind of non-scientific realms where dimensions are largely "counts" rather than metrics. For example in my hypothetical project planning tool we might have something like

dim FTE                             //Full Time Equivalent workers; could be employees or freelancers
dim durationDay             //Duration measured in days
var headCount int32 FTE  //count of the number of people on the project
var projectDuration int32 durationDay
var projectCostFTEs int 64 durationDay * FTE
...
projectCostFTEs = projectDuration * headCount

@atdiar
Copy link

atdiar commented Jan 25, 2023

@ianlancetaylor @czetie
Ah, yes it seems that it is mostly a typing concern. I initially thought about operator overloading because for the most part, current operators work for values of the same type.
That would allow to have operators available between defined types.
It's true that the implementation of the operation would remain the same however.

@apparentlymart
Regarding the restriction to * and /, that seems wise and easier. And explicitly deals with dimensions.

I still can't help but wonder if there isn't a general treatment somewhere to deal with time and temperature (and other) where deltas can be added but not the values themselves in most cases (unless we want to compute an average but a generic average function could deal with that easily).

Seems that restricting related definitions to a same package could perhaps work. Would have to check.

@bcmills
Oh that's right :)
I think that's something to really think about. Often people add distances, but it doesn't make sense to add time, only durations could be added to times and return another time.
Or... https://physics.stackexchange.com/questions/132720/how-do-you-add-temperatures

@bcmills
Copy link
Contributor

bcmills commented Jan 25, 2023

I still can't help but wonder if there isn't a general treatment somewhere to deal with time and temperature (and other) where deltas can be added but not the values themselves

The general treatment is to not provide arithmetic operators for such types. (time.Time provides an Add method, not a + operator).

Note that most of the problems with time.Duration detailed in #20757 arise from interacting with unitless values (multiplying, dividing, and converting), not interacting with time.Time itself.

@beoran
Copy link

beoran commented Jan 28, 2023

This long article seems interesting here: https://gmpreussner.com/research/dimensional-analysis-in-programming-languages

Now that we have generics perhaps they can be used for this looking at the examples from other languages?

@OwlSaver
Copy link

I just finished that article and was coming here to share it. The article is a good survey of what has been done in other languages. It is from 2018. The end is good where the author presets some alternatives and their pros and cons.

https://gmpreussner.com/research/dimensional-analysis-in-programming-languages

@atdiar
Copy link

atdiar commented Jan 29, 2023

@czetie just wondering. Isn't there still the question of what the return type should be if we just relax rules around operations as you imply? (@apparentlymart original question I believe)

@beoran @OwlSaver nice find!
I wonder if parametricity could be useful to account for units.
If we could write for instance:

type distance[M unit] float64
type duration[S unit] float64
type speed[T, U unit] = distance[T] / duration[U]

type speedSI = speed[meter, second] 

// where meter and second are types that satisfy the unit interface

Don't know if it would make sense, purely brainstorming.

@czetie
Copy link
Author

czetie commented Jan 29, 2023

just wondering. Isn't there still the question of what the return type should be if we just relax rules around operations as you imply? (@apparentlymart original question I believe)

I'm not I'm understanding the question, but I'll take a shot and if I have misunderstood please let me know

I have been writing code like this:

type distance float64
type time float64
type mass float64
type speed (distance / time)
type force (mass * acceleration)
type energy (mass * speed * speed)

var e energy
var m mass 
var s speed

e = m * s * s

I believe you suggesting that the last line here should be explicitly type-converted to the type of the result:

e = energy(m * s * s)

If so, I think you are right. The proposal is essentially the same, except that what we are now saying is instead of "the expression must have the same dimension as the variable to be assigned to", we are saying "the variable must have the correct dimensions for the type conversion (which in this case it does).

So in effect this becomes an extension of the type conversion rules:

  • an expression such as m * s * s has a dimension of (mass * speed * speed) which in turn is (mass * distance / time * distance / time).
  • a compound type can be explicitly converted to any type that has the same dimension.

Does that help?

@atdiar
Copy link

atdiar commented Jan 29, 2023

Yes, overall the idea is there.

I'm still unsure about whether that would just be a change in conversion rules.
Perhaps it really has to be checked during assignment rather.
The idea being that otherwise, it might look like it is possible to convert meters into square meters for instance.
Or perhaps there is something else that would need to change I'm not sure.

But other than that, yes :)

@apparentlymart
Copy link

The general idea of most of my questions so far has been to try to make the proposal more concrete/complete by describing it as precisely as I can and then noting aspects that seem to not yet be fully specified.

In a few cases I also made some observations about what I might expect based on similar decisions elsewhere in the language, but some of the things I've asked just seem like tricky design edge cases where the answer isn't clear to me, so I'm asking in an attempt to understand what you'd expect to see (as someone who has more experience with programming with dimensional analysis in other languages) and fit that with what seems possible to achieve and what interacts well or less well with existing language features.

With all of that in mind, I'm now thinking about this idea of implementing in two steps:

  1. First: Allow multiplying or dividing any combination of named types whose underlying types naturally support those operators.
  2. Later: Allow declaring relationships between those types so that the result type can be derived from the types of the operands using dimensional analysis.

It's important to define for any operation what its return type should be, so splitting this into two steps requires step 1 to answer the question I previously posed about what type a multiplication or division of two different named types should return. Concretely, the question is what the following program should print, and why:

type A float64
type B float64
a := A(1.2)
b := B(2.5)
fmt.Printf("%T", a * b)

It seems like in this two-step proposal the second step would give an answer to this question if there were a type defined as the product of A and B, but for this to be implemented in two steps the first step must itself be fully specified.

I think some earlier questions still apply for step 2:

  • What is the scope of a type declaration that represents a dimensional relationship? Can a type C = (A * B) anywhere in the program (in any imported package?) affect the result of the program above, or would C need to be defined in the same package as A and B in order to be valid? I think the answer to this question is not just a statement for how it ought to behave but also a rationale for why that's the best compromise for real programs we might imagine someone writing using this language feature.
  • Are only explicitly defined types subject to this new behavior, or would a * b in the above return some anonymous type (A * B) even when there is no type alias naming that type? (When I say "anonymous type" here I intend a similar meaning as for an anonymous struct type, like the empty struct struct {}, which we can use in Go without necessarily needing to assign it a name.)

I think doing this in two steps also adds at least one additional question:

  • If step one makes it valid to multiply or divide arbitrary numeric types without explicit conversions, does that remain valid in step 2?

    That is, would it be valid to multiply two different named types that are based on float64 even if there is no declared dimensional relationship between them? And would the result type in that case still be the same as it was in step 1, or does it change in step 2?

I want to be clear that I'm not asking these questions with a goal of catching you out by telling you that you gave the wrong answer. I don't know the answers to these questions myself. I'm curious about your thought process leading to the answers at least as much as the answers themselves, because that can help to understand the shape of the problem and the scope of the proposed solutions.

@czetie
Copy link
Author

czetie commented Jan 30, 2023

@apparentlymart > I want to be clear that I'm not asking these questions with a goal of catching you out by telling you that you gave the wrong answer.

Absolutely clear, and I'm happy that you're as invested in this proposal as you are, and these are exactly the right questions to ask if this concept is to become an implementable design. Many of them are simply things I had not thought of myself.

If step one makes it valid to multiply or divide arbitrary numeric types without explicit conversions, does that remain valid in step 2?

This in particular is an important question, and not one I have worked through because it wasn't originally part of my thought process. If the answer is No then to me that implies that Step 1 should be done in Go v1, and step 2 in Go v2, because it will existing code no longer compile (although the fix is probably automatable). If the answer is Yes then it leaves a hanging question about the type of the result and what it can be validly assigned to, especially as the Go philosophy is averse to implicit type conversions.

@gophun
Copy link

gophun commented Jan 30, 2023

Step 1 should be done in Go v1, and step 2 in Go v2

@czetie I just want to point out that it's highly unlikely there will ever be a v2 of Go. "Go 2" was a conceptual name under which major new features such as modules, error inspection and generics were introduced. But after it turned out that they could be introduced without major breakage, that name is no longer to be taken literally. The idea of ​​introducing a new major version of Go with breaking changes has now been discarded as far as I know. In fact, the plan is to take backward compatibility even more seriously.

@czetie
Copy link
Author

czetie commented Jan 30, 2023

@gophun Thanks for the heads up. I saw a lot of references to it and had no idea it had gone away.

That makes it even more important to work through in design the implications of a two stage approach, if indeed that is considered desirable.

@ianlancetaylor
Copy link
Contributor

Thanks for the proposal.

This is an interesting idea. However, it might be more appropriate for an analysis tool, not in the language proper. The analysis tool could interpret special comments describing the type relationships, and then verify that all conversions were applied appropriately. That might be even better than this proposal because the tool could also verify units (acre foot vs. cubic meters, or whatever).

In general Go's type system is quite simple, and this makes it more complex without providing a clear enough gain. This new functionality is also not generalized across the type system, in that it only works for scalar types.

Also the emoji voting is not in favor.

Therefore, this is a likely decline. Leaving it open for four weeks for final comments.

@czetie
Copy link
Author

czetie commented Feb 2, 2023

@ianlancetaylor Thank you for your thoughts. A couple of final thoughts for your consideration before final close:

  • I am not a fan of special comments, for various reasons (other than for doc). They always seem brittle, and verifying them syntactically is a pain. However, "matters of taste are not to be disputed" as the saying goes
  • Would it be possible to implement the "minimal" version of this idea, i.e. to relax the restriction against mixing different types with * and / operators while retaining it with + and - operators? After all, while the expression speed + duration makes no sense, speed * duration probably does. As well as making the existing syntax for type aliases much more useful, this would be an aid to any external analysis tool. In this model, the resulting type of using * or / between two mismatched types would have to (a) require compatible underlying types and (b) return a result of the same type as if the operation were between those underlying types. However, I suspect that the response to this will be that the Go philosophy against implicit type conversions (a good thing IMO) goes against it.

@czetie
Copy link
Author

czetie commented Feb 2, 2023

@ianlancetaylor Almost but not entirely orthogonal to this proposal, it has provoked another very high level thought in me that relates to the overall philosophy of Go. Is there a way for me to raise this more privately than in this forum? It is a potentially outlandish thought and this does not seem like the right place for it, at least until it has undergone more scrutiny.

@ianlancetaylor
Copy link
Contributor

You can always write directly to me (iant at golang.org) but I can't promise a useful or timely reply.

@ianlancetaylor
Copy link
Contributor

It is unlikely that we would consider permitting multiplication and division on values of different types. That kind of special case makes the language more complicated and harder to understand. Simple rules are highly desirable.

@czetie
Copy link
Author

czetie commented Feb 3, 2023 via email

@atdiar
Copy link

atdiar commented Feb 3, 2023

@czetie I know it's your final word but just wondering if you had thought about how one could build a domain specific language based on go's syntax?
It's more work but I assume that would eschew complexity from being inserted in the type system with some added benefits (vector/tensor handling for quaternion math for instance.., handling of + and - (adding deltas would be possible then, etc)

As I understand, the rule about operations is kept simple in a way that doesn't allow direct translation of distance, speed etc into types.
At best, it describes that only the same kind of types can be used in operations which is a lower level concept (doesn't handle higher dimensions).

Simply defining d=v*t without unit parameterization of these types seems limiting too?

Anyway, the main point is that if one can piggyback on the language and build a DSL that can be transpiled to it, it might be preferable, make more sense and much more flexible and comprehensive?

@czetie
Copy link
Author

czetie commented Feb 3, 2023 via email

@beoran
Copy link

beoran commented Feb 3, 2023

@czetie A good example of a Go based programming language is Go+ https://github.com/goplus/gop. You could probably do something similar to make a Go derivative language.

@DeedleFake
Copy link

You could bypass operators entirely and require methods. For example,

type Velocity struct {
  val float64
}

type Time struct {
  val float64
}

type Distance struct {
  val float64
}

func (v Velocity) MulTime(t Time) Distance { return Distance{v.val * t.val} }
func (t Time) MulVelocity(v Velocity) Distance { return v.MulTime(t) }

It's incredibly unergonomic, but it would allow you to completely control the interactions of units.

@apparentlymart
Copy link

Something like what @DeedleFake wrote was what I was imaging generating in the code generation-based solution I mentioned back in #57973 (comment) . Although I would probably have just used type Velocity float64 and accepted that this allows callers to intentionally break the abstraction if they want to; it's only implicit unwrapping of the named types that I would be concerned about personally, and the existing Go rules mostly take care of that.

I think generating these boilerplate methods for all of the possible relationships implied by some declared dimensions would at least avoid the non-ergonomic nature of implementing the functions.

Of course it still means that using the generated functions requires method call syntax instead of operator syntax, but in a language that doesn't offer operator overloading that is exactly the tradeoff I would expect: built-in operators are for built-in operations only, and more specialized operations use method or function call syntax.

d := Distance(10)
t := Time(2)
v := d.DivTime(t) // "v" has type Velocity due to the generated method signature

The remaining snafu for this is that it is already valid to multiply two values of the same named type without any explicit conversions, and that result contradicts the dimensional analysis rules:

v1 := Velocity(2)
v2 := Velocity(10)

// The following is a nonsense operation, but if it _were_ meaningful
// then it ought to produce a "velocity squared" result, but it actually
// produces just Velocity.
v3 := v1 * v2

Nothing to stop there being a generated VelocitySquared type and a method to produce it of course, but it's unfortunate that possibly the most obvious way to obtain a VelocitySquared from two velocities -- the built in multiply method -- silently does the wrong thing rather than failing as in the other cases.

This doesn't seem like a big show-stopper, though. In a program where the normal way to manipulate these values is via method calls hopefully someone is already expecting to get a "velocity squared" value using a method anyway, and will not think to try the normal multiplication operator.

@DeedleFake
Copy link

Although I would probably have just used type Velocity float64 and accepted that this allows callers to intentionally break the abstraction if they want to

I thought that of that and was going to mention it at the bottom of my post, but I didn't because I thought the issue with the ability to, for example, multiply two velocities that you mentioned later would be enough of a problem to make that approach not worth the extra convenience.

@kendfss
Copy link

kendfss commented Feb 17, 2023

I think it's a shame things are going to pan out this way. It's a great feature for DSP pipeline (which there a fair few go libraries/applications for) verification too. In that vein, I don't think it's any less "clear" than complex numbers, and if done well could even serve as a new basis for that and other number systems. Also think it's a shame to overlook the significance of compile-time correctness guarantees. To that end it would be great if this thread were left open for discussion after the issue is closed.

I think a potentially good implementation of this feature comes from what you might call a/an arithmetic domain/s.
Here I think of this as a type-set equipped with a set of binary operators that is associative and closed.
With that in mind we could specify a domain as follows

package domains

// Using upper/lower case for public/private

type Complex[T ordered & number] domain {
    Real[T]     = T
    Imaginary = Real[T] + I[T]
} // binary operator set is {+}

type SI[T number] domain {
    Speed[T]          = Meter[T] / Second[T]
    Hz[T]                = 1 / Second[T]
    Acceleration[T] = Speed[T] / Second
    Force[T]            = Mass[T] * Acceleration[T] & Mass[T] / (Second[T] * Second[T]) // here & means the expressions on either side should be equivalent
    Energy[T]          = (PlanckConstant /* const defined elsewhere */ * Frequency[T]) & (Force[T] * Acceleration[T] | 1/2 * Mass[T] * Speed[T] * Speed // here | means either of the expressions on either side are valid
} // operator set is {*, +}

type Audio[N, R number] domain {
    Hz[R]           = 1/Second[R]
    SampleRate = R(Samples[N]) * Hz[R] // samples sh/c-ould be an integer
    Freq[R]         = Complex[R] | Hz[R] + Phase[R]
}

type (
    Velocity3d[T number] domain {
        X[T] + Y[T] + Z[T]
    }
    Velocity4d[U number] domain {
        Velocity3d[U] + T[U]
    }
)
package main
import ."domains"
func Mass[T float32|float64](f SI.Force[T], a SI.Acceleration[T]) SI.Mass[T] {
    return f/a
}

Perhaps use-time type-params are unnecessary so long as used types agree with the implied constraint.
I think the usage of a literal in such an expression should impose a use-time requirement.
Perhaps terms like PlanckConstant could have derived units and value.

There is a concern about how to specify that you do not want the resulting value to be "reduced" ie, a vector has multiple parts but a scalar is reduced to one part. It is possible that there are systems whose arithmetic is best modeled by addition/subtraction but warrants "reduction".
In this vein, perhaps an additional keyword vector can be used instead of domain. Or we could just do as mathematicians and expect people to use.
Keeping to what was discussed above, a vector type can be thought of as one whose operator set includes addition and subtraction.

There are also "terminal" units I in Complex, Mass|Distance|Time in SI, though that's just an implementation detail; could implement automatic "terminal" verification fairly straightforwardly for scalars.

AFAIU, all of this could be implemented without making use of operator overloading by using generics. Essentially, the units should behave "like" type Mass = Float except the names would carry over to go doc.

@sparr
Copy link

sparr commented Feb 21, 2023

https://github.com/aurora-opensource/au is a library that provides compile-time dimensional unit safety in C++14 (as opposed to runtime unit manipulation which is more common in libraries in various languages). While their implementation is entirely different from what is proposed here (a library instead of a change to the language), a lot of the discussion in their documentation and on their repository is relevant here.

@beoran
Copy link

beoran commented Feb 21, 2023

@sparr While we won't be able to do as much compile type checking in Go as that library uses in C++, I think it should be possible to port this approach to Go generics. I recommend people who are interested in units in Go to try this approach.

@ianlancetaylor
Copy link
Contributor

More discussion but no change in consensus.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Mar 1, 2023
@golang golang locked and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests