TM versus GD versus IM versus DD Discussion #2599

smiths · 2021-06-11T18:27:23Z

Discussion about the differences between TMs (Theoretical Models), GDs (General Definitions), IMs (Instance Models) and DDs (Data Definitions) come up all the time when working with Drasil. I've created this issue so that I can present the current thinking on these definitions and propose my current idea on how to modify the definitions and improve the distinction. @JacquesCarette and @balacij, please let me know your thoughts. My presentation below is not intended to be complete or definitive; it is intended to get the conversation started. 😄

The terminology and examples will all be from the viewpoint of an external reader of the SRS (Software Requirements Specification) documentation. The distinctions made are from the point of view of the document reader. I'm leaving for a separate discussion what information should be reflected in the internal Drasil representation of this knowledge and how best to encode this information.

I will first present the definitions I've been using for years. I no longer think these definitions are adequate, but I won't get into that discussion until after I've presented the mental model that has been used to date. I'll also define the supporting terminology of Goals and Assumptions.

To start with, here is a diagram showing the relationship between all of the pieces.

Goal

A goal is a functional objective the system under consideration should achieve. Goals provide criteria for sufficient completeness of a requirements specification and for requirements pertinence. Goals are written abstractly, with a minimal amount of technical language and no equations. They should be understandable by non-domain experts. Goals are refined by the instance models. (I forgot to include Goals in the figure above.)

A sample goal from SWHS is: "Predict the temperature of the water over time."

Assumptions

The assumptions are a refinement of the scope. The scope is general, where the assumptions are specific. All assumptions should be listed, even those that domain experts know so well that they are rarely (if ever) written down. The assumptions will be used to refine the theoretical models and general definitions.

A sample assumption is: "The only form of energy that is relevant for this problem is thermal energy. All other forms of energy, such as mechanical energy, are assumed to be negligible."

Theoretical Model

Theoretical models are abstract mathematical equations that will be refined into the instance models that are used to satisfy the goals for the project. The theoretical models are general. An example for heat transfer problems is the conservation of thermal energy:

Abstraction in the theoretical model means that the dimension of the problem is general (3D space), the coordinate system is not specified, simplifying assumptions have not been made, details are left out (like the boundary conditions for an ODE or PDE), etc. The inputs and outputs are not provided with the theoretical model because it needs to be refined to a point where meaningful, problem specific, inputs and outputs can be assigned.

Given their generality, theoretical models can be reused in building the models for other problems. For instance, the conservation of thermal energy is used for SWHS (solar water heating system), noPCM (solar water tank with just water, no PCM) and heat transfer in nuclear reactor fuel pins.

The theoretical model is not uniquely defined by the goal statement. For instance, in a mechanics problem the forces can be found using the equilibrium principle, or the principle of virtual work.

General Definition

The eventual goal is to have an instance model that is one step removed from code. In some cases jumping directly from the theoretical model to the instance model is too big an intellectual step. The refinement needs to be broken down into separate steps. GDs are a refinement of one or more TMs, and/or of other GDs. The GDs are less abstract than the TMs. Generally the reduction in abstraction is possible through invoking (using/referencing) Assumptions.

The TM for conservation of thermal energy can be refined into a 1D equation (lumped mass) by making assumptions about spatial variations not occurring. The result of the simplification is shown in the following GD for the temperature of a body over time:

The GD is more specific now, but still general enough to be applied in different situations. For instance, the body in question could be refined to be a body of water, or a body of PCM.

The general definitions will involve making decisions about coordinate systems, variable names, how vectors are represented, etc. They are less reusable than the theoretical models, but not as problem specific as the instance models.

Instance Model

The instance models are the equations that define the relationship between the inputs and the outputs. The instance models satisfy the goals. They are used by the requirements to express what the software does. Instance models use assumptions and data definitions to refine the theoretical models and/or the general definitions into specific equations. The instance models should be concrete enough that they can be transformed into code.

For the solar water heating system example, the GD above is refined into several IMs, such as the following:

In this case the temperature of the body of water is found using the temperature of the bodies in contact with the water. The GD above is applied in a similar way to find the temperature of the PCM (shown in another IM).

Given that TMs and GDs are more abstract, there will be many IMs that will satisfy the TMs and GDs. The IMs are more specific because they make choices about coordinate systems, vector representations, specific vector norms, symbols, etc.

Data Definition

The Data Definitions are definitions of symbols that are given for the problem. They are not derived; they are simply used by other models. For instance, if a problem depends on density, there may be a data definition for the equation defining density.

Data definitions might be relevant in other problems, or they might only be relevant in the current problem. For instance, density could come up in many situations, but the definition of the coefficients in the ODE for SWHS is likely unique to SWHS.

Potential Changes to Definitions

The proposed definitions fit fairly well with the SWHS example, but they don't work as well in the projectile example. In the projectile example a = dv/dt is called a theoretical model, but really the definition of acceleration isn't a theory in the conventional use of the term theory. The problem is that the definition is concurrently trying to do two different things:

Capture the notion of refining from general ideas to concrete requirements (and eventually executable code).
Use scientific terminology like theory and model, but relying on the reader's implicit understanding of the terms, instead of making the terminology explicit.

Going forward, I think we should make the notion of refinement explicit and separate it from the definition of the terminology. We should also remove the ``hard coding'' of one possible level of refinement in the template. There is no reason we can't go from a TM directly to an IM, and there is no reason we might not have additional layers of refinement, like TM - GD1 -> GD2 -> IM, etc. Rather than use TM, GD, etc. maybe we should just have the notion of M1, M2, M3, etc (for models at different levels of refinement)? The models would be supported by facts (data definitions).

Separate from the notion of refinement, the distinction between models and data definitions still seems like a relevant idea. There are many different meanings for theory:

https://en.wikipedia.org/wiki/Scientific_theory#Theories_and_laws

From the wikipedia page: "a fact is a simple, basic observation, whereas a law is a statement (often a mathematical equation) about a relationship between facts." ... ""...facts and theories are different things, not rungs in a hierarchy of increasing certainty. Facts are the world's data. Theories are structures of ideas that explain and interpret facts."

Maybe we interpret the data definitions as facts, and the theories are structures of ideas that explain and interpret facts? This doesn't quite work for us. In the wikipedia entry the facts are experimental observations. We aren't really building new theories based on observations; we are using existing theories that others have already build.

We might be better looking for inspiration in the terminology around models. Definitions of models can be found here:

https://en.wikipedia.org/wiki/Scientific_modelling

@JacquesCarette, I think you have a more unambiguous definition of models. What is your definition? I'd be happy to check the existing examples to see how well your definition fits with what we are currently calling models.

JacquesCarette · 2021-06-11T22:38:42Z

This is fantastic - definitely spells out lots of terminology. I need some time to have it sink in, and make a proposal.

One note: different disciplines abuse the words 'theory' and 'model' a lot. And these 'disciplines' can be sub-disciplines. The word 'model' means many many different things, and is fraught with danger because of that. Theory is not quite as bad, but also suffers from overloading.

smiths · 2021-06-11T22:53:24Z

Yes, I think I took advantage of the different uses of the words theory and model when I devised the SRS template. 😄 It would be nice to nail down the terminology so that we are at least internally consistent with how were use the terms.

balacij · 2021-06-17T17:34:56Z

Thank you for the detailed analysis! It certainly clears things up for me.

Considering what you said regarding the names differing between projects, I wonder if we can merge the models like you mentioned but with a ranked name grouping scheme placed on top of the models so that we can retain the names in areas but change it elsewhere (with explicit explanations of the terminology of course, and a restriction that models of a name group may only reference models of a name group with a rank >= their name group's rank). These name groups would likely need to have some defining factors enforced between them (potentially just whether or not the models are directly used in the resulting programs). I don't know how often this would be useful however.

JacquesCarette · 2021-06-17T22:06:27Z

I'll start with definitions taken from the mathematical side of things.

Theory: Already here there are inconsistencies in the mathematical literature. So let me define two versions:
- Theory Presentation: a set of sorts (or types), function (and sometimes relation) symbols, and a set of axioms.
- Complete Theory: a theory presentation and all the theorems that can be derived from it.

The second is often assumed in logic texts, because it is mathematically elegant, and completely impossible to actually write down. It is what I would call an ineffective concept. But it is the correct concept for the 'semantics' or meaning of a theory. Note that it implicitly assumes a background logic with rules of derivations, as a means to establish theorem-hood.

Model: given a particular 'home' (usually set theory), a model of a particular theory is a way to give a carrier for each sort, actual functions (and relations) for each function symbol, and a proof that each axioms holds. So a 'theory' could be the 'theory of monoids' (one type, a binary symbol, a unary symbol, and 3 axioms: associativity, left and right unitality), and then 3 models would be (Nat, +, 0), (Nat, *, 1), (String, ++, []).

And then it gets complicated. One can extend a theory with definitions (using the vocabulary of the background logic). These extensions necessarily 'hold' in all models, by rewriting them in terms of the model. Some theories are even 'ground theories': they actually have 0 sorts, 0 functions and 0 axioms, only definitions! If your ambient logic is powerful enough, this happens a lot. What this means is that you already have a large source of models available to you, because the signature of all these definitions is a theory presentation.

A PL analogy is probably good at this point:

A Theory Presentation is a like an ML signature, a Haskell typeclass, a Scala trait or a Java interface.
A Model is like an ML structure, a Haskell instance, a Scala extends, or a Java implements
In most of these languages, it is possible to define (overridable) functions in such a 'Theory'.

When you're working in Haskell (say), you use the same language to define both typeclasses and instances. The instances are full of terms (the stuff on the right-and-side of the = for each method of the typeclass that defining an instance requires).

All of this to say that, in some sense, there is more of a spectrum between pure theories and pure models than just these two extremes. On the other hand, there is a very important relation between them: the instance-of relation. A given theory may have many many different models - same with all the PL concepts. That's their very reason to exist!

Now, theories are not unsocial beasts, where you have a small zoo of fixed theories and that's it. [If the PL analogy is to hold, that's rather obvious!] There are various operations we can do on theories. Some of the go from theories to theories (adding new symbols, new axioms, etc). These kinds of operations are beyond Drasil right now (and were the topic of the PhD theses of Yasmine Sharoda and Musa Al-hassy).

The more important one, for Drasil, is refinement: making explicit choices that restrict the number of models. For example, a general theory might have been in n dimensions, but now we fix n to be 3. Models of our refined theory will still be models of the big one, with an additional constraint added. Refinement is extremely important for Drasil, as it encapsulates 'design choices' in their purest form. There are many different kinds of refinements, but I won't go into that here.

In theory, a refinement could cause a theory presentation to change quite a lot, by causing many of the embedded expressions to 'simplify'. Very true. And something which will currently be done 'by hand'.

So what does this mean for the topic of this issue?

TM By and large, these correspond to theory presentations.
GD From the description above, these are mostly theory presentations too, but obtained by a refinement step, from either a TM or another GD.
IM seems to be either a model or a theory we don't want to refine anymore. These IM have a specific 'form', in that they are constructive: a specific set of symbols are marked as "free" (or "input"), and the rest as "output" [we can deal with parameters later], and the theory is such that all output can be computed from the inputs.
DD these are exactly definitional extensions. The think that is missing from many DD is the context of definition: i.e. what is the background theory in which these make sense. Once we know that, they can be freely added to anything that refined that theory.

Concretely, I suggest the following:

TM be renamed Theory. (I would prefer theory presentation, but that's overly technical).
GD disappear as a named thing. A GD would still exist, but would really be the result of a process akin to NewThy = ApplyRefinement R1 (ApplyRefinment R2 Thy)). It could be named 'refined theory'. I don't suggest that exact syntax, it's actually kind of bad. But it's the "right" idea.
IM should probably be called either 'ground theory' or 'terminal theory' or 'computational theory' (or something else). I don't want to use the word 'model' at all, as it's way too overloaded.
DD should be Definition. The main thing to change is that it should always be attached to a theory.

The notion of scientific model pointed to by @smiths is in my vocabulary a theory presentation, which is deemed to have a particular relationship with the real world. It is usually somewhat crude, in the sense that there is an unwritten theory that one believes would be 'more accurate' that this is a refinement of, but the larger theory is never written down, as it would be too unwieldly. Sometimes, the assumptions made in this refinement are explicitly written down (ex: no friction).

There are a bunch more technical details, but I think this is enough to get a conversation going.

smiths · 2021-06-18T04:12:59Z

@JacquesCarette this is a great post. There is plenty to digest here. Once I've digested it, I'd like to try out your definitions by applying them (manually at first) to the existing case studies. I find it easier to see how everything fits together through examples. This will be especially helpful for understanding how a definition is attached to a theory.

balacij · 2021-11-10T00:01:24Z

In the context of #2883, I wonder if an alternative option is to only rewrite the encoding of TMs/GDs/IMs (not DDs) to use "Theory Presentations" under the hood. Then, we could create 2 variants of our SRS documents, as applicable, where TMs/GDs/DDs are additional definitions which add context to the theories for the purpose of SRS documents. I know that @smiths mentioned he had a problem with the "background theories vs refined theories" he prototyped on NoPCM (on the caseStudies repo), so this might be a potential resolution to it, on a case-by-case basis. However, I don't think we discussed the problems yet, so I might be mistaken 😅

EDIT: More directly, this could be our first instance of different SRS-template printers.

Also, since it seems like both of you agree that TMs/GDs/IMs (not DDs) are essentially refined/unrefined theories, should we start editing the code to make them use a new Theory/TheoryPresentation under the hood for now? Of course, we would need to spend time designing something as large as this first!

JacquesCarette · 2021-12-01T20:01:24Z

As we moved forward with our documents, we realized that the very definition of the various concepts (TM/GD/IM) was a bit too loose. At first, that was fine, but eventually we ran into problems. So I think that a rethink of the concepts is the better way to go.

I do like the idea of having multiple templates, with multiple names for things. However our first template should have 'solid' definitions for the concepts, which is currently not the case.

I think we're almost at the design stage. I think @smiths is still doing one example by hand to see how it goes, and that will feed into the design.

smiths · 2021-12-01T20:05:59Z

Yes, I am working on an example for the noPCM SRS.

balacij · 2023-06-27T17:12:18Z

I like the idea @smiths brought up about InstanceModels needing to be concrete enough such that we can generate code. If we try to think of 'concrete enough' using words from theories and refinements, does 'concrete' just mean that the theory can no longer be refined? Or perhaps that any further refinements allowed, do not present more choices (edit: I worded this oddly: I meant that there are no choices and any further refinements are not allowed to present more choices)?

Either way, I think we can intuit something fruitful about refinements and perhaps about choices.

JacquesCarette · 2023-06-28T18:15:38Z

No, 'concrete' does not mean that it can no longer be refined. It only means that it is specific enough that it is possible to give (unique) values to everything under some circumstances.

For example, a theory might have parameters and be considered concrete if we assume that we'll always be operating in an environment where those parameters will have known values. will have is key here. Such a theory is still refinable, as we could choose to specialize a parameter now.

smiths · 2023-06-28T21:50:19Z

Yes, I agree with @JacquesCarette. In the SRS for noPCM we can consider the IM as refined to the point where we can generate code. The code will involve solving an ODE, but we will haven't said what algorithm to use to solve the ODE. Even after the algorithm is selected we can potentially specialize parameters.

balacij · 2023-06-29T13:57:56Z

Hmm, ok, I think that makes sense. To be clear, does this mean that 'concrete' and 'abstract' are terms that are attached only with respect to a particular target language (for us, largely GOOL)?

Is there anything we can say about the allowed refinement types once a theory has become 'concrete'? For example, do they all ensure that the theory is still 'concrete' and that no new 'target language generation choices' appear?

smiths · 2023-06-29T16:00:05Z

@balacij I consider abstract and concrete as relative concepts. I think part of the benefit of Drasil is that we don't have to decide "for all time" whether something is abstract or concrete. We can draw a line at the abstraction level where we want to start and we can say that this is our known starting point. We don't have to justify it or derive it. If in the future we find that it is advantageous to be more general, possibly because being more general (abstract) lets us create something that can be used in another context, then we can move back the starting line; we can define a new point of what is known.

The same applies at the concrete end - the currently called "final theories". The final theories are the point where we say we are done, but we can change that in the future.

I don't think we have any need to identify the absolute "top", the most abstract theory, or an absolute "bottom" of the most concrete theory. We know when one theory is more abstract or concrete than another, but where we start and finish for any particular problem are determined by practical considerations.

JacquesCarette · 2023-06-29T20:55:55Z

Here is an example: is a*x^2+b*x+c a concrete polynomial or not? Well, if we have values for a,b,c it is. But more generally so: in a context where we have 'promises' of values for those 3, we can consider it concrete too.

Similarly, if all we know is that "p is a polynomial" then for sure it isn't concrete. It has neither a degree nor coefficients. "p is a degree 2 polynomial" is also abstract - and different from the above!! This is because before we had a concrete handle (i.e. names) for the coefficients, while here we only know they exist (and there are 3 of them).

So indeed 'concrete' and 'abstract' are relative terms, and we don't want to change that.

And so my take: a theory is "concrete enough" if it is possible to generate code from it. You can't generate code from p.

samm82 · 2024-06-29T00:00:27Z

Since we've been talking about a "little theories" approach, what should we do with this discussion? Is there anything left here that's worth documenting, or does our current discussion render this "obsolete"?

JacquesCarette · 2024-07-08T13:03:27Z

As it is a wide ranging discussion, it seems quite hard to tell. Especially as some of the knowledge in the above is now understood informally without our group -- but has it been written down? I don't know!

smiths · 2024-07-10T03:49:40Z

This issue probably should have been created in the Discussion forum, rather than as an issue. However, given that it is here, I don't think we should close it just yet. Once we start implementing our theory refinements into Drasil, we will likely re-read this issue.

samm82 · 2024-07-10T19:35:29Z

I've re-labelled this issue based on these comments (hopefully correctly!)

smiths added the design Related to the current design of Drasil (not artifacts). label Jun 11, 2021

smiths assigned balacij and JacquesCarette Jun 11, 2021

balacij mentioned this issue Jul 7, 2021

Move Expr's Deriv into DisplayExpr #2628

Closed

Ant13731 mentioned this issue Jul 12, 2021

Adding Documentation for Chunks, Types, Classes, and Instances #2679

Open

6 tasks

cd155 pinned this issue Aug 1, 2021

cd155 unpinned this issue Aug 1, 2021

This was referenced Aug 18, 2021

Continue building UID system #2788

Open

Parameterize QDefinition, reorganize DataDefinitions (for the QDef change), and move Deriv to ModelExpr #2791

Merged

This was referenced Oct 3, 2021

Alternative ModelKinds design + Display Language #2853

Open

Relevant Drasil Code for the GlassBR Example #2856

Closed

balacij mentioned this issue Feb 8, 2022

Should ModelKinds allow users to write an overriding name/term for the underlying theory? #2937

Closed

balacij mentioned this issue Apr 6, 2023

Should TheoryModels use their underlying ModelKinds constructions more for getting the “equation” part of their descriptions? #3333

Open

balacij mentioned this issue Apr 26, 2023

Redesign of TMs #1601

Closed

balacij added the High Priority label Apr 27, 2023

samm82 added traceability Features that allow tracing of provenance blocked and removed High Priority labels Jul 10, 2024

samm82 added the needs-mining There is valuable information in the comments that belongs elsewhere label Jul 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TM versus GD versus IM versus DD Discussion #2599

TM versus GD versus IM versus DD Discussion #2599

smiths commented Jun 11, 2021 •

edited

Loading

JacquesCarette commented Jun 11, 2021

smiths commented Jun 11, 2021

balacij commented Jun 17, 2021

JacquesCarette commented Jun 17, 2021

smiths commented Jun 18, 2021

balacij commented Nov 10, 2021 •

edited

Loading

JacquesCarette commented Dec 1, 2021

smiths commented Dec 1, 2021

balacij commented Jun 27, 2023 •

edited

Loading

JacquesCarette commented Jun 28, 2023

smiths commented Jun 28, 2023

balacij commented Jun 29, 2023

smiths commented Jun 29, 2023

JacquesCarette commented Jun 29, 2023

samm82 commented Jun 29, 2024

JacquesCarette commented Jul 8, 2024

smiths commented Jul 10, 2024

samm82 commented Jul 10, 2024

TM versus GD versus IM versus DD Discussion #2599

TM versus GD versus IM versus DD Discussion #2599

Comments

smiths commented Jun 11, 2021 • edited Loading

JacquesCarette commented Jun 11, 2021

smiths commented Jun 11, 2021

balacij commented Jun 17, 2021

JacquesCarette commented Jun 17, 2021

smiths commented Jun 18, 2021

balacij commented Nov 10, 2021 • edited Loading

JacquesCarette commented Dec 1, 2021

smiths commented Dec 1, 2021

balacij commented Jun 27, 2023 • edited Loading

JacquesCarette commented Jun 28, 2023

smiths commented Jun 28, 2023

balacij commented Jun 29, 2023

smiths commented Jun 29, 2023

JacquesCarette commented Jun 29, 2023

samm82 commented Jun 29, 2024

JacquesCarette commented Jul 8, 2024

smiths commented Jul 10, 2024

samm82 commented Jul 10, 2024

smiths commented Jun 11, 2021 •

edited

Loading

balacij commented Nov 10, 2021 •

edited

Loading

balacij commented Jun 27, 2023 •

edited

Loading