Description
TL;DR: @exercism/track-maintainers We are planning on reopening problem specs in the next couple of weeks. The key change we are making is that problem-specs should be thought of as a set of optional test cases. Individual tracks should choose which test cases to implement on a per-test-case basis. We are adding a simple, per-exercise configuration file that keeps track of which test cases the exercise implements. We'll provide a simple mechanism for tracks to keep this file up to date with the test cases in the exercise's canonical data. The only change required from track maintainers is to update their test generators to generate tests for just the test cases enabled in the configuration file. All test cases will be immutable; changes can only be introduced by adding new test cases, which can be flagged as re-implementing an existing test case.
A year ago we temporarily put the Problem Specifications repo in bug-fixes only mode (see this issue).
This was done as there were systematic problems with the Problem Specifications repo that were causing conflict between maintainers, which we did not have the resources to fix in the immediate term.
Now that we're making good progress on Exercism v3, we have been able to dedicate time to designing a solution to the problems in the Problem Specifications repo.
In this issue I will outline the problems that we're trying to solve and the solution we have come up with.
The Problem Specifications repo has always been one of the most active parts of Exercism and has brought a great deal of joy to many, so we are very excited that these changes will allow us to re-open it.
Let's get to it! 🎉
Intro to Problem Specifications
The Problem Specifications repo describes a set of language-agnostic exercises, which maintainers can use to build exercises for their tracks. Over time, exercises have grown to not only contain shared titles and descriptions, but also canonical data that describes a set of test cases for the exercise. This setup has been very helpful for maintainers for tasks such as adding new exercises to tracks, updating existing exercises, and bootstrapping entirely new tracks.
As the canonical data is defined as JSON data, some tracks have built tooling to automatically generate test suites from them. This has allowed these tracks to quickly scaffold test suites for new exercises, as well as fix bugs in existing (generated) test suites.
Unfortunately, different tracks (and different maintainers) have different desires for how their tests are structured and written, which has caused tension. In this document, we'll describe where that tension comes from and how we aim to resolve it.
Issues with Problem Specifications
Issues with the Problem Specifications repo occur when there is a 1-to-1 correspondence between an exercise defined in Problem Specifications and a track implementation of that exercise. In other words, if changes to the exercise data in Problem Specifications result in changes to a track's implementation, there can be trouble.
README
The `README.md` file for track exercises is generated directly from the Problem Specifications exercise's `description.md` and `metadata.yml` files, using the `configlet` tool.
An example of how this can be problematic is the word "promises", with some languages instead using the word "futures" to describe the same concept.
Canonical data
Most of the exercises in the Problem Specifications repo are implemented in multiple tracks. This means that those tracks implement the same exercise using the same canonical data.
As an example of how this can be problematic, consider one track wanting to add a test case that uses Unicode strings as input, which another track would not want to include due to the increased complexity for their students. Another example is when one track wants to only have tests for the public interface of the exercise, whereas another track wants to have tests for the private or lower-level interface of the exercise.
This issue is compounded even further when the track exercise's test suite is generated directly from a Problem Specifications exercise's `canonical-data.json` file, as maintainers are often reluctant to do pre- or post-processing in their generators.
Note that not all types of changes to canonical data are problematic. We can discern the following four types of changes:
| Type of change | Problematic | Example |
| --- | --- | --- |
| Add test case | Possibly | Link |
| Remove test case | Unlikely | Link |
| Update test case description | Rarely | Link |
| Update test case input or expected output | Possibly | Link |
As can be seen, not all types of changes are equally problematic. The most problematic changes are when either:
- A test case's input or expected output changes (possibly breaking test generators)
- A test case's focus changes (e.g. no longer using unicode in the input)
- A new test is added that changes the scope of the exercise
Of course, not all changes are problematic; many are in fact what one could consider "bug fixes", where the expected value does not match what the instructions specify (though the instructions themselves could also be at fault). Determining whether something is a mere "bug fix" or something more substantial has proved difficult, with opinions varying depending on how a track uses a specific test case.
To prevent breaking changes, the canonical data is currently versioned using SemVer, so in theory test generators could use this to do "safe" updates. In practice though, most test generators always use the latest version to generate their exercises.
Redefining Problem Specifications
The first step in solving these issues is defining a clear guiding rule for the Problem Specifications repository:
Problem Specifications exists to cover the "happy path". That means it should work for most tracks most of the time. In situations that aren't in the "happy path", it is down to individual tracks to solve those issues locally.
Application/library tests
- Exercises must contain tests that cover the public interface of the exercise (also thought of as "application tests").
- Exercises may contain tests that cover the private or lower-level interface of the exercise (sometimes referred to as "library tests").
Canonical data
- Each test case will have a UUID to uniquely identify it (which we will populate via a script for existing tests).
- All test cases should be considered optional, in the sense that each track should determine which test cases are valid and useful for its language.
- Tracks can use the UUIDs to include/exclude specific test cases. Our recommendation is to explicitly include tests, as that is least likely to break your test suite (see the sketch below).
- Test cases will be immutable, which means that once a test case has been added, it never changes.
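For illustration, a track's test generator might read the canonical data and keep only the test cases on its allowlist, roughly along these lines. This is a sketch only, not part of any official tooling; the file path and the included UUIDs are made-up examples:

```python
import json

# Hypothetical path and allowlist; a real track's generator will have its
# own repository layout and its own list of included test cases.
CANONICAL_DATA = "problem-specifications/exercises/twice/canonical-data.json"
INCLUDED_UUIDS = {
    "e46c542b-31fc-4506-bcae-6b62b3268537",
    "82d32c2e-07b5-42d9-9b1c-19af72bae860",
}

def leaf_cases(cases):
    """Canonical data may nest test cases in groups; yield only the leaf cases."""
    for case in cases:
        if "cases" in case:
            yield from leaf_cases(case["cases"])
        else:
            yield case

with open(CANONICAL_DATA) as f:
    canonical = json.load(f)

selected = [c for c in leaf_cases(canonical["cases"]) if c["uuid"] in INCLUDED_UUIDS]
# `selected` is what a track-specific template would render into a test suite.
```

Because the allowlist is explicit, new test cases added upstream have no effect on the generated test suite until the track opts into them.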
Changing test cases
As test cases will be immutable, one cannot change an existing test case and thus a new test case must be added. This has several nice consequences:
- If the test generator uses an allowlist of test case UUIDs to select which test cases to generate the test suite from (which is our recommendation), the generated test suites will never change unexpectedly, nor will the test generator break.
- There is no longer any discussion on whether a change could break existing tracks, as tracks should use an allowlist and thus new additions should not result in any changes.
- There is no longer any discussion whether a change is a patch, minor or major update.
- We no longer need the versioning of the canonical data.
Let's look at an example. Suppose we have the following test case:
```json
{
  "uuid": "e46c542b-31fc-4506-bcae-6b62b3268537",
  "description": "two times one is two",
  "property": "twice",
  "input": {
    "number": 1
  },
  "expected": 3
}
```
We found a mistake in the expected value, but with tests being immutable, we'll have to add a new test case (with a new UUID):
```json
[
  {
    "uuid": "e46c542b-31fc-4506-bcae-6b62b3268537",
    "description": "two times one is two",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 3
  },
  {
    "uuid": "82d32c2e-07b5-42d9-9b1c-19af72bae860",
    "description": "two times one is two",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 2
  }
]
```
Some additional consequences of this approach:
- As test cases never change, "bug fixes" cannot be automatically applied by re-running the exercise's test generator; tracks have to explicitly change the UUID they use from the old test case to the new test case. We'll get to possible automation options to help with this later.
- There are now two test cases with the same description, but different UUIDs, so how does one know which test case to use? For this, we'll use two things:
  - We'll add a `"reimplements"` field, which must contain the UUID of the test case it is re-implementing. Note that we haven't named this field `"fixes"` or `"supersedes"` or something like that, as re-implemented test cases might actually not make sense for every track. This field can be omitted for test cases that don't reimplement an old test.
  - We'll use the (existing) `"comments"` field to explain why a test case was re-implemented (e.g. `"Expected value is changed to 2"`).

While tracks could automatically select the "latest" version of a test case by looking at the `"reimplements"` hierarchy, we recommend each track to make this a manual action (a sketch of such a lookup appears below).

With the above suggested changes, the test cases now look like this:
```json
[
  {
    "uuid": "e46c542b-31fc-4506-bcae-6b62b3268537",
    "description": "two times one is two",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 3
  },
  {
    "uuid": "82d32c2e-07b5-42d9-9b1c-19af72bae860",
    "description": "two times one is two",
    "comments": ["Expected value is changed to 2"],
    "reimplements": "e46c542b-31fc-4506-bcae-6b62b3268537",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 2
  }
]
```
- The `comments` field can be mutated, as it is merely metadata that won't be reflected in the tests. Changing it thus does not mean adding a new test case.
This change was guided in part by our feeling that canonical data test cases don't actually change that often, so this would be a relatively minor inconvenience.
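To make the manual lookup easier, a maintainer could use a small helper that walks the `"reimplements"` chain and reports only the newest version of each test case. This is just a Python sketch of the idea, using the two example cases above trimmed to the relevant fields; it is not part of configlet or any official tooling:

```python
cases = [
    {"uuid": "e46c542b-31fc-4506-bcae-6b62b3268537", "expected": 3},
    {
        "uuid": "82d32c2e-07b5-42d9-9b1c-19af72bae860",
        "reimplements": "e46c542b-31fc-4506-bcae-6b62b3268537",
        "expected": 2,
    },
]

def latest_versions(cases):
    """Return only the test cases that no other case reimplements,
    i.e. the tip of each "reimplements" chain."""
    superseded = {c["reimplements"] for c in cases if "reimplements" in c}
    return [c for c in cases if c["uuid"] not in superseded]

print([c["uuid"] for c in latest_versions(cases)])
# -> only the UUID of the re-implemented case (the one with expected value 2)
```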
Scenarios
- As test cases are now optional, we'll remove the `optional` field.
- To allow for selectively including/excluding test cases based on a property of the test case (such as using big integers), we'll add an optional `scenarios` field (see the filtering sketch after the example below).
- The `scenarios` field can use one or more of a predefined set of values, which are defined in a `SCENARIOS.txt` file.
- The `scenarios` field can be mutated additively, by adding new scenarios. Existing scenarios must not be changed or removed. Adding new scenarios therefore does not mean adding a new test case.
- We should add commonly used scenarios to existing test cases.
- Library tests will have a `library-test` scenario added to allow for easy including/excluding of library tests. Application tests won't have their own scenario, as they must be included and should not be filtered on.
```json
{
  "uuid": "25a8463d-560f-4a42-9be7-b79d00cd28a4",
  "description": "64",
  "property": "square",
  "input": {
    "square": 64
  },
  "expected": 9223372036854775808,
  "scenarios": ["big-integers"]
}
```
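A track that does not want to support a given scenario could then filter on this field. As a small Python sketch (the excluded scenarios are just an example of what a track might choose, not a prescribed list):

```python
# Example cases, trimmed to the relevant fields; real data comes from canonical-data.json.
cases = [
    {"uuid": "25a8463d-560f-4a42-9be7-b79d00cd28a4", "scenarios": ["big-integers"]},
    {"uuid": "e46c542b-31fc-4506-bcae-6b62b3268537"},
]

# Scenarios this hypothetical track chooses not to support.
EXCLUDED_SCENARIOS = {"big-integers", "library-test"}

def wanted(case):
    """Keep a test case only if it uses none of the excluded scenarios."""
    return not (set(case.get("scenarios", [])) & EXCLUDED_SCENARIOS)

print([c["uuid"] for c in cases if wanted(c)])
# -> only the case without the "big-integers" scenario remains
```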
Indicate which test cases have been implemented
- If a track's exercise is based on canonical data from the Problem Specifications repo, the exercise should contain a `.meta/tests.toml` file.
- The goal of this file is to keep track of which tests from the canonical data this exercise implements. As such, it should list the UUIDs of all the test cases in the canonical data this exercise is based on, and for each test case indicate if the exercise implements it or not (`true` or `false`).
```toml
[canonical-tests]

# no name given
"19709124-b82e-4e86-a722-9e5c5ebf3952" = true

# a name given
"3451eebd-123f-4256-b667-7b109affce32" = true

# another name given
"653611c6-be9f-4935-ab42-978e25fe9a10" = false
```
The advantages of having this file are:
- It makes it very explicit which test cases are implemented. This is useful both for manually written and automatically generated tests.
- It is ideal for automation purposes.
We can script-prepopulate the initial creation of these `.meta/tests.toml` files across all tracks, initially setting all test cases to `true`.
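To give a rough sense of the kind of check this file enables (a sketch only, not the actual configlet implementation; the paths are hypothetical), a script could compare the UUIDs listed in a `.meta/tests.toml` file against the exercise's canonical data and flag test cases that the file does not mention yet:

```python
import json
import tomllib  # Python 3.11+; older versions can use the third-party tomli package

def leaf_cases(cases):
    """Yield leaf test cases from the (possibly nested) canonical data."""
    for case in cases:
        if "cases" in case:
            yield from leaf_cases(case["cases"])
        else:
            yield case

# Hypothetical paths for a single exercise on a single track.
with open("problem-specifications/exercises/example/canonical-data.json") as f:
    canonical = {c["uuid"] for c in leaf_cases(json.load(f)["cases"])}

with open("exercises/example/.meta/tests.toml", "rb") as f:
    tracked = set(tomllib.load(f)["canonical-tests"])

missing = canonical - tracked
if missing:
    print("tests.toml does not yet mention these canonical test cases:")
    for uuid in sorted(missing):
        print(" -", uuid)
```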
Notifications
The `.meta/tests.toml` file allows us to detect which test cases an exercise implements. This gives us the ability to tackle one of the downsides of using the Problem Specifications repository as the source of data: not knowing when canonical data has changed.
Tracks that implemented an exercise using canonical data as its basis basically had two options to check whether the canonical data had changed:
- They could keep a close eye on the Problem Specifications repository to see what files were being changed. This requires quite a bit of effort, and it is easy to miss things.
- If exercises were created by a test generator, a track could re-run the test generator and see if the test suites would change.
Both options are clearly not ideal.
Using the `.meta/tests.toml` files, we'll be building a GitHub Action that serves as a notification bot. It will regularly check the contents of the track's `.meta/tests.toml` files against the test cases defined in the Problem Specifications repo's `canonical-data.json` files.
Based on this diff, it could automatically do things like post an issue to your track's repository if there are any relevant changes, which could look like this:
> This is your weekly problem-specifications update
>
> Summary: There are 20 new tests, and 5 existing tests with updates.
>
> According to your track's `config.json` file, the following exercises are implemented with problem specification changes:
We could even create individual issues per exercise, but the most important thing is that we can do this type of automation when tracks use the `.meta/tests.toml` file.
Tooling
- We'll add a new `sync` command to configlet to help work with `.meta/tests.toml` files.
- The `sync` command can be used to detect if there are exercises whose `.meta/tests.toml` file is missing test cases that are in its canonical data.
- The `sync` command can scaffold a new `.meta/tests.toml` file for an exercise by interactively giving the maintainer the option to set each test case's status to `true` or `false`.
- The `sync` command can update an existing `.meta/tests.toml` file by interactively giving the maintainer the option to set each test case's status to `true` or `false`.
- We'll add a new tool to allow easy formatting of the JSON files in the Problem Specifications repo.
- This new tool will also be able to verify the correctness of the Problem Specifications repo, replacing the existing bash scripts.
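To illustrate the kind of checks such a verification tool might perform (a hypothetical Python sketch with a made-up path, not the actual tool), it could assert that UUIDs are unique and that every `reimplements` value points at an existing test case:

```python
import json

def leaf_cases(cases):
    """Yield leaf test cases from the (possibly nested) canonical data."""
    for case in cases:
        if "cases" in case:
            yield from leaf_cases(case["cases"])
        else:
            yield case

# Hypothetical path; the real tool would iterate over every exercise.
with open("exercises/example/canonical-data.json") as f:
    cases = list(leaf_cases(json.load(f)["cases"]))

uuids = [c["uuid"] for c in cases]
assert len(uuids) == len(set(uuids)), "duplicate UUIDs found"

known = set(uuids)
for case in cases:
    target = case.get("reimplements")
    assert target is None or target in known, f"unknown reimplements target: {target}"
```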
README
- Descriptions of an exercise should strive to use language-agnostic terminology, when possible.
- Currently, `README.md` files contain both the problem's story and the instructions on what is expected of the student. In the future, we'd like to replace the `README.md` file with two separate files: `story.md` and `instructions.md`. This is similar to how the v3 Concept Exercises' documentation is split up over multiple files.
Formatting
- We should reduce churn on pull requests due to formatting of JSON, Markdown and YAML files in the Problem Specifications repo by providing contributors with ways to automatically format these files.
- We should provide both a standalone binary and a GitHub integration to easily format the source files.
Where to go from here?
We’re really excited about re-opening Problem Specifications. We feel like these changes should fix the issues that caused the repo to be locked, but not add any meaningful extra burden on maintainers.
If there is general consensus with this approach, we will action these changes and re-open Problem Specifications as soon as possible. I have already prototyped much of the tooling but still have some work to do to get it finished. We also need to add documentation to Problem Specs, and make PRs both to this repo (for the UUIDs) and to the track repos (for the new files). My plan is to submit PRs for these changes over the next two weeks, and then reopen this repository by mid-October.
Once the above changes have been merged, we can start accepting PRs again (although some of them will probably need a little tweaking to correspond to the new format).
Thanks to everyone for being patient with us over this past year! I look forward to hearing your thoughts on all these changes :)