
[RFC]: support for structured package data #1147

Open

Description

This RFC proposes adding structured package data to facilitate automation and scaffolding.

Overview

The need for structured package data has been discussed at various points during stdlib development. This need has become more pressing as we seek to automate the generation of specialized packages which wrap "base" packages for use with other data structures. The most prominent example is the math/base/special/* APIs, which are wrapped to generate a variety of higher-order packages, including

  • math/iter
  • math/strided
  • math/ generics supporting ndarrays, arrays, and scalars

and more recently in work exposing those APIs in spreadsheet contexts. In each context, one needs to

  • specify parameters, including types, names, and descriptions
  • add example values
  • generate random values for benchmarking, examples, and tests
  • specify aliases
  • specify related keywords

and in some contexts

  • create native implementations

While various attempts have been made to automate the scaffolding of higher-order packages, each attempt has relied on manual entry of the necessary scaffold data, including parameter names, descriptions, and example values. To date, we have not created a centralized database from which to pull the desired package meta data.

Proposal

In this RFC, I propose adding structured meta data to "base" packages. This structured meta data can then be used in various automation contexts, the most prominent of which is automated scaffolding.

The meta data would be stored as JSON in a subfield of the __stdlib__ configuration object of package.json files. The choice of JSON stems from the ability to use JSON Schema for validation and linting.
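
A minimal sketch of how tooling might read this data from a parsed package.json object. The subfield name `meta` and the helper `readPackageMeta` are hypothetical; the RFC does not name the subfield.

```javascript
'use strict';

/**
* Returns structured meta data from a parsed `package.json` object.
*
* NOTE: the subfield name `meta` is an assumption made for illustration.
*
* @param {Object} pkg - parsed `package.json` contents
* @returns {(Object|null)} meta data object or `null` if none is present
*/
function readPackageMeta( pkg ) {
    var cfg = pkg.__stdlib__;
    if ( cfg === void 0 || cfg === null || typeof cfg !== 'object' ) {
        return null;
    }
    if ( cfg.meta === void 0 ) {
        return null;
    }
    return cfg.meta;
}

// A pared-down stand-in for a parsed `package.json` file:
var pkg = {
    '__stdlib__': {
        'meta': {
            '$schema': 'math/base@v1.0',
            'alias': 'add'
        }
    }
};

var meta = readPackageMeta( pkg );
console.log( meta.alias );
```

Returning `null` when the field is absent would let tooling skip packages which have not yet adopted structured meta data.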

Examples

I've included two examples below.

math/base/ops/add:

{
    "$schema": "math/base@v1.0",
    "base_alias": "add",
    "alias": "add",
    "pkg_desc": "add two double-precision floating-point numbers",
    "desc": "adds two double-precision floating-point numbers",
    "short_desc": "",
    "parameters": [
        {
            "name": "x",
            "desc": "first input value",
            "type": {
                "javascript": "number",
                "jsdoc": "number",
                "c": "double",
                "dtype": "float64"
            },
            "domain": [
                {
                    "min": "-infinity",
                    "max": "infinity"
                }
            ],
            "rand": {
                "prng": "random/base/uniform",
                "parameters": [
                    -10.0,
                    10.0
                ]
            },
            "example_values": [
                -1.2,
                2.0,
                -3.1,
                -4.7,
                5.5,
                6.7
            ]
        },
        {
            "name": "y",
            "desc": "second input value",
            "type": {
                "javascript": "number",
                "jsdoc": "number",
                "c": "double",
                "dtype": "float64"
            },
            "domain": [
                {
                    "min": "-infinity",
                    "max": "infinity"
                }
            ],
            "rand": {
                "prng": "random/base/uniform",
                "parameters": [
                    -10.0,
                    10.0
                ]
            },
            "example_values": [
                3.1,
                -4.2,
                5.0,
                -1.0,
                -2.0,
                6.2
            ]
        }
    ],
    "returns": {
        "desc": "sum",
        "type": {
            "javascript": "number",
            "jsdoc": "number",
            "c": "double",
            "dtype": "float64"
        }
    },
    "keywords": [
        "sum",
        "add",
        "addition",
        "total",
        "summation"
    ],
    "extra_keywords": []
}

stats/base/dists/arcsine/pdf:

{
    "$schema": "stats/base/dists@v1.0",
    "base_alias": "pdf",
    "alias": "pdf",
    "pkg_desc": "arcsine distribution probability description function (PDF)",
    "desc": "evaluates the probability density function (PDF) for an arcsine distribution with parameters `a` (minimum support) and `b` (maximum support)",
    "short_desc": "probability density function (PDF) for an arcsine distribution",
    "parameters": [
        {
            "name": "x",
            "desc": "input value",
            "type": {
                "javascript": "number",
                "jsdoc": "number",
                "c": "double",
                "dtype": "float64"
            },
            "domain": [
                {
                    "min": "-infinity",
                    "max": "infinity"
                }
            ],
            "rand": {
                "prng": "random/base/uniform",
                "parameters": [
                    -10.0,
                    10.0
                ]
            },
            "example_values": [
                2.0,
                5.0,
                0.25,
                1.0,
                -0.5,
                -3.0
            ]
        },
        {
            "name": "a",
            "desc": "minimum support",
            "type": {
                "javascript": "number",
                "jsdoc": "number",
                "c": "double",
                "dtype": "float64"
            },
            "domain": [
                {
                    "min": "-infinity",
                    "max": "infinity"
                }
            ],
            "rand": {
                "prng": "random/base/uniform",
                "parameters": [
                    -10.0,
                    10.0
                ]
            },
            "example_values": [
                0.0,
                3.0,
                -2.5,
                1.0,
                -1.25,
                -5.0
            ]
        },
        {
            "name": "b",
            "desc": "maximum support",
            "type": {
                "javascript": "number",
                "jsdoc": "number",
                "c": "double",
                "dtype": "float64"
            },
            "domain": [
                {
                    "min": "-infinity",
                    "max": "infinity"
                }
            ],
            "rand": {
                "prng": "random/base/uniform",
                "parameters": [
                    10.0,
                    20.0
                ]
            },
            "example_values": [
                3.0,
                7.0,
                2.5,
                2.0,
                10.0,
                -2.0
            ]
        }
    ],
    "returns": {
        "desc": "evaluated PDF",
        "type": {
            "javascript": "number",
            "jsdoc": "number",
            "c": "double",
            "dtype": "float64"
        }
    },
    "keywords": [
        "probability",
        "pdf",
        "arcsine",
        "continuous",
        "univariate"
    ],
    "extra_keywords": []
}
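
To illustrate how scaffolding tools might consume configuration objects like the two above, here is a rough sketch which renders a JSDoc comment from the `desc`, `parameters`, and `returns` fields. The helper name `renderJSDoc` is hypothetical, and the exact comment layout is an assumption rather than a prescribed format.

```javascript
'use strict';

// Hypothetical helper: renders a JSDoc comment from structured package meta data.
function renderJSDoc( meta ) {
    var lines;
    var p;
    var i;

    lines = [ '/**' ];

    // Capitalize the first letter of the description and terminate with a period:
    lines.push( '* ' + meta.desc.charAt( 0 ).toUpperCase() + meta.desc.slice( 1 ) + '.' );
    lines.push( '*' );
    for ( i = 0; i < meta.parameters.length; i++ ) {
        p = meta.parameters[ i ];
        lines.push( '* @param {' + p.type.jsdoc + '} ' + p.name + ' - ' + p.desc );
    }
    // The return value is optional:
    if ( meta.returns ) {
        lines.push( '* @returns {' + meta.returns.type.jsdoc + '} ' + meta.returns.desc );
    }
    lines.push( '*/' );
    return lines.join( '\n' );
}

// Abbreviated version of the math/base/ops/add example (only the fields used here):
var addMeta = {
    'desc': 'adds two double-precision floating-point numbers',
    'parameters': [
        { 'name': 'x', 'desc': 'first input value', 'type': { 'jsdoc': 'number' } },
        { 'name': 'y', 'desc': 'second input value', 'type': { 'jsdoc': 'number' } }
    ],
    'returns': { 'desc': 'sum', 'type': { 'jsdoc': 'number' } }
};

var doc = renderJSDoc( addMeta );
console.log( doc );
```

The same fields could plausibly drive REPL.txt or README generation; only the rendering function would change.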

Annotated Overview

{
    // Each configuration object should include the schema name and version so that tooling can gracefully handle migrations and eventual schema evolution:
    "$schema": "math/base@v1.0", // math/base indicates that this schema applies those packages within the math/base namespace. Different namespaces are likely to have different schema needs; hence, the requirement to specify which schema the structured package meta data is expected to conform to.

    // The "base" alias is the alias without, e.g., Hungarian notation prefixes and suffixes:
    "base_alias": "add",

    // The alias is the "base" alias and any additional type information:
    "alias": "add",

    // The package description used in the `package.json` and README:
    "pkg_desc": "add two double-precision floating-point numbers",

    // The description used when documenting JSDoc and REPL.txt files:
    "desc": "adds two double-precision floating-point numbers",

    // A short description which can be used by higher-order packages or in other contexts:
    "short_desc": "",

    // A list of API parameters:
    "parameters": [
        {
            // The parameter name as used in API signatures and JSDoc:
            "name": "x",

            // A parameter description:
            "desc": "first input value",

            // Parameter type information as conveyed in various implementation contexts:
            "type": {
                "javascript": "number",
                "jsdoc": "number",
                "c": "double",

                // This field would have more prominence in higher-order APIs, such as those involving ndarrays, where the JavaScript value may be `ndarray`, but we want to ensure we use an ndarray object having a float64 data type:
                "dtype": "float64"
            },

            // The mathematical domain of accepted values (note: this is an array as some math functions have split domains):
            "domain": [
                {
                    "min": "-infinity",
                    "max": "infinity"
                }
            ],

            // Configuration for generating valid random values for this parameter:
            "rand": {
                // A package name for a suitable PRNG:
                "prng": "random/base/uniform",

                // Parameter values to be supplied to the PRNG:
                "parameters": [
                    -10.0,
                    10.0
                ]
            },

            // Concrete values to be used in examples (note: these could possibly be automatically generated according to the `rand` configuration above):
            "example_values": [
                -1.2,
                2.0,
                -3.1,
                -4.7,
                5.5,
                6.7
            ]
        },
        ...
    ],

    // Configuration for the return value (if one exists):
    "returns": {
        // Return value description, as might be used in JSDoc and REPL.txt:
        "desc": "sum",

        // Return value type information:
        "type": {
            "javascript": "number",
            "jsdoc": "number",
            "c": "double",
            "dtype": "float64"
        }
    },

    // A list of keywords without all the boilerplate keywords commonly included in `package.json`:
    "keywords": [
        "sum",
        "add",
        "addition",
        "total",
        "summation"
    ],

    // Additional keywords (e.g., the built-in API equivalent, such as Math.abs):
    "extra_keywords": []
}
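
As a sketch of how tooling might interpret two of the annotated fields, the following splits a `$schema` identifier into a namespace and version and tests a value against a (possibly split) `domain`. The helper names are hypothetical, and treating each interval as closed (bounds inclusive) is an assumption, since the RFC does not say whether bounds are inclusive.

```javascript
'use strict';

// Hypothetical helper: splits an identifier such as "math/base@v1.0" into its parts.
function parseSchemaId( id ) {
    var m = /^(.+)@v(\d+(?:\.\d+)*)$/.exec( id );
    if ( m === null ) {
        return null;
    }
    return {
        'namespace': m[ 1 ],
        'version': m[ 2 ]
    };
}

// Converts a domain bound, which may be the string "infinity" or "-infinity", to a number.
function toBound( v ) {
    if ( v === 'infinity' ) {
        return Infinity;
    }
    if ( v === '-infinity' ) {
        return -Infinity;
    }
    return Number( v );
}

// Tests whether `x` falls within any interval of a domain (ASSUMPTION: closed intervals).
function inDomain( x, domain ) {
    var i;
    for ( i = 0; i < domain.length; i++ ) {
        if ( x >= toBound( domain[ i ].min ) && x <= toBound( domain[ i ].max ) ) {
            return true;
        }
    }
    return false;
}

var schema = parseSchemaId( 'math/base@v1.0' );
console.log( schema.namespace, schema.version );

// A made-up split domain, used only to exercise the array form:
var split = [
    { 'min': '-infinity', 'max': -1.0 },
    { 'min': 1.0, 'max': 'infinity' }
];
console.log( inDomain( 5.5, split ), inDomain( 0.0, split ) );
```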

Discussion

  • The most prominent risk is that this is yet another place where meta data can drift and something more we need to maintain. While true, I think having structured meta data has benefits which outweigh the additional costs, particularly when we consider how often we wrap "base" functionality as part of higher-order APIs. Given that we've had a recurring need for such meta data, we'll eventually need some sort of standardized way of storing this meta data.
  • One benefit of having this structured meta data is that this could better enable AI tools, such as those provided by OpenAI, to scaffold new packages involving "base" implementations.

Related Issues

No.

Questions

  • What other data, if any, should be included?
  • One open question is whether we should include support for constraints. E.g., in the arcsine PDF function, a < b. In the example JSON, I've simply manually adjusted the PRNG parameters and the example values to ensure we don't run afoul of that constraint. It was not clear to me how we might include such constraints in a universal way which is machine-parseable and actionable in scaffolding tools.
  • Which other package namespaces might benefit from structured meta data and how would their schemas differ from the examples above?
  • The proposal above suggests adding the meta data to package.json files. This could lead to bloat in the package.json files. Another possibility is putting such info in a separate .stdlibrc file in the root package directory. Would this be preferable?
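
On the constraints question, one possible encoding, offered purely as an illustration and not as part of the proposal, is a list of pairwise comparisons between parameter names. The `constraints` field, its `lhs`/`op`/`rhs` keys, and the `findViolations` helper are all hypothetical. The sketch checks the arcsine PDF example values for `a` and `b` against a < b.

```javascript
'use strict';

// Hypothetical comparison operators which a `constraints` field might support:
var OPS = {
    '<': function lt( a, b ) { return a < b; },
    '<=': function lte( a, b ) { return a <= b; },
    '>': function gt( a, b ) { return a > b; },
    '>=': function gte( a, b ) { return a >= b; }
};

// Returns the indices of example value tuples which violate at least one constraint.
function findViolations( parameters, constraints ) {
    var values = {};
    var out = [];
    var n;
    var c;
    var i;
    var j;

    // Index example values by parameter name:
    for ( i = 0; i < parameters.length; i++ ) {
        values[ parameters[ i ].name ] = parameters[ i ].example_values;
    }
    n = ( parameters.length > 0 ) ? parameters[ 0 ].example_values.length : 0;
    for ( j = 0; j < n; j++ ) {
        for ( i = 0; i < constraints.length; i++ ) {
            c = constraints[ i ];
            if ( !OPS[ c.op ]( values[ c.lhs ][ j ], values[ c.rhs ][ j ] ) ) {
                out.push( j );
                break;
            }
        }
    }
    return out;
}

// Example values for `a` and `b` copied from the arcsine PDF example:
var parameters = [
    { 'name': 'a', 'example_values': [ 0.0, 3.0, -2.5, 1.0, -1.25, -5.0 ] },
    { 'name': 'b', 'example_values': [ 3.0, 7.0, 2.5, 2.0, 10.0, -2.0 ] }
];

// Hypothetical constraint encoding for a < b:
var constraints = [
    { 'lhs': 'a', 'op': '<', 'rhs': 'b' }
];

console.log( findViolations( parameters, constraints ) );
```

Pairwise comparisons would cover a < b, but not constraints involving arithmetic or more than two parameters, so this does not settle the question of a universal encoding.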

Other

No.

cc @Planeshifter

Checklist

  • I have read and understood the Code of Conduct.
  • Searched for existing issues and pull requests.
  • The issue name begins with RFC:.

Labels: Feature, Needs Discussion, RFC
