Skip to content

feat: support for re-usable global functions in workflows #894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Jun 15, 2025

Conversation

doc-han
Copy link
Collaborator

@doc-han doc-han commented Mar 5, 2025

Short Description

Introduces support for re-usable global functions in workflows

Fixes #648

Implementation Details

Runtime

  1. A workflow.json can now have the field workflow.globals which contains some inline code with exported functions.
  2. Whenever operations in a step are going to be executed, we load these global functions into the vm.
  3. Hence, these functions after being defined in the functions field can be used in any step or job code without needing to import them. They become globally available

CLI

  1. The CLI when passed a workflow.json file allows you to specify a file path instead of inline code(you can have inline too) and then loads the code at the provided filePath before running the code using the runtime.

QA Notes

  1. Using the CLI with a workflow.json file, pass some some inline code for a filePath to code at workflow.globals.
  2. Use the defined functions in job code without importing them.

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

Release branch checklist

Delete this section if this is not a release PR.

If this IS a release branch:

  • Run pnpm changeset version from root to bump versions
  • Run pnpm install
  • Commit the new version numbers
  • Run pnpm changeset tag to generate tags
  • Push tags git push --tags

Tags may need updating if commits come in after the tags are first generated.

@github-project-automation github-project-automation bot moved this to New Issues in v2 Mar 5, 2025
@doc-han doc-han requested a review from josephjclark March 5, 2025 18:11
Copy link
Collaborator

@josephjclark josephjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work as always @doc-han

I'd like to run more tests tomorrow, but here's a few quick thoughts!

@theroinaochieng theroinaochieng moved this from New Issues to DevX Backlog in v2 Mar 6, 2025
@josephjclark
Copy link
Collaborator

I would like to see a lot more testing around the globals. Probably in the runtime package but maybe in integration-tests/execute too.

I've been doing local testing like this:

  • What if globals throws an error?
  • What if globals has no exports?
  • Can I export arrows and functions? ie export const x = () => {} and export function x() {}
  • Can I do export default { x } (presumably the default export is ignored?)
  • Can I export constants that aren't functions? (yes, of course we can, but I want to see a test!)
  • Can I export my own operation-like function? (yes, of course we can, but I want to see a test!)
  • What happens if I try to import (maybe hard to unit test this?)
  • What happens if I try to require
  • Can I use globals like Math, console, atob, decodeURIComponent, etc?
  • Can I use eval? Fetch? (I think eval should be banned but fetch seems OK? Or does that open a door to abuse?)
  • If I don't export a variable, presumably I cannot read it in the job? Ie function x() { } is "private" to the globals module?

And I'd like to see that sort of thing reflected in unit tests somewhere.

@josephjclark
Copy link
Collaborator

josephjclark commented Mar 10, 2025

@doc-han this is fantastic. What we need now is:

  • Update the # workflows section of cli/readme.md
  • Main Docs. I'll take care of this. Global Functions docs#644
  • Add support to the worker. I know it'll take Lightning some time to catch up, but I'd love to cleanly finish the work here. All we need to do is feed the globals key in from the incoming Run object in convert-lightning-plan.ts, with a couple of tests.
  • properly pass logger to globals. so that console.log would stop erroring
  • Add more tests per my notes above
  • Add a changeset

And some important notes:

Operations

My biggest worry is this:

A user declares a cool hlper function

export const coolHelperFunction = () => {
	// do stuff
}

They call it from their job code:

coolHelperFunction()

And then... EXCEPTION!!

[R/T] ✘ Error reported by "coolHelperFunction()" operation line 1:
[R/T] ✘ TypeError: fn is not a function

At least the new error reporting stuff should give a clue. But why did this fail? What's this fn stuff??

The problem of course is that coolHelperFunction is not an operation and can't be called from the top level of job code.

This gets us into explaining Operations and a whole level of details I don't want to engage users with (this is a huge tension in the docs right now).

We can't lint for this stuff. It's perfectly legit to export a non operation. It's also cool to declare your own operation. It's just the usage in job code that's wong.

Solutions?

  • Be better at non-operation top-level calls in the compiler. This problem is probably made harder by globals tbh.
  • Force all helpers to be operations and do some magic wrapping ourselves. I'm not really interested in this.
  • Provide a helper function called operation or declareOperation and tell users to call that:
declareOperation(name, (state, arg1, arg2) => {
	// do what you want here
})

But I mean, that's not terribly clean for users. And because it's dynamic it might be a bit harder to make this even work? We'd have to dynamically inject the globals right?

Can we detect this particular case and raise a better error or warning? I don't really know where we'd do that...

Validation

There's quite a lot that can go wrong with globals, including:

  • User tries to import/require a module
  • User tries to override an adaptor function

I think all these can be validated at compile time. So we should run the compiler and throw if we detect any major problems.

Also, maybe we should have a rule like: globals are immutable, so you're not allowed to export any let or var variables. Consts and functions only.

We should probably create a new issue for this and deal with it later.

Can I implement a whole adaptor in globals?

Kinda yeah.

Is that a problem? It feels like abuse of the feature, but I also don't see a problem. It might help people out.

But! The big thing here is that, unlike unlike adpators, you cannot import anything into your globals. You can't use npm stuff.


// FIXME: when expression isn't a string, additional stuff isn't loaded
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to look at this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. expression in this scope is typed string | Operation[].
It seems Operation[] type is only reached via tests. CLI always has expression being a string.

If this assumption is accurate and expression in runtime is approx. always a string, then there's no problem.
Here's the condition that loads-module only when it's a string.

if (typeof expression === 'string') {
const exports = await loadModule(expression, {
...opts.linker,
// allow module paths and versions to be overriden from the defaults
modules: mergeLinkerOptions(opts.linker?.modules, moduleOverrides),
context,
log: opts.logger,
});
const operations = exports.default;
return {
operations,
...exports,
} as JobModule;
} else {
if (opts.forceSandbox) {
throw new InputError('Invalid arguments: jobs must be strings');
}
return { operations: expression as Operation[] };
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Yes, usually the expression comes in as a string, but for some unit tests we do pass functions straight in.

it's a fair constraint that if you're passing a function directly, we don't do any environment preparation. We don't load globals, we don't load a sandbox.

I don't want to leave this comment here. It's not meaningul.

  • We could move it to the prepareJob declaration. It's not a fixme, but it's correct docs that if you pass a live function, it'll bypass the sandbox, globals etc.
  • We could put out a warning in the logger. Given that it only affects tests, I don't think this is neccessary

Let's move it to prepareJob and remove the FIXME

@github-project-automation github-project-automation bot moved this from DevX Backlog to In review in v2 Mar 10, 2025
@doc-han
Copy link
Collaborator Author

doc-han commented Mar 17, 2025

  • Can I use eval? Fetch? (I think eval should be banned but fetch seems OK? Or does that open a door to abuse?)

I believe Fetch should also be banned. so that all fetch calls go through our adaptors. The globals to me sound like helpful utils for doing internal work. anything external has to go through adaptors.

@josephjclark
Copy link
Collaborator

I believe Fetch should also be banned. so that all fetch calls go through our adaptors. The globals to me sound like helpful utils for doing internal work. anything external has to go through adaptors.

I am happy to make that a constraint, at least for now. execute/context/buildContext() doesn't allow fetch right now anyway - that's completely consistent with job code.

I'll update the docs PR to make this clear


// FIXME: when expression isn't a string, additional stuff isn't loaded
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Yes, usually the expression comes in as a string, but for some unit tests we do pass functions straight in.

it's a fair constraint that if you're passing a function directly, we don't do any environment preparation. We don't load globals, we don't load a sandbox.

I don't want to leave this comment here. It's not meaningul.

  • We could move it to the prepareJob declaration. It's not a fixme, but it's correct docs that if you pass a live function, it'll bypass the sandbox, globals etc.
  • We could put out a warning in the logger. Given that it only affects tests, I don't think this is neccessary

Let's move it to prepareJob and remove the FIXME

@doc-han
Copy link
Collaborator Author

doc-han commented Mar 17, 2025

  1. User tries to import/require a module
    https://nodejs.org/api/vm.html#example-running-an-http-server-within-a-vm
    https://nodejs.org/api/vm.html#when-importmoduledynamically-is-vmconstantsuse_main_context_default_loader
    Require and imports aren't available to the vm by default.

  2. Can I use globals like Math, console, atob, decodeURIComponent, etc?
    Since we're now using buildContext which is also used for job code. all environment globals available in job code is also available in globals.

@josephjclark
Copy link
Collaborator

TODO: the compiler needs to know what global functions are available so that it can not try to associate them with with the adaptor

@doc-han
Copy link
Collaborator Author

doc-han commented Apr 9, 2025

Way to go:
Have a compile step for the globals file or source, extract all exported identifiers and then add them to the ignore list for the compilation step for adding imports. This will ensure that our global functions aren't associated to any adaptor

@josephjclark
Copy link
Collaborator

See #925

@josephjclark
Copy link
Collaborator

Just a thought: because we've changed the workflow.json structure, CLI deploy might start breaking. Need to test this.

@josephjclark
Copy link
Collaborator

Another thought from Stu: how do we expose the globals to the type system for code assist?

@doc-han
Copy link
Collaborator Author

doc-han commented May 30, 2025

This is looking tricky at the moment.

  1. we need to know the exported modules/functions from global functions (runtime) using loadModule
  2. we need to pass this to the compiler when jobs are being compiled (compiler)

Compilation happens way before runtime things happen.

OR

we do static analysis to extract the exported identifiers and then pass to the compilation step for jobs. Hence, global functions have to always be compiled first before any job is compiled.

cc: @josephjclark

@doc-han
Copy link
Collaborator Author

doc-han commented May 30, 2025

Or we export module-loader from runtime. with this we break the order in CLI.
compile -> runtime.

We'll now have runtime to get exported modules for global functions. Then compile for jobs -> runtime

@doc-han
Copy link
Collaborator Author

doc-han commented May 30, 2025

Possible collision between user defined global function identifiers & exported adaptor methods.
when collision happens. user defined will pre-empt since we tell compiler to ignore from adaptor imports.

@josephjclark
Copy link
Collaborator

@doc-han let's talk about this on Monday!

@josephjclark
Copy link
Collaborator

From our meeting:

  1. We use static analysis in the compiler to find the exports from globals
  2. Focus on a spike/draft for now, try and get that done before you break.

@josephjclark
Copy link
Collaborator

Yes to --globals path/to/file in CLI

eg. openfn --globals ./path-to-globals.js
@doc-han
Copy link
Collaborator Author

doc-han commented Jun 4, 2025

cli argument --globals added.
when this argument is used while there's another user of globals in a workflow.json, the one in the cli argument takes precedence.

@josephjclark
Copy link
Collaborator

TODO: review docs OpenFn/docs#644

Copy link
Collaborator

@josephjclark josephjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great - just one question about a test. I'm likely to release this as-is though - I've run some basic tests and it looks fine

@@ -244,3 +244,37 @@ export default [each(
const { code: result } = compile(source);
t.is(result, expected);
});

// TODO: this is error prone and shouldn't be desired
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this error prone? Do we need to take a look at this test?

t.deepEqual(res, ['lightning', 'compiler']);
});

// TODO: do we want to pick up default exports?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's ignore export default for now

@josephjclark josephjclark merged commit 24fde7a into main Jun 15, 2025
9 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in v2 Jun 15, 2025
@josephjclark josephjclark deleted the farhan/re-usable-functions-across-workflow branch June 15, 2025 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Runtime: allow re-usable functions across a workflow
2 participants