Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose convert recipes to sqlite-utils --functions #484

Open
simonw opened this issue Sep 6, 2022 · 11 comments
Open

Expose convert recipes to sqlite-utils --functions #484

simonw opened this issue Sep 6, 2022 · 11 comments
Labels
cli-tool enhancement New feature or request

Comments

@simonw
Copy link
Owner

simonw commented Sep 6, 2022

--functions was added in:

It would be useful if the r.jsonsplit() and similar recipes for sqlite-utils convert could be used in these blocks of code too: https://sqlite-utils.datasette.io/en/stable/cli.html#sqlite-utils-convert-recipes

@simonw simonw added enhancement New feature or request cli-tool labels Sep 6, 2022
@simonw
Copy link
Owner Author

simonw commented Sep 6, 2022

Here's the implementation for recipes at the moment:

def _compile_code(code, imports, variable="value"):
globals = {"r": recipes, "recipes": recipes}
# If user defined a convert() function, return that
try:
exec(code, globals)
return globals["convert"]
except (AttributeError, SyntaxError, NameError, KeyError, TypeError):
pass

And here's the --functions implementation that doesn't expose them:

def _register_functions(db, functions):
# Register any Python functions as SQL functions:
sqlite3.enable_callback_tracebacks(True)
globals = {}
try:
exec(functions, globals)
except SyntaxError as ex:
raise click.ClickException("Error in functions definition: {}".format(ex))
# Register all callables in the locals dict:
for name, value in globals.items():
if callable(value) and not name.startswith("_"):
db.register_function(value, name=name)

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

Will also need to update documentation here: https://sqlite-utils.datasette.io/en/stable/cli.html#defining-custom-sql-functions

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

This feature is a tiny bit weird though: the recipe functions are not exposed to SQL by default, they are instead designed to be used with sqlite-utils convert. I guess with --functions support you could do something like this:

sqlite-utils data.db "update mytable set col1 = parsedate(col1)" --functions "parsedate = r.parsedate"

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

It's not quite that simple. I tried applying this patch:

diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index c51b101..33e4d90 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -30,6 +30,7 @@ from .utils import (
     Format,
     TypeTracker,
 )
+from . import recipes
 
 
 CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
@@ -3029,7 +3030,7 @@ def _load_extensions(db, load_extension):
 def _register_functions(db, functions):
     # Register any Python functions as SQL functions:
     sqlite3.enable_callback_tracebacks(True)
-    globals = {}
+    globals = {"r": recipes, "recipes": recipes}
     try:
         exec(functions, globals)
     except SyntaxError as ex:

Then got this:

% sqlite-utils memory --functions 'parsedate = r.parsedate' 'select parsedate("1st jan")'
Error: wrong number of arguments to function parsedate()
% sqlite-utils memory --functions 'parsedate = r.parsedate' 'select parsedate("1st jan", 0, 0, 0)' 
[{"parsedate(\"1st jan\", 0, 0, 0)": "2022-01-01"}]

The problem here is that the parsedate function signature looks like this:

def parsedate(value, dayfirst=False, yearfirst=False, errors=None):

But the code that register SQL functions introspects that signature, so creates a SQL function that requires four arguments.

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

So you would need to do this instead:

sqlite-utils memory 'select parsedate("1st jan")' --functions '
def parsedate(s):
    return r.parsedate(s)
'

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

I could teach this code here to only register the function using arguments that don't have default parameters:

# Register all callables in the locals dict:
for name, value in globals.items():
if callable(value) and not name.startswith("_"):
db.register_function(value, name=name)

Or even this code here:

def register(fn):
fn_name = name or fn.__name__
arity = len(inspect.signature(fn).parameters)
if not replace and (fn_name, arity) in self._registered_functions:
return fn
kwargs = {}
registered = False
if deterministic:
# Try this, but fall back if sqlite3.NotSupportedError
try:
self.conn.create_function(
fn_name, arity, fn, **dict(kwargs, deterministic=True)
)
registered = True
except (sqlite3.NotSupportedError, TypeError):
# TypeError is Python 3.7 "function takes at most 3 arguments"
pass
if not registered:
self.conn.create_function(fn_name, arity, fn, **kwargs)
self._registered_functions.add((fn_name, arity))
return fn

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

That would be a breaking change though - existing code that registers functions with default parameters should continue to work unchanged (unless I want to ship sqlite-utils 4.0).

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

I could do this:

 # Register all callables in the locals dict: 
 for name, value in globals.items(): 
     if callable(value) and not name.startswith("_"): 
         db.register_function(value, name=name, ignore_params_with_defaults=True) 

Introducing a new ignore_params_with_defaults option.

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

Here's how to detect defaults in the function signature:

>>> import inspect
>>> def foo(a, b, c=1, d=2):
...      pass
... 
>>> inspect.signature(foo)
<Signature (a, b, c=1, d=2)>
>>> inspect.signature(foo).parameters
mappingproxy(OrderedDict([('a', <Parameter "a">), ('b', <Parameter "b">), ('c', <Parameter "c=1">), ('d', <Parameter "d=2">)]))
>>> inspect.signature(foo).parameters['c']
<Parameter "c=1">
>>> dir(inspect.signature(foo).parameters['c'])
['KEYWORD_ONLY', 'POSITIONAL_ONLY', 'POSITIONAL_OR_KEYWORD', 'VAR_KEYWORD', 'VAR_POSITIONAL', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_annotation', '_default', '_kind', '_name', 'annotation', 'default', 'empty', 'kind', 'name', 'replace']
>>> inspect.signature(foo).parameters['c'].default
1
>>> inspect.signature(foo).parameters['a'].default
<class 'inspect._empty'>

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

OK with this:

diff --git a/sqlite_utils/cli.py b/sqlite_utils/cli.py
index c51b101..93d82a9 100644
--- a/sqlite_utils/cli.py
+++ b/sqlite_utils/cli.py
@@ -30,6 +30,7 @@ from .utils import (
     Format,
     TypeTracker,
 )
+from . import recipes
 
 
 CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
@@ -3029,7 +3030,7 @@ def _load_extensions(db, load_extension):
 def _register_functions(db, functions):
     # Register any Python functions as SQL functions:
     sqlite3.enable_callback_tracebacks(True)
-    globals = {}
+    globals = {"r": recipes, "recipes": recipes}
     try:
         exec(functions, globals)
     except SyntaxError as ex:
@@ -3037,4 +3038,4 @@ def _register_functions(db, functions):
     # Register all callables in the locals dict:
     for name, value in globals.items():
         if callable(value) and not name.startswith("_"):
-            db.register_function(value, name=name)
+            db.register_function(value, name=name, ignore_defaults=True)
diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 27c46b0..1407d23 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -370,6 +370,7 @@ class Database:
         self,
         fn: Callable = None,
         deterministic: bool = False,
+        ignore_defaults: bool = False,
         replace: bool = False,
         name: Optional[str] = None,
     ):
@@ -397,7 +398,10 @@ class Database:
 
         def register(fn):
             fn_name = name or fn.__name__
-            arity = len(inspect.signature(fn).parameters)
+            params = inspect.signature(fn).parameters
+            if ignore_defaults:
+                params = [p for p in params if params[p].default is inspect._empty]
+            arity = len(params)
             if not replace and (fn_name, arity) in self._registered_functions:
                 return fn
             kwargs = {}

I can now do this:

% sqlite-utils memory --functions 'parsedate = r.parsedate' 'select parsedate("1st jan")'
[{"parsedate(\"1st jan\")": "2022-01-01"}]

@simonw
Copy link
Owner Author

simonw commented Sep 7, 2022

Or... I could automatically register multiple copies of the function of different arities!

If I'm going to do something like that though I need to think carefully about how functions that have keyword-only arguments should work: https://peps.python.org/pep-3102/

def compare(a, b, *ignore, key=None):
    ...

I should think about how these work with db.register_function() anyway, since SQL functions cannot support keyword arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli-tool enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant