Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema Rewrite #651

Merged

Conversation

riedgar-ms
Copy link
Collaborator

@riedgar-ms riedgar-ms commented Feb 22, 2024

Rewriting the JSON Schema generator to be more remote from the actual grammar. @hudson-ai provided the code which allowed recursive structures (such as linked lists) to be created.

@riedgar-ms
Copy link
Collaborator Author

Tagging @hudson-ai for interest

@hudson-ai
Copy link
Collaborator

Are you open to including enum types and objects with additionalProperties in this PR, or should those be saved for the next one?

@hudson-ai
Copy link
Collaborator

May I also suggest changing collections.abc.Mapping[str, any], to collections.abc.Mapping[str, Any]? any is not a type, but Any from typing is :)

@hudson-ai
Copy link
Collaborator

Also for 3.8, Mapping and MutableMapping should be imported from typing if you want to subscript them.

@riedgar-ms
Copy link
Collaborator Author

@hudson-ai thanks for the Python 3.8 warning :-/

Comment on lines 6 to 10
from ._char_range import char_range
from ._one_or_more import one_or_more
from ._optional import optional
from ._zero_or_more import zero_or_more

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can pull these imports straight from the guidance.library namespace instead of pulling through the private files (https://github.com/guidance-ai/guidance/blob/main/guidance/library/__init__.py)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaked a bit. The _grammar import isn't working, though

@hudson-ai
Copy link
Collaborator

A bit annoying, but I recommend something like this for the Mapping and MutableMapping interfaces:

import sys
if sys.version_info >= (3, 9, 0):
    from collections.abc import Mapping, MutableMapping
else:
    from typing import Mapping, MutableMapping

This is because as of 3.9, typing's versions are deprecated aliases to the abc ones and will be removed in future versions of python -- see https://docs.python.org/3/library/typing.html#typing.Mapping

@riedgar-ms
Copy link
Collaborator Author

@slundberg do you have thoughts on this? @hudson-ai has submitted a PR to my fork which avoids the problem by using a Python closure to do lazy evaluation. However, I suspect that that is not going to serialise well. I have a feeling that the previous tweak proposed for Placeholder is going to be what is needed.

Can you comment on this?

Also, do you want to wait for that to be sorted before merging this? I could make my test case be an XFAIL while the exact update for Placeholder is figured out. If you are concerned about people picking up and using my wrong initial implementation, then this does need to merge reasonably quickly.

@hudson-ai
Copy link
Collaborator

hudson-ai commented Feb 28, 2024

While closures can be a pain for serialization, there is no issue serializing the grammars generated using the approach in my PR (as far as I can tell). The code that generates the grammars has closures, but the grammars themselves don't.

That being said, the closure approach feels like a "workaround" to play nicely with how the library currently uses Placeholders, and caching guidance-decorated functions with a nonzero number of args would remove the need for the workaround.

hudson-ai and others added 4 commits February 28, 2024 14:49
Guidance handles what would otherwise be infinitely recursive calls to guidance functions with a little "placeholder" based lazy-evaluation. This works great when the guidance functions take no arguments, but it fails otherwise. Fixing that may be a nice PR in of itself, but in the meantime, we can put ourselves in the zero-arg setting using closures.

From @hudson-ai
@riedgar-ms
Copy link
Collaborator Author

@hudson-ai I've just tested your change with the example server in guidance, and it does serialise correctly. I've added a test which shows this

@slundberg
Copy link
Collaborator

slundberg commented Mar 4, 2024

I feel I was mistaken thinking I ever got this working...

@slundberg can you advise on how to make the following pattern not infinitely recurse? This is a huge simplification of what we're trying to do here, but it has the same failure mode as far as I can tell.

@guidance(stateless=True)
def repeat(lm, arg):
   return lm + arg + select([repeat(arg), ''])

The following works just fine because of the placeholder business when a function has no args, but with args above, we get infinite recursion.

@guidance(stateless=True)
def repeat_x(lm):
   return lm + 'x' + select([repeat_x(), ''])

There is a cache argument in the decorator, but it doesn't seem to help. Any ideas?

Sorry to be slow here. Infinite recursion is avoided by caching the function calls, however as you point out we just avoid caching all together if the function has an argument. We do this because it is challenging to correctly (and efficiently) check for equality over arbitrary input arguments. Good idea with closures. The alternative would be to add some basic argument equality checking to the guidance function caching (this would make writing these kinds of grammars easier in the future).

@hudson-ai
Copy link
Collaborator

hudson-ai commented Mar 4, 2024

Sorry to be slow here. Infinite recursion is avoided by caching the function calls, however as you point out we just avoid caching all together if the function has an argument. We do this because it is challenging to correctly (and efficiently) check for equality over arbitrary input arguments. Good idea with closures. The alternative would be to add some basic argument equality checking to the guidance function caching (this would make writing these kinds of grammars easier in the future).

@slundberg I think that adding more caching would be great, although this particular use-case wouldn't be very easy -- simply because the input is an un-hashable dictionary 😅

Off-topic to this PR -- will make an issue when I get the chance:
Maybe in the meantime, it would be nice to catch un-cached recursive calls and give users a nicer error that explains they are infinitely recursing because the current version of guidance only supports no-arg recursion? I only figured this out because I looked closely at the source code of the decorator. I doubt we want users to have to do that ;)

@riedgar-ms
Copy link
Collaborator Author

@slundberg were there any other changes you wanted to this PR? I think it's ready to merge.

@riedgar-ms
Copy link
Collaborator Author

Ping @slundberg .....

@slundberg
Copy link
Collaborator

slundberg commented Mar 8, 2024

Okay took a pass through this and am set to merge once the tests run. Thank you again @riedgar-ms and sorry it took a bit to review. The only change I made was to shorten the name of the library function from gen_json to just json. We could put "gen" in front of every library function if we wanted since they all result in generation...so we just leave them off normally. In this case that makes is easier for a library import conflict, but I think it is better for the user to pick what they want (either guidance.json or from guidance import json as my_gen_json)

Thanks!

@slundberg slundberg merged commit 76348e1 into guidance-ai:main Mar 8, 2024
12 of 15 checks passed
@slundberg
Copy link
Collaborator

@riedgar-ms I went ahead and merged this, but there seems to be an unrelated build error on the new macOS tests....might be worth looking into.

@riedgar-ms riedgar-ms deleted the riedgar-ms/json-schema-rewrite-01 branch March 15, 2024 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants