JSON Schema Rewrite #651

riedgar-ms · 2024-02-22T16:26:50Z

Rewriting the JSON Schema generator to be further removed from the actual grammar. @hudson-ai provided the code that allows recursive structures (such as linked lists) to be created.

riedgar-ms · 2024-02-22T16:27:02Z

Tagging @hudson-ai for interest

hudson-ai · 2024-02-22T17:24:36Z

Are you open to including enum types and objects with additionalProperties in this PR, or should those be saved for the next one?

hudson-ai · 2024-02-22T17:28:21Z

May I also suggest changing collections.abc.Mapping[str, any], to collections.abc.Mapping[str, Any]? any is not a type, but Any from typing is :)
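A small sketch of the distinction, for anyone skimming: `any` is the built-in function that tests an iterable, while `Any` is the typing construct meant for annotations. The helper `first_key` is a made-up name used only for illustration, and `typing.Mapping` is used here so the snippet also runs on Python 3.8.

```python
from typing import Any, Mapping

# `any` is a built-in function, not a class, so it is not meaningful as a type.
any_is_a_type = isinstance(any, type)

# `any` does what its docstring says: it tests an iterable for a truthy element.
any_result = any([False, True])


def first_key(schema: Mapping[str, Any]) -> str:
    """Hypothetical helper: return the first key of a schema-like mapping."""
    return next(iter(schema))
```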

hudson-ai · 2024-02-22T17:35:21Z

Also for 3.8, Mapping and MutableMapping should be imported from typing if you want to subscript them.

riedgar-ms · 2024-02-22T18:52:11Z

@hudson-ai thanks for the Python 3.8 warning :-/

Harsha-Nori · 2024-02-22T19:20:54Z

guidance/library/_json_schema.py

+from ._char_range import char_range
+from ._one_or_more import one_or_more
+from ._optional import optional
+from ._zero_or_more import zero_or_more
+


We can pull these imports straight from the guidance.library namespace instead of pulling through the private files (https://github.com/guidance-ai/guidance/blob/main/guidance/library/__init__.py)

Tweaked a bit. The _grammar import isn't working, though

tests/library/test_json_schema.py

hudson-ai · 2024-02-22T20:44:33Z

A bit annoying, but I recommend something like this for the Mapping and MutableMapping interfaces:

import sys
if sys.version_info >= (3, 9, 0):
    from collections.abc import Mapping, MutableMapping
else:
    from typing import Mapping, MutableMapping

This is because as of 3.9, typing's versions are deprecated aliases of the abc ones and will be removed in future versions of Python -- see https://docs.python.org/3/library/typing.html#typing.Mapping
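For context, here is how the conditional import might be used in annotations. This is only a sketch: `count_properties` and `add_title` are hypothetical helpers invented for the example and are not part of this PR.

```python
import sys
from typing import Any

if sys.version_info >= (3, 9):
    # From 3.9 onwards the abc classes can be subscripted directly.
    from collections.abc import Mapping, MutableMapping
else:
    # On 3.8 only the typing aliases support subscripting.
    from typing import Mapping, MutableMapping


def count_properties(schema: Mapping[str, Any]) -> int:
    """Hypothetical helper: count the entries under "properties" (read-only access)."""
    return len(schema.get("properties", {}))


def add_title(schema: MutableMapping[str, Any], title: str) -> None:
    """Hypothetical helper: mutate the schema in place, hence MutableMapping."""
    schema["title"] = title
```

The same annotations work on either branch of the import, so the rest of the module does not need to know which Python version is running.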

riedgar-ms · 2024-02-27T13:37:46Z

@slundberg do you have thoughts on this? @hudson-ai has submitted a PR to my fork which avoids the problem by using a Python closure to do lazy evaluation. However, I suspect that that is not going to serialise well. I have a feeling that the previous tweak proposed for Placeholder is going to be what is needed.

Can you comment on this?

Also, do you want to wait for that to be sorted before merging this? I could make my test case be an XFAIL while the exact update for Placeholder is figured out. If you are concerned about people picking up and using my wrong initial implementation, then this does need to merge reasonably quickly.

hudson-ai · 2024-02-28T18:30:14Z

While closures can be a pain for serialization, there is no issue serializing the grammars generated using the approach in my PR (as far as I can tell). The code that generates the grammars has closures, but the grammars themselves don't.
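The distinction can be illustrated with the standard library alone. The sketch below uses `pickle` purely as a stand-in for serialization (guidance's own serialization is a different mechanism), and `make_select_grammar` is a hypothetical function invented for the example. The closure that helps build the structure cannot be pickled, but the plain data it produces can.

```python
import pickle


def make_select_grammar(limit):
    """Hypothetical builder: uses a closure internally, returns plain data."""

    def digit_choices():
        # Closure over `limit`; locally defined functions cannot be pickled.
        return [str(i) for i in range(limit)]

    grammar = {"type": "select", "options": digit_choices()}
    return grammar, digit_choices


grammar, helper = make_select_grammar(3)

# The generated structure holds no reference to the closure, so it round-trips.
restored = pickle.loads(pickle.dumps(grammar))

# The closure itself is what fails to serialize.
try:
    pickle.dumps(helper)
    helper_pickles = True
except (pickle.PicklingError, AttributeError):
    helper_pickles = False
```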

That being said, the closure approach feels like a "workaround" to play nicely with how the library currently uses Placeholders, and caching guidance-decorated functions with a nonzero number of args would remove the need for the workaround.

@hudson-ai

Guidance handles what would otherwise be infinitely recursive calls to guidance functions with a little "placeholder" based lazy-evaluation. This works great when the guidance functions take no arguments, but it fails otherwise. Fixing that may be a nice PR in and of itself, but in the meantime, we can put ourselves in the zero-arg setting using closures. From @hudson-ai
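As a rough illustration of the idea described above, the following self-contained sketch mimics placeholder-based lazy evaluation without using guidance at all. `Placeholder`, `zero_arg_rule`, and `build_rules` are hypothetical names invented for this example; they are not the library's API, and the real implementation may differ considerably. Each rule is a zero-argument function whose schema fragment is captured by a closure, so a recursive reference hits the cache instead of recursing forever.

```python
import functools


class Placeholder:
    """Stands in for a node that is still being built."""

    def __init__(self):
        self.target = None


def zero_arg_rule(fn):
    """Hypothetical decorator: cache a zero-arg rule and hand out a
    placeholder to any call made while the rule is still being built."""
    state = {}

    @functools.wraps(fn)
    def wrapper():
        if "node" in state:
            # Either the finished node or the in-progress placeholder.
            return state["node"]
        placeholder = Placeholder()
        state["node"] = placeholder
        node = fn()
        placeholder.target = node  # resolve the lazy reference
        state["node"] = node
        return node

    return wrapper


def build_rules(definitions):
    """Turn {name: parts} into zero-arg rules; `parts` is captured by closure."""
    rules = {}

    def make_rule(parts):
        @zero_arg_rule
        def rule():
            # A tuple ("ref", name) refers to another rule; strings are literals.
            return [rules[p[1]]() if isinstance(p, tuple) else p for p in parts]

        return rule

    for name, parts in definitions.items():
        rules[name] = make_rule(parts)
    return rules


# A self-referential definition, loosely analogous to a linked list.
rules = build_rules({"list": ["x", ("ref", "list")]})
node = rules["list"]()
```

Here `node[1]` is a `Placeholder` whose `target` is `node` itself, so the structure is cyclic but construction terminates. Passing `parts` as an argument to a cached rule, rather than closing over it, would bypass the zero-arg cache in this sketch, which is the situation the closure approach sidesteps.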

riedgar-ms · 2024-02-28T20:38:16Z

@hudson-ai I've just tested your change with the example server in guidance, and it does serialise correctly. I've added a test which shows this

guidance/library/_json_schema.py

slundberg · 2024-03-04T16:15:41Z

I feel I was mistaken thinking I ever got this working...

@slundberg can you advise on how to make the following pattern not infinitely recurse? This is a huge simplification of what we're trying to do here, but it has the same failure mode as far as I can tell.
from guidance import guidance, select

@guidance(stateless=True)
def repeat(lm, arg):
    return lm + arg + select([repeat(arg), ''])
The following works just fine because of the placeholder business when a function has no args, but with args, as above, we get infinite recursion.
@guidance(stateless=True)
def repeat_x(lm):
    return lm + 'x' + select([repeat_x(), ''])
There is a cache argument in the decorator, but it doesn't seem to help. Any ideas?

Sorry to be slow here. Infinite recursion is avoided by caching the function calls; however, as you point out, we avoid caching altogether if the function has an argument. We do this because it is challenging to correctly (and efficiently) check for equality over arbitrary input arguments. Good idea with closures. The alternative would be to add some basic argument equality checking to the guidance function caching (this would make writing these kinds of grammars easier in the future).

hudson-ai · 2024-03-04T21:50:52Z

@slundberg I think that adding more caching would be great, although this particular use-case wouldn't be very easy -- simply because the input is an un-hashable dictionary 😅
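To make the hashing obstacle concrete: a dict cannot be used directly as a cache key, but a canonical string form of it can. The sketch below is one possible workaround, not something proposed in this PR; `cache_by_json` and `describe` are hypothetical names, and it assumes the schema is JSON-serializable.

```python
import functools
import json


def cache_by_json(fn):
    """Hypothetical decorator: cache on a canonical JSON dump of the argument."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(schema):
        # sort_keys makes the key independent of dict insertion order.
        key = json.dumps(schema, sort_keys=True)
        if key not in cache:
            cache[key] = fn(schema)
        return cache[key]

    return wrapper


calls = []


@cache_by_json
def describe(schema):
    calls.append(schema)
    return f"grammar for {schema['type']}"


first = describe({"type": "string", "minLength": 1})
second = describe({"minLength": 1, "type": "string"})  # same schema, different order

# Dicts themselves are unhashable, which is why the JSON key is needed.
try:
    hash({"type": "string"})
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False
```

Note that this only caches completed calls. On its own it would not stop a recursive call made while the first call is still running; that would still need something like the placeholder mechanism.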

Off-topic to this PR -- will make an issue when I get the chance:
Maybe in the meantime, it would be nice to catch un-cached recursive calls and give users a nicer error that explains they are infinitely recursing because the current version of guidance only supports no-arg recursion? I only figured this out because I looked closely at the source code of the decorator. I doubt we want users to have to do that ;)
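One way such a check could look, sketched without guidance: a decorator that counts re-entrant calls and raises a descriptive error once a depth threshold is crossed. `guard_recursion` and the `max_depth` parameter are hypothetical, and a fixed threshold is a simplification that would also flag deep recursion that does terminate.

```python
import functools


def guard_recursion(max_depth=50):
    """Hypothetical decorator: fail early with a readable message when a
    function keeps re-entering itself."""

    def decorator(fn):
        depth = 0

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal depth
            if depth >= max_depth:
                raise RecursionError(
                    f"{fn.__name__} re-entered itself {max_depth} times; "
                    "recursive calls that take arguments are not cached, "
                    "so this recursion will probably never terminate."
                )
            depth += 1
            try:
                return fn(*args, **kwargs)
            finally:
                depth -= 1  # always unwind, even when an error propagates

        return wrapper

    return decorator


@guard_recursion(max_depth=10)
def repeat(arg):
    # Mirrors the failing pattern: the recursive call has no base case.
    return [arg, repeat(arg)]


message = None
try:
    repeat("x")
except RecursionError as exc:
    message = str(exc)
```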

riedgar-ms · 2024-03-05T14:39:05Z

@slundberg were there any other changes you wanted to this PR? I think it's ready to merge.

riedgar-ms · 2024-03-07T16:00:45Z

Ping @slundberg .....

slundberg · 2024-03-08T00:32:58Z

Okay, I took a pass through this and am set to merge once the tests run. Thank you again @riedgar-ms, and sorry it took a bit to review. The only change I made was to shorten the name of the library function from gen_json to just json. We could put "gen" in front of every library function if we wanted, since they all result in generation, so we normally just leave it off. In this case that makes a library import conflict more likely, but I think it is better for the user to pick what they want (either guidance.json or from guidance import json as my_gen_json).

Thanks!

slundberg · 2024-03-08T01:11:28Z

@riedgar-ms I went ahead and merged this, but there seems to be an unrelated build error on the new macOS tests....might be worth looking into.

riedgar-ms added 5 commits February 22, 2024 10:43

Getting started

f4e8908

Add null

cd3ed04

Add strings

cef78f8

Another test

d31a9b2

Starting on arrays

ac0fdb0

riedgar-ms requested a review from slundberg February 22, 2024 16:26

riedgar-ms added 5 commits February 22, 2024 11:29

More testing

48bc9d2

Hmm....

6e0e42b

Refactor a little

52be0e3

Add booleans

bb3863c

Get number working

6059239

riedgar-ms added 6 commits February 22, 2024 13:14

First draft of references

6b07266

Another test

be7f0d2

Linting

c430aad

Add anyOf

a8d46fb

any -> Any

590c041

Fun with Python 3.8

447b814

Do some captures

264aaa7

Harsha-Nori reviewed Feb 22, 2024

riedgar-ms added 4 commits February 22, 2024 14:53

Working on failure cases

7d79349

Trying to pin down Mock behaviour

0ad7de6

Import fiddling

ac0f8b8

Remove obsolete files

667cc34

riedgar-ms commented Feb 22, 2024

tests/library/test_json_schema.py (outdated, resolved)

Forgot part of schema

8421472

hudson-ai and others added 4 commits February 28, 2024 14:49

Linting

a6f4674

Merge remote-tracking branch 'origin/main' into riedgar-ms/json-schema-rewrite-01

51947b2

Add (recursive) JSON schema test to server

a071bac

hudson-ai reviewed Feb 28, 2024

guidance/library/_json_schema.py (outdated, resolved)

riedgar-ms added 6 commits February 29, 2024 07:41

Small correction to json int generation

e6315ec

Some refactoring

a84f1ca

Better server tests

6fdb887

Refactoring server tests

214ae78

Remove unecessary complications

10ab37c

Merge remote-tracking branch 'upstream/main' into riedgar-ms/json-schema-rewrite-01

c1e83fe

hudson-ai mentioned this pull request Mar 4, 2024

additionalProperties + enums riedgar-ms/guidance#7

Closed

Merge branch 'main' into riedgar-ms/json-schema-rewrite-01

cdb5cd0

Merge branch 'main' into riedgar-ms/json-schema-rewrite-01

3e87c77

slundberg and others added 4 commits March 7, 2024 16:42

Merge branch 'main' into riedgar-ms/json-schema-rewrite-01

3354227

Shorten gen_json to just 'json'

e07cd3a

Fix typo

71dfc63

fix name change

3feb8da

slundberg merged commit 76348e1 into guidance-ai:main Mar 8, 2024
12 of 15 checks passed

riedgar-ms deleted the riedgar-ms/json-schema-rewrite-01 branch March 15, 2024 12:00
