GenerationConfig throws Object is not JSON serializable when setting constraints #31070

Closed
4 tasks
OS-leonardopratesi opened this issue May 27, 2024 · 5 comments

Comments

OS-leonardopratesi commented May 27, 2024

System Info

According to the GenerationConfig documentation, constraints should be supported, but loading a config with any constraint fails.
I know I can also pass the constraints directly to pipeline.generate() (*edit: not really a constraint, but through the logits processor), but for my use case I need to store the constraints in the configuration for later use.
The following code reproduces the issue:

from transformers import PhrasalConstraint, AutoTokenizer
from transformers import GenerationConfig

model_name = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
stop_token_id = tokenizer("STOP").input_ids
constraint = PhrasalConstraint(stop_token_id)

generation_config, unused_kwargs = GenerationConfig.from_pretrained(
    model_name,
    constraints=[constraint])
TypeError: Object of type PhrasalConstraint is not JSON serializable
File <command-2448391505512096>, line 4
      1 import json
      3 # Convert the constraint to a JSON serializable format
----> 4 constraint_json = json.dumps(constraint)
File /usr/lib/python3.11/json/encoder.py:180, in JSONEncoder.default(self, o)
    161 def default(self, o):
    162     """Implement this method in a subclass such that it returns
    163     a serializable object for ``o``, or calls the base implementation
    164     (to raise a ``TypeError``).
   (...)
    178 
    179     """
--> 180     raise TypeError(f'Object of type {o.__class__.__name__} '
    181                     f'is not JSON serializable')
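
The failure is not specific to transformers: Python's json module only encodes built-in types unless it is given a fallback. The sketch below shows this with a hypothetical stand-in class (FakeConstraint and try_dump are made up for illustration and are not part of the library):

```python
import json


class FakeConstraint:
    """Hypothetical stand-in for PhrasalConstraint; not the library class."""

    def __init__(self, token_ids):
        self.token_ids = list(token_ids)


def try_dump(obj):
    """Return the JSON string, or the TypeError message if encoding fails."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Plain lists of ints encode fine; an arbitrary object does not.
ok = try_dump({"constraints": [[1, 2, 3]]})
failed = try_dump({"constraints": [FakeConstraint([1, 2, 3])]})
print(ok)
print(failed)
```

So any code path that runs json.dumps over the config, for whatever reason, would presumably hit the same TypeError as soon as the config holds a constraint object.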

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import PhrasalConstraint, AutoTokenizer
from transformers import GenerationConfig

model_name = "deepseek-ai/deepseek-coder-1.3b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
stop_token_id = tokenizer("STOP").input_ids
constraint = PhrasalConstraint(stop_token_id)

generation_config, unused_kwargs = GenerationConfig.from_pretrained(
    model_name,
    constraints=[constraint])

Expected behavior

According to the documentation, GenerationConfig should support constraints: https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationConfig.constraints. I expect the constraints to be stored somehow and loaded automatically when the model is loaded through AutoModelForCausalLM.

@zucchini-nlp
Member

cc @gante

Contributor

dhaivat1729 commented May 28, 2024

I think this error occurs because we try to get a serialized representation of the constraint object in several places.
At line 993 we log the config, which requires a JSON representation of it; that fails because the constraint class doesn't support such serialization.
The error can also occur in the from_pretrained method, where we calculate a hash of the config at lines #952 and #956.

I think the easy fix would be to have a serialization method on the constraint class that can be triggered.

Can I propose a PR for this? I think we might have to go through all possible class objects that can be consumed by the GenerationConfig class and implement a general-purpose serialization mechanism. The easiest option is to replace each class object with obj.__dict__ for the purpose of serialization. From my understanding, serialization is only used for logging and hashing, so it might not have an impact on the actual flow of execution.

@gante I would like to propose a PR, but I am not sure what the ideal approach would be and would like suggestions on how to move forward. :)

Edit:

Upon digging a bit deeper, I realized that the error comes from this function (1055), where we do not explicitly take care of serializing class objects.

We can just reimplement this function, and that should fix it.
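
The obj.__dict__ idea above can be sketched with the `default` hook of json.dumps. This is only an illustration under assumptions: FakeConstraint and to_jsonable are hypothetical names, and this is not how the library's own serialization function is written.

```python
import json


class FakeConstraint:
    """Hypothetical stand-in for a constraint object; not the library class."""

    def __init__(self, token_ids):
        self.token_ids = list(token_ids)


def to_jsonable(obj):
    """Fallback for json.dumps: describe an unknown object by class name and attributes."""
    # vars(obj) is obj.__dict__; it raises TypeError for objects without one.
    return {"type": type(obj).__name__, **vars(obj)}


config = {"max_new_tokens": 20, "constraints": [FakeConstraint([5, 6])]}
serialized = json.dumps(config, default=to_jsonable, sort_keys=True)
print(serialized)
```

A fallback like this is one-way: it yields a stable string for logging or hashing, but nothing here rebuilds the original object from the JSON.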

@OS-leonardopratesi
Author

This would also need to serialize the code of my custom-defined constraints, and I don't see a way of doing that without pickle.
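
For comparison, one pattern sometimes used instead of pickle is to store only a class name plus constructor arguments and to look the class up in a registry at load time. The sketch below is hypothetical throughout (CONSTRAINT_REGISTRY, register, MyConstraint, and from_dict are illustrative names, not library APIs). It does not serialize the constraint's code, so the class must still be importable and registered wherever the config is loaded. Pickle has a similar limit, since it stores classes by qualified name rather than by source.

```python
import json

# Hypothetical registry mapping a stored name to a constraint class.
CONSTRAINT_REGISTRY = {}


def register(cls):
    """Class decorator that makes a constraint class loadable by name."""
    CONSTRAINT_REGISTRY[cls.__name__] = cls
    return cls


@register
class MyConstraint:
    """Hypothetical custom constraint; only its constructor arguments are stored."""

    def __init__(self, token_ids):
        self.token_ids = list(token_ids)

    def to_dict(self):
        return {"type": type(self).__name__, "kwargs": {"token_ids": self.token_ids}}


def from_dict(data):
    """Rebuild a constraint from its stored name and arguments."""
    cls = CONSTRAINT_REGISTRY[data["type"]]  # KeyError if the class was never registered
    return cls(**data["kwargs"])


stored = json.dumps([MyConstraint([7, 8, 9]).to_dict()])
restored = [from_dict(item) for item in json.loads(stored)]
print(type(restored[0]).__name__, restored[0].token_ids)
```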

@huggingface huggingface deleted a comment from github-actions bot Jun 27, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Member

gante commented Aug 2, 2024

Hi folks 👋 Apologies for the late response -- constraints in general are very poorly maintained at the moment. We're refactoring beam methods, so let's hold off on changes to constraints for now. See related trackers: 1 2
