
Remove code that overrides max_seq_length #54

Merged
merged 1 commit into xlang-ai:main on Jul 2, 2023

Conversation

michael-quinlan
Contributor

@michael-quinlan commented Jun 26, 2023

max_seq_length was being hardcoded to 512 rather than being loaded from the config or computed by the code in the is None block. As a result, the following if could never be reached:

if max_seq_length is None:
    if hasattr(self.auto_model, "config") and hasattr(self.auto_model.config, "max_position_embeddings") and hasattr(self.tokenizer, "model_max_length"):
        max_seq_length = min(self.auto_model.config.max_position_embeddings, self.tokenizer.model_max_length)

The default code loads the correct sequence length of 512, so this hardcoding is not required. I verified this by running the inference examples, noting that the print on line 252, print('max_seq_length ', max_seq_length), shows 512.
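For context, here is a minimal, hypothetical sketch of the constructor logic once the override is removed. The class name and the auto_model/tokenizer parameters are placeholders rather than the actual code; the point is simply that the is None fallback can now run and derive the limit from the model config and tokenizer:

class TransformerSketch:
    def __init__(self, auto_model, tokenizer, max_seq_length=None):
        self.auto_model = auto_model
        self.tokenizer = tokenizer
        # max_seq_length = 512  # the removed override; it shadowed the fallback below
        if max_seq_length is None:
            if (hasattr(self.auto_model, "config")
                    and hasattr(self.auto_model.config, "max_position_embeddings")
                    and hasattr(self.tokenizer, "model_max_length")):
                # derive the limit from the model config and the tokenizer
                max_seq_length = min(self.auto_model.config.max_position_embeddings,
                                     self.tokenizer.model_max_length)
        self.max_seq_length = max_seq_length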

I also verified that the similarities produced by running the following code from the README are identical to those from the release version on PyPI:

from sklearn.metrics.pairwise import cosine_similarity
from InstructorEmbedding import INSTRUCTOR

# load the model as in the README
model = INSTRUCTOR('hkunlp/instructor-large')

sentences_a = [['Represent the Science sentence: ', 'Parton energy loss in QCD matter'],
               ['Represent the Financial statement: ', 'The Federal Reserve on Wednesday raised its benchmark interest rate.']]
sentences_b = [['Represent the Science sentence: ', 'The Chiral Phase Transition in Dissipative Dynamics'],
               ['Represent the Financial statement: ', 'The funds rose less than 0.5 per cent on Friday']]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
similarities = cosine_similarity(embeddings_a, embeddings_b)

which outputs:

array([[0.81227076, 0.7351362 ],
       [0.6770725 , 0.81411076]], dtype=float32)

Although this change probably won't affect anyone running the default version of the code (since the 512 override matches what is loaded), I think it's worth fixing now, either by merging this PR or by the authors making the fix and verifying it end-to-end.

@hongjin-su hongjin-su merged commit 30b5e5c into xlang-ai:main Jul 2, 2023
@Ethan-Chen-plus

@michael-quinlan Hi!
Could we change the max_length when we fine-tune on our own data with long text?
