Section 3.1 - TypeError: can only concatenate str (not "float") to str #73

Closed
Adrian-1234 opened this issue Jan 12, 2023 · 1 comment


@Adrian-1234

All output is consistent with the notebook example until section 3.1; then:

for name, is_disc in [('discriminator', True), ('qa', False)]:
    for train_test, dt in [('train', train_df), ('test', test_df)]:
        ft = create_fine_tuning_dataset(dt, discriminator=is_disc, n_negative=1, add_related=True)
        ft.to_json(f'{name}_{train_test}.jsonl', orient='records', lines=True)

TypeError Traceback (most recent call last)
in
1 for name, is_disc in [('discriminator', True), ('qa', False)]:
2 for train_test, dt in [('train', train_df), ('test', test_df)]:
----> 3 ft = create_fine_tuning_dataset(dt, discriminator=is_disc, n_negative=1, add_related=True)
4 ft.to_json(f'{name}_{train_test}.jsonl', orient='records', lines=True)

in create_fine_tuning_dataset(df, discriminator, n_negative, add_related)
46 rows = []
47 for i, row in df.iterrows():
---> 48 for q, a in zip(("1." + row.questions).split('\n'), ("1." + row.answers).split('\n')):
49 if len(q) >10 and len(a) >10:
50 if discriminator:

TypeError: can only concatenate str (not "float") to str
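
A minimal sketch of one way this error can arise, assuming (not confirmed in this thread) that a `questions` or `answers` cell holds a missing value. pandas represents missing values in a text column as the float NaN, and concatenating a string with a float raises exactly this message on Python 3.7+. The helper `is_usable_text` is a hypothetical name used only for illustration.

```python
# Assumption: the cell is a missing value, which pandas stores as float NaN.
nan_cell = float("nan")

try:
    # Same expression shape as line 48 of the traceback.
    ("1." + nan_cell).split("\n")
    message = None
except TypeError as exc:
    message = str(exc)

print(message)


def is_usable_text(value):
    """Return True only for non-empty strings (NaN and None are rejected)."""
    return isinstance(value, str) and value.strip() != ""


# Hypothetical check that could be used to skip rows without generated text.
print(is_usable_text(nan_cell), is_usable_text("1. What is X?"))
```

If this is the cause, skipping such rows (or fixing the step that should have filled them) is probably safer than coercing them to strings.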

I added three str(...) calls:

    for q, a in zip(("1." + str(row.questions)).split('\n'), ("1." + str(row.answers)).split('\n')):
        if len(q) >10 and len(a) >10:
            if discriminator:
                rows.append({"prompt":f"{row.context}\nQuestion: {q[2:].strip()}\n Related:", "completion":f" yes"})
            else:
                rows.append({"prompt":f"{row.context}\nQuestion: {q[2:].strip()}\nAnswer:", "completion":f" {a[2:].strip()}"})

for i, row in df.iterrows():
    for q in ("1." + str(row.questions)).split('\n'):

This allows the code to run, but the fine-tuning command then fails:
openai api fine_tunes.create....

Upload progress: 100% 1.00/1.00 [00:00<00:00, 2.57kit/s]
[organization=user-dyhnotsuxa3kiftffqbsno2j] Error: Expected file to have JSONL format, where every line is a JSON dictionary. Line 1 is not a dictionary. (HTTP status code: 400)

discriminator_train.jsonl and discriminator_test.jsonl are zero length files.
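
A possible explanation for the empty files, sketched under the assumption that the `questions` and `answers` cells are NaN: `str()` turns NaN into the text "nan", so the parsed line becomes "1.nan", which fails the `len(q) > 10` filter and yields no rows. The sketch also includes a small pre-upload check. `rows_from_cell` and `jsonl_problems` are hypothetical helpers, not part of the notebook or the OpenAI CLI.

```python
import json


def rows_from_cell(questions, answers):
    """Mirror the notebook's filter: keep pairs longer than 10 characters."""
    qs = ("1." + str(questions)).split("\n")
    ans = ("1." + str(answers)).split("\n")
    return [(q, a) for q, a in zip(qs, ans) if len(q) > 10 and len(a) > 10]


def jsonl_problems(text):
    """Return a list of problems found in JSONL text (empty list means OK)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number} is not valid JSON")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number} is not a JSON object")
    return problems


nan = float("nan")
empty_rows = rows_from_cell(nan, nan)  # "1.nan" is too short, so nothing is kept
good_rows = rows_from_cell("What is the capital of France?", "The capital is Paris.")

print(len(empty_rows), len(good_rows))
print(jsonl_problems(""))
print(jsonl_problems('{"prompt": "a", "completion": " yes"}\n'))
```

Running a check like `jsonl_problems` on each generated file before calling the fine-tuning command would surface an empty dataset locally rather than as an HTTP 400 from the API.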

@Adrian-1234

On further investigation, it looks like the question generator did not run correctly. I am investigating this.
