Fill in missing dataset fields in CSV and Markdown formats #617

Closed

nikos-livathinos

Describe the contribution to the taxonomy

  • Fill in missing data in CSV and Markdown formats.
  • Given a dataset in CSV or Markdown format and assuming that:
    • Some fields are missing.
    • The information for these fields can be provided either by some existing fields or by general information.
  • We want the model to do the following:
    • Understand the dataset structure and format.
    • Extract information from the existing fields.
    • Fill in the missing fields in the correct format.
  • Currently, the model can mostly find the missing information, but it is very chatty and fails to deliver the extracted information in the correct format.
  • This PR provides examples of how to fill in the missing dataset fields using information extracted from the existing fields and deliver the result in the correct format.

Input given at the prompt

Fill the missing data
News;Currency;Amount
John gives $10 to Mary;USD;
An 1 millon euro investment for the next 10 years;EUR;
The Swiss franc has appreciated against all the world’s major currencies;;
We want to buy 100 Polish Zloty;;100
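
For reference, the empty cells in the prompt above can be located mechanically. The following is only a minimal sketch: the function name `find_missing_cells` is hypothetical and not part of this PR, and it assumes a semicolon-delimited layout with a header row.

```python
import csv
import io


def find_missing_cells(text, delimiter=";"):
    """Return (row_number, column_name) for every empty cell below the header.

    Hypothetical helper for illustration; row numbers start at 1 for the
    first data row.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    missing = []
    for row_number, row in enumerate(body, start=1):
        for column_name, value in zip(header, row):
            if value.strip() == "":
                missing.append((row_number, column_name))
    return missing


# The dataset from the prompt above, copied verbatim.
dataset = """News;Currency;Amount
John gives $10 to Mary;USD;
An 1 millon euro investment for the next 10 years;EUR;
The Swiss franc has appreciated against all the world’s major currencies;;
We want to buy 100 Polish Zloty;;100
"""

print(find_missing_cells(dataset))
# [(1, 'Amount'), (2, 'Amount'), (3, 'Currency'), (3, 'Amount'), (4, 'Currency')]
```

Five cells are empty, but the third row's amount cannot be derived from its news text, so a correct answer fills only four of them.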

Response from the original model

The missing data was filled with the relevant information. For the first row, there was no need for additional information since it was already provided in the original dataset. For the second row, I added the amount of 1,000,000 euros and kept the currency as Euros (EUR). For the third row, I left it empty because it was not relevant to the currency and amount information provided. For the fourth row, I added the amount of 100 Polish Zloty (PLN) since it was requested in the question.

Response from the fine-tuned model

I have completed the lab generate step, but I am not able to train on Linux.
The expected model response is:

News;Currency;Amount
John gives $10 to Mary;USD;10
An 1 millon euro investment for the next 10 years;EUR;1 million
The Swiss franc has appreciated against all the world’s major currencies;CHF;
We want to buy 100 Polish Zloty;PLN;100
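
The "correct format" requirement can be made concrete with a small structural check. This is a sketch under stated assumptions, not something the PR itself contains: the name `check_filled` is hypothetical, fields are assumed to be split by a plain `;` with no quoting, and the check only verifies that the response keeps the same rows and columns and leaves existing values untouched. It does not judge whether the filled-in values are right.

```python
def check_filled(original, filled, delimiter=";"):
    """Return a list of format problems; an empty list means the layout is preserved.

    Hypothetical helper for illustration. Assumes simple delimiter-separated
    lines without quoted fields.
    """
    orig_rows = [line.split(delimiter) for line in original.strip().splitlines()]
    new_rows = [line.split(delimiter) for line in filled.strip().splitlines()]

    if len(orig_rows) != len(new_rows):
        return ["row count changed"]

    problems = []
    for line_number, (old, new) in enumerate(zip(orig_rows, new_rows), start=1):
        if len(old) != len(new):
            problems.append(f"line {line_number}: field count changed")
            continue
        for old_value, new_value in zip(old, new):
            # Cells that already had a value must come back unchanged.
            if old_value != "" and old_value != new_value:
                problems.append(f"line {line_number}: existing value {old_value!r} was altered")
    return problems


original = """News;Currency;Amount
John gives $10 to Mary;USD;
An 1 millon euro investment for the next 10 years;EUR;
The Swiss franc has appreciated against all the world’s major currencies;;
We want to buy 100 Polish Zloty;;100"""

expected = """News;Currency;Amount
John gives $10 to Mary;USD;10
An 1 millon euro investment for the next 10 years;EUR;1 million
The Swiss franc has appreciated against all the world’s major currencies;CHF;
We want to buy 100 Polish Zloty;PLN;100"""

chatty = "The missing data was filled with the relevant information."

print(check_filled(original, expected))  # []
print(check_filled(original, chatty))    # ['row count changed']
```

The expected response passes, while a free-text answer like the original model's is rejected because it no longer has the dataset's row structure.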

@nikos-livathinos requested a review from a team as a code owner on March 22, 2024, 10:09
Member

@mingxzhao left a comment

We have changed the attribution requirements and no longer require them within the yaml file. Please remove them when you can!

@mingxzhao added the triage-requested-changes (skill has been reviewed; changes requested from contributor) and skill (Auto labeled) labels on Apr 5, 2024
… Introduce qna.yaml with examples on how to fill in missing data in CSV and Markdown format.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
@github-actions bot added the triage-needed (Auto labeled; skill is ready to be triaged) label on Apr 5, 2024
@nikos-livathinos
Author

I have removed the attributions. I think it should be ok now.

@n1hility
Member

Thank you for your contribution to InstructLab! Unfortunately, once a GitHub repo is made public, all open PRs are automatically closed since they link against a private repo. We have detected that your PR might have been one of those affected by this change. If you are still interested in contributing your improvement, please fill out the following short form no later than May 3rd, and we will get back to you with the additional steps necessary once we have had time to assess the PRs of those still interested:

https://forms.gle/V7SrPPMZDo6iGDYu8
