Auto-Populate remaining fields in Attribution Information form. #195

Open
ediazgallego opened this issue Sep 23, 2024 · 4 comments
Labels: enhancement, good first issue

@ediazgallego
Collaborator

Why

  • After typing the URL in the attribution information form, we want to auto-populate the remaining fields.

UX recommendations

  • Implement logic to validate if the URL can be used to populate the remaining fields.
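One way to approach the validation step is a small pure function that decides whether a URL looks like a Wikipedia article and, if so, pulls out the language and page title. This is only a sketch: `parseWikipediaUrl` and `WikipediaPageRef` are hypothetical names, and the rule that articles live under `/wiki/<Title>` on a `<lang>.wikipedia.org` host is an assumption about which URLs we want to accept.

```typescript
// Hypothetical helper: the names and return shape are illustrative only.
interface WikipediaPageRef {
  lang: string; // language subdomain, e.g. "en"
  title: string; // decoded page title, underscores kept as-is
}

function parseWikipediaUrl(input: string): WikipediaPageRef | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a syntactically valid absolute URL
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return null;

  // Accept hosts such as en.wikipedia.org or en.m.wikipedia.org.
  const host = url.hostname.match(/^([a-z-]+)(?:\.m)?\.wikipedia\.org$/);
  if (!host || host[1] === "www") return null;

  // Assumption: article pages live under /wiki/<Title>.
  const prefix = "/wiki/";
  if (!url.pathname.startsWith(prefix)) return null;
  const rawTitle = url.pathname.slice(prefix.length);
  if (rawTitle === "") return null;

  try {
    return { lang: host[1], title: decodeURIComponent(rawTitle) };
  } catch {
    return null; // malformed percent-encoding in the title
  }
}
```

A `null` result means "leave the remaining fields for the user to fill in manually"; anything else can be handed to whatever fetch logic we settle on.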

Contextual Information

From @vishnoianil: Knowledge submissions can be sourced from various targets. A simple example is Wikipedia: if a user adds the URL of a Wikipedia page, we should automatically populate the other fields (title, revision, license, author) from that page.

For now we can target Wikipedia, because the upstream taxonomy repo only accepts knowledge contributions based on Wikipedia. In the future we will add support for more knowledge sources, and the process for extracting attribution information may be specific to each source.

You can follow these scripts https://github.com/mairin/instructlab-knowledge-utils?tab=readme-ov-file#1-%EF%B8%8F-wikipedia-attribution-genpy to see how this information can be populated from Wikipedia. Big thanks to @mairin for writing these utilities.

@ediazgallego added the enhancement and good first issue labels Sep 23, 2024
@vishnoianil added this to the release-1.1 milestone Sep 23, 2024
@aevo98765
Member

I think the taxonomy guys are now accepting knowledge sources that are not from Wikipedia: https://github.com/instructlab/taxonomy/blob/main/CONTRIBUTING.md

We probably need to think of a more generic system in the long term, i.e. how can we extract title, revision... from any source?
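As a rough illustration of what "more generic" could look like, each source could get its own extractor behind a shared interface, with a registry that picks the first extractor willing to handle a URL. Everything here is hypothetical (`Attribution`, `AttributionExtractor`, `ExtractorRegistry`); only the field names title, revision, license and author come from this issue.

```typescript
// Illustrative types only: the field names follow the form fields listed in
// this issue (title, revision, license, author); everything else is assumed.
interface Attribution {
  title: string;
  link: string;
  revision: string;
  license: string;
  author: string;
}

interface AttributionExtractor {
  /** Short name for logging or debugging. */
  name: string;
  /** Returns true when this extractor knows how to read the given URL. */
  canHandle(url: URL): boolean;
  /** Fetches whatever attribution data the source exposes. */
  extract(url: URL): Promise<Partial<Attribution>>;
}

class ExtractorRegistry {
  private readonly extractors: AttributionExtractor[] = [];

  register(extractor: AttributionExtractor): void {
    this.extractors.push(extractor);
  }

  /** Returns the first registered extractor that accepts the URL, or null. */
  find(rawUrl: string): AttributionExtractor | null {
    let url: URL;
    try {
      url = new URL(rawUrl);
    } catch {
      return null; // not a valid URL, so no extractor applies
    }
    return this.extractors.find((e) => e.canHandle(url)) ?? null;
  }
}
```

With this shape, Wikipedia would simply be the first registered extractor, and a URL with no matching extractor falls back to manual entry.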

@ediazgallego
Collaborator Author

@vishnoianil @aevo98765
After looking at @mairin's utility scripts, it's clear we need to use Wikipedia's APIs to retrieve summary data. There are wrapper packages that simplify use of the Wikipedia API, but one thing we need to consider is how to handle API calls while a user is entering a URL.

First Assumption Approach

I envision a behavior similar to a search field:

  1. The request is triggered as the user types or when the input reaches a certain length.
  2. Results are fetched either from cache or the API.
  3. The UI is re-rendered with the results.
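The three steps above could be sketched with two small generic helpers: a debounce wrapper so the request only fires once typing pauses, and a promise cache so the same URL is never fetched twice. `withCache` and `debounce` are hypothetical names, and the fetcher passed in stands for whatever API call we end up using.

```typescript
type Fetcher<T> = (key: string) => Promise<T>;

// Wraps a fetcher so repeated lookups for the same key reuse the same promise.
function withCache<T>(fetcher: Fetcher<T>): Fetcher<T> {
  const cache = new Map<string, Promise<T>>();
  return (key: string) => {
    const hit = cache.get(key);
    if (hit) return hit;
    const pending = fetcher(key).catch((err) => {
      cache.delete(key); // do not keep failed lookups in the cache
      throw err;
    });
    cache.set(key, pending);
    return pending;
  };
}

// Delays calling fn until delayMs has passed without another call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}
```

In the form, the input's change handler would call the debounced function, which in turn calls the cached fetcher and re-renders with the result. The delay value is a tuning choice, not something decided in this thread.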

Potential Issues

Triggering a request as the user types could put significant load on the UI. To mitigate this, we should consider one of the following:

  • A validation button: the user clicks it after entering the URL, the click handler validates the URL, and if it is valid we fetch the data and populate the remaining fields with whatever is available.
  • Background URL verification: instead of a button, this behaves like form validation running in the background; once the URL is verified, it also fetches the data and populates the remaining fields.

I believe these measures would help us control the frequency of data fetches for auto-populating fields.

Questions for Discussion

  • Which approach do you think would be more user-friendly?
  • Are there any performance concerns we should address?

Your thoughts and expertise on these would be greatly appreciated.

@vishnoianil
Member

taxonomy guys are now accepting knowledge sources that are not from Wikipedia

I think apart from Wikipedia, any knowledge that has an associated markdown file and its source is an acceptable contribution at this point. @juliadenham @jjasghar please correct me if I am wrong.

+1 on the generic system, although at this point we don't know all the sources (and we're not sure whether the internet in general can be a source). So let's start with Wikipedia and evolve it into a general system as we learn more about sources.

@vishnoianil
Member

  • button that will make the user to click on it after entering the URL,

@ediazgallego Thanks for sharing your thoughts. Discussions like these will be helpful for other contributors as well. Appreciate it.

How about we hook the data fetch to the "onBlur" event? It fires when the input loses focus, e.g. when the user clicks out of the input box. The fetch would follow URL validation (it should be a valid URL). I'm not sure whether Wikipedia requires an API key to access a page's summary data; if not, it would be good to implement the fetch on the client side rather than the server side, so it runs in the user's browser and we avoid any possible scaling issue on the server.
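A minimal sketch of the blur-triggered, client-side fetch might look like the following. It assumes the public Wikimedia REST summary endpoint (`/api/rest_v1/page/summary/<title>`), which as far as I know can be read anonymously without an API key. The names `summaryEndpoint`, `toAttribution` and `handleUrlBlur` are hypothetical, and the `license` and `author` values are placeholders that would need to be confirmed against what the taxonomy repo expects.

```typescript
interface AttributionFields {
  title: string;
  link: string;
  revision: string;
  license: string;
  author: string;
}

// Only the parts of the summary response this sketch reads.
interface PageSummary {
  title: string;
  revision?: string;
  content_urls?: { desktop?: { page?: string } };
}

// Builds the summary endpoint for a Wikipedia article URL, or null if the URL does not qualify.
function summaryEndpoint(pageUrl: string): string | null {
  try {
    const url = new URL(pageUrl.trim());
    if (!url.hostname.endsWith(".wikipedia.org")) return null;
    if (!url.pathname.startsWith("/wiki/")) return null;
    const title = decodeURIComponent(url.pathname.slice("/wiki/".length));
    if (title === "") return null;
    // Normalise mobile hosts (en.m.wikipedia.org) to the desktop host.
    const host = url.hostname.replace(".m.wikipedia.org", ".wikipedia.org");
    return `https://${host}/api/rest_v1/page/summary/${encodeURIComponent(title)}`;
  } catch {
    return null; // invalid URL or malformed percent-encoding
  }
}

function toAttribution(summary: PageSummary, pageUrl: string): AttributionFields {
  return {
    title: summary.title,
    link: summary.content_urls?.desktop?.page ?? pageUrl,
    revision: summary.revision ?? "",
    license: "CC-BY-SA-4.0", // placeholder: exact value/format is an assumption
    author: "Wikipedia Authors", // placeholder: exact wording is an assumption
  };
}

// Intended to be called from the URL input's onBlur handler.
async function handleUrlBlur(
  pageUrl: string,
  fetchFn: typeof fetch = fetch,
): Promise<AttributionFields | null> {
  const endpoint = summaryEndpoint(pageUrl);
  if (endpoint === null) return null; // validation failed: leave the fields alone
  const response = await fetchFn(endpoint, { headers: { Accept: "application/json" } });
  if (!response.ok) return null;
  return toAttribution((await response.json()) as PageSummary, pageUrl);
}
```

In a React form this would be wired up roughly as `onBlur={(e) => handleUrlBlur(e.target.value).then(fillFields)}`, where `fillFields` is whatever state setter the form already has. Passing `fetchFn` as a parameter keeps the function easy to test without hitting the network.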

3 participants