Auto-Populate remaining fields in Attribution Information form. #195
I think the taxonomy maintainers are now accepting knowledge sources that are not from Wikipedia: https://github.com/instructlab/taxonomy/blob/main/CONTRIBUTING.md. We probably need to think of a more generic system in the long term, i.e. how can we extract the title, revision, and so on from any source?
@vishnoianil @aevo98765
First Assumption Approach
I envision a behavior similar to a search field:
Potential Issues
This approach could potentially create significant load on the UI. To mitigate this, we should consider implementing one of the following:
I believe these measures would help us control the frequency of data fetches for auto-populating fields.
Questions for Discussion
Your thoughts and expertise on these would be greatly appreciated.
I think that, apart from Wikipedia, any knowledge that has an associated markdown file and its source is an acceptable contribution at this point. @juliadenham @jjasghar please correct me if I am wrong. +1 on the generic system, although right now we don't know all the sources (and we are not sure whether the internet in general can be a source). So let's start with Wikipedia and evolve it into a general system as we learn more about the sources.
@ediazgallego Thanks for sharing your thoughts. Discussions like these will be helpful for other contributors as well. Appreciate it. How about we hook the data fetch to the "onBlur" event? This event fires only when the user clicks out of the input box. The fetch would be preceded by URL validation (the value must be a valid URL). I'm not sure whether Wikipedia requires an api_key to access the summary data of a page. If it doesn't, it would be good to implement the fetch on the client side rather than the server side, so that it runs in the user's browser and we avoid any possible scaling issue on the server.
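To make the blur-then-validate idea concrete, here is a minimal sketch. The names `isWikipediaArticleUrl` and `createBlurGate` are hypothetical and do not come from the project's codebase, and the URL pattern is an assumption that only covers `https://<lang>.wikipedia.org/wiki/<title>` links. The gate decides whether a fetch should happen at all, so leaving the field twice with the same value does not trigger a second request.

```typescript
// Hypothetical helpers for illustration; not part of the existing UI code.

// Assumed pattern: article links such as https://en.wikipedia.org/wiki/Alan_Turing.
const WIKIPEDIA_ARTICLE_URL = /^https?:\/\/[a-z-]+\.wikipedia\.org\/wiki\/[^?#\s]+/i;

function isWikipediaArticleUrl(value: string): boolean {
  return WIKIPEDIA_ARTICLE_URL.test(value.trim());
}

// Returns a function intended to be called from the input's blur handler.
// It answers "should we fetch now?" and remembers the last accepted URL,
// so blurring again without editing the field causes no extra fetch.
function createBlurGate(): (value: string) => boolean {
  let lastAccepted = "";
  return (value: string): boolean => {
    const url = value.trim();
    if (!isWikipediaArticleUrl(url) || url === lastAccepted) {
      return false;
    }
    lastAccepted = url;
    return true;
  };
}
```

In a React form this could be wired up roughly as `onBlur={(e) => { if (gate(e.target.value)) { /* start the client-side fetch */ } }}`, with `gate` created once per form instance.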
Why
UX recommendations
Contextual Information
From @vishnoianil: Knowledge submissions can be sourced from various targets. A simple example is Wikipedia. If the user adds the URL of a Wikipedia page, we should automatically populate the other fields (title, revision, license, author) from that page.
At this point we can target Wikipedia, because the upstream taxonomy repo only accepts knowledge contributions based on Wikipedia. In the future we will add support for more sources of knowledge contributions, and the extraction process for attribution information may be specific to each target.
You can follow these scripts, https://github.com/mairin/instructlab-knowledge-utils?tab=readme-ov-file#1-%EF%B8%8F-wikipedia-attribution-genpy, to determine how we can populate this information from Wikipedia. Big thanks to @mairin for writing these utilities.
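As a rough illustration of the auto-population step, the sketch below turns a Wikipedia URL into a MediaWiki `action=query` request for the latest revision and maps the response onto the four fields named above. It is not a port of the linked utilities. All type and function names are hypothetical, the response shape assumes `formatversion=2`, and the `license` and `author` values are placeholder assumptions that would need to be confirmed against what the taxonomy repo expects.

```typescript
// Hypothetical types and helpers; field names follow the issue text.
interface Attribution {
  title: string;
  link: string;
  revision: string;
  license: string;
  author: string;
}

interface WikiPageRef { host: string; title: string; }

// Only the parts of the MediaWiki response (formatversion=2) used here.
interface RevisionQueryResponse {
  query?: { pages?: Array<{ title?: string; revisions?: Array<{ revid?: number }> }> };
}

function parseWikipediaUrl(url: string): WikiPageRef | null {
  const match = /^https?:\/\/([a-z-]+\.wikipedia\.org)\/wiki\/([^?#\s]+)/i.exec(url.trim());
  if (match === null) return null;
  try {
    return { host: match[1].toLowerCase(), title: decodeURIComponent(match[2]) };
  } catch {
    return null; // malformed percent-encoding in the title
  }
}

// Builds the request URL; origin=* allows anonymous cross-origin calls from a browser.
function buildRevisionQuery(ref: WikiPageRef): string {
  const params = [
    "action=query",
    "prop=revisions",
    "rvprop=ids%7Ctimestamp",
    "titles=" + encodeURIComponent(ref.title),
    "format=json",
    "formatversion=2",
    "origin=*",
  ];
  return `https://${ref.host}/w/api.php?${params.join("&")}`;
}

function toAttribution(ref: WikiPageRef, response: RevisionQueryResponse): Attribution | null {
  const page = response.query?.pages?.[0];
  const revid = page?.revisions?.[0]?.revid;
  if (page === undefined || revid === undefined) return null;
  return {
    title: page.title ?? ref.title.replace(/_/g, " "),
    // Permanent link to the exact revision that was read.
    link: `https://${ref.host}/w/index.php?title=${encodeURIComponent(ref.title)}&oldid=${revid}`,
    revision: String(revid),
    license: "CC-BY-SA-4.0",     // assumption: placeholder value, verify before use
    author: "Wikipedia Authors", // assumption: placeholder value, verify before use
  };
}
```

Keeping URL construction and response mapping as pure functions means the network call itself can stay in the browser, and a different source could later supply its own pair of functions that produce the same `Attribution` shape.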