Description
Is your feature request related to a problem? Please describe.
Currently, refinery lets the user provide upload options that define how the uploaded data should be imported (e.g. column separator, line terminator). However, the import often does not work as expected, for example because the defined options do not exactly match the format of the uploaded file.
Furthermore, users cannot specify that only part of the data should be imported into refinery, nor can they map imported data to existing data in refinery (e.g. user data).
Describe the solution you'd like
The import and export should be supported by an assistant. This assistant would preview how the uploaded data would be imported into refinery and provide more options.
Preview
The assistant should include a view that displays what one or a few records would look like in the import or export.
For an import, it would show which attributes would be created and the values they would hold for the sample records. For an export, it would show the exported record, for example the generated JSON string.
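To illustrate the preview idea, here is a minimal sketch, not refinery's actual implementation. It assumes the upload is CSV-like text parsed with `pandas.read_csv`; the helper name `preview_import` and the sample data are hypothetical.

```python
import io
import json

import pandas as pd


def preview_import(raw_text, n_records=2, **read_csv_options):
    """Parse only the first n_records rows with the given pandas options.

    Hypothetical helper: returns the attributes that would be created
    and the sample records, without importing anything.
    """
    frame = pd.read_csv(io.StringIO(raw_text), nrows=n_records, **read_csv_options)
    return {
        "attributes": list(frame.columns),
        "records": frame.to_dict(orient="records"),
    }


# Made-up sample upload that uses ";" as the column separator.
sample = (
    "headline;label\n"
    "Stocks rally;positive\n"
    "Oil drops;negative\n"
    "Rates hold;neutral\n"
)

# With the default separator (","), the mismatch is visible right away:
# the whole header line ends up as a single attribute.
mismatched = preview_import(sample)

# With the matching option, two attributes are detected.
matched = preview_import(sample, sep=";")

# For an export preview, one record could be shown as a JSON string.
export_preview = json.dumps(matched["records"][0])
```

Showing `mismatched["attributes"]` next to `matched["attributes"]` would let the user spot a wrong separator before the import runs.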
Provide more options
- Pandas import options
As is already possible, the user should be able to specify pandas import options. These include column separator, line terminator, etc.
- Mappings
Users should be able to create mappings for the imported data, for example a mapping between users in the import and users in refinery.
- Extraction data
In refinery, extraction data is labeled on token level (tokens are defined by spaCy). Other labeling tools follow different approaches, e.g. Label Studio lets the user label any character span. Character spans must therefore be matched to tokens when such data is imported into refinery. Different strategies can be applied for the matching, e.g. expanding the character span to the surrounding token boundaries. The user should be able to choose between these strategies.
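The span-to-token matching could be sketched roughly as follows. This is an illustration under assumptions, not refinery's code: a simple regex tokenizer stands in for spaCy, the function names are hypothetical, and the "contract" strategy is an assumed alternative to the expanding strategy mentioned above.

```python
import re


def tokenize(text):
    """Stand-in tokenizer (assumption: refinery would use spaCy instead).

    Returns one (start, end) character offset pair per token.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def match_span_to_tokens(tokens, span_start, span_end, strategy="expand"):
    """Map a character span [span_start, span_end) to token indices.

    Returns (first_token_index, last_token_index), both inclusive,
    or None if no token qualifies under the chosen strategy.
    """
    if strategy == "expand":
        # Keep every token that overlaps the span, even partially.
        hits = [i for i, (s, e) in enumerate(tokens)
                if s < span_end and e > span_start]
    elif strategy == "contract":
        # Keep only tokens that lie completely inside the span.
        hits = [i for i, (s, e) in enumerate(tokens)
                if s >= span_start and e <= span_end]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return (hits[0], hits[-1]) if hits else None


text = "Berlin is a large city."
tokens = tokenize(text)

# The character span 3..9 covers "lin is", which cuts through "Berlin".
expanded = match_span_to_tokens(tokens, 3, 9, strategy="expand")
contracted = match_span_to_tokens(tokens, 3, 9, strategy="contract")
```

Here "expand" widens the label to cover "Berlin is" (tokens 0 to 1), while "contract" keeps only "is" (token 1). Previewing both results side by side is one way the assistant could help the user pick a strategy.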
Additional context
test finding v1.5.0