-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data explorer specs #42
Conversation
Co-authored-by: Kriti Godey <kriti@centerofci.org>
Co-authored-by: Kriti Godey <kriti@centerofci.org>
Co-authored-by: Kriti Godey <kriti@centerofci.org>
…athesar-wiki into data-explorer-specs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghislaineguerin I added some comments. Some general questions?
(1) What happened to the alerts and error prevention and onboarding sections?
(2) What is the plan for putting this in the design prototype? Will you still be doing that or are we going to stick with the low definition wireframes for now? There are a lot of errors and interactions that are missing here but I think if we're moving this to the prototype, those are better handled there. If there's no plans to put this in the prototype, then I'll start listing them out here.
Fixes #1065
Could you link the PR description to the correct issue (it's in the main repo, not this one)?
|
||
[Selecting a Base Table](https://share.balsamiq.com/c/esUftomoFsZiRhZDMYdoPP.png) | ||
|
||
[Base Table Selected](https://share.balsamiq.com/c/me5CfGGeX3R35x2otLqbpu.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have total rows and column count be greater than 0 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also applies to other wireframes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgodey Those numbers applied to the base table. I might add that information to the table selector menu instead. I feel it's important as an additional reference point when selecting tables, as the name only might not suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the wireframe to include those numbers inside the table selector.
--- | ||
Wireframes | ||
|
||
[Adding Columns](https://share.balsamiq.com/c/oyxTXxqSh8rLqU3JY71DWY.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tabs of "Available", "Formula" and "In Use" are a bit confusing. "Available" vs. "In Use" makes sense. "From Formula" seems like a different feature entirely.
design/specs/visual-query-builder.md
Outdated
Formulas are used to generate new columns based on different parameters. To access the list of formulas, the user start the `Add Column` process and selects the option `From Formula` at the top of the inspector panel. Selecting a formula will open a form that users can fill out to determine the values of the new column. | ||
|
||
Depending on the selected formula, different settings will be available. | ||
More on formulas and specific details for each will be covered on a separate issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create an issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the new issue would also cover selection of a column after a formula is chosen. The wireframe does not seem to include that.
--- | ||
Wireframes | ||
|
||
[Applying a Formula](https://share.balsamiq.com/c/nEzYaNKNTSv9EFwzwtfSof.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the formula is applied, shouldn't the column be identical to a formula column (in the second screen of this wireframe)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once they save it, yes. I have added the additional step.
--- | ||
Wireframes | ||
|
||
[Output Summarization](https://share.balsamiq.com/c/rPMwwETnuQ8a4ut8NfmcDB.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still missing data-type specific grouping options
design/specs/visual-query-builder.md
Outdated
|
||
[Applying a Formula](https://share.balsamiq.com/c/nEzYaNKNTSv9EFwzwtfSof.png) | ||
|
||
## 3. Transforming the Output Table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need wireframes for deleting steps as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the specs along w wireframes.
|
||
[Adding Columns](https://share.balsamiq.com/c/oyxTXxqSh8rLqU3JY71DWY.png) | ||
|
||
[Added Column](https://share.balsamiq.com/c/7U9yahZtYWX2G5obkyivoz.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have an example with a more nested source so we can see what the source field looks like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a section specific to sources.
|
||
For a more detailed overview of this feature, read [Views Product Spec](https://wiki.mathesar.org/en/product/specs/2022-01-views). | ||
|
||
## Scenarios |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are missing scenarios for navigating to and navigating away from the data explorer (from the usual view with tabs and left sidebar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work overall. I left a few specific questions.
design/specs/visual-query-builder.md
Outdated
Depending on the selected formula, different settings will be available. | ||
More on formulas and specific details for each will be covered on a separate issue. | ||
|
||
Columns generated from formulas will display a formula icon indicator in the column header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these have a similar "Column Source" view to represent which (if any) columns provided data which was manipulated by the formula, or should users get this info by inspecting the formula?
I don't have a particular solution in mind here, I'm just wondering if you've thought about it, and what that might look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if a formula parameter is a column, the sources are displayed.
|
||
### 2.4 Filtering Input Column Values | ||
|
||
Columns that link to multiple records can have filters applied to them to retrieve only values that match user-specified criteria. Multiple filters are allowed for each input column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's reasonable to expect users to come up with a filter that results in a 1:1 mapping between the base table rows and the linked column entries. In the example wireframe, we're aggregating under the assumption that there are still more rows in the filtered column than in the base table. Is that intended to be the only option? What if their filter removes (or would remove) all values associated with a row. Do we need to have a NULL
sometimes, and a zero in other cases (e.g., NULL
for first-character matching, and 0 for counting). If we're planning to have NULL
when we clear out a given section of a linked column, why restrict to multiple-record links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mathemancer Could you provide an example of what you're envisioning here? I'm having a hard time understanding the issue from your description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose we're aggregating actors in movies into lists, but we're only interested in actors whose first name starts with 'P'. Then filtering might produce some movies with no actors after filtering for only actors fitting the description. I assume we'd want to show NULL
for that case. What if instead, we're aggregating the number of Academy awards held by the casts of movies? This would be a sum, rather than a list. But then the value shown for a movie with a cast none of whom ever won an Oscar should probably be 0
, not NULL
. Is this the intended UI in these cases? Are there other cases to consider?
Now, since we're opening the possibility of ending up with no results in the linked columns for some movies, what if we have some many-to-one relationship (maybe movie studios as a toy example)? Each movie has one studio, and for some reason we're only interested in studios starting with the letter 'P'. Why not give the possibility of filtering the linked studio column so that we show 'Pixar' and 'Paramount', but only NULL
where there would have been 'Universal'? My question there is whether the restriction to only allowing filtering of linked records in cases of columns that link multiple records is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your question now. I'm concerned that allowing filtering for those types of use cases would overload the concept of filters. I think users should use formulas for use cases like that instead, e.g. the Is True
formula or the Starts With
formula.
|
||
### 3.3 Sorting the Output Table | ||
|
||
Users can sort the output table by applying a sort to any of the result table columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include other aggregations? I'm not sure how this should work for non-numeric aggregations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the time being, non-numeric aggregations can only be displayed as lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question in that case is whether we want to sort by lists. That may be an awkward concept for users, since sorting by a list isn't a very natural thing to do, and there are a number of slightly weird options for how the sorting would actually work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't allow users to sort by aggregated columns? I can see how that would get confusing.
@ghislaineguerin what do you think?
Caveats about my review
General questions and critique
2.2 Input column sources
2.4 Filtering Input Column Values
2.5 Aggregating Input Column Values
Transform Result
6. Saving the Query as a view
An input column used in a formula has been deleted
|
--- | ||
Wireframes | ||
|
||
[Adding Columns](https://share.balsamiq.com/c/oyxTXxqSh8rLqU3JY71DWY.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have the separate "In Use" tab?
If a user adds a column, can they not add it again?
For eg., The user may want to add the same column multiple times and apply different formulas to them.
How would be represent such cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they can add a column again. They may, however, change the column names and even the data types of the columns that have been selected (through aggregations or summarization). The 'In use' section would allow users to quickly see which columns are being used in the view. Consider a user inspecting an existing view that they did not create and simply wanting to list all of the sources.
design/specs/visual-query-builder.md
Outdated
Formulas are used to generate new columns based on different parameters. To access the list of formulas, the user start the `Add Column` process and selects the option `From Formula` at the top of the inspector panel. Selecting a formula will open a form that users can fill out to determine the values of the new column. | ||
|
||
Depending on the selected formula, different settings will be available. | ||
More on formulas and specific details for each will be covered on a separate issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the new issue would also cover selection of a column after a formula is chosen. The wireframe does not seem to include that.
--- | ||
Wireframes | ||
|
||
[Added Input Filter](https://share.balsamiq.com/c/tK2hZy6FB4zVYpN56ejpBk.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filter is applied on actor.date_of_birth
, while the selected column is actor.id
. If we add another column actor.name
, would we show this applied filter in both columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would only affect the input column with the filter.
--- | ||
Wireframes | ||
|
||
[Applying a Formula](https://share.balsamiq.com/c/nEzYaNKNTSv9EFwzwtfSof.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to have a single workflow:
- Either user adds a formula and chooses the columns (My choice), or
- User adds a column and chooses the formula..
A formula can modify the column entirely, and already applied filters and aggregations would no longer be applicable. A user adding a formula to an existing column might not understand that the filters & aggregations would get removed.
Overall, I think this additional flexibility would be troublesome for the user and for us to maintain in the long run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghislaineguerin As we discussed yesterday, I think we should remove formulas from this spec entirely and have a separate spec for integrating formulas once we finish the design for mathesar-foundation/mathesar#1252.
--- | ||
Wireframes | ||
|
||
[Output Filter](https://share.balsamiq.com/c/dAVYnp8VnG2r2HzkSZ3rgp.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find using the text Filter
for both input and output to be confusing. Can we rename this to be more descriptive or add additional help content in both places describing what they do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. See my reply to Sean's same question below:
Rather than calling it filter, we could call it filter records or include only records that match this condition.
|
||
[Deleting a Transformation Step](https://share.balsamiq.com/c/XeNAKGz1UnDMfscSTLDVp.png) | ||
|
||
## 5. Previewing the Query Results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the preview needs more screen space. It would be better if we can explore options to merge both panes into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think merging both panes is a good idea. The left pane represents the structure of the query, the right pane is an "inspector" focusing on context-specific details for the current task the user is doing.
I do agree that the user might want more space for the preview. Making both panes collapsible seems like a better option for this.
|
||
## 5. Previewing the Query Results | ||
|
||
A preview, or query result table, should be visible while the user is in `Data Explorer`. The result table will change based on the different configurations, for example, if a user applies a filter, the system should refresh the table to show the output with the filter applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better if this preview table is similar to normal tables that we display, with pagination options at the bottom.
I dont think limit
and offset
should be part of transformations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my understanding, pagination is merely a display option, but limit and offset have an effect on the number of records in the table. How do you see users distinguishing between the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @ghislaineguerin said, limit and offset are applied to the query definition, this is not pagination. @ghislaineguerin It would be good to explain the limit/offset somehow since new users may not understand the difference. I also think the wireframes (or prototype) should include pagination.
@seancolsen Thanks for the input. Here are my answers, I will also ask @kgodey to provide additional details where needed.
They are the same. We renamed the query builder to data explorer to make it more user-friendly and goal-oriented.
The data explorer will be accessible via the top navigation as well as the toolbar for views and tables. Views can be opened in data explorer to inspect or modify their settings, and tables can be used as the starting point for a new query.
I believe that this is a reasonable recommendation, given that changing the base table will result in the entire query being deleted. It will also be more clear that table selection is not something that can be modified.
In my opinion, "ephemeral inquiries" make sense as a starting point. Rather than just saving views, we wanted users to be able to explore data at any time and use this as a way of assessing, and reviewing data in their tables. It is possible that having new objects created every time they explore will discourage them from using the tool in this manner.
It's possible that we could get rid of it if we implement the changes you've recommended earlier. We'd still need a way to make it clear which table we were looking at.
According to my understanding, pagination is merely a display option, but limit and offset impact the actual number of records in the table
Although it's worth looking into and your suggestions make sense, some of the interactions in the inspector panel could lead to a more comfortable user experience. We are preparing an interactive prototype and will be testing different configurations.
They should be able to do so from both locations.
When a user leaves the field, the update should be reflected. The assumption is that the result table is updated when the user changes the input columns.
I think we only support AND conditions. @kgodey might have a better answer.
We are attempting to create a more natural approach to describing the links that does not rely on diagrams.
This makes sense. However, we started with a column to better identify the source of the new column values. Because we will be displaying sources in places where the selection is done by column rather than a table, the order made more sense. I'm not sure what the best approach is here; I'll think about it more and respond with a better answer.
Rather than calling it filter, we could call it
This, I believe, is possible. In the interactive prototype, I will include an example of this.
https://wiki.mathesar.org/en/product/specs/2022-01-views/04-formulas/4a-record-aggregations
@kgodey, I'll defer to you.
Will be included in the interactive prototype.
We talked about it, and while your approach makes sense, not all step deletions will result in errors. We'd have to assess how simple it would be for users to update the remaining steps that cause errors. Imagine if a user only wants to remove an initial filter and can't do it without losing all subsequent steps?
It should remain and disable the save button until changes have been made so that users know it has been saved. I believe we'll need to provide a way to get to the actual view.
See the previous answer.
I fully agree that this is a tricky scenario. It would be a good idea to go over this further on our next call. |
Thanks @ghislaineguerin. More thoughts...
|
Responses to comments from @seancolsen and @ghislaineguerin:
This was integrated into the main query builder page so that if a new user clicked on "Data Explorer", they could have an idea about what was going to happen next. If we just ask them to select a table in a separate page (we had this in a previous wireframe version), they won't know what the context is. @ghislaineguerin It would be good to integrate reasoning like this into the spec so that if we go back and look at the spec, we'll remember why we made the design decisions we did.
We need to have the base table in a prominent place because the rest of the query depends on it. The columns available are based on relationships to the base table, the source information displayed is relative to the base table. I think it helps orient users to have it displayed over everything else. I do think we could put in an explanation there about why it's relevant. We could also use the header for other purposes (e.g. if we're using the data explorer to edit an existing view, we could put information about the view there). @ghislaineguerin It would be good to document that the main purpose of the header to orient the user and provide context for the whole query.
Please see my response to @pavish's comment here: #42 (comment) I think that making the panes collapsible is a better option than trying to merge them because having a separation of "where am I in the query" and "what am I currently working on" would help orient users.
@ghislaineguerin I do think we should support
We're not currently giving users this option, we're defaulting to
These are out of scope for this spec, they will be handled in:
I think this is a good idea. I also commented about this here: #42 (comment)
I agree with this.
@ghislaineguerin I think it would be fine to only allow users to delete the last step. I don't think we can figure out which steps will cause errors and which won't, since the query is built in-order. Even if we can figure it out, it will be a hard problem to solve technically and I don't think it's worth putting effort into for the alpha release. The other alternative I see is that deleting a step would also delete all subsequent steps, but that seems like a worse UX. |
This pull request has not been updated in 45 days and is being marked as stale. It will automatically be closed in 30 days if not updated by then. |
Fixes #1065