Data explorer specs #42

Merged
merged 33 commits into from
Jun 23, 2022

@ghislaineguerin commented Mar 28, 2022

Fixes #1065

@ghislaineguerin requested a review from a team March 28, 2022 16:54
@kgodey self-assigned this Mar 29, 2022
@kgodey added the status: review In review label Mar 29, 2022
Contributor

@kgodey left a comment

@ghislaineguerin I added some comments. Some general questions:

(1) What happened to the alerts and error prevention and onboarding sections?
(2) What is the plan for putting this in the design prototype? Will you still be doing that, or are we going to stick with the low-definition wireframes for now? There are a lot of errors and interactions missing here, but I think if we're moving this to the prototype, those are better handled there. If there are no plans to put this in the prototype, then I'll start listing them out here.

Fixes #1065

Could you link the PR description to the correct issue (it's in the main repo, not this one)?


[Selecting a Base Table](https://share.balsamiq.com/c/esUftomoFsZiRhZDMYdoPP.png)

[Base Table Selected](https://share.balsamiq.com/c/me5CfGGeX3R35x2otLqbpu.png)
Contributor

The total row count and column count shouldn't be greater than 0 here.

Contributor

Also applies to other wireframes

Contributor Author

@kgodey Those numbers applied to the base table. I might add that information to the table selector menu instead. I feel it's important as an additional reference point when selecting tables, as the name alone might not suffice.

Contributor Author

I've updated the wireframe to include those numbers inside the table selector.

---
Wireframes

[Adding Columns](https://share.balsamiq.com/c/oyxTXxqSh8rLqU3JY71DWY.png)
Contributor

I think the tabs of "Available", "Formula" and "In Use" are a bit confusing. "Available" vs. "In Use" makes sense. "From Formula" seems like a different feature entirely.

Formulas are used to generate new columns based on different parameters. To access the list of formulas, the user starts the `Add Column` process and selects the option `From Formula` at the top of the inspector panel. Selecting a formula will open a form that users can fill out to determine the values of the new column.

Depending on the selected formula, different settings will be available.
More on formulas and specific details for each will be covered on a separate issue.
Contributor

Could you create an issue for this?

Contributor Author

Member

I assume the new issue would also cover selection of a column after a formula is chosen. The wireframe does not seem to include that.

---
Wireframes

[Applying a Formula](https://share.balsamiq.com/c/nEzYaNKNTSv9EFwzwtfSof.png)
Contributor

Once the formula is applied, shouldn't the column be identical to a formula column (in the second screen of this wireframe)?

Contributor Author

Once they save it, yes. I have added the additional step.

---
Wireframes

[Output Summarization](https://share.balsamiq.com/c/rPMwwETnuQ8a4ut8NfmcDB.png)
Contributor

This is still missing data-type specific grouping options


[Applying a Formula](https://share.balsamiq.com/c/nEzYaNKNTSv9EFwzwtfSof.png)

## 3. Transforming the Output Table
Contributor

I think we need wireframes for deleting steps as well.

Contributor Author

Added to the specs along with wireframes.


[Adding Columns](https://share.balsamiq.com/c/oyxTXxqSh8rLqU3JY71DWY.png)

[Added Column](https://share.balsamiq.com/c/7U9yahZtYWX2G5obkyivoz.png)
Contributor

Can we have an example with a more nested source so we can see what the source field looks like?

Contributor Author

I've added a section specific to sources.


For a more detailed overview of this feature, read [Views Product Spec](https://wiki.mathesar.org/en/product/specs/2022-01-views).

## Scenarios
Contributor

We are missing scenarios for navigating to and navigating away from the data explorer (from the usual view with tabs and left sidebar).

@kgodey assigned seancolsen, mathemancer and pavish and unassigned kgodey Mar 29, 2022
Contributor

@mathemancer left a comment

Great work overall. I left a few specific questions.

Depending on the selected formula, different settings will be available.
More on formulas and specific details for each will be covered on a separate issue.

Columns generated from formulas will display a formula icon indicator in the column header.
Contributor

Will these have a similar "Column Source" view to represent which (if any) columns provided data which was manipulated by the formula, or should users get this info by inspecting the formula?

I don't have a particular solution in mind here, I'm just wondering if you've thought about it, and what that might look like.

Contributor Author

Yes, if a formula parameter is a column, the sources are displayed.


### 2.4 Filtering Input Column Values

Columns that link to multiple records can have filters applied to them to retrieve only values that match user-specified criteria. Multiple filters are allowed for each input column.
Contributor

I don't think it's reasonable to expect users to come up with a filter that results in a 1:1 mapping between the base table rows and the linked column entries. In the example wireframe, we're aggregating under the assumption that there are still more rows in the filtered column than in the base table. Is that intended to be the only option? What if their filter removes (or would remove) all values associated with a row? Do we need to have a NULL sometimes, and a zero in other cases (e.g., NULL for first-character matching, and 0 for counting)? If we're planning to have NULL when we clear out a given section of a linked column, why restrict to multiple-record links?

Contributor

@mathemancer Could you provide an example of what you're envisioning here? I'm having a hard time understanding the issue from your description.

Contributor

@mathemancer commented Apr 7, 2022

Suppose we're aggregating actors in movies into lists, but we're only interested in actors whose first name starts with 'P'. Then filtering might produce some movies with no actors after filtering for only actors fitting the description. I assume we'd want to show NULL for that case. What if instead, we're aggregating the number of Academy awards held by the casts of movies? This would be a sum, rather than a list. But then the value shown for a movie with a cast none of whom ever won an Oscar should probably be 0, not NULL. Is this the intended UI in these cases? Are there other cases to consider?

Now, since we're opening the possibility of ending up with no results in the linked columns for some movies, what if we have some many-to-one relationship (maybe movie studios as a toy example)? Each movie has one studio, and for some reason we're only interested in studios starting with the letter 'P'. Why not give the possibility of filtering the linked studio column so that we show 'Pixar' and 'Paramount', but only NULL where there would have been 'Universal'? My question there is whether the restriction to only allowing filtering of linked records in cases of columns that link multiple records is needed.
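
To make the NULL-versus-0 question concrete, here is a minimal sketch. It uses SQLite through Python's standard library purely so it is self-contained; the `movie`, `actor`, and `movie_cast` tables and their columns are hypothetical, and none of this is meant to describe how Mathesar would actually build the query. The filter on actor names sits in the join condition, so movies whose cast is filtered out entirely still appear in the result.

```python
import sqlite3

# Hypothetical toy schema; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT, oscars INTEGER);
    CREATE TABLE movie_cast (movie_id INTEGER, actor_id INTEGER);
    INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO actor VALUES (1, 'Paul', 2), (2, 'Quinn', 1), (3, 'Rita', 0);
    INSERT INTO movie_cast VALUES (1, 1), (1, 2), (2, 3);
""")

rows = conn.execute("""
    SELECT
        m.title,
        group_concat(a.name)       AS actor_list,   -- NULL when nothing matches
        sum(a.oscars)              AS oscars_raw,   -- also NULL when nothing matches
        coalesce(sum(a.oscars), 0) AS oscars_zero,  -- explicit 0 instead of NULL
        count(a.id)                AS actor_count   -- 0 when nothing matches
    FROM movie AS m
    LEFT JOIN movie_cast AS mc ON mc.movie_id = m.id
    LEFT JOIN actor AS a
        ON a.id = mc.actor_id AND a.name LIKE 'P%'  -- the input column filter
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

for row in rows:
    print(row)
# ('Movie A', 'Paul', 2, 2, 1)
# ('Movie B', None, None, 0, 0)
```

In this sketch a list aggregation and a plain `sum` both come back as NULL for the movie with no matching actors, `count` comes back as 0, and getting 0 from a sum requires an explicit `coalesce`. So the UI would have to decide per aggregation which of these to show.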

Contributor

@kgodey commented Apr 7, 2022

I understand your question now. I'm concerned that allowing filtering for those types of use cases would overload the concept of filters. I think users should use formulas for use cases like that instead, e.g. the Is True formula or the Starts With formula.


### 3.3 Sorting the Output Table

Users can sort the output table by applying a sort to any of the result table columns.
Contributor

Does this include other aggregations? I'm not sure how this should work for non-numeric aggregations.

Contributor Author

For the time being, non-numeric aggregations can only be displayed as lists.

Contributor

My question in that case is whether we want to sort by lists. That may be an awkward concept for users, since sorting by a list isn't a very natural thing to do, and there are a number of slightly weird options for how the sorting would actually work.
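
As a rough illustration of why sorting by a list column is ambiguous, the sketch below sorts the same made-up rows three ways. The data and the three orderings are assumptions chosen for the example, not options taken from the spec.

```python
# Hypothetical aggregated column: each movie carries a list of actor names.
rows = [
    ("Movie A", ["Zoe", "Adam"]),
    ("Movie B", ["Bob"]),
    ("Movie C", ["Adam", "Bob", "Cy"]),
]


def titles(sorted_rows):
    """Return just the movie titles, in order."""
    return [title for title, _ in sorted_rows]


# Option 1: compare the lists element by element, in stored order.
by_elements = titles(sorted(rows, key=lambda r: r[1]))

# Option 2: compare by the number of items in each list.
by_length = titles(sorted(rows, key=lambda r: len(r[1])))

# Option 3: compare by the smallest item in each list.
by_smallest = titles(sorted(rows, key=lambda r: min(r[1])))

print(by_elements)  # ['Movie C', 'Movie B', 'Movie A']
print(by_length)    # ['Movie B', 'Movie A', 'Movie C']
print(by_smallest)  # ['Movie A', 'Movie C', 'Movie B']
```

All three are defensible readings of "sort by this column", and each gives a different row order, which is the awkwardness described above.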

Contributor

Maybe we don't allow users to sort by aggregated columns? I can see how that would get confusing.

@ghislaineguerin what do you think?

@seancolsen
Contributor

Caveats about my review

  • For the sake of time, I have not read the other reviews. Apologies if some of this feedback overlaps with other comments.

  • For the sake of time, I'm withholding critique of all the "Formulas" features because (based on conversations with Kriti) my understanding is that we're seeking to defer implementation of formulas until after alpha.


General questions and critique

  • (1) For our own product terminology, what's the difference between "Data Explorer" and "Query Builder"?

  • (2) How do I navigate to the Data Explorer? What is the URL? From the Data Explorer, how do I navigate to view a table?

    I think the "Select Table" step should be its own page. When I'm selecting a table, I don't need to see any of the query-building UI. And when I'm building my query, I'd like to see a button somewhere like "Start over" or "New Query".

  • (3) If I build a query, close my browser tab (without saving to a view), then return to the data explorer, is my query saved? Or do I need to start over?

    This may seem like an implementation detail, but I think we should figure it out during the spec process because there are some underlying architectural decisions that affect UX and may require more UI.

    I see the following approaches:

    • "Ephemeral queries" - If I close the tab, I lose my work. The benefit to this approach is that it's simple to implement and easy for the user to build two queries side-by-side in separate tabs. The downside of course is that users may inadvertently lose their work.

    • "Server persistent queries via a custom data structure" - During every step of the process, we'd save the query to the server. This is a nice safeguard, but significantly complicates the use-case of running multiple queries side-by-side without saving them to views.

    • "Server persistent queries via views" - Automatically save to a view, but provide a mechanism to mark the view as an incomplete query.

    I'm torn on which approach I think is best. There may be other approaches too.

  • (4) A substantial amount of screen space is dedicated to the top header bar with "Exploring" and "Movie". Can we get rid of this header?

  • (5) Why are limit and offset different from the pagination on tables?

  • (6) Like Pavish mentioned on the call, I'm also interested in merging the left and right sidebars into one.

    As far as I can tell, this seems pretty straightforward...

    1. We'd have one sidebar. It could be on the left or right. I don't care.

    2. The sidebar would display one (and only one) "pane" at a time.

    3. We'd have three possible panes:

      • "Builder" - (with sections: "Columns", "Transform Result", "Query Details", and "Save Options")

      • "Source Selector" - shown titled "Add Columns" here

      • "Source Config"

      (The pane names would not be displayed to the user -- they only exist so that I can refer to the panes here.)

    4. The sidebar would show the "Builder" pane by default.

    5. When any non-default pane displays, a "Back" button would appear at the top, taking the user to the previously displayed pane.

    6. Clicking the "Add Column" button would show the "Source Selector" pane.

    7. Selecting a column would display the "Source Config" pane.

  • (7) Can the user re-order the columns? How? Drag/drop? From the "Columns" section? From the result table? Both?

  • (8) There are lots of text boxes that seem like they're probably editable but don't have a "Save" button. For example: filter values, "Limit", "Offset", column names, etc. How do I apply the changes I make within these text boxes? E.g. changing the name of a column.

  • (9) Is the following query possible? If so, how do I add a filter as an OR condition across two columns?

    SELECT "Title"
    FROM "Movies"
    WHERE
      "sequel_id" IS NOT NULL OR
      "prequel_id" IS NOT NULL;
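
The single-sidebar behavior proposed in (6) can be sketched as a small navigation stack. This is only an illustration of the seven rules listed there, written in Python for brevity; the `Sidebar` class and its method names are hypothetical and say nothing about how the front end would implement it.

```python
from dataclasses import dataclass, field

# Pane names come from the proposal; they are internal labels only.
BUILDER = "Builder"
SOURCE_SELECTOR = "Source Selector"
SOURCE_CONFIG = "Source Config"


@dataclass
class Sidebar:
    # Stack of displayed panes; the last entry is the visible one.
    history: list = field(default_factory=lambda: [BUILDER])

    @property
    def current(self) -> str:
        return self.history[-1]

    @property
    def shows_back_button(self) -> bool:
        # "Back" appears whenever a non-default pane is displayed.
        return self.current != BUILDER

    def click_add_column(self) -> None:
        self.history.append(SOURCE_SELECTOR)

    def select_column(self) -> None:
        self.history.append(SOURCE_CONFIG)

    def click_back(self) -> None:
        # Return to the previously displayed pane, never past the default.
        if len(self.history) > 1:
            self.history.pop()


sidebar = Sidebar()
sidebar.click_add_column()   # Builder -> Source Selector
sidebar.select_column()      # Source Selector -> Source Config
sidebar.click_back()         # back to Source Selector
print(sidebar.current, sidebar.shows_back_button)  # Source Selector True
```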

2.2 Input column sources

  • (10) Within the "Input Column Source" wireframe, I find the "Link" info to be a bit confusing.

  • (11) Here's an alternate idea for how to present the entire "Input column source" panel:

    image

    I'm not tied to this idea, but it's a quick stab at presenting the info in a manner more akin to my own mental model.

  • (12) If we don't want to use my idea above, then I'd suggest that within the current wireframe we display "Table" before "Column" because table is a higher-level entity than column and I need to know the table first before I know the column.

2.4 Filtering Input Column Values

  • (13) The highlighted "Filter" section in this wireframe took me a while to understand. I see its purpose now, and see why we need it. But I think this needs a little polish before it will be intuitive to users. I like how we're using separate terms for "Aggregation" and "Summarize". We might want to think of different terms for the filter in this case and the filter that gets applied within "Transform Result". Perhaps a help bubble next to the section heading would be sufficient. I'm not sure.

  • (14) Can I use the Data Explorer to build this query?

    SELECT
      "Person"."full_name" AS "James Bond Actor",
      array_agg("Movie"."Title") AS "Films"
    FROM "Person"
    JOIN "Movie Cast Member"
      ON "Movie Cast Member"."actor_id" = "Person"."id" AND
        "Movie Cast Member"."character_name" = 'James Bond'
    JOIN "Movie"
      ON "Movie"."id" = "Movie Cast Member"."movie_id"
    GROUP BY "Person"."id";

    What's notable here is that "character_name" = 'James Bond' is essentially a multi-record column filter, but then I'm aggregating on a different column that's based on that filtered join.

2.5 Aggregating Input Column Values

  • (15) What are the options for the "Aggregation" dropdown?

  • (16) I'm curious how we intend to handle JOIN vs LEFT JOIN.

    As an aside: my experience is primarily with MySQL. The MySQL syntax supports many different join keywords, but I've come to understand that when writing SQL, I only need to choose between JOIN and LEFT JOIN -- and words like INNER, OUTER and such are superfluous. I'm not certain the same is true for Postgres. And I'm aware that Postgres supports other joins which MySQL does not, such as FULL JOIN. So I hope this part makes sense to Postgres people. I can clarify if needed.

    If I don't want to aggregate my values, I may want to choose between a JOIN (which might eliminate rows from the base table) and a LEFT JOIN (which might have NULL values for the columns in my joined table). Maybe we should give users the choice between the two join types? I'm not sure of the best way to present that choice to non-technical users though. It's a difficult concept.

    With an aggregation, I think a LEFT JOIN would suffice in all cases, but there may be scenarios I'm not considering.
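
For reference on (16), here is a minimal sketch of the difference. It runs on SQLite via Python's standard library so that it is self-contained, and the `movie` and `review` tables are hypothetical. On the syntax question: in PostgreSQL the words INNER and OUTER are optional as well, so `JOIN` means an inner join and `LEFT JOIN` means a left outer join.

```python
import sqlite3

# Hypothetical toy schema: 'Movie B' has no reviews.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE review (id INTEGER PRIMARY KEY, movie_id INTEGER, stars INTEGER);
    INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO review VALUES (1, 1, 5), (2, 1, 3);
""")

# JOIN: base-table rows without a match disappear.
inner_rows = conn.execute("""
    SELECT m.title, r.stars
    FROM movie AS m
    JOIN review AS r ON r.movie_id = m.id
    ORDER BY m.id, r.id
""").fetchall()

# LEFT JOIN: every base-table row survives, padded with NULL.
left_rows = conn.execute("""
    SELECT m.title, r.stars
    FROM movie AS m
    LEFT JOIN review AS r ON r.movie_id = m.id
    ORDER BY m.id, r.id
""").fetchall()

# LEFT JOIN plus aggregation: one row per base-table row.
agg_rows = conn.execute("""
    SELECT m.title, count(r.id)
    FROM movie AS m
    LEFT JOIN review AS r ON r.movie_id = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

print(inner_rows)  # [('Movie A', 5), ('Movie A', 3)]
print(left_rows)   # [('Movie A', 5), ('Movie A', 3), ('Movie B', None)]
print(agg_rows)    # [('Movie A', 2), ('Movie B', 0)]
```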

Transform Result

  • (17) Can we get wireframes for the UI of each transform?

  • (18) For simplicity's sake, what if we only allow the user to delete the last transformation step? If this is not acceptable, then I have some further concerns which are more complex and will take more time to articulate.

6. Saving the Query as a view

  • (19) After I save to a view, what happens? Does Mathesar keep me on the edit screen so that I can continue tweaking the query? Or does Mathesar navigate me to view the view?

  • (20) If I use the query builder to edit a view, how do I update the view definition? Does every step within the query builder automatically re-save to the view? (I would hope not.) But some users might expect auto-save. How will users know their view definition is not saved?

An input column used in a formula has been deleted

  • (21) The specs say:

    If a user deletes an input column utilized in a formula, then that formula column needs to display the error and prompt the user to resolve it.

    This is tricky. For the sake of simplicity, I'd rather not support this kind of situation that gracefully. I have some additional questions which are dependent on answers to other questions I've already asked. Fully explaining all my thoughts here will take a while, so I'd like to discuss this on our next call about views.

---
Wireframes

[Adding Columns](https://share.balsamiq.com/c/oyxTXxqSh8rLqU3JY71DWY.png)
Member

Why do we have the separate "In Use" tab?

If a user adds a column, can they not add it again?
For example, the user may want to add the same column multiple times and apply different formulas to them.

How would we represent such cases?

Contributor Author

Yes, they can add a column again. They may, however, change the column names and even the data types of the columns that have been selected (through aggregations or summarization). The 'In use' section would allow users to quickly see which columns are being used in the view. Consider a user inspecting an existing view that they did not create and simply wanting to list all of the sources.

---
Wireframes

[Added Input Filter](https://share.balsamiq.com/c/tK2hZy6FB4zVYpN56ejpBk.png)
Member

The filter is applied on actor.date_of_birth, while the selected column is actor.id. If we add another column actor.name, would we show this applied filter in both columns?

Contributor Author

It would only affect the input column with the filter.

---
Wireframes

[Applying a Formula](https://share.balsamiq.com/c/nEzYaNKNTSv9EFwzwtfSof.png)
Member

I think it's best to have a single workflow:

  • Either user adds a formula and chooses the columns (My choice), or
  • User adds a column and chooses the formula.

A formula can modify the column entirely, and already applied filters and aggregations would no longer be applicable. A user adding a formula to an existing column might not understand that the filters & aggregations would get removed.

Overall, I think this additional flexibility would be troublesome for the user and for us to maintain in the long run.

Contributor

@ghislaineguerin As we discussed yesterday, I think we should remove formulas from this spec entirely and have a separate spec for integrating formulas once we finish the design for mathesar-foundation/mathesar#1252.

---
Wireframes

[Output Filter](https://share.balsamiq.com/c/dAVYnp8VnG2r2HzkSZ3rgp.png)
Member

I find using the text Filter for both input and output to be confusing. Can we rename this to be more descriptive or add additional help content in both places describing what they do?

Contributor Author

Agreed. See my reply to Sean's same question below:

Rather than calling it filter, we could call it filter records or include only records that match this condition.


[Deleting a Transformation Step](https://share.balsamiq.com/c/XeNAKGz1UnDMfscSTLDVp.png)

## 5. Previewing the Query Results
Member

I think the preview needs more screen space. It would be better if we can explore options to merge both panes into one.

Contributor

I don't think merging both panes is a good idea. The left pane represents the structure of the query, the right pane is an "inspector" focusing on context-specific details for the current task the user is doing.

I do agree that the user might want more space for the preview. Making both panes collapsible seems like a better option for this.


## 5. Previewing the Query Results

A preview, or query result table, should be visible while the user is in `Data Explorer`. The result table will change based on the different configurations, for example, if a user applies a filter, the system should refresh the table to show the output with the filter applied.
Member

It would be better if this preview table is similar to normal tables that we display, with pagination options at the bottom.

I don't think limit and offset should be part of transformations.

Contributor Author

According to my understanding, pagination is merely a display option, but limit and offset have an effect on the number of records in the table. How do you see users distinguishing between the two?

Contributor

As @ghislaineguerin said, limit and offset are applied to the query definition, this is not pagination. @ghislaineguerin It would be good to explain the limit/offset somehow since new users may not understand the difference. I also think the wireframes (or prototype) should include pagination.
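
One way to picture the distinction is the sketch below, which assumes pagination is implemented by wrapping the saved query at display time. That wrapping approach, the `movie` table, and the helper names are all hypothetical. SQLite via Python's standard library is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO movie (title) VALUES (?)",
    [(f"Movie {i}",) for i in range(1, 11)],  # ids 1 through 10
)

# LIMIT/OFFSET here are part of the query definition: the result is 6 rows.
QUERY = "SELECT id, title FROM movie ORDER BY id LIMIT 6 OFFSET 2"


def query_row_count(query):
    """Number of rows the query definition itself produces."""
    return conn.execute(f"SELECT count(*) FROM ({query})").fetchone()[0]


def fetch_page(query, page, page_size):
    """Display-level pagination: wraps the query without changing it."""
    sql = f"SELECT id FROM ({query}) ORDER BY id LIMIT ? OFFSET ?"
    rows = conn.execute(sql, (page_size, (page - 1) * page_size)).fetchall()
    return [row[0] for row in rows]


print(query_row_count(QUERY))   # 6
print(fetch_page(QUERY, 1, 4))  # [3, 4, 5, 6]
print(fetch_page(QUERY, 2, 4))  # [7, 8]
```

Changing the page size changes what is on screen but never the six rows the query defines, whereas changing the query's own limit or offset changes the result itself.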

@ghislaineguerin
Contributor Author

@seancolsen Thanks for the input. Here are my answers. I will also ask @kgodey to provide additional details where needed.

(1) For our own product terminology, what's the difference between "Data Explorer" and "Query Builder"?

They are the same. We renamed the query builder to data explorer to make it more user-friendly and goal-oriented.

(2) How do I navigate to the Data Explorer? What is the URL? From the Data Explorer, how do I navigate to view a table?

The data explorer will be accessible via the top navigation as well as the toolbar for views and tables. Views can be opened in data explorer to inspect or modify their settings, and tables can be used as the starting point for a new query.

I think the "Select Table" step should be its own page. When I'm selecting a table, I don't need to see any of the query-building UI. And when I'm building my query, I'd like to see a button somewhere like "Start over" or "New Query".

I believe that this is a reasonable recommendation, given that changing the base table will result in the entire query being deleted. It will also be more clear that table selection is not something that can be modified.

(3) If I build a query, close my browser tab (without saving to a view), then return to the data explorer, is my query saved? Or do I need to start over?

In my opinion, "ephemeral queries" make sense as a starting point. Rather than just saving views, we wanted users to be able to explore data at any time and use this as a way of assessing and reviewing data in their tables. Creating new objects every time they explore might discourage them from using the tool in this manner.

(4) A substantial amount of screen space is dedicated to the top header bar with "Exploring" and "Movie". Can we get rid of this header?

It's possible that we could get rid of it if we implement the changes you've recommended earlier. We'd still need a way to make it clear which table we were looking at.

(5) Why are limit and offset different from the pagination on tables?

According to my understanding, pagination is merely a display option, but limit and offset impact the actual number of records in the table.

(6) Like Pavish mentioned on the call, I'm also interested in merging the left and right sidebars into one.

Although it's worth looking into and your suggestions make sense, some of the interactions in the inspector panel could lead to a more comfortable user experience. We are preparing an interactive prototype and will be testing different configurations.

(7) Can the user re-order the columns? How? Drag/drop? From the "Columns" section? From the result table? Both?

They should be able to do so from both locations.

(8) There are lots of text boxes that seem like they're probably editable but don't have a "Save" button. For example: filter values, "Limit", "Offset", column names, etc. How do I apply the changes I make within these text boxes? E.g. changing the name of a column.

When a user leaves the field, the update should be reflected. The assumption is that the result table is updated when the user changes the input columns.

(9) Is the following query possible? If so, how do I add a filter as an OR condition across two columns?

```sql
SELECT "Title"
FROM "Movies"
WHERE
  "sequel_id" IS NOT NULL OR
  "prequel_id" IS NOT NULL;
```

I think we only support AND conditions. @kgodey might have a better answer.

(10) Within the "Input Column Source" wireframe, I find the "Link" info to be a bit confusing.

We are attempting to create a more natural approach to describing the links that does not rely on diagrams. For example, "Displaying values from the title column in Movies, which is linked via the recommendationId field in movie recommendations". I definitely believe it can be made clearer.

(12) If we don't want to use my idea above, then I'd suggest that within the current wire frame we display "Table" before "Column" because table is a higher-level entity than column and I need to know the table first before I know the column.

This makes sense. However, we started with a column to better identify the source of the new column values. Because we will be displaying sources in places where the selection is done by column rather than a table, the order made more sense. I'm not sure what the best approach is here; I'll think about it more and respond with a better answer.

(13) The highlighted "Filter" section in [this wireframe]

Rather than calling it filter, we could call it filter records or include only records that match this condition.

(14) Can I use the Data Explorer to build this query?

This, I believe, is possible. In the interactive prototype, I will include an example of this.

(15) What are the options for the "Aggregation" dropdown?

https://wiki.mathesar.org/en/product/specs/2022-01-views/04-formulas/4a-record-aggregations

  • (16) I'm curious how we intend to handle JOIN vs LEFT JOIN.

@kgodey, I'll defer to you.

(17) Can we get wireframes for the UI of each transform?

Will be included in the interactive prototype.

(18) For simplicity's sake, what if we only allow the user to delete the last transformation step? If this is not acceptable, then I have some further concerns which are more complex and will take more time to articulate.

We talked about it, and while your approach makes sense, not all step deletions will result in errors. We'd have to assess how simple it would be for users to update the remaining steps that cause errors. Imagine if a user only wants to remove an initial filter and can't do it without losing all subsequent steps?

(19) After I save to a view, what happens? Does Mathesar keep me on the edit screen so that I can continue tweaking the query? Or does Mathesar navigate me to view the view?

Mathesar should stay on the edit screen and disable the save button until further changes are made, so that users know the query has been saved. I believe we'll also need to provide a way to get to the actual view.

(20) If I use the query builder to edit a view, how do I update the view definition? Does every step within the query builder automatically re-save to the view? (I would hope not.) But some users might expect auto-save. How will users know their view definition is not saved?

See the previous answer.

  • (21) The specs say:

    If a user deletes an input column utilized in a formula, then that formula column needs to display the error and prompt the user to resolve it.

I fully agree that this is a tricky scenario. It would be a good idea to go over this further on our next call.

@seancolsen
Contributor

Thanks @ghislaineguerin. More thoughts...

  • (2) @seancolsen said:

    From the Data Explorer, how do I navigate to view a table?

    I still have this question.

    Maybe I wasn't clear enough originally. What I mean is: from the Data Explorer page, how do I leave the Data Explorer, and return to the page we currently have implemented where I see different tabs for my tables? If there is a link on the very top of the page, what's it called?

  • (5) @ghislaineguerin said:

    According to my understanding, pagination is merely a display option, but limit and offset impact the actual number of records in the table

    The manner in which the wireframe presents "limit" and "offset" led me to believe that they will be treated as display options. If those settings alter the results of the saved view, then I think the Data Explorer should present the settings as transform steps, along with the other possible transforms.

    Further, the user may actually want to page through the Data Explorer results, so perhaps we add "limit" and "offset" transform steps and add pagination to the Data Explorer results (which does not affect the saved view).

  • (8) @ghislaineguerin said:

    When a user leaves the field, the update should be reflected.

    I would like us to aim for a UX which is more explicit and leaves less for the user to wonder about. Consider the following sequence of events:

    1. I enter a new value into the "Column name" text box, then instinctively tab or click away out of habit.

    2. I see that the column name updated without me having to click any "Save" button, so I deduce (incorrectly) that the values update as I type.

    3. I enter a new column name, but this time it happens that I don't instinctively blur the input field. Now I'm confused as to why the column name didn't change to reflect the new value I entered, especially since I don't see a save button. I may be a user who doesn't even understand the concept of "focus" and I may not even think to try blurring the input field as a way of saving the value.

    I'd like to have "Save" buttons. They can be small -- little green check marks, even. And they can be hidden when they're not necessary. But I do think they're somewhat important to creating a UX that doesn't make the user think.

    If we go with "ephemeral queries", we may be able to avoid save buttons for some inputs like column names, because we could update those as the user types. In contrast, for filter values, we'll need to send an API request, and I'd rather not do that as the user types. All things considered though, I'd rather just use "save" buttons everywhere. Most operations within the Data Explorer will need "save" buttons, so using them everywhere provides a more consistent UX, which I think is important with a complex tool like this.

  • (18) @ghislaineguerin said:

    Imagine if a user only wants to remove an initial filter and can't do it without losing all subsequent steps?

    Yep. I acknowledge that would be a frustrating limitation. But I think it's okay for the alpha release. I have some front-end implementation feasibility concerns with the design as-spec'ed, which are related to my concerns with (21). Creating data structures to represent the Data Explorer UI state will already be complex. But those data structures will be more complex and more brittle if they need to represent UI state which is invalid in weird ways. I can explain more on a call. I have some pretty strong concerns about this though.

    not all step deletions will result in errors.

    What if we only allow the user to delete steps that don't result in errors? If the step will cause errors, then the delete button would be disabled.

@kgodey
Contributor

kgodey commented Apr 6, 2022

Responses to comments from @seancolsen and @ghislaineguerin:

I think the "Select Table" step should be its own page. When I'm selecting a table, I don't need to see any of the query-building UI. And when I'm building my query, I'd like to see a button somewhere like "Start over" or "New Query".

This was integrated into the main query builder page so that if a new user clicked on "Data Explorer", they could have an idea about what was going to happen next. If we just ask them to select a table in a separate page (we had this in a previous wireframe version), they won't know what the context is.

@ghislaineguerin It would be good to integrate reasoning like this into the spec so that if we go back and look at the spec, we'll remember why we made the design decisions we did.

  • (4) A substantial amount of screen space is dedicated to the top header bar with "Exploring" and "Movie". Can we get rid of this header?

We need to have the base table in a prominent place because the rest of the query depends on it. The columns available are based on relationships to the base table, the source information displayed is relative to the base table. I think it helps orient users to have it displayed over everything else. I do think we could put in an explanation there about why it's relevant.

We could also use the header for other purposes (e.g. if we're using the data explorer to edit an existing view, we could put information about the view there).

@ghislaineguerin It would be good to document that the main purpose of the header is to orient the user and provide context for the whole query.

(6) Like Pavish mentioned on the call, I'm also interested in merging the left and right sidebars into one.

Please see my response to @pavish's comment here: #42 (comment). I think that making the panes collapsible is a better option than trying to merge them, because keeping "where am I in the query" separate from "what am I currently working on" would help orient users.

(9) Is the following query possible? If so, how do I add a filter as an OR condition across two columns?

@ghislaineguerin I do think we should support OR filters, just as we do for tables. We should have a consistent filter component everywhere we use filters so that the user knows what to expect when they see the word "filter".
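As a rough illustration of what a shared filter component could hand to the backend, here is a minimal sketch that compiles a nested AND/OR filter spec into the WHERE clause from question (9). The spec format (`"or"`, `"and"`, `"column"`, `"op"`) and the helpers `quote_ident` and `compile_filter` are hypothetical, made up for this example; they are not Mathesar's actual API. It runs against an in-memory SQLite database with invented sample rows.

```python
import sqlite3


def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def compile_filter(node):
    """Turn a nested filter spec (hypothetical format) into a SQL expression."""
    if "or" in node or "and" in node:
        op = "or" if "or" in node else "and"
        parts = [compile_filter(child) for child in node[op]]
        return "(" + f" {op.upper()} ".join(parts) + ")"
    if node["op"] == "not_null":
        return f'{quote_ident(node["column"])} IS NOT NULL'
    raise ValueError(f"unsupported filter: {node!r}")


# The OR condition across two columns from question (9).
filter_spec = {
    "or": [
        {"column": "sequel_id", "op": "not_null"},
        {"column": "prequel_id", "op": "not_null"},
    ]
}
where_clause = compile_filter(filter_spec)

# Invented sample data, only to show the clause running.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Movies" ("Title" TEXT, "sequel_id" INTEGER, "prequel_id" INTEGER);
    INSERT INTO "Movies" VALUES ('A', 2, NULL), ('B', NULL, 1), ('C', NULL, NULL);
""")
titles = [
    row[0]
    for row in conn.execute(
        f'SELECT "Title" FROM "Movies" WHERE {where_clause} ORDER BY "Title"'
    )
]
```

Because groups nest, the same structure could also express an AND of several OR groups, which is one argument for reusing a single filter component everywhere.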

(16) I'm curious how we intend to handle JOIN vs LEFT JOIN.

If I don't want to aggregate my values, I may want to choose between a JOIN (which might eliminate rows from the base table) and a LEFT JOIN (which might have NULL values for the columns in my joined table). Maybe we should give users the choice between the two join types?

We're not currently giving users this option; we're defaulting to LEFT JOIN. I think this is fine for the alpha release. If there are queries that users want to make that are not served by this version, we'll figure that out later.
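To make the trade-off concrete, here is a small sketch of the difference between the two join types, using Python's built-in `sqlite3` module. The `Reviews` table, its columns, and the sample rows are invented for illustration and are not part of the spec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Movies" (id INTEGER PRIMARY KEY, "Title" TEXT);
    CREATE TABLE "Reviews" (id INTEGER PRIMARY KEY, movie_id INTEGER, rating INTEGER);
    INSERT INTO "Movies" VALUES (1, 'Alien'), (2, 'Brazil');
    -- Only 'Alien' has a review; 'Brazil' has none.
    INSERT INTO "Reviews" VALUES (1, 1, 5);
""")

# JOIN: base-table rows without a match are dropped.
inner_rows = conn.execute("""
    SELECT m."Title", r.rating
    FROM "Movies" AS m
    JOIN "Reviews" AS r ON r.movie_id = m.id
    ORDER BY m.id
""").fetchall()

# LEFT JOIN: every base-table row is kept; unmatched columns are NULL.
left_rows = conn.execute("""
    SELECT m."Title", r.rating
    FROM "Movies" AS m
    LEFT JOIN "Reviews" AS r ON r.movie_id = m.id
    ORDER BY m.id
""").fetchall()
```

With this data, the JOIN returns only the reviewed movie, while the LEFT JOIN returns both movies with `None` as the rating for the unreviewed one. Defaulting to LEFT JOIN therefore never hides rows of the base table.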

  • (19) After I save to a view, what happens? Does Mathesar keep me on the edit screen so that I can continue tweaking the query? Or does Mathesar navigate me to view the view?

  • (20) If I use the query builder to edit a view, how do I update the view definition? Does every step within the query builder automatically re-save to the view? (I would hope not.) But some users might expect auto-save. How will users know their view definition is not saved?

These are out of scope for this spec; they will be handled in:

Further, the user may actually want to page through the Data Explorer results, so perhaps we add "limit" and "offset" transform steps and add pagination to the Data Explorer results (which does not affect the saved view).

I think this is a good idea. I also commented about this here: #42 (comment)
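A minimal sketch of how the two could coexist, assuming the saved view carries "limit" and "offset" as part of its definition and the results pane pages over that query without modifying it. The table contents, `VIEW_QUERY`, and `fetch_page` are hypothetical names used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Movies" (id INTEGER PRIMARY KEY, "Title" TEXT)')
conn.executemany(
    'INSERT INTO "Movies" (id, "Title") VALUES (?, ?)',
    [(i, f"Movie {i}") for i in range(1, 11)],
)

# "Limit" and "offset" as transform steps: they are part of the saved
# view definition and change which records the view contains.
VIEW_QUERY = 'SELECT id, "Title" FROM "Movies" ORDER BY id LIMIT 6 OFFSET 2'


def fetch_page(page, page_size):
    """Display-only pagination: wraps the view query, never edits it."""
    sql = f"SELECT * FROM ({VIEW_QUERY}) AS q ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (page_size, (page - 1) * page_size)).fetchall()


view_ids = [row[0] for row in conn.execute(VIEW_QUERY)]
page_1_ids = [row[0] for row in fetch_page(page=1, page_size=4)]
page_2_ids = [row[0] for row in fetch_page(page=2, page_size=4)]
```

Here the view always contains six records regardless of which page the user is looking at, and paging only changes which slice of those six is displayed.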

Most operations within the Data Explorer will need "save" buttons, so using them everywhere provides a more consistent UX, which I think is important with a complex tool like this.

I agree with this.

What if we only allow the user to delete steps that don't result in errors? If the step will cause errors, then the delete button would be disabled.

@ghislaineguerin I think it would be fine to only allow users to delete the last step. I don't think we can figure out which steps will cause errors and which won't, since the query is built in order. Even if we can figure it out, it will be a hard problem to solve technically and I don't think it's worth putting effort into for the alpha release.

The other alternative I see is that deleting a step would also delete all subsequent steps, but that seems like a worse UX.
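For reference, the "last step only" rule is cheap to express. This is a minimal sketch, assuming the query is modeled as a base table plus an ordered list of steps; the `Query` class and its method names are hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical model: a base table plus an ordered list of steps."""

    base_table: str
    steps: list = field(default_factory=list)

    def can_delete(self, index):
        # Only the final step is deletable, so no later step can be
        # left depending on something that no longer exists.
        return bool(self.steps) and index == len(self.steps) - 1

    def delete_step(self, index):
        if not self.can_delete(index):
            raise ValueError("only the last step can be deleted")
        return self.steps.pop(index)


query = Query("Movies", ["filter", "summarize", "sort"])
deletable = [query.can_delete(i) for i in range(len(query.steps))]
removed = query.delete_step(2)
```

The UI could use `can_delete` to decide whether to enable each step's delete button, so the state never needs to represent a query with invalid intermediate steps.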

@github-actions

This pull request has not been updated in 45 days and is being marked as stale. It will automatically be closed in 30 days if not updated by then.

@github-actions github-actions bot added the stale May be out of date label May 22, 2022
@github-actions github-actions bot closed this Jun 22, 2022
@kgodey kgodey deleted the data-explorer-specs branch June 22, 2022 22:25
@ghislaineguerin ghislaineguerin merged commit 0e5fb41 into master Jun 23, 2022
Labels
stale May be out of date status: review In review
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Design for visual query builder ("Data Explorer")
5 participants