Skip to content

Conversation

@betodealmeida
Copy link
Member

This PR introduces a tagging system for Superset, allowing dashboards, charts and queries to be tagged. It also allows searching for a given tag, and will be the basis for a new landing page (see #5327).

Adding tags

Here's an example showing adding tags to a chart (dashboards work the same way). Suggestions come from pre-existing tags, and are ordered by popularity:

add_tag

And here's how it looks like when a user is not an owner of the chart/dashboard:

screen shot 2018-07-30 at 3 44 31 pm

Searching for tags

Clicking on a tag will take the user to a page searching for that tag. For example, clicking on the "test" tag:

screen shot 2018-07-30 at 3 45 50 pm

For this, I added a new tab "Tags" to the welcome page, using the same layout as the "Favorites" tab. I changed the page so that the active tab can be specified by the URL anchor, as well as the search parameter. For example, the URL /superset/welcome?q=test#tags will show the "Tags" tab instead of the default "Dashboards" with the search form pre-populated with "test".

Implicit tags

Dashboard, chart and (saved) queries have implicit tags related to their owners, types and favorites. For example, all objects owned by the admin have the tag owner:1. All charts have the tag type:chart. Objects favorited by the admin have the tag favorited_by:1.

These tags are automatically added by a migration script, and kept in sync through SQLAlchemy event listeners. They are currently not surfaced to the user, but can be searched for. For example, it's possible to search for owner:1 in the welcome page to see all objects owned by the admin, or even search for owner:{{ current_user_id() }}.

Queries

Currently queries cannot be tagged, because I wasn't sure were to add the component. Any suggestions, @vylc?

@betodealmeida
Copy link
Member Author

Fixing tests.

from .base import BaseSupersetView


class ObjectTypeConverter(BaseConverter):
Copy link
Member

@hughhhh hughhhh Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to a utils.py file?

Also get some unit test for these basic methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, this class is used for validating the path in the views defined in this file. For example:

@expose('/tags/<object_type:object_type>/<int:object_id>/', methods=['GET'])
def get(self, object_type, object_id):
    ...

Using the validator, /tags/chart/1/ is a valid path (since chart is in the ObjectTypes enum) but /tags/foo/1 is invalid. Since the concept of object_type is only used in tag views, I think moving this to utils.py will not be very useful, since it's unlikely that it will be used in other places.

I'll add unit tests, good point.

@vylc
Copy link

vylc commented Jul 31, 2018

(1) Would suggest that the 'Altered' token goes before the tags, so as to reduce all the movement when you add tags
(2) What is the flow to search for charts/dashes with multiple tags? ie. if I wanted to search for 'test' and 'druid'. Seems like the Search box should be tokens instead of text?
(3) Re: queries. Not urgent, would rather get landing page shipped. We can put it in the dropdown menu for the query name tab
Looks good!

@betodealmeida
Copy link
Member Author

betodealmeida commented Jul 31, 2018

(1) Would suggest that the 'Altered' token goes before the tags, so as to reduce all the movement when you add tags

⚡️

(2) What is the flow to search for charts/dashes with multiple tags? ie. if I wanted to search for 'test' and 'druid'. Seems like the Search box should be tokens instead of text?

You're right. Right now it works by separating the tags with a comma, test,druid, but tokens would be much better. I'll change it.

(3) Re: queries. Not urgent, would rather get landing page shipped. We can put it in the dropdown menu for the query name tab

👍

@codecov-io
Copy link

codecov-io commented Aug 1, 2018

Codecov Report

Merging #5524 into master will decrease coverage by 0.05%.
The diff coverage is 58.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5524      +/-   ##
==========================================
- Coverage    63.6%   63.54%   -0.06%     
==========================================
  Files         359      364       +5     
  Lines       22781    23185     +404     
  Branches     2530     2556      +26     
==========================================
+ Hits        14489    14734     +245     
- Misses       8277     8436     +159     
  Partials       15       15
Impacted Files Coverage Δ
...ets/src/SqlLab/components/TemplateParamsEditor.jsx 91.83% <ø> (ø) ⬆️
superset/config.py 92.59% <ø> (ø) ⬆️
superset/views/__init__.py 100% <100%> (ø) ⬆️
...uperset/assets/src/dashboard/components/Header.jsx 61% <100%> (+2.48%) ⬆️
superset/views/base.py 70.39% <100%> (+0.33%) ⬆️
superset/assets/src/dashboard/util/constants.js 100% <100%> (ø) ⬆️
superset/assets/src/modules/utils.js 44.52% <20%> (-0.55%) ⬇️
superset/assets/src/chart/chartAction.js 48.76% <20%> (+1.98%) ⬆️
superset/assets/src/tags.js 26.98% <26.98%> (ø)
superset/views/tags.py 33.69% <33.69%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 222b79d...b708540. Read the comment docs.

@betodealmeida
Copy link
Member Author

Phew! So many fixes.

@betodealmeida
Copy link
Member Author

@vylc I added better searching for tags:

tag_search

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to release this round of review on the Python side, I'll do another round on the JS side, coming soon.

Session = sessionmaker(autoflush=False)


class TagTypes(enum.Enum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though enums were a py3.4 thing but they appear to have been backported to 2.7, maybe 2.7.6?

I'm starting to think we should deprecate our supporte for <py3.6 and embrace all the new great stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think even in 2.7.3. There's also a backport enum34 which we could add as an optional dependency, something like:

# setup.py
if sys.version_info < (2, 7, 3):
    deps.append('enum34')

And I'm +1 on moving towards newer versions. I think when developing libraries it's important to be more conservative, but for applications it's fine.

"""

# explicit tags, added manually by the owner
custom = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally [despite the perf implications] I prefer having readable strings in the db. You could also model this as fks to reference tables so that the db has all the information instead of a fk to an enum that lives only in code.

From what I'm seeing I'm thinking the tag name may have the info anyways, which makes it work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLAlchemy is storing them as strings in the DB, though (at least in sqlite):

sqlite> SELECT DISTINCT object_type FROM tagged_object;
chart
dashboard
query
sqlite> .schema tagged_object
CREATE TABLE tagged_object (
        created_on DATETIME,
        changed_on DATETIME,
        id INTEGER NOT NULL,
        tag_id INTEGER,
        object_id INTEGER,
        object_type VARCHAR(9),
        created_by_fk INTEGER,
        changed_by_fk INTEGER,
        PRIMARY KEY (id),
        FOREIGN KEY(tag_id) REFERENCES tag (id),
        CONSTRAINT objecttypes CHECK (object_type IN ('query', 'chart', 'dashboard')),
        FOREIGN KEY(created_by_fk) REFERENCES ab_user (id),
        FOREIGN KEY(changed_by_fk) REFERENCES ab_user (id)
);

session = Session(bind=connection)

# add `owner:` tags
for owner_id in cls.get_owners_ids(target):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could refactor 109-117 into it's own method as it looks the same as 148-158

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚡️

object_type = 'query'

@classmethod
def get_owners_ids(cls, target):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note that eventually I'd like to get all models to be consistent and have and owners many to many field. No need to change it as part of this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

TaggedObject.object_id == target.obj_id,
Tag.type == TagTypes.favorited_by,
Tag.name == name,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could just query.delete() at the next line here (as long as you change session.query(TaggedObject.id) -> session.query(TaggedObject)), no need for another query I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that, but SQLAlchemy complained saying it can't delete() on a join. It works where we have only a single table, eg:

session.query(TaggedObject).filter(
    TaggedObject.object_type == cls.object_type,
    TaggedObject.object_id == target.id,
).delete()


@expose('/tags/suggestions/', methods=['GET'])
def suggestions(self):
query = db.session.query(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the way I like to break lines when doing method chaining in python is to wrap in parenthesis and break before the dot as in:

query = (
    db.session.query(TaggedObject)
    .group_by(TaggedObject.tag_id)
    .order_by(func.count().desc())
    .all()
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very javascripty, but I'm ok with doing it that way. :)

I miss the pipe operator from Hack:

function piped_example(array<int> $arr): int {
  return $arr
    |> \array_map($x ==> $x * $x, $$)
    |> \array_filter($$, $x ==> $x % 2 == 0)
    |> \count($$);
}

tags = json.dumps([
{'id': obj.tag.id, 'name': obj.tag.name} for obj in query])

return Response(tags, status=200, content_type='application/json')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use return self.json_success(tags) where tag is the object, not the json string.

optional side-mission: Looks like we may want to move https://github.com/apache/incubator-superset/blob/master/superset/views/core.py#L86 to view/base.py as a method of BaseSupersetView

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this new BaseSupersetView.json_success expects an object, it could be a way to standardize to simplejson.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is more complicated than I was expecting. The core view uses different encoding options depending on the visualization type (by calling viz_obj.json_dumps), so it requires a lot of refactoring.

Let's do that on a separate PR.

return Response(status=201) # 201 CREATED

@expose('/tags/<object_type:object_type>/<int:object_id>/', methods=['DELETE'])
def delete(self, object_type, object_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that some of these endpoints we'd get for free by creating a FAB ModelView object, it may not be exactly what you need though, unclear. I'm ok with the more explicit/controlled endpoints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

TaggedObject.object_id == SavedQuery.id,
TaggedObject.object_type == ObjectTypes.query,
),
).group_by(TaggedObject.object_id, TaggedObject.object_type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand what kind of SQL will come out of that query object... Is the group_by needed?

Also you may be able to parameterize the ObjectTypes as in:

tagged_models = [Slice, Dashboard, Query]
for tm in tagged_models:
    query = query.outerjoin(tm, {...})

It would make it easier to add more tagged models in the future but may make the code more opaque.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The group by is needed because a user might search for tags "a" and "b", and if an object is tagged with both it would return duplicates.

@betodealmeida
Copy link
Member Author

@williaster, I've replaced the URL polyfill with urijs (which actually has a nicer interface, resulting in smaller code!).

@betodealmeida
Copy link
Member Author

@elibrumbaugh, thanks for the feedback!

I can’t tell based on the .gif what visual affordances in the UI tell the dashboard owner this is where they add a tag.
Dashboards can have really long names. The example currently shown is with a best case dashboard name and not faced with the space issue.

Good point, right now this is far from optimal:

width

I'll see if I can make the component have a fix width and have the tags wrap. Thoughts on this?

The tag visuals clash with the visuals for altered. Side note the white text on yellow tags isn't accessible. The blue isn't either.

The colors come from the Bootstrap theme. I created #5590 to address the problem separately.

You could place the tags in the dropdown menu as well. Quickly accessible but not reducing data ink.
Why not allow the dash owner to add these on the edit dashboard metadata page?
Alt. why not allow the dash owner to add these in a modal that is linked from the dropdown with other similar actions?

From a UX point of view we think they should not be hidden, neither to read nor to write. Let's talk about this Friday during our meeting? I'm ok with changing the way tags are created — it should be easy to change that while keeping the data model and the REST endpoints from this PR.

I like the tag link to a search experience that captures all content with that tag.

😄

@vylc
Copy link

vylc commented Aug 10, 2018

@elibrumbaugh can you think of a good way to get back to that overall landing page w/out having to click on a tag? not completely intuitive clicking on a tab will take you back to that page

@betodealmeida
Copy link
Member Author

@williaster, are you ok with moving forward and merging this PR, and then changing the way the tag component is display in dashboards?

@williaster
Copy link
Contributor

@betodealmeida in the meetup last week I thought we discussed possibly splitting this into two parts: one for backend and one for frontend?

So I'd propose either

  1. just merge backend here, do explore/dashboard frontend in another PR or
  2. do both here if you don't want to split it

I think we should not merge PRs that affect UI/UX if the UI/UX is not where we want it to be.

@betodealmeida
Copy link
Member Author

@williaster, I removed the tag input from this PR, leaving only the backend.

return (
<div className="react-tags-rw">
{this.state.tags.map(tag => (
<Label bsStyle="info">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a key prop as explained here.

@betodealmeida
Copy link
Member Author

Moved the backend to #6823.

@stale
Copy link

stale bot commented Apr 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 10, 2019
@stale stale bot closed this Apr 17, 2019
@1AB9502
Copy link

1AB9502 commented Apr 30, 2019

@betodealmeida ,

this looks so good, is there a link to the frontend files?

@betodealmeida
Copy link
Member Author

@1AB9502 I'll post the PR for the frontend today for review.

@betodealmeida
Copy link
Member Author

@1AB9502 still need some fixes, but here it is: #7418

@1AB9502
Copy link

1AB9502 commented May 1, 2019

@betodealmeida

Huge thanks!

@junlincc
Copy link
Member

junlincc commented Mar 9, 2021

@mihir174 @michael-s-molina @Steejay

Hey both, not sure if you were aware of this PR. Tagging was once bought up with both BE and FE solutions, it was closed because the proposed UI/UX was not ideal. To bring tagging(BE) back to life, we will have to rewrite a big trunk of work basically, but still, a shortcut to solve the content organization problem.
We were wondering if we can use tagging for grouping, searching, hiding even temporarily deleting objects.

@Steejay
Copy link

Steejay commented Mar 9, 2021

thanks @junlincc! tagging seems like a potential solution!

From previous research around query grouping, sharing was another part of the workflow that was important to consider when thinking about obj organization. how might we share a group of queries (dashboards, charts, etc)?

Searching for the object is one of many steps in a workflow. What other steps do we need to consider when designing for object organization? Some open questions to consider as we move forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

inactive Inactive for >= 30 days

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants