
[Tagging] Implementation of Tag Validation #127

Closed
@BethanyG

Description

@BethanyG

Background

While implementing tagging tests, we uncovered a number of bugs in the slugify() code and in tagging. In particular, issues #123 and #124 have started a discussion around what we want to allow or disallow in tags, how validation and error handling should work with tagging, and whether or not we want to include related tags from a particular endpoint (e.g. resources, hangouts, learning paths, discussions, projects, etc.). Since much of that discussion happened inside the PR for tagging tests, we decided to port it to its own issue while we work out the intended behavior and architecture.

Relevant section of PR #121 around the implementation of tag validation is below:


chris48s 7 days ago

Author Member

In these two cases, the tests pass based on the values from the fixtures but it seems like these aren't generating particularly useful slugs (esp when you consider the objective of slugging is to make the text unique and URL-safe). What should we be doing here?
If a single emoji isn't a valid tag name, should we just be validating here so that trying to save a tag called "🐸" throws ValidationError?

BethanyG 7 days ago

Member

I think that if we can't get emojis to work correctly through slugifying, then yes - we should throw a validation error. I can log an issue about that...along with our issues around Hindi and Telugu. I really don't want to have to transliterate, but we may need to go that route if we can't find an alternative to what slugify is doing to those languages.
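To make the failure mode concrete, here is a minimal sketch that assumes the project uses Django's `django.utils.text.slugify` with its default `allow_unicode=False`. The function below mirrors that ASCII-only path in plain Python so it runs without Django. `TagValidationError` and `validate_tag_slug` are hypothetical names; in the real model the error would be Django's `ValidationError`.

```python
import re
import unicodedata


def ascii_slugify(value: str) -> str:
    """Mirror of the ASCII-only path of Django's slugify (allow_unicode=False), assumed."""
    # Decompose accented letters, then drop everything that is not ASCII.
    # Emoji, Devanagari, Telugu, etc. have no ASCII decomposition, so they vanish here.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove anything that is not a word character, whitespace, or a hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace/hyphens into one hyphen and trim the ends.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


class TagValidationError(ValueError):
    """Hypothetical stand-in for django.core.exceptions.ValidationError."""


def validate_tag_slug(name: str) -> str:
    """Hypothetical validator: reject any tag name whose slug comes out empty."""
    slug = ascii_slugify(name)
    if not slug:
        raise TagValidationError(f"Tag {name!r} does not produce a usable slug.")
    return slug
```

With this sketch, `"Café au lait"` becomes `cafe-au-lait`, while `"🐸"` and a Hindi word such as `"हिन्दी"` both collapse to an empty slug. If the project's slugify behaves the same way, rejecting empty slugs would cover the single-emoji case from #124.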

BethanyG 7 days ago

Member

Logged issue #123 for the script fail and #124 for the emoji.

lpatmo 7 days ago

Member

(fwiw from a product perspective, I think it's totally fine for us not to support emojis in tags. Validating against it sounds reasonable!)


BethanyG 20 hours ago

Member

So - I think adding in validators (and tossing errors) for emoji and picto-ascii would be a good thing. But we have some behaviors to think through for the api/app. I am seeing multiple scenarios - but also have questions:

  1. Single or repeated unicode or ascii emoji/symbols as the sole content of a tag: reject out of hand.
    • validator for DRF should toss an error here. HOWEVER - does that mean the entire creation of the resource fails? I don't think so, since tags are not technically part of a resource....but that then creates some issues.
    • if a validation error is thrown, do we create the resource, but hand back a message to the effect that one or more of the tags was invalid? HTTP 207 (Multi-Status) with an embedded 400 and a message of "Your tag text contains one or more unsupported characters, and was not saved."? This needs some detailed scoping.
    • if a validation error is thrown, do we create the resource but omit all the tags and hand back a message?
    • if only one of multiple tags has a problem, do we drop the "bad" tag, or do we fail the whole set of tags?
  2. Single or repeated unicode or ascii emoji/symbols as part of a tag: do we drop these silently (creating the tag/slug without them), or do we throw a validation error for the entire tag? And if we do throw a validation error for the entire tag, which scenario from above do we apply when creating the resource?
  3. When we validate, will we be validating for both the name and the slug?
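Scenarios 1 and 2 can be separated by looking at Unicode general categories. The sketch below is one possible approach, not the project's validator: a tag with no letters or digits at all is always rejected, and a tag that mixes pictographs with text is either rejected or cleaned depending on a policy flag. The `DROP`/`REJECT` flag, `clean_tag_name`, and `TagValidationError` are hypothetical, and treating category `So` as "emoji" is a rough approximation.

```python
import unicodedata

# Hypothetical policy flags for scenario 2: "drop" strips pictographs, "reject" fails the tag.
DROP = "drop"
REJECT = "reject"

# Invisible characters that travel with emoji sequences (zero-width joiner, variation selectors).
EMOJI_JOINERS = {"\u200d", "\ufe0e", "\ufe0f"}


class TagValidationError(ValueError):
    """Hypothetical stand-in for the Django/DRF ValidationError."""


def is_pictographic(ch: str) -> bool:
    """Rough emoji test: 'Other Symbol' (So) characters, skin-tone modifiers, emoji joiners."""
    return (
        unicodedata.category(ch) == "So"
        or "\U0001F3FB" <= ch <= "\U0001F3FF"
        or ch in EMOJI_JOINERS
    )


def has_letter_or_digit(text: str) -> bool:
    """True if any character is a Unicode letter (L*) or number (N*)."""
    return any(unicodedata.category(ch)[0] in "LN" for ch in text)


def clean_tag_name(name: str, policy: str = REJECT) -> str:
    # Scenario 1: no letters or digits at all ("🐸", "🐸🐸", ":-)") is always rejected.
    if not has_letter_or_digit(name):
        raise TagValidationError(f"Tag {name!r} has no letters or digits.")
    # Scenario 2: pictographs mixed with text, e.g. "Javascript 🙂".
    # Note: ASCII emoticons mixed with text, e.g. "Python :-)", are not caught here.
    if any(is_pictographic(ch) for ch in name):
        if policy != DROP:
            raise TagValidationError(f"Tag {name!r} contains unsupported symbols.")
        name = "".join(ch for ch in name if not is_pictographic(ch))
    # Collapse leftover whitespace, e.g. "Javascript " -> "Javascript".
    return " ".join(name.split())
```

Defaulting to `REJECT` keeps the stricter behaviour unless a caller explicitly opts in to silently dropping characters.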

chris48s 12 hours ago

Author Member

Single or repeated unicode or ascii emoji/symbols as the sole content of a tag: reject out of hand.

Done.

Single or repeated unicode or ascii emoji/symbols as part of a tag: do we drop these silently - creating the tag/slug without them, or do we throw a validation error for the entire tag?
When we validate, will we be validating for both the name and the slug?

So far, the only validation I've done applies to the slug, so basically whichever variation claims the slug first gets it. For example, "Javascript 🙂" is a legal tag name, and if you create "Javascript 🙂" first, that tag now owns the slug javascript.

If you want to apply the same rules to the name, I wonder if it is actually useful for the tag name and slug to be different?
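The "first variation claims the slug" behaviour can be illustrated with a small in-memory sketch. It assumes a uniqueness constraint on the slug only (not the name) and the same ASCII-only steps as Django's `slugify`; `TagRegistry` and `SlugTakenError` are hypothetical stand-ins for a `Tag` table with a unique slug column.

```python
import re
import unicodedata


def ascii_slugify(value: str) -> str:
    # Same ASCII-only steps as Django's slugify(allow_unicode=False), assumed.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


class SlugTakenError(ValueError):
    """Hypothetical error for a tag name that maps onto an already-owned slug."""


class TagRegistry:
    """In-memory stand-in for a Tag table where only the slug is unique."""

    def __init__(self) -> None:
        self.names_by_slug: dict[str, str] = {}

    def create(self, name: str) -> str:
        slug = ascii_slugify(name)
        if slug in self.names_by_slug:
            owner = self.names_by_slug[slug]
            raise SlugTakenError(f"{name!r} -> {slug!r} is already owned by {owner!r}")
        # The first name to produce this slug keeps it, emoji and all.
        self.names_by_slug[slug] = name
        return slug
```

Creating `"Javascript 🙂"` first stores it under `javascript`, and a later plain `"Javascript"` is refused even though it is arguably the "cleaner" name. That asymmetry is what makes validating the name as well as the slug (or making them identical) worth considering.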


The other questions on API behaviour are probably best moved to another issue to avoid trying to boil the ocean in one PR, as this is already getting long. Let's just focus on the model behaviour here.

My gut instinct though is that working out these behaviours would probably be much easier if the operations of:

  • creating tags
  • creating resources
  • applying tags to resources

were 3 separate endpoints/operations. Then you can know if your tag is valid before you apply it to a resource. You can still tie those 3 things into what feels like a single 'page' on the frontend for a nice UX.

lpatmo 3 hours ago

Member

Ohh, I see -- so after a user finishes typing a tag, we can fire off a POST request to a /tags endpoint to check whether that tag is valid. Then they submit the form.

applying tags to resources

Can you say more about how this endpoint would work?


HOWEVER - does that mean the entire creation of the resource fails?

From a naive user perspective, if I was submitting to create a resource and had a tag that was invalid in some way (e.g. had an emoji character), then having the entire form submission fail on submit (e.g. no tags or resources are created) is reasonable to me. I'd expect to see an error about some invalid character in one of the tags, change it appropriately, then re-submit the form.

BethanyG (#121 (comment))

Member

Humm.. as @chris48s suggested -- we probably should take this to a separate issue. So I will make that shortly. Especially since this convo will be hidden once the PR is merged in.

I think the web UX could be fairly straightforward, and could include both a UI message (the following characters are not allowed for tags) & blocking the user from typing said emoji/ascii/character sets (a background form/field validation that dropped or highlighted any unicode or ascii characters in that range, similar to what I have seen with some set password fields). I don't think there is even a need there to fire off a POST - we just need to agree which characters/character sets are going to be off limits, and then fail them before they can get to the backend (theoretically).

But the API itself is a bit different (since we're building for more than the website, and can't give feedback to users in the same way) -- and in that case, for clarity, we probably do want to have a whole creation failure of an object with a message about why. ("your tags contain invalid/out of bounds characters. Characters in the following ranges are not allowed. ...").

OR we have a separate tagging endpoint & action (pass in the GUID of the object to be tagged, along with a list of tags, and get back a verification that tagging for that object has succeeded or failed w/ a message about why).
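A separate tagging action like this could be modelled roughly as follows. This is a hypothetical, framework-free sketch: `TaggingService`, `apply_tags`, and `TaggingResult` are illustrative names, the tag validator is passed in as a predicate, and the all-or-nothing rule (one bad tag means no tags are applied) is one possible choice rather than a decided behaviour.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TaggingResult:
    """Hypothetical response body: did tagging succeed, and why not if it failed."""
    object_id: uuid.UUID
    succeeded: bool
    applied: list[str] = field(default_factory=list)
    message: str = ""


class TaggingService:
    """In-memory sketch of a 'tag this object' endpoint, separate from resource creation."""

    def __init__(self, known_objects, is_valid_tag):
        self.known_objects = set(known_objects)   # GUIDs of objects that can be tagged
        self.is_valid_tag = is_valid_tag          # predicate supplied by the validation layer
        self.tags_by_object: dict[uuid.UUID, set[str]] = {}

    def apply_tags(self, object_id: uuid.UUID, tags: list[str]) -> TaggingResult:
        if object_id not in self.known_objects:
            return TaggingResult(object_id, False, message="Unknown object id.")
        invalid = [t for t in tags if not self.is_valid_tag(t)]
        if invalid:
            # All-or-nothing: nothing is written if any tag fails validation.
            return TaggingResult(object_id, False, message=f"Invalid tags: {invalid}")
        self.tags_by_object.setdefault(object_id, set()).update(tags)
        return TaggingResult(object_id, True, applied=sorted(self.tags_by_object[object_id]))
```

Because the object already exists when tags are applied, a failed tagging call never affects whether the resource itself was created.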

Now - what characters would cause that is still a bit open....

I would disallow emoji and picto ascii altogether in either tag names or slugs. Why go there? Doesn't seem to add significant value to our users at the moment.
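One way to make "disallow emoji" concrete, and to produce the "characters in the following ranges are not allowed" message mentioned above, is an explicit deny-list of code point ranges. The ranges below are a hypothetical starting point (Miscellaneous Symbols U+2600 to U+26FF, Dingbats U+2700 to U+27BF, and the pictographic blocks from U+1F300 to U+1FAFF); they are deliberately coarse and miss some emoji outside those blocks. Returning positions also lets a front-end highlight the offending characters.

```python
from typing import Optional

# Hypothetical deny-list; which character sets are off limits is still undecided.
DISALLOWED_RANGES = [
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x2700, 0x27BF, "Dingbats"),
    (0x1F300, 0x1FAFF, "Pictographic blocks"),
]


def find_disallowed(text: str) -> list[tuple[int, str, str]]:
    """Return (position, character, range label) for every disallowed character."""
    hits = []
    for pos, ch in enumerate(text):
        cp = ord(ch)
        for start, end, label in DISALLOWED_RANGES:
            if start <= cp <= end:
                hits.append((pos, ch, label))
                break
    return hits


def tag_error_message(tag: str) -> Optional[str]:
    """Build an API-style error message, or None if the tag passes."""
    hits = find_disallowed(tag)
    if not hits:
        return None
    found = ", ".join(f"{ch!r} at position {pos}" for pos, ch, _ in hits)
    ranges = "; ".join(f"U+{s:04X}-U+{e:04X} ({label})" for s, e, label in DISALLOWED_RANGES)
    return (
        f"Your tags contain invalid/out of bounds characters ({found}). "
        f"Characters in the following ranges are not allowed: {ranges}."
    )
```

Keeping the list as data means the front-end and the API can share one definition of what is off limits.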

On the other hand, having the ability to tag or note something in your native language does have value - so I still want to noodle on that problem.
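As an alternative to transliteration, a slug could keep letters, combining marks, and digits from any script while still dropping emoji. The sketch below is hypothetical and is not Django's behaviour; it keeps Unicode categories L*, M* and N* (combining marks matter because Devanagari and Telugu vowel signs and viramas are marks, not letters). Django's own `slugify(..., allow_unicode=True)` and `SlugField(allow_unicode=True)` are worth testing against Hindi and Telugu before settling on anything custom.

```python
import unicodedata


def unicode_slugify(value: str) -> str:
    """Hypothetical slug that keeps letters, combining marks and digits from any script."""
    value = unicodedata.normalize("NFKC", value).lower()
    out = []
    for ch in value:
        major = unicodedata.category(ch)[0]
        if major in "LMN":
            out.append(ch)        # letters (L*), combining marks (M*), numbers (N*)
        elif ch.isspace() or ch in "-_":
            out.append("-")       # word separators become hyphens
        # everything else (emoji, symbols, punctuation) is dropped
    # Collapse repeated hyphens and trim leading/trailing ones.
    return "-".join(part for part in "".join(out).split("-") if part)
```

Under this sketch, `"हिन्दी"` keeps its full spelling, `"Javascript 🙂"` becomes `javascript`, and `"🐸"` still yields an empty slug that the validation layer can reject. Non-ASCII slugs are percent-encoded in URLs, which is a trade-off to weigh.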

lpatmo 22 minutes ago

Member

Sounds good! (Btw, to clarify my earlier statement, I was talking about the error message from an API call as well -- that is, even though we'd have front-end validation, I'd like to see an error message returned if I, say, make a POST request in Postman to create a resource with problematic tags included. Then I'd expect the entire request to fail, if that makes sense. :P)



Decisions We've Made So Far:

  1. The backend/api will throw a ValidationError for any tag that contains what we consider "invalid" characters.
  2. A validation error will cause the POST request to create a new resource to fail with a return message along the lines of "One or more of your tags contain invalid/unsupported characters."
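Decisions 1 and 2 together amount to "validate every tag before writing anything". A minimal framework-free sketch of that flow follows; `create_resource`, the `store` list, and the response shape are hypothetical. In a Django/DRF view the same effect would typically come from raising the validation error in the serializer before `save()`, optionally wrapping the writes in `django.db.transaction.atomic()`.

```python
from typing import Callable

INVALID_TAGS_MESSAGE = "One or more of your tags contain invalid/unsupported characters."


def create_resource(
    payload: dict, is_valid_tag: Callable[[str], bool], store: list
) -> tuple[int, dict]:
    """Hypothetical handler: validate all tags first, persist only if every tag passes."""
    tags = payload.get("tags", [])
    invalid = [t for t in tags if not is_valid_tag(t)]
    if invalid:
        # The whole request fails: neither the resource nor any tag is written.
        return 400, {"detail": INVALID_TAGS_MESSAGE, "invalid_tags": invalid}
    resource = {"title": payload["title"], "tags": list(tags)}
    store.append(resource)
    return 201, resource
```

Returning the offending tags alongside the agreed message gives API clients (e.g. a Postman request) enough detail to fix and resubmit.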

What We Still Need to Decide/Discuss

  1. What sets of characters will cause a ValidationError?
  2. Will both the name of the tag and the slug of a tag be validated?
  3. Will we continue having tags returned as part of a resources endpoint, or will we move toward having tagging as its own endpoint?
  4. If tagging becomes its own endpoint, what does that look like, and what is the expected flow/interaction?

Metadata


Assignees

Labels

help wanted: Extra attention is needed; needs discussion: The fix for this issue needs discussion; question: Further information is requested

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
