Skip to content

Conversation

@hippietrail
Copy link
Collaborator

Issues

No issue but this is the result of my thoughts on #2164

Description

Currently ExprLinter works on "chunks", where a chunk can be thought of as a clause, or a section of text between commas.

This means if you want to work on whole sentences, you either have to instead use Linter and not be able to use patterns/expressions, or when a sentence contains a comma, you'll only see parts of a sentence at once.

In the case of the potential missing-question-mark linter, that could result in an ExprLinter thinking it's looking at the beginning or ending of a sentence (an interrogative or ending with something other than a ?)

This PR adds a way to optionally choose to run an ExprLinter over sentences instead of chunks. The default is still chunks and no existing code needs to be changed to retain the default behaviour.

How Has This Been Tested?

I added unit tests to use the same toy ExprLinter on either chunks or sentences with a pattern containing a comma that fails for chunks and succeeds for sentences.

Please go ahead and critique my implementation!

Checklist

  • I have performed a self-review of my own code
  • I have added tests to cover my changes

"A test"
}

fn get_unit(&self) -> Unit {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the linter could return different values for different invocations, I'm not sure this is the best strategy.

Ideally, and I'm not sure if you could reuse this work for this, we would have two distinct traits, say ChunkExprLinter and SentenceExprLinter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did at first think I'd split it into two linters but I also thought making the change smaller and keeping the logic in one place (DRY) might help, such as the _with_context enhancement - which should combine with this smoothly.

What are your thoughts on parameterizing it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do it this way, we'd have to refactor a significant portion of the lint_group's caching system to account for sentences. My main concern is that the get_unit function can return different values on each invocation.

I agree with your sentiment around keeping things DRY. Maybe using an associated type would be a happy medium?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

associated type

I'll be investigating that today then!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do it this way, we'd have to refactor a significant portion of the lint_group's caching system to account for sentences. My main concern is that the get_unit function can return different values on each invocation.

I agree with your sentiment around keeping things DRY. Maybe using an associated type would be a happy medium?

I spent a few hours on this and feel like there's some advanced Rust stuff I've seen in videos but that is not yet within my reach that might help here.
The point is that each linter will only work on chunks or sentences and never both. Doing so would be an error but the way I coded it up in the PR has no way to enforce that.
I wanted to not have to change every ExprLinter. Maybe some way of making ExprLinter and ExprSentenceLinter as newtypes of a parameterized "abstract class" for want of a better term, that could somehow encapsulate iterate_chunks vs iterate_sentences into some kind of type or trait, but I'm not at a skill/knowledge level yet in Rust where I can even discern if that's possible, let alone implement it.

Any ideas?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I should've been able to find a way to implement it sentence iteration without having to change every ExprLinter but I gave up and went with a way that modifies them all.

Is it going to avoid the problems with the caching?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like your solution solves the caching problem. I think it's good that we have to declare it in every linter, since that makes it easy to tell which is which at a glance.

Copy link
Collaborator

@elijah-potter elijah-potter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff. This PR is almost perfect. I like the test cases, the implementation, with just a few nits about naming. Once this is merged, I'd love to see a follow-up with an actual sentence-based ExprLinter so we can have one running in production as soon as possible.

///
/// This function is not significantly different from [`Self::add`], but allows us to take
/// advantage of some properties of [`ExprLinter`]s for cache optimization.
pub fn add_expr_linter(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now restricted to only Chunk-based ExprLinters, I think the naming of this function and the associated expr_linter field should assume more appropriate names.

Something else to consider: should we have a specialized cache for sentence linters as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think I did rename that to add_chunk_expr_linter in a previous attempt. Smaller diffs make PR reviewing easier though so changing it now is probably good.

As for the caching that's a part of the code I haven't looked at at all. I don't know if the associated types propagate the same way that parameterized types do but I think so right? And I don't know if the caching code is inherently affected by that type?

"A test"
}

fn get_unit(&self) -> Unit {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like your solution solves the caching problem. I think it's good that we have to declare it in every linter, since that makes it easy to tell which is which at a glance.

@hippietrail
Copy link
Collaborator Author

Great stuff. This PR is almost perfect. I like the test cases, the implementation, with just a few nits about naming. Once this is merged, I'd love to see a follow-up with an actual sentence-based ExprLinter so we can have one running in production as soon as possible.

The quintessential linters that would be affected by this are surely the Oxford comma linters, which are now just Linter's rather than ExprLinter's but fail if a list follows a clause? that ends with a comma.

But that's probably a big job. I had some other problem in mind when I first thought of sentence iterators but I forget what that was...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants