feat: allow `ExprLinter` to work on sentences as well as chunks #2165

hippietrail · 2025-11-10T11:08:04Z

Issues

No issue but this is the result of my thoughts on #2164

Description

Currently ExprLinter works on "chunks", where a chunk can be thought of as a clause, or a section of text between commas.

This means if you want to work on whole sentences, you either have to instead use Linter and not be able to use patterns/expressions, or when a sentence contains a comma, you'll only see parts of a sentence at once.

In the case of the potential missing-question-mark linter, that could result in an ExprLinter thinking it's looking at the beginning or ending of a sentence (an interrogative or ending with something other than a ?)

This PR adds a way to optionally choose to run an ExprLinter over sentences instead of chunks. The default is still chunks and no existing code needs to be changed to retain the default behaviour.

How Has This Been Tested?

I added unit tests to use the same toy ExprLinter on either chunks or sentences with a pattern containing a comma that fails for chunks and succeeds for sentences.

Please go ahead and critique my implementation!

Checklist

I have performed a self-review of my own code
I have added tests to cover my changes

elijah-potter · 2025-11-13T19:12:30Z

harper-core/src/linting/expr_linter.rs

+            "A test"
+        }
+
+        fn get_unit(&self) -> Unit {


Since the linter could return different values for different invocations, I'm not sure this is the best strategy.

Ideally, and I'm not sure if you could reuse this work for this, we would have two distinct traits, say ChunkExprLinter and SentenceExprLinter.

I did at first think I'd split it into two linters but I also thought making the change smaller and keeping the logic in one place (DRY) might help, such as the _with_context enhancement - which should combine with this smoothly.

What are your thoughts on parameterizing it?

If we do it this way, we'd have to refactor a significant portion of the lint_group's caching system to account for sentences. My main concern is that the get_unit function can return different values on each invocation.

I agree with your sentiment around keeping things DRY. Maybe using an associated type would be a happy medium?

associated type

I'll be investigating that today then!

If we do it this way, we'd have to refactor a significant portion of the lint_group's caching system to account for sentences. My main concern is that the get_unit function can return different values on each invocation.

I agree with your sentiment around keeping things DRY. Maybe using an associated type would be a happy medium?

I spent a few hours on this and feel like there's some advanced Rust stuff I've seen in videos but that is not yet within my reach that might help here.
The point is that each linter will only work on chunks or sentences and never both. Doing so would be an error but the way I coded it up in the PR has no way to enforce that.
I wanted to not have to change every ExprLinter. Maybe some way of making ExprLinter and ExprSentenceLinter as newtypes of a parameterized "abstract class" for want of a better term, that could somehow encapsulate iterate_chunks vs iterate_sentences into some kind of type or trait, but I'm not at a skill/knowledge level yet in Rust where I can even discern if that's possible, let alone implement it.

Any ideas?

I thought I should've been able to find a way to implement it sentence iteration without having to change every ExprLinter but I gave up and went with a way that modifies them all.

Is it going to avoid the problems with the caching?

It looks like your solution solves the caching problem. I think it's good that we have to declare it in every linter, since that makes it easy to tell which is which at a glance.

…pr-linter-by-sentence

…trail/harper into expr-linter-by-sentence

elijah-potter

Great stuff. This PR is almost perfect. I like the test cases, the implementation, with just a few nits about naming. Once this is merged, I'd love to see a follow-up with an actual sentence-based ExprLinter so we can have one running in production as soon as possible.

elijah-potter · 2025-11-24T17:08:22Z

harper-core/src/linting/lint_group.rs

    ///
    /// This function is not significantly different from [`Self::add`], but allows us to take
    /// advantage of some properties of [`ExprLinter`]s for cache optimization.
    pub fn add_expr_linter(


Since this is now restricted to only Chunk-based ExprLinters, I think the naming of this function and the associated expr_linter field should assume more appropriate names.

Something else to consider: should we have a specialized cache for sentence linters as well?

Oh I think I did rename that to add_chunk_expr_linter in a previous attempt. Smaller diffs make PR reviewing easier though so changing it now is probably good.

As for the caching that's a part of the code I haven't looked at at all. I don't know if the associated types propagate the same way that parameterized types do but I think so right? And I don't know if the caching code is inherently affected by that type?

harper-core/src/linting/expr_linter.rs

elijah-potter · 2025-11-24T17:11:46Z

harper-core/src/linting/expr_linter.rs

+            "A test"
+        }
+
+        fn get_unit(&self) -> Unit {


It looks like your solution solves the caching problem. I think it's good that we have to declare it in every linter, since that makes it easy to tell which is which at a glance.

hippietrail · 2025-11-25T10:18:17Z

Great stuff. This PR is almost perfect. I like the test cases, the implementation, with just a few nits about naming. Once this is merged, I'd love to see a follow-up with an actual sentence-based ExprLinter so we can have one running in production as soon as possible.

The quintessential linters that would be affected by this are surely the Oxford comma linters, which are now just Linter's rather than ExprLinter's but fail if a list follows a clause? that ends with a comma.

But that's probably a big job. I had some other problem in mind when I first thought of sentence iterators but I forget what that was...

feat: allow ExprLinter to work on sentences as well as chunks

8d04152

elijah-potter reviewed Nov 13, 2025

View reviewed changes

hippietrail added 3 commits November 21, 2025 17:17

Merge branch 'master' of https://github.com/Automattic/harper into ex…

22c00c3

…pr-linter-by-sentence

feat: senter linter using associated types

fcf4c47

Merge branch 'expr-linter-by-sentence-3' of https://github.com/hippie…

3c9023f

…trail/harper into expr-linter-by-sentence

hippietrail requested a review from elijah-potter November 21, 2025 16:21

hippietrail added 2 commits November 22, 2025 01:04

Merge branch 'master' into expr-linter-by-sentence

c232263

fix: two linters missed conversion

308d572

elijah-potter requested changes Nov 24, 2025

View reviewed changes

feat: allow ExprLinter to work on sentences as well as chunks #2165

Are you sure you want to change the base?

feat: allow ExprLinter to work on sentences as well as chunks #2165

Uh oh!

Conversation

hippietrail commented Nov 10, 2025

Issues

Description

How Has This Been Tested?

Checklist

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elijah-potter left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hippietrail commented Nov 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: allow `ExprLinter` to work on sentences as well as chunks #2165

feat: allow `ExprLinter` to work on sentences as well as chunks #2165