Skip to content

Conversation

LiamPattinson
Copy link
Collaborator

Fixes #64

This was a lot harder to implement than I expected!

  • Adds a function that replaces all text in strings and comments with whitespace/'!' respectively, allowing text rules to avoid matching on characters in that context.
    • The difficulty here is catching multiline strings, and the fact that the following is valid:
"This is a character string &
  & going over &
  !!! surprise comment !!!
  & multiple lines"
  • Adds rule that finds semi-colons in code and replaces them with a newline and an appropriate amount of indentation.

I've left this as a draft as I'd like to add a companion rule, superfluous-semicolon, for a semicolon at the end of a line, and I still need to add tests.

@ZedThree
Copy link
Member

Could you do this without the blanking by something like: find ; nodes in the AST, then check if the next non-comment sibling is on the same line?

@LiamPattinson
Copy link
Collaborator Author

As far as I can tell ; doesn't show up in the AST. If you switch the search in check.rs from root_node.named_descendants() to just root_node.descendants() and print every node kind, things like ::, , and = show up, but there's no sign of ;. I'm not sure if I'm just missing something though.

@LiamPattinson
Copy link
Collaborator Author

While working on this further I've found something that might be a tree-sitter-fortran bug. The following code raises syntax errors, but as far as I know it's valid (albeit awful) Fortran:

program p
  implicit none
  integer :: i, j
  ! 1st, 2nd/3rd, and 4th semicolons are superfluous
  ! 3rd/2nd separates the two statements
  ;i = 1;; j = 2;
  write (*,*) i, j
end program p

It doesn't like a semicolon at the start of a line or multiples grouped together.

I can't see a single instance of either one of these scenarios in my limited collection of test projects, so this is a very low priority bug.

@LiamPattinson
Copy link
Collaborator Author

I've added a rule superfluous-semicolon, but ultimately I don't think it would be a good idea to merge these rules in their current state.

Blanking all comments and strings is a very expensive operation, and as we can't have one rule return multiple kinds of violation, it has to be done twice per file if both rules are active. Benchmarking suggests this slows the code by ~7%:

❯ hyperfine -i './target/release/fortitude check --output-format=concise --preview --ignore=S081,S082 ../geos-chem'
Benchmark 1: ./target/release/fortitude check --output-format=concise --preview --ignore=S081,S082 ../geos-chem
  Time (mean ± σ):      1.763 s ±  0.046 s    [User: 8.042 s, System: 0.396 s]
  Range (min … max):    1.684 s …  1.808 s    10 runs
 
  Warning: Ignoring non-zero exit code.
 
❯ hyperfine -i './target/release/fortitude check --output-format=concise --preview ../geos-chem'
Benchmark 1: ./target/release/fortitude check --output-format=concise --preview ../geos-chem
  Time (mean ± σ):      1.883 s ±  0.075 s    [User: 8.973 s, System: 0.414 s]
  Range (min … max):    1.761 s …  1.996 s    10 runs
 
  Warning: Ignoring non-zero exit code.
 

The difference would be a lot less noticeable in a realistic IO-constrained run (geos-chem takes almost 10 seconds on my machine with full output!), but this still indicates that we need a less expensive solution to this problem.

@ZedThree Would it be possible to have tree-sitter recognise semicolons in the AST?

@ZedThree
Copy link
Member

Ah, I think the issue is that "end-of-statement" is an unnamed node and done entirely through an external scanner.

We could change the grammar to something like:

_end_of_statement: $ => choice(';', $._scan_end_of_statement),

@LiamPattinson LiamPattinson mentioned this pull request Jan 2, 2025
@ZedThree
Copy link
Member

We've just released tree-sitter-fortran 0.4.0 which has ; as an unnamed node, so we should now be able to significantly simplify this rule

@LiamPattinson
Copy link
Collaborator Author

Fantastic, I'll try to get that going now.

@LiamPattinson
Copy link
Collaborator Author

LiamPattinson commented Jan 23, 2025

I've got it working with the new tree-sitter-fortran features -- this is a huge improvement!

TODO:

  • Add tests
  • Match local indentation when applying fixes -- I'll add some generic functions for this, as it's likely to come up again.

@LiamPattinson LiamPattinson marked this pull request as ready for review January 27, 2025 17:45
let line = src
.slice(TextRange::new(start_line, start_line + start_offset))
.to_string();
line.chars().take_while(|&c| c.is_whitespace()).collect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have some fun with tabs here! (although tabs are technically not valid characters in fortran source)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured tabs might be present in some codes, and hence I wrote indentation to return a string of the current indentation level rather than a count of the number of spaces. That way it can include a mix of tabs and spaces. I think this is the safest way to handle them, but could be totally wrong on that.

On this topic, we should probably have a 'no tabs allowed' rule.

@LiamPattinson
Copy link
Collaborator Author

I think I'm finally there. The fixes behave strangely with regards to indentation if the user is doing anything really weird, like using multiple semicolons in a row, but should hold up in most cases.

@LiamPattinson LiamPattinson merged commit bcec2b6 into main Jan 27, 2025
25 checks passed
@LiamPattinson LiamPattinson deleted the rule/multiple_statements_per_line branch January 27, 2025 17:56
@ZedThree ZedThree added the rule A new rule for the linter label Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rule A new rule for the linter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid multiple statements on the same line
2 participants