Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Report line numbers in mdtest relative to the markdown file, not the test snippet #13804

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 30 additions & 7 deletions crates/red_knot_test/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,11 @@ use red_knot_python_semantic::types::check_types;
use ruff_db::files::system_path_to_file;
use ruff_db::parsed::parsed_module;
use ruff_db::system::{DbWithTestSystem, SystemPathBuf};
use ruff_source_file::OneIndexed;
use std::collections::BTreeMap;
use std::path::PathBuf;

type Failures = BTreeMap<SystemPathBuf, matcher::FailuresByLine>;
type Failures = BTreeMap<AbsoluteLineNumberPath, matcher::FailuresByLine>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm inclined to use a regular Vec here and instead sort it before returning it in run_test. Vec's are just overall simpler data structure (e.g. for iterating). I then would add the md_offset to the FailuresByLine data structure (do we still need line_number or can we replace it with the md_offset?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove the grouping by embedded-file path from the output altogether, as I suggested below, that also suggests simplifying this data structure.


mod assertion;
mod db;
Expand All @@ -34,12 +35,24 @@ pub fn run(path: &PathBuf, title: &str) {
any_failures = true;
println!("\n{}\n", test.name().bold().underline());

for (path, by_line) in failures {
for (
AbsoluteLineNumberPath {
path,
line_number: absolute_line_number,
},
by_line,
) in failures
{
Comment on lines +38 to +45
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just because you can do arbitrary unpacking in Rust doesn't mean you should ;)

Here, I'd find this more readable:

Suggested change
for (
AbsoluteLineNumberPath {
path,
line_number: absolute_line_number,
},
by_line,
) in failures
{
for (path, by_line) in failures {
let AbsoluteLineNumberPath {
path,
line_number: absolute_line_number,
} = path;

println!("{}", path.as_str().bold());
for (line_number, failures) in by_line.iter() {
for failure in failures {
let absolute_line_info = format!(
"{}:{}",
title,
absolute_line_number.saturating_add(line_number.get())
);
let line_info = format!("line {line_number}:").cyan();
println!(" {line_info} {failure}");
println!("{absolute_line_info} {line_info} {failure}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation should come before absolute_line_info here, so that the per-line diagnostics are indented below the "heading", which is the name of the embedded file

Suggested change
println!("{absolute_line_info} {line_info} {failure}");
println!(" {absolute_line_info} {line_info} {failure}");

This means that the output in the terminal looks is beautifully organised like this if you have a test which has multiple Python snippets in the same Markdown test, and failures in more than one of those code blocks:

image

I checked, and I'm still able to double-click on the exception_control_flow.md:157 if I run the test in VSCode, and it takes me straight to the line, even if it's indented.


Separately, I'm wondering if we need the relative line number to be reported at all, or if we could just make do with the relative line number? It feels like it just adds noise. Maybe we could just do this (diff relative to your PR branch):

diff --git a/crates/red_knot_test/src/lib.rs b/crates/red_knot_test/src/lib.rs
index 691a71528..e31fad615 100644
--- a/crates/red_knot_test/src/lib.rs
+++ b/crates/red_knot_test/src/lib.rs
@@ -50,9 +50,9 @@ pub fn run(path: &PathBuf, title: &str) {
                             "{}:{}",
                             title,
                             absolute_line_number.saturating_add(line_number.get())
-                        );
-                        let line_info = format!("line {line_number}:").cyan();
-                        println!("{absolute_line_info}    {line_info} {failure}");
+                        )
+                        .cyan();
+                        println!("    {absolute_line_info} {failure}");
                     }
                 }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the many line numbers and file names slightly confusing... Especially because they're next to each other.

But it's hard to come up with a sane grouping. The grouping by file seems reasonable but is also somewhat confusing with diagnostics that then have line numbers absolute to the entire file.

And do we still need to show both line numbers? Aren't the md file line numbers sufficient?

/src/single_except.py
	exception/control_flow.md:157 unmatched assertion: revealed: list[int]
	exception/control_flow.md:159 unmatched assertion: revealed: list[int]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we still need to show both line numbers? Aren't the md file line numbers sufficient?

I already asked that; I agree with you. Take a look at the final suggested diff in my comment above :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh shoot. Reading is hard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with removing the old embedded-relative line number entirely, in favor of just the new absolute line number.

I would favor removing the grouping by embedded-file altogether in the output, and just having flat output of one failure per line, per test. It's weird to me to have it grouped by embedded file but then the line numbers be relative to the markdown file. If you find the given line in the markdown file, it'll be obvious what embedded file you are in. And so many of the embedded file names are just /src/test.py anyway, it feels like mostly noise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would favor removing the grouping by embedded-file altogether in the output, and just having flat output of one failure per line, per test. It's weird to me to have it grouped by embedded file but then the line numbers be relative to the markdown file. If you find the given line in the markdown file, it'll be obvious what embedded file you are in. And so many of the embedded file names are just /src/test.py anyway, it feels like mostly noise.

That rationale makes sense to me!

}
}
println!();
Expand All @@ -52,11 +65,17 @@ pub fn run(path: &PathBuf, title: &str) {
assert!(!any_failures, "Some tests failed.");
}

#[derive(PartialEq, PartialOrd, Eq, Ord)]
struct AbsoluteLineNumberPath {
path: SystemPathBuf,
line_number: OneIndexed,
}

fn run_test(test: &parser::MarkdownTest) -> Result<(), Failures> {
let workspace_root = SystemPathBuf::from("/src");
let mut db = db::Db::setup(workspace_root.clone());

let mut system_paths = vec![];
let mut paths: Vec<AbsoluteLineNumberPath> = Vec::with_capacity(test.files().count());

for file in test.files() {
assert!(
Expand All @@ -65,12 +84,16 @@ fn run_test(test: &parser::MarkdownTest) -> Result<(), Failures> {
);
let full_path = workspace_root.join(file.path);
db.write_file(&full_path, file.code).unwrap();
system_paths.push(full_path);
paths.push(AbsoluteLineNumberPath {
path: full_path,
line_number: file.md_line_number,
});
}

let mut failures = BTreeMap::default();

for path in system_paths {
for numbered_path in paths {
let path = numbered_path.path.clone();
let file = system_path_to_file(&db, path.clone()).unwrap();
let parsed = parsed_module(&db, file);

Expand All @@ -84,7 +107,7 @@ fn run_test(test: &parser::MarkdownTest) -> Result<(), Failures> {
);

matcher::match_file(&db, file, check_types(&db, file)).unwrap_or_else(|line_failures| {
failures.insert(path, line_failures);
failures.insert(numbered_path, line_failures);
});
}
if failures.is_empty() {
Expand Down
18 changes: 18 additions & 0 deletions crates/red_knot_test/src/parser.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
use once_cell::sync::Lazy;
use regex::{Captures, Regex};
use ruff_index::{newtype_index, IndexVec};
use ruff_source_file::OneIndexed;
use rustc_hash::{FxHashMap, FxHashSet};

/// Parse the Markdown `source` as a test suite with given `title`.
Expand Down Expand Up @@ -131,6 +132,7 @@ pub(crate) struct EmbeddedFile<'s> {
pub(crate) path: &'s str,
pub(crate) lang: &'s str,
pub(crate) code: &'s str,
pub(crate) md_line_number: OneIndexed,
}

/// Matches an arbitrary amount of whitespace (including newlines), followed by a sequence of `#`
Expand Down Expand Up @@ -186,6 +188,9 @@ struct Parser<'s> {
/// The unparsed remainder of the Markdown source.
unparsed: &'s str,

/// Current line number of the parser
current_line_number: OneIndexed,

/// Stack of ancestor sections.
stack: SectionStack,

Expand All @@ -205,6 +210,7 @@ impl<'s> Parser<'s> {
sections,
files: IndexVec::default(),
unparsed: source,
current_line_number: OneIndexed::new(1).unwrap(),
stack: SectionStack::new(root_section_id),
current_section_files: None,
}
Expand All @@ -225,6 +231,12 @@ impl<'s> Parser<'s> {
}
}

fn increment_line_count(&mut self, captures: &Captures<'s>) {
self.current_line_number = self
.current_line_number
.saturating_add(captures[0].lines().count());
}

fn parse_impl(&mut self) -> anyhow::Result<()> {
while !self.unparsed.is_empty() {
if let Some(captures) = self.scan(&HEADER_RE) {
Expand All @@ -235,6 +247,7 @@ impl<'s> Parser<'s> {
// ignore other Markdown syntax (paragraphs, etc) used as comments in the test
if let Some(next_newline) = self.unparsed.find('\n') {
(_, self.unparsed) = self.unparsed.split_at(next_newline + 1);
self.current_line_number = self.current_line_number.saturating_add(1);
Copy link
Member

@MichaReiser MichaReiser Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the line numbers off after a code block (that could span multiple lines)?

I think a better approach is to track the offset (TextSize) and increment it in self.scan and here. Then use LineIndex during rendering to get the line number.

} else {
break;
}
Expand Down Expand Up @@ -270,6 +283,8 @@ impl<'s> Parser<'s> {

self.current_section_files = None;

self.increment_line_count(captures);

Ok(())
}

Expand Down Expand Up @@ -303,6 +318,7 @@ impl<'s> Parser<'s> {
// CODE_RE can't match without matches for 'lang' and 'code'.
lang: captures.name("lang").unwrap().into(),
code: captures.name("code").unwrap().into(),
md_line_number: self.current_line_number,
});

if let Some(current_files) = &mut self.current_section_files {
Expand All @@ -324,6 +340,8 @@ impl<'s> Parser<'s> {
self.current_section_files = Some(FxHashSet::from_iter([path]));
}

self.increment_line_count(captures);

Ok(())
}

Expand Down
Loading