-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
format doctests in docstrings #8811
Changes from 1 commit
14d6615
4a6f113
172e4c4
6d3e446
ba83505
d584998
0d4f1c1
1386716
154d733
d51ae51
47e6e76
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1071,6 +1071,83 @@ impl<'src> DocstringLine<'src> { | |
} | ||
} | ||
} | ||
|
||
/// A single code example extracted from a docstring. | ||
/// | ||
/// This represents an intermediate state from when the code example was first | ||
/// found all the way up until the point at which the code example has finished | ||
/// and is reformatted. | ||
/// | ||
/// Its default state is "empty." That is, that no code example is currently | ||
/// being collected. | ||
#[derive(Debug, Default)] | ||
struct CodeExample<'src> { | ||
/// The kind of code example being collected, or `None` if no code example | ||
/// has been observed. | ||
kind: Option<CodeExampleKind>, | ||
/// The lines that have been seen so far that make up the code example. | ||
lines: Vec<CodeExampleLine<'src>>, | ||
} | ||
|
||
impl<'src> CodeExample<'src> { | ||
/// Attempt to add an original line from a docstring to this code example. | ||
/// | ||
/// Based on the line and the internal state of whether a code example is | ||
/// currently being collected or not, this will return an "action" for | ||
/// the caller to perform. The typical case is a "print" action, which | ||
/// instructs the caller to just print the line as though it were not part | ||
/// of a code snippet. | ||
fn add(&mut self, original: DocstringLine<'src>) -> CodeExampleAddAction<'src> { | ||
match self.kind.take() { | ||
// There's no existing code example being built, so we look for | ||
// the start of one or otherwise tell the caller we couldn't find | ||
// anything. | ||
None => match self.add_start(original) { | ||
None => CodeExampleAddAction::Kept, | ||
Some(original) => CodeExampleAddAction::Print { original }, | ||
}, | ||
Some(CodeExampleKind::Doctest(doctest)) => { | ||
if let Some(code) = doctest_find_ps2_prompt(&doctest.indent, &original.line) { | ||
let code = code.to_string(); | ||
self.lines.push(CodeExampleLine { original, code }); | ||
// Stay with the doctest kind while we accumulate all | ||
// PS2 prompts. | ||
self.kind = Some(CodeExampleKind::Doctest(doctest)); | ||
return CodeExampleAddAction::Kept; | ||
} | ||
let code = std::mem::take(&mut self.lines); | ||
let original = self.add_start(original); | ||
CodeExampleAddAction::Format { | ||
code, | ||
kind: CodeExampleKind::Doctest(doctest), | ||
original, | ||
} | ||
} | ||
} | ||
} | ||
|
||
/// Looks for the start of a code example. If one was found, then the given | ||
/// line is kept and added as part of the code example. Otherwise, the line | ||
/// is returned unchanged and no code example was found. | ||
/// | ||
/// # Panics | ||
/// | ||
/// This panics when the existing code-example is any non-None value. That | ||
/// is, this routine assumes that there is no ongoing code example being | ||
/// collected and looks for the beginning of another code example. | ||
fn add_start(&mut self, original: DocstringLine<'src>) -> Option<DocstringLine<'src>> { | ||
assert_eq!(None, self.kind, "expected no existing code example"); | ||
if let Some((indent, code)) = doctest_find_ps1_prompt(&original.line) { | ||
let indent = indent.to_string(); | ||
let code = code.to_string(); | ||
self.lines.push(CodeExampleLine { original, code }); | ||
self.kind = Some(CodeExampleKind::Doctest(CodeExampleDoctest { indent })); | ||
return None; | ||
} | ||
Some(original) | ||
} | ||
} | ||
|
||
/// The kind of code example observed in a docstring. | ||
#[derive(Clone, Debug, Eq, PartialEq)] | ||
enum CodeExampleKind { | ||
|
@@ -1117,6 +1194,97 @@ struct CodeExampleLine<'src> { | |
/// The code extracted from the line. | ||
code: String, | ||
} | ||
|
||
/// An action that a caller should perform after attempting to add a line from | ||
/// a docstring to a code example. | ||
/// | ||
/// Callers are expected to add every line from a docstring to a code example, | ||
/// and the state of the code example (and the line itself) will determine | ||
/// how the caller should react. | ||
#[derive(Debug)] | ||
enum CodeExampleAddAction<'src> { | ||
/// The line added was ignored by `CodeExample` and the caller should print | ||
/// it to the formatter as-is. | ||
/// | ||
/// This is the common case. That is, most lines in most docstrings are not | ||
/// part of a code example. | ||
Print { original: DocstringLine<'src> }, | ||
/// The line added was kept by `CodeExample` as part of a new or existing | ||
/// code example. | ||
/// | ||
/// When this occurs, callers should not try to format the line and instead | ||
/// move on to the next line. | ||
Kept, | ||
/// The line added indicated that the code example is finished and should | ||
/// be formatted and printed. The line added is not treated as part of | ||
/// the code example. If the line added indicated the start of another | ||
/// code example, then is won't be returned to the caller here. Otherwise, | ||
/// callers should pass it through to the formatter as-is. | ||
Format { | ||
/// The kind of code example that was found. | ||
kind: CodeExampleKind, | ||
/// The Python code that should be formatted, indented and printed. | ||
/// | ||
/// This is guaranteed to be non-empty. | ||
code: Vec<CodeExampleLine<'src>>, | ||
/// When set, the line is considered not part of any code example | ||
/// and should be formatted as if the `Ignore` action were returned. | ||
BurntSushi marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// Otherwise, if there is no line, then either one does not exist | ||
/// or it is part of another code example and should be treated as a | ||
/// `Kept` action. | ||
original: Option<DocstringLine<'src>>, | ||
}, | ||
} | ||
|
||
/// Looks for a valid doctest PS1 prompt in the line given. | ||
/// | ||
/// If one was found, then the indentation prior to the prompt is returned | ||
/// along with the code portion of the line. | ||
fn doctest_find_ps1_prompt(line: &str) -> Option<(&str, &str)> { | ||
let trim_start = line.trim_start(); | ||
// Prompts must be followed by an ASCII space character[1]. | ||
// | ||
// [1]: https://github.com/python/cpython/blob/0ff6368519ed7542ad8b443de01108690102420a/Lib/doctest.py#L809-L812 | ||
let code = trim_start.strip_prefix(">>> ")?; | ||
let indent_len = line | ||
.len() | ||
.checked_sub(trim_start.len()) | ||
.expect("suffix is <= original"); | ||
let indent = &line[..indent_len]; | ||
Some((indent, code)) | ||
} | ||
|
||
/// Looks for a valid doctest PS2 prompt in the line given. | ||
/// | ||
/// If one is found, then the code portion of the line following the PS2 prompt | ||
/// is returned. | ||
/// | ||
/// Callers must provide a string containing the original indentation of the | ||
/// PS1 prompt that started the doctest containing the potential PS2 prompt | ||
/// in the line given. If the line contains a PS2 prompt, its indentation must | ||
/// match the indentation used for the corresponding PS1 prompt (otherwise | ||
/// `None` will be returned). | ||
fn doctest_find_ps2_prompt<'src>(ps1_indent: &str, line: &'src str) -> Option<&'src str> { | ||
let (ps2_indent, ps2_after) = line.split_once("...")?; | ||
// PS2 prompts must have the same indentation as their | ||
// corresponding PS1 prompt.[1] While the 'doctest' Python | ||
// module will error in this case, we just treat this line as a | ||
// non-doctest line. | ||
// | ||
// [1]: https://github.com/python/cpython/blob/0ff6368519ed7542ad8b443de01108690102420a/Lib/doctest.py#L733 | ||
if ps1_indent != ps2_indent { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I addmit I haven't read the code in full, but would it be sufficient to check if the indents have the same column/character length as done in the Lexer or is it necessary that the strings match exactly? See There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Python Technically we are a little more relaxed here because we allow any kind of whitespace in the indentation. But I preserved the "indentation should be byte-for-byte equivalent" check. So bottom line is that it's hard for me to say whether this is necessary or not. Probably not. But I do think it is pretty simple? Are you concerned about the costs of carrying the indentation string around? I think for code snippet formatting it is probably a lot less of a concern than when parsing arbitrary Python. The size scales are different. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was mainly trying to understand the semantics to ensure we are as strict as Python when it comes to handling indentation. Thanks for the explanation. |
||
return None; | ||
} | ||
// PS2 prompts must be followed by an ASCII space character unless | ||
// it's an otherwise empty line[1]. | ||
// | ||
// [1]: https://github.com/python/cpython/blob/0ff6368519ed7542ad8b443de01108690102420a/Lib/doctest.py#L809-L812 | ||
match ps2_after.strip_prefix(' ') { | ||
None if ps2_after.is_empty() => Some(""), | ||
None => None, | ||
Some(code) => Some(code), | ||
} | ||
} | ||
} | ||
|
||
/// If the last line of the docstring is `content" """` or `content\ """`, we need a chaperone space | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for converting these to
String
rather than storing borrowed&str
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to my other answer, at least with respect to
code
, I think when I was writing the types it wasn't obvious to me that the code portion of a line would necessarily be a proper substring of it. But I can't think of any contrary case.