Dip1036e - enhanced interpolation #15715
Thanks for your pull request and interest in making D better, @adamdruppe! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references: Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally: If you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + dmd#15715" |
Only two things I'm not happy about:
Needs a test case, and chuck it under a preview flag awaiting DIP. Other than these four things, this looks to be the simplest solution and by all rights should be mergeable. |
I don't really care on $$ vs \$ except insomuch as it affects the layering. This implementation could do either equally well, but would need additional work to be added for the different kinds of string literals. Formatting blocks are irrelevant, that's a library concern, not the language's responsibility. |
I only half agree with this. Once extracted the formatting string is the library's concern. However, it is the language's responsibility to keep the format associated with the expression independent from the string that can be printed. This then matches say Python/C++'s |
No, it has nothing to do with the language. The language doesn't even need to know what a format string even is. (And I suspect most uses of this feature also won't care about it.) You can do this kind of thing if you want it formally tied together:
Simple implementation, simple use. Follows all existing language rules. You could also ufcs if you prefer: Let's not add unnecessary special cases to the compiler. |
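A minimal sketch of how formatting can stay a library concern, assuming a compiler that includes this PR (I believe DMD 2.108 or later). The names `Formatted`, `fmt`, and `render` are hypothetical and are not part of druntime or Phobos; only the `core.interpolation` types come from the implementation.

```d
import core.interpolation;
import std.conv : to;
import std.format : format;

// Hypothetical wrapper that pairs a value with a std.format specifier.
struct Formatted(T)
{
    T value;
    string spec;
}

// Hypothetical helper, callable as fmt(x, "%5d") or x.fmt("%5d") via UFCS.
Formatted!T fmt(T)(T value, string spec)
{
    return Formatted!T(value, spec);
}

// Builds a string from an interpolated sequence, honoring Formatted wrappers.
string render(Args...)(Args args)
{
    string result;
    foreach (arg; args)
    {
        alias A = typeof(arg);
        static if (is(A == InterpolatedLiteral!lit, string lit))
            result ~= lit; // literal text between expressions
        else static if (is(A == InterpolationHeader) || is(A == InterpolationFooter))
        {
            // sequence brackets carry no text
        }
        else static if (is(A == InterpolatedExpression!code, string code))
        {
            // source text of the expression; unused in this sketch
        }
        else static if (is(A == Formatted!T, T))
            result ~= format(arg.spec, arg.value); // the library applies the format
        else
            result ~= arg.to!string;
    }
    return result;
}

void main()
{
    int x = 42;
    assert(render(i"[$(x.fmt("%5d"))]") == "[   42]");
    assert(render(i"[$(x)]") == "[42]");
}
```

The compiler never sees a format string here: the specifier travels inside an ordinary value, and the library decides what to do with it.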
I take it back, it does work! I must have been confused about something else. UPDATE: Yeah, it's only parameters that have default values that can't match IFTI arguments. That's what I was thinking of. |
Yeah, the reason I slapped together the implementation is to try it, so we can be sure about things like that. That said though, if you did a nested interpolated element (which lol i thought was broke for a sec because i write
Gives:
Which looks fine, there's the nested thing, but if you wanted to do it recursively, it wouldn't match anymore; you'd have to count open Headers and closed Footers to slice the arguments tuple; the compiler won't magically be able to do that for you (similar to regex matching parens etc). Of course, if you just ignored that it is nested, most things would still work! But still, the bracketing making it at least possible to process it in full detail is nice. (BTW I do think we could and should go ahead and trim out the empty interpolated literals in semantic, since the presence of InterpolatedExpression makes them redundant and the odd/even rule doesn't work once you have an interpolated tuple anyway, so probably better to not even make it look possible.) |
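The Header/Footer counting mentioned above can be sketched as follows, assuming a compiler that includes this PR. `maxDepth` is a hypothetical helper written for illustration; it only tracks nesting depth and does not slice the argument tuple.

```d
import core.interpolation;

// Returns the deepest Header/Footer nesting level found in an argument list.
size_t maxDepth(Args...)(Args args)
{
    size_t depth, deepest;
    foreach (arg; args)
    {
        static if (is(typeof(arg) == InterpolationHeader))
        {
            ++depth; // a sequence opens
            if (depth > deepest)
                deepest = depth;
        }
        else static if (is(typeof(arg) == InterpolationFooter))
            --depth; // the matching sequence closes
    }
    return deepest;
}

void main()
{
    int x = 1;
    assert(maxDepth(i"flat $(x)") == 1);
    assert(maxDepth(i"outer $(i"inner $(x)") end") == 2);
    assert(maxDepth(i"a $(x)", 5, i"b $(x)") == 1); // two sibling sequences
}
```

Because every sequence is bracketed, a library that wants to treat a nested sequence as a unit can use the same counter to find where that unit starts and ends.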
so btw re the escape thing, $$ or \$ or whatever, there is a third option: no escape at all. You can always do |
Lacks:
- complete and accurate specification, or even a description
- comments
- test cases
On the interpolation-examples page, you might include a comparison of how you would do it with DIP 1027 (if possible). |
So here's a question for everyone: what about postfixes? Supporting them in the compiler is a two line patch, forwarding the postfix to all the child strings, but that also changes the public api: Expanding the druntime templates to allow this is easy enough, but it'd affect every user too as they'd have to check for the various string types. I don't think it is worth it. If you want a wstring or dstring, it is easy to call a function that creates one given the existing text. We could pass the postfix as an argument, maybe to the InterpolationHeader, but I also don't think that'd be worth it. So my proposal is to just go ahead and ban the postfixes on these literals, as you can see implemented in the following commit. Any big objections? |
The added failure test case asserts that there is an error with postfixes since that's banned (unless people come in with good objections, but remember, it can complicate user code so be prepared to justify it), and the runnable test case asserts things work as designed, including in some embedded char cases that might be trickier to parse. |
Yeah, I agree, leave suffixes alone. I would like to see wysiwyg strings work, like they were planned in 1036. Is that a thing here? |
Yes, like I said in the (edited) opening post: "Only i"" is implemented in the lexer at this time but the intention is to do it for all of them." That's why the implementation factors out the stuff into a separate function; it just needs to be plugged into the other locations too. I'm just doing this in between other obligations so it is kinda in 30 minute spurts. I don't think those will take long to add so maybe next time. Probably next thing to do will be i`` strings then iq{} strings. |
Answering here instead of the forum, where it will get lost. Not a review, but things to consider (as I do not see another good place for this).
Around 2007, I wrote a toy string interpolation library in D: https://github.com/baryluk/echo/blob/master/echo.d It still compiles and it works! I am sure it was written in D1, but it compiles today with the latest compiler without a single warning. (there is also echo2 in Pretty functional, but I did not use it that much actually. It was just a proof of concept written on some lazy evening a long, long time ago.
If I were to do it now (after using many other languages and also implementing countless printf-like interfaces from scratch in a few languages, primarily C, C++, D, Python, Erlang and Go), I would:
Use Python-style:
x=5
y=1
z=666
d=9 # runtime width value
f"a {x} and {y:3} and {z:d}" # formatted string
# a 5 and   1 and 666
It is clean, allows custom formatting (width, precision/decimals, alignment using
Be semi-lazy, do not form a string, but instead object (could be tuple) that can be passed to proper sink in a streaming fashion (expressions in-between
Allow specifying custom formatters, that should either be interpreted by a type being formatted, or by the sink. Example:
from datetime import date
major = 3
minor = 11
release = date(2023, 10, 2)
print(f"Python {major}.{minor + 1:03d} is released on {release:%B %-d}")
In first version we could disallow usage of
Allow nesting (like in Python 3.12):
Be sure to support escape sequences properly to support char and string literals:
f"{'\n'.join(words)}"
Support multi-line and comments:
f"""Storing employee's data: {
employee['name'].upper()  # Always uppercase name before storing
}"""
f"""{
f'''{
f"{f'{42}'}"
}'''
}"""
Good error messages:
>>> f"{42 + }"
File "<stdin>", line 1
f"{42 + }"
^
SyntaxError: f-string: expecting '=', or '!', or ':', or '}'
And then consider a few more things: concatenation (especially for breaking very long ones):
write(f"{x} {y}" ~
f"{z}" ~
f"suffix");
That should not form a string, but create a mega-tuple with everything. (hard to do if there are ternary operators or function calls between
Escaping of I think
But in real life (not library and meta programming), Also consider:
It is unclear what it does:
Same with other operators, like dot,
Balanced
Allow self describing formatting:
a = 3
b = 5
print(f"a={a} b={b}")
print(f"{a=} {b=}") # same as above
# a=3 b=5
# a=3 b=5
print(f"{a+b=} {f(x)+2=}")
# a+b=8 f(x)+2=666
And finally, be able to do lazy on formatted/interpolated strings:
void MaybeLog(T)(lazy Args args) {
if (...) {
static foreach (arg; args) {
sink(arg());
}
}
}
MaybeLog(f" {f(x)} {time()} {factorial(y)}");
should of course be "equivalent" to:
MaybeLog(void delegate() { return f" {f(x)} {time()} {factorial(y)}"; });
And finally consider syntax that is amenable to having similar string used for formatting where format string is known only at runtime (with some restrictions, and probably only subset of features supported). It should be possible to implement
Adam's proposal is decent, but not perfect. (I really dislike
Looking at example, it is pretty cool to use it for things like Url escaping, SQL, internationalization.
I think more natural would be, to just have it directly nested, like Some elements of the tuple to be some sub-tuples. Not sure how to type it, but should be possible. You can always flatten it later easily, and even have a helper template for this in Phobos. (You can also unflatten but it is less easy)
What about this:
foo(i"$(a)", f(), g(), i"$(y)");
How would you declare a generic function signature for this, so it is still easy to implement.
Also, of course this feature, however implemented, should be usable without normal runtime or phobos. (I.e. in embedded system or kernel on bare metal).
PS. Only support
PS2. Be sure to check Swift language formatting. It is pretty well engineered. I did not use Swift personally, but it is very versatile what they do. I do not want it like that in D, but still a good source of inspiration.
PS3. In closed dip you show these example indeed for the justification of
enum result = text(
i"@property bool {name}() @safe pure nothrow @nogc const {{
return ({store} & {maskAllElse}) != 0;
}}
@property void {name}(bool v) @safe pure nothrow @nogc {{
if (v) {store} |= {maskAllElse};
else {store} &= cast(typeof({store}))(-1-cast(typeof({store})){maskAllElse});
}}\n"
);
vs
enum result = text(
i"@property bool $name() @safe pure nothrow @nogc const {
return ($store & $maskAllElse) != 0;
}
@property void $name(bool v) @safe pure nothrow @nogc {
if (v) $store |= $maskAllElse;
else $store &= cast(typeof($store))(-1-cast(typeof($store))$maskAllElse);
}\n"
);
It is my opinion that in fact the first one (Python style) is cleaner. |
I have the As long as we go with But I am firmly in the belief that interpolated strings should be based upon double quoted strings and that includes the suffixes. Those too can be added later, so are not something that has to limit an acceptance of 1036e. |
@@ -1784,9 +1866,17 @@ class Lexer
 * D https://dlang.org/spec/lex.html#double_quoted_strings
 * ImportC C11 6.4.5
 */
-private void escapeStringConstant(Token* t)
+private void escapeStringConstant(Token* t, bool supportInterpolation = false)
It is probably better to split this into two functions (with a bit of repetition) instead. Should help with compilation speed.
Do you have measurements of compilation speed difference?
Of course, such internal implementation details can be changed at any time.
With this proposal, you could write your own format function that takes i"{$myVar:3}" and turns it into a call to newFormat("{%d:3}", myVar), right? Any specific formatting lib API could be built on top of this without any more compiler help, I think? |
Yes, you could do that. It'd be kinda similar to the code in the 02-formatting example, but the tokens it looks for would be in both the preceding literals and the following one. Can pretty easily throw if something is malformed too in the library code. I really encourage people to experiment with this before dismissing it and/or demanding changes. Some of the techniques in the library are a bit of a pain - I'm not gonna tell you handling nested sequences as a unit or handling a sequence of sequences is necessarily trivial, you might have to get creative with tuple slicing and object wrapping - but it is all doable and I'm building up a little set of examples for many of these things. The amount of things this little change to dmd enables is really remarkable. and omg rebase again, again, on the auto-generated file. that's obnoxious but thankfully not hard to resolve |
There's this in the spec:
which implies it is impossible to generate a "$(" into the output. If my suggestion is incorporated, this could be done with |
It took me a while. But yes, the |
It needs a spec PR as well. I expect it would need its own page to properly document it.
I can work on the spec, thank you! |
Does anyone have info on why it's failing tests? |
Possible it just timed out and needs a rebase and repush. |
This implements the Enhanced Interpolated Expression Sequence proposal: i"" or iq{} or i`` with a $(expression) in the middle are converted to a tuple of druntime types for future processing by library code.
Looks like the failing test is related to a vector thing:
I think there are multiple tests trying to write this file at once. |
It's green now, I think the test might be a race condition, that intermittently fails. |
Looks like an unrelated test heisenbug. |
I'm debating if I should change the changelog, or try and do the other two string types. Anyone else know how to add them? I'm not good at compiler. |
Where's that quote from? The changelog and the lexer are in agreement: three forms are implemented and three forms are described (i"", i``, and iq{}). |
ok, it was in the top. I didn't read the whole thing, sorry. |
I need to pull this down to a linux box to test, dmd doesn't work on my mac... |
oh yeah, oops, i wrote that months ago and never updated since october lol
the ldc bundled with OpenD's release download supports it too https://github.com/opendlang/opend/releases/tag/CI just sayin lol |
I'm writing the spec, so I need to actually play with this and see how it works. Will post a link when I'm done. Man, DDOC is so painful to write in... |
Hmm. How do I escape
How do I print a literal |
|
Escape the dollar |
I see. using That is not very intuitive. Because |
It is the same as every other escape. I.e. |
There is finite number of escapes defined. I.e. one cannot do https://dlang.org/spec/lex.html#escape_sequences - does not list (I know the docs are not yet updated). |
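A small sketch of the `\$` escape discussed above, assuming a compiler that includes this PR. `countExpressions` is a hypothetical helper that counts the `InterpolatedExpression` markers the compiler emits, so an escaped `$(` should contribute none.

```d
import core.interpolation;

// Counts how many interpolated expressions a sequence contains.
size_t countExpressions(Args...)(Args args)
{
    size_t n;
    foreach (arg; args)
    {
        static if (is(typeof(arg) == InterpolatedExpression!code, string code))
            ++n;
    }
    return n;
}

void main()
{
    int price = 5;
    assert(countExpressions(i"total: $(price)") == 1);  // interpolated
    assert(countExpressions(i"total: \$(price)") == 0); // escaped: stays literal text
}
```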
FWIW, the spec is not updated. I'm working on it. https://github.com/schveiguy/dlang.org/blob/istring/spec/istring.dd |
Looking good so far. I am assuming one can add argument attributes, including
void processIES(Sequence...)(InterpolationHeader, lazy Sequence data, InterpolationFooter)
{
// process data here
} |
Whatever works today should work there. |
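For illustration, here is one way the body of such a function might look, assuming a compiler that includes this PR. This sketch drops `lazy` from the signature above to keep the example simple, and the body is my own guess at a typical use, not code from the PR.

```d
import core.interpolation;
import std.conv : to;

// Accepts exactly one interpolated sequence: a Header, the pieces, then a Footer.
string processIES(Sequence...)(InterpolationHeader, Sequence data, InterpolationFooter)
{
    string result;
    foreach (item; data)
    {
        static if (is(typeof(item) == InterpolatedLiteral!lit, string lit))
            result ~= lit; // literal text, known at compile time
        else static if (is(typeof(item) == InterpolatedExpression!code, string code))
        {
            // source text of the following value; skipped in this sketch
        }
        else
            result ~= item.to!string; // the runtime value itself
    }
    return result;
}

void main()
{
    string name = "world";
    int n = 3;
    assert(processIES(i"hello $(name), $(n) times") == "hello world, 3 times");
}
```

Matching on the Header and Footer types in the signature means a plain `processIES("some string")` call does not compile, which is the point of having the brackets.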
spec is ready for review: dlang/dlang.org#3768 |
This is based on an older draft of dip1036 but merging in the benefits from the YAIDIP.
This slight change supports interpolation of tuples and nested i-strings while still keeping the full CTFE capability of yaidip. It also retains the simplified processing of the original dip1036.
See example repo here with variety of use cases: https://github.com/adamdruppe/interpolation-examples
The interpolated expression sequence is a string literal prefixed with the letter i in the source code with embedded items with the format of $identifier or $(expression) where the identifier and expression are defined according to normal D rules. The lexer considers it a single token that may follow other token rules inside, similar to a q{} already in D. You can use \$ to put in a dollar sign followed by ( or identifier chars without triggering the interpolation in double-quoted i-strings. In other types of strings and i-strings, this does not apply.
Its semantics are to convert the interpolated expression sequence token into a tuple of the form:
(Please note each of the Interpol* structs there is defined in core.interpolation and is strictly looked up from that module, not from the current scope.)
That is, each part of the original string is broken up into items and written in order, with the actual value following the output in the sequence too.
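As a hedged illustration of that tuple shape, the sketch below checks the argument types the compiler passes for one small i-string. It assumes a compiler that includes this PR and the struct names as they exist in `core.interpolation`; `matchesExpected` is a hypothetical helper written for this example.

```d
import core.interpolation;
import std.meta : AliasSeq;

// Reports whether the argument types match the lowering we expect
// for i"Hello, $(name)!" where name is a string.
bool matchesExpected(Args...)(Args args)
{
    alias Expected = AliasSeq!(
        InterpolationHeader,            // opens the sequence
        InterpolatedLiteral!"Hello, ",  // text before the expression
        InterpolatedExpression!"name",  // source text of the expression
        string,                         // the value of the expression
        InterpolatedLiteral!"!",        // text after the expression
        InterpolationFooter);           // closes the sequence
    return is(Args == Expected);
}

void main()
{
    string name = "D";
    assert(matchesExpected(i"Hello, $(name)!"));
}
```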
This is easier to explain if you just look at the source code and/or the examples so idk why im writing this.
Only i"" is implemented in the lexer at this time but the intention is to do it for all of them. I'll come back to it eventually. The string suffixes could also be applied to the literals it passes to the templates inside if we want.