proposal: Go 2: integrated compile time language #37292
I made a similar proposal at #32620, but I decided to close that issue, as this need not be integrated into the Go compiler itself; a pre-processor can do this as well. I would first implement such a pre-processor, and once it has proven its utility and become widely popular, integrating it with the compiler itself could be considered.
The Rust compiler incorporates a Rust interpreter: https://github.com/rust-lang/miri
Having a Turing-complete interpreter in the compiler is a headache because the compiler would have to solve the halting problem to see if your program even compiles. In other words, it's easy to write loops in such an interpreter that might seriously slow down compilation or even cause it to go into an infinite loop. A macro preprocessor would be nice to have for Go, but for the reasons I mentioned, it should not be Turing complete.
Compile-time code execution, macros, etc. are available in plenty of languages, and their pros and cons are well known. So, IMO, writing a pre-processor to gauge its utility is not required. If CTFE/macros were available in Go, they would be guaranteed to be used by plenty of people. A pre-processor comes nowhere close to a proper compile-time code execution or macro system. But AFAICT, because they slow down compile times and do not follow Go's simplicity and design goals, this proposal most likely won't be accepted.
Not really, no. It is a very easy problem to solve. For example, for compile-time code, the Nim compiler tracks the number of iterations of loops (easy to implement) and sets an upper bound on them. If the iterations cross the threshold, the compiler gives an error that the loop iterations are too many. For complex programs, which might require more iterations, it provides a way to raise the limit. But anyway, this proposal is for adding a compile-time "language".
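The iteration-cap idea described above can be sketched in Go. This is a hypothetical interpreter budget for illustration only; `maxSteps`, `run`, and the error text are invented here and are not Nim's actual mechanism:

```go
package main

import (
	"errors"
	"fmt"
)

// maxSteps is a hypothetical per-evaluation budget; a real compiler
// would make this configurable per compilation unit.
const maxSteps = 1_000_000

var errBudget = errors.New("compile-time evaluation exceeded step budget")

// run evaluates a toy compile-time loop, charging one step per
// iteration. If the budget is exhausted, evaluation fails with an
// error instead of hanging the compiler.
func run(iterations int) (int, error) {
	steps, sum := 0, 0
	for i := 0; i < iterations; i++ {
		steps++
		if steps > maxSteps {
			return 0, errBudget
		}
		sum += i
	}
	return sum, nil
}

func main() {
	if _, err := run(10_000_000); err != nil {
		fmt.Println("error:", err) // budget exceeded: compilation would abort
	}
	v, _ := run(10)
	fmt.Println(v)
}
```

This sidesteps the halting problem by refusing, rather than diverging, on evaluations that run too long.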
What this proposal describes is a language that is very different from Go. Go, like any language, supports generating code using a separate code generator, and Go programmers use that routinely via `go generate`.
An integrated compile-time language makes a compiler essentially programmable; there's no question that this would be very powerful. In a similar vein, at one time it was fashionable to make programming languages customizable, from the simple ability to define the precedence of operators (see e.g., Prolog), to the more complex ability to control the evaluation of operands, etc. While such languages can make it much easier to solve certain problems, they also make it much harder to understand what is going on, because the programming language itself can be changed. A reader effectively has to learn a new language before they can read the code.

Your proposal description is sufficiently high-level, and the examples don't show the meta-language, so it's unclear to me how flexible an approach you are envisioning. But clearly you want to analyze format strings at compile time, so there's plenty of power here that could lead to programs that are extremely hard to understand. For instance, the meta-language could modify format strings and thus make it almost impossible to determine what a program is doing without performing the compile-time computation in one's head. This leads to code that may be easy to write but hard to read for anybody but the writer (and it will be hard to read for the writer as well after enough time has passed).

In short, I believe any such attempt would significantly reduce Go's readability, yet readability is one of the top priorities for Go, so we're not going to easily compromise on that.
This proposal is incomplete in that it doesn't explain the compile time language. If it did, the additional complexity added to Go, as discussed above, would mean that we will not adopt it. Based on emoji voting, this proposal does not have strong support. Therefore, this is a likely decline. Leaving open for four weeks for final comments.
A compile-time language is about making code much more readable, efficient, robust and maintainable. It can greatly simplify many complex problems. Let's assume the meta-language is Go so that it's simple for everyone to use, and that `#xxxx( )` calls this Go function and passes it the parsed data between the parentheses. The only thing to learn is calling the compiler APIs, such as inserting code and using compile-time reflection.

`go generate` or writing a pre-processor is a waste of time for resolving many problems. If they worked well for compile-time logic, then how would you propose fixing the CPU overhead in the compiler? This feature does not change the Go language; it just allows insertion of Go code. Most meta-language programs won't be more than a hundred lines. Of all the concerns, "abuse" is the only legitimate one, but that is true of other features as well (solved using standards and best practices).

If you doubt this works, then ask yourself why the CPU overhead in scanner.go has never been addressed. The answer is that it's painful without compile-time logic. For example, the following is code from `func scanComment()` processing `//` comments:

```go
s.next() // ignore the //
for s.ch != '\n' && s.ch >= 0 {
	if s.ch == '\r' {
		numCR++
	}
	s.next()
}
```

CPU usage could be reduced by over 50% (as much as 90%) for parsing, but do you want to code it correctly when it looks this nasty? Do you want to repeat this everywhere it occurs?

```go
s.next() // ignore the //
for s.findNext(&[256]byte{0: 1, '\n': 1, '\r': 1, 128: 1 /* --- repeat to 255 --- */}) && s.ch != '\n' {
	if s.ch == '\r' {
		numCR++
	}
}
```

With a compile-time language, this would look like:

```go
s.next() // ignore the //
for s.#findNext("\u0000\r\n") && s.ch != '\n' {
	if s.ch == '\r' {
		numCR++
	}
}
```

`#findNext( )` is the compile-time function, which is passed positional arg 1 `"\u0000\r\n"` and no keyword parms. It produces the `findNext` call that you would have coded by hand.
The compile-time function, in pseudocode:

```
func #findNext(args) {
	goCode := "findNext( &[256]byte{ "
	for i := 0; i < len(args.1); {
		r, w = utf8.DecodeRune(args.1[i:])
		if w == 1 {
			if r == utf8.RuneError {
				error("illegal UTF-8 encoding")
			} else {
				goCode += args.1[i] + ":1, "
			}
		}
		i += w
	}
	for i := 128; i < 256; i++ { // process all runes
		goCode += i + ":1, "
	}
	goCode += "})"
	ast.insert(*, goCode)
}
```

`#findNext` adds a runtime call to `func findNext( )`, which is a slightly modified version of `func next( )` from scanner.go. Replace the function prototype; a `return true` is needed where a character is found, and `return false` when EOF is reached. The CPU reduction comes from the for loop skipping characters that are not used:

```go
func (s *Scanner) findNext(charTable [256]byte) {
	for ; s.rdOffset < len(s.src) && charTable[s.src[s.rdOffset]] == 0; s.rdOffset++ {
	}
	// ...
}
```

This feature offers great benefits to all Go developers for a minor amount of change. Most developers do not have experience with this type of feature, so they don't understand how it changes the design of their code. It will take time to change the mindset.
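The runtime half of this idea can be made runnable without the compile-time generator. The sketch below hand-writes the translation table that `#findNext` would emit; the `Scanner` type is a minimal stand-in, not the real go/scanner:

```go
package main

import "fmt"

// Scanner keeps only the fields the skip loop needs; the real
// go/scanner type has many more.
type Scanner struct {
	src      []byte
	rdOffset int
}

// findNext advances rdOffset past every byte whose table entry is 0,
// stopping at the first byte of interest. It returns false at EOF.
// A compile-time #findNext would generate the table literal from the
// string "\u0000\r\n" instead of building it at run time.
func (s *Scanner) findNext(charTable *[256]byte) bool {
	for ; s.rdOffset < len(s.src) && charTable[s.src[s.rdOffset]] == 0; s.rdOffset++ {
	}
	return s.rdOffset < len(s.src)
}

func main() {
	// Table marking NUL, '\n', and '\r' (bytes >= 128 omitted here
	// for brevity; the proposal would mark them too so runes are
	// still validated).
	var table [256]byte
	for _, c := range []byte{0, '\n', '\r'} {
		table[c] = 1
	}
	s := &Scanner{src: []byte("// a comment\nnext line")}
	s.findNext(&table)
	fmt.Printf("stopped at offset %d, byte %q\n", s.rdOffset, s.src[s.rdOffset])
}
```

The inner loop touches one array element per byte and contains no branching on specific characters, which is the source of the claimed speedup.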
@jonperryman I'm not sure I understand your example - can you elaborate more on how I would get the 50% (or even 90%) reduction of CPU usage in `scanComment`?
Languages such as Go and C strongly encourage a byte-processing mentality. C gets around some of this inefficiency by providing multi-byte functions (e.g. memcmp, strcmp, memcpy, strcpy). It does not provide functions such as indexAny() because they are very inefficient without a compile-time language. You can do band-aid fixes to make scanner.go faster, but to make it a speed demon, you will need a different mindset and a rewrite.

To make this easy to understand, let's discuss the small piece of code from scanComment that searches for `\n` as the end of the `//` comment:

```go
for s.ch != '\n' && s.ch >= 0 { // <===
	if s.ch == '\r' { // <===
		numCR++
	}
	s.next() // <===
}

func (s *Scanner) next() {
	if s.rdOffset < len(s.src) {
		s.offset = s.rdOffset // <===
		if s.ch == '\n' { // <===
			s.lineOffset = s.offset
			s.file.AddLine(s.offset)
		}
		r, w := rune(s.src[s.rdOffset]), 1 // <===
		switch {
		case r == 0: // <===
			s.error(s.offset, "illegal character NUL")
		case r >= utf8.RuneSelf: // <===
			// not ASCII
			r, w = utf8.DecodeRune(s.src[s.rdOffset:])
			if r == utf8.RuneError && w == 1 {
				s.error(s.offset, "illegal UTF-8 encoding")
			} else if r == bom && s.offset > 0 {
				s.error(s.offset, "illegal byte order mark")
			}
		}
		s.rdOffset += w // <===
		s.ch = r // <===
	} else {
		s.offset = len(s.src)
		if s.ch == '\n' {
			s.lineOffset = s.offset
			s.file.AddLine(s.offset)
		}
		s.ch = -1 // eof
	}
}
```

Notice the statements marked `<===` that are executed for each character. These statements don't do anything useful except for `\r`, `\n`, NUL and runes (which are more than one byte). To execute these statements only for those characters, we need to pass a character translation array with the characters of interest on the next( ) call:

```go
s.next(&[256]byte{0: 1, '\n': 1, '\r': 1, 128: 1 /* --- repeat to 255 --- */})
```

This is a constant array that must be passed by reference (&) to avoid the overhead of copying the array on each call to next(). The array size must be 256 because a byte value can be 0 to 255.
We index into this array using the value of each character in the src. Each entry in the array defaults to 0, and the entries for NUL, `\n` and `\r` are set to 1 (the translation value).

```go
func (s *Scanner) next(translationTable *[256]byte) {
	for ; s.rdOffset < len(s.src) && translationTable[s.src[s.rdOffset]] == 0; s.rdOffset++ {
	}
	// ...
}
```

The next( ) function is modified to accept the translation table by reference from the caller. A tight for loop runs through the source until there's no more data or the translation value is not 0 (a character of interest was found). The compiler should optimize the for loop to be very fast because it's so small. UTF-8 is designed to be fast by having all bytes of a rune look like a rune (>127). Since comments are not interested in rune characters, runes actually don't need to be included in the translation table unless you want rune validation, which probably isn't important.

As for other parts of scanner.go, they need the same changes but are convoluted; they should be made readable for clarity and maintenance. To further improve maintainability and reduce CPU, the GOTO statement should be modified to allow GOTO DEPENDING ON, where it supports multiple labels and the label is selected using a variable that contains an index. By specifying the index of a label for each character used in the translation table, you avoid the overhead of IF and CASE statements. Maintainability is improved by reducing repetition. Also note that other modules need these changes (e.g. xml and json parsing).
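The "GOTO DEPENDING ON" dispatch described above can be approximated in today's Go with a switch over small action codes stored in the translation table. A hedged sketch (the action codes and `scanLine` are invented for illustration):

```go
package main

import "fmt"

// Action codes stored in the translation table; a computed-goto
// dispatch maps naturally onto a Go switch over these codes.
const (
	skip    = 0 // not interesting: the loop passes over it
	stop    = 1 // terminator ('\n' or NUL): end of the comment
	countCR = 2 // '\r': count it and keep going
)

// scanLine scans src up to the first '\n' or NUL, counting '\r'
// bytes along the way, and returns the end offset and the count.
func scanLine(src []byte) (end, numCR int) {
	var table [256]byte
	table[0] = stop
	table['\n'] = stop
	table['\r'] = countCR
	for i := 0; i < len(src); i++ {
		switch table[src[i]] {
		case skip:
			// ordinary byte: nothing to do
		case countCR:
			numCR++
		case stop:
			return i, numCR
		}
	}
	return len(src), numCR
}

func main() {
	end, crs := scanLine([]byte("abc\rdef\nxyz"))
	fmt.Println(end, crs)
}
```

One table lookup selects the action per byte; adding a new special character means setting one table entry and one switch arm.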
Both C and Go provide functions equivalent to indexAny, strpbrk and strings.IndexAny respectively. Go's IndexAny even dynamically builds a bitset for ASCII-only sets of characters, which'd take less memory than an array of 256 bytes. I don't think indexAny is a particularly convincing example of why a compile-time language would be useful. If you're searching a constant set of characters, you could just do a one-time conversion to a fast data-structure once at program-startup time, and then repeatedly query using that data structure. |
@jonperryman Thanks for the detailed explanations, appreciated. I'd like to make a few observations specifically about your example (somewhat unrelated to this issue):
Then there's also the bigger issue of how C/C++ deals with large pieces of software compared to Go: Because C/C++ uses include files, an enormous amount of source text is consumed in each compilation, vastly more than for an equivalent Go program (because Go has proper imports). Thus, super fast scanning is much more important for C/C++ than for Go. In summary, I don't think special use cases such as scanning/parsing are a good reason for extending Go itself by some sort of compile-time language. In fact, those special use cases led to specialized code generators such as yacc, antlr, coco/r, etc. which can do a much better job. |
It looks like what you want is something like
There was further discussion but no change in consensus. The costs of this proposal remain unaddressed. Closing. |
Would you consider yourself a novice, intermediate, or experienced Go programmer?
Intermediate.

What other languages do you have experience with?
C, C++, Assembler, Python, SAS, JavaScript, PHP, HTML/CSS and more.

Would this change make Go easier or harder to learn, and why?
Advanced APIs and frameworks can be made trivial to use. E.g. messages should be multi-lingual, but it's so difficult that even the Go compiler does not offer this capability. Look at the example below to see how this feature makes it trivial, so that all products could easily become multi-lingual.

Has this idea, or one like it, been proposed before?
Don't confuse this with C macros or code generators. Nor is this an alternative to functions. Instead, this provides control and flexibility at compile time that would normally require Go compiler source changes. As part of the compiler, it has some basic knowledge (e.g. making reflection available at compile time).
Who does this proposal help, and why?
Everyone, from beginners to the compiler developers, will greatly benefit from this feature. Some of the benefits are:
What is the proposed change?
An integrated compile-time language for Go that everyone can use. It will provide the ability to create compile-time logic: access to reflection at compile time, insertion of Go code, and issuing compile-time error messages. The compile-time language itself needs to be determined. Go would be nice because the syntax would be consistent and it would run efficiently, but compile-time programs must be compiled before compiling the programs that use them, along with dynamically loading and calling those programs. A language like Python might be more suitable because it is interpreted at execution time, but its drawbacks are another programming language to learn and high overhead during the Go compile.
Demonstrating the importance of this functionality through the UI.
I have real experience with the integrated compile-time language that is built into the High Level Assembler. There are many real use cases, so I will only comment on a few. I'll start with messages.
Easy to maintain call for a multi-lingual message
Elegant and efficient message definition in an english message module
Why are most messages in English instead of being multi-lingual? Why don't messages have help text? Where are the advanced message features?
The simple answer is that Go and C are not convenient, efficient, or easy to use for solving many problems. We must settle for mediocre solutions such as gettext( ) that have high runtime overhead and a new language to learn. Worse yet, you must understand the impact of changing the gettext message ID.
The example above was taken from my product written in High Level Assembler but changed to be compatible with Go syntax. This solution suited my needs but could easily be changed to be more flexible. The message definition is from one of the English-language message modules so that the documentation writers can easily fix problems. The help text explains the message and guides users through problem resolution. The message ID is crucial to identify the message regardless of user language. Equally important is that automation deals with the correct message: customers can easily identify automation changes when a message changes syntax. The error level ensures the return code is set correctly.
Excessive overhead in strings.IndexAny( )
This is the actual strings.IndexAny( ) code in Go. Whether you code IndexAny( ) in Go or C, this logic is the best you can do because there is no compile-time flexibility built into the language. An integrated compile-time language allows you to create a very robust solution. In this case, there are 3 compile-time scenarios that must be considered to cover all situations.
Scenario 1: quoted search characters: strings.IndexAny( myString, "<>&;" )
The current implementation results in unnecessary runtime overhead that could be moved to compile time. The double for loop results in 4 times the execution time in this example. makeASCIISet( ) is a run-time solution that runs on every call to IndexAny. The correct solution would be to create the ASCII table at compile time (asciiTable := [256]byte{'<': 1, '>': 1, '&': 1, ';': 1}), but that's expecting too much from the developer.
With this new functionality, creating the asciiTable statement is a simple loop through the search characters. Non-ASCII Unicode characters can generate an additional search for those characters. Beginners don't notice any difference.
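What the compile-time function would emit for this scenario can be written by hand today: a package-level table literal, computed once rather than on every call. A sketch (the names `searchTable` and `indexAnyPrecomputed` are illustrative, not an existing API):

```go
package main

import "fmt"

// The table a compile-time #indexAny("<>&;") might generate:
// a [256]byte literal with the search characters marked, paid for
// at compile/link time instead of per call.
var searchTable = [256]byte{'<': 1, '>': 1, '&': 1, ';': 1}

// indexAnyPrecomputed returns the index of the first marked byte
// in s, or -1 if none is present.
func indexAnyPrecomputed(s string) int {
	for i := 0; i < len(s); i++ {
		if searchTable[s[i]] != 0 {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexAnyPrecomputed("a < b"))
	fmt.Println(indexAnyPrecomputed("plain"))
}
```

The single loop replaces IndexAny's per-call set construction plus search, which is the overhead the scenario describes.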
Scenario 2: search characters in a variable: strings.IndexAny( myString, searchCharactersVariable )
The standard implementation is the current implementation. It does not do any compile time optimization. This is easy for beginners because they don't notice any difference.
Scenario 3: advanced search optimization features
For ease of use, beginners can ignore advanced features. In fact, these features are only needed in extreme cases (e.g. large amounts of data with fewer than 8 search characters). This could be something as simple as a pre-allocated ASCII buffer / search structure. Using this feature could be done via keyword parameters or different call types. Implementation is truly up to the developer of this functionality.
Formatted printing could be improved without UI changes
Why isn't the formatting string parsed at compile time? Why aren't the formatting field definitions determined at compile time to reduce runtime overhead? Why aren't the argument types validated against the formatting string at compile time? With this feature, we could easily solve these problems.
XML processing made easy and efficient
versus today's implementation
My XML parser written in High Level Assembler executes in less than half the time it takes in Go. The example above is in a Go-compatible syntax. Compared with the current syntax, it is much easier to comprehend for Go beginners who understand XML. Additionally, the current implementation only allows for one standard implementation; there are several XML standards and advanced features that would be better served by an integrated compile-time language. The example above shows multiple root elements (person & company), which is not available in the current implementation.
Please describe as precisely as possible the change to the language.
The minimum changes to the compiler:
Advanced features could be discussed later if this is accepted.
What would change in the [language spec](https://golang.org/ref/spec)?
The only change is a new section to cover the integrated compile time language.
Please also describe the change informally, as in a class teaching Go.
To be written if this feature is to be implemented. Because of the impact, this feature will be large.
Is this change backward compatible?
Yes.
Breaking the Go 1 compatibility guarantee is a large cost and requires a large benefit.
Go 1 compatibility is not affected by this change.
Show example code before and after the change.
See above.
What is the cost of this proposal? (Every language change has a cost).
TBD.
How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
Some generated code may not be visible to the tools, which may cause false errors such as missing variables. Otherwise, keeping the UI syntax consistent with Go should allow most tools to function correctly.
What is the compile time cost?
Use of this feature will increase compile time which could be significant depending upon implementation and compile time programs written. This can only be determined during the design phase.
What is the run time cost?
Run time will never increase. In some cases, run time will decrease.
Can you describe a possible implementation?
Do you have a prototype? (This is not required.)
Some UI prototypes are above. The functions called would depend upon the language chosen.
How would the language spec change?
Addition to the language spec.
Orthogonality: how does this change interact or overlap with existing features?
Not applicable.
Is the goal of this change a performance improvement?
No, but that is one of the side benefits.
If so, what quantifiable improvement should we expect?
How would we measure it?
Does this affect error handling?
No.
If so, how does this differ from previous error handling proposals?
Is this about generics?
While not specifically about generics, it solves the basic complaint (repetitive coding). You only need to look at messages to see what great opportunities are missed because of compiler limitations. When will the compiler have multi-lingual messages? When will the compiler have help text for messages?
If so, how does this differ from the current design draft and the previous generics proposals?