Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: parameter tags for static analysis tools #24889

Closed
ianlancetaylor opened this issue Apr 16, 2018 · 16 comments
Closed

proposal: Go 2: parameter tags for static analysis tools #24889

ianlancetaylor opened this issue Apr 16, 2018 · 16 comments
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal v2 An incompatible library change
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Apr 16, 2018

A frequent request for Go is a const type qualifier along the lines of C (e.g., #20443, #22876). C also has additional type qualifiers volatile, restrict, and _Atomic, all of which are potentially meaningful for Go.

clang supports thread safety analysis for C++ (https://clang.llvm.org/docs/ThreadSafetyAnalysis.html) based on annotations in the source code. This would clearly be applicable to Go code.

The Go language currently has only one type qualifier (channel direction) and has historically resisted adding more. In general, Go encourages programming with code, rather than programming with types. It's not always clear which type qualifiers are useful. It's hard to add new ones.

The main purpose of a type qualifier is to document and enforce function behavior. Adding a const qualifier to a function pointer argument informs the reader of the code that the function does not change the pointed-to value using that pointer. The compiler gives an error for any attempt to do so. It may seem that it would also provide an opportunity for compiler optimization, but generally such optimizations are fairly limited. This is because putting a const qualifier on one pointer does not mean that the value is not changed in some other way. While there are some opportunities for optimization based on type qualifiers, generally their effect on generated code is minimal.

Instead, type qualifiers are best thought of as a kind of compiler-enforced documentation for how the program behaves. The const qualifier is useful information for the code reader: this function is not going to change this value.

Go is designed to have good support for static analysis tools. Those tools could be used to enforce any set of desired rules. That suggests that rather than adding type qualifiers to the language itself, we instead provide a way to annotate program source code so that separate analysis tools can identify desired type qualifiers and enforce them. Each organization could decide which set of tools it wishes to run.

It follows that we should consider a common mechanism for annotating program source code. While the exact details of each annotation can be left to the tools, there should be a generally agreed convention on how those annotations will be added, so that different tools can cooperate.

We already have a limited set of annotations, implemented by the compiler, in the form of //go: comments serving as compiler directives (see https://golang.org/cmd/compile). Using comments works nicely in that tools that don't care about the particular annotation will quite naturally ignore it. However, it is much less nice when applied to individual function arguments: the natural place to add the comments is within the function signature, but then placement becomes critical and the annotations interfere with reading of the code. Or so it seems to me.

I propose that we instead extend the existing concept of struct tags to permit them to be used in more places. Specifically, permit tags to be placed after the type in a function signature, including the signatures used in interface types.

func CopySlice(dest []byte, src []byte `qual: "const"`)

type Writer interface {
    Write(p []byte `qual: "const"`) (int, error)
}

Thread safety analysis can generally use the existing struct tags:

type Balance struct {
    mu sync.Mutex
    value uint64 `thread:"guarded_by(mu)"`
}

Function level qualifiers can appear attached to any parameter.

func (b *Balance) lockedWithdraw(val uint64 `thread:"requires(mu)"`) { ... }

or perhaps we can also permit tags on the signature as a whole.

I propose further that we permit all tags to be string constant expressions, not just string literals. This will permit defining standard tags in packages that may be imported.

import "qual"

type Writer interface {
    Write(p []byte qual.Const) (int, error)
}

These tags will not affect compilation at all, except that they will be placed into reflection information (we'll add a Tag method to the reflect.Type interface) and, naturally, into the export data. Otherwise the new tags will only be used by static analysis tools, possibly including vet.

This will provide us with a flexible framework for letting different groups define their own analysis, without, I hope, going too far in the direction of programming by types.

@ianlancetaylor ianlancetaylor added LanguageChange Suggested changes to the Go language v2 An incompatible library change Proposal labels Apr 16, 2018
@ianlancetaylor ianlancetaylor added this to the Proposal milestone Apr 16, 2018
@jimmyfrasche
Copy link
Member

Would

func (s *Something) Close() (error `nil:"always"`) { // ...

be legal?

What about the below?

func (s *Something `qual:"const"`) Pred() bool { // ...

A natural generalization of this and struct tags would be to also allow any variable or constant declaration to be tagged. Was this considered?

A perhaps more useful generalization is tags on any type, like

type foo int `bar:"baz"`

Could these coexist with some rule about how to combine the tags from the type tag and the parameter tag?

If they cannot coexist it doesn't let you annotate things like "don't copy values of this type" or "the zero value of this type shouldn't be used" without copying the tag to every function accepting values of that type. Sometimes things are just properties of the type and exposing that to static analyzers is useful.

@ianlancetaylor
Copy link
Contributor Author

Would
func (s *Something) Close() (error nil:"always") { // ...
be legal?

Yes.

What about the below?
func (s *Something qual:"const") Pred() bool { // ...

Yes.

A natural generalization of this and struct tags would be to also allow any variable or constant declaration to be tagged. Was this considered?

Yes. I don't have any use cases for that, so I didn't suggest it. Note that in C a variable with a const qualified type has a very different meaning than a function parameter whose type is a pointer to a const qualified type. The variable is immutable. The function argument forbids changing the pointed-to value using the function parameter. I am interested in immutable values in Go, but we shouldn't confuse that with annotations for static analysis of function/method calls.

Tags on any type likely makes sense given this proposal, but we need to consider not just what it looks like for simple types like int for also for struct and interface types. Where does the tag go, such that it is clear. Always putting the tag after the type seems like it would be easily lost for a large struct. Perhaps after the struct or interface keyword? But then different types are handled differently.

@balasanjay
Copy link
Contributor

I like the idea of making tags usable in more places, and I concur that this will be great for building static analysis tools.

But, I don't think Go should lean further into purely unstructured strings as tags. They are too prone to typos, and are difficult to namespace. Relatedly, they are difficult to version (imagine upgrading from a threadsafety v1 package to a v2 package). Permitting string constants from imported packages is a good step away from this, but I think we should go further.

#23637 has a proposal for more structured tags. I'd prefer to see something like that.

One could imagine that in addition to allowing structs in the list of tags, we could have an additional rule like "An entry in a tag list may reference a top-level variable declaration either in the current package, or another package. This variable declaration must have an initializer expression, which must be itself valid as a tag expression". This would allow your qual.Const example above.

One could also imagine that we could have a further additional rule like "An entry in a tag list may be a function call, where each argument to the function call must be valid as a tag expression. The body of the called function must consist of only a single return expression, which must be valid as a tag expression, with the exception that it may refer to arguments to the function as though they were constant expressions". This would allow writing a tag like [syncx.Requires("mu")].

@balasanjay
Copy link
Contributor

Or on second thought, it'd probably be better to just leave out those latter two additional complications. Writing qual.Const{} is not that bad, and writing syncx.Requires{"mu"} is pretty much identical.

(And if we reeally wanted to fix that former, then perhaps a better solution would be to have the lexer insert the "{}" when it sees a "," or a "]")

@ianlancetaylor
Copy link
Contributor Author

ianlancetaylor commented Apr 17, 2018

Thanks. I want to clearly separate this proposal, which is about permitting tags on parameter types, from another proposal about changing the syntax of tags. Sure, if we change the syntax of tags on struct field types then this proposal would adopt the new syntax too. But those are two different ideas that should be considered separately.

@balasanjay
Copy link
Contributor

Fair enough.

@jimmyfrasche
Copy link
Member

I didn't mean to imply anything about immutable variables or bindings with the variable case. I was mostly interested in why the line would be drawn there. The only use case for variables that I can think of would be to annotate that a global variable isn't changed after package init so that that property can be linted. There might be some other interesting uses for verifying invariants within a func. It's probably worth skipping just on complexity grounds (for the language and for the reader), though.

———

for struct and interface types. Where does the tag go, such that it is clear. Always putting the tag after the type seems like it would be easily lost for a large struct. Perhaps after the struct or interface keyword? But then different types are handled differently.

Either the struct/interface or the tag can be large so whether you choose struct/interface tag def or struct/interface def tag, there are always going to be extreme cases where one or both are large. If it's after, it's going to start on the same line as the }. It also means that:

type S struct {}
func F(s S `tag`)

and

func F(s struct{} `tag`)

are the same in the sense that you could replace S with its definition in that example without juggling. Regularity seems preferable.

———

If types and struct fields/func params can both have tags there needs to be a rule to deal with collisions. I think the simplest would be to treat the tagspace like a map: if the same key is used again, its value wins. That way, in

type S struct{} `x:"a,-" y:"z"`
func F(s S `x:"b"`)

the parameter s has the tags x:"b" y:"z"

Allowing them on both also allows repeated type+tag combos to be named like

type cbytes = []byte `qual:"Const"`

which could help documenting some of the more abstruse

———

While I agree that #23637 is the right place to talk about that proposal, I'd note that allowing const strings in addition to literals as proposed here would work better with that one. If you wanted to extend an existing pre-defined tag, you'd need to copy the original definition inline or do

const aNewName = qual.Mod + ` other:"thing"
func F(T aNewName)

unless you allow

func F(T qual.Const + ` other:"thing"`)

(which isn't going to win any syntax awards).

@robpike
Copy link
Contributor

robpike commented Apr 17, 2018

While I appreciate the value of tagged declarations (I was even the original proposer of the feature, which was an expansion of a protocol-buffer-supporting feature of Sawzall), I find them very ugly in practice. Putting them on every declaration will make code messy and unreadable. Worse, each new static tool will introduce new annotations, producing an ever growing list of quoted strings on each declaration.

I know it's an important feature for structs, and I appreciate the need for them there. In an ideal world, though, I'd find another way. They have become a blight on the page.

Please let's not make it worse.

@jimmyfrasche
Copy link
Member

@robpike currently each new static analysis tool that requires tagging has to introduce a special comment which isn't a lot better and far more ad hoc.

Part of the problem is that the tags are strings and part is that they have to be declared each time. With the tweaks I laid out in the post above yours, common patterns could be named via type alias (at least within a package) to reduce repetition and provide a single point for documentation.

If this were accepted along with something like #23637 (comment) there's still a change of horrible proliferation but it would be more manageable since the tags would at least be structured, versioned, easily moved around (via type aliases), and they'd autolink to documentation in go doc. There could even (much later) be an x/ repo for collecting tags that many tools agree upon to reduce their number and have a shared single point of truth.

It's unpleasant if there are a lot of annotations. Right now there's as much chance for an over proliferation of annotations but when you do need them it's a pain to manage them because they're magic comments and magic strings in struct tags. If there's language support for them, it's at least easier to deal with.

@zippoxer
Copy link

zippoxer commented Apr 18, 2018

While I would like to see Go's type system evolve to make Gophers feel safer about their code, I think introducing immutability to Go with

src []byte `qual: "const"`

instead of with a keyword like

src []byte const

is worse than having no immutability at all.

The former feels like a second-class citizen and immutability should be a first-class citizen. Also, as @robpike said, it doesn't look good on the page.

Regardless, I'm glad to see discussion about immutability.

@jimmyfrasche
Copy link
Member

Annotations are a way to give ad hoc second class citizens uniform first class support.

First class immutability would be great but that should be discussed elsewhere. I don't believe @ianlancetaylor's example meant to say "this []byte is immutable": it meant "I promise not to mutate this mutable []byte". They're related but distinct concepts.

@bcmills
Copy link
Contributor

bcmills commented Apr 19, 2018

I like the idea of making annotations extensible, and there is certainly precedent for tags intended for external tools. However, in the cases where those have been tried, they tend to converge to standardized annotations anyway.

For example, a subset of PEP 3017 annotations in Python were standardized in PEP 484, although to my knowledge they are still only checked by third-party tools such as pytype. Similarly, C++11 annotations have a standardized subset.

Generalized syntax may aid in experimentation and extension, but it does not remove the need to standardize the annotations that are found to be useful in practice.

@jimmyfrasche
Copy link
Member

I figure that a (somewhat dubious) proxy for the potential usage of any extension to struct tags is how many structs have tags currently.

To gather data, I wrote a quick counter: https://github.com/jimmyfrasche/tag-struct-ratio

It goes through all the .go files in import paths you give it and counts instances of *ast.StructType with at least one field (I skipped struct{} because its use as a literal inflated the count). For each counted struct, if at least one field has a struct tag, it updates a separate counter.

For cmd std it reports

Tagged 226
Total 6153

That's what I'd expect and roughly the same ratio as my code.

For all, it reports

Tagged 27453
Total 59107

which was far more than I would have thought.

@bcmills
Copy link
Contributor

bcmills commented Apr 23, 2018

@jimmyfrasche That's a nice analysis, but I'm not sure it's applicable.

The main use-case for struct tags today is for run-time reflection (encoding/json and friends). I would expect a lot of those, often involving code that is otherwise very simple.

In contrast, the tags Ian is proposing here are explicitly for compile-time analysis: those may or may not be common (depending on what they indicate), but some, such as lock annotations, are likely to be rare. (We do have a few compile-time annotations today as //go: comments, but they are not generally used in application code.)

@jimmyfrasche
Copy link
Member

Indeed. It's not really an analysis—just a data point.

@ianlancetaylor
Copy link
Contributor Author

I'm withdrawing this proposal.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests

7 participants