Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

const now works with ref types #15528

Closed
wants to merge 13 commits into from

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Oct 8, 2020

example 1: nested ref

when true:
  block:
    type Foo = ref object
      x1: string
      x2: seq[Foo]
    const j1 = Foo(x1: "asdf", x2: @[Foo(x1: "bar")])
    echo j1[]
    let j2 = j1
    doAssert j2.x2[0].x1 == "bar"

example 2: Table

when true:
  import std/tables
  const a = {1: "1", 2: "2"}.newTable

More generally, this enables very useful patterns, eg reading a config file at CT (json, etc) into some ref const, and caching it for reuse, making it available at RT and CT. This would not be possible before this PR.

More on this config file idea later (a DRY + typesafe config available at RT + CT)

links

scratch below

@Araq
Copy link
Member

Araq commented Oct 9, 2020

Try this snippet please:

type
  Foo = ref object
    le: Foo
    data: string

const
  c = Foo(le: Foo(data: "abc"))

var s: seq[Foo]

proc a(x: Foo) = 
  s.add x
  x.le = x

a(c)

Compile with --gc:arc -d:useMalloc and run under valgrind.

I consider this feature fundamentally broken as your const objects have no refcount field the GCs could use. And once you add these, you need to ensure that a special RC value of e.g. -1 means "don't touch, it's readonly memory". And that code that is usually on the critical path, every check you add there shows up in benchmarks.

@timotheecour
Copy link
Member Author

timotheecour commented Oct 9, 2020

your const objects have no refcount field the GCs could use

The const objects are never codegen'd. Each expression involving a const (eg a(c)) behaves as if the const value was inline, eg a(Foo(le: Foo(data: "abc"))) (by design), so I don't follow your objection.

Try this snippet please:
Compile with --gc:arc -d:useMalloc and run under valgrind.

valgrind isn't supported unfortunately on OSX anymore, so we can try on linux, but I tried without valgrind with --gc:arc -d:useMalloc and got the results I expected, both in terms of program output after adding:

  # after a(c)
  echo cast[int](s[0])
  echo s[0][].data
  echo cast[int](s[0][].le)

as well as in terms of codegen:

a(c) generates essentially same codegen as a(Foo(le: Foo(data: "abc"))) where we inline the const value.

a(c):

	T1_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*)0;
	T1_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*) nimNewObj(sizeof(tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ));
	T2_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*)0;
	T2_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*) nimNewObj(sizeof(tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ));
	(*T2_).le = NIM_NIL;
	(*T2_).data = TM__trcROVvP6NYggAxei7rzRw_5;
	(*T1_).le = T2_;
	(*T1_).data = TM__trcROVvP6NYggAxei7rzRw_7;
//  a(c)
	a__7VXyJR39bgzTxJrehXZdx0g(T1_);

a(Foo(le: Foo(data: "abc")))

	T1_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*)0;
	T1_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*) nimNewObj(sizeof(tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ));
	T2_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*)0;
	T2_ = (tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ*) nimNewObj(sizeof(tyObject_FoocolonObjectType___TuItCDa8xP9cPPUDKgbfgjQ));
	(*T2_).data = TM__trcROVvP6NYggAxei7rzRw_5;
	(*T1_).le = T2_;
	colontmpD_ = T1_;
//  a(Foo(le: Foo(data: "abc")))
	a__7VXyJR39bgzTxJrehXZdx0g(colontmpD_);

(the difference should not be relevant for this PR, ie nil string vs "")

@Araq
Copy link
Member

Araq commented Oct 9, 2020

The const objects are never codegen'd.

Aha! Smart... That changes everything, need to reconsider.

@Araq
Copy link
Member

Araq commented Oct 10, 2020

Well, but then const x: array[10, T] produces very different code based on the T. Also, how does this interact with threads and .gcsafe? See? RFCs could help to clarify these issues without me having to read your source code diffs...

@timotheecour
Copy link
Member Author

Well, but then const x: array[10, T] produces very different code based on the T

again, this is no different than inlining the const value where it's being used at runtime:

when true:
  type Foo = object
    f0: int
  type Goo = ref object
    g0: int
  const a1 = [Foo(f0:2)]
  const a2 = [Goo(g0:3)]
  echo a1[0]
  echo a2[0][]
  echo [Goo(g0:4)][0][]
{
	tyArray__nHXaesL0DJZHyVS07ARPRA T1_;
	tyArray__nHXaesL0DJZHyVS07ARPRA T2_;
	tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ* T3_;
	tyArray__nHXaesL0DJZHyVS07ARPRA T4_;
	tyArray__x2M3wO1Il4HYptGntmwQsA T5_;
	tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ* T6_;
	nimZeroMem((void*)T1_, sizeof(tyArray__nHXaesL0DJZHyVS07ARPRA));
	T1_[0] = dollar___U5KnzlPwjOo0M9b129acZ0tA(TM__trcROVvP6NYggAxei7rzRw_2);
	echoBinSafe(T1_, 1);
	nimZeroMem((void*)T2_, sizeof(tyArray__nHXaesL0DJZHyVS07ARPRA));
	T3_ = (tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ*)0;
	T3_ = (tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ*) newObj((&NTI__9bzDvICyBeK9aVP46T6Eu9cZg_), sizeof(tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ));
	(*T3_).g0 = ((NI) 3);
	T2_[0] = dollar___mY9cPlK9c627nRYPc9b6BuzOg((*T3_));
	echoBinSafe(T2_, 1);
	nimZeroMem((void*)T4_, sizeof(tyArray__nHXaesL0DJZHyVS07ARPRA));
	nimZeroMem((void*)T5_, sizeof(tyArray__x2M3wO1Il4HYptGntmwQsA));
	T6_ = (tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ*)0;
	T6_ = (tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ*) newObj((&NTI__9bzDvICyBeK9aVP46T6Eu9cZg_), sizeof(tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ));
	(*T6_).g0 = ((NI) 4);
	T5_[0] = T6_;
	T4_[0] = dollar___mY9cPlK9c627nRYPc9b6BuzOg((*T5_[(((NI) 0))- 0]));
	echoBinSafe(T4_, 1);
}

@timotheecour
Copy link
Member Author

timotheecour commented Oct 11, 2020

Also, how does this interact with threads and .gcsafe

no interaction.

same as above, this is no different than inlining the const value where it's being used at runtime:

when true:
  type Goo = ref object
    g0: int
  const a = Goo(g0: 1)
  # let a = Goo(g0: 1)
  proc bar(b: int){.thread.} =
    # {.gcsafe.}: # not needed (only needed if using `let a = Goo(g0: 1)` instead of `const a = ...`)
      echo a[] # same as: echo Goo(g0: 1)[]
  var t: Thread[int]
  createThread(t, bar, 1)
  bar(2)
  joinThread(t)

generates:

N_LIB_PRIVATE N_NIMCALL(void, bar__s9b844G4lsRMWUu7eTT89coQ)(NI b) {
	tyArray__nHXaesL0DJZHyVS07ARPRA T1_;
	tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ* T2_;
	nimZeroMem((void*)T1_, sizeof(tyArray__nHXaesL0DJZHyVS07ARPRA));
	T2_ = (tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ*)0;
	T2_ = (tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ*) newObj((&NTI__9bzDvICyBeK9aVP46T6Eu9cZg_), sizeof(tyObject_GoocolonObjectType___d2psUr9ahZ3ClOxT4kVLWgQ));
	(*T2_).g0 = ((NI) 1);
	T1_[0] = dollar___mY9cPlK9c627nRYPc9b6BuzOg((*T2_));
	echoBinSafe(T1_, 1);
}

=> the newObj occurs inside bar, it is hence gcsafe, as would be with echo Goo(g0: 1)[]

@cooldome
Copy link
Member

Please check my understanding.
The codegen for const ref is actually same as for var ref in this PR.
Hence, const refs are now allowed which is good thing but they don't use readonly memory like seq, string do.

@timotheecour
Copy link
Member Author

timotheecour commented Oct 13, 2020

The codegen for const ref is actually same as for var ref in this PR.

no. const (including const ref) is never codegend, unlike var/let. Only usages of a const (including const ref) in a runtime context are codegen'd, treating the const (including const ref) expression as if it were inlined in each place it's used in. eg:

when true:
  type Foo = ref object
    f0: int

  var a1 = Foo(f0:1) # codegend
  var a2 = Foo(f0:1).static # codegend
  const a3 = Foo(f0:1) # not codegend
  var a4 = a3 # codegend as var a4 = Foo(f0:1)
  doAssert a4.f0 == 1
  doAssert a3.f0 == 1 # codegend as doAssert Foo(f0:1).f0 == 1
  doAssert Foo(f0:1).f0 == 1 # same codegen as above line
  doAssert a3.f0 == 1 # codegend again as doAssert Foo(f0:1).f0 == 1: `a3` is inlined in each runtime use
  const a5 = a3 # not codegened, not inlined
  static: doAssert a5.f0 == 1 # not codegend
  var u = 1
  doAssert a5.f0.static == u # Foo(f0:1) is not codegend, the LHS is codegend directly as `1`

  ## practical example:
  block:
    proc expensiveFun(a: int): Foo = Foo(f0: a)
    let b1 = expensiveFun(3).static # Foo(f0: 3) codegend here
    doAssert b1.f0 == 3 # `Foo(f0: 3)` not inlined here, b1 var directly accessed
    const b2 = expensiveFun(3) # in case you need to re-use the const object for further CT usage
    const b3f = b2.f0 # like here
    let b3 = b2 # now you can use b3 at runtime to avoid inlining b2 everywhere it's used at runtime
    doAssert b3.f0 == 3 # `Foo(f0: 3)` not inlined here, b3 var directly accessed

=> check the codegen with: nim r --nimcache:/tmp/d23 -d:danger --embedsrc main.nim

Hence, const refs are now allowed which is good thing but they don't use readonly memory like seq, string do.

again, const refs, like const, aren't codegen'd so use neither uses ROM nor regular memory;
only when used in a runtime context is codegen used, and depending on whether it's a string a cstring, either a direct reference to the ROM (cstring) or a copy from ROM (string) will be used for those.

when true:
  type Foo1 = object
    f0: string
  type Foo2 = ref object
    f0: string
  const a1 = Foo1(f0: "abc1") # no codgen
  const a2 = Foo2(f0: "abc2") # no codgen
  let v1 = a1 # calls `copyStringRC1` from a readonly memory (ROM) to a non-ROM memory
  v1.f0[0].unsafeAddr[] = 'A' # no SIGBUG (not ROM)
  doAssert v1.f0[0] == 'A'
  let v2 = Foo2(f0: "abc2") # also calls `copyStringRC1` from ROM to non-ROM
  let v3 = a2 # same semantic as `let v2 = ...`

  let s4 = "abc3" # also calls `copyStringRC1` from ROM to non-ROM
  s4[0].unsafeAddr[] = 'A'

  # cstring is different
  var s5: cstring = "abc4" # ROM: no copy currently (https://github.com/nim-lang/Nim/issues/8463)
  s5[0] = 'A' # SIGBUS

note

  • const (including const ref) behaves identical to D's enum (aka; manifest constant), it's not codegen'd by itself but pasted in each runtime expression where it's used. D also has a notion of static const, as explained here Support default values for object properties RFCs#126 (comment), for CT expressions written to ROM:

D distinguishes between:
enum a = expr;: enum creates a compile-time only construct; it has no address; pretty close to nim's const a = expr
static const a = new Foo();: this computes at CT as well but puts in the static data segment, and has an address you can access at runtime. in nim, that'd be the analog, for example, of let a = "foo".cstring, which puts "foo" in the data segment as readonly memory

(also mentioned in #8463)

@timotheecour
Copy link
Member Author

@Araq friendly ping on this PR

nim-lang/RFCs#271 (comment)

Thanks. It's growing on me...

@timotheecour
Copy link
Member Author

ping @Araq

@Araq
Copy link
Member

Araq commented Jul 21, 2021

It's still against the "consts must not inline" idea and it's unclear whether it makes the language harder or simpler to use/understand. I can easily write template t: untyped = stuffWithRef instead of const t = suffWithRef. I know, I know, your way is better because "consistency" and "code silos" and "generic code" but there is no proof that consistency has much value as a guiding design principle. Lisp is much more consistent than Python and now look at where these languages are. Lacking any evidence, "Different things should be written differently as it makes things less prone to errors" is an equally valid design principle.

@timotheecour
Copy link
Member Author

timotheecour commented Jul 21, 2021

I can easily write template t: untyped = stuffWithRef instead of const t = suffWithRef.

I don't see how this relates to this PR, nor how does it help with caching CT computations where the result type contains (nested) ref; your approach can't work for that. This PR enables important use cases that were previously impossible, and lots of people want it.

It's still against the "consts must not inline" idea

I don't know what idea that is; consts used in an RT context already get codegen'd as a literal, even before this PR.

@Araq
Copy link
Member

Araq commented Jul 22, 2021

I don't see how this relates to this PR, nor how does it help with caching CT computations where the result type contains (nested) ref; your approach can't work for that.

"when codegen encounters a const sym, it gets expanded out via its ast as if the whole const expression was inlined"

so what kind of caching are you talking about?

@timotheecour
Copy link
Member Author

so what kind of caching are you talking about?

any computation that should be done just once (either for CT performance reasons or for correctness) and involves a ref.
eg:

const data = staticRead("data.json").parseJson

now you can use data at CT (or codegen parts of it to RT) without having to re-read data.json or call parseJson again.

when true:
  import std/json
  import std/wrapnils
  const cfg = staticRead("data.json").parseJson
  const envs = ?.cfg.getOrDefault("envs")
  echo ?.envs.getOrDefault("foo").getOrDefault("bar").getInt.static # codegen's only final `int` in this example

@Araq
Copy link
Member

Araq commented Jul 22, 2021

So what precisely is const cfg = staticRead("data.json").parseJson transformed into?

@timotheecour
Copy link
Member Author

timotheecour commented Jul 22, 2021

const just triggers CT evaluation, ie it's a NimNode, the same as you'd get if it were evaluated in a VM context. It's not codegen'd unless used in a RT context, in which case it's transformed into a literal.

see example below:

  • a isn't codegen'd
  • a1 is codegen'd using a literal computed from a1
  • a2 is codegen'd using a literal explicitly written

the code generated for a1 and a2 is identical (look at the .c files)

when true:
  type N = ref object
    val: int
    lhs, rhs: N
  proc getNode(): N =
    result = N(val: 0)
    result.lhs = N(val: 1)
    result.rhs = N(val: 2)
    result.rhs.lhs = N(val: 3)
  proc toStr(result: var string, a: N) =
    if a == nil:
      result.add "nil"
    else:
      result.add '('
      result.toStr a.lhs
      result.add ',' & $a.val & ','
      result.toStr a.rhs
      result.add ')'
  proc toStr(a: N): string = result.toStr(a)
  proc main =
    const a = getNode()
    let a1 = a
    let a2 = N(val: 0, lhs: N(val: 1, lhs: nil, rhs: nil), rhs: N(val: 2, lhs: N(val: 3, lhs: nil, rhs: nil), rhs: nil))
    echo a1.toStr
    echo a2.toStr
  main()

prints:
((nil,1,nil),0,((nil,3,nil),2,nil))
((nil,1,nil),0,((nil,3,nil),2,nil))

@timotheecour
Copy link
Member Author

ping @Araq

@ringabout
Copy link
Member

It also fixes #6014

@stale
Copy link

stale bot commented Oct 1, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Oct 1, 2022
@stale stale bot closed this Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nim crash; constant array of refs is not reported as an error
4 participants