defModule macro to define neural nets without inline C++ #5

Open
mratsim opened this issue Jan 10, 2021 · 30 comments
Labels: enhancement (New feature or request)

@mratsim (Collaborator) commented Jan 10, 2021

Currently we need to inline C++ code because Nim cannot generate C++ code with default values, nor wrap types without default constructors (nim-lang/Nim#4687).

This leads to the following code:

emitTypes:
  """
  struct Net: public torch::nn::Module {
    torch::nn::Linear fc1{nullptr};
    torch::nn::Linear fc2{nullptr};
    torch::nn::Linear fc3{nullptr};
  };
  """

type Net {.pure, importcpp.} = object of Module
  fc1: Linear
  fc2: Linear
  fc3: Linear

with emitTypes being a workaround for nim-lang/Nim#16664:

template emitTypes*(typesection: static string): untyped =
  ## Emit a C/C++ typesection
  {.emit: "/*TYPESECTION*/\n" & typesection.}

template emitGlobals*(globals: static string): untyped =
  ## Emit a C/C++ global variable declaration
  {.emit: "/*VARSECTION*/\n" & globals.}

template emitIncludes*(includes: static string): untyped =
  ## Emit a C/C++ include
  {.emit: "/*INCLUDESECTION*/\n" & includes.}

Instead it would be more convenient to have a macro defModule (name subject to bikeshedding) with the following syntax:

defModule:
  type Net = object of Module
    fc1: Linear = nil
    fc2: Linear = nil
    fc3: Linear = nil

which generates the proper C++ code (nil mapped to {nullptr}) and rewrites the type section with the proper {.pure, importcpp.} pragma.

This is similar to nim-lang/RFCs#126, nim-lang/RFCs#252

Implementation details:

  • To interpolate with the proper symbol, use the syntax {.emit: ["type something {", NimTypeInterpolated, " field{nullptr};"].} (see the sketch after this list).
  • With interpolation, we can properly support both C++ types (without extracting names from their importcpp pragma) and Nim types (without fiddling with gensym).
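
A minimal sketch of that interpolation, assuming (as the first bullet suggests) that emit's array form accepts type symbols; the Linear declaration here is illustrative:

type Linear {.pure, importcpp: "torch::nn::Linear".} = object

# Interpolating the type symbol lets the compiler substitute the
# importcpp name (torch::nn::Linear) into the emitted C++ for us:
{.emit: ["/*TYPESECTION*/\nstruct Net: public torch::nn::Module {\n  ",
         Linear, " fc1{nullptr};\n};"].}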
mratsim added the enhancement label on Jan 10, 2021
@HugoGranstrom (Member) commented:

Thus far I've got an untyped macro that can generate the emitted C++ code and add the correct pragmas. Would it make sense to also generate an init proc like the one here:

emitTypes:
  """
  struct Net: public torch::nn::Module {
    torch::nn::Conv2d conv1{nullptr};
    torch::nn::Conv2d conv2{nullptr};
    torch::nn::Dropout2d conv2_drop{nullptr};
    torch::nn::Linear fc1{nullptr};
    torch::nn::Linear fc2{nullptr};
  };
  """

type Net {.pure, importcpp.} = object of Module
  conv1: Conv2d
  conv2: Conv2d
  conv2_drop: Dropout2d
  fc1: Linear
  fc2: Linear

proc init(net: var Net) =
  net.conv1 = net.register_module("conv1", Conv2d.init(1, 10, 5))
  net.conv2 = net.register_module("conv2", Conv2d.init(10, 20, 5))
  net.conv2_drop = net.register_module("conv2_drop", Dropout2d.init())
  net.fc1 = net.register_module("fc1", Linear.init(320, 50))
  net.fc2 = net.register_module("fc2", Linear.init(50, 10))

We could use your syntax from above (if we assume all types are initialized with nullptr) and write it like this:

defModule:
  type Net = object of Module
    conv1: Conv2d = (1, 10, 5)
    conv2: Conv2d = (10, 20, 5)
    conv2_drop: Dropout2d # no value means passing no parameters
    fc1: Linear = (320, 50)
    fc2: Linear = (50, 10)

Then the values would be passed to the type's init proc (a sketch of that rewrite follows below). We could make generating it optional as well, so people could choose. But this of course builds on the fact that we are using an untyped macro, as the syntax isn't legal Nim code.
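
For illustration, the transformation of a single field inside the macro could look roughly like this (the helper name and shape are hypothetical):

import std/macros

# Hypothetical sketch: turn `conv1: Conv2d = (1, 10, 5)` into
# `net.conv1 = net.register_module("conv1", Conv2d.init(1, 10, 5))`
proc genRegister(field, typ, args: NimNode): NimNode {.compileTime.} =
  let init = newCall(newDotExpr(typ, ident"init"))
  for arg in args:                 # unpack the tuple literal (1, 10, 5)
    init.add arg
  result = newAssignment(
    newDotExpr(ident"net", field),
    newCall(newDotExpr(ident"net", ident"register_module"),
            newLit($field), init))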

I'm not too used to Torch yet so I'm not really sure whether this is an approach that would work in general for a typical Net, so any feedback is appreciated.

@mratsim (Collaborator, Author) commented Mar 17, 2021

I think something like this is better:

defModule:
  type Net = object of Module
    conv1 = Conv2d(1, 10, 5)
    conv2 = Conv2d(10, 20, 5)
    conv2_drop: Dropout2d # no value means passing no parameters
    fc1 = Linear(320, 50)
    fc2 = Linear(50, 10)

It avoids floating untyped parentheses and mimics object construction.

The tricky parts are:

  • shared weights which can have a dedicated syntax
  • nested nets, for example initializing a Net2 from Net

@HugoGranstrom (Member) commented:

That does indeed look more intuitive. 👍 And with the power of pattern matching, it's not that much harder either 😎

  • Yeah, I'll keep that in mind! I'll try my best to make the code extensible for future special syntaxes.
  • That one sounds a bit trickier 🤔 Are you thinking about transfer learning, using the weights from a pretrained network or something like that? Then I could see the problems if one wants to modify it by replacing the last layers or something.

And another thing that I think got lost yesterday: should we support Nim types in a Net? I don't really see how we could include them in the emitted C++ types, so I'd say no 😛 Plus we have no way of telling whether a field is a Nim type or a C++ type in an untyped macro.

@HugoGranstrom (Member) commented:

Think I realized what you meant by nested nets now, something along these lines, right?

defModule:
  type
    Net = object of Module
      conv1 = Conv2d(1, 10, 5)
      conv2 = Conv2d(10, 20, 5)
      conv2_drop: Dropout2d # no value means passing no parameters
      fc1 = Linear(320, 50)
      fc2 = Linear(50, 10)
    Net2 = object of Module
      net1: Net
      fc = Linear(100, 100)

If we go for an untyped macro, we'd need a separate syntax for nested nets. Perhaps custom Nets could be specified in "Nim style" (net1: Net) while built-in layers are defined using our new "construction style" (fc = Linear(100, 100)):

Net2 = object of Module
  net1: Net
  fc = Linear(100, 100)

This would be easy to parse at least. But then again, it depends on what types we should allow in a Net.

@Vindaar (Member) commented Mar 17, 2021

Why not just do:

defModule:
  type 
    Net = object of Module 
      conv1 = Conv2d(1, 10, 5)
      conv2 = Conv2d(10, 20, 5)
      conv2_drop: Dropout2d # no value means passing no parameters
      fc1 = Linear(320, 50)
      fc2 = Linear(50, 10)
    Net2 = object of Module
      net1 = Net()
      fc = Linear(100, 100)

in those cases?

@HugoGranstrom (Member) commented Mar 17, 2021

@Vindaar Because we need to be able to differentiate between them when generating the emitted C++. The code for Net2 should be:

struct Net2: public torch::nn::Module {
  Net net1{nullptr}; // here's the difference, no namespace
  torch::nn::Linear fc{nullptr};
};

But the generated code will be this if we can't see a difference between custom Nets (that we generate the C++ code for) and the built-in ones:

struct Net2: public torch::nn::Module {
  torch::nn::Net net1{nullptr}; // here's the difference, wrong namespace
  torch::nn::Linear fc{nullptr};
};

Built-in layers like conv2_drop: Dropout2d could also lack initial parameters, and could (and perhaps should) be specified as conv2_drop = Dropout2d(). Perhaps that clashes with what you suggested?

@Vindaar (Member) commented Mar 17, 2021

Ah, but that can be easily solved by keeping track of all nets that were user-defined (e.g. Net and Net2 in this example) and only adding the torch::nn prefix for those that are not user-defined, no?

@HugoGranstrom (Member) commented Mar 17, 2021

Yes, that should work, but then we're back at how to do that safely between macros, as people could use multiple defModule calls and use types defined in one in another. Thus we would need a shared compile-time (CT) table/set. (macrocache doesn't support checking whether a key is in a table, btw.)

edit: This approach also makes the C++ generation more complex, as it must first parse all the code and then create all of the C++ structs. The current approach just creates the C++ code while parsing, which is simpler. With different syntaxes that could still be the case, but that's just me being lazy 🤣
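
For what it's worth, a shared table across defModule invocations could be built on std/macrocache; a minimal sketch (the table name and helper are hypothetical), iterating over the pairs to work around the missing membership check:

import std/[macros, macrocache]

# Hypothetical cache shared by all defModule invocations
const userNets = CacheTable"flambeau.userDefinedNets"

proc isUserNet(name: string): bool {.compileTime.} =
  # macrocache has no hasKey/contains, but we can iterate the pairs
  for key, _ in userNets:
    if key == name:
      return true

# Inside defModule, after parsing `type Net = object of Module`:
#   userNets["Net"] = newEmptyNode()
# When emitting a field's C++ type, the torch::nn:: prefix is then
# added only if not isUserNet(typeName)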

@Vindaar (Member) commented Mar 17, 2021

Oh ok, now I'm starting to see what you're talking about.

A shared CT table is something that should be supported anyway, imo. So it's only a matter of time until it is implemented. Maybe one of us can help out improving the macrocache module for the purpose?

With the fully untyped, stateless approach, another option might be something like wrapping the name in a call, e.g.:

defModule:
  type 
    Net = object of Module 
      conv1 = Conv2d(1, 10, 5)
      conv2 = Conv2d(10, 20, 5)
      conv2_drop: Dropout2d # no value means passing no parameters
      fc1 = Linear(320, 50)
      fc2 = Linear(50, 10)
    Net2 = object of Module
      net1 = custom(Net())
      fc = Linear(100, 100)

e.g. custom indicates this is not a builtin torch type (bikeshed on the name: custom, local, ...). It leaves the syntax alive, is easy to extract, and can be documented well.
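
Distinguishing the two styles in an untyped macro then comes down to inspecting the right-hand side of each field definition. A minimal sketch, assuming the field reaches the macro as an assignment node (the helper name is hypothetical):

import std/macros

proc isCustomField(def: NimNode): bool =
  ## True for `net1 = custom(Net())` or `net1 = custom Net()`,
  ## false for builtin-style `fc = Linear(100, 100)`
  def.expectKind nnkAsgn
  let rhs = def[1]
  rhs.kind in {nnkCall, nnkCommand} and rhs[0].eqIdent("custom")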

@HugoGranstrom (Member) commented:

Yeah, it definitely should be supported, and we probably should help improve it. But it feels like it's quite convoluted to handle everything in an IC-safe (incremental compilation) way without digging our fingers into how the internals of IC work 🤔

That approach is more feasible and future-proof in my opinion 😄 And you have a good point in that it is easier to document and understand than different syntaxes. The hardest part is of course finding a good name for it 🤣 but custom is a good name that may very well stay. And it looks even more like a syntactic thing with net1 = custom Net() (that reminds me, I'll have to handle two cases :P)

@HugoGranstrom (Member) commented Mar 19, 2021

MNIST now works without having to manually emit the C++ type, and the init proc is generated automatically as well. I'm not really sure how to handle the nested Nets though, specifically how to emit the C++ code for them. I thought I could just do:

struct Net: public torch::nn::Module {
	torch::nn::Conv2d conv1{nullptr};
	torch::nn::Conv2d conv2{nullptr};
	torch::nn::Dropout2d conv2_drop{nullptr};
	torch::nn::Linear fc1{nullptr};
	torch::nn::Linear fc2{nullptr};
};
struct Net2: public torch::nn::Module {
	torch::nn::Linear fc1{nullptr};
	Net net1{nullptr}; // error here
};

But then I get this error:

error C2664: 'Net::Net(const Net &)': cannot convert argument 1 from 'nullptr' to 'const Net &'
note: Reason: cannot convert from 'nullptr' to 'const Net'
note: No constructor could take the source type, or constructor overload resolution was ambiguous

It feels like Net isn't really a shared_ptr, as it should be? Any ideas on how to approach it? Must we define it as a pointer ourselves, or does the inheritance handle that for us?

@Vindaar (Member) commented Mar 19, 2021

Great work!

About the issue, maybe this doesn't make any sense, but:
Maybe the issue is simply that Net does not define a constructor that takes a single pointer? So the solution would be to add a simple constructor line for each custom module we create? I tried checking the source to see what kind of constructors the already-implemented classes provide, but when I saw the TORCH_MODULE macro here:
https://pytorch.org/cppdocs/api/program_listing_file_torch_csrc_api_include_torch_nn_modules_linear.h.html
I kinda gave up.

To be honest, I don't even know exactly what providing a value within a struct definition does. I suppose it sets a default value?

@HugoGranstrom (Member) commented Mar 19, 2021

Thanks 😄

That sounds very reasonable: the built-in layers are shared_ptrs, so making nullptr the default makes sense for them, but not for us. So the question is what to put there instead. I had a look here https://pytorch.org/tutorials/advanced/cpp_frontend.html and they seem to define a struct first and then run TORCH_MODULE to create a pointer-y type of it somehow. But I'm not sure we want that? We would have to change all net.param to net->param, right? Plus they have the initialization in the struct body itself, which sucks for us. I'm going to think about this in the sauna for a moment now 🤔

@HugoGranstrom (Member) commented:

Something as easy as removing the default value altogether seems to have stopped the compiler from complaining, at least: Net net1;. Sometimes it's so easy 🤣 Don't know if it works at runtime though; I must test whether a nested net runs, but that's for tomorrow 🤞

@Vindaar (Member) commented Mar 19, 2021

> Something as easy as removing the default value altogether seems to have stopped the compiler from complaining, at least: Net net1;. Sometimes it's so easy. Don't know if it works at runtime though; I must test whether a nested net runs, but that's for tomorrow

Indeed, that does make sense. Now you're just saying "this struct contains a member Net", same as when defining a Nim object with a field Net.
The question is: should we eventually support those default values? I have to read your link up there in more detail, but it does seem like TORCH_MODULE mainly turns a type T into a fancy shared_ptr<T> wrapped in an object.

@HugoGranstrom (Member) commented:

Yeah exactly 👍
Idk tbh. We are kind of handling the default values ourselves in our generated init proc, so it seems unnecessary to try to make the generated C++ code even more complex?
Yes, that's my understanding of it as well. And for the builtin layers it doesn't matter, because we don't access their members, just a few select functions that we have wrapped. But for a Nim Net we do want to be able to access the fields, and that's harder if we wrap it in a shared_ptr, I think. But maybe we could fix that somehow as well; I have to think a bit more on it.

@Vindaar (Member) commented Mar 19, 2021

Personally I would probably try to see how far your current approach can work. It's "only" a macro and we can always rewrite parts of it. :)

@HugoGranstrom (Member) commented:

Agreed! Both because it is simpler and because I don't want to start over 🤪

@HugoGranstrom (Member) commented:

Tested it and initializing nested Nets seems to work (tomorrow is another day, namely today :P)

Something that doesn't work is defining custom Nets in one file and importing them into another, though. And that makes sense, as the emitted C++ in the original file isn't visible to the file that imports the types. I guess this is an issue for us? A hack I found to "solve" it is to define the types in a template, then import and run the template. This way all emitted types are visible. Here's an example:

# nets.nim
template importNets*: untyped {.dirty.} =
  defModule:
    type 
      Net* = object of Module
        conv1* = Conv2d(1, 10, 5)
        conv2* = Conv2d(10, 20, 5)
        conv2_drop* = Dropout2d()
        fc1* = Linear(320, 50)
        fc2* = Linear(50, 10)
      Net2* = object of Module
        net1* = custom Net()
        fc1* = Linear(4, 1)
# training.nim
import ./nets
importNets()
var model: Net
model.init()

I don't know how we could solve it otherwise unless we can get the emitted C++ code visible across files. It's the C++ compiler that complains in the end that Net is undeclared.

@HugoGranstrom (Member) commented Mar 20, 2021

Have done some further testing with nesting now, and it doesn't seem to work unless we register the subnet, and registering seems to expect a shared_ptr to a net. So we may have to use TORCH_MODULE after all :/
The only thing that would be cumbersome is that we can't access the members with dot notation and instead must use arrows. Is that something that's acceptable? Otherwise, we could try writing a dot macro that rewrites all unresolved dot operations to arrows: https://nim-lang.github.io/Nim/manual_experimental.html#special-operators-dot-operators
So something like net.conv1 becomes net->conv1.
Any better suggestions, now that we have to use the pointer version of the nets?

Edit: realized that the dot macros won't work, because the Nim compiler thinks it can access the fields directly, so the macro won't be invoked. :/

@Vindaar (Member) commented Mar 20, 2021

I'm not sure what you mean by we have to use ->. Do you mean within the emit statements of the macro?

@HugoGranstrom (Member) commented:

No, I mean when we want to access the fields of our Nets (in the forward proc mostly). If our Net is a shared_ptr (it's a wrapper around a shared pointer in reality), then we can't access the fields with net.conv1.forward(x) in C++ but instead must use net->conv1.forward(x). But the C++ code that the Nim compiler generates uses a dot instead of an arrow. So it has nothing to do with the emit, just with what code the compiler generates.

@HugoGranstrom (Member) commented Mar 20, 2021

Sorry for spamming, just want to get my thoughts down in writing while they are on the top of my head.

Here's how it works in C++:

struct LinearImpl : torch::nn::Module {
  // define the members here
};

TORCH_MODULE(Linear);

LinearImpl is an ordinary Module with value semantics. Linear, on the other hand, is an nn::ModuleHolder, which is a wrapper around shared_ptr<LinearImpl>. And it's this that seems to be the recommended way of doing things in LibTorch. The question then is how to wrap this in Nim.
We can't really wrap LinearImpl directly, because we won't get a shared pointer then. And if we wrap Linear we get the pointer, but now we can't access the fields (according to the Nim compiler, which only sees that it is an "empty" imported type). So my guess would be that we need to wrap the Linear type but somehow provide the fields of the struct to the Nim compiler, and make sure it outputs C++ code that uses -> instead of . for field access.
A naive idea would be to generate a proc in the macro for each field, defined using importcpp: "#->`fieldName`", like this:

type Net {.importcpp.} = object of ModuleHolder
# it has fields conv1, conv2, fc1
# generate the procs for accessing them:
proc conv1*(net: var Net): ModuleHolder {.importcpp: "#->conv1".}
proc conv2*(net: var Net): ModuleHolder {.importcpp: "#->conv2".}
proc fc1*(net: var Net): ModuleHolder {.importcpp: "#->fc1".}

# Now it can be used as an ordinary object:
var net: Net
net.init()
let y = net.conv1.forward(x)

Another approach would be to wrap LinearImpl in a shared_ptr<LinearImpl> ourselves and then use a dot macro (Mamy has already implemented one for methods I think, so doing one for field access should be doable). This way the Nim compiler could know of the fields' existence without us having to tell it about them. In this case the code would be something like:

type Net = CppSharedPtr[NetImpl]

var net: Net
net.init()
let y = net.conv1.forward(x) # `.` macro handles it so we generate `->` instead of `.`

I haven't implemented any of them yet though so no idea if it could work.

edit: Not sure if it was clear but the reason we need the module as a pointer is that register_module, which is used to register submodules (nested nets), requires the model to be a ModuleHolder or a shared_ptr.
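
For reference, that constraint would surface in a wrapper signature roughly like this (a hypothetical sketch, not the actual binding; libtorch's C++ register_module takes a name and a shared_ptr and returns it):

# Hypothetical sketch, assuming the Module and CppSharedPtr[T]
# wrapper types discussed in this thread; the submodule must
# already be a shared_ptr (or ModuleHolder) for this to compile
proc register_module*[T](parent: var Module, name: cstring,
                         submodule: CppSharedPtr[T]): CppSharedPtr[T]
  {.importcpp: "#.register_module(@)".}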

@HugoGranstrom (Member) commented:

Here comes a bit more rambling from me. I've thought a bit more about this, and I think doing it the torch::nn::ModuleHolder way is the way to go. The strongest reason is that we'd have to implement it either way, because the built-in layers like Linear and Conv2d are ModuleHolders and not Modules. So my thinking is that we should do it the same way LibTorch does it, something like this:

type
  Module* {.bycopy, pure, inheritable, importcpp: "torch::nn::Module".} = object
    ## A LibTorch neural network module that can be inherited from
  ModuleHolder*[T: Module] {.bycopy, pure, importcpp: "torch::nn::ModuleHolder<'0>".} = object
    ## A LibTorch wrapper around std::shared_ptr<Module>

type
  LinearImpl* {.pure, bycopy, importcpp: "torch::nn::LinearImpl".} = object of Module
    options* {.importc.}: LinearOptions
    weight* {.importc.}: Tensor
    bias* {.importc.}: Tensor
  Linear* {.pure, bycopy, importcpp: "torch::nn::Linear".} = ModuleHolder[LinearImpl]
    # Linear is a shared_ptr underneath.
    # The ptr is bycopy, which results in the actual data being byref.

  Conv2dImpl* {.pure, bycopy, importcpp: "torch::nn::Conv2dImpl".} = object of Module
    options* {.importc.}: Conv2DOptions
    weight* {.importc.}: Tensor
    bias* {.importc.}: Tensor
  Conv2d* {.pure, bycopy, importcpp: "torch::nn::Conv2d".} = ModuleHolder[Conv2dImpl]
    # Conv2d is a shared_ptr underneath.
    # The ptr is bycopy, which results in the actual data being byref.

Then we just define a dot macro to handle the dereferencing of a ModuleHolder[Module] to Module and then everything should hopefully work.
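
A sketch of such a dot operator, using Nim's experimental dotOperators feature (the deref helper is illustrative):

{.experimental: "dotOperators".}

# Illustrative dereferencing helper: maps to C++ `(*holder)`
proc deref[T](holder: ModuleHolder[T]): var T {.importcpp: "(* #)".}

template `.`*[T](holder: ModuleHolder[T], field: untyped): untyped =
  ## Rewrites `net.conv1` into `(*net).conv1`, the equivalent of
  ## C++ `net->conv1`
  deref(holder).field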

@HugoGranstrom (Member) commented:

Of course, Nim doesn't like it when we use inherited types as generic type parameters 🤣 :P Have to find another way to do this, it seems.

@HugoGranstrom (Member) commented Mar 22, 2021

Eureka! I think I've got it to work finally 🙃 So I'm leaning more and more towards using CppSharedPtr now, because we can get at the underlying type then. Hopefully nothing breaks until I'm back from a break now 😱

@HugoGranstrom (Member) commented:

@mratsim Any reason why you went with

var net: Net
net.init()

# instead of

var net: Net = Net.init()

The second way is how the builtin layers are initialized so for consistency we could use it for custom Nets as well?

@HugoGranstrom (Member) commented:

Ok, now I've got something that actually works for nested nets as well, and I've added support for parameters (Tensors) too. So my final proposal for the syntax is this:

defModule:
  type
    Net* = object of Module
      conv1* = Conv2d(1, 10, 5)
      conv2* = Conv2d(10, 20, 5)
      conv2_drop* = Dropout2d()
      fc1* = Linear(320, 50)
      fc2* = Linear(50, 10)
    Net2 = object of Module
      anotherNet = custom Net
      w = param randn(100, 100)
      bias = param randn(100)

var net = Net.init()
var net2 = Net2.init()

The only thing that feels redundant is object of Module, mainly because it isn't entirely correct: Net is a CppSharedPtr[NetImpl] after all. The only reason I could see for leaving it be would be if we want to allow the user to define non-net types in defModule as well, but I don't know.

Now you are free from my spamming here for now 🤣

@mratsim (Collaborator, Author) commented Mar 22, 2021

> @mratsim Any reason why you went with
>
> var net: Net
> net.init()
>
> # instead of
>
> var net: Net = Net.init()
>
> The second way is how the builtin layers are initialized so for consistency we could use it for custom Nets as well?

Because some layers might not be copyable and so I want to avoid assignments.

@HugoGranstrom (Member) commented Mar 22, 2021

@mratsim Okay 👍 But if we wrap all of them in shared_ptr we won't have to deal with that, right? And from what I know, most, if not all, built-in layers seem to be ModuleHolders and thus are fancy shared_ptrs and have reference semantics. Do you know of any case that I haven't mentioned here? 😄
