Fluid new API scaffolding #10313
Conversation
    if path:
        self._load(path)

    def _load(self, path):
Why is _load defined as a private method while save is public?
You are right, this is not optimal. We will remove this class: #10313 (comment)
Removed params.py in #10354
class Params(object):
    def __init__(self, path=None):
        self.scope = core.Scope()
What is the relationship between a Params instance and this scope instance?
Params is a wrapper on top of scope, and provides additional functionality to save and load parameters.
We think it's not necessary, I will send a PR to remove it.
The new interface of the Trainer constructor will be:
class Trainer(object):
    def __init__(self, network_func, optimizer, param_path=None, scope=None, place=None):
The parameters of the trainer can be optionally initialized from the saved parameters in param_path, or taken from scope. If neither is provided, the parameters will be randomly initialized. The scope argument is necessary because we want to share the scope between the inferencer and the trainer; one use case is GAN.
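A minimal sketch of that initialization precedence (param_path first, then scope, else random init). The helper names `_load_params` and `_random_init` are hypothetical, and a plain dict stands in for core.Scope:

```python
# Sketch of the Trainer constructor's parameter-initialization order.
# Helper names are hypothetical; a plain dict stands in for core.Scope.
class Trainer(object):
    def __init__(self, network_func, optimizer, param_path=None,
                 scope=None, place=None):
        self.network_func = network_func
        self.optimizer = optimizer
        self.place = place
        # Reuse the caller's scope so it can be shared with an inferencer
        # (e.g. the GAN use case); otherwise create a fresh one.
        self.scope = scope if scope is not None else {}
        if param_path is not None:
            self._load_params(param_path)  # hypothetical helper
        elif scope is None:
            self._random_init()            # hypothetical helper

    def _load_params(self, path):
        # Placeholder for loading saved parameters from param_path.
        self.scope["init"] = "loaded:" + path

    def _random_init(self):
        # Placeholder for random parameter initialization.
        self.scope["init"] = "random"
```

If a scope is passed in, the trainer neither loads nor re-initializes, so parameters already living in the shared scope are kept.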
Removed params.py in #10354
    # reference: save_persistables in io.py
    pass

    def add_params(self, scope):
I don't quite understand here -- the method name is add_params, but the method comment says that it takes keys from the scope. Does it really add something?
We plan to remove this class, and add a merge method to core.Scope. It takes another scope and sets all the vars from that scope on itself. If vars with the same name are present, its own vars will be overridden.
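A toy model of those merge semantics, with plain dicts standing in for core.Scope (an assumption for illustration): every var from the other scope is set on the receiving one, and the other scope wins on a name clash.

```python
# Toy model of the proposed Scope.merge semantics, using dicts as a
# stand-in for core.Scope: vars from `other` are set on `target`;
# on a name clash, `other` overrides `target`'s existing var.
def merge(target, other):
    for name, var in other.items():
        target[name] = var
```

With these semantics the operation is equivalent to `target.update(other)` on a dict.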
For the Scope class, can we expose vars and make it public? That might be necessary to implement the merge function.
We could use LocalVarNames to do the iteration as well, so if we do want to hide vars_, we can still do that.
@jetfuel maybe we can implement merge in C++? What do you think?
Removed params.py in #10354
@helinwang Yes, it should be in C++. My approach is to use LocalVarNames and check the names one by one.
void Scope::merge(Scope* scope) {
  std::vector<std::string> names = scope->LocalVarNames();
  for (auto& name : names) {
    auto it = vars_.find(name);
    if (it == vars_.end()) {
      vars_[name] = scope->FindVar(name);
    }
  }
}
@jetfuel Thanks! Every var in the scope argument will be put into the receiving scope; if the receiving scope already has a var with the same name, it will be replaced. So I guess we can just do:
void Scope::merge(Scope* scope) {
  for (auto& name : scope->LocalVarNames()) {
    vars_[name] = scope->FindVar(name);
  }
}
Sorry the spec was different from what we discussed yesterday (yesterday's was to not replace if already exists).
class Event(object):
    BEGIN_EPOCH = 0
    END_EPOCH = 1
    BEGIN_STEP = 2
Is the "step" here "iteration"?
I think "step" is more commonly used in deep learning, while "iteration" is more often used in programming languages. Since this event is about training, I think the user would be more familiar with "step".
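A hypothetical sketch of how a user-supplied event_handler might dispatch on the Event constants from the scaffolding (the event class shapes here are an assumption, not the final API):

```python
# Hypothetical sketch of an event_handler dispatching on the Event
# constants from the scaffolding (BEGIN_EPOCH, END_EPOCH, BEGIN_STEP).
class Event(object):
    BEGIN_EPOCH = 0
    END_EPOCH = 1
    BEGIN_STEP = 2

class BeginEpochEvent(Event):
    def __init__(self):
        self.type = Event.BEGIN_EPOCH

def event_handler(event):
    # The trainer would invoke this callback at each training event.
    if event.type == Event.BEGIN_EPOCH:
        return "begin epoch"
    elif event.type == Event.END_EPOCH:
        return "end epoch"
    elif event.type == Event.BEGIN_STEP:
        return "begin step"
```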
        self.place = place
        # TODO(helin): support distributed training

    def train(self, reader, num_epochs, event_handler):
I don't quite remember how we define a pass (epoch) with our current reader design. If a reader reads gRPC requests, it would never end and would have no pass separations, right?
A call to the reader returns an iterator that contains a single pass of data. And yes, if a reader reads gRPC requests, it would never end and would have no pass separations.
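That contract can be sketched as follows (names are hypothetical): each call to the reader yields one pass over the data, so the training loop makes one call per epoch.

```python
# Sketch of the reader contract: calling the reader returns an iterator
# over a single pass of data, so num_epochs passes means num_epochs calls.
def make_reader(data):
    def reader():
        for sample in data:
            yield sample
    return reader

def train_loop(reader, num_epochs):
    seen = []
    for epoch in range(num_epochs):
        for sample in reader():  # one call == one full pass
            seen.append(sample)
    return seen
```

An endless gRPC-backed reader would simply never exhaust its iterator, which is why it has no natural pass boundary.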
        self.type = Event.BEGIN_EPOCH


class Trainer(object):
Is this a base class, which implies that we need derived classes like MultiGPUTrainer, MultiNodeTrainer, EDLTrainer, etc., or is it a plain class in which both single-node and multi-node training logic should be implemented?
The current plan is "a plain class in which both single-node and multi-node training logic are implemented"; the logic is controlled by an environment variable. Here is an example: #10316
The scaffolding defines the skeleton of each class, so we can start implementing in parallel.
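A hypothetical sketch of that "plain class, environment-variable controlled" design: one Trainer class selects single-node or distributed logic from an environment variable (the variable name `TRAINING_ROLE` and its values are assumptions for illustration).

```python
import os

# Hypothetical sketch: a single plain Trainer class picks local or
# distributed training logic from an environment variable. The variable
# name TRAINING_ROLE and its values are assumptions, not the final spec.
class Trainer(object):
    def _training_mode(self):
        # A cluster launcher would set this variable on each process;
        # without it, the trainer falls back to single-node training.
        if os.environ.get("TRAINING_ROLE") in ("PSERVER", "TRAINER"):
            return "distributed"
        return "local"
```

This keeps a single user-facing class while the launcher, not the user code, decides the execution mode.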