Skip to content

Fluid new API scaffolding #10313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 1, 2018
Merged

Fluid new API scaffolding #10313

merged 5 commits into from
May 1, 2018

Conversation

helinwang
Copy link
Contributor

The scaffolding defines the scaffold of each class, so we can start implementing in parallel.

@helinwang helinwang requested a review from cs2be May 1, 2018 22:40
cs2be
cs2be previously approved these changes May 1, 2018
@helinwang helinwang merged commit d1f9959 into develop May 1, 2018
@helinwang helinwang deleted the new_api_scaffolding branch May 1, 2018 23:55
if path:
self._load(path)

def _load(self, path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why load is defined as a private method but save it public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this is not optimal. We will remove this class: #10313 (comment)

Copy link
Contributor Author

@helinwang helinwang May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed params.py in #10354


class Params(object):
def __init__(self, path=None):
self.scope = core.Scope()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relationship between a Params instance and this scope instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Params is a wrapper on top of scope, and provides additional functionality to save and load parameters.

We think it's not necessary, I will send a PR to remove it.

The new interface of the Trainer constructor will be:

def Trainer(object):
  def __init__(self, network_func, optimizer, param_path=None, scope=None, place=None):

The parameters of the trainer can be optional initialized by the saved parameters in param_path, or taken from scope. If nothing is provided, the parameters will be randomly initialized.

The scope argument is necessary because we want to share the scope between inferencer and trainer. One use case is GAN.

Copy link
Contributor Author

@helinwang helinwang May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed params.py in #10354

# reference: save_persistables in io.py
pass

def add_params(self, scope):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand here -- the method name is add params, but the method comment says that it takes keys from the scope. Does it really add something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We plan to remove this class, and add a merge method to core.Scope. It takes another scope, and set all the vars in the other scope to itself. If vars with the same name is present, its own vars will be overridden.

Copy link
Contributor

@jetfuel jetfuel May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Scope class, can we expose the vars and make it public? That might be necessary to implement the merge function.

We could use LocalVarNames to do the iteration as well. So if we do want to hide the vars_, we can still do that.

Copy link
Contributor Author

@helinwang helinwang May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jetfuel maybe we can implement merge in the C++? how do you think?

Copy link
Contributor Author

@helinwang helinwang May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed params.py in #10354

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helinwang Yes, it should be in the C++. My approach is to use the LocalVarNames and check one by one.

void Scope::merge(Scope* scope) const {
  std::vector<std::string> names = scope.LocalVarNames;
  for (auto name_it = names.begin(); name_it != names.end(); name_it ++) {
    auto it = vars_.find(*name_it);
    if (it == vars_.end()) {
      vars_[*name_it] = scope.Var(*name_it)
    }
  }
}

Copy link
Contributor Author

@helinwang helinwang May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jetfuel Thanks! Every var in the scope argument will be put into the receiving scope, if the receiving scope already has the same name, it will be replaced. So I guess we can just do:

void Scope::merge(Scope* scope) const {
  std::vector<std::string> names = scope.LocalVarNames;
  for (auto name_it = names.begin(); name_it != names.end(); name_it ++) {
    vars_[*name_it] = scope.Var(*name_it)
  }
}

Sorry the spec was different from what we discussed yesterday (yesterday's was to not replace if already exists).

class Event(object):
BEGIN_EPOCH = 0
END_EPOCH = 1
BEGIN_STEP = 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "step" here "iteration"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think step is more commonly used in deep learning. In the same time iteration is more often used in programming languages. Since here is the event for training, I think the user would be more familiar with "step".

self.place = place
# TODO(helin): support distributed training

def train(self, reader, num_epochs, event_handler):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite remember how do we define a pass, and epoch, with our current reader design? If a reader reads gRPC requests, it would never end and would have no pass separations, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A call to the reader will return a iterator that contains a single pass of data. Yes, if a reader reads gRPC requests, it would never end and would have no pass separations.

self.type = Event.BEGIN_EPOCH


class Trainer(object):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a base class, which implies that we need derived classes like MultiGPUTrainer, MultiNodeTrainer, EDLTrainer, etc, or it is a plain class and both single-node and multi-node training logics should be implemented in it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current plan is "a plain class and both single-node and multi-node training logics should be implemented in it", the logic is controlled by environment variable. Here is an example: #10316

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants