A very preliminary draft #2696
Conversation
class Variable {
 public:
  bool Estimated() const;
  bool SetEstimated(bool);
What is the meaning of this returned bool?
I think we can make it just void.
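For concreteness, a minimal sketch of the interface with `SetEstimated` returning `void` (the member and class bodies here are illustrative, not the actual Paddle header):

```cpp
#include <iostream>

// Illustrative sketch: SetEstimated has nothing meaningful to return.
class Variable {
 public:
  bool Estimated() const { return estimated_; }
  void SetEstimated(bool estimated) { estimated_ = estimated; }

 private:
  bool estimated_ = false;
};

int main() {
  Variable v;
  v.SetEstimated(true);
  std::cout << std::boolalpha << v.Estimated() << "\n";  // prints "true"
  return 0;
}
```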
The global and nested local scopes form a hierarchy. The following Python functions make it convenient to program scopes:

1. `paddle.scope.current()` returns the current scope, which defaults to
Is this line incomplete? It ends with "which defaults to".
I meant that it defaults to the value returned by the next (second) line. I will rewrite it to make it clear.
```
};
```

Another example is that the Gemm operator needs to create a tensor on `Context::places_[0]` and assign the tensor to its output variable:
What about data parallelism on multiple GPUs? I think that `places_[0]` should not be used inside an operator's Run method.
Anyway, we must specify one place to aggregate data from multiple GPUs.
@reyoung suggested defining the aggregation as an operator. I am not sure. It seems much easier if a net runs on a single device as in #2696 (comment).
I agree with @QiJune on #2696 (comment) -- using data parallelism and only `places_[0]` would be enough.
if (paddle::platform::IsGPUPlace(ctx.places_[0])) {
  cuDNNGemm(
      Output(0).mutable_data<float>(ctx.places_[0], DerivedSizeFromInputs()),
      ...);
This should be cublasGemm, and cublasGemm needs to acquire a cublasHandle to finish the computation. Also, nearly all computation in a Run method needs to acquire an Eigen device. Please refer to #2648.
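As a rough illustration of what acquiring the handle could look like (the `DeviceContext` struct below is a hypothetical stand-in, not the design from #2648, and error checking is omitted):

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical per-device context that owns the CUDA stream and cuBLAS handle.
struct DeviceContext {
  cudaStream_t stream;
  cublasHandle_t cublas_handle;
};

// Sketch of a GPU GEMM kernel body: C = A * B, with A (M x K), B (K x N),
// and C (M x N) stored in column-major order as cuBLAS expects.
void GpuGemm(const DeviceContext& ctx, const float* A, const float* B, float* C,
             int M, int N, int K) {
  const float alpha = 1.0f, beta = 0.0f;
  cublasSetStream(ctx.cublas_handle, ctx.stream);  // run on the context's stream
  cublasSgemm(ctx.cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N,
              M, N, K, &alpha, A, M, B, K, &beta, C, M);
}
```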
Good point. I am reading and reviewing your design.
```
  std::vector<Place> places_;  // a network might run on multiple devices.
  bool training_;
};
```
I think that the context of an Operator's Run method is different from that of a Net's Run method. The Net can run on multiple devices, but an operator can only run on a specific device. So the operator may need an OpContext to Run.
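One possible way to express that split, as a sketch with made-up names:

```cpp
#include <string>
#include <vector>

// Hypothetical placeholder for a device identifier.
struct Place { std::string name; };

// Net-level context: the whole net may see several devices.
struct NetContext {
  std::vector<Place> places;
};

// Operator-level context: each Run() call is bound to exactly one device.
struct OpContext {
  Place place;
};

// When the net dispatches an operator, it pins the operator to one place.
OpContext MakeOpContext(const NetContext& net_ctx, size_t device_index) {
  return OpContext{net_ctx.places.at(device_index)};
}
```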
To be frank I am not even sure that a net should be able to run on multiple devices. It seems that we can use multiple devices by doing data parallelism -- each device runs only one copy of the net, and we can do gradient aggregation using NCCL. In this way, it seems that both net and operators need just one device.
> it seems that both net and operators need just one device.

Is there a higher-level concept that runs NCCL in C++, or do we just let Python run NCCL? If that concept is in C++, it seems it is also a Network:

class MultiDeviceNetwork {
 private:
  // holds networks on each device.
  vector<Network> networks_;
};

However, I suggest that we only concern ourselves with a single device in the basic Network. It is easy to change a single-device Network into a multi-device Network by using NCCL.
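If that aggregation does live in C++, a minimal sketch of the NCCL all-reduce step could look like the following (the surrounding setup -- `ncclCommInitAll`, stream creation, per-device nets -- is assumed to exist elsewhere):

```cpp
#include <cuda_runtime.h>
#include <nccl.h>
#include <vector>

// Sketch: sum one gradient buffer across devices after each backward pass.
// grads[i] points to device i's gradient buffer of `count` floats; comms and
// streams are one per device, created elsewhere.
void AllReduceGradients(const std::vector<float*>& grads, size_t count,
                        const std::vector<ncclComm_t>& comms,
                        const std::vector<cudaStream_t>& streams) {
  ncclGroupStart();
  for (size_t i = 0; i < grads.size(); ++i) {
    ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  }
  ncclGroupEnd();
}
```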
From what I understand so far, I'd avoid running a network on multiple devices, because maintaining multiple CUDA streams means a very complicated CUDAContext or OpKernelContext or something like that.
And it seems that we can make use of multiple devices via data parallelism, which requires only running a net on a single device.
### Gradient Operators

Each operator has a corresponding gradient operator that defines the gradient computation.
One forward op corresponds not only to one gradient operator, but possibly to several gradient operators. For example, FcOp has three inputs, and not all of them need gradients in practice. We could implement one gradient operator per input, so FcOp could have three gradient ops. More generally, Caffe2 and MXNet return an array of ops as one op's gradient ops.
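A sketch of that shape of API, with made-up types and names just to illustrate returning a list of gradient ops:

```cpp
#include <string>
#include <vector>

// Made-up minimal op description, for illustration only.
struct OpDesc {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// The gradient of an FC-like op with inputs {X, W, b} is a *list* of ops,
// one per input that actually needs a gradient.
std::vector<OpDesc> MakeFcGradOps(bool x_needs_grad, bool w_needs_grad,
                                  bool b_needs_grad) {
  std::vector<OpDesc> grad_ops;
  if (x_needs_grad) grad_ops.push_back({"fc_grad_x", {"Out_grad", "W"}, {"X_grad"}});
  if (w_needs_grad) grad_ops.push_back({"fc_grad_w", {"Out_grad", "X"}, {"W_grad"}});
  if (b_needs_grad) grad_ops.push_back({"fc_grad_b", {"Out_grad"}, {"b_grad"}});
  return grad_ops;
}
```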
x = paddle.variable.operation("x", paddle.operator.gemm(...)) # an operation
x = paddle.variable.tensor("x",
                           numpy.random.randn(200, 100), # set the value of the tensor
                           estimated = true) # will be updated by the backward algorithm.
`estimated` is not a very straightforward name. Maybe `need_backward` or `requires_grad` is better? Not all variables that need gradients will be updated; only parameters are.
### Gradient Operators

A gradient operator should be built and linked only if we are building a binary that supports training. If we are building an "inference-only" binary, we shouldn't link gradient operators.
A gradient operator could also be a forward operator. For example, the gradient operator of operator `mul` is also `mul`.
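For example, if `mul` is matrix multiplication `Z = X·Y`, the gradients are `dL/dX = dL/dZ · Yᵀ` and `dL/dY = Xᵀ · dL/dZ`, which are again matrix multiplications, so the same kernel can serve in both the forward and backward passes.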
```python
x = paddle.variable.new("x")      # create a variable of not yet known type
x = paddle.variable.tensor("x")   # create a tensor typed variable without value
x = paddle.variable.int("x", 10)  # create an unnamed int variable and set to 10
```
Is that variable really unnamed, or is it a typo? Maybe `x = paddle.variable.int(10)`?
I cannot find why it is necessary for a variable to be unnamed.
I think maybe an independent Scope could represent all attribute variables of an operator? As I proposed here.
I don't see why an attribute is so special that it cannot be represented by a Variable.
### Operators as Functions

An operator is intrinsically a function, which has inputs and outputs. For example, the functional representation of the GEMM operator is
Maybe `InferShape` is also needed?
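A sketch of an operator interface that carries both methods (names are illustrative, not the final design):

```cpp
#include <vector>

// Illustrative operator interface with both shape inference and execution.
class OperatorBase {
 public:
  virtual ~OperatorBase() = default;

  // Derive output shapes from input shapes, before any memory is allocated.
  virtual std::vector<std::vector<int>> InferShape(
      const std::vector<std::vector<int>>& input_shapes) const = 0;

  // Perform the actual computation.
  virtual void Run() const = 0;
};
```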
```
}
```

Note that operators might call other operators. In the above example, `gemm` calls `act`.
Maybe a better example of an Op calling other Ops is RnnOp. :-) Because a normal gemm doesn't contain an activation.
Some Python API proposals:

```python
x = paddle.variable.new("x")  # create a variable of not yet known type
```
Could these APIs be unified into one `paddle.variable.new`?

x_var = paddle.variable.new("x", 10)
y = paddle.variable.Tensor()
y_var = paddle.variable.new("y", y)
z = paddle.variable.new("z", "A string")
no_name = paddle.variable.new(name=None, val=100)  # actually I think each variable should have a name. There might be some independent scope to hold them.
x = paddle.variable.new("x")      # create a variable of not yet known type
x = paddle.variable.tensor("x")   # create a tensor typed variable without value
x = paddle.variable.int("x", 10)  # create an unnamed int variable and set to 10
x = paddle.variable.operation("x", paddle.operator.gemm(...))  # an operation
Also, why can a variable be an operation? Is that needed for RnnOp?
```
};
```

`Get` and `GetMutable` implement *lazy memory allocation*, as described in the [Variable design doc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.md).
Maybe only `GetMutable` implements lazy memory allocation. Should `Get` raise an error if that variable has not been created?
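A simplified, single-type sketch of that behavior (the real Variable is type-erased, so this only illustrates the Get/GetMutable asymmetry):

```cpp
#include <memory>
#include <stdexcept>

// Simplified sketch: GetMutable() lazily allocates; Get() fails loudly if the
// value has not been created yet.
template <typename T>
class Variable {
 public:
  const T& Get() const {
    if (!value_) throw std::runtime_error("Variable read before it was created");
    return *value_;
  }

  T* GetMutable() {
    if (!value_) value_ = std::make_unique<T>();  // lazy allocation happens here
    return value_.get();
  }

 private:
  std::unique_ptr<T> value_;
};
```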
  std::shared_ptr<Scope> parent_;
  std::vector<Scope*> children_;

  Mutex mutex_;  // Make this class thread-safe.
It seems reasonable; maybe I should update the Scope design and add a mutex today.
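For example, a lookup-or-create on the variable map could be guarded roughly like this (a sketch, not the actual Scope code):

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

class Variable {};  // placeholder for the real Variable

class Scope {
 public:
  // Thread-safe lookup-or-create on the scope's variable map.
  Variable* CreateOrGetVariable(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);  // serialize access to vars_
    auto& slot = vars_[name];
    if (!slot) slot = std::make_unique<Variable>();
    return slot.get();
  }

 private:
  std::map<std::string, std::unique_ptr<Variable>> vars_;
  std::mutex mutex_;
};
```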
A neural network is a program. Training or inference is to execute it. The runtime environment of execution is known as a *context*:

1. a scope,
1. device(s), or places,
Maybe a device alone is not enough for GPU. There could be a DeviceContext for each GPU which holds:
- a computation stream
- handles for cuDNN/cuBLAS, etc.
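A rough sketch of such a per-GPU context (the struct name is illustrative and error checking is omitted):

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cudnn.h>

// Rough sketch of a per-GPU context owning a stream plus library handles.
struct CUDADeviceContext {
  explicit CUDADeviceContext(int device_id) {
    cudaSetDevice(device_id);
    cudaStreamCreate(&stream);
    cublasCreate(&cublas_handle);
    cublasSetStream(cublas_handle, stream);
    cudnnCreate(&cudnn_handle);
    cudnnSetStream(cudnn_handle, stream);
  }

  ~CUDADeviceContext() {
    cudnnDestroy(cudnn_handle);
    cublasDestroy(cublas_handle);
    cudaStreamDestroy(stream);
  }

  cudaStream_t stream;
  cublasHandle_t cublas_handle;
  cudnnHandle_t cudnn_handle;
};
```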
 private:
  std::map<std::string /*name*/, std::unique_ptr<Variable> > vars_;
  std::shared_ptr<Scope> parent_;
  std::vector<Scope*> children_;
Why do we need the `children_` of a scope?
Debug printing is the only usage I have in mind.
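If debug printing is the use case, a recursive dump over `children_` is about all that is needed (a sketch with simplified members):

```cpp
#include <iostream>
#include <string>
#include <vector>

// Simplified sketch of the one use mentioned for children_: debug printing.
struct Scope {
  std::vector<std::string> var_names;   // stand-in for the keys of vars_
  std::vector<Scope*> children;

  void DebugPrint(int depth = 0) const {
    const std::string indent(depth * 2, ' ');
    for (const auto& name : var_names) std::cout << indent << name << "\n";
    for (const Scope* child : children) child->DebugPrint(depth + 1);
  }
};
```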