Simpler cluster train job submit code #2047

@Yancey1989 wrote a job submission tool, available at: https://github.com/Yancey1989/paddle-job

Currently, submitting a job looks like this:

```python
paddle.init(
            use_gpu=False,
            trainer_count=1,
            port=7164,
            ports_num=1,
            ports_num_for_sparse=1,
            num_gradient_servers=1,
            trainer_id=fetch_trainer_id(),
            pservers=fetch_pserver_ips())
job.dist_train(
        trainer=trainer,
        reader=paddle.batch(paddle.dataset.imikolov.train(word_dict, N), 32),
        num_passes=30,
        event_handler=event_handler,
        paddle_job=job.PaddleJob(
            pservers=3,
            base_image="yancey1989/paddle-cloud",
            input="/yanxu05",
            output="/yanxu05",
            job_name="paddle-cloud",
            namespace="yanxu",
            use_gpu=False,
            cpu_num=3,
            trainer_package_path="/example/word2vec",
            entry_point="python api_train_v2.py"))
```


We want to make it simpler, like this:

```python
# Initialize from the "PADDLE_*" ENVs; arguments passed below override the ENVs
paddle.init(use_gpu=False)
...
myjob = job.dist_train(
        trainer=trainer,
        reader=my_dist_reader("dataset-name"),
        num_passes=30,
        event_handler=event_handler,
        paddle_job=job.PaddleJob(
            [cluster configurations...]))
print "view job status at: ", myjob.status_url()
```

### Required ENVs:
- "PADDLE_PSERVERS"
- "PADDLE_TRAINER_ID"
- "PADDLE_TRAINER_COUNT"
- "PADDLE_NUM_GRADIENT_SERVERS"
- "PADDLE_PORTS_NUM_FOR_SPARSE"

### Optional ENVs:
- "PADDLE_PORT": default 7164
- "PADDLE_PORTS_NUM": default 1
- "PADDLE_USE_GPU": default False
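
The lookup described by the two lists above could be sketched roughly as follows. This is only an illustration, not the actual `paddle.init` implementation: the helper names `load_paddle_env` and `_convert`, the lower-cased argument keys, and the type-conversion rules are all assumptions. Only the variable names and defaults come from the lists above.

```python
import os

# Variables that must be present (from the "Required ENVs" list).
REQUIRED_ENVS = [
    "PADDLE_PSERVERS",
    "PADDLE_TRAINER_ID",
    "PADDLE_TRAINER_COUNT",
    "PADDLE_NUM_GRADIENT_SERVERS",
    "PADDLE_PORTS_NUM_FOR_SPARSE",
]

# Variables with defaults (from the "Optional ENVs" list).
OPTIONAL_ENVS = {
    "PADDLE_PORT": "7164",
    "PADDLE_PORTS_NUM": "1",
    "PADDLE_USE_GPU": "False",
}


def _convert(key, raw):
    """Assumed conversions: pservers stays a string, use_gpu is a bool, the rest are ints."""
    if key == "pservers":
        return raw
    if key == "use_gpu":
        return raw.strip().lower() in ("1", "true", "yes")
    return int(raw)


def load_paddle_env(environ=None, **overrides):
    """Hypothetical helper: build init arguments from the PADDLE_* ENVs.

    Keyword arguments take precedence over the ENVs, and a required ENV
    may be omitted when the matching keyword argument is given.
    """
    environ = os.environ if environ is None else environ
    args = {}
    for name in REQUIRED_ENVS:
        key = name[len("PADDLE_"):].lower()
        if key in overrides:
            continue
        if name not in environ:
            raise KeyError("missing required ENV: " + name)
        args[key] = _convert(key, environ[name])
    for name, default in OPTIONAL_ENVS.items():
        key = name[len("PADDLE_"):].lower()
        args[key] = _convert(key, environ.get(name, default))
    args.update(overrides)
    return args
```

With a helper like this, `paddle.init(**load_paddle_env(use_gpu=False))` would take everything from the ENVs except `use_gpu`, which the explicit argument overrides.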

### Cluster Job Configurations:
#### Job Resources
- parallelism: the number of trainers; the number of pservers is calculated from this value.
- num_gpus: GPU resources needed. If `num_gpus == 0` while the ENV "PADDLE_USE_GPU" is set to True, or the opposite, Paddle will emit a warning when the job is submitted.
- num_cpus: CPU resources needed.
- entry_point: the command that starts your training program, e.g. `python /data/cloud/storage/path/train.py`
- ***NOTE:*** By default, Paddle mounts your cloud storage volume at `/data`, so your training program can read data anywhere under `/data`.
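
The consistency check between `num_gpus` and "PADDLE_USE_GPU" could look something like the sketch below. The function name `check_gpu_settings`, its return value, and the warning texts are hypothetical; the proposal only says that a warning is raised at submit time when the two settings disagree.

```python
import warnings


def check_gpu_settings(num_gpus, use_gpu):
    """Hypothetical submit-time check.

    Returns True when the settings agree; otherwise emits a warning
    and returns False. The job is not rejected in either case.
    """
    if num_gpus == 0 and use_gpu:
        warnings.warn("PADDLE_USE_GPU is True but num_gpus is 0")
        return False
    if num_gpus > 0 and not use_gpu:
        warnings.warn("num_gpus is %d but PADDLE_USE_GPU is False" % num_gpus)
        return False
    return True
```

A warning rather than an error matches the wording of the proposal: the job can still be submitted, but the user is told that the requested resources and the trainer settings do not match.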

#### Advanced settings:
- pservers: if set, the number of pservers is fixed to this value instead of being calculated automatically from parallelism.
- base_image: use your own image to run the job
- job_name: use your own job name
- ***NOTE:*** the namespace is read from the ENV "USER_NAMESPACE".
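
How the `pservers` override and the namespace lookup fit together might be sketched as below. Both function names are hypothetical, and the one-pserver-per-trainer default is purely an assumption for illustration: the proposal does not state how the pserver count is derived from the parallelism setting.

```python
import os


def resolve_pservers(parallelism, pservers=None):
    """Hypothetical helper: pick the number of parameter servers.

    An explicit `pservers` value always wins. The fallback rule
    (one pserver per trainer) is an assumption, not from the proposal.
    """
    if pservers is not None:
        return pservers
    return parallelism


def resolve_namespace(environ=None):
    """Read the namespace from the "USER_NAMESPACE" ENV, as noted above."""
    environ = os.environ if environ is None else environ
    return environ["USER_NAMESPACE"]
```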
