This repository was archived by the owner on Feb 3, 2021. It is now read-only.

Feature: Support for multiple custom scripts and master only scripts #93

Merged
merged 25 commits into from
Oct 2, 2017

Conversation

@jafreck (Member) commented Sep 22, 2017

Fix #91
Fix #89

@timotheeguerin changed the title from "Feature/granular scripts" to "Feature: Support for multiple custom scripts and master only scripts" Sep 22, 2017
echo "This is a custom script running on just the master!"
fi

echo "This is a custom script running all workers!"
Member

all workers and the master

Contributor

typo: running on*

@@ -80,24 +80,25 @@ def docker_run_cmd(docker_repo: str = None) -> str:
def generate_cluster_start_task(
cluster_id: str,
zip_resource_file: batch_models.ResourceFile,
custom_script: str = None,
custom_scripts: list = None,
Member

List[str] I think is better
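
For reference, a minimal sketch of what that suggestion might look like (the Optional wrapper and the StartTask return annotation are my additions, not from the diff):

```python
from typing import List, Optional

import azure.batch.models as batch_models


def generate_cluster_start_task(
        cluster_id: str,
        zip_resource_file: batch_models.ResourceFile,
        custom_scripts: Optional[List[str]] = None) -> batch_models.StartTask:
    # build and return the pool's start task; body omitted in this sketch
    ...
```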

@jiata (Contributor) commented Sep 22, 2017

This is not really what I had in mind from a design perspective.

Instead of using the "IS_MASTER" environment variable and making users write logic in bash (e.g. "if master node"), I was thinking more along the lines of having the user specify which is a master-node-only script and which is an every-node script.

example:
azb spark cluster create --id my_cluster \
--master-script custom-script-1.sh \
--worker-script custom-script-2.sh

(maybe "worker-script" isn't the best name)

Intuitively I think this is a better experience, as it doesn't require users to know about the environment variable "IS_MASTER" and write logic around it in bash.

@paselem what do you think?

@jiata (Contributor) commented Sep 22, 2017

Also, should #60 be part of this feature? For many cases, a custom script on the master is not useful without the ability to open custom ports.

@timotheeguerin (Member) commented Sep 22, 2017

I think IS_MASTER gives the user more flexibility to run a custom script however they might want.

If there is something that needs to be set up on both the workers and the master, with extra setup on the master, that is easily done this way.

You could maybe have both master-script (runs only on the master) and custom-script, which runs on every node but has access to IS_MASTER for advanced configuration.
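
A rough sketch of that pattern as a single custom script (the exact truthy value of IS_MASTER is an assumption here; the excerpt above only shows the if/fi structure):

```bash
#!/bin/bash
# runs on every node in the cluster
echo "common setup for all workers and the master"

# extra, master-only setup guarded by the IS_MASTER environment variable
if [ "$IS_MASTER" = "1" ]; then  # assumed value; check the node scripts for the exact convention
    echo "master-only setup, e.g. a UI that should live on a single node"
fi
```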

@jiata (Contributor) commented Sep 22, 2017

Yeah, I like that idea.

But what are the scenarios you had in mind where you would need the flexibility of IS_MASTER?

Separating them makes sense to me just in terms of the scenarios that I can think of:

  • ADLS, WASB, S3, custom package installs (run on all nodes)
  • livy, jupyter, spark magic, r-studio-server (only on master node)

@paselem (Contributor) commented Sep 22, 2017

@jiata - I was thinking more along the lines of what you specified, except just always point to a directory. If there are files there, we would automatically run them. That is a pretty common practice for lots of other Linux-like tooling and OS settings. We could have two directories, one for master scripts and one for non-master scripts. This would save users from having to write the if condition in the script, which is really an 'us' problem.

Also, another issue that might come up is whether or not ordering matters. I'm not sure how someone would specify that and how we would guarantee it. Possibly register a collection of scripts in the config file or (as per Jacob's suggestion) just write an uber script that calls the other scripts in the order it wants.

@jiata (Contributor) commented Sep 23, 2017

[ignore this comment - this is handled in #60]

I think we should also add the ability to open custom ports (on cluster create) as part of this PR.

custom_scripts:
- script: livy_script.sh
  is_master: true
  ports: [8998]
- script: jupyter_script.sh
  is_master: true
  ports: [8888]

Something like this ^

@jiata (Contributor) left a comment

What is the cmd line experience? Is that defined at all? Or do people have to use the cluster.yaml to run complex custom scripts?

You can specify the location of custom scripts on your local machine in `.thunderbolt/cluster.yaml`. If you do not have a `.thunderbolt/` directory in your current working directory, run `azb spark init` or see [Getting Started](./00-getting-started). Note that the path can be absolute or relative to your current working directory.

The custom scripts can be configured to run on the Spark master only, the Spark workers only, or all nodes in the cluster (Please note that by default, the Spark master node is also a Spark worker). For example, the following custom-script configuration will run 3 custom scripts:
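
(The example itself is not captured in this excerpt; based on the script/location fields shown further down in the diff, it presumably looks something like the sketch below, with placeholder script names.)

```yaml
custom_scripts:
  - script: ./custom-scripts/install-packages.sh
    location: all-nodes
  - script: ./custom-scripts/setup-jupyter.sh
    location: master
  - script: ./custom-scripts/worker-tuning.sh
    location: worker
```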

Contributor

Maybe worth pointing out here that the scripts will be executed in the order that the user defines them in the cluster.yaml.

Member Author

I do that in the following paragraph, but it might be more clear to put that info here.

Contributor

Yeah, I noticed - what you have below is useful, I think. But it could be stated more explicitly, like "scripts will execute in the order that they are listed".

@jafreck (Member Author) commented Sep 27, 2017

The command line experience is broken as of the latest commit (this still needs to be fixed). I think it should probably be removed entirely and custom scripts should only be supported through cluster.yaml.

@Azure deleted a comment from msftclas Sep 27, 2017
# # optional custom scripts to run on the Spark master, Spark worker or all nodes in the cluster
# custom_script:
# - script: </path/to/script.sh or /path/to/script/directory/>
# location: <master/worker/all-nodes>
Contributor

rename 'location' to 'runOn'
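
i.e., the commented example above would presumably become:

```yaml
# custom_script:
#   - script: </path/to/script.sh or /path/to/script/directory/>
#     runOn: <master/worker/all-nodes>
```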

echo "This is a custom script running on just the master!"
fi

echo "This is a custom script running all workers and the master!"
Contributor

Maybe add that we currently always deploy a worker on the master node, so we do not support a 'worker only' install.

Contributor

Nm... just saw docs below.

location: all-nodes
```

The above configuration takes the absolute path `/custom-scripts/` and upload every file within it. These files will all be executed, although order of execution is not guaranteed. If your custom scripts have dependencies, specify the order by providing the full path to the file as seen in the first example.
Contributor

typo "upload" -> "uploads"

if custom_scripts is None:
return

os.mkdir(os.path.join(constants.ROOT_PATH, 'node_scripts', 'custom-scripts'))
Contributor

If the directory exists, do we need to clean it up before copying files over? Can files from a previous cluster create still linger?

Contributor

... Or is that what line 203 does? May be worth doing it here as a first step in case the process exits unexpectedly between lines 197 and 203, since it shouldn't harm anything and provides an extra guarantee.

@jafreck (Member Author) commented Sep 29, 2017

I think we should actually clean up both before we copy and after. That will remove lingering files from a previous create, copy over the expected files, upload them, and then delete them.

The only way someone could have lingering files, though, is if the program crashed between 197 and 203, since 203 does remove all the files.
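
A rough sketch of that clean-before-and-after flow (the helper name and dest_dir parameter are illustrative, not the actual code):

```python
import os
import shutil


def _stage_custom_scripts(custom_scripts, dest_dir):
    """Illustrative only: stage custom scripts into dest_dir, cleaning up before and after."""
    if custom_scripts is None:
        return
    # remove anything left over from a previous (possibly crashed) cluster create
    shutil.rmtree(dest_dir, ignore_errors=True)
    os.makedirs(dest_dir)
    try:
        for script in custom_scripts:
            shutil.copy(script, dest_dir)
        # ... zip and upload the staged scripts here ...
    finally:
        # clean up again so no local copies linger after the upload
        shutil.rmtree(dest_dir, ignore_errors=True)
```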

dtde/config.py Outdated
@@ -139,8 +139,8 @@ def _merge_dict(self, config):
if 'password' in config and config['password'] is not None:
self.password = config['password']

if 'custom_script' in config and config['custom_script'] is not None:
self.custom_script = config['custom_script']
if 'custom_scripts' in config and config['custom_scripts'] not in [[None], None]:
Contributor

'custom_scripts' used several times. Should be a constant.

Member Author

I don't understand what this means.
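
For context, the reviewer appears to be suggesting hoisting the repeated string literal into a module-level constant, roughly like this (the constant name is illustrative):

```python
# e.g. near the top of dtde/config.py
CUSTOM_SCRIPTS_KEY = 'custom_scripts'

# ... then inside _merge_dict:
if CUSTOM_SCRIPTS_KEY in config and config[CUSTOM_SCRIPTS_KEY] not in [[None], None]:
    self.custom_scripts = config[CUSTOM_SCRIPTS_KEY]
```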

@@ -1,7 +1,7 @@
#!/bin/bash

# This file is the entry point of the docker container.
# It will run the custom scripts if present and start spark.
# It will setup WASB and start Spark.
Contributor

worth commenting that it currently uses the same storage account as the one configured in the secrets.yaml (or conversely that the one in secrets.yaml is used for moving data around for this tool to work)

except FileNotFoundError as e:
print(e)
except IOError as e:
print(e)
Contributor

final except catch all?

Member Author

This error checking is just meant to ensure that the directory of scripts exists. However, we probably want to catch errors in _run_script() so that user script crashes are caught.
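
A sketch of that per-script error handling (function names follow the ones mentioned in this thread, so treat them as assumptions; the catch-all around each script is the point under discussion):

```python
import os


def _run_custom_scripts(custom_scripts_dir):
    try:
        scripts = sorted(os.listdir(custom_scripts_dir))
    except (FileNotFoundError, IOError) as e:
        # the directory of scripts is missing or unreadable
        print(e)
        return
    for script in scripts:
        try:
            _run_script(os.path.join(custom_scripts_dir, script))  # _run_script as referenced above
        except Exception as e:
            # catch-all so one failing user script does not stop the remaining ones
            print("custom script {0} failed: {1}".format(script, e))
```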

@jafreck jafreck merged commit 37c5d87 into Azure:master Oct 2, 2017
5 participants