OPICS Eval 7 #86

Open
wants to merge 20 commits into
base: main
af24de2
working in Ray 2.4
phile-caci Nov 3, 2023
18f003c
eval_7 updates
phile-caci Nov 6, 2023
defa76b
eval_7 updates
phile-caci Nov 6, 2023
ec016d2
Updates for the OPICS validation run.
Nov 8, 2023
0992c75
Merge branch 'eval-7-updates' into opics-eval-7
Nov 8, 2023
5653884
Merge branch 'eval-7-updates' into opics-eval-7
Nov 8, 2023
48433ba
OPICS pipeline: create scene history folder if needed.
Nov 8, 2023
6f5dbc7
Merge branch 'eval-7-cora-ray24' into opics-eval-7
Nov 9, 2023
8dc9dfd
MCS-1777 Delete ray object references so the autoscaler will stop idl…
Nov 13, 2023
261d66e
Merge branch 'eval-7-updates' into opics-eval-7
ThomasSchellenbergNextCentury Nov 13, 2023
dbed7a3
Removed validation test file.
ThomasSchellenbergNextCentury Nov 13, 2023
04b34c8
Updated the OPICS AMI for validation run 2
Nov 20, 2023
2f7e46b
Updated size of virtual display for OPICS pipeline.
ThomasSchellenbergNextCentury Dec 4, 2023
85590f6
Fixed error edge cases with the OPICS pipeline.
ThomasSchellenbergNextCentury Dec 4, 2023
636a760
OPICS Eval 7 submission
ThomasSchellenbergNextCentury Dec 6, 2023
8933b1b
Trying something with ray reference management
ThomasSchellenbergNextCentury Dec 6, 2023
d417752
Various fixes to the OPICS pipeline
ThomasSchellenbergNextCentury Dec 7, 2023
b6157c2
Moved apt-get commands in OPICS pipeline from deploy script to additi…
ThomasSchellenbergNextCentury Dec 13, 2023
312d84c
Merge branch 'main' into opics-eval-7
rartiss55 Dec 28, 2023
c7d6443
Fix outdated references from recent merge.
rartiss55 Dec 28, 2023
4 changes: 2 additions & 2 deletions deploy_files/cora/ray_script.sh
@@ -17,8 +17,8 @@ echo Starting Evaluation for CORA:

# Adjust for where they hardcoded the scene file to be read from, might be different next collab/evaluation run
echo "Copy Scene Files:"
- mkdir -p /home/ubuntu/scenes/evaluation_6
- cd /home/ubuntu/scenes/evaluation_6 || exit
+ mkdir -p /home/ubuntu/scenes/evaluation_7
+ cd /home/ubuntu/scenes/evaluation_7 || exit
rm ./*
cp "$scene_file" .
echo $(basename "$scene_file")
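
Reviewer note: the copy step in this hunk (create the directory, clear it, copy the scene in, print its name) can be sketched as a small function. This is only an illustration: `stage_scene` is a hypothetical helper that does not exist in this PR, and the use of `rm -f` instead of the script's `rm ./*` is a substitution made here so that an already-empty directory does not produce an error message.

```shell
#!/usr/bin/env bash
# stage_scene is a hypothetical helper, not part of this PR.
# It mirrors the copy step: ensure the directory exists, clear old files, copy the scene in.
stage_scene() {
    local scene_dir=$1
    local scene_file=$2
    mkdir -p "$scene_dir" || return 1
    # -f keeps rm quiet when the directory is already empty,
    # where a plain "rm ./*" would report "No such file or directory".
    rm -f "$scene_dir"/*
    cp "$scene_file" "$scene_dir"/ || return 1
    # Print the scene's file name, as the script does with echo $(basename ...).
    basename "$scene_file"
}
```

Under these assumptions, the hunk's behaviour would correspond to `stage_scene /home/ubuntu/scenes/evaluation_7 "$scene_file"`. Passing the directory as an argument would also avoid editing the script for each evaluation round.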
26 changes: 16 additions & 10 deletions deploy_files/opics/ray_script.sh
@@ -4,15 +4,20 @@ set -m
# This is what the "main_optics" command does (from the instructions TA1 gave us).
echo "OPICS Pipeline: Running TA1 environment setup..."
cd /home/ubuntu/ || exit
- sudo nvidia-xconfig --use-display-device=None --virtual=600x400 --output-xconfig=/etc/X11/xorg.conf --busid=PCI:0:30:0
- export OUR_XPID=2356
- export DISPLAY=:0
+ # Ensure that the virtual display (monitor resolution) is much greater than 600x400
+ sudo nvidia-xconfig --use-display-device=Device0 --virtual=1280x1024 --output-xconfig=/etc/X11/xorg.conf --busid=PCI:0:30:0
+ export OUR_XPID=
+ export DISPLAY=:1
export OPTICS_HOME=~/main_optics
export PYTHONPATH=$OPTICS_HOME:$OPTICS_HOME/opics_common
export OPTICS_DATASTORE=ec2b
cd $OPTICS_HOME || exit
cd scripts/ || exit

+ # Start the X Server
+ # (Note that OPICS sets the DISPLAY to :1 -- Do NOT call start_x_server.sh)
+ sudo /usr/bin/Xorg :1 1>startx-out.txt 2>startx-err.txt &
+
# Check passed mcs_config and scene file
# shellcheck source=/dev/null
source /home/ubuntu/check_passed_variables.sh
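
Reviewer note: because Xorg is started in the background with `&`, the rest of the script continues immediately. If the display has to be ready before the scene runs, a wait loop could be added. The sketch below is an assumption and not part of this PR: `wait_for_path` is a hypothetical helper, and it relies on the X server creating its Unix socket at the conventional `/tmp/.X11-unix/X1` path for display `:1`.

```shell
#!/usr/bin/env bash
# wait_for_path is a hypothetical helper, not part of this PR.
# It polls once per second until the path exists or the timeout (in seconds) expires.
wait_for_path() {
    local path=$1
    local timeout=${2:-30}
    local waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1   # gave up: the path never appeared
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A possible call after the Xorg line would be `wait_for_path /tmp/.X11-unix/X1 30 || echo "OPICS Pipeline: X server did not start"`.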
@@ -22,6 +27,7 @@ echo "OPICS Pipeline: Running OPICS with MCS config file $mcs_configfile and eva

echo "OPICS Pipeline: Removing previous scene history files in $eval_dir/SCENE_HISTORY/"
rm -f "$eval_dir"/SCENE_HISTORY/*
+ mkdir -p "$eval_dir"/SCENE_HISTORY/

# shellcheck disable=SC2207
CONTAINER_DIRS=($(ls /home/ubuntu/test__* -d))
@@ -30,14 +36,15 @@ for CONTAINER_DIR in "${CONTAINER_DIRS[@]}"; do
rm -f "$CONTAINER_DIR"/scripts/SCENE_HISTORY/*
done

- # Start X
- source /home/ubuntu/start_x_server.sh
-
export MCS_CONFIG_FILE_PATH=$mcs_configfile
- python opics_eval6_run_scene.py --scene "$scene_file"
+ python opics_eval7_run_scene.py --scene "$scene_file"
unset MCS_CONFIG_FILE_PATH

DEBUG=true
+
+ # Make sure to check for new container directories!
+ # shellcheck disable=SC2207
+ CONTAINER_DIRS=($(ls /home/ubuntu/test__* -d))
for CONTAINER_DIR in "${CONTAINER_DIRS[@]}"; do
if [ $DEBUG ]; then echo "OPICS Pipeline: Found container directory: $CONTAINER_DIR"; fi
HISTORY_DIR="$CONTAINER_DIR/scripts/SCENE_HISTORY/"
@@ -62,9 +69,6 @@ for CONTAINER_DIR in "${CONTAINER_DIRS[@]}"; do
done
done

- sudo apt-get update
- sudo apt-get install awscli -y
-
SCENE_NAME=$(sed -nE 's/.*"name": "(\w+)".*/\1/pi' "$scene_file")
DISAMBIGUATED_SCENE_NAME=$(basename "$scene_file" .json)
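
Reviewer note: the `SCENE_NAME` extraction above can be tried in isolation. The sketch below runs the same `sed` expression against a made-up, minimal scene file; the file contents and the name `example_scene_0001` are assumptions for illustration only. It relies on GNU sed, since `\w` and the `i` flag are GNU extensions.

```shell
#!/usr/bin/env bash
# Hypothetical, minimal scene file (real scene files contain many more fields).
scene_file=$(mktemp)
printf '{\n  "name": "example_scene_0001",\n  "version": 2\n}\n' > "$scene_file"

# Same expression as in the script: -n suppresses default output, and the p flag
# prints only lines where the substitution matched, reduced to the captured name.
SCENE_NAME=$(sed -nE 's/.*"name": "(\w+)".*/\1/pi' "$scene_file")
echo "$SCENE_NAME"

# Clean up the temporary sample file.
rm -f "$scene_file"
```

Because `\w` matches only letters, digits, and underscores, a name containing a hyphen would not match at all, and the expression prints one line for every input line that contains a `"name": "..."` pair.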

@@ -78,4 +82,6 @@ RENAMED_LOG=${eval_dir}/logs/${TEAM_NAME}_${SCENE_NAME}_stdout.log
mv "${TA1_LOG}" "${RENAMED_LOG}"

# Upload the mp4 video to S3 with credentials from the worker's AWS IAM role.
+ echo "OPICS Pipeline: Uploading ${TEAM_NAME}_${SCENE_NAME}_stdout.log"
aws s3 cp "${RENAMED_LOG}" s3://"${S3_BUCKET}"/"${S3_FOLDER}"/ --acl public-read
+ echo "OPICS Pipeline: Finished successfully"
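
Reviewer note on the `if [ $DEBUG ]` check in the container loop of this script: a single-bracket test with one argument succeeds for any non-empty string, so setting `DEBUG=false` would still print the debug lines. The sketch below illustrates the difference; `loose_check` and `debug_enabled` are hypothetical names used only for illustration and are not part of this PR.

```shell
#!/usr/bin/env bash
# Both functions are hypothetical and exist only to illustrate the behaviour.

# Mirrors the script's test: true for ANY non-empty value, including "false".
loose_check() {
    if [ $1 ]; then echo "on"; else echo "off"; fi
}

# Stricter alternative: only the literal string "true" enables debugging.
debug_enabled() {
    [ "${1:-}" = "true" ]
}
```

With `DEBUG=false`, `loose_check "$DEBUG"` reports the flag as on, while `if debug_enabled "$DEBUG"; then ...; fi` skips the debug branch.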
2 changes: 2 additions & 0 deletions mako/templates/ray_template_aws.yaml
@@ -58,6 +58,8 @@ setup_commands:
- python3 -m pip install -U "ray[default]" boto3
- python3 -m pip install -U "ray[default]"==2.4.0
- python3 -m pip install -U gpustat==1.0.0
+ - python3 -m pip install boto3==1.18.9
+ - python3 -m pip install botocore==1.21.9
${additional_setup_commands}

# Command to start ray on the head node. You don't need to change this.
4 changes: 2 additions & 2 deletions mako/variables/default.yaml
@@ -6,15 +6,15 @@ idle_timeout_minutes: 5
additional_setup_commands: ""
additional_file_mounts: ""
cache_stopped_nodes: True
- utilize_head_node_for_work: True
+ utilize_head_node_for_work: False
#Below is for mcs_config_template.ini
mcs_config_extra: ""
evaluation_bool: true
eval_name: eval_7
video_enabled: true
history_enabled: true
save_debug_images: false
- s3_bucket: dev-evaluation-images
+ s3_bucket: evaluation-images
s3_folder: eval-resources-7
s3_movies_folder: raw-eval-7
timeout: 3600
7 changes: 4 additions & 3 deletions mako/variables/opics.yaml
@@ -1,7 +1,8 @@
+ cache_stopped_nodes: False
cluster: "opics"
filePrefix: "opics"
team: "opics"
region: us-east-1
- ami: ami-0c3e21efcd9cc4ee3 #opics-eval-6-with-nvidia-hold-patch, us-east-1
- # region: us-east-2
- # ami: ami-0446e2290a501208f #opics-eval-6-with-nvidia-hold-patch, us-east-2
+ # ami: ami-05e48add22bae3c25 # opics-eval-7
+ ami: ami-0612af76abe080b4d # opics-eval-7-tools
+ additional_setup_commands: "- sudo apt-get update && sudo apt-get install awscli -y"
2 changes: 2 additions & 0 deletions ray_scripts/pipeline_ray.py
@@ -511,6 +511,7 @@ def run_scenes(self):
self.incomplete_jobs = []
run_script = self.exec_config["MCS"]["run_script"]
eval_dir = self.exec_config["MCS"]["eval_dir"]
+
for scene_ref in self.scene_files_list:
# skip directories
if os.path.isdir(str(scene_ref)):
@@ -537,6 +538,7 @@ def run_scenes(self):

while self.incomplete_jobs:
finished_jobs, self.incomplete_jobs = ray.wait(self.incomplete_jobs)
+
for finished_job_id in finished_jobs:
# logging.info(f"finished job id: {finished_job_id}")
try: