Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Checkpoint and restore connectors. #26253

Merged
merged 5 commits into from
Jul 9, 2022

Conversation

gjoliver
Copy link
Member

@gjoliver gjoliver commented Jul 1, 2022

Plus a couple of examples showing the usage of connector enabled policies.

Why are these changes needed?

Allow checkpoint and restore of connector pipelines.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [*] Unit tests
    • Release tests
    • This PR is not tested :(

@@ -341,6 +343,11 @@ def insert_before(self, name: str, connector: Connector):
raise ValueError(f"Can not find connector {name}")
self.connectors.insert(idx, connector)

print(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -357,6 +364,11 @@ def insert_after(self, name: str, connector: Connector):
raise ValueError(f"Can not find connector {name}")
self.connectors.insert(idx + 1, connector)

print(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -365,6 +377,11 @@ def prepend(self, connector: Connector):
"""
self.connectors.insert(0, connector)

print(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -373,6 +390,11 @@ def append(self, connector: Connector):
"""
self.connectors.append(connector)

print(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,118 @@
"""This example script shows how to load a connector enabled policy,
and adapt/use it with a different version of environment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "the environment"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,54 @@
"""This example script shows how to load a connector enabled policy,
and adapt and use it with a different version of environment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "the environment" :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the comment.
thanks.

@@ -740,8 +740,44 @@ def get_state(self) -> PolicyState:
# The current global timestep.
"global_timestep": self.global_timestep,
}
if self.config.get("enable_connectors", False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a one-line comment here? that we are adding the connector state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

return state

@ExperimentalAPI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use PublicApi(alpha) here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

state: The new state to set this policy to. Can be
obtained by calling `self.get_state()`.
"""
# To avoid a circular dependency problem cause by SampleBatch.
Copy link
Contributor

@sven1977 sven1977 Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to future ourselves: We should move SampleBatch out of the policy folder. It doesn't belong there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, totally, SampleBatch should be a pretty low level util that depends on nothing.

self.agent_connectors = restore_connectors_for_policy(
self, connector_configs["agent"]
)
print("restoring agent connectors:")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.info

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self, connector_configs["action"]
)
print("restoring action connectors:")
print(self.action_connectors.__str__(indentation=4))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.info

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

else:
class_ = policy_cls
self[policy_id] = class_(observation_space, action_space, merged_config)
_class = get_tf_eager_cls_if_necessary(policy_cls, merged_config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! thanks for cleaning this up and creating the utility function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome PR @gjoliver , thanks for the examples on connectors.
Just a few nits, then we can merge.

Oh, one last thing, we should add the new examples to BUILD!

@gjoliver
Copy link
Member Author

gjoliver commented Jul 7, 2022

ok, addressed all the comments. let's see if CI is happy.
completely understand your comments about logger.info(), was a bit hesitant because you often can't see log files on disks on Anyscale :).

Also added all the examples as unit tests.
I also added a multi-agent example, where we restore a TF PPO policy to train a new Torch SAC policy.

Jun Gong added 5 commits July 8, 2022 13:57
@richardliaw richardliaw merged commit 0c469e4 into ray-project:master Jul 9, 2022
truelegion47 pushed a commit to truelegion47/ray that referenced this pull request Jul 9, 2022
* master: (42 commits)
  [dashboard][2/2] Add endpoints to dashboard and dashboard_agent for liveness check of raylet and gcs (ray-project#26408)
  [Doc] Fix docs feedback button (ray-project#26402)
  [core][1/2] Improve liveness check in GCS  (ray-project#26405)
  [RLlib] Checkpoint and restore connectors. (ray-project#26253)
  [Workflow] Minor refactoring of workflow exceptions (ray-project#26398)
  [workflow] Workflow queue (ray-project#24697)
  [RLlib] Minor simplification of code. (ray-project#26312)
  [AIR] Update TensorflowPredictor to new API (ray-project#26215)
  [RLlib] Make Dataset reader default reader and enable CRR to use dataset (ray-project#26304)
  [runtime_env] [doc] Remove outdated info about "isolated" environment (ray-project#26314)
  [Doc] Fix rate-the-docs plugin (ray-project#26384)
  [Docs] [Serve] Has a consistent landing page style (ray-project#26029)
  [dashboard] Add `RAY_CLUSTER_ACTIVITY_HOOK` to `/api/component_activities` (ray-project#26297)
  [tune] Use `Checkpoint.to_bytes()` for store_to_object (ray-project#25805)
  [tune] Fix `SyncerCallback` having a size limit (ray-project#26371)
  [air] Serialize additional files in dict checkpoints turned dir checkpoints (ray-project#26351)
  [Docs] Add "rate the docs" plugin for feedback on docs (ray-project#26330)
  [Doc] Fix actor example (ray-project#26381)
  Set RAY_USAGE_STATS_EXTRA_TAGS for release tests (ray-project#26366)
  [Datasets] Update docs for drop_columns and fix typos (ray-project#26317)
  ...
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants