Skip to content

Conversation

mylinyuzhi
Copy link

What do these changes do?

in java-worker, there are two ways to implement the ray.call: asm-rewrite and remote-lambda.
the asm-rewrite is not elegant and remote-lambda is too slow. And can not get the meta-information of the called function at runtime in the two solutions. e.g. @rayremote(xxx=yyy), at runtime we want to get the xxx=yyy.

the new solution: SerdeLambda( see org.ray.util.LambdaUtils). It has the following advantages:

  • support fo get the lambda' meta-information in runtime.
  • no need to rewrite the jar. Easy to develop and test.
  • one solutuion. the code is simpler
  • fast enough(see LambdaUtilsTest): in mac-air 1.6G I5
    • hit the lambda cache, parse the lambda spent 400us+, 80x than remote-lambda
    • miss the lambda cache , parse the lambda spent 2600us+, 200x than remote-lambda

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6025/
Test PASSed.

@pcmoritz pcmoritz changed the title <feature> : serde lambda Replace binary rewrite with Remote Lambda Cache (SerdeLambda) Jun 12, 2018
@pcmoritz pcmoritz changed the title Replace binary rewrite with Remote Lambda Cache (SerdeLambda) [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) Jun 12, 2018
@pcmoritz pcmoritz self-assigned this Jun 12, 2018
CACHE = ThreadLocal.withInitial(() -> new WeakHashMap<>());

/**
* format A.B.C.cname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the meaning of this comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would clean it.


public final class MethodId {

/* use ThreadLocal to avoid lock. </br>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unmatched
?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would clean it.

import org.ray.util.logger.RayLog;


public final class MethodId {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a small comment on what this class is doing? I.e. it seems to be similar to our FunctionID in Ray, except it also has a class associated with it? Also, let's document the fields methodDesc and encoding, it's not clear to me at the moment what they are.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would add the doc

if (m != null) {
return m;
}
//it is a actor static func
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be a space after the //

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would clean it.

}
Pair<ClassLoader, RayMethod> getMethod(UniqueID driverId, UniqueID actorId,
UniqueID methodId, String className) {
//assert the driver's resource is load
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after //

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would clean it.

public static Pair<Object, Object[]> unwrap(TaskSpec task, Method m, ClassLoader classLoader)
throws TaskExecutionException {
FunctionArg[] fargs = task.args;
//the last arg is className
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after //

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would clean it.

}

public static void unloadJars(ClassLoader loader) {
// TODO:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say what is to be done here (or if nothing, remove the todo)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I would add the doc

@pcmoritz
Copy link
Contributor

Thanks, this is really great! Some high level questions:

  • How do performance numbers you cite compare to the old solution with the binary rewrite? (this solution is clearly superior but it would be good to know the difference)
  • Are there any changes in semantics for the user? The changes in [RFC] Improve Java API #2172 are orthogonal to this, right?

Also if functions are garbage collected from the WeakDictionary, they will be reloaded later, right? Where are the remote lambdas stored?

.gitignore Outdated
*.iml
java/test
java/*/target
java/target/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't ignore java/test, it's a valid module.
java/test and java/*/target can be combined as java/**/target

}


public static SerializedLambda getSerializedLambda(Serializable lambda) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also use RayFunc as the parameter type here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MethodId is not only for RayFunc, it clould used for all lambda call. the check of RayFunc has add in entrance

* Note: the lambda instances are dynamically created per call site,
* we use WeakHashMap to avoid OOM.
*/
private static final ThreadLocal<WeakHashMap<Class<Serializable>, MethodId>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, RayFunc.
and a lot of other places in the rest of this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MethodId is not only for RayFunc, it clould used for all lambda call. the check of RayFunc has add in entrance

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MethodId is not only for RayFunc, it clould used for all lambda call. the check of RayFunc has add in entrance

this.encoding = encode(className, methodName, methodDesc, isStatic);
this.digest = getSha1Hash0();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove empty line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

final SerializedLambda lambda = LambdaUtils.getSerializedLambda(serial);
Preconditions.checkArgument(lambda.getCapturedArgCount() == 0, "could not transfer a lambda "
+ "which is closure");
final boolean isstatic = lambda.getImplMethodKind() == MethodHandleInfo.REF_invokeStatic;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isStatic

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@raulchen raulchen mentioned this pull request Jun 13, 2018
@raulchen
Copy link
Contributor

@pcmoritz

  1. The perf of this solution is slightly worse than binary rewrite, but should be totally good enough for prod (see commit message).
  2. If everybody is okay with this PR, we'll go on with option 1.
  3. Yes, If a function's GC-ed, it will be reloaded. (To be more specific, we're not caching the function, we're caching the metadata of the function).

@mylinyuzhi
Copy link
Author

performance

binary rewrite is fastest. the functioId has computed in rewrite. No time overhead for functioId get.
the serdeLambda need to compute the functioInd when miss cache. it need to parse the lambda and sha1 the MethodId. in my mac air, it need 2000+ ns. if hit the cache, it only need 20+ ns.
(time cost of System.currentTimeMillis is 50ns)
Actually, event if not hit the cache, 2000ns is not a bottleneck.
And, avoid the overhead of data trans#ission(600bytes) in remotelambda.

changes in semantics for the user

the serdeLambda didn't change the semantics for the user. the PR keep the Ray.call semantics. Do not conflict with this #2172

functions gc

the java func would not be gc, it only gc the MethodId. MethodId is a point to the called java function.
if miss the MethodId, the MethodId would be reconstruct.

what is the meaning of 'remote lambdas stored' ? I think, there is no need to store them.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6047/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/6051/
Test FAILed.

Copy link
Contributor

@pcmoritz pcmoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the changes!

@pcmoritz pcmoritz merged commit fa0ade2 into ray-project:master Jun 13, 2018
royf added a commit to royf/ray that referenced this pull request Jun 22, 2018
* 'master' of https://github.com/ray-project/ray: (157 commits)
  Fix build failure while using make -j1. Issue 2257 (ray-project#2279)
  Cast locator with index type (ray-project#2274)
  fixing zero length partitions (ray-project#2237)
  Make actor handles work in Python mode. (ray-project#2283)
  [xray] Add error table and push error messages to driver through node manager. (ray-project#2256)
  addressing comments (ray-project#2210)
  Re-enable some actor tests. (ray-project#2276)
  Experimental: enable automatic GCS flushing with configurable policy. (ray-project#2266)
  [xray] Sets good object manager defaults. (ray-project#2255)
  [tune] Update Trainable doc to expose interface (ray-project#2272)
  [rllib] Add a simple REST policy server and client example (ray-project#2232)
  [asv] Pushing to s3 (ray-project#2246)
  [rllib] Remove need to pass around registry (ray-project#2250)
  Support multiple availability zones in AWS (fix ray-project#2177) (ray-project#2254)
  [rllib] Add squash_to_range model option (ray-project#2239)
  Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. (ray-project#2271)
  [rllib] Refactor Multi-GPU for PPO (ray-project#1646)
  [rllib] Envs for vectorized execution, async execution, and policy serving (ray-project#2170)
  [Dataframe] Change pandas and ray.dataframe imports (ray-project#1942)
  [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) (ray-project#2245)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants