MLeap's value for sklearn #430

Open
mingmasplace opened this issue Oct 6, 2018 · 2 comments

Comments

@mingmasplace

MLeap solves the single-request, low-latency prediction problem for Spark pipelines. A quick test shows that sklearn's native pipeline.predict already has pretty good latency, < 3 ms (it depends on the number of transforms, of course). So why would people want to migrate existing sklearn online prediction to MLeap? Thanks.
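For reference, a minimal sketch of the kind of quick latency test described above; the pipeline, data, and iteration count here are placeholders, not the original test setup:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train a small placeholder pipeline on synthetic data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X, y)

# Time single-row predictions, i.e. the shape of an online scoring request.
single_row = X[:1]
timings = []
for _ in range(1000):
    start = time.perf_counter()
    pipeline.predict(single_row)
    timings.append(time.perf_counter() - start)

print(f"median single-request latency: {np.median(timings) * 1e3:.3f} ms")
```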

@ancasarb
Member

ancasarb commented Oct 8, 2018

In our use case, we had to support model building/training not just in scikit-learn, but also in Spark and TensorFlow, so MLeap helped here: at scoring time, you only need to worry about the monitoring and scalability of a single model scoring service. At the same time, MLeap exposes a unified scoring interface, so clients that integrate with the scoring service don't need to know or worry about whether they're using a Spark model, a scikit-learn model, etc. This makes switching between models, with an A/B test for example, very easy.
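To make the unified interface concrete, here is a minimal sketch of a client call, assuming an MLeap serving instance behind a placeholder URL and MLeap's JSON leap-frame layout; the field names and values are made up:

```python
import requests  # assumes the `requests` package is installed

# Placeholder endpoint for a running MLeap serving instance.
SCORING_URL = "http://localhost:65327/transform"

# The client sends the same leap-frame JSON regardless of whether the
# bundle behind the endpoint was exported from Spark, scikit-learn, etc.
leap_frame = {
    "schema": {
        "fields": [
            {"name": "feature_a", "type": "double"},
            {"name": "feature_b", "type": "double"},
        ]
    },
    "rows": [[1.5, 2.5]],
}

response = requests.post(SCORING_URL, json=leap_frame)
print(response.json())  # transformed frame, including the prediction column
```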

Hope this helps!

@mingmasplace
Author

Thanks Anca. So MLeap not only solves the latency issue with Spark, but also provides a unified online scoring service that is agnostic to the model-building ML framework. Still, it isn't clear why that is a problem worth solving:

  • TF provides lots of DNN algorithms. Are you going to implement those in MLeap to remove the online scoring dependency on TF? If not, the scoring service will still depend on TF (which could run in a container).
  • Many teams already use sklearn or TF for both model building and model scoring. Feature transformation latency doesn't seem to be an issue in those frameworks, and people seem fine with production support for scoring services built separately on top of sklearn and TF. So what issues might this approach have?
  • To evaluate different models, we just need to define a common scoring interface and leave the actual implementation to the containers that implement it, so you can have a sklearn container, a TF container, etc. (see the sketch after this list). Why do we need a common MLeap pipeline inside the containers?
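A rough sketch of that alternative, with hypothetical class and method names; each framework gets its own container implementing the same interface:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ScoringService(ABC):
    """Common scoring interface; each framework-specific container implements it."""

    @abstractmethod
    def predict(self, features: Dict[str, Any]) -> List[float]:
        ...


class SklearnScoringService(ScoringService):
    """Hypothetical container-local implementation backed by a fitted sklearn pipeline."""

    def __init__(self, pipeline):
        self.pipeline = pipeline  # a fitted sklearn.pipeline.Pipeline

    def predict(self, features: Dict[str, Any]) -> List[float]:
        import pandas as pd

        # Build a single-row frame whose columns match the pipeline's training data.
        return self.pipeline.predict(pd.DataFrame([features])).tolist()
```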
