Towards registering models with MLJ #20
Thank you for taking the time to review the repo. I will make all the changes to bring it to the MLJ model standard. I thought that the scitypes were incorrect, so thank you for the information on that too.
You might find this new tool useful. You can use it to test your interfaces. Eventually you might want to include it in CI, but for now it remains experimental.
You should use the
Thank you for sharing. This looks like a great addition.
@ablaom Thank you again for reviewing the library. Here is a PR with some of the improvements that you suggested: #25
Unless I am missing something, all the classification methods should be able to handle multi-class classification. I am a little confused about implementing the encoding of categorical variables. Because we pass the y vector to Python, the model will be able to predict any class that is present in the training data.
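To spell out the part I find confusing: the pool of a categorical vector can contain levels that never appear among the training entries. A tiny made-up example:

```julia
using CategoricalArrays

# the pool has three levels, but only two appear in the data
y = categorical(["a", "b", "a"]; levels=["a", "b", "c"])

unique(y)   # entries actually seen in training: "a", "b"
levels(y)   # full pool: "a", "b", "c" -- predictions should also know about "c"
```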
This should now be the case. I added the casting to
This was incorrect. I have fixed the target scitype.
The machines should work with table/matrix inputs for the features. I will add some examples of this.
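Something along these lines is what I have in mind (a rough sketch only: the data is made up, it assumes the package's `LogisticRegression` is in scope and that matrices are indeed supported as the metadata suggests, and it needs a working RAPIDS/GPU setup):

```julia
using MLJ, DataFrames, RAPIDS

X_table  = DataFrame(x1 = rand(100), x2 = rand(100))
X_matrix = Matrix(X_table)
y = coerce(rand(["yes", "no"], 100), Multiclass)

model = LogisticRegression()

# tabular features
mach = machine(model, X_table, y)
fit!(mach)
predict(mach, X_table)

# the same model fit on a plain matrix
mach2 = machine(model, X_matrix, y)
fit!(mach2)
predict(mach2, X_matrix)
```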
I updated the docstrings for the classifiers (LogisticRegression, ...). Let me know what you think. As a side note, if you do try to install it, there is currently a problem with micromamba installing RAPIDS. The workaround is setting the solver to mamba via the following env variable:
@tylerjthomas9 Thanks indeed for this update. Currently swamped with reviews and JuliaCon prep, but I will return to this in due course.
Having trouble getting this to build. Any ideas?
I'm using the version on
I am trying to narrow down this issue. I can build it without issues on one of my computers, but I get the same error as you on a different computer. I'll let you know as soon as I can get the builds working everywhere again. It seems that something broke right after I published v0.1.0, and building the environment has become less consistent.
Yeah, I can appreciate this kind of multi-language GPU installation is very tricky. I will wait for you to sort this out before proceeding with a review.
The nightly builds of Julia should contain the GCC updates required for the most recent versions of RAPIDS (JuliaLang/julia#45582 (comment)). Additionally, the micromamba build bug has been fixed. I am going to start development on this library again now that these roadblocks are clearing up. I will let you know when it's ready for review.
@ablaom I think that the package is in a good place for review. You will need to use a nightly Julia build to run it. Let me know if you have any questions.
Awesome. Will 1.8.3 (currently release-1.8) suffice? If so I might wait for that. |
I think the backport is scheduled to come with 1.8.4. We can circle back when that comes out. |
Yes. I know this is frustrating, but I prefer to prioritise review to interfaces that will work with current Julia. Hang in there. |
Sounds good. I'll ping you when it's ready. |
Hey @ablaom, I put a little time into getting this library to a similar level as CatBoost.jl. I think it is in a good spot for a review when you have the capacity; there is absolutely no rush. I want to eventually move away from Python and wrap the C++/CUDA code directly, but I won't have the capacity to tackle that transition for a while. Because of this eventual transition, I think this would be a good addition to the third-party MLJ models.
Okay, I'm having trouble because of this error: "ERROR: LoadError: InitError: IOError: write: no space left on device (ENOSPC)". I've checked my disk and there is enough space. I understand this may be tricky to diagnose. Two facts lead me to kick the can on this one: my old GPU is due to be replaced, and after next week I am between contracts and will lose access. So I'll try again when the dust settles. Thanks for your patience.
I have never seen that error before. If it is a pre-Pascal GPU, then it might not work. There is absolutely no rush.
Further to JuliaAI/MLJModels.jl#454, I have been asked to provide feedback on the implementation of the MLJ model interface package for models provided by RAPIDS.jl, with a view to registering the models in MLJ's model registry.
First, it's clear that a lot of effort has been put into understanding and implementing the interface, which is really awesome. I'm pretty optimistic we can close the necessary gaps to get this across the finish line.
I have not had an opportunity for a thorough review but wanted to get the ball rolling with a few comments. Only the first is more serious.
1. This scitype declaration for a classifier is not appropriate for an MLJ model, as users provide targets for classification in the form of categorical vectors (element scitype `Finite`, which includes `OrderedFactor` - for ordered categoricals - and `Multiclass` - for unordered ones). So, for classification, the `target_scitype` you typically have is `AbstractVector{<:Finite}`, unless it's binary only, which would be `AbstractVector{<:Finite{2}}`. Furthermore, one needs to take into account all the classes in the pool of the target, not just those manifest as entries seen in the training vector. This is so that the target you `predict` includes all classes. (See this discussion.) This generally means re-encoding the data using the categorical reference integers for passing to the core model, but also somehow recording the full pool and a method to invert the conversion in the prediction phase. This is discussed in the manual, but there are also lots of existing implementations to adapt, for example this DecisionTreeClassifier for probabilistic predictors, and this SVM model for deterministic (point) predictions. (A minimal sketch of the usual pattern appears just after this list.)

2. If your clustering models have a `predict`, then they should predict categorical vectors. I didn't check this.

3. The same scitype declaration referenced in 1. also allows a `Table` type, and I wonder if that was intentional. Tabular targets are only used in multitarget models (one column per target). Is this actually supported in the current case?

4. A lot of `model_metadata` entries in this pkg indicate that you can use tables and matrices for input, but the examples in the doc-strings only use matrices. It's fine if your model indeed can handle both (is that the case?) but generally MLJ users are used to inputs being tabular (wrapping matrices as tables if necessary), so probably the doc-strings should include tabular examples. See the DecisionTreeClassifier docstring for examples.

5. This is probably an issue you want to address last, but MLJ now has a hard requirement for model docstrings, which must comply with this standard.
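To make point 1. concrete, here is a minimal sketch of the usual encode/decode pattern for a probabilistic classifier. This is not this package's actual code: `SomeClassifier`, `train_core_model` and `core_predict_proba` are placeholders standing in for whatever the core library provides.

```julia
import MLJModelInterface as MMI

mutable struct SomeClassifier <: MMI.Probabilistic
    max_iter::Int
end
SomeClassifier(; max_iter=100) = SomeClassifier(max_iter)

function MMI.fit(model::SomeClassifier, verbosity, X, y)
    Xmatrix = X isa AbstractMatrix ? X : MMI.matrix(X)  # accept tables or matrices
    yint    = MMI.int(y)          # categorical values -> reference integers
    classes = MMI.classes(y[1])   # *all* classes in the pool, seen or not
    core = train_core_model(Xmatrix, yint; max_iter=model.max_iter)  # placeholder
    fitresult = (core, classes)
    return fitresult, nothing, NamedTuple()
end

function MMI.predict(model::SomeClassifier, fitresult, Xnew)
    core, classes = fitresult
    Xmatrix = Xnew isa AbstractMatrix ? Xnew : MMI.matrix(Xnew)
    probs = core_predict_proba(core, Xmatrix)    # placeholder; n × length(classes)
    return MMI.UnivariateFinite(classes, probs)  # distributions over the full pool
end

MMI.metadata_model(
    SomeClassifier;
    input_scitype  = Union{MMI.Table(MMI.Continuous), AbstractMatrix{MMI.Continuous}},
    target_scitype = AbstractVector{<:MMI.Finite},
)
```

A deterministic (point) predictor would instead keep `MMI.decoder(y[1])` from the training phase and apply it to the core model's integer predictions.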
At this early stage, you may just want to comment on these. If you do want to work on new commits to address them, can you please put them in an unmerged PR, so I can more easily comment on any changes?
Happy to take a call if you think that would be efficient at this early stage of the review.