
Fixing alpha_grid for sparse matrices #2788


Closed
wants to merge 2 commits from Manoj-Kumar-S:LassoCV_sparse_bug into scikit-learn:master

Conversation

MechCoder
Copy link
Member

Fixes #2526 partially.

Centering of the sparse matrices is done indirectly. The output is now correct and marginally faster than in master, though still much slower than with dense matrices.
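For reference, the "indirect centering" trick looks roughly like this (a sketch only, written against scikit-learn's usual 1/(2 * n_samples) least-squares scaling; the helper name is mine, not the code in this diff):

import numpy as np
from scipy import sparse

def alpha_grid_sparse(X, y, l1_ratio=1.0, fit_intercept=True,
                      eps=1e-3, n_alphas=100):
    # Grid of alphas for a Lasso/ElasticNet path without densifying X:
    # centering is applied implicitly via
    # X_centered.T @ y_centered == X.T @ y - n_samples * X_mean * y_mean
    n_samples = X.shape[0]
    Xy = np.asarray(X.T.dot(y)).ravel()
    if fit_intercept:
        X_mean = np.asarray(X.mean(axis=0)).ravel()
        Xy -= X_mean * y.sum()                    # implicit centering
    alpha_max = np.max(np.abs(Xy)) / (n_samples * l1_ratio)
    return np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max),
                       num=n_alphas)[::-1]        # descending grid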

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 8f3991d on Manoj-Kumar-S:LassoCV_sparse_bug into f8e7cf1 on scikit-learn:master.

@MechCoder
Copy link
Member Author

cc @agramfort Please review!

@agramfort
Copy link
Member

looks good at first sight

can you bench the speed and check that it fixes the problem reported in #2526?

how much slower is it? is it also slower with a really sparse problem?

@MechCoder
Copy link
Member Author

@agramfort Fitting the correct alpha_grid doesn't resolve the problem :( I think the slowdown is due to sparse_enet_coordinate_descent, not the cross-validation itself, because I'm getting the same results with plain ElasticNet and Lasso.

Speed profile

In [1]: from sklearn.linear_model import *

In [2]: from sklearn.linear_model.tests.test_sparse_coordinate_descent import make_sparse_data

In [3]: X, y = make_sparse_data()

In master:

In [4]: %timeit LassoCV().fit(X, y)
1 loops, best of 3: 4.73 s per loop

In [5]: %timeit ElasticNetCV().fit(X, y)
1 loops, best of 3: 4.97 s per loop

In [6]: %timeit LassoCV().fit(X.todense(), y)
10 loops, best of 3: 178 ms per loop

In [7]: %timeit ElasticNetCV().fit(X.todense(), y)
1 loops, best of 3: 227 ms per loop

In my branch:

In [4]: %timeit LassoCV().fit(X, y)
1 loops, best of 3: 4.64 s per loop

In [5]: %timeit ElasticNetCV().fit(X, y)
1 loops, best of 3: 4.81 s per loop

In [6]: %timeit LassoCV().fit(X.todense(), y)
10 loops, best of 3: 176 ms per loop

In [7]: %timeit ElasticNetCV().fit(X.todense(), y)
1 loops, best of 3: 224 ms per loop

See the ElasticNet() and Lasso() profiling for dense and sparse data:

In [4]: %timeit Lasso().fit(X, y)
1 loops, best of 3: 36.2 ms per loop

In [5]: %timeit Lasso().fit(X.todense(), y)
100 loops, best of 3: 1.96 ms per loop

In [6]: %timeit ElasticNet().fit(X, y)
10 loops, best of 3: 35.9 ms per loop

In [7]: %timeit ElasticNet().fit(X.todense(), y)
1000 loops, best of 3: 1.96 ms per loop

Hope this convinces you that it is not due to the centering alone; it may be something about the sparse versus dense code paths. Still, this PR can be merged because it fixes the alpha_grid (I've added a test for that).

if Xy is None:
    X_sparse = False
    if sparse.isspmatrix(X):
        X_sparse = True
Copy link
Member


X_sparse = sparse.isspmatrix(X)

is enough rather than these 3 lines

Copy link
Member Author


That's embarrassing. Sorry.

@agramfort
Copy link
Member

it would be great if you could profile the code to check where the time is spent.

I would also try to bench using a really sparse X matrix: if X is large and super sparse it should be faster. Otherwise we have a problem...
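Something along these lines would exercise a genuinely sparse problem (shapes and density below are arbitrary picks for illustration, not numbers from this thread):

import numpy as np
from time import time
from scipy import sparse
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = sparse.random(200, 5000, density=0.01, format='csc', random_state=rng)
w_true = np.zeros(5000)
w_true[:20] = rng.randn(20)
y = np.asarray(X.dot(w_true)).ravel() + 0.01 * rng.randn(200)

for data, label in [(X, "sparse"), (X.toarray(), "dense")]:
    t0 = time()
    LassoCV(cv=3).fit(data, y)
    print("%s: %.2f s" % (label, time() - t0))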

@MechCoder
Copy link
Member Author

@agramfort : There seems to be a bug for normalize=True. The alpha_grid is computed perfectly, however the coefs are horribly wrong. Any pointers?

@MechCoder
Copy link
Member Author

@agramfort (Seems the cd solver is slow.)

Sparse case: X = sparse_matrix(100, 100), cv = 3, n_alphas = 100

In _alpha_grid                        0.00639009475708
Fold 1: average time in cd solver     0.00587473154068, in enet_path 0.587656021118
Fold 2: average time in cd solver     0.0057221198082,  in enet_path 0.572408914566
Fold 3: average time in cd solver     0.0059232711792,  in enet_path 0.592495918274
In path residuals                     1.77350306511
Average cd after cross-validation     0.0108821392059

Dense case: cv = 3, n_alphas = 100

In _alpha_grid                        0.000421047210693
Fold 1: average cd 0.000321390628815, in path 0.0323050022125
Fold 2: average cd 0.000257239341736, in path 0.0258841514587
Fold 3: average cd 0.000246739387512, in path 0.0248351097107
In path residuals                     0.0868258476257
Average cd after cross-validation     0.00113892555237

@agramfort
Copy link
Member

ok so sparse CD is actually faster when X is really a sparse matrix. good!

@MechCoder
Copy link
Member Author

@agramfort : No it isn't; it is almost ten times slower, according to the profile.

@agramfort
Copy link
Member

> No it isn't; it is almost ten times slower, according to the profile.

can you test with a real problem? i.e. thousands of features and hundreds of samples. Even text data, if the dense case still fits in memory?

@MechCoder
Copy link
Member Author

@agramfort : You are right. I tested it with X of shape (20000, 500) with 61762 non-zero elements. Sparse CD takes 46.7 s and dense CD takes 57.2 s. This can be merged.

 # X can be touched inplace thanks to the above line
-X, y, _, _, _ = center_data(X, y, fit_intercept,
+X, y, _, _, X_std = center_data(X, y, fit_intercept,
Copy link
Member


I don't think you need the X_std with dense data.

@agramfort
Copy link
Member

Lasso and ElasticNet are more relevant in the case of n_features >> n_samples, i.e. X.shape = (500, 20000).

can you test this also?

a good test would also be on text data where X is super sparse and huge. You would be able to spot unnecessary memory allocations and identify bottlenecks.

@MechCoder
Copy link
Member Author

@agramfort : Well I did it for (500, 20000). https://gist.github.com/Manoj-Kumar-S/8631580

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 4ed405f on Manoj-Kumar-S:LassoCV_sparse_bug into 6a46c8a on scikit-learn:master.

@MechCoder
Copy link
Member Author

@agramfort : I also tried the dense case using fetch_20newsgroups. For around 1k features, my computer became unresponsive.

@agramfort
Copy link
Member

what is weird is that on this sparse X and y, ElasticNet is faster on sparse data while ElasticNetCV is slower. It means there is something to understand, don't you think? It's not the solver that is slower. Can you profile?

@MechCoder
Copy link
Member Author

@agramfort I profiled for cv = 3

First Case: X is dense, normal coordinate descent

In alpha grid 0.143427848816
In cd solver 56.0662539005
In enet path 56.1728839874
Calculating mses 1.29331803322
In path residuals 57.8905909061

In cd solver 55.6648950577
In enet path 55.771791935
Calculating mses 1.2891061306
In path residuals 57.4759218693

In cd solver 56.3729310036
In enet path 56.4804489613
Calculating mses 1.28896212578
In path residuals 58.1781880856

Final model (for one alpha)
In cd solver 0.148977041245
In enet path 0.352689027786

Second case: X is sparse

In alpha grid 0.170869112015
In cd solver 124.609766006
In enet path 124.610249043
Calculating mses 0.0117599964142
In path residuals 124.682076931

In cd solver 123.469653845
In enet path 123.470156193 
Calculating mses 0.0115718841553
In path residuals 123.542657852

In cd solver 120.946384907
In enet path 120.946867943
Calculating mses 0.0116329193115
In path residuals 121.018409967

Final model (for one alpha)
In cd solver 0.565122842789
In enet path 0.565711021423

@MechCoder
Copy link
Member Author

Also do check this https://gist.github.com/Manoj-Kumar-S/8633769

EDIT: Time spent in cd solver for each alpha

@agramfort
Copy link
Member

manoj you do like my students :)

I ask them to do something and they come back to me with numbers. I am much more interested in the conclusions and the take-home message :)

@MechCoder
Copy link
Member Author

@agramfort : The numbers are just to prove a point :). From the profile, I suppose it can be assumed that the time taken by sparse cd is greater than that of plain cd, isn't it?

@agramfort
Copy link
Member

run this

from sklearn.datasets.samples_generator import make_regression
from scipy import sparse
from sklearn.linear_model import ElasticNetCV, ElasticNet

# clf = ElasticNetCV(max_iter=2000, n_alphas=5, eps=1e-2, cv=3, fit_intercept=False,
#                    normalize=False)
clf = ElasticNet(max_iter=2000)

X, y = make_regression(n_samples=500, n_features=10000, random_state=0)
X[X < 2.5] = 0

X_sp = sparse.csc_matrix(X)
# X_sp = sparse.coo_matrix(X)
%timeit clf.fit(X_sp,y)
%timeit clf.fit(X, y)

1 loops, best of 3: 409 ms per loop
10 loops, best of 3: 161 ms per loop

now replace ElasticNet by ElasticNetCV and the opposite happens. Why?

@agramfort
Copy link
Member

my bad... with ElasticNetCV I get:

1 loops, best of 3: 4.72 s per loop
1 loops, best of 3: 2.1 s per loop

so the question is why it's slower although there is less computation to do?

@agramfort
Copy link
Member

the answer seems to be that the sparse CD does some random memory accesses + does not use BLAS calls...
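Schematically, the difference between the two inner loops is something like this (a rough sketch of a single lasso coordinate update for 0.5 * ||y - Xw||^2 + alpha * ||w||_1, not the actual Cython code):

import numpy as np

def dense_update(X, R, w, j, alpha):
    # Contiguous column: the dot products map onto BLAS-friendly vector ops.
    x_j = X[:, j]
    norm2 = x_j @ x_j                    # assumes column j is not all zeros
    rho = x_j @ R + w[j] * norm2
    w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / norm2
    R += x_j * (w[j] - w_new)            # residual update, still contiguous
    return w_new

def csc_update(X_csc, R, w, j, alpha):
    # Sparse column: reads/writes hop through X_csc.indices -> poor locality.
    start, end = X_csc.indptr[j], X_csc.indptr[j + 1]
    idx, vals = X_csc.indices[start:end], X_csc.data[start:end]
    norm2 = vals @ vals
    rho = vals @ R[idx] + w[j] * norm2
    w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / norm2
    R[idx] += vals * (w[j] - w_new)      # scattered residual update
    return w_new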

@GaelVaroquaux
Copy link
Member

> so the question is why it's slower although there is less computation to do?

Crappy access patterns to the memory?

@MechCoder
Copy link
Member Author

So the bottom line is that fitting the alpha grid properly doesn't speed anything up much, right?

@agramfort
Copy link
Member

> ping @agramfort : I had a look at the gist and enet_coordinate_descent and I think I understood what it does.

great

> Can I work on this as part of Google Summer of Code?

that could definitely be part of a project.

> Are there "strong rules" for Elastic Net too, or is it limited to the Lasso? I suppose this would work (according to the gist) only with LassoCV, where a grid of alphas is given.

yes they exist for E-Net.

> In line 27 of the gist you had given me, should bb be initialized to zero before the first iteration? Otherwise it gets re-initialized to zero for every iteration, which would give the same result.

@fabianp any chance you can have a look?

> Gael mentioned "Better strategy to choose coordinates in the coordinate descent." I would like to work on this too as part of GSoC.

the idea is to avoid the loop over features and to experiment with randomized choice of the next coordinate to update.

> I understand what the present coordinate descent does, but some resources on how to improve it would be good. I had a really quick look at the paper given in the wiki, but it seems that all the algorithms are for L1-regularised Logistic Regression and none of them are for Linear Regression.

indeed... @mblondel should really pitch in here.

@mblondel what is your take on porting some of lightning into sklearn?

@MechCoder
Copy link
Member Author

@agramfort

> "yes they exist for E-Net"

I just googled "strong rules for Elastic Net" and nothing worthwhile turned up. It would be helpful if you could provide me with a link or two.

> "the idea is to avoid the loop over features and to experiment with randomized choice of the next coordinate to update."

I suppose there will be some heuristic for finding the next random coordinate; again, a link or two would be really helpful.

@agramfort
Copy link
Member

> "yes they exist for E-Net"
> I just googled "strong rules for Elastic Net" and nothing worthwhile turned up. It would be helpful if you could provide me with a link or two.

E-Net can be seen as a Lasso with a design X that is np.concatenate((X, cst * np.eye(n_features)), axis=0), where cst is a scalar depending on alpha * (1 - l1_ratio).

> "the idea is to avoid the loop over features and to experiment with randomized choice of the next coordinate to update."
> I suppose there will be some heuristic for finding the next random coordinate; again, a link or two would be really helpful.

I would have to search the literature too.
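To make the augmented-design remark concrete, here is a small numerical check (my own sketch, with the scaling constants worked out for scikit-learn's objective and fit_intercept=False):

import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.RandomState(0)
n, p = 50, 20
X = rng.randn(n, p)
y = X @ rng.randn(p) + 0.1 * rng.randn(n)

alpha, l1_ratio = 0.5, 0.7
enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False,
                  tol=1e-10, max_iter=10000).fit(X, y)

# Appending cst * I below X turns the l2 part of the penalty into extra
# "samples", leaving a pure Lasso (alpha rescaled for the new sample count).
cst = np.sqrt(n * alpha * (1 - l1_ratio))
X_aug = np.vstack([X, cst * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
lasso = Lasso(alpha=n * alpha * l1_ratio / (n + p), fit_intercept=False,
              tol=1e-10, max_iter=10000).fit(X_aug, y_aug)

print(np.allclose(enet.coef_, lasso.coef_, atol=1e-5))  # True, up to solver tol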

@MechCoder
Copy link
Member Author

I managed to have a detailed look at the research paper (http://www-stat.stanford.edu/~jbien/jrssb2011strong.pdf), and I again have a few questions.

  1. The shrinkage function in the gist is similar to the enet_coordinate_descent function. Do you think we need to have a 1D numpy array argument in the enet function, which would specify the features to iterate over?

  2. The algorithm described in the gist seems to be
    a] Find the "current active set" corresponding to the present alpha.
    b] Solve for the coefficients using the "active set" corresponding to the previous alpha.
    c] Iterate over the current active set, and append to the active set if there are any KKT violations.
    d] Solve for the active set again.
    e] Iterate over n_features and append to the active set if there are any KKT violations.
    f] Solve for the active set again.

    According to the paper, shouldn't it be the strong active set everywhere? i.e.
    a] Find the "current active set" corresponding to the present alpha.
    b] Solve for the coefficients using the "current active set" corresponding to the previous alpha.
    c] Iterate over the current active set, and if there are any KKT violations solve again.
    e] Iterate over n_features and append to the strong active set if there are any KKT violations.
    f] Solve for the strong active set again.

    Is it because we are assuming the non-zero coefficients are the same across all alphas? That might not make sense, right?

  3. In the https://github.com/mblondel/lightning CD Regressor, I have not had a look at the source code yet (I will next week), but what type of coordinate descent is used? The documentation mentions block coordinate descent. Is this what we are looking for (randomised)?

@MechCoder
Copy link
Member Author

@mblondel : Would you be able to comment on this, about removing liblinear and porting lightning for Logistic Regression? I would like to do this as part of my project.
Thanks.

@mblondel
Copy link
Member

mblondel commented Feb 7, 2014

liblinear implements three solvers for logistic regression:

  1. a primal solver for L2 regularization using TRON (this uses second-order info)
  2. a dual solver for L2 regularization
  3. a primal solver for L1 regularization using a mix of coordinate descent and glmnet

For 1) you could start from @GaelVaroquaux 's branch (https://github.com/GaelVaroquaux/scikit-learn/blob/logistic_cv/sklearn/linear_model/logistic.py). The nice thing is that the Hessian doesn't need to be entirely materialized; you just need to implement the Hessian-vector product. lightning only supports coordinate descent, which is usually not preferred for L2 regularization.

For 2) you could do a straight port of the C++ to Cython. lightning doesn't support this solver at all.

For 3), you could reuse lightning's code but lightning uses coordinate descent only (not the mix of CD and glmnet) and so might be slower.

I'm not sure whether working on strong rules is a good idea. This is a difficult project for little gains.

I am not very enthusiastic about merging a lot of stuff from lightning. I spent a lot of time on it and I don't want it to be rendered obsolete. Now if you only take the L1 logistic regression solver, that's fine.

Last but not least, I'll be increasingly busy this year so I can't mentor this (I can of course answer occasional questions).
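As an aside on the Hessian-vector product mentioned for 1): for the L2-penalized logistic loss f(w) = mean(log(1 + exp(-y * Xw))) + 0.5 * lam * ||w||^2 (with y in {-1, +1}), it boils down to the following (a sketch, not the code in that branch):

import numpy as np
from scipy.special import expit

def logistic_hess_vec(w, v, X, y, lam):
    # H = X.T @ diag(s * (1 - s)) @ X / n_samples + lam * I, applied to v
    # without ever forming the n_features x n_features Hessian.
    s = expit(y * X.dot(w))          # per-sample probability of the true label
    d = s * (1.0 - s)                # curvature of the per-sample log-loss
    return X.T.dot(d * X.dot(v)) / X.shape[0] + lam * v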

@agramfort
Copy link
Member

> The shrinkage function in the gist is similar to the enet_coordinate_descent function. Do you think we need to have a 1D numpy array argument in the enet function, which would specify the features to iterate over?

exactly

> The algorithm described in the gist seems to be
> a] Find the "current active set" corresponding to the present alpha.
> b] Solve for the coefficients using the "active set" corresponding to the previous alpha.
> c] Iterate over the current active set, and append to the active set if there are any KKT violations.
> d] Solve for the active set again.
> e] Iterate over n_features and append to the active set if there are any KKT violations.
> f] Solve for the active set again.
>
> According to the paper, shouldn't it be the strong active set everywhere? i.e.
> a] Find the "current active set" corresponding to the present alpha.
> b] Solve for the coefficients using the "current active set" corresponding to the previous alpha.
> c] Iterate over the current active set, and if there are any KKT violations solve again.
> e] Iterate over n_features and append to the strong active set if there are any KKT violations.
> f] Solve for the strong active set again.

yes this sounds good to me.

> Is it because we are assuming the non-zero coefficients are the same across all alphas? That might not make sense, right?

the active set will change with alpha.
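For concreteness, the sequential strong rule being discussed (Tibshirani et al., 2011) amounts to the following for the plain lasso objective 0.5 * ||y - Xw||^2 + alpha * ||w||_1 (the helper names are mine, not scikit-learn's):

import numpy as np

def strong_rule_survivors(X, y, w_prev, alpha_prev, alpha):
    # Keep feature j when moving from alpha_prev down to alpha only if
    # |x_j.T @ r(alpha_prev)| >= 2 * alpha - alpha_prev.
    resid = y - X @ w_prev
    return np.where(np.abs(X.T @ resid) >= 2 * alpha - alpha_prev)[0]

def kkt_violators(X, y, w, alpha, discarded):
    # Discarded features whose KKT condition |x_j.T @ r(alpha)| <= alpha fails;
    # they must be added back to the active set and the problem re-solved.
    resid = y - X @ w
    return discarded[np.abs(X[:, discarded].T @ resid) > alpha + 1e-12]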

@agramfort
Copy link
Member

> liblinear implements three solvers for logistic regression:
>
>   1. a primal solver for L2 regularization using TRON (this uses second-order info)
>   2. a dual solver for L2 regularization
>   3. a primal solver for L1 regularization using a mix of coordinate descent and glmnet
>
> For 1) you could start from @GaelVaroquaux's branch (https://github.com/GaelVaroquaux/scikit-learn/blob/logistic_cv/sklearn/linear_model/logistic.py). The nice thing is that the Hessian doesn't need to be entirely materialized; you just need to implement the Hessian-vector product. lightning only supports coordinate descent, which is usually not preferred for L2 regularization.
>
> For 2) you could do a straight port of the C++ to Cython. lightning doesn't support this solver at all.
>
> For 3), you could reuse lightning's code but lightning uses coordinate descent only (not the mix of CD and glmnet) and so might be slower.
>
> I'm not sure whether working on strong rules is a good idea. This is a difficult project for little gains.
>
> I am not very enthusiastic about merging a lot of stuff from lightning.

I fully understand.

> I spent a lot of time on it and I don't want it to be rendered obsolete. Now if you only take the L1 logistic regression solver, that's fine.

a project "logistic regression on steroids" that finishes:

  • the PR from Gael with L2 LogisticRegressionCV using TRON or l-bfgs
  • adds support for multinomial LR as started by @larsmans
  • implements L1+L2 LR using coordinate descent inspired by lightning

could fly for me.

Adding strong rules to Lasso and ElasticNet should not be that hard anymore
with the refactoring from last year.

what do others think?

ping @GaelVaroquaux @fabianp @larsmans ?

@MechCoder
Copy link
Member Author

@agramfort I definitely think I would be able to do the strong rules. But my fear is that it might break some people's code, since some coordinates are discarded on the basis of the rules. Maybe we can have an option strong=True in the init?

As for the rest, I haven't had a deep look at the Logistic Regression codebase yet (I will this week). I can happily finish Gael's and @larsmans's work as part of GSoC (if they allow me to :) ).
Waiting for their opinions.

@agramfort
Copy link
Member

> @agramfort I definitely think I would be able to do the strong rules. But my fear is that it might break some people's code, since some coordinates are discarded on the basis of the rules. Maybe we can have an option strong=True in the init?

the solution with strong rules should be exactly the same (up to numerical errors). it's just a trick to avoid looping over coordinates that cannot be active.

> As for the rest, I haven't had a deep look at the Logistic Regression codebase yet (I will this week). I can happily finish Gael's and @larsmans's work as part of GSoC (if they allow me to :) ). Waiting for their opinions.

@GaelVaroquaux and @larsmans ?

@larsmans
Copy link
Member

larsmans commented Feb 8, 2014

+1. I guess my work could best be merged into Gaël's.

@GaelVaroquaux
Copy link
Member

On Fri, Feb 07, 2014 at 11:58:09AM -0800, Alexandre Gramfort wrote:

>   • the PR from Gael with L2 LogisticRegressionCV using TRON or l-bfgs

In my experience, TRON isn't very useful, and it is quite costly in terms of code.

>   • adds support for multinomial LR as started by @larsmans
>   • implements L1+L2 LR using coordinate descent inspired by lightning
>
> could fly for me.

Me too.

> Adding strong rules to Lasso and ElasticNet should not be that hard anymore with the refactoring from last year.

I would really push for the strong rules. When I talked to Hastie about what makes coordinate solvers for l1 problems fast, he seemed to think that they were very important.

@GaelVaroquaux
Copy link
Member

> @GaelVaroquaux and @larsmans ?

I am absolutely happy to see my work being finished by somebody else :). Especially if it is done well, i.e. readable and fast.

@GaelVaroquaux
Copy link
Member

@manoj-kumar-s : I just stumbled on a reference that might be useful to your GSOC proposal:
http://jmlr.org/papers/v13/yuan12a.html

Also, the Yuan 2010 reference mentioned in the above paper is relevant.

@mblondel
Copy link
Member

> @manoj-kumar-s : I just stumbled on a reference that might be useful to your GSOC proposal:
> http://jmlr.org/papers/v13/yuan12a.html

This is the CD + glmnet mix I mentioned above. This is the solver currently used by liblinear for L1-regularized logistic regression. However, it is fairly complicated. If you choose this solver over CD, I'd recommend a straight port of the C to Cython.

> Also, the Yuan 2010 reference mentioned in the above paper is relevant.

lightning implements the CDN algo of this paper. But lightning also supports a variant without line search. The line search is tricky to implement and adds a lot of code for no speed benefit in my experience. The step sizes for the variant without line search are given in Table 10 of
http://arxiv.org/abs/1107.2848
(this is a very theoretical paper, you don't need to read it)

And it is implemented in lightning here:
https://github.com/mblondel/lightning/blob/master/lightning/impl/primal_cd_fast.pyx#L1219

(The variable n_vectors should be 1 for binary classification)

@MechCoder
Copy link
Member Author

@agramfort , @GaelVaroquaux Summing up, apart from strong rules, these are the ideas:

  1. L1 regularisation, primal (CD / CD + glmnet) (lightning)
  2. L2 regularisation, primal (from @GaelVaroquaux's branch)
  3. L2 regularisation, dual (from LibLinear)
  4. L1 + L2 regularisation.

Multinomial LR
  5. Completing the multinomial PR (L2, L-BFGS) by @larsmans.
  6. Multinomial L1 using coordinate descent.

Is this too ambitious to do in one summer? Which 3-4 have the most priority?

@GaelVaroquaux
Copy link
Member

I would add the easiest thing to try: random order in coordinate descent.

I don't think that what you propose is too ambitious. In terms of order, I would start with the random order, then move on to L2 models, then strong rules, then the rest. Make sure that you include parameter selection with the regularization path.

Sounds exciting to me. 


@mblondel
Copy link
Member

Just to clarify, CD + glmnet is not in lightning, only CD (with or without line search).

@MechCoder
Copy link
Member Author

@mblondel great.

@GaelVaroquaux In the randomised coordinate descent method, for every iteration we randomly pick a feature to update instead of doing a traversal across all features (according to the wiki page). Is there anything more to this method?

@agramfort
Copy link
Member

+100 for what gael said

"""
I terms of order I would stay by the random order then move on to l2 models
then strong rules then the rest.
"""

doing great what is already started and ok is ambitious enough

@mblondel
Copy link
Member

Pure random coordinate selection (with replacement) will need a pure Python function call at every iteration to get the coordinate index. In lightning, I have backported the Cython interface of a subset of numpy.random. An easier solution is to use cyclic coordinate selection with permutation of the order before every outer iteration. This is similar to random but without replacement.
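A toy version of that permuted-cyclic scheme for the lasso (a sketch for 0.5 * ||y - Xw||^2 + alpha * ||w||_1, not lightning's or scikit-learn's code):

import numpy as np

def lasso_cd_shuffled(X, y, alpha, n_sweeps=100, seed=0):
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    R = y.copy()                                  # residual y - Xw
    col_norms = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in rng.permutation(n_features):     # fresh order every sweep
            if col_norms[j] == 0.0:
                continue
            rho = X[:, j] @ R + w[j] * col_norms[j]
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_norms[j]
            R += X[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w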

@MechCoder
Copy link
Member Author

@mblondel : Is there anything wrong with using rand?

For example:

from libc.stdlib cimport rand

for i in xrange(n_iterations):
    j = rand() % n_features
....

There will be modulo bias, but that can also be taken care of I think.

@mblondel
Copy link
Member

Indeed that's possible but you will need a clean way to handle random_state. You could also reuse some code by the tree growers:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L2361

@larsmans
Copy link
Member

The problems with rand are

  • its random state is a singleton which cannot be inspected
  • it gives different results on different operating systems
  • its implementation can be arbitrarily bad; IIRC, the Windows version gives only 16 bits of randomness.

(The replacement in the tree code has modulo bias, but that's not nearly as bad as the other problems with rand.)

@MechCoder
Copy link
Member Author

@larsmans Then I can use the tree code? rand_int?

@larsmans
Copy link
Member

Sure. Just copy it over for now; we can refactor into a pxd later.

(I planned to do a big refactor of np.random to make it reusable as a C library but never found the time to finish that.)

@MechCoder
Copy link
Member Author

@agramfort So finally my exams are over. I'll submit a first draft proposal in a few days. By the way, coming back to what this PR was about: does it make sense to profile the sparse_enet_coordinate_descent method to find where exactly the time is spent, since it seems to be really slow?

clfs = LassoCV(max_iter=100, cv=10, normalize=normalize)
ignore_warnings(clfs.fit)(X, y)
clfd = LassoCV(max_iter=100, cv=10, normalize=normalize)
ignore_warnings(clfd.fit)(X.todense(), y)
Copy link
Member


Coming late to the party, sorry for that.

Nice test. However, @manoj-kumar-s, could you please avoid X.todense() and use X.toarray() instead in the future? todense returns a numpy.matrix instance, which has a very confusing matlab-like API, and we want to deal only with numpy.ndarray instances throughout the code base.

Here it is just a test, so this is not a big deal, but for the sake of consistency it's better to adopt good habits as early as possible.

@MechCoder
Copy link
Member Author

@ogrisel Thanks for the heads-up!
