Fixing alpha_grid for sparse matrices #2788
Conversation
cc @agramfort Please review! |
looks good at first sight. can you bench the speed and check that it fixes the problem reported in #2526? how much slower is it? is it also slower with a really sparse problem? |
@agramfort Fitting the correct alpha_grid doesn't resolve the problem :( I think the slowdown is due to something else.
Speed profile:
In master:
In my branch:
See ElasticNet() & Lasso profiling for dense and sparse data.
Hope this convinces you that it is not due to the centering alone. It may be due to the sparse vs dense data handling. But still this PR can be merged because it fixes the alpha_grid. (I've added a test for that) |
if Xy is None:
    X_sparse = False
    if sparse.isspmatrix(X):
        X_sparse = True
X_sparse = sparse.isspmatrix(X) is enough rather than these 3 lines
That's embarrassing. Sorry.
it would be great if you could profile the code to check where the time is spent. I would also try to bench using a really sparse X matrix, i.e. when X is large and super sparse |
@agramfort : There seems to be a bug for |
@agramfort (Seems the cd solver is slow)
|
ok so sparse CD is actually faster when X is really a sparse matrix. good ! |
@agramfort : No it isn't, it is almost ten times slower according to the profile. |
can you test with a real problem? i.e. thousands of features and hundreds of samples. Even text data if the dense case still fits in memory? |
@agramfort : You are right. I tested it with X = (20000, 500) with 61762 elements. Sparse CD takes 46.7s and CD takes 57.2s. This can be merged. |
  # X can be touched inplace thanks to the above line
- X, y, _, _, _ = center_data(X, y, fit_intercept,
+ X, y, _, _, X_std = center_data(X, y, fit_intercept,
I don't think you need the X_std with dense data.
Lasso and ElasticNet are more relevant in the case of n_features >> n_samples, i.e. X.shape = (500, 20000). can you test this also? a good test would also be on text data where X is super sparse and huge. You would be able to spot unnecessary memory allocations and identify bottlenecks |
@agramfort : Well I did it for (500, 20000). https://gist.github.com/Manoj-Kumar-S/8631580 |
@agramfort : I also tried the to_dense case using the 20 newsgroups data. For around 1k features, my computer became unresponsive. |
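For anyone reproducing this later, here is a rough sketch of the kind of text-data benchmark being discussed; the category choice, max_features and alpha below are arbitrary illustration values, not the settings used above:

```python
# Sketch only: sparse-vs-dense ElasticNet timing on text data.
# Categories, max_features and alpha are arbitrary for illustration.
import time
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import ElasticNet

data = fetch_20newsgroups(subset='train', categories=['sci.space', 'rec.autos'])
X = TfidfVectorizer(max_features=1000).fit_transform(data.data)  # sparse CSR
y = data.target.astype(np.float64)

clf = ElasticNet(alpha=0.01)

t0 = time.time()
clf.fit(X, y)                        # sparse coordinate descent path
print("sparse X: %.3f s" % (time.time() - t0))

t0 = time.time()
clf.fit(X.toarray(), y)              # dense path; may need a lot more memory
print("dense  X: %.3f s" % (time.time() - t0))
```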
what is weird is that on this sparse X and y, ElasticNet is faster on sparse |
@agramfort I profiled for cv = 3. First case: X is dense, normal coordinate descent:
Second case: X is sparse
|
Also do check this https://gist.github.com/Manoj-Kumar-S/8633769 EDIT: Time spent in cd solver for each alpha |
manoj you do like my students :) I ask them to do something and they come back to me with numbers. I am much more |
@agramfort : The numbers are just to prove a point :). From the profile, I suppose it can be assumed that the time taken for sparse cd is greater than that of plain cd, isn't it? |
run this
1 loops, best of 3: 409 ms per loop
now replace ElasticNet by ElasticNetCV and the opposite happens. Why? |
my bad... with ElasticNetCV I get: 1 loops, best of 3: 4.72 s per loop. So the question is why it's slower although there is less computation to do? |
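Since the exact snippet behind these numbers is not reproduced above, here is a rough sketch of this kind of sparse-vs-dense timing comparison; the shapes, density, alpha and cv are arbitrary, and the timings quoted in the thread come from a different setup:

```python
# Sketch only: time ElasticNet / ElasticNetCV on the same data as sparse and dense.
from time import time
import numpy as np
from scipy import sparse
from sklearn.linear_model import ElasticNet, ElasticNetCV

rng = np.random.RandomState(0)
X = sparse.rand(200, 5000, density=0.01, format='csr', random_state=rng)
y = rng.randn(200)

for est in (ElasticNet(alpha=0.1), ElasticNetCV(cv=3)):
    for X_ in (X, X.toarray()):
        kind = "sparse" if sparse.issparse(X_) else "dense"
        t0 = time()
        est.fit(X_, y)
        print("%s on %s X: %.3f s" % (est.__class__.__name__, kind, time() - t0))
```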
the answer seems to be that the sparse CD does some random accesses + does not use blas calls... |
Crappy access patterns to the memory? |
So the bottom line is that fitting the alpha grid properly doesn't speed up anything much, right? |
great
that could definitely be part of a project.
yes they exist for E-Net.
@fabianp any chance you can have a look?
the idea is to avoid the loop of features but to experiment with
indeed... @mblondel should really pitch in here. @mblondel what is your take on porting some of lightning into sklearn? |
@agramfort "the idea is to avoid the loop of features but to experiment with |
E-Net can be seen as a Lasso with a design X that is np.concatenate((X,
I would have to search the literature too. A |
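For reference, the reformulation being alluded to (a standard result from the elastic net literature, restated here from memory): the naive elastic net on (X, y) is a Lasso on an augmented design obtained by stacking a scaled identity under X and zeros under y,

```latex
\min_{\beta}\; \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1
\;=\;
\min_{\beta}\; \tfrac{1}{2}\,\Bigl\| \begin{pmatrix} y \\ 0 \end{pmatrix}
 - \begin{pmatrix} X \\ \sqrt{2\lambda_2}\, I_p \end{pmatrix} \beta \Bigr\|_2^2
 + \lambda_1 \|\beta\|_1 ,
```

i.e. in NumPy terms something like np.concatenate((X, np.sqrt(2 * lam2) * np.eye(n_features))) with n_features extra zero entries appended to y.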
I managed to have a detailed look at the research paper (http://www-stat.stanford.edu/~jbien/jrssb2011strong.pdf), and I again have a few questions.
According to the paper, shouldn't it be strong_active_set everywhere? That is, is it because we are assuming the non-zero coefficients are the same across all alphas? But that might not make sense, right?
|
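For context, the sequential strong rule from the linked paper (restated from memory of the paper, so worth double-checking against the text): when moving from lambda_{k-1} to lambda_k along the path, feature j is discarded from the computation if

```latex
\bigl| x_j^{\top} \bigl( y - X \hat{\beta}(\lambda_{k-1}) \bigr) \bigr| \;<\; 2\lambda_k - \lambda_{k-1} ,
```

and the discarded features are then checked against the KKT conditions after the fit, so a rare violation only costs a refit rather than a wrong solution.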
@mblondel : Would you please be able to comment on this, about removing Liblinear and porting lightning for Logistic Regression? I would like to do this as part of my project. |
liblinear implements three solvers for logistic regression:
For 1) you could start from @GaelVaroquaux 's branch (https://github.com/GaelVaroquaux/scikit-learn/blob/logistic_cv/sklearn/linear_model/logistic.py). The nice thing is that the Hessian doesn't need to be entirely materialized, you just need to implement the Hessian vector product. lightning only supports coordinate descent, which is usually not preferred for L2 regularization.
For 2) you could do a straight port of the C++ to Cython. lightning doesn't support this solver at all.
For 3), you could reuse lightning's code, but lightning uses coordinate descent only (not the mix of CD and glmnet) and so might be slower.
I'm not sure whether working on strong rules is a good idea. This is a difficult project for little gains.
I am not very enthusiastic about merging a lot of stuff from lightning. I spent a lot of time on it and I don't want it to be rendered obsolete. Now if you only take the L1 logistic regression solver, that's fine.
Last but not least, I'll be increasingly busy this year so I can't mentor this (I can of course answer occasional questions). |
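To make the Hessian-vector-product remark concrete, here is a minimal sketch (not the code in Gael's branch) for L2-regularized logistic loss with labels in {-1, +1}; the Hessian X^T D X + alpha * I is never formed explicitly:

```python
# Sketch: Hessian-vector product for L2-regularized logistic regression,
# usable by Newton-CG without materializing the Hessian.
import numpy as np
from scipy.special import expit  # logistic sigmoid

def logistic_hessian_vector_product(w, v, X, alpha):
    """Return H @ v with H = X^T D X + alpha * I and D = diag(p * (1 - p))."""
    p = expit(X.dot(w))      # predicted probabilities, shape (n_samples,)
    d = p * (1.0 - p)        # diagonal of D
    return X.T.dot(d * X.dot(v)) + alpha * v
```

The same code works unchanged when X is a scipy sparse matrix, which is the point of never building the dense Hessian.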
|
could fly for me. Adding strong rules to Lasso and ElasticNet should not be that hard anymore. What do others think? ping @GaelVaroquaux @fabianp @larsmans |
@agramfort I definitely think I would be able to do the strong rules. But my fear is that it might break some people's code since some coordinates are discarded on the basis of the rules. Maybe we can have an option strong=True in the init? As for the rest, I haven't yet had a deep look at the Logistic Regression codebase (I will this week). I can happily finish Gael's and @larsmans 's work as part of GSoC (if they allow me to :) ) |
the solution with strong rules should be exactly the same (up to
@GaelVaroquaux and @larsmans ? |
+1. I guess my work could best be merged into Gaël's. |
On Fri, Feb 07, 2014 at 11:58:09AM -0800, Alexandre Gramfort wrote:
In my experience, tron isn't very useful, and it is quite costly in terms
Me too.
I would really push for the strong rules. When I talked to Hastie about |
I am absolutely happy to see my work being finished by somebody else :). |
@manoj-kumar-s : I just stumbled on a reference that might be useful to your GSOC proposal: Also, the Yuan 2010 reference mentioned in the above paper is relevant. |
This is the CD + glmnet mix I mentioned above. This is the solver currently used by liblinear for L1-regularized logistic regression. However, it is fairly complicated. If you choose this solver over CD, I'd recommend a straight port of the C to Cython.
lightning implements the CDN algo of this paper. But lightning also supports a variant without line search. The line search is tricky to implement and adds a lot of code for no speed benefit in my experience. The step sizes for the variant without line search are given in Table 10 of And implemented in lightning here: (The variable |
@agramfort , @GaelVaroquaux Summing up, apart from strong rules, these are the ideas:
Multinomial LR
Is this too ambitious to do in one summer? Which are the 3-4 that have the most priority? |
I would add the easiest thing to try: random order in coordinate descent. I don't think that what you propose is too ambitious. In terms of order, I would start with the random order, then move on to l2 models, then strong rules, then the rest. Make sure that you include parameter selection with regularization path. Sounds exciting to me. |
Just to clarify, CD + glmnet is not in lightning, only CD (with or without line search). |
@mblondel great. @GaelVaroquaux In the randomised coordinate descent method, for every iteration, we randomly pick a feature to update instead of doing a traversal across all features (according to the wiki page). Is there anything more to this method? |
+100 for what gael said: "doing great what is already started and ok" is ambitious enough |
Pure random coordinate selection (with replacement) will need a pure Python function call at every iteration to get the coordinate index. In lightning, I have backported the Cython interface of a subset of numpy.random. An easier solution is to use cyclic coordinate selection with permutation of the order before every outer iteration. This is similar to random but without replacement. |
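A small NumPy sketch of that last idea (purely illustrative, not the Cython code in scikit-learn or lightning): cyclic coordinate descent for the Lasso where the feature order is re-permuted before every outer sweep:

```python
# Sketch: Lasso coordinate descent, visiting features in a freshly
# shuffled order at each outer iteration (cyclic without replacement).
import numpy as np

def lasso_cd_shuffled(X, y, alpha, n_iter=100, random_state=0):
    """Minimize 1/(2 * n_samples) * ||y - Xw||^2 + alpha * ||w||_1."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = y.copy()                  # residual for w = 0
    col_sq = (X ** 2).sum(axis=0)        # ||x_j||^2 for each feature
    for _ in range(n_iter):
        for j in rng.permutation(n_features):   # shuffled sweep order
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * w[j]           # remove feature j's contribution
            rho = X[:, j].dot(residual)
            w[j] = np.sign(rho) * max(abs(rho) - alpha * n_samples, 0.0) / col_sq[j]
            residual -= X[:, j] * w[j]           # put the updated contribution back
    return w
```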
@mblondel : Is there anything wrong with using rand? For example:
There will be modulo bias, but that can also be taken care of I think. |
Indeed that's possible but you will need a clean way to handle random_state. You could also reuse some code by the tree growers: |
The problems with
(The replacement in the tree code has modulo bias, but that's not nearly as bad as the other problems with |
@larsmans Then I can use the tree code? |
Sure. Just copy it over for now; we can refactor into a (I planned to do a big refactor of |
@agramfort So finally my exams are over. I'll submit a first draft proposal in a few days. By the way, coming back to what this PR was about: does it make sense to profile the sparse_enet_coordinate_descent method to find where exactly the time is spent, since it seems to be really slow? |
clfs = LassoCV(max_iter=100, cv=10, normalize=normalize)
ignore_warnings(clfs.fit)(X, y)
clfd = LassoCV(max_iter=100, cv=10, normalize=normalize)
ignore_warnings(clfd.fit)(X.todense(), y)
Coming late to the party, sorry for that.
Nice test, however @manoj-kumar-s could you please avoid using X.todense() and rather use X.toarray() instead in the future? todense returns a numpy.matrix instance which has a very confusing matlab-like API, and we rather want to only deal with numpy.ndarray instances throughout the code base.
Here this is just a test so this is not a big deal, but for the sake of consistency it's better to take good habits as early as possible.
@ogrisel Thanks for the heads-up! |
Fixes #2526 partially.
Centering of the sparse matrices is done indirectly. It gives the correct output and is marginally faster than in master, though still much slower than with dense matrices.
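For context on the indirect centering: the fix only needs the identity (X - 1 x̄ᵀ)ᵀ y = Xᵀ y - x̄ · Σᵢ yᵢ, so the correlations used by alpha_grid can be computed as if X were centered without ever densifying it. A small self-contained check of that identity (illustrative only, not the actual _alpha_grid code):

```python
# Illustrative check: X^T y for "centered" X computed without densifying X.
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
X = sparse.rand(50, 20, density=0.1, format='csc', random_state=rng)
y = rng.randn(50)

X_mean = np.asarray(X.mean(axis=0)).ravel()     # column means, shape (n_features,)
Xy_implicit = X.T.dot(y) - X_mean * y.sum()     # (X - X_mean)^T y, sparse-friendly

Xy_explicit = (X.toarray() - X_mean).T.dot(y)   # the expensive, densifying way
assert np.allclose(Xy_implicit, Xy_explicit)
```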