Registered Member
Hi all,
I use the following method from the docs to do linear regression:
It works great, except when my pesky X matrix has singularities. Is there a way to get a least-squares solution that deals with this in a sexy way? When running the same regressions in R (for example), it simply squashes one of the variables causing the singularity. Thanks in advance!
Moderator
It's already supposed to work that way. What do you get in such a case? What's the value of (X*res - y).norm() / y.norm(), where res = X.jacobiSvd(ComputeThinU | ComputeThinV).solve(y)?
Registered Member
Thanks for the reply Gael.
I'm expecting coefficients on the order of 0.0001. I have two columns that are actually identical. R squashes one of those columns and reports a NaN coefficient, with sensible values (on the correct order) for the other coefficients. Eigen gives me a coefficient of +6.3e16 for one offending column and -6.3e16 for the other. The non-offending variables are also impacted: their coefficients come out on the order of 0.1.

I ran that code and received a value of +58.3652. Not sure how to interpret this, sorry. R gives me 0 for that expression, because the numerator is 0.
Moderator
This might be an overflow issue. Are you using float or double?
Registered Member
Doubles.

More random information: my X matrix is full of 'observation'-type variables (all values 0 or 1), so that's why it's easy on occasion to have equal columns. In this example I have 30 variables and 330 observations. Excel even gives me sensible results (surprise!): it reports 0.0 for one of the two offending columns.
Moderator
Hm, OK, it seems I can reproduce the issue. In the meantime you can use QR:

res = X.householderQr().solve(y);

or

res = X.colPivHouseholderQr().solve(y);

or even full-pivoting LU:

res = X.fullPivLu().solve(y);
Registered Member
Thanks for the suggestions Gael.

Of those, here are my observations for my particular regression:

- HouseholderQR matches R, except that it still returns nonsense for the offending variables (coefficients of +5.46e12 and -5.46e12).
- ColPivHouseholderQR matches R exactly (well, ColPiv chose a different variable to squash, and it squashed it with a 0 rather than NaN/NA).
- FullPivLU was way off: my coefficients only correlated 29% with those from R.

I think I'll go with HouseholderQR, since I already have logic to drop bizarre coefficients, so I'll interpret these as NaN anyway. (ColPivHouseholderQR returns a 0 for an offending column, and 0 is an acceptable value in my domain, so I can't detect it as 'bad'.)
Registered Member
Do these methods give a least-squares solution directly, though?