BigDaddyDrew
Registered Member

Hi all,
I use the following method from the docs to do linear regression:
It works great, except when my pesky X matrix has singularities. Is there a way to get a least-squares solution that deals with this in a sexy way? When I run the same regressions in R (for example), it simply squashes one of the variables causing the singularity. Thanks in advance! 
ggael
Moderator

It's already supposed to work that way. What do you get in such a case? What's the value of (X*res-y).borm()/y.norm(), where res = X.jacobiSvd(ComputeThinU | ComputeThinV).solve(y)?

BigDaddyDrew
Registered Member

Thanks for the reply, Gael.
I'm expecting coefficients on the order of 0.0001. I have two columns that are actually identical. R squishes one of those columns and reports a NaN coefficient, and sensible values (of the correct order) for the other coefficients. Eigen gives me a +6.3e16 coefficient for one offending column and a -6.3e16 coefficient for the other. The non-offending variables are also affected: their coefficients come out on the order of 0.1. I evaluated your expression (assuming you meant 'norm' and not 'borm') and got +58.3652. Not sure how to interpret this, sorry. R gives me 0 for the same expression, because the numerator is 0.

ggael
Moderator

This might be an overflow issue. Are you using float or double?

BigDaddyDrew
Registered Member

doubles
More random information: my X matrix is full of 'observation'-type variables (all values are 0 or 1), which is why it's easy to occasionally end up with identical columns. In this example I have 30 variables and 330 observations. Even Excel gives me sensible results (surprise!): it gives me 0.0 for one of the two offending columns. 
ggael
Moderator

Hm, ok it seems I can reproduce the issue. In the meantime you can use QR:
res = X.householderQr().solve(y);
or
res = X.colPivHouseholderQr().solve(y);
or even full-pivoting LU:
res = X.fullPivLu().solve(y);
BigDaddyDrew
Registered Member

Thanks for the suggestions, Gael.
Here are my observations for my particular regression:
HouseholderQR matches R, except that it still returns nonsense for the offending variables (coefficients of +5.46e12 and -5.46e12).
ColPivHouseholderQR matches R exactly (well, ColPiv chose a different variable to squash, and it squashed it with a 0 rather than NaN/NA).
FullPivLU was way off: my coefficients only correlated 29% with those from R.
I think I'll go with HouseholderQR, since I already have logic to drop bizarre coefficients, so I'll interpret those as NaN anyway. ColPivHouseholderQR returns a 0 for an offending column, and since 0 is an acceptable value in my domain, I can't detect it as 'bad'. 
dim_tz
Registered Member

Do these methods directly give a least-squares solution, though? 