This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Any advice on my implementation of logistic regression?

kde-crazy (Registered Member):
I am new to Eigen, and I have implemented a logistic regression model with it. It works, but I don't know whether it is implemented efficiently, or whether some operation in my implementation is inefficient and could be improved.

Code:
#include <iostream>
#include <Eigen/Dense>
#include <cmath>

using namespace Eigen;
using namespace std;

class logistic_regression
{
public:
   VectorXd w;
   double b;
   logistic_regression(int n_in)
   {
      this->w = VectorXd::Random(n_in);
      this->b = 0.0;
   }

   // sigmoid activation for each sample: 1 / (1 + exp(-(X*w + b)));
   // the same expression as in predict(), without the 0.5 threshold
   VectorXd calc(const MatrixXd &inputs)
   {
      return (1 / (1 + ((-inputs * this->w).array() - b).exp())).matrix();
   }

   void train(MatrixXd train_datas, double lr)
   {
      VectorXd dw(this->w.rows());
      double db;
      // residuals y - sigmoid(X*w + b), one per sample (dw is resized to the sample count here)
      dw = train_datas.rightCols(1) - calc(train_datas.leftCols(this->w.rows()));
      db = dw.mean();
      // gradient w.r.t. w: the mean over samples of residual_i * x_i
      MatrixXd tmp = train_datas.leftCols(this->w.rows());
      tmp = tmp.array().colwise() * dw.array();
      dw = tmp.colwise().mean().transpose();
      w += lr * dw;   // gradient ascent step on the log-likelihood
      b += lr * db;
   }

   VectorXd predict(MatrixXd inputs)
   {
      return ((1 / (1 + ((-inputs*this->w).array() - b).exp())).array() > 0.5).cast<double>();
   }

   double test_error(MatrixXd datas)
   {
      VectorXd outputs = predict(datas.leftCols(w.rows()));
      // fraction of samples whose prediction differs from the stored label;
      // abs() is needed, otherwise errors where the label is 1 and the
      // prediction is 0 (difference -1) are not counted
      return ((outputs - datas.rightCols(1)).array().abs() > 1e-5).count() / (double)outputs.rows();
   }


};

// generate n random samples in [-1,1]^dim with label 1{w.x + b > 0} in the last column
// (the width is actually taken from w.rows(), which should equal dim)
MatrixXd linear_separable_dataset_generator(int n, int dim, VectorXd w, double b)
{
   int n_col = w.rows();
   MatrixXd datas = MatrixXd::Random(n, n_col + 1);

   datas.rightCols(1) = (((datas.leftCols(n_col)*w).array() + b) > 0).cast<double>();
   return datas;
}

void test_logistic_regression()
{
   logistic_regression lr(2);
   VectorXd w(2);
   double b = 0.2;
   w << 0.3, 0.6;
   MatrixXd train_datas = linear_separable_dataset_generator(1000, 2, w, b);
   for (int i = 0; i < 500; i++)
   {
      lr.train(train_datas.topRows(990), 0.1112);
      cout << "epoch:" << i << " error:" << lr.test_error(train_datas) << endl;
   }
   cout << "w:" << lr.w / (lr.b / 0.2) << endl;   // rescale so the learned b matches the true b = 0.2
   cout << "b:" << lr.b << endl;
   getchar();
   return;
}
ggael (Moderator):
In all functions, pass vectors and matrices by const reference, e.g.:

void train(const MatrixXd &train_datas, double lr)

In train, no need to introduce tmp:

w += lr * (dw.asDiagonal() * train_datas.leftCols(this->w.rows())).colwise().mean().transpose();


In predict:

.array() > 0.5 -> .eval() > 0.5 ; the array() is redundant here, and it is better to evaluate the subexpression to fully benefit from vectorization (benchmark to be sure this really improves performance).

In linear_separable_dataset_generator:

datas.rightCols(1) -> datas.col(datas.cols()-1) so that Eigen knows this is a vector.
kde-crazy:
ggael wrote:In all functions, pass vectors and matrices by const reference, e.g.:

void train(const MatrixXd &train_datas, double lr)


Thanks a lot! I thought using const references would improve performance a lot, because it avoids copying the arguments, but after trying several times with -O0 (optimizer turned off) I found this does not improve the performance. Isn't that weird?
kde-crazy:
ggael wrote:In predict:

.array() > 0.5 -> .eval() > 0.5 ; the array() is redundant here, and it is better to evaluate the subexpression to fully benefit from vectorization (benchmark to be sure this really improves performance).


As for what you said about the predict function:
.array() > 0.5 -> .eval() > 0.5 ;
This does not seem to work, because there is no > operator for a matrix:
Code:
MatrixXd a = MatrixXd::Random(5, 5);
cout << (a > 0.2);   // this is illegal
ggael (Moderator):
1 - Benchmarking without compiler optimizations is meaningless. You need at least to enable -O2 or -O3.

2 - Yes, Matrix > scalar is illegal, but in your case the expression "(1 / (1 + ((-inputs*this->w).array() - b).exp()))" is already an array. If in doubt, you can still do .array().eval().

