statistics - Logistic regression with sparse predictor variables


I am modeling some data using binary logistic regression. The dependent variable has a reasonable number of both positive and negative cases, so it is not sparse. I also have a large training set (> 100,000), and the number of main effects I'm interested in is 15, so I'm not worried about a p > n issue.

What I'm concerned about is that many of my predictor variables, if continuous, are 0 most of the time, and, if nominal, are null most of the time. When these sparse predictor variables do take a value > 0 (or a non-null value), I know from my familiarity with the data that they should be important in predicting my positive cases. I have been trying to find information on how the sparseness of these predictors could be affecting my model.

In particular, I would not want the effect of a sparse but important variable to be left out of my model just because there is another predictor variable that is not sparse and is correlated with it, but doesn't do as good a job of predicting the positive cases. To illustrate with an example: if I were trying to model whether or not someone ended up being accepted at a particular Ivy League university, and my 3 predictors were SAT score, GPA, and "donation > $1M" as a binary, I have reason to believe that "donation > $1M", when true, is going to be very predictive of acceptance (more so than a high GPA or SAT), but it is also very sparse. How, if at all, is this going to affect my logistic model, and do I need to make adjustments for this? Also, would another type of model (say a decision tree, random forest, etc.) handle this better?
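To make the admissions example concrete, here is a minimal simulation sketch in plain Python. Everything in it is an assumption made for illustration: the function names `simulate_admissions` and `log_odds_ratio`, the baseline acceptance logit of -2.0, the true donation effect of 2.0, and the donation frequencies are all made up and do not come from real data. It covers only the simplest case, a single binary predictor, where the logistic slope is the log odds ratio of the 2x2 table, so it says nothing about the correlated-predictor concern. What it can show is how the standard error of the estimated effect changes as the predictor gets sparser while the total sample size stays fixed.

```python
import math
import random


def simulate_admissions(n, p_donation, base_logit, donation_effect, seed=0):
    """Simulate n applicants with one binary predictor ("donation > $1M").

    Returns a dict of counts keyed by (donation, accepted).
    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for _ in range(n):
        donation = 1 if rng.random() < p_donation else 0
        logit = base_logit + donation_effect * donation
        p_accept = 1.0 / (1.0 + math.exp(-logit))
        accepted = 1 if rng.random() < p_accept else 0
        counts[(donation, accepted)] += 1
    return counts


def log_odds_ratio(counts):
    """Estimate the logistic slope for a single binary predictor.

    With one binary predictor the maximum-likelihood slope is the log odds
    ratio of the 2x2 table. Adding 0.5 to each cell (Haldane-Anscombe
    correction) keeps the estimate finite if a cell is empty, so the result
    is only approximately the MLE. Returns (estimate, Wald standard error).
    """
    a = counts[(1, 1)] + 0.5  # donors, accepted
    b = counts[(1, 0)] + 0.5  # donors, rejected
    c = counts[(0, 1)] + 0.5  # non-donors, accepted
    d = counts[(0, 0)] + 0.5  # non-donors, rejected
    estimate = math.log((a * d) / (b * c))
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, std_error


if __name__ == "__main__":
    n = 100_000
    true_effect = 2.0  # assumed true log odds ratio
    for p_donation in (0.2, 0.01, 0.001):
        counts = simulate_admissions(n, p_donation, -2.0, true_effect, seed=1)
        estimate, std_error = log_odds_ratio(counts)
        print(f"P(donation)={p_donation:<6} "
              f"estimate={estimate:.3f}  std_error={std_error:.3f}")
```

Under these assumptions the estimate should stay centred near the true effect, while the standard error is driven mainly by the number of applicants who actually have the donation flag set, not by the overall sample size of 100,000.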

Thanks, Christie

