Modeling human behavioral data is challenging due to its scale, sparseness (few observations per individual), heterogeneity (differently behaving individuals), and class imbalance (few observations of the outcome of interest). An additional challenge is learning an interpretable model that not only accurately predicts outcomes, but also identifies important factors associated with a given behavior. To address these challenges, we describe a statistical approach to modeling behavioral data called the structured sum-of-squares decomposition (S3D). The algorithm, which is inspired by decision trees, selects important features that collectively explain the variation of the outcome, quantifies correlations between the features, and bins the subspace of important features into smaller, more homogeneous blocks that correspond to similarly behaving subgroups within the population. This partitioned subspace allows us to predict and analyze the behavior of the outcome variable both statistically and visually, providing a way to examine the effect of various features and to create explainable predictions. We apply S3D to learn models of online activity from large-scale data collected from diverse sites, such as Stack Exchange, Khan Academy, Twitter, Duolingo, and Digg. We show that S3D creates parsimonious models that predict outcomes on held-out data at levels comparable to state-of-the-art approaches, while also remaining interpretable and providing insights into behaviors.
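The idea of ranking features by how much of the outcome's variation they explain can be illustrated with a small sketch. This is not the S3D implementation: it only shows the standard sum-of-squares decomposition (between-bin sum of squares divided by total sum of squares) for a single binned feature. The quantile binning scheme and the function names `explained_variation` and `rank_features` are assumptions made for illustration.

```python
import numpy as np

def explained_variation(x, y, n_bins=4):
    """Fraction of the total sum of squares of y explained by binning x.

    Assumption: x is split into quantile bins; S3D's own binning may differ.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total_ss = np.sum((y - y.mean()) ** 2)
    if total_ss == 0:
        return 0.0  # constant outcome: nothing to explain
    # Interior quantile edges, then assign each observation to a bin.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)
    # Between-bin sum of squares: bin size times squared deviation
    # of the bin mean from the overall mean.
    between_ss = 0.0
    for b in np.unique(bins):
        group = y[bins == b]
        between_ss += group.size * (group.mean() - y.mean()) ** 2
    return between_ss / total_ss

def rank_features(X, y, names, n_bins=4):
    """Score each column of X on its own and sort by explained variation."""
    scores = {
        name: explained_variation(X[:, j], y, n_bins)
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A feature whose bins separate the outcome well scores close to 1, while a feature unrelated to the outcome scores close to 0. Scoring each feature in isolation is a simplification: the approach described above selects features that collectively explain the outcome, which this sketch does not attempt.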