
Word-frequency and other features extracted from fiction volumes, used by Underwood to model fiction genre in Distant Horizons, chap. 2. Only the features used in the regularized logistic regression models of science fiction, detective fiction, and Gothic fiction supplied in Underwood's reproduction repository are included; other features present in the accompanying source data files are omitted.

A sparse matrix of class dgCMatrix with 1047 rows, one per document, and 4889 columns, one per feature. The row names are document IDs matching genre_meta$docid, and the column names give the feature names as used by Underwood.
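Because the rows are keyed by document ID, a common first step is to put the metadata in the same row order as the matrix before selecting documents. The sketch below illustrates this on a small made-up matrix rather than the real data: `features` and `meta` are hypothetical stand-ins for this dataset and for genre_meta, and the document IDs, feature names, and genre labels are invented for illustration. Only the Matrix package calls and base R functions are real.

```r
library(Matrix)

# Hypothetical stand-in for the feature matrix: 3 documents x 4 features.
# sparseMatrix() with numeric x returns a dgCMatrix, like the real data.
features <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 3, 2, 1, 4),
  x = c(0.5, 1.2, 0.8, 0.3, 2.0),
  dims = c(3, 4),
  dimnames = list(c("doc_a", "doc_b", "doc_c"),
                  c("the", "ghost", "planet", "clue"))
)

# Hypothetical stand-in for genre_meta, deliberately in a different order.
meta <- data.frame(
  docid = c("doc_c", "doc_a", "doc_b"),
  genre = c("scifi", "gothic", "detective"),
  stringsAsFactors = FALSE
)

# Reorder the metadata so its rows line up with the matrix rows by docid.
meta_aligned <- meta[match(rownames(features), meta$docid), ]
stopifnot(identical(meta_aligned$docid, rownames(features)))

# Keep only the documents of one genre; drop = FALSE keeps a matrix
# even when a single row is selected.
sf <- features[meta_aligned$genre == "scifi", , drop = FALSE]

# Per-feature means over the selected documents (Matrix supplies
# colMeans() methods for sparse matrices).
colMeans(sf)
```

Using `match()` on the row names, rather than assuming the two objects already share an order, keeps the alignment correct even if either object has been sorted or subset.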