Excerpt from the book
Missing and Minority Data. Most financial time series contain a great deal of uninteresting data, such as small price changes, and little important data, such as large rallies and routs. Rare as these minority data are, the rarest and most valuable are the minority data that look like majority data: the small price movements that warn of large ones to come.
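Oversampling, undersampling, and combinations of both are common ways to manage imbalanced data. Oversampling methods include random oversampling, which duplicates existing minority observations; the Synthetic Minority Oversampling Technique (SMOTE), which interpolates new observations between neighboring minority points; SVM-based variants of SMOTE; “borderline” methods that synthesize only from minority data lying near the class boundary, where misclassification is most likely; and Adaptive Synthetic Sampling (ADASYN), which uses a local density measure to generate more synthetic points around the minority observations that are hardest to learn.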
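
The difference between random oversampling and SMOTE-style interpolation can be seen in a few lines of code. The sketch below is a simplified illustration using only the Python standard library, not a reference implementation of SMOTE; the function names, the toy data, and the choice of k are assumptions made for this example.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Duplicate randomly chosen minority rows (sampling with replacement)."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """Create synthetic rows by interpolating between a minority row and one
    of its k nearest minority neighbours (the core idea behind SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: each row is (daily return, volume change) on a
# "large move" day. The values are invented for illustration only.
rng = random.Random(0)
minority = [[0.031, 1.8], [0.027, 2.1], [-0.035, 2.4], [0.040, 1.5]]

duplicates = random_oversample(minority, n_new=6, rng=rng)
synthetic = smote_like(minority, n_new=6, k=2, rng=rng)
```

Every row in duplicates is an exact copy of an existing minority observation, whereas every row in synthetic is a new point lying on a line segment between two existing ones. Borderline methods and ADASYN differ from this sketch mainly in how they choose the base row, favoring minority observations that sit close to majority data.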