Machine learning is applied across quantitative trading: signal generation, execution optimization, risk management, and alternative data processing. Traditional quantitative strategies rely on manually constructed factors, whereas ML models can discover nonlinear patterns automatically from thousands of raw features. That flexibility is both ML's core advantage and the primary source of its overfitting risk.
## Major ML Models in Quantitative Trading
**Gradient Boosted Trees (XGBoost/LightGBM)**: currently among the top-performing ML model classes on structured (tabular) financial data, such as financial statement data and price/volume data. Advantages: insensitive to feature scaling, robust to missing values, relatively interpretable (via feature importance), and fast to train. Most domestic quant private funds reportedly build their core stock selection models on XGBoost or LightGBM.
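To make the "insensitive to feature scaling" point concrete, here is a minimal sketch of gradient boosting with one-split trees (stumps) on squared error, written in plain Python. It is not the XGBoost or LightGBM implementation; the function names (`fit_stump`, `fit_boosted_stumps`, `predict`) and the synthetic data are assumptions made for illustration only. Because each split depends only on the ordering of feature values, rescaling the features leaves the predictions unchanged.

```python
import random

def fit_stump(X, residuals):
    """Find the single (feature, threshold) split that minimizes squared error."""
    n, n_features = len(X), len(X[0])
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(n_features):
        order = sorted(range(n), key=lambda i: X[i][j])
        for k in range(1, n):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold can separate equal values
            left = [residuals[i] for i in order[:k]]
            right = [residuals[i] for i in order[k:]]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, (lo + hi) / 2, lmean, rmean)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]

def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current residuals (squared-error boosting)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, thr, lval, rval = fit_stump(X, residuals)
        stumps.append((j, thr, lval, rval))
        preds = [p + learning_rate * (lval if row[j] <= thr else rval)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lval if row[j] <= thr else rval)
                      for j, thr, lval, rval in stumps)

# Synthetic toy data (hypothetical): 60 samples, 3 features.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [(1.0 if row[0] > 0.2 else -1.0) * abs(row[1]) + random.gauss(0, 0.5)
     for row in X]
X_scaled = [[v * 1024.0 for v in row] for row in X]  # same features, rescaled

model_raw = fit_boosted_stumps(X, y)
model_scaled = fit_boosted_stumps(X_scaled, y)
preds_raw = [predict(model_raw, row) for row in X]
preds_scaled = [predict(model_scaled, row) for row in X_scaled]
print(preds_raw == preds_scaled)
```

The scale factor is a power of two so that the comparison is exact in floating point. A linear or neural model trained on the same two inputs would generally need the features standardized to behave comparably.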
**LSTM (Long Short-Term Memory)**: the classic deep learning model for sequential data, suited to capturing time-dependent relationships in price time series. In practice, LSTM's advantage in financial time-series prediction is smaller than academic papers suggest, primarily because financial time series have extremely low signal-to-noise ratios and models easily learn noise rather than real signals.
**Transformer/LLMs**: converting unstructured text (news, earnings reports, analyst reports) into quantitative trading signals (sentiment analysis, event extraction) is one of the most active current research areas. BloombergGPT and FinBERT are examples of language models trained or adapted specifically for financial NLP.
## Special Challenges of Financial ML
**Feature Leakage (Look-ahead Bias)**: using data in a backtest that would not yet have been available when the strategy actually traded, which inflates backtest results while live performance disappoints. Common sources: earnings data publication delays (figures are keyed to the reporting period end date but only become public on the later report release date), and index constituent changes (applying a later index membership list to earlier dates).
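The earnings-delay case can be sketched as a point-in-time lookup. The records, dates, and the helper `latest_known` below are hypothetical, invented only to show the difference between keying on the reporting period end date and keying on the release date.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical earnings records: (period_end, release_date, eps)
earnings = [
    (date(2023, 3, 31), date(2023, 4, 28), 1.10),
    (date(2023, 6, 30), date(2023, 8, 25), 1.25),
    (date(2023, 9, 30), date(2023, 10, 27), 0.95),
]

def latest_known(records, as_of, key_index):
    """Return the latest record whose date at key_index is <= as_of, else None."""
    ordered = sorted(records, key=lambda r: r[key_index])
    keys = [r[key_index] for r in ordered]
    pos = bisect_right(keys, as_of)
    return ordered[pos - 1] if pos else None

trade_date = date(2023, 7, 15)

# Keyed on period end: picks Q2, which was not published until 2023-08-25.
leaky = latest_known(earnings, trade_date, key_index=0)

# Keyed on release date: picks Q1, the latest report public on the trade date.
point_in_time = latest_known(earnings, trade_date, key_index=1)

print(leaky[2], point_in_time[2])
```

On the trade date, the leaky lookup hands the model a figure it could not have known for another six weeks. A backtest built this way is effectively trading on future information.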
**Low signal-to-noise ratio**: most financial market price movements are noise (roughly 95%, as a rule-of-thumb estimate rather than a measured quantity), and truly predictable signals are rare. As a result, many ML models overfit the training set and perform dramatically worse on the test set (out-of-sample).
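A small simulation illustrates how easily noise is mistaken for signal. In this sketch, both the "returns" and all 500 candidate "signals" are independent random numbers, so there is nothing real to find. The sample sizes and the seed are arbitrary assumptions. Selecting the candidate with the best in-sample correlation still produces something that looks predictive until it is checked out-of-sample.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

random.seed(42)
n_train, n_test, n_signals = 60, 240, 500
n_total = n_train + n_test

# Pure noise: returns and candidate signals are independent by construction.
returns = [random.gauss(0, 1) for _ in range(n_total)]
signals = [[random.gauss(0, 1) for _ in range(n_total)]
           for _ in range(n_signals)]

# Pick the signal with the largest absolute in-sample correlation.
in_sample = [corr(s[:n_train], returns[:n_train]) for s in signals]
best = max(range(n_signals), key=lambda i: abs(in_sample[i]))

best_in = abs(in_sample[best])
best_out = abs(corr(signals[best][n_train:], returns[n_train:]))
print(f"in-sample |corr| = {best_in:.3f}, out-of-sample |corr| = {best_out:.3f}")
```

The in-sample figure is purely an artifact of searching over many candidates. A flexible ML model that searches over thousands of features and interactions runs the same search implicitly.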
**Non-stationarity**: the statistical properties of financial time series change over time (market regime changes), so models that were effective historically can fail in new market environments.
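One common way to account for non-stationarity when evaluating a model is walk-forward validation: retrain on a rolling window of recent data and always test on the period that immediately follows. The text above does not prescribe this method, so treat the sketch below as one possible approach. The helper `walk_forward_splits` and its parameters are hypothetical names.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs using a rolling training window.

    Each test block directly follows its training window in time, and the
    window then slides forward by test_size observations.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

# Example: 10 time-ordered observations, train on 4, test on the next 2.
splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Unlike a random shuffle split, every test index here is later than every training index in the same split, so the evaluation never trains on data from after the period it is tested on.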
See [Quantitative Investing Intro](https://sunqi.org/quantitative-investing-intro-en/) and [Marcos Lopez de Prado “Advances in Financial Machine Learning”](https://www.wiley.com/en-us/Advances+in+Financial+Machine+Learning-p-9781119482086).




