Learning from running financial models live

Mar 9, 2025

Let’s say you are the world’s best burger chef (we all have ambitions, right?). You’d be serving up all manner of burgers for your customers. It would be odd, though, if you never actually ate any of the burgers to test their quality. After all, what if the quality of the ingredients you get from your suppliers has declined? Maybe the ingredients look good on paper, but don’t work in practice? An overly “busy” burger with too many toppings ends up being more an exercise in endurance than anything else.

If you’re building a statistical model, you can’t exactly eat it. Eating a burger you’ve created, though, is a way to test whether the idea actually works in practice, and whether that opinion persists over time. For a statistical model, the equivalent is understanding whether the model actually works once you’ve created it. Just as with the burger, it’s not simply a matter of checking a model once after it goes into a live environment; it’s about continual monitoring.

Monitoring the performance of our forecasting models in a live environment is something we do continuously at Turnleaf Analytics. Over the past 3 years, we’ve been publishing inflation forecasts, and throughout that time we’ve increased the number of indicators we forecast, as well as expanding the countries we cover. In financial markets, you aren’t ever going to get a 100% hit rate, or even approach it! However, you can always keep working on your models to see if you can improve their performance.

Ultimately, there are two ways we’ve increased the accuracy of our forecasting models. The first is by finding more data to add to the model. Backtests can be useful (with all the various caveats associated with them), but monitoring live performance can often guide you towards what data might be missing from your model, and also help you understand how overfitted your model might be. Observing live performance can focus you in a way that augments the insights from a backtest. As I have often written, you can’t backtest pain; that only comes with a live production model! One example, in the case of developing a forecasting model, is when you miss a short-term nowcast. You can dig down into the release to see if you can understand why you missed.

The second way is in terms of improving the modelling. I’m not necessarily suggesting trying hundreds of different models and then picking the best. There is of course something to be said for looking at different models, and trying to choose the one which might be most suitable for your problem, which in our case is time series forecasting. From a live model perspective, you might also be able to observe situations where the model underperforms versus expectations (e.g. by comparing the distribution of errors over time). Obviously, new data or modelling techniques will not always help improve accuracy. I would say on the whole, though, we’ve found that additional relevant datasets have helped us unlock more accuracy than changes to the actual model itself.
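To make the idea of comparing errors over time a bit more concrete, here is a minimal sketch in Python. It is not how we do it in production; the function names, the 3-observation window, the tolerance of 2.0 and all of the numbers are made up purely for illustration. It computes a rolling mean absolute error on live forecasts and flags windows where that error is well above what the backtest suggested.

```python
from statistics import mean


def mean_abs_error(forecasts, actuals):
    """Mean absolute error between paired forecasts and realised values."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


def rolling_mae(forecasts, actuals, window):
    """Mean absolute error over each trailing window of observations."""
    return [
        mean_abs_error(forecasts[i - window:i], actuals[i - window:i])
        for i in range(window, len(forecasts) + 1)
    ]


def flag_underperformance(live_mae, backtest_mae, tolerance=2.0):
    """True for each window whose live error exceeds tolerance x backtest error.

    The tolerance is an arbitrary illustrative threshold, not a recommendation.
    """
    return [m > tolerance * backtest_mae for m in live_mae]


# Hypothetical numbers, invented for this example only.
backtest_forecasts = [2.0, 2.1, 2.3, 2.2, 2.4, 2.5]
backtest_actuals = [2.1, 2.0, 2.4, 2.2, 2.3, 2.6]
live_forecasts = [2.6, 2.7, 2.8, 3.0, 3.1, 3.2]
live_actuals = [2.7, 2.7, 3.1, 3.4, 3.6, 3.7]

backtest_mae = mean_abs_error(backtest_forecasts, backtest_actuals)
live_mae = rolling_mae(live_forecasts, live_actuals, window=3)
flags = flag_underperformance(live_mae, backtest_mae)

for window_mae, flagged in zip(live_mae, flags):
    print(f"rolling MAE {window_mae:.3f} flagged={flagged}")
```

With these invented numbers, the first live window stays within tolerance and the later ones are flagged, which is the sort of signal that would prompt you to dig into what changed. A fuller comparison of error distributions would look at more than the mean, but the principle is the same.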

In our case, it seems very natural to try to learn from the live performance of our forecasting models, given it’s the service we provide. However, even for the various systematic trading use cases that we have researched and published as whitepapers, we have also begun to monitor their performance out-of-sample/post publication. This has provided us with invaluable feedback when developing new models, in particular in understanding the ways to make these strategies robust. You can follow the performance of our trading strategy use cases at https://turnleafanalytics.com/strategies/. I’ve also started to refresh the performance of trading strategies I’ve researched over the past 2 decades (in particular in FX), and I’m pleased to say many of them still work. For those that no longer work, it’s also been a good experience to try to understand why they have stopped working (e.g. different market dynamics, strategies reaching capacity, etc.).

I’m definitely of the view that live models, whether they are production models or more for research purposes, can teach you a lot that you might not necessarily learn from a backtest. So next time you develop a model, make sure it’s easy to update so you can regularly check back to see how everything played out! If you cook something, you have to taste it.