The human part of machine learning forecasts

Jun 16, 2024

I’ve seen a few videos showing a robot making a burger (here’s one of RoboBurger for example). It seems pretty impressive that a machine can create a fresh burger. However, one thing is missing in those videos: how do the ingredients get there? That part is largely ignored, even though no ingredients means no burgers. What happens before the robot assembles the burger is the most important step. A farmer has spent years raising a cow. Another farmer has grown tomatoes for the ketchup. Another has grown wheat for the flour in the bun, and so on. The “robot” bit is the culmination of a massive amount of human effort.

When it comes to financial markets, it’s often what you don’t see which is most important too! Machine learning is creating a buzz in many places, including financial markets, in particular models like ChatGPT, which work with text. Whilst these can be incredibly useful for financial markets, ultimately, what most folks in financial markets are doing boils down to forecasting time series (is a stock going up? is inflation going down? etc.). We can still use machine learning for time series forecasting, but we typically need different sorts of models for these regression problems than the ones behind ChatGPT.
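To make the regression framing concrete, here is a minimal sketch using scikit-learn on an entirely synthetic series (so the numbers mean nothing): the forecasting question becomes a supervised regression from lagged values to the next value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
series = rng.normal(size=120).cumsum()  # synthetic monthly series, 10 years

# Build lagged features: predict y[t] from (y[t-3], y[t-2], y[t-1])
n_lags = 3
X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]

# Hold out the final 12 months rather than shuffling, so the
# train/test split respects time ordering
model = LinearRegression().fit(X[:-12], y[:-12])
print("Out-of-sample R^2:", model.score(X[-12:], y[-12:]))
```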

Turnleaf Analytics, which I cofounded with Alexander Denev, is built around using machine learning to forecast economic variables. What have we learnt from this about forecasting using machine learning? Machine learning allows us to squeeze more out of complex datasets, where simpler models aren’t as effective. However, doing the modelling effectively requires a lot of human research effort. It can be tempting to simply run a “horse race” of many different machine learning models and just pick the best one (a sketch of this follows below). In practice, you need to understand the particular forecasting problem in a lot of detail. Forecasting inflation is different to many other types of forecasting, whether that’s inventory changes, market prices etc.
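To illustrate why the horse race is so tempting, here is a hedged sketch of one, using scikit-learn on synthetic data. Walk-forward cross-validation picks the “best” model in a few lines, but nothing in this loop understands the underlying problem.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(42)
n_months = 240  # 20 years of monthly observations

X = rng.normal(size=(n_months, 5))  # hypothetical macro features
y = X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) + rng.normal(scale=0.5, size=n_months)

models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gbm": GradientBoostingRegressor(random_state=0),
}

# Walk-forward splits respect time ordering, unlike plain k-fold CV
cv = TimeSeriesSplit(n_splits=5)

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.3f}")
```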

Understanding the nuances of your particular problem helps greatly. Feature engineering is a big part of solving it. In some areas, feature engineering has been automated, such as computer vision through the use of deep learning. However, that problem has a lot of data to feed data-hungry models like CNNs. If you are forecasting inflation you have far less data (yes, you can probably find a lot of variables, but you are restricted to monthly observations).
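As an illustration of the kind of hand-crafted features involved, here is a sketch using pandas on a synthetic price index. The “cpi” column and the specific transforms (year-on-year changes, lags, rolling statistics) are generic examples of my own, not Turnleaf’s actual feature set.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2005-01-01", periods=240, freq="MS")
df = pd.DataFrame({"cpi": 100 * np.cumprod(1 + rng.normal(0.002, 0.003, 240))},
                  index=idx)

# Year-on-year inflation rate derived from the price level
df["inflation_yoy"] = df["cpi"].pct_change(12)

# Lagged values, so the model only sees information available at time t
for lag in (1, 3, 12):
    df[f"inflation_yoy_lag{lag}"] = df["inflation_yoy"].shift(lag)

# Rolling statistics to capture recent momentum and volatility
df["inflation_roll_mean_6m"] = df["inflation_yoy"].rolling(6).mean()
df["inflation_roll_std_6m"] = df["inflation_yoy"].rolling(6).std()

print(df.dropna().tail())
```

With only monthly observations, each extra feature eats into a small sample, which is exactly why the choice of transforms needs domain judgement rather than brute force.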

You also need the right data to train your models. The datasets you use are very different depending on your domain. Often the incremental performance improvement comes from adding datasets which are not readily accessible and not as obvious (i.e. alternative datasets). After all, there’s a reason all the top hedge funds have teams of data strategists scouring the world for alternative data to augment their existing traditional datasets. We also need to clean the data effectively before it goes anywhere near a regression model.
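As a sketch of what that cleaning might involve, here is a hypothetical helper using pandas. The frequency, fill limit and outlier threshold are illustrative assumptions, not a prescription.

```python
import numpy as np
import pandas as pd

def clean_monthly(df: pd.DataFrame, z_thresh: float = 4.0) -> pd.DataFrame:
    # Align everything to a common month-start frequency
    out = df.resample("MS").last()

    # Forward-fill short gaps (e.g. late-published series), capping the
    # fill so stale values don't propagate indefinitely
    out = out.ffill(limit=2)

    # Blank out extreme outliers column by column using a simple z-score
    for col in out.columns:
        z = (out[col] - out[col].mean()) / out[col].std()
        out.loc[z.abs() > z_thresh, col] = np.nan

    # Drop rows still missing data rather than inventing values
    return out.dropna()
```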

It is tempting to think that forecasting can be automated if we use machine learning. Maybe there will be more effective AutoML tools for forecasting time series in the future. However, at this stage, it requires a huge human effort to create effective models to forecast time series. Shortcuts sound attractive, but ultimately our observation has been that the most important part of the forecast is having a team of very smart data scientists who are specialists in the particular problem you are solving (which in our case is forecasting inflation). Once you have created a model, you need to continually research it, to improve it and to find new data.