When a new burger joint opens up, there’s often a buzz. Everyone (well, at least me) wants to try the new burger. Is it as good as it looks on Instagram? Or is it just style over substance, and simply an overpriced fast food place? This week, in the world of AI, the buzz has most definitely been about DeepSeek. I’ve lost count of the number of posts I’ve seen about it on LinkedIn, Twitter, Bluesky etc.
The figure cited for the development of DeepSeek’s R1 is 6 million dollars. There have been all sorts of takes about what precisely this figure covers. I think it’s a fair assumption that, whatever the exact figure is, the amount of resources available to DeepSeek, whether in terms of compute, people, etc., is likely to be much lower than what has been available to many other firms developing LLMs. From this assumption, it follows that DeepSeek faces tighter resource constraints than the competition.
If we compare the various firms currently racing to create and improve LLMs, all have similar objectives (creating better LLMs), albeit with some taking the open source approach and others sticking to closed source. However, as mentioned, their constraints differ, and this has obviously influenced how they have tried to solve the problem. If you have fewer constraints in terms of things like compute, it might be tempting simply to scale up models to take advantage of this. After all, this is the path of least resistance. However, if you face much tighter compute constraints, you have to rethink how you write your algorithms to reduce their compute needs.
This is of course not unique to LLMs. If we think back 30 years to the advent of the internet, most computers had only a few megabytes of RAM. Somehow, they managed to run web browsers like Netscape in a graphical environment, whether on a Mac or Windows. The constraints facing developers meant they had to make sure their code did not need large amounts of memory. Admittedly, websites were much simpler than they are today, so by their nature web browsers didn’t have to do as much heavy lifting. Today, a web browser alone can consume gigabytes of RAM, rendering a lot of graphics etc. on the client side, and why not? There’s little point spending ages trying to give your web browser a tiny memory footprint if memory is so cheap, right?
I would suggest, though, that a balance needs to be struck between utilising “cheap” compute (and other similarly abundant resources) and investing in more efficient approaches. If the answer is always more compute or more data, it can stifle innovation, because more compute is the “easier” option. We have also seen the energy consumption of data centres soar, probably related to the point that compute is “cheap”, so why not use it? Whilst I have focused on compute constraints, the same applies to any other constraint you face in building models, notably data availability for a particular problem.
Closer to home, at Turnleaf Analytics, we obviously have some resources to pursue our objective of creating ever more accurate inflation forecasts, and we’ve been following this route for the past few years. At the same time, any startup faces constraints. However, those constraints have helped to push us to innovate, whether through ways of speeding up our compute, finding new data, improving the model itself, etc. When it comes to finding new data to improve inflation forecasting, it isn’t simply a case of throwing more data at the problem, but of selecting the right data! The nature of the problem isn’t necessarily amenable to being solved with more compute either.
Would we want more compute to reach our objective? Yes, problems need compute to solve them (some require a lot!), but it shouldn’t always be the first answer, because other solutions might be better. Next step: trying to find that new burger place…!