Over the years, the number of datasets coming to market has increased significantly. How do data buyers on the buy side and sell side find such data? One approach has been to contact individual sources of data, whether data vendors or corporates, i.e. through outbound requests, often prompted by specific questions from the business, whether from data scientists or portfolio managers. The flipside, inbound requests, where data vendors seek out data buyers, has been another approach.
Data conferences seek to bring data buyers and sellers together in one place to make this whole process more efficient, most notably the Eagle Alpha conference in London, which I recently attended and spoke at. In this article, I'll write up some of the main takeaways from the Eagle Alpha conference. Whilst I can't cover every session, I'll try to give a flavour of some of the discussions.
The supply of data and the market
Niall Hurley gave the keynote address at the conference, discussing how the Eagle Alpha platform now has around 2000 datasets on it, alongside due diligence documentation. The idea of the platform is to allow data buyers to quickly browse what's out there, instead of having to approach many data vendors separately, which can take a lot of time. He noted how we are now in a post-covid era for alternative data. One thing that has been said to me many times is that it became very apparent during covid that alternative data could plug gaps in more traditional datasets because of their publication lag. Despite the view that everything has since normalised, I still believe there is tremendous value in alternative data, not purely from the perspective of plugging that publication lag, but also in terms of giving you different insights. Hurley also drew a parallel between the shale gas revolution, where falling extraction costs unlocked new supply, and the falling cost of bringing new datasets to market. The supply of data from corporate environments could be one of the new stories.
Also on the topic of the supply of data, there was a discussion between Mark Fleming-Williams (CFM), also the presenter of the fantastic Alt Data Podcast, and Connor Emmel (Apptopia) about the data market. Given the session was closed, purely for data vendors, I won't discuss the specifics. However, it was very useful from my perspective, despite my having been in the alt data space for quite a long time. The session went over many aspects of the whole data selling cycle. My key takeaway, which also matches my own experience at Turnleaf Analytics, is that clients can be very different. For example, a discretionary fund will have a very different approach to data compared to a systematic fund, and vendors need to be cognisant of this. This is of course not the only difference between data buyers, but it is perhaps one of the clearest differentiating factors.
Macro, inflation, data and point-in-time
I appeared on a panel on macro and inflation (thanks James Munro for the above photo!), which was moderated by Brendon Furlong (Eagle Alpha), alongside panelists Lasse Simonsen (JPMorgan/MacroSynergy) and Meghna Shah (Macrobond). Typically, when people think of alternative data, they think of data on individual companies, most suited to equities investors. Indeed, whilst there are still more equity based datasets, macro datasets in the alt data space are growing. One of the complexities of using macro data is that mapping it to an asset can require more domain knowledge.
If we think more broadly about macroeconomic data, it is also different to market data. One difference is that you can have many timestamps associated with the same data point. With very high frequency market data you might also have this (eg. the exchange time and the time at which the data was snapped), but it's not something you typically observe with other alternative data. With macroeconomic data, a single data point has a reference period (what period is being measured?), and then you can have multiple revisions of that same data point, each of which has a different release date. Simonsen noted the importance of having point-in-time data which records all of this. Shah talked about some of the new datasets available on Macrobond that enable investors to keep track of economic trends at a higher frequency, such as debit & credit card spending. On my side, I discussed what our Turnleaf Analytics inflation forecasts have been saying. For the US, our models have been pointing to the theme of higher inflation for a few months, ahead of the market. At present, our models are still suggesting that US inflation will be higher than market expectations (which have readjusted higher since the beginning of the year).
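To make the point-in-time idea concrete, here is a minimal sketch of how such a series might be stored and queried in pandas. The layout, column names and numbers are purely illustrative (not any particular vendor's schema): each row is one vintage of one reference period, so revisions sit alongside the original release rather than overwriting it.

```python
import pandas as pd

# Illustrative point-in-time layout for a single macro series (eg. a CPI YoY print):
# each row is one vintage of one reference period, so revisions are kept
# rather than overwritten. All values below are made up for illustration.
vintages = pd.DataFrame(
    {
        "reference_period": pd.to_datetime(["2024-01-31", "2024-01-31", "2024-02-29"]),
        "release_date": pd.to_datetime(["2024-02-13", "2024-03-12", "2024-03-12"]),
        "value": [3.1, 3.0, 3.2],
    }
)

def as_of(df: pd.DataFrame, date: str) -> pd.Series:
    """Return, for each reference period, the latest value released on or before `date`."""
    known = df[df["release_date"] <= pd.Timestamp(date)]
    latest = known.sort_values("release_date").groupby("reference_period").last()
    return latest["value"]

# What would a model have seen on 2024-02-20? Only the first vintage of January.
print(as_of(vintages, "2024-02-20"))
```

The key design point is that a backtest should only ever query with `as_of` set to the date it is simulating, otherwise later revisions leak into the history and flatter the results.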
Data vendor pitches
There were a number of data vendor pitches on the day, across a wide variety of different areas, mostly focused towards equity use cases. Given the number of presentations during these sessions, I'll go through a selection of the vendors who presented.
Babel Street Data (formerly Vertical Knowledge), which specialises in web scraping, discussed their industry specific data. There was also ETF Global, which talked about their ETF reference data, which has expanded from US to EMEA datasets. ESG has been a hot topic more broadly, and altlastic.ai talked about their high frequency dataset around perceived corporate trust, reputation (and ESG) insights, derived from news data. The dataset does not purely examine sentiment, but also quantifies the impact of the news. One difficulty with machine learning is obtaining sufficient training data. Fantix presented a solution for creating synthetic data (eg. for US consumers) that could be used to fill gaps in your training data.
In the macro space, Macrobond presented some of the new developments in their popular macroeconomic dataset, which is now available to clients on Snowflake. They also discussed their new Macrobond One product, which gives access to some premium datasets. Turnleaf Analytics is now partnering with Macrobond to offer our economic forecasting data on Macrobond One.
How to store time series data
Man Group are one of the most well known hedge funds, with many funds under their umbrella, including the quant fund AHL. James Munro (Man Group) discussed how they built their own database, ArcticDB, to store the massive amount of data they work with, not only market data but also alt data. Across their various systems, the data frame has become quite a universal way to represent the datasets they use. ArcticDB makes it easy to store data frames, with the ability to store them in a point-in-time way (using versioning). I've actually used it as well, and have included adapters in my open source library findatapy (here's a Jupyter notebook showing how to use ArcticDB with findatapy to download and store market data). Whilst ArcticDB has adapters for use in Python, its core is written in C++. Unlike its predecessor, Arctic, ArcticDB is also serverless, so it doesn't need a database instance to run. Instead it just needs storage, either a local disk or a remote object store like S3.
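As a rough illustration of that workflow, here's a minimal sketch based on ArcticDB's documented Python API; the LMDB path, library name and symbol are placeholders of my own choosing, and an S3 URI could be used in place of the local store.

```python
import pandas as pd
from arcticdb import Arctic

# Connect to a local LMDB-backed store (no database server needed);
# the path is just a placeholder for this example.
ac = Arctic("lmdb://arcticdb_demo")
lib = ac.get_library("demo", create_if_missing=True)

# Write a data frame; each write creates a new version of the symbol,
# which is what gives you the point-in-time behaviour.
df_v0 = pd.DataFrame({"price": [100.0, 101.5]},
                     index=pd.date_range("2024-01-01", periods=2))
lib.write("my_symbol", df_v0)

# A later write with revised/extended data becomes version 1.
df_v1 = pd.DataFrame({"price": [100.0, 101.5, 99.8]},
                     index=pd.date_range("2024-01-01", periods=3))
lib.write("my_symbol", df_v1)

latest = lib.read("my_symbol").data             # most recent version
original = lib.read("my_symbol", as_of=0).data  # the data as it was first written
```

Because every write is versioned, you can later ask what a dataset looked like at the time a backtest would have seen it, which ties back to the point-in-time discussion earlier.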
Also on the subject of keeping data and analytics together, Amy Young (Microsoft) and Thomas Oliver (BlackRock), discussed BlackRock’s Aladdin platform and how it could be used together with Azure.
Discussion on regulatory aspects of data
The whole LLM discussion has again brought to the fore regulation around the use of data. A panel with Patrick Van Eecke (Cooley), Christophe (Bignon Lebray) and Sanaea Daruwalla (Zyte) discussed the regulatory environment for data. It was noted that the EU has been a forerunner when it comes to data regulation; however, the UK and USA won't necessarily always follow in its footsteps. That said, there could still be a trickle down impact across other jurisdictions. It was seen as key to understand how risky various use cases of AI are. For example, in cases like humanoid chat bots, the risks could be much higher than in other areas (eg. using AI to parse a PDF). It was also important to have an internal AI policy, to ensure that confidential data is not exposed.
Conclusion
The Eagle Alpha conference day was super packed, but hopefully I've managed to capture a few of the main discussion points. What was very apparent is that the alt data space is expanding each year, as evidenced by the amount of interest in the topic and the number of data vendors offering new datasets, as well as the fact that data buyers are no longer purely from the quant space, but now come from a multitude of more discretionary funds as well.