20 October 2020

Ten Years On: The Data Deluge 2.0

Opinions expressed whether in general or in both on the performance of individual investments and in a wider economic context represent the views of the contributor at the time of preparation.

Executive summary: Technology spend as a percentage of GDP is set to double over the next decade. This should not be surprising since data sits at the heart of almost everything we do, both personally and professionally. During the next five years the world is set to see an over fourfold increase in the amount of data generated. However, for that data to have any value, it needs to be stored, secured and analysed. We have been making this point for the last decade and our data deluge thesis remains both highly relevant and valid. As our piece highlights, cloud computing and the leveraging of artificial intelligence (among other trends) stand in very early innings of maturity at present. Expect further growth in silicon combined with open hardware and software standards. Businesses that sit at the heart of the digital ecosystem and which can innovate at scale should continue to prosper. ASML, Equinix, Keysight Technologies as well as the leading hyperscale cloud businesses look particularly well placed.

It is almost ten years since we wrote our initial theme piece at Heptagon. Our data deluge thesis appeared in March 2011 and has featured in every annual commentary since. Both a lot and a little have changed since then. As we wrote in the initial piece, “whether we like it or not, we are being deluged by data” and also that “growth [in data] appears exponential, demand compulsive and efficiency gains indubitable.” Our primary contention – which still holds – is that the amount of data produced and consumed continues to grow exponentially. Indeed, more data was created in the last five years than in the previous 5,000 (per McKinsey). However, for data to have any value, it needs to be stored, secured and analysed. Further, as the amount of data grows, the problems of managing information are likely only to become more challenging.

Importantly, we expect this thesis to continue to hold true for at least the next decade. An increasingly digital society drives greater computing and data needs. In what is our 50th piece of proprietary thematic work, we set out to provide some perspectives regarding our current thinking on data, and also how we expect it to evolve over the coming ten years. Owing to its importance as a theme, this piece is somewhat longer than the typical three pages we allocate to a topic.

More than five billion people interact with data on a daily basis, with over a million going online for the first time every day (per IDC – a leading consultancy in the field – and the World Economic Forum respectively). Bear in mind that a third of the planet is under 20 and 50% under 30. All are (or will soon become) digitally native. The young, self-evidently, are a proxy for the future. As a result, by 2025, the number of people that will interact with data will be 6bn, or 75% of the world’s population. At this time, each connected person will have at least one data interaction every 18 seconds (per IDC).

IDC estimates that the world generated 40ZB (or zettabytes) of data in 2019. On its forecasts, this figure will reach 175ZB by 2025. It’s often quite difficult to get one’s head around a number such as 175ZB, so to put it in context, consider that 1 zettabyte is equivalent to 1tr gigabytes, or the storage capacity of 250bn DVDs. If you were notionally to store all the data in existence, the stack of DVDs would reach the moon 23 times over.
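To make the scale of these forecasts more tangible, the short sketch below works through the arithmetic in Python. It uses only the IDC figures quoted above (40ZB in 2019, 175ZB in 2025); the function names are our own and purely illustrative, and decimal units (1ZB = 1tr GB) are assumed.

```python
# Illustrative arithmetic only. The 40ZB and 175ZB inputs are the IDC
# estimates quoted in the text; everything else is a simple calculation.

GB_PER_ZB = 10**12  # assumption: decimal units, so 1 zettabyte = 1 trillion gigabytes


def zb_to_gb(zettabytes):
    """Convert zettabytes to gigabytes."""
    return zettabytes * GB_PER_ZB


def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def implied_annual_growth(start, end, years):
    """Compound annual growth rate implied by moving from start to end."""
    return (end / start) ** (1 / years) - 1


multiple = growth_multiple(40, 175)
annual_rate = implied_annual_growth(40, 175, 2025 - 2019)

print(f"Growth multiple, 2019 to 2025: {multiple:.2f}x")
print(f"Implied compound annual growth: {annual_rate:.1%}")
print(f"175ZB expressed in gigabytes: {zb_to_gb(175):,}")
```

On these inputs the multiple is a little over four times, which implies compound growth in the high-20s percent per year over the six-year period.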

Today, of course, anyone can do almost anything online. Every day, people make an estimated 6bn search queries on Google and upload more than 49 years’ worth of video to YouTube. More than 300bn emails are said to be sent every day; assuming that just a third of these originate on Gmail, then a stack of printouts would be 10,000km high. Viewed from a different perspective, ~$10,000 worth of goods change hands on Amazon’s e-commerce platform every second. Further, some 40% of Americans now meet their partners online, up from ~20% in 2010 and ~2% in 1995 (data from Google, Amazon and GFK respectively).
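The email comparison can be checked with a few lines of Python. This is a rough sketch under two assumptions of our own that the text does not state: each email is printed on a single sheet, and a sheet of office paper is about 0.1mm thick.

```python
# Rough check of the "stack of printouts" comparison.
# Assumptions (ours, not from the source data): one sheet per email,
# and a sheet thickness of roughly 0.1mm.

SHEET_THICKNESS_MM = 0.1
MM_PER_KM = 1_000_000


def stack_height_km(emails_per_day, share_printed, sheets_per_email=1):
    """Height in kilometres of a stack of printed emails."""
    sheets = emails_per_day * share_printed * sheets_per_email
    return sheets * SHEET_THICKNESS_MM / MM_PER_KM


# 300bn emails a day, of which a third are assumed to be printed
height_km = stack_height_km(300e9, 1 / 3)
print(f"Stack height: {height_km:,.0f}km")
```

Under those assumptions the stack comes out at around 10,000km, in line with the figure above; a thicker sheet or multi-page emails would scale the result proportionately.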

Less visible but equally important advances that are underpinned by data would include the following three examples. Consider that the first human genome was sequenced in 2003 and comprised 3GB (or one DVD) of data. Now, 23andMe, a genetic testing firm, has over 12m customers. Next, vehicles on the road which currently perform basic autonomous driving tasks, such as lane assist or cruise control, are generating 90TB (~20,000 DVDs) of data daily. Finally, some 130 new devices are being connected to an internet of ‘things’ every second (data from 23andMe, AMD and McKinsey respectively).

If we try to define exactly what is going on, then who better perhaps to turn to than Satya Nadella, the Chief Executive of Microsoft? As he recently put it, “the defining secular trend will be the increasing rate of digitalisation of people, places and things.” Thought of another way, Andy Jassy, Chief Executive of Amazon Web Services, believes “we are in an unprecedented era of mass data.”

Data could then be considered as the new oil – a valuable good that powers the economy and over which countries may go to war. A more positive interpretation might liken it to sunlight – it is everywhere and underlies everything. Another view sees data as the currency of the online world – an asset that can be gathered, analysed, sold and sometimes even stolen. Even if metaphors only take you so far, it is easy to see why technology spend as a percentage of GDP is expected to double over the next decade (per Microsoft). Where will it be spent? Per our original deluge thesis, on three interlinked, overlapping and mutually reinforcing areas: storage, security and analysis.

At present, less than 20% of data created gets stored. Given the rate at which it is being generated, less than 3% will be stored by 2030 and only 1% by 2040 (per Seagate). This raises several related questions. First, why is so little stored? The main reason is one of practicality – not that companies can’t store their data; more that they don’t know how to value the data they generate. Fewer than 100 of the world’s largest companies collect more than half of all data generated online. Overall, some 60-73% of data generated in an enterprise goes unanalysed (per Microsoft and Forrester respectively). Should more be stored? Probably, yes. The OECD estimates that if data were more widely exchanged and analysed, many states would enjoy gains of between 1.0 and 2.5% of GDP annually.

The problem is less one of where to store the data and more one of what to store. Data are typically stored on a combination of servers (sometimes known as ‘the core’), enterprise infrastructure such as cell towers (‘the edge’) and devices such as PCs and smartphones (‘the endpoint’). The majority is stored in the core on servers, which can either be on-premise or distributed (in ‘a cloud’). The cloud is not a physical entity, but rather a vast network of remote servers around the globe, which are linked together and can operate as a single ecosystem.

Cloud infrastructure therefore allows for information to be accessed anywhere on any device that has an internet connection. This is a very powerful proposition. Cloud also means that companies can adapt at speed and scale, accelerate innovation, drive business agility, streamline operations and reduce costs. Even if “the cloud is powering the future” (per Werner Vogels, Amazon’s Chief Technology Officer), we are still in a very early innings of its development. Less than 10% of the estimated $4tr of annual global IT spend has so far migrated to the cloud. However, by 2025, almost 50% of the world’s stored data will reside in public cloud environments (per AWS and IDC respectively).

The crucial question then remains one of deciding what to store and then making the most of that which is stored. Perhaps the solution lies in artificial intelligence (or ‘AI’). Consider the following three quotations from industry luminaries, respectively Arvind Krishna (CEO of IBM), Sundar Pichai (CEO of Google) and Werner Vogels: “every company will become an AI company – not because they can, but because they must”. Next, AI “could be more profound than electricity or fire.” And, AI “is no longer an emerging technology… it promises to be a game-changer.”

Billions of people already benefit from some artificial intelligence on a daily basis without even realising; it resides in our smartphones and the internet services we use daily. Whether we like it or not, AI is always on; always tracking, monitoring, listening and watching our digital lives. The reason why? Because it is learning. Algorithms (like robots, but unlike humans) do not need to sleep or take holidays. Consider also that since 2012, the amount of computing power needed to train a neural network to a given level of performance has been decreasing by a factor of 2 every 16 months (per OpenAI, a research organisation).
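A halving every 16 months compounds quickly. The sketch below, with function names of our own choosing, simply applies the rate quoted above; it assumes the trend holds smoothly over the period, which real-world progress may not.

```python
# Compounding effect of a fixed halving period. The 16-month figure is the
# one quoted in the text; a smooth, constant trend is our assumption.

HALVING_PERIOD_MONTHS = 16


def compute_reduction_factor(months_elapsed, halving_months=HALVING_PERIOD_MONTHS):
    """Factor by which required compute has fallen after months_elapsed."""
    return 2 ** (months_elapsed / halving_months)


def relative_compute_needed(months_elapsed, halving_months=HALVING_PERIOD_MONTHS):
    """Compute needed as a fraction of the starting requirement."""
    return 1 / compute_reduction_factor(months_elapsed, halving_months)


for years in (2, 4, 8):
    factor = compute_reduction_factor(years * 12)
    print(f"After {years} years: roughly {factor:.1f}x less compute needed")
```

Eight years is exactly six halving periods, so on this trend the same result would need about 1/64th of the original computing power.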

We are still in an early innings here too. Although 83% of businesses view AI as a strategic opportunity, 81% of business leaders do not understand the data required for AI and, even if they do, over 80% of that data is either inaccessible, untrusted or unanalysed (per IBM). Businesses (and consumers) find change difficult, particularly if it means implementing AI at the expense of humans. Nonetheless, the big data and analytics market could be worth over $300bn by 2025. Within this, dedicated AI spend could comprise over $110bn, more than double the level likely spent this year (per IDC).

Of course, all data that is stored and analysed has no value unless it is properly secured. The global cost of cybercrime to organisations has increased twofold in the last five years to reach ~$6tr annually. Unsurprisingly then, some $8tr is spent every year on fighting cybercrime (per Juniper Networks). The big challenge is that while 48% of companies say that they plan to increase cybersecurity spending – by an average of ~30% this year – some 2.9m organisations globally say they face cyber skill shortages and/or budget gaps (per Capgemini and Mimecast respectively). As with storage and analytics, the opportunities for those who get it right in cybersecurity could be significant.

Any discussion about data needs to remember that not everyone has access to the same technology. The things readers of this piece take for granted – for example, fast and plentiful broadband, unfettered access to information, data rights enshrined in regulation – are not ubiquitous. Data costs vary markedly too. 1GB of data costs over $20 in Malawi, Benin or Chad, versus less than $2 in most developed world countries (per IDC). Bridging the digital divide is therefore crucial. Boosting a country’s mobile internet penetration by 10 percentage points correlates with a 2 percentage point increase in GDP (per the OECD). Ventures such as Project Loon (an Alphabet initiative) are encouraging. Here, the hitherto unconnected can now gain access via an aerial wireless network of balloons aimed at delivering connectivity in rural areas. Services went live in Kenya in July.

Beyond access, there is also a role for regulation. The broader debate centres on the potential misuse and/or abuse of data; where privacy begins and ends. Consider the earlier quotation from Sundar Pichai. He continues that “we have learned to harness fire for the benefits of humanity, but we have to overcome its downsides too… AI is really important, but we have to be concerned about it.” More data does not mean higher quality data. Each new way of looking at the world through machinery may bring its own biases. In a deluge, these would become harder to spot. Maybe the correct question to be asking is not what AI can do, but rather what it should do. Algorithms need to fit into a world run by humans. There remains a very big difference between ‘data’ (facts) and ‘knowledge’ (their application).

At the least, what the above implies is that innovation is going to be necessary across the whole value chain. What might come next? We offer up three possibilities: quantum computing, neuromorphic chips and DNA storage. We first wrote on the potential of quantum computing in October 2017. While the theory behind the technology is well understood and despite billions of dollars of funding from the likes of Google, Microsoft and IBM, building stable quantum computers remains an engineering challenge. Nonetheless, at least $10bn is likely to be invested in this area by 2024 (per HSBC).

Moving on, neuromorphic chips offer some intriguing possibilities. They may mimic more closely than any other current application the electrical behaviour of the neurons that make up the brain. This would allow them to be good at pattern recognition, hence adjusting to real time information, learning from mistakes and tolerating errors. Importantly, they are also highly efficient. An IBM prototype neuromorphic chip has 5 times as many transistors as an Intel processor but consumes only 70mW of power, a small fraction of the level of a typical processor. Watch this space.

To get around some of the challenges related to data storage, one intriguing idea worth considering is DNA storage. Historically thought of as a medium to store genetic information, DNA is rapidly evolving as a viable digital data storage method. Encoding digital data in synthesised DNA has a very high data density; orders of magnitude beyond conventional storage systems. 1g of DNA could hold nearly 1TB of data. Scientists at Harvard have already managed to store the text of a book on DNA. Expect more innovation. The plummeting cost of DNA sequencing combined with advances in synthetic biology and gene-editing technology is making DNA storage increasingly economic. Commercialisation seems imminent, even if some challenges remain relating to efficient scaling.

Disclaimers

The document is provided for information purposes only and does not constitute investment advice or any recommendation to buy, or sell or otherwise transact in any investments. The document is not intended to be construed as investment research. The contents of this document are based upon sources of information which Heptagon Capital LLP believes to be reliable. However, except to the extent required by applicable law or regulations, no guarantee, warranty or representation (express or implied) is given as to the accuracy or completeness of this document or its contents and, Heptagon Capital LLP, its affiliate companies and its members, officers, employees, agents and advisors do not accept any liability or responsibility in respect of the information or any views expressed herein. Opinions expressed whether in general or in both on the performance of individual investments and in a wider economic context represent the views of the contributor at the time of preparation. Where this document provides forward-looking statements which are based on relevant reports, current opinions, expectations and projections, actual results could differ materially from those anticipated in such statements. All opinions and estimates included in the document are subject to change without notice and Heptagon Capital LLP is under no obligation to update or revise information contained in the document. Furthermore, Heptagon Capital LLP disclaims any liability for any loss, damage, costs or expenses (including direct, indirect, special and consequential) howsoever arising which any person may suffer or incur as a result of viewing or utilising any information included in this document.

The document is protected by copyright. The use of any trademarks and logos displayed in the document without Heptagon Capital LLP’s prior written consent is strictly prohibited. Information in the document must not be published or redistributed without Heptagon Capital LLP’s prior written consent.

Heptagon Capital LLP, 63 Brook Street, Mayfair, London W1K 4HS
tel +44 20 7070 1800

Partnership No: OC307355 Registered in England and Wales Authorised & Regulated by the Financial Conduct Authority

Heptagon Capital Limited is licenced to conduct investment services by the Malta Financial Services Authority.