The case for building a full-stack machine learning company
This piece was originally published by Nathan Benaich on the Financial Times' Sifted.eu on the 5th September 2019.
On its path to success, a startup must solve two key problems. The first is to develop and distribute a product that creates significant new value for its users. The second is to capture a meaningful proportion of that value. Playbooks have been written about solving these two problem vectors in the era of online marketplaces, software-as-a-service (SaaS) businesses, enterprise software, and consumer internet products.
Today’s businesses are, however, competing on a new battleground by developing products uniquely enabled by Machine Learning (ML) technology. This nascent turf has a much less well-developed playbook.
In this piece, I make the case for operating as a “full-stack ML company” in order to maximize economic value capture. For a given problem, a “traditional ML company” would build a piece of the technology stack (component or tool) and then sell or license it to incumbents. In contrast, a full-stack ML company creates fully-integrated ML products that solve this problem end-to-end. From my vantage point as an investor at Air Street Capital focused on AI-first technology and life science companies, I argue that the best way to capture value in ML is to directly monetize your predictions by being full-stack, rather than licensing ML tools as software.
Full-stack ML companies eat their problem value chain
In its most practical instantiation, ML is best used as an automatic task solver. To illustrate this point, let’s consider the supervised learning paradigm. Provided with enough high-quality training data, an ML system can be trained to make high-quality predictions from real-world input data that it has never seen before. What’s important here is both the quality of the system’s output and the scale at which the system can be deployed while still offering economic value. The amount of value created will vary as a function of several variables. These might include the user persona (enterprise vs. small business), how important the target task is to their workflow, how closely this workflow is aligned with a profit or cost center for the business, and how much of the workflow can be eaten away if the task is solved.
For example, a business might purchase ML-powered software to protect their staff from network security threats that are beyond the detection ability of their own IT department. In this context, a traditional ML company might offer novel modeling tools or pre-trained models to help the IT department develop a home-grown security solution or to supercharge the detection performance of their existing third-party software solution. In contrast, the full-stack ML company will solve this security problem end-to-end. It might do so by abstracting away data collection, annotation, exploration, modeling, engineering, testing, system integration, and cloud/on-premise infrastructure deployment into a single product that serves threat predictions and solutions to its users. As such, the full-stack ML company’s product will often encompass solution components sitting either upstream or downstream of what the traditional ML company would otherwise provide.
A history of full-stack ML companies: Directly monetizing predictions
The paradigm of end-to-end problem-solving in ML is not new. To illustrate this point, let’s consider four waves of this theme that have played out over the last 20 years.
The first instantiation (‘90s-’00s) was in quantitative trading. Firms such as DE Shaw, Renaissance Technologies and Two Sigma created and curated financial market data, built ML models to generate trade ideas and directly monetized these trades through their own funds. They did not sell their competitive advantage as software to other hedge funds.
The second iteration (‘00s-’10s) was in programmatic advertising, which is not dissimilar to quantitative trading. Here, adtech companies captured data about online audiences and modeled their behavior to predict which ads to display in order to optimize their clickthrough rate. While there certainly were many adtech software component providers, the big wins were from full-stack ML companies like Criteo and The Trade Desk.
In the third iteration (‘10s-present), we’ve seen full-stack ML companies directly monetize their predictions in many more sectors. For example, consumer finance companies such as Affirm and Zopa use ML systems to score the default risk of a consumer applying for credit. The companies directly lend against these predictions instead of selling their software to banks to do the same. For online retailers, Signifyd not only predicts fraudulent transactions running through their marketplace but also goes a step further by offering financial guarantees if the system’s predictions prove to be incorrect. In customer service, Afiniti routes customer calls to the best-matched support agent and only charges if the call created economic value. In each of these examples, ML systems output predictions that are directly tied to economic value creation.
There is now a fourth category of full-stack ML companies that are bridging the world of bits and atoms. Here, companies are using ML software to discover new local maxima in the solution space of physical products and real-world environments. They offer solutions beyond what human expertise could assemble through trial and error alone. For example, Zymergen is designing and engineering bacterial hosts to manufacture a diversity of novel materials that cannot be made from petrochemical starting points. Optimal Labs is building indoor farming control systems that can operate greenhouses on autopilot to optimize plant growth and financial margins. LabGenius and Recursion Pharmaceuticals are stringing together ML-based analysis of automated experimental biology data with ML-based predictions of drug design and performance to develop novel therapies for human disease. Finally, PolyAI has built a state-of-the-art ML platform for creating conversational agents and is infusing this platform into existing call centers that it owns and operates to drive increasing levels of automation and profitability while ensuring high-quality customer service level agreements.
When does it make sense to go full-stack and what are the benefits?
Building a full-stack ML company clearly presents more operational complexity than building a traditional ML company, which resembles the pure SaaS play we’re used to today. However, there are many advantages for ML companies that take the leap to be full-stack.
In environments with high experimentation costs such as farming, the customer is very reluctant to use new tools because they are in a highly-leveraged, low-margin business. Farmers are focused on the short term and have trouble pricing the risk of using ML products that might help them predict crop yields or detect diseases affecting their plants. Being full-stack, i.e. becoming the farmer, allows an ML company to take more of the risks (e.g. financial and reputational) associated with experiments off the hands of the customer. As such, risk is allocated to the entity with greater tolerance and a long-term orientation, i.e. the venture-backed full-stack ML company, and away from the legacy business with short-term-minded owners/investors.
Overcoming adoption inertia
In problem areas where the ultimate value is derived from making better operational or commercial decisions, full-stack ML products replace on-premise human decision-making with end-to-end ML decision-making. Being full-stack means that one can abstract away most of the intermediate decision points that otherwise require significant on-premise human decision-makers and are prone to error and slow operations. Instead, one can focus on the most valuable end decision. Moreover, it is often more difficult to get customers’ operators to adopt new tools than to get your own operators to adopt them. The underlying reasons include resistance to change, incentivization, and the ability to hire ‘forward thinking’ operators. Thus, abstracting away these adoption barriers helps accelerate the go-to-market.
ML is a relatively nascent industry with rudimentary tooling that is subject to rapid iteration. When this is the case, it’s often more efficient from a value creation standpoint to go the extra mile to become full-stack and control the value chain. To quote Henry Ford: “If you want it done right, do it yourself”. In contrast, when a market is mature, buyers are educated enough to outsource non-core functionality to third parties. In exchange for tolerable subscription fees, the buyer gains operational agility and improved overall product performance. This led to the success of API-first platforms like Twilio (communication), Stripe (payments) or Algolia (search). For ML this is likely to be many years away. Let’s not forget that it took the automobile and computer industries decades to disaggregate their supply chains.
Full-stack ML companies that have control over the end-user relationship become more defensible over time because they are more difficult to replicate. They become the trusted brand and capture a greater portion of the economic benefits they provide, which translates into pricing power over competitors. In contrast, low-level, task-based ML products get commoditized or disintermediated over time, which leads to margin compression. For example, customers often start with third-party APIs for text, speech, image and video analysis to experiment with product hypotheses. However, if and when these hypotheses have exposed significant opportunity for value creation, these external ML APIs are often disintermediated and replaced with internal software consistent with becoming full-stack.
The value attribution problem
Traditional ML companies typically market their core technology or product as an API or SaaS. Too often this product only solves a sliver of the problem value chain by focusing on what technical teams know best: designing and training ML models to make high-quality predictions. From a value capture perspective, this strategy results in margin compression through technology commoditization. What’s more, this paradigm sees the customer provide their data, knowledge of the business problem and evaluation criteria (i.e. “what great looks like”), as well as the distribution to their end customers or internal users. What does the ML company bring to the table? Put simply, this would be akin to “better maths”. To make this strategy even more tenuous, ML companies often tackle prediction problems that verge on the core competence of their customer, e.g. client risk scoring for banks that underwrite credit. Unless the customer is sophisticated enough to understand how to implement such technology in their end products, these ML companies fall short of their full potential and the purchasing decision becomes harder to make.
It’s full-stack ahead!
The internet has now matured to the point where the low-hanging opportunities that can give rise to products packaged as consumer apps, SaaS or marketplaces are few and far between. New opportunities for value creation are emerging in what some refer to as the traditional economy (e.g. industry, pharmaceuticals, and agriculture). Here, ML companies have a big opportunity to create category-defining businesses if they abide by the full-stack mantra. Existing players in these traditional markets are not used to buying software over the internet and often do not have cost-effective (i.e. scalable) distribution channels. It is full-stack ML companies that stand a far more compelling chance of kicking out incumbents and installing themselves as long-standing owners of their space.
Thanks to Moritz Müller-Freitag, Torsten Reil, Dave Hunter, Davíd Markey, Ian Hogarth, Shubho Sengupta, Nikola Mrkšić, and James Field for critical discussions on this topic and/or comments on this piece.