Structuring the Unstructured: A Human + AI Approach to Private Capital Data
Updated: Jan 6
Unlike in public markets, where highly regulated electronic exchanges handle millions of transactions each day, private capital market transactions occur entirely off any common platform. Deals come in the form of lengthy, complex legal agreements loaded with bespoke terms and provisions.
That type of unstructured data requires a sophisticated and rigorous data extraction process that to date has been too costly or complicated to implement. At Aumni, we took on that challenge. Using a combination of artificial intelligence techniques and a robust multi-layered manual review process, we believe we have engineered an industry-leading solution.
Modeling the Venture Capital Industry
Aumni’s approach centers on extracting information from the source legal contracts underlying today’s venture capital transactions. While the nature of transactions can vary from deal to deal, there are common features that are becoming standard across the industry thanks to initiatives like the Model Legal Document Library by the National Venture Capital Association (NVCA).
That standardization is a great starting point, but ultimately it took a massive effort to identify and prioritize the breadth and variety of economic and legal language found in venture deals. We worked with in-house domain experts and outside advisors from prominent institutions in venture such as Wilson Sonsini, DLA Piper, Latham & Watkins, Silicon Valley Bank, DCVC, NEA, the NVCA, and many others to painstakingly reproduce structured, virtual representations of nearly all modern venture capital investment vehicles and transaction types.
Incorporating AI and Human Intelligence
Our team of legal experts takes in the legal documents underlying each transaction and extracts economic terms and in-depth legal provisions through a combination of natural language processing (NLP) AI, tabular data perception AI, and manual human review.
We use a hybrid AI and manual review process to reduce errors, because AI and humans tend to make mistakes that do not overlap. We also regularly refine our data models and validation processes as we see new edge cases.
Introducing: The AumniSphere
Once our hybrid human and AI process extracts the data, it enters our proprietary data analytics engine, the AumniSphere. The AumniSphere performs three major functions: Validation, Enrichment, and Analysis.
The Validation process continuously monitors our entire dataset using a sophisticated rule-based reasoning approach to find discrepancies. Working with our team of experts, we developed algorithms to identify and triage data anomalies that can indicate inaccurate data or errors that remained in the signed agreements. To date, this process has identified over 700 such legal discrepancies resulting from material errors in our customers’ documents.
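To make the idea of rule-based validation concrete, here is a minimal sketch in Python. It is an illustration only, not Aumni's actual implementation: the `Investment` fields, the two rules, and the 1% tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Investment:
    # Hypothetical fields; a real data model would be far richer.
    company: str
    shares: int
    price_per_share: float
    amount_invested: float
    fully_diluted_shares: int


# A rule returns a description of the discrepancy, or None if the record passes.
Rule = Callable[[Investment], Optional[str]]


def amount_matches_shares(inv: Investment) -> Optional[str]:
    """Flag records where shares x price disagrees with the stated amount."""
    expected = inv.shares * inv.price_per_share
    tolerance = 0.01 * max(expected, 1.0)  # assumed 1% tolerance
    if abs(expected - inv.amount_invested) > tolerance:
        return (
            f"{inv.company}: stated amount {inv.amount_invested:,.2f} "
            f"differs from shares x price {expected:,.2f}"
        )
    return None


def shares_within_cap_table(inv: Investment) -> Optional[str]:
    """Flag records where one holding exceeds the fully diluted share count."""
    if inv.shares > inv.fully_diluted_shares:
        return f"{inv.company}: holding exceeds fully diluted share count"
    return None


RULES: List[Rule] = [amount_matches_shares, shares_within_cap_table]


def validate(inv: Investment) -> List[str]:
    """Run every rule and collect the discrepancies for triage."""
    findings = []
    for rule in RULES:
        message = rule(inv)
        if message is not None:
            findings.append(message)
    return findings


if __name__ == "__main__":
    suspicious = Investment("Acme", 1000, 2.0, 2500.0, 500)
    for finding in validate(suspicious):
        print(finding)
```

Keeping each rule as a small, independent function means new checks can be added as new edge cases appear, without touching the existing ones.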
The Enrichment process transforms the raw, granular data found in the source legal documents into more useful, derived data points. Much of this process involves aggregating a large number of disparate data points from multiple documents in order to build a more holistic picture of an investment or portfolio company. For example, we can combine our summarization of a company’s protective provisions with ownership information to present potential voting coalitions.
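The voting-coalition example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of the AumniSphere itself: it assumes a single approval threshold expressed in whole percentage points and a hypothetical `ownership` mapping, whereas real protective provisions often depend on share class and other conditions.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def minimal_coalitions(
    ownership: Dict[str, int], threshold: int
) -> List[Tuple[str, ...]]:
    """Return the smallest groups of holders whose combined stake meets the threshold.

    `ownership` maps each holder to a stake in whole percentage points
    (integers avoid floating-point rounding at the threshold boundary).
    A group is skipped if it contains a smaller group that already qualifies.
    """
    holders = sorted(ownership)
    found: List[Tuple[str, ...]] = []
    for size in range(1, len(holders) + 1):
        for combo in combinations(holders, size):
            if any(set(prev) <= set(combo) for prev in found):
                continue  # not minimal: a subset already meets the threshold
            if sum(ownership[h] for h in combo) >= threshold:
                found.append(combo)
    return found


if __name__ == "__main__":
    # Hypothetical holders of a preferred class and their stakes.
    stakes = {"Fund A": 40, "Fund B": 35, "Fund C": 25}
    print(minimal_coalitions(stakes, 60))
```

With the hypothetical stakes above and a 60% threshold, any two funds together are enough to approve an action, so each pair is listed and the three-fund group is omitted as redundant. Enumerating combinations grows quickly with the number of holders, so this approach suits small holder lists only.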
The Analysis process runs further calculations on the enriched data to provide answers to higher-level questions: How has my fund performed so far? How often have I worked with this particular co-investor? How have we dealt with protective provisions across our other transactions? What is the average option pool percentage for deals this size in other industries?
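As a rough illustration of the last question, the sketch below averages option pool percentages by industry for deals within a size range. The records, field names, and figures are invented for the example and do not reflect real data or Aumni's schema.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical enriched deal records.
DEALS: List[dict] = [
    {"industry": "fintech", "round_size_usd": 5_000_000, "option_pool_pct": 10.0},
    {"industry": "fintech", "round_size_usd": 8_000_000, "option_pool_pct": 14.0},
    {"industry": "biotech", "round_size_usd": 6_000_000, "option_pool_pct": 15.0},
    {"industry": "fintech", "round_size_usd": 40_000_000, "option_pool_pct": 20.0},
]


def average_option_pool(
    deals: List[dict], min_size: int, max_size: int
) -> Dict[str, float]:
    """Average option pool percentage per industry for deals in a size range."""
    by_industry: Dict[str, List[float]] = defaultdict(list)
    for deal in deals:
        if min_size <= deal["round_size_usd"] <= max_size:
            by_industry[deal["industry"]].append(deal["option_pool_pct"])
    return {industry: mean(values) for industry, values in by_industry.items()}


if __name__ == "__main__":
    print(average_option_pool(DEALS, 1_000_000, 10_000_000))
```

The same group-then-aggregate pattern applies to the other questions in the list, such as counting how often a given co-investor appears across deals.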
How Aumni’s Approach Impacts the Future of Investing
Because of Aumni’s economies of scale and automation, we are able to present near real-time analysis of granular economic and legal data. Questions that might otherwise have taken a team of analysts weeks to answer can now be answered in a few clicks.
In collaboration with our users, we are discovering new uses for these capabilities almost daily. For the first time, users can access deep insights across their portfolio on data points that were previously buried in a web of unstructured data. As we continue to scale our processes, data models, and capabilities to accommodate ever-increasing demand and complexity, we expect to continue to be a critical influence on our users' investment decisions.