*Thanks to Edgar Pavlovsky, buffalu, Walter Li, Victor Xu, Matt Dobel, and Nick Cannon for feedback and review*

There have recently been a lot of questions about things like, “how do we know a risk model works?” when its outputs (e.g., parameter recommendations) are used in an onchain protocol.

This question has existed in quantitative and non-quantitative finance for many years, and coming to a clear, clean answer is quite hard. I’m going to spend some time describing what the “ideal” way to measure these models is and then slowly add practical considerations to show that sometimes, you can’t perfectly measure a model’s performance. If you make it to the end of this series of posts, I hope you take away the following ideas:

- Economic risk always has some inherent subjectivity (especially when compared to smart contract risk).
- That subjectivity, however, can be quantified, and with a lot of data and careful modeling, it can be interpreted and explained to users.
- We’re not at the point where it can be done autonomously onchain, but zero-knowledge proofs and fully homomorphic encryption will accelerate us to the point that this is possible.

### What is a risk model?

In quantitative finance and other multi-agent systems, one usually has to think dialectically about two objects that can be at odds with one another:

**Macroscopic Payoff**: The payoff that the system as a whole realizes when it is used

- Network Effects and Retention in a marketplace company (e.g., Uber, DoorDash)
- Protocol Value-at-Risk and Revenue in DeFi lending
- Net volume minus incentive spend in DeFi trading

**Microscopic Payoff**: The payoff that the agents within the system receive for participation

- Profit for participating within a particular network in DeFi (either as an LP or a lender)
- Costs to a user incurring debt within DeFi
- Net driver/delivery incentives minus costs in a marketplace company

Why do we care about these payoffs? Firstly, the choice of parameters in a protocol depends heavily on these payoffs. As a concrete example, suppose we have a degenerate borrower who is always maxing out their loans. A protocol’s choice of margin requirement (e.g., LTV) impacts both the degenerate borrower’s microscopic payoff *and* the protocol’s macroscopic payoff (which aims to avoid bad debt). Secondly, in multi-sided marketplaces like DeFi, one needs to have some notion of the macroscopic ‘health’ of a protocol — “We’ve issued a lot of loans and can underwrite $50M more of debt while having less than 0.0001% probability of leaving lenders with bad debt.” This concept of protocol health, which is represented quantitatively by a macroscopic payoff, has to concurrently be optimized while assuming that selfish players, who don’t care about the protocol’s health, will optimize their individual microscopic payoffs.

One can think of a risk model, in its simplest form, as an optimization problem involving a macroscopic payoff, a microscopic payoff, and a set of parameters that can be modified (e.g., via governance). Optimizing the model corresponds to choosing parameters that simultaneously maximize both payoffs subject to some realistic constraints (e.g., liquidators aren’t altruistic and won’t lose money just to increase protocol health). The model has to capture both game theoretic risks that come from protocol health being antipodal to degenerate user behavior as well as inherent stochasticity that comes from how users change their behavior over time.

This means that the risk modeler creating these models needs to be able to think in the *dialectic*. They need to be able to optimize protocol parameters in a manner that can *simultaneously* optimize the macroscopic payoff of a protocol and the microscopic payoffs of the network participants (borrowers, lenders, arbitrageurs, liquidators, etc.). This means they have to be able to imagine the rewards, incentives, and risks that each participant faces and construct a simple yet interpretable quantitative metric that measures a user’s utility. In particular, any measurement of how successful (or not) a protocol parametrization is with regard to mitigating economic risk will depend on how the modeler describes these payoff functions and adapts them as market conditions change. Having simple yet interpretable payoffs is preferable to complex payoffs because it is easier to figure out what to change when market conditions change if there is some notion of interpretability (see footnote 0).
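To make the dialectic concrete, here is a minimal sketch of this kind of joint optimization. Every payoff function and constant below is made up for illustration — this is not any protocol’s actual model. The protocol sweeps a margin parameter (LTV), lets a degenerate, selfish borrower best-respond with their leverage choice, and then scores its own macroscopic payoff at that best response:

```python
import numpy as np

def borrower_payoff(ltv, leverage):
    """Microscopic payoff: a degenerate borrower maxing out loans up to the LTV cap."""
    yield_spread = 0.05                        # assumed carry per unit of leverage
    liquidation_prob = leverage * ltv * 0.10   # crude chance of getting liquidated
    return leverage * yield_spread - liquidation_prob * leverage * 0.5

def protocol_payoff(ltv, leverage):
    """Macroscopic payoff: fee revenue minus a convex expected bad-debt cost."""
    revenue = 0.01 * leverage
    bad_debt = (leverage * ltv) ** 2 * 0.02
    return revenue - bad_debt

best_ltv, best_value = None, -np.inf
for ltv in np.linspace(0.1, 0.9, 81):
    # The borrower acts selfishly: choose the leverage maximizing *their* payoff.
    leverages = np.linspace(0.0, 1.0 / (1.0 - ltv), 200)
    lev_star = leverages[np.argmax([borrower_payoff(ltv, l) for l in leverages])]
    # The protocol scores its macroscopic payoff at the borrower's best response.
    value = protocol_payoff(ltv, lev_star)
    if value > best_value:
        best_ltv, best_value = ltv, value

print(f"LTV that maximizes protocol payoff under selfish borrowing: {best_ltv:.2f}")
```

The inner loop is the microscopic optimization, the outer loop the macroscopic one; a real risk model replaces both toy payoffs with stochastic models fit to data.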

### Principal-Agent Problems

In traditional finance, this type of joint optimization is described via principal-agent problems: a principal supplying capital has a payoff that they want to be optimized by a set of agents. A simple example in DeFi is that of a constant function market maker, such as Uniswap: a liquidity provider is a principal whose capital is utilized by agents such as arbitrageurs, noise traders, and swappers with information. To achieve an optimum payoff, the principal needs to incentivize the agents. Then the question becomes: is there a **stable** equilibrium where the principal pays the agents (e.g., provides incentives, such as the ones we see in crypto), and the agents can act selfishly, maximizing their own payoff? Classical economic theory, including the 2016 Nobel Prize in economics (awarded to Oliver Hart and Bengt Holmström), shows that if there is sufficient structure and information shared amongst principals and agents, then such an equilibrium exists.

Before we continue into how principal-agent problems are related to risk, I want to cover one crucial thing about principal-agent problems. You might naturally ask the following question about the setup so far: why does the principal need to incentivize an agent? Can’t they just take the action themselves? In principal-agent problems, it is assumed that there is *information asymmetry* between principals and agents. You can think of this as the principal not knowing the set of possible actions they could take to optimize their payoff. As such, the agent can collect an information rent in the form of a payment for providing the correct action to take. This type of relationship shows up in many cases: limited partners in a hedge fund or venture fund generally do not know the set of possible investments (or actions), whereas general partners do. As such, the limited partners are willing to pay the general partners a management and performance fee to access those investments. Similarly, passive liquidity providers (LPs) in a protocol like Uniswap pay an information rent to arbitrageurs to convey the Binance price to them via CEX-DEX arbitrage.

One can view most DeFi and consensus protocols within the cryptocurrency space from the lens of a principal-agent problem. Unlike classical principal-agent problems, where there is a single principal (e.g. a limited partner in a fund, a bank, etc.), decentralized systems have multiple principals and multiple agents. To be decentralized means that there is a protocol serving as a means of coordinating the different principals and agents. For instance, a blockchain holds assets and distributes payments between principals and agents. On the other hand, being permissionless and censorship resistant means one cannot perfectly predict which principals and agents are participating (and hence, what their payoffs are).

Even for simple principal-agent problems, in practice, none of the ‘nice’ properties from classical economic models that make them exactly solvable hold empirically (sorry, Hart, Holmström, and Milgrom!). The goal of risk models, as we defined in the previous section, is to bridge this gap. In particular, risk models capture the following deficiencies endemic to classical principal-agent models:

**Undefined user preferences.** The precise utilities or desires of principals and/or agents in the system **cannot** be perfectly known — instead, we try to model the inherent idiosyncrasies in their preferences via stochastic processes fit to and rigorously tested against empirical data.

- All classical economics models find an equilibrium between principals and agents because they assume there is a single, known utility function for each type of user — this is clearly not true in practice.

**Dynamic Market Conditions**. When user preferences change concurrently, market dynamics can also change. This means that things like liquidity, market impact, slippage, and liquidation feasibility change in a heteroskedastic manner. Again, classical models assume either a notion of static or quasi-static user behavior so that there are no ‘regime’ shifts. However, the moment that, say, a points system is added to a particular asset within crypto, the incentives and liquidity tendered by users change, leading to a different set of market dynamics.

- Classical economics models assume that the utilities of users never change — but clearly, that isn’t true in crypto markets. For instance, point systems used by, e.g., Jito, Blast, Margin, and Eigenlayer mutate the utility functions for users who participate in these networks due to the random payoff that occurs when an airdrop is realized. Similarly, the $BONK airdrop suddenly changed the utility function for Solana Saga phone owners upon the price spike.

**Uncertain action space.** Every multi-agent system, whether it be one from classical economics or a reinforcement learning agent that is superhuman at poker, makes one key assumption: the set of actions that users can take is fixed over time. In a permissionless world, this assumption is at best naive and at worst malicious, as users can easily create new contracts or applications. These new applications allow for a new set of user actions that not only interact with existing actions, but can mutate the set of payoffs available to users. The simplest yet most striking example that is unique to cryptocurrencies is the flash loan — the existence of the flash loan implies that you cannot make classical assumptions about the instantaneously available, maximum quantity of capital that a user can use. One has to assume that all participants in a market with leverage — borrowers, lenders, liquidators, arbitrageurs — can have arbitrarily large liquidity. In many ways, this is why flash loans are (to me, wrongfully) indicted as the ‘cause’ of exploits within DeFi. It is more that they damage the implicit contract that a developer had with a set of users because the users could ‘change their microscopic payoff’ by using another contract.

- In classical economics, this could be referred to as having an ‘unknown type space’
- In reinforcement learning, this is often referred to as ‘off-policy learning’ (see footnote 3)

The point of a risk model is to make a best attempt at understanding how equilibria (and optimal payoffs) that occur in practice arise, given that these inherent uncertainties exist. This means making inherently **stochastic** models of reality that are constantly being retrained on live data — accurate real-time data is critical in a space where billion dollar attacks can happen in a single block at all times of the day.

### Regret Minimization

A natural question to ask is how to measure whether a risk model works. The main goals are to measure if the model can capture worst-case events while ensuring that high-level protocol goals, such as revenue or usage maximization, hold. One usually measures the quality of a model via an **objective function** \( \mathcal{f} \) that captures this trade-off between risk and reward. For instance, an objective function (written in words, not math) might be, “what is the total revenue made by the protocol minus the total amount lost in liquidations?” This objective function, while inherently interpretable, is missing something: it doesn’t make any predictions about ‘future’ risk. Instead, it is strictly backward-looking and has no predictive power about future losses or earnings.

This trade-off between objectives that are made in hindsight versus objectives that predict something about the future is often termed **regret minimization** (see footnote 4). To be a bit more formal, we’ll first have to define two mathematical objects:

- \( \mathbf{A} \subset \{0,1\}^d \): A finite, discrete action space. One can think of \( \mathbf{a} \in \mathbf{A} \) as a call to a contract or program function with a particular input transaction and the value \( \mathbf{d} \) as the maximum size of input within a blockchain
- \( \Theta \subset \mathbb{R}^n \): A continuous parameter space that represents the parameters that a protocol can change via multisig, governance, or an autonomous mechanism

In machine learning and statistical learning theory, the **regret** of an objective function \( \mathcal{f} : \mathbf{A} \times \Theta \to \mathbb{R} \), which maps a pair of an action (taken by users) and a parameter (set by the protocol) \( (\mathbf{a}, \theta) \in \mathbf{A} \times \Theta \) to a payoff, is the difference between the best-in-hindsight action \( \mathbf{a}^+ \) and the action realized in practice, \( \hat{\mathbf{a}} \):

#### \[\mathsf{Regret}(\hat{\mathbf{a}}, \theta) = \max_{\mathbf{a} \in \mathbf{A}} \mathcal{f}(\mathbf{a}; \theta) - \mathcal{f}(\hat{\mathbf{a}}; \theta)\]

where \( \theta \) is the set of parameters (e.g., margin requirements, interest rate curves, etc.) that a protocol uses. In words, the regret compares the best action you could have taken against the one you actually took.
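This definition is easy to compute directly when the action space is small and known. A minimal sketch over a toy, finite action space — the payoff function here is hypothetical, chosen only so the numbers are easy to follow:

```python
import itertools

def regret(payoff, actions, a_hat, theta):
    """Regret(a_hat, theta) = max_a f(a; theta) - f(a_hat; theta)."""
    best = max(payoff(a, theta) for a in actions)
    return best - payoff(a_hat, theta)

# Hypothetical setup: actions are d-bit strings, theta is a single risk parameter.
d = 4
actions = list(itertools.product([0, 1], repeat=d))

def payoff(a, theta):
    # Toy payoff: reward the 'size' of the action, penalize it quadratically via theta.
    size = sum(a)
    return size - theta * size ** 2

a_hat = (1, 0, 0, 0)  # the action actually taken (size 1)
print(regret(payoff, actions, a_hat, theta=0.1))
```

With these numbers, the best action has size 4 (payoff 2.4) while the taken action has payoff 0.9, so the regret is 1.5 — the value you left on the table in hindsight.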

This notion of regret, however, assumes that we know the set of actions **\( \mathbf{A} \)**. As mentioned above, the permissionless world does not allow one to know the set of actions precisely — instead, we know a set of *outcomes* (payoffs) and can only measure if an action leads to that outcome with some probability. This means that real data (from live blockchains) only tells you things like distributions over **\( \mathbf{A} \)** (e.g., What is the probability that the swap function is called? What is the average size of a trade that uses this function?) and **\( \Theta \)**. We provide some formalization of this in footnote 5, but I promise you can make it through this post without understanding the technical nuances.

Given such probability distributions, however, one can construct approximate notions of regret that account for our lack of knowledge of the true action space. These notions of regret can provide a protocol with a means to change its parameters \( \theta \) such that they minimize regret across all **known** actions and are a best response in hindsight. In fact, one can view standard Value-at-Risk (VaR) or conditional Value-at-Risk (CVaR) models from traditional finance as a form of regret minimization for a particular \( \mathcal{f} \) (e.g., the Markowitz mean-variance \( \mathcal{f} \)).

tl;dr: If we can do the following steps, we can convincingly reduce the risk in a protocol:

- Estimate the set of outcomes that our protocol can have
- Construct a mapping from the set of *known* and **possible** actions and our protocol’s parameters to a set of outcomes, \( \mathcal{f}(\mathbf{a}, \theta) \)
- Estimate an easy-to-compute, **probabilistic approximation** of regret (such as a VaR or CVaR) that is interpretable
- Choose parameters \( \theta^+ \in \Theta \) that minimize our probabilistic regret with high probability
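The steps above can be sketched end-to-end with Monte Carlo. Everything here — the action distribution, the outcome map, the grid over \( \Theta \) — is a stand-in for illustration, not a real protocol’s estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

def payoff(a, theta):
    # Hypothetical outcome map f(a; theta): a revenue term minus a theta-scaled loss term.
    return a[0] - theta * a[1]

def cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) tail of a sample — an interpretable tail summary."""
    cutoff = np.quantile(losses, alpha)
    return losses[losses >= cutoff].mean()

def probabilistic_regret(theta, n=10_000):
    # Steps 1-2: sample actions from an estimated distribution over A
    # (exponential here purely for illustration) and map them to outcomes.
    actions = rng.exponential(scale=1.0, size=(n, 2))
    payoffs = np.array([payoff(a, theta) for a in actions])
    # Regret of each sampled action relative to the best sampled action.
    regrets = payoffs.max() - payoffs
    # Step 3: summarize the regret distribution with CVaR instead of the unknowable max.
    return cvar(regrets)

# Step 4: choose the parameter minimizing probabilistic regret over a grid.
thetas = np.linspace(0.0, 1.0, 21)
theta_star = min(thetas, key=probabilistic_regret)
print(theta_star)
```

The key substitution is in step 3: because the true action space is unknown, the hard max over \( \mathbf{A} \) is replaced by a tail statistic of the regrets we can sample.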

### Simulation: Estimating these Probabilities

All of the above steps rely on estimating probabilities in many different shapes and forms. The problem is, however, that we are presented with a protocol as a sequential state machine and not a probability distribution. However, if we can construct a set of inputs and a representative set of environments (e.g., user balances, user state, offchain prices, liquidity, etc.), we can run many simulations against the state machine to get an estimate for all the probabilities involved. Sounds easy, right?

Wrong! There are a lot of obstacles that make this hard:

**Convergence Rate is too Slow:** Sometimes, there isn’t enough data for an estimate of a probability distribution to converge — especially in scenarios where the actions taken by users are far from independent.

**Regime shifts:** If the environment changes dramatically (e.g., Binance shuts down and suddenly arbitrage is much more difficult), then our probability distribution itself changes. This means we need to identify changepoints, a notoriously difficult problem in Bayesian statistics, to know when to ‘resimulate’ against particular data.

**Wrong set of environments:** Our regret is also dependent on the environments that we stress test against. If we are overly optimistic and miss “worst-case” environments, we may underestimate regret dramatically. On the other hand, only optimizing for worst-case regret might lead to there being no admissible \( \theta^+ \) that can work in practice. Finding a balance between “average case” and “worst case” is one of the predominant difficulties that one faces when trying to be ‘optimistic under uncertainty’ (a common theme from multi-armed bandit problems that resemble discrete risk models).
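The convergence obstacle is easy to see in miniature: Monte Carlo error for a tail probability shrinks only like \( 1/\sqrt{n} \), and fat tails make the constants worse. A toy sketch, using a Pareto draw as a stand-in for a fat-tailed liquidation loss (the distribution and threshold are arbitrary choices for illustration):

```python
import random

random.seed(0)

def tail_probability_estimate(n, threshold=3.0):
    """Estimate P(loss > threshold) by brute-force sampling n losses."""
    hits = 0
    for _ in range(n):
        loss = random.paretovariate(2.0)  # fat-tailed stand-in for a liquidation loss
        if loss > threshold:
            hits += 1
    return hits / n

# True value for Pareto(alpha=2): P(X > 3) = 3**-2 ~ 0.111.
# Watch how many samples it takes for the estimate to settle near that value.
for n in (100, 10_000, 1_000_000):
    print(n, tail_probability_estimate(n))
```

A hundred samples can easily be off by a factor of two on a ~11% event; for the 0.0001%-style tail events risk models actually care about, naive sampling needs astronomically more data, which is why variance reduction and careful modeling matter.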

The difference between those of us who do rigorous risk modeling, like Gauntlet or Block Analitica, and purely heuristic risk modeling is being **extremely careful** about all of these conditions. It is easy to YOLO this, especially if you’re not statistically minded and rush through the testing process (see footnote 6).

Given that “setting” a parameter is easy to do (it's a single function call!), it might seem like the work of optimizing parameters in protocols according to a risk model is “easy”. But it is also easy to fall into a black hole of despair (especially in an emergency response or hack scenario) if you aren’t extremely careful — like a security auditor — about these steps. Not being statistically careful about risk modeling is similar to someone who gets a security audit from ChatGPT — it will find the easy stuff and tell you you’re A-OK without understanding the nuances of your particular application.

### Conclusions + Looking ahead to Part II

In this post, we’ve looked at some of the background needed to think about risk in DeFi. We tried to define the *ideal* models of classical economics and demonstrate where they fall flat. We then focused on stochastic risk models as a practical means of getting around the failures of these idealized models. Instead of giving a precise formula, we focused on the highest-level description of what a risk model is supposed to do: minimize regret in hindsight. We argued that the simulation and machine learning tools that risk managers such as Gauntlet use exist to estimate some of the unknown probabilities needed to minimize regret in practice (and especially during emergency responses to hacks!).

But we still didn’t address buffalu’s main question: How does this avoid ‘trust me bro’? What we’ve described is **how** risk models are made as opposed to *how to verify* that the output of a risk model is **actually** minimizing regret. While there are deep epistemological concerns with regard to ‘what does verification even mean here?’, there are several practical considerations too. Some of the main difficulties, even if the model were perfect, are:

**Computational Complexity:** The complexity / compute used for simulations and data ingestion is closer to that of training a neural network than to the inference of a neural network. This means it’s infeasible to do onchain, at least for now.

**Privacy of Parameter Updates:** One key thing to note is that if there is a deterministic, public algorithm (e.g., a PID controller), then an attacker, or even just an actor misaligned with protocol health, can simulate configurations to find an attack against the mechanism. This is not a hypothetical issue: at times during the CRV downturn in 2023, one saw borrowers overloading positions on Fraxlend because of its PID mechanism. As such, having a means (such as differential privacy or fully homomorphic encryption) of allowing the final parameter \( \theta^+ \) to be chosen without revealing the entire process for how \( \theta^+ \) was chosen is more important in decentralized environments than centralized ones.

**Dynamic Updates:** Models are constantly changing as market conditions adjust — probability distributions, utilities, and everything in between are constantly being re-estimated as large events occur. This means that verification relies not only on knowing that a single risk model output is correct, but also that the choice to change to a new model or refit an existing model (e.g., due to a regime shift) is correct.
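To illustrate the privacy point: any deterministic, public update rule can be replayed offline by an attacker before they touch the chain. The controller below is a toy PI sketch of an interest-rate update, not Fraxlend’s actual mechanism; the gains, target, and utilization path are all invented:

```python
def pid_rate(prev_rate, utilization, integral, target=0.8, kp=0.5, ki=0.1):
    """Toy public PI controller: nudge the rate toward a target utilization."""
    error = utilization - target
    integral += error
    return max(0.0, prev_rate + kp * error + ki * integral), integral

# Because the rule is public and deterministic, an attacker can replay it offline,
# searching for utilization paths that keep the borrow rate suppressed until the
# moment they want to extract size. Here they hover just under target, then spike.
rate, integral = 0.05, 0.0
for u in [0.78, 0.79, 0.78, 0.95]:
    rate, integral = pid_rate(rate, u, integral)
print(f"rate after gamed utilization path: {rate:.4f}")
```

Nothing here requires guessing: the attacker’s simulation and the protocol’s mechanism are the same function, which is exactly why hiding the parameter-selection process (while still proving its correctness) is valuable.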

These aren’t insurmountable problems, and we’ll give some ideas for how new technologies in both zero-knowledge proving systems and machine learning can help the verifiability of risk model outputs become ever closer to the verifiability of cryptographic outputs.

### Footnotes

**[0]** This is one reason that high-capacity models, such as neural networks or LLMs, tend to perform poorly on dynamic and heteroskedastic data — they take a lot of effort to retrain when the data-generating distribution fundamentally changes. Think about ChatGPT ‘hallucinating’ when it has to reason about events that occurred after its last training data existed — the same problem occurs within risk models. Having an interpretable model allows you to ‘course correct’ this type of hallucination, especially during emergency response situations (which are sadly too common in crypto). As an example of how simple models can help in an emergency, see this talk that I gave at the DeFi Security Summit in 2022 about an incident (that ended up ok!) in Aave. I should also note that the interpretability of statistical models is a hot topic within epistemology and philosophy (so spoiler: there’s no logically consistent definition of interpretability).

**[1]** Much of the structure needed involves conditions for linear program tractability, as many single-principal, single-agent problems can be formulated as mixed integer linear programs. If you’re interested in this and would prefer a discrete math/computer science/optimization perspective on the problem (versus the continuous math point of view that economists take), I recommend *Simple versus Optimal Contracts* by Dütting, et al.

**[2]** I wrote an old note (February 2022! We were all so naive then!) on how one can think about things like point systems or retroactive airdrops as exotic options here. If you’re interested in resurrecting this for modern point systems, feel free to reach out!

**[3]** Technically, off-policy learning is more about learning actions to take that aren’t consistent with a known policy \( \pi : \mathbf{S} \to \mathbf{A} \), which is a map from a state space \( \mathbf{S} \) to an action space \( \mathbf{A} \). But there exist off-policy methods for unknown action spaces (e.g., this or this for examples).

**[4]** Technically, in the strictest definition, regret minimization has to do with a decision framework for evaluating choices made. But as ‘regret minimization’ has become used colloquially in a number of different fields (economics, computer science, machine learning), its definition has expanded (including the one we present here).

**[5]** We have two forms of uncertainty here: uncertainty about the payoff \( \mathcal{f} : \mathbf{A} \to \mathbb{R} \) and uncertainty about the action space \( \mathbf{A} \). If we assume the action space is finite and discrete, then we can try to formalize two notions of regret that don’t depend on an explicit action space. First, we suppose that there exists \( \mathbf{d} \in \mathbb{N} \) such that \( \mathbf{A} \subset \{0,1\}^d \). This says that the number of bits that it takes to represent the set of actions is \( \leq \mathbf{d} \). Next, given an action space \( \mathbf{A} \), we consider the set of payoffs that are bounded by some constant \( \mathbf{L} \in \mathbb{N} \):

#### \[ F(\mathbf{A}, L) = \{ \mathcal{f} : \mathbf{A} \rightarrow \mathbb{R} \mid \mathcal{f} \leq L \} \]

One can think of the value 𝐋 as the maximum budget a user has (e.g., the maximum amount they can flash loan). We can now define two notions of regret for this scenario: max regret and average regret. Max regret looks for the absolute worst case regret that we can find:

#### \[\mathsf{MaxRegret}(\hat{a}, \theta, L) = \max_{A \subset \{0,1\}^d} \max_{\mathcal{f} \in F(A, L)} \left( \max_{a \in A} \mathcal{f}(a) - \mathcal{f}(\hat{a}) \right)\]

On the other hand, we can define the average regret as

#### \[\mathsf{AvgRegret}(\hat{a}, \theta, L) = \mathsf{E}_{A \subset \{0,1\}^d} \mathsf{E}_{\mathcal{f} \in F(A, L)} \left( \max_{a \in A} \mathcal{f}(a) - \mathcal{f}(\hat{a}) \right)\]

where the expectations assume a uniform distribution. In this scenario, the max regret is extremely pessimistic as it finds the absolute worst function and action space, whereas the average regret can sometimes be too optimistic. In practice, most people measure probability distributions over \( \mathbf{A} \), \( F \), \( \Theta \) and then estimate approximations to different restrictions of the regret (e.g. VaR) to these distributions.
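For intuition, both notions can be enumerated exactly in a tiny instance. Here \( d = 2 \), and as a simplification of the bounded family \( F(\mathbf{A}, L) \) we discretize payoffs to integers in \( [-L, L] \), averaging uniformly over all action spaces containing \( \hat{a} \):

```python
import itertools
import statistics

d, L = 2, 2
universe = list(itertools.product([0, 1], repeat=d))  # {0,1}^d

def regrets_for(action_space, a_hat):
    """Enumerate regrets over a discretized F(A, L): integer payoffs in [-L, L]."""
    values = range(-L, L + 1)
    out = []
    for assignment in itertools.product(values, repeat=len(action_space)):
        f = dict(zip(action_space, assignment))
        out.append(max(f.values()) - f[a_hat])
    return out

a_hat = (0, 0)
all_regrets = []
for r in range(1, len(universe) + 1):
    for A in itertools.combinations(universe, r):
        if a_hat in A:  # a_hat must be an available action
            all_regrets.extend(regrets_for(A, a_hat))

print("MaxRegret:", max(all_regrets))             # worst case over A and f
print("AvgRegret:", statistics.mean(all_regrets))  # uniform average over A and f
```

Even in this tiny example, the max regret hits the pessimistic bound \( 2L \) (best action at \( +L \), taken action at \( -L \)), while the average regret is far smaller — the gap the main text describes between pessimistic and optimistic measures.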

**[6]** One very important thing is being able to oscillate between high levels of rigor and optimization when there is time and being able to respond quickly when there are large events. Our analysis of the USDC deviation from par around the Silicon Valley Bank incident from earlier this year exemplifies the dialectic one has to take in incident response.

**[7]** We also study direct maximal extractable value attacks against PID interest rate curves here.
