The complexity of rules
Summary: In order to get a grip on the costly complexity of the rules governing us, we should learn how to measure it. Different measures exist. By modeling rules as decision trees, we can leverage measures such as the "Vapnik–Chervonenkis dimension" and "Kolmogorov complexity".
"Life used to be simpler" - most old people I met
Life around us is full of complexity. Often, it seems to me, it keeps on increasing. While the economy keeps on growing, we have more things to choose from than ever before and the rules that govern us are multiplying. Many aspects of everyday life have increased in complexity (though certainly not all, look at pop music). Some anecdotal numbers from legal researcher Gillian K. Hadfield to back it up:
"The 1887 Interstate Commerce Act, for example—the first major piece of federal legislation in the United States … [was] a grand total of seven pages long."
More than a century later, even the legal documents of individual companies are longer:
"the average length of the user agreement was even longer—closer to three thousand words—which is about eleven pages of double-spaced text. Adobe Reader’s license agreement topped the scale at 9,313 words, or about forty pages of double-spaced text."
Just imagine that there may once have been a time when we could actually read the Terms and Conditions! Or god forbid, that we could have answered truthfully that we read and understood them. Given two pieces of regulation with the same purpose, the simpler one is likely to be less costly to create and interpret. In order to get a grip on the complexity of regulation, we should learn how to measure it.
How can we measure regulatory complexity?
A very crude starting measure of regulatory complexity: count the number of words in a particular legislative proposal. A very easy method, yes, but also shallow. For example, it doesn't take into account that some pieces of legislation contain more information than others. Two measures which do capture this are Kolmogorov complexity and the Vapnik–Chervonenkis (VC) dimension.
The Kolmogorov complexity of an object is defined as the minimum length of a computer program that can produce the object. You can imagine reducing a large piece of legal text to the handful of variables actually involved (for example, the size of the fine on your speeding ticket might depend mainly on your age and speed).
The VC dimension measures the capacity of a set of functions, and thereby how prone it is to overfitting, i.e. to optimizing just for a particular situation. According to Wikipedia, the VC dimension is: "a measure of the capacity (complexity, expressive power, richness, or flexibility) of a set of functions that can be learned by a statistical binary classification algorithm". In more intuitive terms, VC dimension is: "roughly, the size of the largest set of situations for which we can turn the knobs in a particular way to achieve any particular set of outcomes".
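The "turning the knobs" intuition can be made concrete with a small shattering check. A set of points is shattered if some classifier in the class realizes every possible labeling of it; threshold classifiers on the real line shatter any single point but no pair of points, so their VC dimension is 1. A minimal sketch (the classifier class and points are made up for illustration):

```python
def threshold_classifiers(points):
    """Thresholds below, between, and above the sorted points cover
    every distinct behavior of h_t(x) = 1 if x >= t on these points."""
    xs = sorted(points)
    cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return [lambda x, t=t: int(x >= t) for t in cuts]

def shatters(points):
    """True if threshold classifiers realize every labeling of `points`."""
    achievable = {tuple(h(x) for x in points)
                  for h in threshold_classifiers(points)}
    return len(achievable) == 2 ** len(points)

print(shatters([3.0]))       # a single point can be labeled both ways
print(shatters([1.0, 5.0]))  # the labeling (1, 0) is unreachable for a threshold
```

The largest shattered set has size one, which is exactly the VC dimension of this very simple class of rules.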
The complexity of a speeding ticket
That's all very abstract, so it's time to look at a toy example: how complex is the legislation governing a speeding ticket? Before we can measure the complexity of a piece of legislation, we have to find a way to model it. Decision trees are a great way to do this, as they come prepackaged with great visualizations. You can imagine a simple decision tree which determines the size of the fine given to a person who is driving too fast on the highway. Round nodes represent decisions, square nodes represent outcomes, and the branches represent the path by which we reach a particular outcome through a series of decisions.
To determine the size of the fine, first we check whether the person is over 18 years old. Is the person below 18? They lose their license, no mercy! Is the person over 18? Well then of course they know what they are doing and they get away with a fine! Depending on how far the person exceeded the maximum speed, the right fine can be determined. In real life, dozens of other factors may explicitly or implicitly be taken into account by the police and judges to determine the applicable punishment.
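The verbal tree above translates directly into code. A sketch, where every threshold and fine amount is invented for illustration:

```python
def speeding_fine(age, km_over_limit):
    """Toy decision tree for a speeding ticket; all thresholds and
    amounts are hypothetical."""
    if age < 18:                 # first decision node: age
        return "license revoked"
    if km_over_limit <= 10:      # second decision node: severity
        return "fine: 50 euro"
    if km_over_limit <= 30:
        return "fine: 200 euro"
    return "fine: 500 euro and a court appearance"

print(speeding_fine(17, 20))
print(speeding_fine(35, 25))
```

Each `if` is a round decision node, each `return` a square outcome node, and a run of the function traces one branch of the tree.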
Some easy measures which come to mind for the complexity of this decision tree are: the number of variables, the depth of the tree, or the number of possible outcomes. Let's see what other measures have to say about the complexity.
VC dimension of decision trees
Aslan, Yildiz and Alpaydin (2014) write that:
The VC-dimension of the univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree.
The formula which this paper determines through a number of simulations is:
V = 0.7152 + 0.6775V_l + 0.6775V_r − 0.6600 log d + 1.2135 log M
where:
V_l = the VC dimension of the left subtree
V_r = the VC dimension of the right subtree
M = the number of decision nodes in the tree
d = the number of input features in the corresponding dataset
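The formula can be applied recursively to any binary tree shape. A sketch: the leaf base case (V = 0) and the use of the natural logarithm are assumptions on my part, since the paper's exact conventions are not reproduced here.

```python
import math

def count_nodes(tree):
    """A tree is either None (a leaf) or a pair (left, right) of subtrees."""
    if tree is None:
        return 0
    left, right = tree
    return 1 + count_nodes(left) + count_nodes(right)

def vc_estimate(tree, d):
    """Recursive application of the fitted formula from Aslan, Yildiz
    and Alpaydin (2014). Leaf value 0 and log base e are assumptions."""
    if tree is None:
        return 0.0
    left, right = tree
    m = count_nodes(tree)
    return (0.7152
            + 0.6775 * vc_estimate(left, d)
            + 0.6775 * vc_estimate(right, d)
            - 0.6600 * math.log(d)
            + 1.2135 * math.log(m))

single = (None, None)                  # one decision node
deeper = ((None, None), (None, None))  # three decision nodes
print(vc_estimate(single, d=2), vc_estimate(deeper, d=2))
```

As expected, the deeper tree gets a larger VC estimate: more knobs to turn means more situations it can fit.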
Kolmogorov complexity of decision trees
We can also measure the Kolmogorov complexity of decision trees, based on this paper by Goutam Paul (2004). Instead of focusing on the number of parameters of the model (so that more decision nodes imply a higher complexity), they measure complexity based on the range of possible outcomes. They find that an increase in decision nodes can still lead to a decrease in complexity of the outcome space.
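Kolmogorov complexity itself is uncomputable, but compressed length is a standard upper-bound proxy for it. A minimal sketch using zlib, with two made-up rule texts of identical length, one highly regular and one with much more internal variation:

```python
import zlib

def complexity_proxy(text):
    """Compressed length as an upper-bound proxy for Kolmogorov
    complexity (which is uncomputable in general)."""
    return len(zlib.compress(text.encode("utf-8"), 9))

# Hypothetical rule texts: a repetitive one and a varied one.
regular = "if speed > limit: fine = 100\n" * 40
varied = "".join(chr(33 + (i * 7) % 90) for i in range(len(regular)))

print(complexity_proxy(regular), complexity_proxy(varied))
```

Both texts have the same word-count-style length, yet the repetitive one compresses to far fewer bytes, which is exactly the distinction a naive page count misses.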
Practical notes on applying these measures
The more I tried to uniformly apply these measures, the more I noticed that measuring complexity is a very context-dependent exercise. If the complexity of different rules is to be compared effectively, the decision trees should be built in the same way. I'll mention three examples:
You can increase the number of variables (and reduce the number of assumptions) in the decision tree by taking into account e.g. the country, the job of the driver and even the situation. An ambulance driver in the middle of the desert bringing a pregnant woman to the hospital probably won't be given a very high fine.
Large decision trees can be "cut up" into many different combinations of subtrees, and each such configuration will lead to a different result.
Variables and outcomes can be defined in many different ways. Speeding fines in the Netherlands are listed in a 39-row table. Should we make 39 decision nodes, each with its own exact fine as an outcome, or is "look at the table to see your fine" a valid outcome of the decision tree?
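The third point is easy to make concrete. A sketch with a hypothetical three-row excerpt (not the real Dutch amounts): the same mapping modeled as a single "look it up" node versus unrolled into one decision node per row.

```python
# Hypothetical three-row excerpt of a fine table (invented values).
FINE_TABLE = {5: 36, 10: 79, 15: 130}

def fine_by_table(km_over):
    """Modeled as a single node: 'look up your fine in the table'."""
    return FINE_TABLE[km_over]

def fine_by_tree(km_over):
    """The same mapping unrolled into explicit decision nodes."""
    if km_over == 5:
        return 36
    if km_over == 10:
        return 79
    return 130

# Identical behavior, yet one model has a single lookup node and the
# other has one node per table row.
print(all(fine_by_table(k) == fine_by_tree(k) for k in FINE_TABLE))
```

Any node-counting measure will score these two models very differently, even though they encode exactly the same rule.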
Therefore, the best way to determine the complexity of a piece of legislation is to compare different situations and to apply the model as consistently as possible. We can measure the difference in complexity between different pieces of legislation, or track the complexity of a single set of rules over time. Moreover, note that the VC dimension and Kolmogorov complexity still measure subtly different things; depending on your aim or use case you may need one or the other, and neither is simply the better measure.
The total burden of rules
So far, we only looked at single decisions or single laws. However, a final perspective the model offers is that it lets us look at the cumulative complexity of multiple laws. When different laws use similar variables (decision nodes), their decision trees can be merged. If we want to know the cumulative complexity of all legislation, we can build one giant decision tree that takes an agent's entire relevant state (age, speed, income, etc.) as its decision nodes!
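As a sketch of that merging idea, each law can be viewed as a function over one shared agent state, and the "giant tree" simply evaluates all of them. The state variables, thresholds, and rules below are all made up:

```python
def speeding_outcome(state):
    """Toy 'speeding law' subtree over the shared agent state."""
    if state["age"] < 18:
        return "license revoked"
    return "fine" if state["km_over"] > 0 else "no action"

def tax_outcome(state):
    """Toy 'tax law' subtree over the same state (made-up threshold)."""
    return "surcharge" if state["income"] > 50_000 else "standard rate"

def merged_tree(state):
    """The cumulative tree: evaluate every law against one shared state."""
    return {"speeding": speeding_outcome(state), "tax": tax_outcome(state)}

print(merged_tree({"age": 30, "km_over": 12, "income": 60_000}))
```

Because both subtrees branch on the same state, a shared variable like age only needs to appear once in the merged tree, which is what makes the cumulative complexity less than the sum of its parts.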
A small set of simple rules can shape our behavior substantially. One way to visualize this is through L-systems, which produce wildly varying images from a small set of adjustable parameters. However, too many laws create the risk that our behavior becomes overly constrained, and that the rules become "overfitted" to a particular situation. In terms of our previous example: complicated rules regarding speeding fines may cause people to hunt for loopholes or to avoid driving altogether.
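The generative power of a tiny rule set is easy to demonstrate. A minimal L-system interpreter, applied to Lindenmayer's classic algae system of just two rewrite rules:

```python
def lsystem(axiom, rules, steps):
    """Repeatedly rewrite every symbol according to `rules`."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(c, c) for c in s)
    return s

# Lindenmayer's algae system: two symbols, two rules.
print(lsystem("A", {"A": "AB", "B": "A"}, 5))  # "ABAABABAABAAB"
```

Two rules generate strings whose lengths follow the Fibonacci sequence, a compact illustration of how simple rules can produce elaborate structure.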
The world is full of complexity, some of which I hope we can learn to manage better. We can start by measuring what the level of complexity of legislation actually is, using some of the techniques presented in this article. But our task doesn't end there. Perhaps most importantly, we should evaluate the costs and benefits of such complexity, which I will leave for future investigation. I wouldn't want to risk making this article too complex.