Catala: A Programming Language for the Law

Merigoux, D., Chataing, N., & Protzenko, J. (2021). Catala: A Programming Language for the Law. arXiv preprint arXiv:2103.03198.

Being able to digitize and automate the law is one of the holy grails for many computer scientists. But it is a difficult goal.

As software is eating the world, more and more of our daily life is becoming digitized. Abstract concepts such as communication, trust and money have all been digitized to some extent, and many other areas seem ripe for disruption. I believe this is partly why smart contracts have attracted so much investment capital: the term seems to promise the automation of law and governance. On closer inspection, however, full automation of the law is still impossible, because the law relies to a large extent on subjective evaluations by judges and the wider legal profession. Such subjective evaluations may only be handled by machines once computational reasoning has progressed to the point where it is indistinguishable from human reasoning.

Today's paper nevertheless makes a meaningful contribution by focusing on a subset of the law and on a specific, very real problem: reducing the complexity of translating law into software.


What specific issues exist in lawmaking?

The lawmaking process, like almost all processes, is already partially digitized: lawyers use Microsoft Word, hold discussions over Zoom and use software to calculate your tax burden. This last area creates especially large problems and overhead.

Imagine that you are in charge of building software which calculates how much income tax someone has to pay. You will have to 'translate' lawspeak to software, which requires highly complex domain-specific knowledge of both law and software. Spotting inconsistencies is incredibly difficult work:

"In practice, lawyers of the legal department take the input set of legal statutes and write a detailed natural language specification, that is supposed to make explicit the different legal interpretations required to turn the legal statute into an algorithm. Then, legal expert systems programmers from the IT department take the natural specification and turn it into code, often never referring back to the original statute."
"Tax law provides a quintessential example. While many of the implementations around the world are shrouded in secrecy, the public occasionally gets a glimpse of the underlying infrastructure. Recently, Merigoux et al. [31] reverse-engineered the computation of the French income tax, only to discover that the tax returns of an entire nation were processed using an antiquated system designed in 1990, relying on 80,000 lines of code written in a custom, in-house language, along with 6,000 lines of hand-written C directly manipulating tens of thousands of global variables."

Can we digitize any part of these procedures?

In the software world, a solution to complex, diverging implementations has been known for a long time: use a single domain-specific language that both humans and computers can read. An example familiar to many programmers is OpenAPI. It lets you describe an API once, in a single document that rigorously specifies how others can interact with your software and from which code and documentation can be generated. The innovation I'm discussing today achieves the same thing for the law: you can build tax software and define the corresponding specification at the same time:

"we observe that the benefits of formalizing a piece of law are the same as formalizing any piece of critical software: numerous subtleties are resolved, and non-experts are provided with an explicit, transparent executable specification that obviates the need for an expert legal interpretation of implicit semantics."

Remember that this does not aim to replace or digitize the entire law! We should only digitize the portions of the law that can be evaluated completely objectively. There are a number of promising areas to look at:

"Examples of computational law include, but are not limited to: tax law, family benefits, pension computations, monetary penalties and private billing contracts. All of these are algorithms in disguise: the law (roughly) defines a function that produces outcomes based on a set of inputs."

Still, translating a piece of law into software creates challenges:

  • most laws are incredibly convoluted because they have been iterated on and adjusted over centuries; with their exceptions, exceptions to exceptions, and so forth, they make typical spaghetti code look clean.
  • most laws are written in an idiosyncratic style known as "legalese", which differs from country to country and from language to language.

To give you a taste of this complexity:

"In France, the military’s payroll computation involves 174 different bonuses and supplemental compensations. Three successive attempts were made to rewrite and modernize the military paycheck infrastructure; but with a complete disconnect between the military statutes and the implementation teams that were contracted, the system had to be scrapped."

Putting Catala into practice

A first look at section 121 of the U.S. tax law

This section states that, under certain conditions and up to a limit, the profit from selling your home is excluded from taxable income. In lawspeak:

(a) Exclusion
Gross income shall not include gain from the sale or exchange of property if, during the 5-year period ending on the date of the sale or exchange, such property has been owned and used by the taxpayer as the taxpayer’s principal residence for periods aggregating 2 years or more.
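Even this single sentence is "an algorithm in disguise". Below is a minimal Python sketch of the 2-out-of-5-year test in subsection (a), to show how much has to be decided before it can run. It is not Catala and not the authors' code, and the helper names are hypothetical. It assumes three readings of the text:

* "2 years" means at least 730 days.
* The 5-year window includes both its first day and the date of sale.
* The periods within each list do not overlap.

It ignores the dollar limits and special cases found elsewhere in Section 121.

```python
from datetime import date

# Hypothetical helper names; this is not Catala and not the paper's code.
# A period is a (start, end) pair of dates, both days included.
Period = tuple[date, date]

# Assumption: "periods aggregating 2 years or more" read as at least 730 days.
TWO_YEARS_IN_DAYS = 730


def five_years_before(d: date) -> date:
    """Same calendar day five years earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 5)
    except ValueError:
        return d.replace(year=d.year - 5, day=28)


def days_in_window(periods: list[Period], window_start: date, window_end: date) -> int:
    """Count the days of each period that fall inside the window (inclusive).

    Assumes the periods in the list do not overlap each other.
    """
    total = 0
    for start, end in periods:
        lo, hi = max(start, window_start), min(end, window_end)
        if lo <= hi:
            total += (hi - lo).days + 1
    return total


def meets_121a_test(ownership: list[Period], usage: list[Period], sale_date: date) -> bool:
    """Owned AND used as principal residence for >= 2 years in the 5 years up to the sale."""
    window_start = five_years_before(sale_date)
    owned = days_in_window(ownership, window_start, sale_date)
    used = days_in_window(usage, window_start, sale_date)
    return owned >= TWO_YEARS_IN_DAYS and used >= TWO_YEARS_IN_DAYS


owned = [(date(2015, 1, 1), date(2023, 6, 30))]
lived = [(date(2019, 3, 1), date(2021, 6, 30))]
print(meets_121a_test(owned, lived, date(2023, 6, 30)))  # True
```

Each of those readings is an interpretive choice that the specification has to pin down before anyone can write the code. This is the kind of subtlety the authors want a formal specification to resolve.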

Translating this to software

The new programming language Catala lets you write a machine-readable specification of the above text, from which software that evaluates people's tax liability can be generated directly. Here is a small part of the Catala version (if you are not familiar with reading code, feel free to reach out for clarification):

1   declaration structure Period:
2     data start content date
3     data end content date
4
5   declaration structure PersonalData:
6     data property_ownership content collection Period
7     data property_usage_as_principal_residence content collection Period
8     ...
9
10  declaration scope Section121SinglePerson:
11    context gain_from_sale_or_exchange_of_property content money
12    context personal content PersonalData
13    context requirements_ownership_met condition
14    context requirements_usage_met condition
15    context requirements_met condition
16    context amount_excluded_from_gross_income_uncapped content money
17    context amount_excluded_from_gross_income content money

The first three lines declare a Period, defined by a start date and an end date. The next section declares PersonalData, which holds the collections of Periods during which the taxpayer owned the property and used it as their principal residence. Keywords such as "date" and "money" make it easy to refer unambiguously to a well-defined concept. The generous use of keywords is meant to improve readability for legal professionals without a software background. Moreover, Catala has some unique keywords, such as "scope" and "context":

"Line 10 declares Section121SinglePerson, a scope. A key technical device and contribution of Catala, scopes allow the programmer to follow the law’s structure, revealing the implicit modularity in legal texts. Scopes are declared in the metadata section: the context keyword indicates that the value of the field might be determined later, depending on the context. Anticipating on Section 4, the intuition is that scopes are functions and contexts are their parameters and local variables."
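The quote's intuition is that scopes are functions and contexts are their parameters and local variables. A rough Python analogue of the Section121SinglePerson scope illustrates it. This is a hand-written sketch, not code produced by the Catala compiler. It makes these simplifications:

* The ownership and usage conditions arrive as plain booleans.
* Money is represented as whole dollars.
* The limit is the $250,000 that Section 121(b) sets for a single taxpayer.

The optional `requirements_met` argument loosely mimics a caller redefining a context variable.

```python
from typing import Optional

# Hypothetical Python analogue of the Section121SinglePerson scope; not compiler output.
# Assumption: the $250,000 limit for a single taxpayer from Section 121(b).
CAP_SINGLE = 250_000


def section121_single_person(
    gain_from_sale_or_exchange_of_property: int,
    requirements_ownership_met: bool,
    requirements_usage_met: bool,
    requirements_met: Optional[bool] = None,
) -> int:
    # A context with a default definition that the caller may override,
    # loosely mimicking how a context's value can depend on the calling context.
    if requirements_met is None:
        requirements_met = requirements_ownership_met and requirements_usage_met

    # Local variables computed inside the "scope".
    amount_excluded_from_gross_income_uncapped = (
        gain_from_sale_or_exchange_of_property if requirements_met else 0
    )
    amount_excluded_from_gross_income = min(
        amount_excluded_from_gross_income_uncapped, CAP_SINGLE
    )
    return amount_excluded_from_gross_income


print(section121_single_person(300_000, True, True))  # 250000
print(section121_single_person(100_000, True, False))  # 0
```

In Python the function boundary and the argument list play the roles of the scope and its contexts. Catala makes that structure explicit and ties it to the statute's own sections.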

Several design choices make it practical to express large pieces of law in Catala and to compute with them. For example, Catala builds on default logic, which matches the way statutes layer exceptions on top of general rules:

"Default logic which can express facts like “by default, something is true”; by contrast, standard logic can only express that something is true or that something is false"
"rather than having an arbitrary priority order resolved at run-time between various rules, we encode priorities statically in the surface syntax of the language, and the pre-order is derived directly from the syntax tree of rules and definitions."
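A tiny evaluator helps show what these two quotes mean in practice. The sketch below is a simplified Python reading of default logic as Catala uses it, assuming these rules:

* A default has a condition (justification), a consequence and a list of exceptions.
* If exactly one exception applies, it wins.
* If several exceptions apply, that is a conflict error.
* If no exception applies, the base rule is used when its condition holds; otherwise the default is empty (None here).

Priority comes from how the rules are nested, not from an ordering chosen at run time. The class and field names are hypothetical. The joint-return exception is a toy version of Section 121(b)(2), which has further conditions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")
Ctx = dict[str, Any]


class ConflictError(Exception):
    """Raised when two exceptions at the same level both apply."""


@dataclass
class Default(Generic[T]):
    justification: Callable[[Ctx], bool]
    consequence: Callable[[Ctx], T]
    exceptions: list["Default[T]"] = field(default_factory=list)

    def evaluate(self, ctx: Ctx) -> Optional[T]:
        # Exceptions are tried first; their priority over the base rule
        # comes from the nesting, i.e. statically from the rule structure.
        results = [r for r in (e.evaluate(ctx) for e in self.exceptions) if r is not None]
        if len(results) > 1:
            raise ConflictError("several exceptions apply at the same level")
        if len(results) == 1:
            return results[0]
        # No exception applies: fall back to the base rule, if its condition holds.
        if self.justification(ctx):
            return self.consequence(ctx)
        return None  # empty: this default says nothing


# By default the limit is 250,000; a (simplified) joint-return exception raises it.
cap = Default(
    justification=lambda ctx: True,
    consequence=lambda ctx: 250_000,
    exceptions=[
        Default(
            justification=lambda ctx: ctx["joint_return"],
            consequence=lambda ctx: 500_000,
        )
    ],
)
print(cap.evaluate({"joint_return": False}))  # 250000
print(cap.evaluate({"joint_return": True}))  # 500000
```

Because the exception sits inside the base rule, adding a special case means nesting another `Default` rather than rewriting the general rule. Two special cases that both apply at the same level are reported as a conflict instead of being silently ordered.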

The road before and ahead

The paper builds on several earlier lines of research, including:

  • Extracting the logical essence of legal codes was already discussed in 1914 by John Dewey.
  • In 1956, Layman Allen noted that symbolic logic could be used to remove ambiguity from the law.
  • In 2018, Sarah Lawsky further developed insights into the logical structure of legal statutes.

Logic programming, however, has not yet been widely adopted in lawmaking. What is common today instead is for lawyers to use rule engines such as Drools to write specifications of the law for programmers, who then often still have to make a final translation.

From a comment on Hacker News, I learned that the tax authority in the Netherlands already uses a domain-specific language, built with JetBrains MPS, to specify its legislation. Catala offers a promising alternative that aims at the same goals with more rigor.

After translating just a single social security law (consisting of 27 articles) into Catala, the authors had already found and fixed a bug in OpenFisca's French social security calculator. You may think they were lucky, but I think there are plenty of dragons left to be found in our legal texts, and that Catala has a bright future!