Incentivizing Deep Fixes in Software Economies

Malvika Rao, David F. Bacon, David C. Parkes, and Margo Seltzer. 2020. “Incentivizing Deep Fixes in Software Economies.” IEEE Transactions on Software Engineering, 46, 1, Pp. 51-70.

For today's paper review, I'm going to look at a new incentive mechanism. Incentive mechanisms define a method to reward people or organizations based on a particular target. Whenever I learn about new incentive mechanisms, from income share agreements to quadratic funding, I get very excited! These mechanisms offer the potential to change the way how society is organized within the current capitalistic framework. In practice, good mechanisms require that targets are measurable, that the mechanism can't be gamed, and that the mechanism is legitimate to the stakeholders involved. Let's see if today's mechanism holds up to the test!

The problem

All software consists of two things: bugs and features. Bugs are undesired elements, like a button not responding or an algorithm which assigns $12,000 worth of fines to someone who had NULL on their license plate.

In software development, engineers are usually rewarded independent from the type or amount of bugs they fix. Whereas target-based incentives are common in sales, where people's salaries or bonuses are directly tied to how much they sell, most software engineers tend to be rewarded through fixed monthly salaries, equity and the occasional bonus. This is because planning software development is hard, very hard. Most managers are permanently unsure of which bugs to fix, of how long it takes to fix certain bugs, and which unknown bugs may be lurking around. There are certainly ways to tie rewards directly to performance measurements in software development, but at minimum the measurements require constant calibration by intelligent (and therefore expensive) humans.

A combination of top-down and bottom-up planning processes are therefore the norm. However, as every economist will tell you, centralized planning processes have limited information processing capacity, so it is worth looking for ways to decentralize the planning process, for example by introducing a market!

"In fact software systems have come to resemble economic systems, where behaviour is decentralized, interdependent, and dynamic. This suggests that the principles of market design and mechanism design have an important role in complementing traditional engineering techniques. We see this in market-based platforms such as Bountysource [6] and TopCoder [7]. Markets enable us to directly target incentives issues as well as elicit valuations from users so that they can influence the direction of software development. Moreover markets can aggregate supply and demand, thereby providing the scale needed to solve long neglected engineering problems, with prices guiding the efficient allocation of resources."

The fact that a market could exist, does not mean that a market will exist. Take prediction markets as an example, a concept which has been written about for decades while mainstream adoption has been slow. And when a market does exist, like Bountify, they work best when you adhere strictly to the requirements:

"Bountify [10] is also a platform based on crowdsourcing contests. It focuses on coding tasks and questions–the website states, “The best tasks to post on Bountify are those with verifiable solutions. Clearly define your problem and specify the deliverables. Be specific about what you want and ensure that you can verify the correctness of the possible solutions you receive.”

The model

Today's paper characterizes software problems as being either (1) a root cause (2) a bug. Every independent root cause can generate an arbitrary number of bugs,

"We develop a bit string model that captures how a particular root cause can generate bugs, and that encodes the relationships amongst bugs and fixes. Each root cause is associated with a bit string length l > 0. The set of bugs belonging to this root cause are modeled through the set of 2^l - 1 non-zero bit strings of length l. The set of permissible fixes for this set of bugs is modeled as the set of 2^l bit strings, including the 2^l -1 non-zero bit strings that address the bug as well as the zero bit string that models a worker who chooses not to submit a fix (referred to as the null fix). Each bit string encodes a different bug, and similarly a different fix."

Taking the image below as an example, root cause R_1 may cause bug b_k, and its fix f_k is indicated as bitstring: 1. Root cause R_2 may cause bugs b_k, b_k+1 and b_k+2, while its fixes are indicated by the bitstring 111.

Today's paper is modeled using Mean Field Equilibrium (MFE) theory. The paper does an excellent job at summarizing the approach, so I'll do no more than list the clearest parts:

"Mean field equilibrium derives its name from mean field models in physics, where large systems display macroscopic behaviour that is easier to analyze than their microscopic behaviour. MFE have been studied in a variety of settings in economics [40], [41], [42], [43] and control theory [44], [45], [46]."

"As the number of agents grows large, it is reasonable to assume that agents optimize instead with respect to long run estimates of the marginal distribution of other agents’ actions. In particular, each worker best-responds as if in a stationary environment."

"Each worker models the future as though there is an i.i.d. distribution D of fix depths submitted by others, where the set of possible fix depths is {0, ..., l} (including null fixes). In addition, a worker assumes that all fixes associated with a particular fix depth occur with equal probability. This induces a probability distribution over the set of all 2^l possible fixes. Finally, a worker infers the conditional distribution on feasible fixes for a specific bug (noting that only some fixes can be valid)."

At every time step t, workers are assigned random bugs, workers predict what utility they get if they fix the bug, submit fixes, and collect payments. The incentive mechanism which this paper introduces is as follows: if a worker fixes a deep bug, and that fix subsumes any shallower fixes, then the worker's reward increases. This incentivizes workers to find deep fixes rather than shallow fixes.

Contrary to the Linus' law: "given enough eyeballs, all bugs are shallow", today's paper uses different formal definitions for shallow and deep fixes based on how many bugs are fixed (indicated by the length of the previously mentioned bit string):

"Definition 3 (Shallow fix). Given a bug b, a shallow fix f is a
valid fix for which |f| = |b|.
Definition 4 (Deep fix). Given a bug b, a deep fix f is a valid
fix with |f| > |b|."

"The deepest fix not only fixes the bug in question but all bugs of that root cause. [..] In this way, a worker now competes with other workers who fix subsequent bugs on the same root cause. Thus the model captures indirect competition because a later worker might produce a deeper fix for a subsequent bug k, that not only fixes k but also subsumes some other worker’s fix for an earlier bug on the same root cause. To allow for this take-over of earlier payment, the payments in a subsumption mechanism occur over a fixed number of installments over time."

A distinction is made between three variations on subsumption mechanisms:

1. "Eager subsumption performs externally observable checks for subsumption using only the bugs and fixes seen so far."

2. "Lazy subsumption decreases the number of false positives by delaying judgement. Although a check for subsumption is performed when a new fix is submitted by a worker, subsumption relations are finalized only when a root cause regenerates"

3. "Eager with Reuse [...] deviates from the eager mechanism in that subsumed fixes are not discarded. Instead they are kept on a waiting list and reused as is done in lazy subsumption. What this means is that there are time periods when a new bug is fixed via a reused fix and the bug does not enter the market or get associated with a new payment amount"

The performance of these subsumption mechanisms is compared to payout mechanisms whereby workers are simply rewarded for all fixes:

"The different mechanisms give rise to different types of competition dynamic, as shown in Table 2. Workers in the myopic mechanism produce deep fixes at low payment levels only. Beyond these levels workers are unresponsive and simply submit the shallowest possible fix since no competition is involved. The installment and installment with transfer mechanisms stop paying when the next bug appears. Hence workers in these mechanisms must contend with aspects of the environment such as bug generation rate. Because subsumption mechanisms involve competition amongst workers as well as transfers and fix-to-fix comparisons (Table 3), their performance is robust across different environments. [...] Our results show that as environments get more complicated, involving more critical systems and higher development costs, there is a shift in the types of mechanisms that function well according to metrics such as percentage of immediate deep fixes and user utility. In these more complex environments, mechanisms that permit transfers and reuse subsumed fixes perform best."

Assumptions: the good the bad and the ugly

Crucially, the model assumes that root causes and their bugs are independent from other root causes and bugs. In reality, causality is hard to determine, the impact of a fix on a bug is not always clear, and various 'root causes' may be causing the same bugs in software development. Moreover, a bug can be fixed in many different ways. I know from personal experience that there is nothing more satisfying than to rewrite an entire codebase to fix one thing, whereas changing a single comma could have solved the issue at hand.

In real life software development, the moment that a bug is found, it can spontaneously be re-categorized as a feature by speaking the magic phrase: "It's not a bug, its a feature". Joking aside, whether or not a particular piece of software is functioning as desired is not always easy to determine. In practice, all software has bugs, and if we had an unlimited amount of time, developers could work endlessly in order to make a piece of software perfect. It is often a very deliberate decision to stop developing software. Therefore, the mechanisms described today may be best applied for small, highly critical software systems.

Finally, the risk is created that developers start introducing their own bugs just to reap rewards, which is not something you want in a highly critical software system.

Applications

Given the limitations mentioned above, are there situations where we can can still use the logic of the subsumption mechanism in real life? I think that a first next step should be to take a good hard look at a large list of real bugs and fixes, in order to see which assumptions hold up.

A first interesting extension of the model would be to reward not just deep fixes, but also to reward finding bugs at all. In the current model bugs are assigned to developers at random, while in reality finding bugs can be just as hard as fixing bugs. A second interesting extension of the model would be to look further dan just software. Software is eating the world, and all information systems can be described as having bugs, i.e. when a company or institution does not achieve its specified target or displays unwanted behavior. The subsumption mechanism could even be used for companies and governments more generally. However, we would need to take a further step back, and find a nicely compiled list of bugs and fixes in a particular set of institutions!