Whether you fear or celebrate big data likely depends on your background, biases, experiences, and, perhaps most importantly, which systems you imagine the data to be benefiting. Benjamin Alarie, Anthony Niblett, and Albert Yoon’s recent paper falls squarely on the celebrate side of the debate—at least in the context of tax administration—and persuasively invites the reader to join them there. In this brief essay, the authors explore how tax agencies and taxpayers can harness data analytics and machine learning to improve tax administration for both government and taxpayers.
For government, data analytics can narrow the tax gap by improving fraud detection. Specifically, tax agencies can mine taxpayer data to predict noncompliance ex ante, rather than uncovering the noncompliance ex post via audit. Such predictions can inform resource allocations, allowing tax agencies to shift resources to high-risk sectors and companies. Augmenting taxpayer data with information from other government agencies would improve these efforts.
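The idea of shifting audit resources toward predicted high-risk filers can be made concrete with a toy sketch. Everything here is invented for illustration—the feature names, the weights, and the data reflect no actual tax agency model:

```python
# Hypothetical sketch: rank returns by a simple risk score so that a fixed
# audit budget goes to the highest-risk filers first. Features and weights
# are invented for illustration only.

def risk_score(ret):
    """Weighted sum of (hypothetical) noncompliance indicators."""
    score = 0.0
    score += 2.0 * ret["deduction_ratio"]    # deductions relative to income
    score += 1.5 * ret["cash_intensity"]     # share of receipts in cash
    score += 1.0 * ret["prior_adjustments"]  # adjustments in past audits
    return score

def allocate_audits(returns, budget):
    """Select the `budget` highest-scoring returns for audit."""
    ranked = sorted(returns, key=risk_score, reverse=True)
    return ranked[:budget]

returns = [
    {"id": "A", "deduction_ratio": 0.9, "cash_intensity": 0.7, "prior_adjustments": 2},
    {"id": "B", "deduction_ratio": 0.2, "cash_intensity": 0.1, "prior_adjustments": 0},
    {"id": "C", "deduction_ratio": 0.6, "cash_intensity": 0.8, "prior_adjustments": 1},
]

selected = allocate_audits(returns, budget=2)
print([r["id"] for r in selected])  # → ['A', 'C']
```

A real system would learn the weights from historical audit outcomes rather than hand-setting them, which is precisely where the bias concerns discussed next come in: whatever patterns the historical data contain, the learned weights will reproduce.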
The authors mention several concerns, including the risk that reliance on existing data might calcify biases reflected in the data. It is worth spending a moment to consider what such calcification could mean. Earned Income Tax Credit (EITC) audits provide an obvious potential flashpoint in the U.S. Imagine that auditors have been more likely to target families of color for EITC audits, and further that those families have worse outcomes upon audit, but not because they are more likely to commit fraud. These taxpayers may fare worse due to language barriers, racial biases, informal care arrangements that are difficult to substantiate, or lack of legal representation. Despite such nuances, the data might merely show that certain characteristics, here race, are correlated with worse outcomes. Predictive analytics might flag taxpayers with these characteristics as “high risk,” subjecting them to heightened monitoring. The authors note that algorithms must be carefully designed to avoid entrenching such biases. Indeed, a truly neutral algorithm untainted by prior biases could ameliorate discriminatory outcomes and improve equity. (That is, if such neutrality is possible, which is questionable.)
As Alarie, Niblett, and Yoon explain, data analytics can also assist taxpayers by narrowing the gap in legal interpretation. Using worker classification as an example, the authors describe how legal standards often defy obvious interpretation, resting on many-factor tests and case-by-case analysis. The result is confusion for employers and employees. Worse, because most taxpayers lack access to interpretive tools, such as cases and regulations, they have little guidance in marginal cases.
The authors argue that the solution is not more information for taxpayers, but, rather, systematically processing existing information to provide meaningful guidance. The authors then explain how court decisions and regulatory determinations can be assembled into a dataset of case facts and outcomes, on which a supervised machine learning algorithm can be trained to predict the results of novel cases. The authors describe such an algorithm that correctly predicted out-of-sample case outcomes with over 90% accuracy.
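To see the shape of this approach (this is not the authors’ actual model), consider a toy version in which past worker-classification cases are encoded as binary vectors of common-law factors, and a new case is labeled like its most factually similar precedent. The factors, the cases, and the nearest-neighbour rule standing in for the supervised learner are all invented for illustration:

```python
# Toy illustration of outcome prediction from precedent (not the authors'
# actual algorithm). Each past case is a binary vector of common-law
# factors plus its outcome; all data here are invented.

# Factors: [behavioral control, provides own tools, single client, set hours]
past_cases = [
    ([1, 0, 1, 1], "employee"),
    ([1, 0, 1, 0], "employee"),
    ([0, 1, 0, 0], "contractor"),
    ([0, 1, 1, 0], "contractor"),
]

def hamming(a, b):
    """Number of factors on which two cases differ."""
    return sum(x != y for x, y in zip(a, b))

def predict(new_case):
    """Label the new case like its most factually similar past case."""
    _, label = min(past_cases, key=lambda c: hamming(c[0], new_case))
    return label

print(predict([1, 0, 0, 1]))  # → employee
```

A production system would train on thousands of decisions, weight factors by their predictive power, and report a confidence score alongside the label—but the basic move is the same: convert case facts into features and let past outcomes classify new disputes.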
This processed information would be a godsend for confused taxpayers. Rather than hiring an attorney to research a minuscule fraction of relevant caselaw, taxpayers could use the automated tool to classify workers more accurately. Attorneys could wield automated predictions to apply pressure in settlement negotiations. Even better, assuming such a tool would be relatively inexpensive, it could democratize legal interpretation. Further, if the predictive tool were widely available and a taxpayer still chose to misclassify a worker, substantiating a reasonable cause defense against penalties would be more challenging. This “you-should-have-known-better” legal strategy would enable tax agencies to impose stiffer penalties on obvious scofflaws.
Alarie, Niblett, and Yoon offer thoughtful commentary on a process that seems largely inevitable. However, exciting as it is, the authors’ sanguine treatment of big data somewhat sidesteps the darker side of the debate, spotlighted aptly by Virginia Eubanks’ Automating Inequality. Eubanks’ account shows how U.S. state welfare systems’ reliance on data analytics has exacerbated unequal outcomes, led to erroneous denial of benefits, and excluded worthy candidates from support programs because they do not fit an algorithm’s preset criteria. While tax provisions rarely affect basic survival, to the extent that they do, these outcomes should give us pause. In the United States, for example, further automating distribution of the EITC and child tax credit should be approached with caution. Although these apprehensions should not discredit an otherwise worthwhile endeavor, system architects should consider them carefully as they harness data to improve tax administration.