The Rise of Big Data and its Worrying Implications for Justice, Equity

What do the Kyle Rittenhouse trial, the Ahmaud Arbery trial, and Charlottesville’s Unite the Right trial have in common? 

Beyond the considerable media attention they have garnered and their connection to other social justice issues, these trials have sparked conversations about the role of race in jury selection.

The rise of big data, which refers to the high volume of data that can be collected using modern technologies, has brought with it a surge in the amount of information about potential jurors that litigants can access in advance of jury selection. From the posts potential jurors share to the news sites they trust, litigants can now make better-informed guesses about a juror’s political leaning and other key data points.

Speaking about the controversial juries of the three recent trials, Paul Butler, a professor of law at Georgetown University, commented in a recent NPR interview, “[I]f you have concerns about the criminal legal system, even if those concerns are evidence-based, sometimes, that’s used to suggest bias. And that disproportionately eliminates people of color from jury pool.”

Big data has the potential to affect not only jury selection, but virtually all legal processes. Its power comes from the training material it supplies to machine learning models, often loosely called algorithms. After training on the data it has been fed, an algorithm can perform complex tasks such as finding patterns in past behavior or predicting future behavior. The emerging field of data justice explores the intersection between data usage and social justice.

At the 2017 re:publica conference, Europe’s largest internet and digital society conference, Arne Hintz addressed concerns about collecting massive amounts of data. Co-Director of the Data Justice Lab, Hintz referred to the process of using such data to sort individuals as “datafication.”

“We are categorized according to data assemblages and our rights and obligations are reconfigured according to these classifications,” Hintz said. “Datafication changes society, governance, and power.”

Hintz argued that the changes brought by data dependency are so widespread that they call for a “renegotiation of civil liberties and democracy and the power shift between different forces in society.”

“Self protection and digital rights are important but are not sufficient to address the more fundamental transformation that we’re witnessing,” Hintz continued. “The shift in the fabric and organization of society through datafication which impacts people’s ability to participate in society and some people more than others.”

In an interview with TED Radio Hour, the founder of the Algorithmic Justice League Joy Buolamwini explored how machine learning can have a disproportionate impact on the ability of certain individuals to participate in society. If the computing model is trained on biased data, or if the data reflects existing inequities, the algorithm will learn to replicate the bias and perpetuate that discrimination. 

“What you might have thought should be a neutral process is actually reflecting the biases that [the computing model] has been trained on,” Buolamwini explained. “And sometimes what you’re seeing is a skewed representation, but other times what machines are picking up on are our own societal biases that are actually true to the data.”

The datasets of human faces used to train facial recognition systems illustrate how skewed representation affects machine learning. When these datasets contain mostly lighter-skinned faces, as Buolamwini’s research showed, the resulting facial recognition systems perform worse on faces with other skin tones.

“What does that mean for surveillance?” Buolamwini asked. “What does it mean for democracy?”

Amazon’s automated hiring tool, which the company eventually scrapped, illustrates how societal bias affects machine learning. The algorithm was designed to make recruitment more efficient by learning from the resumes of successful past candidates and employees.

“What it’s learning are the patterns of what success has looked like in the past,” said Buolamwini. “So if we’re defining success by how it’s looked like in the past and the past has been one where men were given opportunity, white people were given opportunity and you don’t necessarily fit that profile even though you might think you’re creating this objective system, it’s going through resumes, right? This is where we run into problems.”
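
The dynamic described here can be illustrated with a toy sketch. It is not a reconstruction of Amazon's tool or of any real hiring system: the records, the group labels "A" and "B", and the functions `train` and `score` are all hypothetical. The "model" simply memorizes how often similar candidates were hired in the past, so equally qualified candidates receive different scores when past decisions favored one group.

```python
from collections import defaultdict

# Hypothetical historical records: (years_experience, group, was_hired).
# Every candidate has the same experience, but past decisions favored group "A".
history = [
    (5, "A", True), (5, "A", True), (5, "A", True), (5, "A", False),
    (5, "B", True), (5, "B", False), (5, "B", False), (5, "B", False),
]

def train(records):
    """Learn the historical hire rate for each (experience, group) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [hired, total]
    for experience, group, hired in records:
        counts[(experience, group)][0] += int(hired)
        counts[(experience, group)][1] += 1
    return {key: hired / total for key, (hired, total) in counts.items()}

def score(model, experience, group):
    """Score a new candidate by the rate learned for similar past candidates."""
    return model.get((experience, group), 0.0)

model = train(history)
score_a = score(model, 5, "A")
score_b = score(model, 5, "B")
print(score_a, score_b)  # 0.75 0.25
```

Nothing in the code mentions preference for either group. The disparity in the scores comes entirely from the historical decisions the model was given.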

In his TED Talk, Dr. Phillip Atiba Goff explored how data could be used to combat some of the same discriminatory issues it creates. Co-founder of the Center for Policing Equity, Goff discussed how data can be used to ensure the accountability of law enforcement. 

“You’ve got a problem or a goal, you measure it, you hold yourself accountable to that metric,” said Goff. “So if every other organization measures success this way, why can’t we do that in policing?” 

Goff noted that police departments already practice data-driven accountability, but limit this practice to analyzing crime metrics through a system called CompStat. Goff explained that a similar system can be employed to analyze additional metrics, such as demographic data and police behavioral data, that can uncover racial disparities. 

“Now when you define racism in terms of measurable behaviors, you can do the same thing. You can create a CompStat for justice,” Goff said. 

“The goal of these analyses is to determine how much do crime, poverty, neighborhood demographics predict, let’s say, police use of force,” Goff continued. “Let’s say that those factors predict police will use force on this many Black people. There? So our next question is, how many Black people actually are targeted for police use of force? Let’s say it’s this many. So what’s up with the gap? Well, a big portion of the gap is the difference between what’s predicted by things police can’t control and what’s predicted by things police can control –– their policies and their behaviors.”
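
The arithmetic behind the gap Goff describes can be sketched in a few lines. The figures below are invented for illustration and do not come from the Center for Policing Equity; `unexplained_gap` is a hypothetical helper, and a real analysis would estimate the predicted figure with a statistical model rather than take it as given.

```python
def unexplained_gap(predicted_incidents, actual_incidents):
    """Incidents not accounted for by factors outside police control."""
    return actual_incidents - predicted_incidents

# Hypothetical figures: use-of-force incidents that crime, poverty, and
# neighborhood demographics would predict, versus incidents actually recorded.
predicted = 40
actual = 65

gap = unexplained_gap(predicted, actual)
share_unexplained = gap / actual

print(gap)                         # 25
print(f"{share_unexplained:.0%}")  # 38%
```

In this framing, the gap is the portion that outside factors do not account for, which is where Goff directs attention to policies and behavior.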

Data-for-Justice platforms likewise attempt to use data as a tool to advance justice. Darrow, an Israeli-American startup founded just over a year ago, is one such example. By scraping online data from corporate reports and other publications, Darrow uses machine learning to discover legal violations and hold bad actors accountable for these transgressions. Common violations include privacy breaches, consumer fraud, environmental pollution, overcharges, and unfair competition.

“We teach a machine to understand the law, identify harmful actions, and bring them to light and justice,” Gila Hayat, Co-Founder and Chief Technology Officer, said in a recent press release. “This is a huge challenge that the legal system has not yet been able to deal with.”

Employing machine learning is not only faster than traditional methods, but it also results in the discovery of more violations. “Most violations are never discovered,” Hayat said. “The information is available, but the challenge lies in finding it within the vast amount of online data and assembling the legal story from the bits and pieces scattered across multiple sources.”

Like other human-made tools, machine learning can be used in multiple, and often contradictory, ways. By perpetuating inequalities, machine learning can hinder justice; by leading to more accountability, it can serve justice.

The opposing ways in which data can be used have been recognized by government entities and policy makers. Several initiatives, including the OECD AI Policy Observatory and the European Commission’s High-Level Expert Group on Artificial Intelligence, have formed in response to the ethical implications of new technologies. With few legislative guidelines yet passed in the U.S., the legal implications of how data usage affects social justice remain unclear.

As we continue to grapple with the convergence of big data and justice, it is worth noting an observation by former US Secretary of State Henry Kissinger and his co-authors in the book The Age of AI: And Our Human Future: “[AI] is an enabler of many industries and facets of human life: scientific research, education, manufacturing, logistics, transportation, defense, law enforcement, politics, advertising, art, culture, and more. The characteristics of AI, including its capacities to learn, evolve, and surprise, will disrupt and transform them all. The outcome will be the alteration of human identity and the human experience of reality at levels not experienced since the dawn of the modern age.”

Jaimee Francis is a first-year law student at Boston University School of Law. Prior to beginning her legal studies, Ms. Francis earned her Bachelor of Science from the Georgia Institute of Technology.