Amnesty International raises concerns about use of unlawful data collection systems to train generative AI

Salma Ben Mariem | Faculty of Law and Political Science of Sousse, TN

May 28, 2026 03:40:48 pm

Amnesty International reported on Thursday that tech companies have used unlawful web scraping to collect large volumes of online data for the development of generative artificial intelligence (AI) models, in violation of the right to privacy and other human rights standards. The organization called for the prohibition of such data collection systems and urged governments to intervene to regulate these practices.

In a report documenting the risks associated with the large-scale data collection and processing systems deployed by companies such as Google, Meta and OpenAI, Amnesty stated that standalone generative AI systems rely on unlawful web scraping to extract vast amounts of users' online data for model training, which makes these systems unlawful by design and in deployment.

According to Likhita Banerji, head of the Algorithmic Accountability Lab at Amnesty International, these data scraping systems extracted information from billions of public online posts worldwide without the explicit consent of web users, violating their right to privacy as enshrined in fundamental human rights treaties and in specific resolutions on the right to privacy in the digital age. The group also said that generative AI systems perpetuate racial and gender biases and discrimination, reflecting the real-world biases and cultural stereotypes embedded in the training data scraped from the web.

Amnesty International further highlighted that the rapid development of generative AI models has serious environmental consequences. First, generative AI requires data centres to house AI servers, and these facilities consume substantial water resources during both construction and operation, even as water grows increasingly scarce in many places. These data centres also produce electronic waste, which often contains hazardous substances such as mercury and lead. Second, AI-related infrastructure relies heavily on critical minerals and rare earth elements, which are often mined unsustainably.

Amnesty International therefore called on tech companies to cease the unlawful mass collection of data used to train their standalone generative AI models, and urged states to hold companies accountable for human rights abuses linked to the design of AI tools and to corporate business choices.

The challenges posed by AI development have been the subject of ongoing debate. Key concerns include the misuse of AI in surveillance technologies, nonconsensual data extraction, data leakage, inferential profiling of individuals, and the generation of synthetic media capable of influencing individual behaviour, particularly among vulnerable groups such as children. These concerns have prompted national and international efforts to introduce appropriate regulation.

In response, several countries, including Brazil and Vietnam, have enacted dedicated legislation addressing safety in the use of AI. In February 2026, the UN Secretary-General emphasized the need for AI governance and regulation that does not impede innovation, given the increasing use of AI technologies by militaries in ways that result in human rights violations.