When "Publicly Available" Became "Unlawfully Processed"

Clearview AI's €30M Lesson: The Internet Isn't Your Database


Privacy Disasters

3 billion faces. Zero consent. Regulators everywhere said no.

Clearview AI scraped every photo it could find—from Facebook, Instagram, YouTube, LinkedIn, you name it. It then built a facial recognition database, which it sold to cops, private companies, or anyone else who'd pay. Upload any photo, get back a name and social media profile. That was the promise: "Find anyone, anywhere, instantly."

It wasn’t long before regulators came knocking. Their legal defense? "It's all publicly available. We're just organizing what's already out there."

Between 2021 and 2023, European regulators issued over €30 million in fines and banned the company from processing EU residents' data entirely.

The rulings were blunt: publicly visible doesn't mean you can weaponize it. Under the GDPR, accessibility isn't permission. What matters is whether you have legal basis, whether the purpose is legitimate, and whether people could reasonably expect you'd use their data this way. Clearview bombed all three.

The Surveillance Machine

Here's what Clearview built:

Bots scraped billions of photos from social media. Facial recognition then extracted biometric templates. Any client—police, private security, retailers—could upload a face and get matches linked back to profiles.

If you posted vacation photos for friends, Clearview indexed them for surveillance.

If you uploaded a professional headshot for networking, Clearview turned it into a biometric tracking tool.

If parents shared pictures of their kids, Clearview added them to a commercial database without asking.

Nobody opted in. Nobody even knew.

The pitch was dystopian efficiency: identify anyone from a single photo. The catch? Billions of people never agreed to be searchable.

The Crackdown

European regulators responded decisively. Among the actions handed down:

  • Italy's Garante fined Clearview €20 million (2022) and banned further processing of Italian residents' data.
  • Greece's Hellenic DPA issued its own €20 million fine (2022).
  • France's CNIL fined Clearview €20 million (2022), then added a €5.2 million penalty in 2023 for non-compliance.

The findings were consistent: no lawful basis under Article 6, unlawful processing of special category biometric data under Article 9, and no meaningful transparency toward the people whose faces were indexed.

Where Clearview AI Went Wrong

Mistake #1: "Publicly Available = Fair Game"

Clearview's business model assumed that scraping publicly available data was automatically lawful. That's wrong. The GDPR doesn't care if data is public or private. You still need a legal basis. You still need a legitimate purpose. You still need to respect what people expect when they share.

Post a photo to Instagram? You expect your followers to see it, Instagram stores it, maybe it shows up in hashtags. You do not expect a surveillance company to biometrically index your face and sell access to law enforcement.

Context also matters. Data shared in one context can't be silently weaponized for another. That's the purpose limitation principle set forth in Article 5(1)(b).

Regulators were clear: Personal data being "publicly available" does not provide a legal basis for processing. It's just an observation about accessibility.

Mistake #2: "Legitimate Interest Covers Everything"

Clearview argued that its legitimate interests justified its processing.

Regulators applied the standard three-part legitimate interest test:

  • Purpose: is the controller pursuing a genuine, lawful interest?
  • Necessity: is this processing actually necessary to achieve it?
  • Balancing: does that interest outweigh the rights and freedoms of the people affected?

The Verdict: Legitimate interest doesn't justify dragnet data collection. The test isn't "do we have any legitimate interest?" It's "does our interest outweigh the privacy invasion we're causing?" For mass scraping for purposes of biometric identification, the answer was unambiguous.

Mistake #3: "We're Not in Europe"

Clearview, a US company, thought the GDPR's territorial scope didn't reach it. As regulators found, it was wrong again.

Under Article 3(2), the GDPR applies to non-EU controllers that offer goods or services to, or monitor the behavior of, people in the EU. Clearview's database included millions of Europeans, and it sold services to EU entities.

Not having a physical presence in Europe does not put you beyond the GDPR's reach. If you process data on EU data subjects, the GDPR most likely applies, and regulators showed they were willing to enforce it extraterritorially: fines were issued and processing was banned even though Clearview had no EU establishment.

How to Fix This (If You're Building Similar Systems)

If you're building a similar system, a few common-sense steps will keep you from repeating Clearview's mistakes.

Fix #1: Purpose-Lock Your Data

Tag data with the original purpose at collection. Photos ingested from social media should carry metadata recording, for example:

  • source: which platform the photo came from
  • purpose: why it was originally shared, e.g. social_networking
  • collected_at: when it entered your system
  • consent_scope: the uses the subject actually agreed to

You should also block repurposing without explicit consent. If data tagged social_networking gets queried for facial_recognition, the system should deny access unless new consent exists or a legal exemption applies (like a judicial warrant). Other steps include auditing every purpose transition, logging attempts to use data beyond original scope, and flagging for legal review before allowing.

Result: Silent repurposing becomes architecturally impossible.
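The purpose-lock idea above can be sketched in a few lines of Python. Everything here is illustrative: `PurposeLockedStore`, the tag names, and the `legal_exemption` flag are assumptions for the sketch, not a real library.

```python
# Illustrative sketch: deny any query whose purpose differs from the
# purpose tagged at collection, unless fresh consent or a legal
# exemption (e.g. a judicial warrant) is on record.
from dataclasses import dataclass, field

@dataclass
class Record:
    data: bytes
    purpose: str                      # tagged at collection, e.g. "social_networking"
    consented_purposes: set = field(default_factory=set)

class PurposeLockedStore:
    def __init__(self):
        self.records = {}
        self.audit_log = []           # every purpose transition attempt is logged

    def put(self, key, data, purpose):
        self.records[key] = Record(data, purpose, {purpose})

    def get(self, key, purpose, legal_exemption=False):
        rec = self.records[key]
        allowed = purpose in rec.consented_purposes or legal_exemption
        self.audit_log.append((key, rec.purpose, purpose, allowed))
        if not allowed:
            raise PermissionError(
                f"data collected for {rec.purpose!r} cannot be used for {purpose!r}")
        return rec.data

store = PurposeLockedStore()
store.put("photo1", b"...", "social_networking")
store.get("photo1", "social_networking")       # OK: matches the original purpose
try:
    store.get("photo1", "facial_recognition")  # denied: silent repurposing blocked
except PermissionError as e:
    print(e)
```

The point of the design is that the deny-by-default check lives in the storage layer itself, so no application code path can repurpose data without leaving an audit trail.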

Fix #2: Gate Biometric Processing Behind Explicit Consent

Biometric identification requires special category protections under Article 9 of the GDPR, which means explicit consent must be captured before any biometric template is created, and the consent request must be precise. Disclosures should include:

  • that a biometric template of the person's face will be created
  • who will be able to query it
  • how long it will be retained
  • how consent can be withdrawn, and what happens when it is

You should build in transparency as well. For example, consider:

  • a self-service portal where people can check whether they are in your database
  • notifications when a template is created or matched
  • a deletion mechanism that honors Article 17 erasure requests

Result: No biometric processing without documented, informed, explicit consent.
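One way to enforce this gate in code is to make template creation refuse to run unless a complete, explicit consent record exists. This is a minimal sketch under assumed names (`ConsentRecord`, `create_template`, and the disclosure keys are hypothetical, not a real API):

```python
# Illustrative sketch: biometric template creation is gated behind a
# documented, explicit consent record, in the spirit of Article 9's
# "explicit consent" requirement.
from datetime import datetime, timezone

class ConsentRecord:
    def __init__(self, subject_id, disclosures_shown, accepted):
        self.subject_id = subject_id
        self.disclosures_shown = disclosures_shown  # exactly what the person saw
        self.accepted = accepted
        self.timestamp = datetime.now(timezone.utc)  # when consent was captured

# every disclosure that must be shown before consent counts as informed
REQUIRED_DISCLOSURES = {
    "biometric_template_created",   # what will be extracted
    "who_can_query",                # which clients may search it
    "retention_period",             # how long it is kept
    "withdrawal_mechanism",         # how to revoke consent
}

def create_template(photo, consent):
    # refuse unless consent is explicit AND every required disclosure was shown
    if not consent.accepted:
        raise PermissionError("no explicit consent on record")
    missing = REQUIRED_DISCLOSURES - set(consent.disclosures_shown)
    if missing:
        raise PermissionError(f"consent invalid, disclosures missing: {missing}")
    return {"template": hash(photo), "consent_ts": consent.timestamp}

consent = ConsentRecord("subject-42", list(REQUIRED_DISCLOSURES), accepted=True)
template = create_template("photo-bytes", consent)  # allowed: consent is complete
```

Making the consent record a parameter of `create_template`, rather than a separate check, means a template literally cannot exist without a documented consent trail attached to it.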

Fix #3: Enforce Data Minimization

Even if you have a legal basis, proportionality matters. For example, if law enforcement requests a match for a robbery suspect, limit the search to the relevant geography and timeframe, not the entire database. Other steps include:

  • setting retention limits so templates are deleted when no longer needed
  • rate-limiting clients and logging every query for audit
  • returning only the minimum fields needed to act on a match

Result: The system enforces least-privilege processing by default.
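A sketch of what "scoped by default" can look like, assuming a toy in-memory database (the records, `scoped_search`, and its parameters are all invented for illustration):

```python
# Illustrative sketch: every search MUST carry a geographic and temporal
# scope; unscoped queries against the whole database are refused outright.
from datetime import date

DATABASE = [
    {"face_id": 1, "region": "Lyon",  "captured": date(2023, 5, 2)},
    {"face_id": 2, "region": "Paris", "captured": date(2023, 5, 3)},
    {"face_id": 3, "region": "Paris", "captured": date(2021, 1, 1)},
]

def scoped_search(region, start, end, query_log):
    # least-privilege by default: scope parameters are mandatory,
    # and every query is logged for later audit
    if not (region and start and end):
        raise ValueError("search refused: scope (region + timeframe) is mandatory")
    query_log.append((region, start, end))
    return [r for r in DATABASE
            if r["region"] == region and start <= r["captured"] <= end]

log = []
hits = scoped_search("Paris", date(2023, 5, 1), date(2023, 5, 31), log)
# only the in-scope Paris record from May 2023 is returned, not the whole database
```

Because the scope check sits inside the query function, "search everything" is not an API the client even has; minimization is the default, not a policy clients are trusted to follow.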

The Takeaway

The Clearview case clarified something fundamental: scraping publicly available data doesn't grant processing rights.

Three rules emerged:

1. "Publicly available" isn't a legal basis. The GDPR requires a lawful basis under Article 6, and "it's on the internet" doesn't qualify.

2. Context can't be erased. Data shared for social networking can't be silently repurposed for surveillance. Purpose limitation applies to public data.

3. Biometric data demands consent. Facial recognition is a form of special category processing. Mass scraping fails both the consent and narrow exception tests.

For anyone building AI systems, there is another lesson: training data legality can't be assumed. Accessible doesn't mean lawful. Scale matters. Purpose matters. Consent matters. Your data source doesn't determine compliance. Your legal basis does. And "I found it on Google" isn't one.

This is the second installment of our Privacy Disasters series, created in collaboration with Priya Balakrishnan. In this series, we explore privacy failures to uncover lessons about data protection and risk management in practice.

About the Author

Priya Balakrishnan is a privacy and Governance, Risk, and Compliance (GRC) leader with deep expertise in GDPR, U.S. state privacy laws, and global data protection regulations. She designs and leads scalable compliance frameworks aligned with SOC 2, ISO 27001, NIST, and other leading standards, integrating privacy, security, and business strategy to build resilient, audit-ready organizations. Holding CISA, CISM, CIPM, and CDPSE certifications, Priya brings a strategic, forward-looking approach to governance and AI-era risk management. She currently works as the AI Governance, GRC, and Privacy Manager at ExtraHop. She is also the creator of Privacy Byte-Size, where she translates complex privacy and data protection issues into clear, actionable insights for professionals and consumers worldwide.


Study the Smart Way With Privacy Bootcamp

  • Comprehensive, all-in-one training source
  • Pass on your first attempt — or your money back*
  • Gain real exam experience with a live testing environment