3 billion faces. Zero consent. Regulators everywhere said no.
Clearview AI scraped every photo it could find—from Facebook, Instagram, YouTube, LinkedIn, you name it. It then built a facial recognition database, which it sold to cops, private companies, or anyone else who'd pay. Upload any photo, get back a name and social media profile. That was the promise: "Find anyone, anywhere, instantly."
It wasn’t long before regulators came knocking. Clearview's legal defense? "It's all publicly available. We're just organizing what's already out there."
Between 2021 and 2023, European regulators issued over €30 million in fines and banned them from processing EU data entirely.
The rulings were blunt: publicly visible doesn't mean you can weaponize it. Under the GDPR, accessibility isn't permission. What matters is whether you have legal basis, whether the purpose is legitimate, and whether people could reasonably expect you'd use their data this way. Clearview bombed all three.
The Surveillance Machine
Here's what Clearview built:
Bots scraped billions of photos from social media. Facial recognition then extracted biometric templates. Any client—police, private security, retailers—could upload a face and get matches linked back to profiles.
If you posted vacation photos for friends, Clearview indexed them for surveillance.
If you uploaded a professional headshot for networking, Clearview turned it into a biometric tracking tool.
If parents shared pictures of their kids, Clearview added them to a commercial database without asking.
Nobody opted in. Nobody even knew.
The pitch was dystopian efficiency: identify anyone from a single photo. The catch? Billions of people never agreed to be searchable.
The Crackdown
European regulators responded decisively:
- France: €20M fine + cease processing
- Italy: €20M fine + delete all data
- UK: £7.5M fine + processing ban
- Greece, Austria, Netherlands: Additional fines and prohibition orders
The findings were consistent:
- No valid legal basis. Clearview claimed "legitimate interest" under Article 6(1)(f) of the GDPR. Mass biometric surveillance without consent or transparency? The balancing test wasn't even close, regulators found.
- Purpose hijacking. Photos shared for social connection were repurposed for identification without notice. Article 5(1)(b) requires purpose specification. Clearview's silent pivot violated it.
- Zero transparency. Billions had no idea they were in a surveillance database. Article 13 mandates clear processing information. Clearview provided none until regulators forced them to do so.
- Special category violations. Biometric data for unique identification is a special category of personal data under Article 9 of the GDPR—it requires explicit consent or the application of narrow legal exceptions. Commercial scraping qualified for neither.
Where Clearview AI Went Wrong
Mistake #1: "Publicly Available = Fair Game"
Clearview's business model assumed that scraping publicly available data was automatically lawful. That's wrong. The GDPR doesn't care whether data is public or private. You still need a legal basis. You still need a legitimate purpose. You still need to respect what people expect when they share.
Post a photo to Instagram? You expect your followers to see it, Instagram stores it, maybe it shows up in hashtags. You do not expect a surveillance company to biometrically index your face and sell access to law enforcement.
Context also matters. Data shared in one context can't be silently weaponized for another. That's the purpose limitation principle set forth in Article 5(1)(b).
Regulators were clear: Personal data being "publicly available" does not provide a legal basis for processing. It's just an observation about accessibility.
Mistake #2: "Legitimate Interest Covers Everything"
Clearview argued that its legitimate interests justified its processing.
Regulators applied the standard three-part test:
- 1. Do you have a legitimate interest? Maybe, regulators said—e.g., law enforcement is a recognized legitimate interest.
- 2. Is processing necessary? No, because alternatives exist. Among them are targeted investigations, warrant-based access to platforms, and traditional detective work.
- 3. Does your interest override individual rights? Definitely not. Mass biometric surveillance without consent, transparency, or ability to object? Individual rights win decisively.
The Verdict: Legitimate interest doesn't justify dragnet data collection. The test isn't "do we have any legitimate interest?" It's "does our interest outweigh the privacy invasion we're causing?" For mass scraping for purposes of biometric identification, the answer was unambiguous.
Mistake #3: "We're Not in Europe"
Clearview, a US company, assumed the GDPR's territorial scope didn't reach it. Regulators found otherwise.
Under Article 3(2), the GDPR applies to non-EU controllers that process EU residents' data. Clearview's database included millions of Europeans, and it sold services to EU entities.
Lacking a physical presence in Europe doesn't put you beyond the GDPR's reach. If you process data on EU data subjects, the GDPR most likely applies. And regulators proved extraterritorial enforcement works: fines were issued, processing was banned, and orders were enforced through cooperation with US authorities.
How to Fix This (If You're Building Similar Systems)
There are common-sense steps you can take to avoid Clearview's fate.
Fix #1: Purpose-Lock Your Data
Tag data with the original purpose at collection. Photos from social media should have metadata attached. Some examples include:
- purpose: social_networking
- source: user_uploaded
- consent_scope: platform_terms
You should also block repurposing without explicit consent. If data tagged social_networking gets queried for facial_recognition, the system should deny access unless new consent exists or a legal exemption applies (like a judicial warrant). Other steps include auditing every purpose transition, logging attempts to use data beyond the original scope, and flagging them for legal review before access is granted.
Result: Silent repurposing becomes architecturally impossible.
Fix #2: Gate Biometric Processing Behind Explicit Consent
Biometric identification requires special category protections under Article 9 of the GDPR. Require explicit, precise consent before any biometric template is created. Example disclosures that should be included:
- "We will use your photo for facial recognition matching" (specific)
- "Your biometric template will be stored for X years" (clear retention)
- "You can withdraw consent and request deletion anytime" (revocable)
You should build transparency, as well. For example, consider:
- A public registry of who's in the database, searchable by individuals.
- Notification when biometric data is matched or accessed.
- Easy objection mechanisms.
- Encrypting biometric templates with keys requiring consent verification.
- Automatic deletion on withdrawal.
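One way to wire the consent gate and automatic deletion together, sketched in Python. `ConsentRegistry`, `ConsentRequired`, and the template placeholder are all illustrative assumptions, not a real API:

```python
import datetime as dt


class ConsentRequired(Exception):
    pass


class ConsentRegistry:
    """Gate biometric template creation behind explicit, revocable consent."""

    def __init__(self) -> None:
        self._consents: dict[str, dict] = {}    # user_id -> consent record
        self._templates: dict[str, bytes] = {}  # user_id -> biometric template

    def record_consent(self, user_id: str, retention_years: int) -> None:
        # Specific purpose + clear retention period, per the disclosures above.
        self._consents[user_id] = {
            "purpose": "facial_recognition_matching",
            "expires": dt.date.today() + dt.timedelta(days=365 * retention_years),
        }

    def create_template(self, user_id: str, photo: bytes) -> None:
        consent = self._consents.get(user_id)
        if consent is None or consent["expires"] < dt.date.today():
            raise ConsentRequired(f"no valid explicit consent for {user_id!r}")
        # Stand-in for real feature extraction.
        self._templates[user_id] = b"template:" + photo

    def withdraw(self, user_id: str) -> None:
        # Withdrawal triggers automatic deletion of the stored template.
        self._consents.pop(user_id, None)
        self._templates.pop(user_id, None)
```

The point of the design: template creation fails closed. Without a documented, unexpired consent record, the biometric processing path simply doesn't run, and withdrawal deletes both the consent and the template in one step.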
Result: No biometric processing without documented, informed, explicit consent.
Fix #3: Enforce Data Minimization
Even if you have a legal basis, proportionality matters. Say law enforcement requests a match for a robbery suspect: limit the search to the relevant geography and timeframe, not the entire database. Other steps include:
- Rate-limit bulk queries.
- Require case-by-case justification.
- Escalate mass searches to DPO review.
- Prevent wholesale database exports.
- Build audit trails for every query showing: legal basis, scope, approver, results.
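A sketch of a query gate enforcing these rules in Python. The class, exception, and review threshold are assumptions for illustration (rate limiting is omitted for brevity):

```python
class DPOReviewRequired(Exception):
    pass


MAX_MATCHES_WITHOUT_REVIEW = 100  # assumed threshold for a "mass" search


class QueryGate:
    """Scope every biometric query and record a full audit trail."""

    def __init__(self) -> None:
        self.audit_trail: list[dict] = []

    def search(self, *, legal_basis: str, case_id: str, approver: str,
               region: str, window_days: int,
               candidates: list[dict]) -> list[dict]:
        # Case-by-case justification is mandatory: no basis, no query.
        if not (legal_basis and case_id and approver):
            raise ValueError("case-by-case justification is mandatory")
        # Limit to the relevant geography and timeframe, never the whole database.
        scoped = [c for c in candidates
                  if c["region"] == region and c["age_days"] <= window_days]
        # Escalate anything that looks like a mass search to the DPO.
        if len(scoped) > MAX_MATCHES_WITHOUT_REVIEW:
            raise DPOReviewRequired("mass search must be escalated to the DPO")
        # Audit trail: legal basis, scope, approver, result count.
        self.audit_trail.append({
            "legal_basis": legal_basis, "case_id": case_id, "approver": approver,
            "scope": {"region": region, "window_days": window_days},
            "result_count": len(scoped),
        })
        return scoped
```

Wholesale exports fail by construction: the only read path is a scoped, justified, logged `search`, and anything above the review threshold stops until a human signs off.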
Result: The system enforces least-privilege processing by default.
The Takeaway
The Clearview case clarified something fundamental: scraping publicly available data doesn't grant processing rights.
Three rules emerged:
1. "Publicly available" isn't a legal basis. The GDPR requires a lawful basis under Article 6, and "it's on the internet" doesn't qualify.
2. Context can't be erased. Data shared for social networking can't be silently repurposed for surveillance. Purpose limitation applies to public data.
3. Biometric data demands consent. Facial recognition is a form of special category processing. Mass scraping fails both the consent and narrow exception tests.
For anyone building AI systems, there is another lesson: training data legality can't be assumed. Accessible doesn't mean lawful. Scale matters. Purpose matters. Consent matters. Your data source doesn't determine compliance. Your legal basis does. And "I found it on Google" isn't one.
This is the second installment of our Privacy Disasters series, created in collaboration with Priya Balakrishnan. In this series, we explore privacy failures to uncover lessons about data protection and risk management in practice.