Princeton computer scientist Arvind Narayanan (previously) has posted slides and notes from a recent MIT talk on “How to recognize AI snake oil.” In it, he divides AI applications into three (nonexhaustive) categories and rates how difficult each one is, and thus how far you should believe vendors who claim that their machine learning models perform as advertised.
Narayanan’s categories are:
* Perception, such as facial recognition and song identification, where there is a definitive correct answer. This category is making “genuine, rapid progress.”
* Automating judgment, such as spam detection, copyright violation, and essay grading, where humans routinely make judgments that can be used to train a model. This category is “far from perfect, but improving,” albeit with limits, because “reasonable people can disagree about the correct decision.”
* Predicting social outcomes, such as predictive policing, predicting terrorist risk, and predicting which kids are at risk. This category is “fundamentally dubious” because regression analysis and other statistical tools do not work better than “manual scoring using just a few features,” and manual scoring doesn’t work very well either (and that’s before you get into problems like training data bias).
Moreover, using AI to predict social outcomes doesn’t just produce bad predictions; it also drives demand for more surveillance to feed the machine learning models, and it uses up energy that could be deployed on better-performing techniques for mitigating these harms.
This is a great, compact presentation, but I feel the need to weigh in critically on Narayanan’s claim that ML can be used to judge “copyright violation.” This is a common misconception among computer scientists who lack a nuanced understanding of copyright law and its limitations and exceptions. If copyright were an absolute right (no one is ever allowed to copy your copyrighted thing, ever), then ML would be pretty good at policing it.
Likewise, if the exceptions to copyright were deterministic and rules-based (“Taking this much of a song in a sample is fair use, and more is not”), then machine learning could reliably assess whether a use was fair.
But none of this is true. Copyright’s limitations and exceptions are as important as copyright itself: as the Supreme Court held in Eldred v. Ashcroft, without Fair Use, copyright would violate the First Amendment, and any copyright regime that doesn’t accommodate Fair Use is thus unconstitutional.
And Fair Use is not deterministic at all. Indeed, determining whether a use is fair is a lot more like figuring out whether a prisoner is a recidivism risk than it is like figuring out whether a given face matches one from a database. For example, some uses are judged “fair” because they do not take the “heart of the work.” No algorithm can tell you which part of a song or a poem or a photo is its “heart.”
Some uses are fair because they are “critical,” “transformative” or “parodical.” No machine learning system can distinguish parody from mere appropriation.
Some uses are fair because the work they’re making use of is primarily factual, or because there is a strong public interest in the new work’s existence.
Some uses are sometimes fair: copying an entire work, for instance, or making a commercial remix of it. Other times, those same uses are not fair. No machine learning system can distinguish the cases where such a use is fair from the cases where it is infringing.
The belief that machine learning can resolve copyright claims animated one of the most catastrophic tech regulations of the decade, so it’s really important that we get this right, or we’re going to see more of that sort of thing.
The belief that “content matching” is functionally equivalent to “copyright protection” is an underappreciated programmer myth. I hope I’ve done my bit here to dispel it.
How to recognize AI snake oil [Arvind Narayanan/Princeton]
(via Four Short Links)