Whispers of A.I.’s Modular Future [The New Yorker]

February 1, 2023

ChatGPT is in the spotlight, but it’s Whisper—OpenAI’s open-source speech-transcription program—that shows us where machine learning is going.

One day in late December, I downloaded a program called Whisper.cpp onto my laptop, hoping to use it to transcribe an interview I’d done. I fed it an audio file and, every few seconds, it produced one or two lines of eerily accurate transcript, writing down exactly what had been said with a precision I’d never seen before. As the lines piled up, I could feel my computer getting hotter. This was one of the few times in recent memory that my laptop had actually computed something complicated—mostly I just use it to browse the Web, watch TV, and write. Now it was running cutting-edge A.I.

Despite being one of the more sophisticated programs ever to run on my laptop, Whisper.cpp is also one of the simplest. If you showed its source code to A.I. researchers from the early days of speech recognition, they might laugh in disbelief, or cry—it would be like revealing to a nuclear physicist that the process for achieving cold fusion can be written on a napkin. Whisper.cpp is intelligence distilled. It’s rare for modern software in that it has virtually no dependencies—in other words, it works without the help of other programs. Instead, it is ten thousand lines of stand-alone code, most of which does little more than fairly complicated arithmetic. It was written in five days by Georgi Gerganov, a Bulgarian programmer who, by his own admission, knows next to nothing about speech recognition. Gerganov adapted it from a program called Whisper, released in September by OpenAI, the same organization behind ChatGPT and DALL-E. Whisper transcribes speech in more than ninety languages. In some of them, the software is capable of superhuman performance—that is, it can actually parse what somebody’s saying better than a human can.

What’s so unusual about Whisper is that OpenAI open-sourced it, releasing not just the code but a detailed description of its architecture. They also included the all-important “model weights”: a giant file of numbers specifying the synaptic strength of every connection in the software’s neural network. In so doing, OpenAI made it possible for anyone, including an amateur like Gerganov, to modify the program. Gerganov converted Whisper to C++, a widely supported programming language, to make it easier to download and run on practically any device. This sounds like a logistical detail, but it’s actually the mark of a wider sea change. Until recently, world-beating A.I.s like Whisper were the exclusive province of the big tech firms that developed them. They existed behind the scenes, subtly powering search results, recommendations, chat assistants, and the like. If outsiders have been allowed to use them directly, their usage has been metered and controlled.

There have been a few other open-source A.I.s in the past few years, but most of them have been developed by reverse engineering proprietary projects. Leela Chess Zero, a chess engine, is a crowdsourced version of DeepMind’s AlphaZero, the world’s best computer player; because DeepMind didn’t release AlphaZero’s model weights, Leela Chess Zero had to be trained from scratch by individual users, a strategy that was only workable because the program could learn by playing chess against itself. Similarly, Stable Diffusion, which conjures images from descriptions, is a hugely popular clone of OpenAI’s DALL-E and Google’s Imagen, but trained with publicly available data. Whisper may be the first A.I. in this class that was simply gifted to the public. In an era of cloud-based software, when all of our programs are essentially rented from the companies that make them, I find it somewhat electrifying that, now that I’ve downloaded Whisper.cpp, no one can take it away from me, not even Gerganov. His little program has transformed my laptop from a device that accesses A.I. to something of an intelligent machine in itself...