In March 2023, David Sloly embarked on a journey to create the world’s first AI-generated podcast. Made in AI follows an eccentric AI family taking over the world. In turn, the podcast has taken over the charts, climbing Goodpods’ and ReHack’s top 10 AI Podcasts lists.
We spoke with David about how he made the Made in AI podcast, and what’s in store for the future of AI entertainment.
A sitcom full of parody created and told using artificial intelligence.
I'd studied AI for a number of years with Oxford University. I came back from studying it very excited and immediately tried to embed it into HarveyDavid (the business I run with my partner Harvey). But I found myself philosophically at odds with some of the staff, who were worried about the ethics; I respected and understood that, and they had very good points.
So, I decided to play with AI in a safe environment and create something using AI that had nothing to do with the business: a sandbox experiment. I have a background in audio production and storytelling, so I chose to make a sitcom using AI. I quickly learned that you can't do a bit of AI here and there - that takes as much time as a conventional workflow, plus additional complexity. But if you reimagine the entire production flow with AI at the core, you can compress five days of work into two.
What was just going to be a short experiment has now been going for almost a year and the sitcom podcast has reached season four.
I ask AI to come up with an idea for a show. Then I ask it to create the structure: to sketch out and plot key scenes. From there, I ask it to start filling in things that could happen and additional detail.
I take the ideas and ask AI to write bits of script: what could they say here? With small adjustments to the script elements, I can write the rest of the script directly into an AI voice synthesis program.
No real script exists for the show. I have to write directly into the voice synthesisers because they can't be directed. It’s pointless having this big script where it says “Athena gets angry and tells them all to leave the room, so they leave the room” because Athena may never get angry. So I have to keep trying until what I say works for the show. Whatever works leads to the next line. I enjoy the idea of trying to direct characters that can't be directed.
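The loop described here - draft a line, feed it to the synthesiser, keep it only if it works for the show, and let whatever works seed the next line - can be sketched in code. This is a minimal illustration of that retry loop, not David's actual tooling: `draft_line` and `sounds_right` are hypothetical stand-ins for the language model and the human judgement call after hearing the synthesised audio, implemented here as simple stubs so the sketch runs.

```python
def draft_line(scene, attempt):
    """Stub for an LLM call: propose the next line given the story so far.
    A real pipeline would prompt a language model here."""
    return f"Line {len(scene) + 1}, take {attempt}"

def sounds_right(line, attempt):
    """Stub for the human judgement after hearing the synthesised take.
    Here we pretend the third take of each line always lands."""
    return attempt >= 3

def build_scene(num_lines, max_attempts=5):
    """Keep retrying each line until one 'works for the show';
    whatever works becomes the context for the next line."""
    scene = []
    for _ in range(num_lines):
        for attempt in range(1, max_attempts + 1):
            line = draft_line(scene, attempt)
            if sounds_right(line, attempt):
                scene.append(line)
                break
    return scene

for line in build_scene(3):
    print(line)
```

The point of the sketch is the control flow: there is no master script object, only accepted lines accumulating one at a time, which mirrors directing characters that can't be directed.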
60% is AI, 40% is human.
It takes humans a lot more time to do anything - we aren’t as fast as AI! So there are actually two equations going on. AI is doing 50% - 60% of the total work, but the physical work and effort is probably 70% - 80% human. AI works in no time: it goes “there’s your script”, which would’ve taken me a whole day to put together. AI spits the work out as fast as I can say it.
Everything was very clunky when I started. The voice synthesisers couldn't even say the word AI, so I had to write the letters A or AY then EYE.
Back then, I knew AI could self-learn through a large language model and progress quickly. I just didn't consider how this progress would present itself.
Now the language models have learned so much - they're even beginning to let me include expressions and tone. At the close of each season of the show I clear out old AI tools and bring in new ones. The new tools work faster, enable me to do things like add music, and do all kinds of magic that make the podcast more realistic.
I've been pleasantly surprised by the progression. I didn't really anticipate it, but it's obvious with hindsight. It’s a big difference from other technologies: CVs didn't change for 50 years; they stayed the same. Radios still haven't really changed, and they didn't change at all for 25 - 40 years. Fundamental changes in AI are occurring at an accelerated rate we’ve never really seen in our lifetime.
I get up at 5:00 - 5:30 in the morning. There's no one to distract me at that time, so I can just focus on AI. I can't make any noise because my family are asleep so all I can do is sit with a pair of headphones and work on the show. Whilst AI is processing stuff, I'm thinking about what I want to do next.
001 is my favourite character. He's the father in the show and represents ‘old AI’. There's a little bit of me in him. I think watching the world through the wrong lens is a wonderful way to watch it; that puts a smile on my face.
AI’s already built the faces of the characters. But within the next three months, OpenAI is going to release a moving image model, Sora. Before the show goes visual, my goal is to use AI to create a show that's on all the time, one the listener can dip in and out of. The next phase from there is deep personalisation.
You may not be interested in the same Made in AI experience as somebody else.
Deep personalisation would be an episode of the show designed specifically for you. It will take a snapshot of your social media and produce the show in real time, always talking about things you’re interested in. Personalisation makes a massive impact for the listener.
One of the most wonderful experiences I've had was driving to France with my wife and son, who was about eight or nine at the time. We were listening to the Chitty Chitty Bang Bang audiobook, heading towards Dover. We listened to the characters in the book getting on the ferry from Dover - and there we were, in Dover, about to get on a ferry. I thought this was an incredible element of storytelling context.
Now, I want to build context modelling. If a listener is in India, but they've been talking about how much they’d love to go to California, they would go to California in the show. It pulls you in and creates a personalised story for you. And the whole point of it is to bring joy to people.
In season five the AI family will descend from the heavens in a spacecraft and launch a global cult. In season six they'll enslave all humans, connect them to digital tethers and make human dreams their entertainment... Maybe that's a bit much? Watch this space.