Like everyone else, we were impressed by NotebookLM’s ability to generate podcasts: two virtual people having a discussion. Give it some links and it generates a podcast based on them. The podcasts were interesting and engaging. But they also had certain limitations.
The problem with NotebookLM is that, while you can prompt it, for the most part it just does what it’s designed to do. It generates a podcast with two voices – one male and one female – and gives you little control over the result. There is an optional prompt for customizing the conversation, but that single prompt doesn’t let you do much. You can’t specifically tell it which topics to discuss or in what order to discuss them. You can try, but it won’t listen to you. It’s also not conversational, which is a bit of a surprise now that we’ve all gotten used to chatting with AI. You can’t ask it to try again by saying “That was good, but please generate a new version that changes these details,” the way you can with ChatGPT or Gemini.
Can we do better? Can we combine our knowledge of books and technology with AI’s ability to summarize? We have argued (and will continue to argue) that simply learning how to use AI is not enough; you have to learn how to do something with AI that is better than what AI could do on its own. You have to integrate artificial intelligence with human intelligence. To see what this would look like in practice, we built a custom toolchain that gives us much more control over the results. It is a multi-stage pipeline:
- We use artificial intelligence to generate summaries for each chapter of the book to make sure all important topics are covered.
- We use artificial intelligence to compile chapter summaries into one summary. This step basically gives us an expanded outline.
- We use artificial intelligence to generate a two-person dialogue that becomes the podcast script.
- We edit the script manually, again making sure it covers the right topics in the right order. This is also an opportunity to correct errors and hallucinations.
- We use Google’s multi-speaker text-to-speech API (still in preview) to generate the two-person summary podcast. (A minimal sketch of this step follows the list.)
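To make the final step concrete, here’s a minimal sketch of what synthesizing a two-speaker script might look like with Google Cloud’s multi-speaker Text-to-Speech preview. This isn’t our production toolchain: the script lines, speaker labels, output filename, and the `en-US-Studio-MultiSpeaker` voice are illustrative assumptions, and the API may change while it’s in preview.

```python
# Minimal sketch: render a hand-edited two-speaker podcast script with Google
# Cloud Text-to-Speech's multi-speaker preview (v1beta1 API).
# Requires `pip install google-cloud-texttospeech` and application default
# credentials. The script content and voice name below are illustrative.
from google.cloud import texttospeech_v1beta1 as texttospeech

# A hand-edited script as (speaker label, line) pairs. The preview voice
# distinguishes speakers by single-letter labels such as "R" and "S".
script = [
    ("R", "Welcome back! Today we're digging into a book about cloud native Go."),
    ("S", "Right, and the first thing that struck me was how it frames reliability."),
    ("R", "Exactly. Let's start with the chapter on building resilient services."),
]

client = texttospeech.TextToSpeechClient()

# Convert the script into multi-speaker markup, one turn per line.
markup = texttospeech.MultiSpeakerMarkup(
    turns=[
        texttospeech.MultiSpeakerMarkup.Turn(speaker=speaker, text=line)
        for speaker, line in script
    ]
)

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(multi_speaker_markup=markup),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Studio-MultiSpeaker",  # preview multi-speaker voice
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

# Write the rendered audio to disk.
with open("podcast_summary.mp3", "wb") as out:
    out.write(response.audio_content)
```

The important design decision sits upstream of this call: because we hand-edit the script before synthesis, the audio step is just rendering. All the control over topics, order, and corrections happens in plain text.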
Why focus on summaries? We are interested in summaries for several reasons. First, let’s face it: having two nonexistent people discuss something you’ve written is fascinating, especially since they sound genuinely engaged and excited. Hearing the voices of nonexistent cyber-people discussing your work makes you feel like you’re living in a sci-fi fantasy. More practically: generative AI is undeniably good at summarizing. There are few errors and almost no outright hallucinations. And ultimately, our users want summaries. On O’Reilly Answers, our customers often ask for them: summarize this book, summarize this chapter. They want to find the information they need. They want to know whether they really need to read the book – and if so, which parts. A summary helps them do this while saving time. It lets them quickly determine whether the book will be useful, and it does this better than back-cover copy or an ad on Amazon.
With that in mind, we had to think about what kind of summary would be most useful to our members. Should there be one speaker or two? As a single synthesized voice summed up the book, my eyes (ears?) quickly glazed over. It was much easier to listen to a podcast-style summary, with virtual participants who were enthusiastic and engaged like those on NotebookLM, than to a lecture. The give-and-take of discussion, albeit simulated, gave the podcasts an energy that a single speaker didn’t have.
How long should the summary be? This is an important question. At some point, the listener loses interest. We could put the entire text of the book into a speech synthesis model and get an audio version – we may still do that; it’s a product some people want. On the whole, though, we expect summaries to be minutes rather than hours long. I might listen for 10 minutes, maybe 30 if it’s a topic or speaker that fascinates me. But I’m noticeably impatient when listening to podcasts, and I don’t have a commute to fill with listening time. Your preferences and your situation may be very different.
What exactly do listeners expect from these podcasts? Do users expect to learn, or do they just want to see whether the book has what they’re looking for? That depends on the topic. I don’t see anyone learning Go from a summary – maybe more to the point, I don’t see anyone learning to program fluently in Go from AI-generated audio. Summaries are useful for presenting the key ideas in a book: for example, a summary of Cloud Native Go gave a good overview of how Go can be used to solve the problems faced by people writing software that runs in the cloud. But actually learning this material requires looking at examples, writing code, and practicing – something that is beyond the reach of an audio-only medium. I’ve heard AI read dumps of Python source code aloud; it’s horrible and pointless. Learning is more likely with a book like Facilitating Software Architecture, which is more about concepts and ideas than code. Someone might come away from the discussion with useful ideas and possibly put them into practice. But again, the podcast summary is just an overview. You need the book to get all the value and detail. In a recent article, Ethan Mollick writes, “Requesting a summary is not the same as reading for yourself. Asking AI to solve a problem for you is not an effective way to learn, even if it seems like it should be. To learn something new, you will have to read and think for yourself.”
Another difference between NotebookLM’s podcasts and ours may be more important. All the podcasts we’ve generated with our toolchain are about six minutes long. Podcasts generated by NotebookLM range from 10 to 25 minutes. The extra length might allow NotebookLM podcasts to be more detailed, but that doesn’t actually happen. Rather than being about the book itself, NotebookLM tends to use the book as a starting point for a broader discussion. The podcasts we generate are more directed: they follow the structure of the book because we’ve provided a blueprint, an outline, for the AI to follow. The virtual podcasters still express enthusiasm and still bring in ideas from other sources, but they head in a definite direction. In contrast, NotebookLM’s longer podcasts can feel aimless as they circle back and pick up ideas they’ve already covered. That, at least, seems like an important difference to me. Admittedly, using the book as a starting point for a wider discussion is also useful, and a balance needs to be struck. You don’t want it to feel like you’re listening to a recitation of the book’s contents. But you also don’t want it to feel unfocused. And if you ask for a discussion of a book, you should get a discussion of the book.
None of these AI-generated podcasts are without limitations. AI-generated summaries are not good at revealing and reflecting nuances in the original writing. With NotebookLM, that was entirely out of our control. With our own toolchain, we could certainly modify the script to say what we wanted, but the voices themselves weren’t under our control and wouldn’t necessarily follow the text. (It’s arguable that expecting a six-minute podcast to reflect the nuances of a 250-page book is asking too much.) Bias – a kind of implicit nuance – is a bigger problem. In our first experiments with NotebookLM, the female voice tended to ask the questions while the male voice provided the answers, although this seemed to improve over time. Our toolchain gave us control because we provided the script. We won’t claim we were unbiased – no one should – but at least we controlled how our virtual people presented themselves.
Our experiments are complete; it’s time to show you what we’ve created. We took five books, generated short podcasts summarizing each with NotebookLM and our toolchain, and posted both sets on oreilly.com. We will add more books in 2025. Listen to them – see what works for you. And please let us know what you think!