I remember exactly where I was when GPT-4 launched in March. Sitting at my kitchen table, coffee going cold, running prompt after prompt and watching it handle things GPT-3.5 couldn't touch: nuanced reasoning, multimodal inputs, answers that didn't just sound plausible but actually held up. I texted a friend: "This is different." She replied: "I know. I've been up since 5 AM playing with it."
That feeling, that this is actually different, defined 2023.
The pace was relentless. GPT-4 arrived on March 14, scoring in the 90th percentile on the bar exam and introducing multimodal capabilities. Midjourney V5 launched the same month, producing images so photorealistic that the line between generated and real started to genuinely blur. By May, V5.1. By June, V5.2. The iteration speed was unlike anything the creative tools space had seen.
But the real story of 2023 isn't just that language and image models got better. It's that generative AI expanded into every modality.
Runway released Gen-2 for text-to-video generation, letting anyone create short video clips from a text prompt. Stability AI released audio generation models. Meta released Llama 2 in July with openly downloadable weights, making a powerful language model freely available to developers and researchers. And in December, Google launched Gemini, a multimodal model designed to compete directly with GPT-4.
Text. Images. Video. Audio. Code. In the span of twelve months, generative AI went from "it can write pretty good emails" to "it can create across every medium humans use to communicate."
I'm a product person, so I think about this through the lens of what's buildable.
A year ago, adding AI to a product meant calling an API for text generation and hoping it didn't hallucinate too badly. Today, the toolkit includes image generation, video synthesis, voice cloning, code generation, and multimodal reasoning. The design space for what a product can do has expanded dramatically.
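To make that shift concrete, here's a minimal sketch of what "calling the toolkit" looks like from a developer's seat, assuming the OpenAI Python SDK (the `openai` package, v1.x, as it stood in late 2023). The model names, prompts, and image URL are illustrative, not a recommendation; other providers expose similar surfaces.

```python
# A minimal sketch of how the toolkit has grown, assuming the OpenAI
# Python SDK (openai >= 1.0). Model names reflect what was available in
# late 2023; swap in whatever your provider offers.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A year ago: text in, text out.
draft = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Draft a two-sentence product update."}],
)
print(draft.choices[0].message.content)

# Today: the same API surface also takes images alongside text.
analysis = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What product is shown, and what state is it in?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
print(analysis.choices[0].message.content)

# And image generation sits one call away.
image = client.images.generate(
    model="dall-e-3",
    prompt="A clean hero illustration for an onboarding screen, flat style",
)
print(image.data[0].url)
```

None of this is exotic engineering; the point is that text, vision, and image generation now live behind a few nearly interchangeable calls, which is exactly what widens the design space.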
At work, I'm watching teams prototype ideas that would have been science fiction 18 months ago. Personalized video content generated on the fly. Search experiences that understand images and text simultaneously. Onboarding flows that adapt in real time to how a user interacts. Not all of these will ship. But the fact that they're prototypeable at all changes how product teams think about what's possible.
I want to be honest about something: I'm genuinely excited. Not in the performative, "we're living in the future" LinkedIn way. In the quiet, personal way where you realize the tools you've been imagining for years are suddenly real.
I've been writing about AI on this blog since 2016. From digital assistants to NLP breakthroughs to GPT-3's few-shot learning to the generative explosion of 2022. Each year, the gap between what AI could do and what it could do for real people got smaller. In 2023, for the first time, it feels like the gap closed.
That doesn't mean everything is solved. The hallucination problem is real. The ethical questions are enormous. The gap between a demo and a production-quality product is still wide. But the trajectory is undeniable, and for a product person who's spent a career at the intersection of technology and human needs, this is the most exciting moment I can remember.
I don't know exactly what 2024 brings. But I know the building blocks are here: language, vision, video, audio, reasoning, all accessible through APIs, all improving faster than anyone predicted. The question isn't what generative AI can do. It's what we choose to build with it.
I've never been more excited to find out.