Introduction
In a landmark keynote delivered at AI Startup School in San Francisco on June 17, 2025, Andrej Karpathy, former director of AI at Tesla and a leading figure in artificial intelligence and deep learning, laid out a compelling vision for the future of software. Drawing on his extensive experience at Stanford, OpenAI, and Tesla, Karpathy argued that software is undergoing a profound transformation, one as fundamental as any since the dawn of computing itself.
He introduced the concept of Software 3.0, an era where natural language becomes the primary programming interface, and large language models (LLMs) act as the new kind of computer. This shift is not just about new tools; it is about building a new computing paradigm, reshaping how developers write code, how users interact with software, and how entire software ecosystems evolve.
This article captures the key insights, stories, and nuances from Karpathy's keynote. We'll explore his detailed analogies, practical examples, and forward-looking advice on how to thrive in this new era of software.
The Evolution of Software: From 1.0 to 3.0
Karpathy began by framing the historical arc of software development. He pointed out that for roughly 70 years, software fundamentally remained the same: humans wrote explicit code to instruct computers. He called this era Software 1.0, the traditional paradigm of programming in which developers write lines of code in languages like C++ or Python.
Then came Software 2.0, a concept Karpathy himself popularized years ago. This era is characterized by neural networks: models whose behavior is defined not by explicit code, but by learned parameters (weights). Instead of writing step-by-step instructions, developers curate datasets and run optimization algorithms to “train” these networks. The neural net weights become the new “code,” encoding complex functions such as image recognition or speech understanding.
Karpathy illustrated this shift with the example of the AlexNet image recognizer, a neural network trained to classify images without explicit programming of features. He emphasized that Software 2.0 models were until recently fixed-function computers, specialized for tasks like classification.
What's changed dramatically is the emergence of Software 3.0, where neural networks, especially large language models, become programmable via natural language prompts. Now, instead of writing code or training weights, developers write English instructions that program the model's behavior dynamically. Karpathy described this as a fundamentally new kind of computer:
“Your prompts are now programs that program the LLM. And remarkably, these prompts are written in English. So it's kind of a very interesting programming language.”
He noted that this is a revolutionary shift because it makes programming accessible in a natural language, breaking down traditional barriers to software development.
Karpathy also highlighted how GitHub repositories are evolving to include not just code, but English interspersed with code, signaling this new hybrid programming paradigm.
Programming in English: The Rise of Software 3.0
Karpathy's excitement about programming in English is palpable. He shared a memorable moment when he tweeted:
“Remarkably, we're now programming computers in English.”
This tweet captured the imagination of many and reflects a profound change: the new programming language is the language humans already use daily.
He gave a concrete example contrasting traditional sentiment classification. Previously, a developer might write Python code or train a neural network to classify sentiment. Now, with a large language model, one can simply write a few-shot prompt in English instructing the model to perform sentiment analysis. This prompt acts as a program, dynamically guiding the model's behavior.
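To make the contrast concrete, here is a minimal sketch in Python: a hand-written Software 1.0 classifier next to a few-shot English prompt that would serve as the Software 3.0 “program.” The rule lists and prompt wording are illustrative assumptions, not examples from the talk.

```python
# Software 1.0: explicit, hand-written rules (word lists are an assumption).
def classify_sentiment_v1(text: str) -> str:
    positive = {"great", "love", "excellent"}
    negative = {"bad", "hate", "awful"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Software 3.0: a few-shot English prompt *is* the program. This string
# would be sent to an LLM API; the examples are hypothetical.
FEW_SHOT_PROMPT = """Classify the sentiment of each review as positive or negative.

Review: "I loved this movie, the acting was superb."
Sentiment: positive

Review: "A complete waste of two hours."
Sentiment: negative

Review: "{review}"
Sentiment:"""

print(classify_sentiment_v1("I love this, it is great"))
print(FEW_SHOT_PROMPT.format(review="The plot dragged on."))
```

The interesting point is that the second “program” is edited in plain English: changing the model's behavior means rewriting the examples, not the code.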
Karpathy emphasized that this new programming paradigm is not just a novelty but a fundamental shift requiring developers to become fluent in multiple paradigms:
“If you're entering the industry, it's a very good idea to be fluent in all of them [Software 1.0, 2.0, and 3.0] because they all have slight pros and cons.”
He stressed the importance of fluidly transitioning between writing explicit code, training neural nets, and programming LLMs with natural language.
LLMs as Utilities, Fabs, and Operating Systems
Karpathy then moved to a fascinating analogy, comparing LLMs to utilities, semiconductor fabs, and operating systems: three pillars of modern computing infrastructure.
LLMs as Utilities
He likened LLMs to utilities like electricity, highlighting how labs such as OpenAI, Gemini, and Anthropic invest heavily in capital expenditures (capex) to build these models. The models are then served via APIs, metered by usage, much like electricity consumption.
“We demand low latency, high uptime, consistent quality… When state-of-the-art LLMs go down, it's like an intelligence brownout in the world.”
This analogy captures the critical role LLMs play as foundational services powering countless applications.
LLMs as Fabs
Karpathy also compared LLM labs to semiconductor fabs, pointing out the deep tech trees and research secrets concentrated in these organizations. The massive investment in training infrastructure and hardware is akin to building and operating cutting-edge fabs.
He noted the analogy is imperfect because software is malleable and less defensible than physical fabs, but it still conveys the scale and complexity involved.
LLMs as Operating Systems
Most compellingly, Karpathy argued that LLMs resemble operating systems:
“This is not just electricity or water. These are increasingly complex software ecosystems.”
Like Windows, MacOS, or Linux, LLMs form the platform upon which applications run. There are closed-source providers (OpenAI, Google) and open-source alternatives (LLaMA ecosystem) akin to Linux.
He sketched a vision where LLMs orchestrate memory and compute for problem-solving, with context windows acting as working memory. Because LLM compute is still expensive, it is centralized in the cloud and time-shared across users, echoing the 1960s era of computing, when time-sharing and batch processing dominated.
Karpathy also pointed out the current lack of a general graphical user interface (GUI) for LLMs, comparing direct interaction with ChatGPT to a terminal interface. He suggested that a general-purpose GUI for LLMs has yet to be invented, but many specialized apps are beginning to fill this gap.
The Psychology of LLMs: People Spirits and Cognitive Quirks
Switching gears, Karpathy offered a unique perspective on the psychology of LLMs. He described them as:
“People spirits: stochastic simulations of people, where the simulator is an autoregressive Transformer.”
Because LLMs are trained on vast corpora of human text, they develop an emergent psychology: they have encyclopedic knowledge and memory far beyond any individual human, but also significant cognitive deficits.
Superhuman Strengths
Karpathy compared LLMs to an autistic savant, referencing the movie Rain Man:
“They can remember lots of things, a lot more than any single individual human can because they read so many things.”
LLMs can recall hashes, facts, and patterns with superhuman accuracy and speed.
Cognitive Deficits
However, LLMs hallucinate, confidently making up false information. They lack a robust internal model of their own knowledge and exhibit jagged intelligence: superhuman at some tasks while making mistakes no human would make.
He gave examples like a model insisting that 9.11 is greater than 9.9, or claiming that “strawberry” contains only two Rs.
LLMs also suffer from a form of anterograde amnesia: they do not consolidate knowledge over time the way humans do by sleeping and reflecting. Their context windows act as working memory but are wiped regularly, limiting long-term learning.
Karpathy recommended the films Memento and 50 First Dates as metaphors for LLM memory limitations.
Security and Gullibility
He cautioned that LLMs are gullible and susceptible to prompt injection attacks, data leakage, and other security risks. These limitations must be carefully managed when building applications.
Designing LLM Apps with Partial Autonomy
Karpathy then explored the practical opportunities that arise from LLMs' unique capabilities and limitations, focusing on the concept of partial autonomy.
Rather than treating LLMs as fully autonomous agents, Karpathy advocates for building apps where humans and AI collaborate closely, with humans retaining control and oversight.
Example: Cursor
He highlighted Cursor, an AI-powered code editor, as an early exemplar of this approach.
Cursor integrates multiple LLMs and embedding models to assist developers, but still provides a traditional interface for manual work. It orchestrates context management, multiple LLM calls, and applies diffs to code, all while giving users a clear GUI to audit changes.
Karpathy emphasized the importance of GUIs in LLM apps:
“You don't want to talk to the operating system directly in text. Text is very hard to read, interpret, and understand… A GUI allows a human to audit the work of these fallible systems and go faster.”
The Autonomy Slider
A key design principle Karpathy introduced is the autonomy slider: a control allowing users to tune how much autonomy the AI has.
In Cursor, users can:
- Use tab completion for small changes (high human control)
- Command K to modify chunks of code
- Command L to change entire files
- Command I for full autonomy over the repo
This flexibility enables users to balance speed and control depending on the task complexity.
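The slider idea can be sketched as a simple mapping from each autonomy level to the unit of work the AI may change before a human reviews it. The level names mirror Cursor's modes as described above, but this dispatch is an illustrative assumption, not Cursor's actual implementation.

```python
from enum import Enum

# Illustrative sketch of the "autonomy slider" (not Cursor's real code).
class Autonomy(Enum):
    TAB_COMPLETE = 1    # suggest the next few tokens; human accepts each
    EDIT_SELECTION = 2  # rewrite a highlighted chunk of code
    EDIT_FILE = 3       # rewrite an entire file
    EDIT_REPO = 4       # agent works across the whole repository

def scope_for(level: Autonomy) -> str:
    """Return the unit of work the AI may change before human review."""
    return {
        Autonomy.TAB_COMPLETE: "next few tokens",
        Autonomy.EDIT_SELECTION: "selected lines",
        Autonomy.EDIT_FILE: "one file",
        Autonomy.EDIT_REPO: "whole repository",
    }[level]

print(scope_for(Autonomy.EDIT_FILE))  # prints "one file"
```

The design point is that autonomy is a continuous product decision, not a binary: the larger the scope, the more verification work lands on the human afterward.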
Perplexity and Other Apps
Karpathy also mentioned Perplexity, another LLM-powered app with similar features: orchestrating multiple models, citing sources, providing GUIs for auditing, and autonomy sliders for varying levels of AI assistance.
The Human-AI Collaboration Loop
Karpathy stressed the importance of fast, efficient human-AI collaboration loops. Humans generate and verify AI outputs rapidly to maintain control and ensure correctness.
He warned against over-reliance on fully autonomous agents producing massive diffs or outputs without human review:
“Even though 10,000 lines come out instantly, I have to make sure it's not introducing bugs or security issues.”
He encouraged developers to find best practices for keeping AI “on a leash” and iterating in small, verifiable steps.
Lessons from Tesla Autopilot: Autonomy and Human-in-the-Loop
Karpathy drew on his experience leading Tesla Autopilot to illustrate how partial autonomy works in practice.
He recounted his first ride in a self-driving car in 2013, which was flawless but still far from full autonomy. Over his tenure, Tesla progressively shifted functionality from traditional coded software (Software 1.0) to neural networks (Software 2.0), gradually deleting legacy code as neural nets improved.
He emphasized that driving is a hard problem and that full autonomy remains elusive even today:
“We still haven't really solved the problem. There's still a lot of teleoperation and human-in-the-loop driving.”
This experience informs his caution about the hype around fully autonomous AI agents in 2025. He advocates for careful, incremental progress with humans supervising and controlling AI.
The Iron Man Analogy: Augmentation vs. Agents
Karpathy invoked the Iron Man suit as a metaphor for how AI should augment human capabilities:
“The Iron Man suit is both an augmentation and an agent. Tony Stark can drive it, but it can also fly around autonomously.”
He argued that most AI products today should be more like Iron Man suits, augmenting users with partial autonomy, rather than fully autonomous robots.
This analogy captures the need for custom GUIs, fast generation-verification loops, and autonomy sliders that allow users to gradually delegate tasks.
Vibe Coding: Everyone Is Now a Programmer
One of the most optimistic parts of Karpathy's talk focused on how Software 3.0 democratizes programming.
Because LLMs are programmed in natural language, everyone becomes a programmer:
“This is extremely bullish and very interesting to me and also completely unprecedented.”
Karpathy shared the story of vibe coding, a meme and movement celebrating how natural language programming lowers barriers to software creation.
He showed a heartwarming video of kids vibe coding, highlighting the wholesome and empowering nature of this new paradigm.
Karpathy himself experimented with vibe coding, building iOS apps and web apps without deep knowledge of Swift or devops. He described how writing the code was surprisingly easy, but integrating real-world infrastructure like authentication and payments was still challenging and slow.
His reflections reveal a key insight: while LLMs simplify coding, the surrounding ecosystem of deployment, authentication, and infrastructure remains a bottleneck.
Building for Agents: Future-Ready Digital Infrastructure
Karpathy then addressed the question: if LLMs and AI agents become primary consumers and manipulators of digital information, how should we build software infrastructure?
He proposed that:
- Traditional GUIs and APIs were designed for humans and programs, respectively.
- Now, agents (LLMs) form a third category of digital information consumers.
- To serve agents effectively, software and documentation must become agent-friendly.
Example: lm.txt and Markdown Documentation
Karpathy suggested a new convention like lm.txt files: simple markdown files that explicitly describe a domain or API for LLMs, much like robots.txt guides web crawlers.
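As an illustration only (there is no published standard, and the service named here is imaginary), a hypothetical lm.txt for a small API might read:

```markdown
# Acme Weather API (guide for LLM agents)

This service returns current weather by city.

## Endpoints
- GET https://api.acme.example/v1/weather?city={name}
  Returns JSON: {"city": str, "temp_c": float, "conditions": str}

## Authentication
Send the header `Authorization: Bearer <API_KEY>`.
Keys can be requested programmatically; no GUI steps are required.
```

The point of the convention is that everything an agent needs sits in one plain-text file, with no “click here” instructions that only make sense to a human looking at a screen.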
He pointed out that most documentation is written for humans, with instructions like “click this button,” which LLMs cannot interpret directly.
Companies like Vercel and Stripe are early movers, rewriting docs in markdown and replacing UI instructions with equivalent command-line or API calls that LLM agents can execute.
Tools for Ingesting Data
Karpathy highlighted tools that convert GitHub repos into LLM-friendly formats by concatenating files and building directory structures, enabling LLMs to answer questions about codebases.
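A tool of this kind can be sketched in a few lines of Python: walk the repository, keep only text files, and concatenate them into a single blob with path headers that an LLM can ingest. The suffix filter is an assumption for illustration; real tools apply more sophisticated rules.

```python
from pathlib import Path

# Which files count as "text" is an assumption for illustration.
TEXT_SUFFIXES = {".py", ".md", ".txt", ".toml", ".json"}

def flatten_repo(root: str) -> str:
    """Concatenate a repo's text files into one LLM-friendly string,
    prefixing each file with its relative path as a header."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            rel = path.relative_to(root)
            parts.append(f"===== {rel} =====\n{path.read_text(errors='replace')}")
    return "\n\n".join(parts)
```

The path headers matter: they let the model answer “where is this defined?” questions about the codebase from a single flat context.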
He also mentioned Deep Wiki, which analyzes repos and generates comprehensive documentation pages, making software more accessible to AI agents.
The Future of Agent Interaction
Karpathy envisions a future where LLMs can interact with software infrastructure directly: clicking buttons, making API calls, and navigating documentation seamlessly.
However, he cautioned that it is crucial to meet LLMs halfway by designing infrastructure that is easier and more reliable for them to consume.
Summary and Conclusion: We're in the 1960s of LLMs, and It's Time to Build
Karpathy concluded with a powerful call to action:
- We are living in the early days of Software 3.0, analogous to the 1960s era of operating systems.
- LLMs are complex, fallible “people spirits” that require new infrastructure, interfaces, and programming paradigms.
- There is a massive opportunity to rewrite and build new software leveraging LLMs as utilities, fabs, and operating systems.
- Developers must learn to work with these models collaboratively, designing partial autonomy products with human-in-the-loop verification.
- The democratization of programming through natural language promises a future where everyone can build software.
- We must also build digital infrastructure ready for LLM agents as a new class of software consumers.
Karpathy's vision is both a roadmap and an invitation:
“It's an amazing time to get into the industry. We need to rewrite a ton of code. These LLMs are kind of like coders, utilities, fabs, but especially like operating systems. It's so early, and I can't wait to build it with all of you.”
Final Thoughts
Andrej Karpathy's keynote is a masterclass in understanding the seismic shifts underway in software development. By preserving the details and stories from his talk, this article provides a comprehensive resource for anyone eager to grasp the full implications of Software 3.0.
From the historical context and technical analogies to practical app design and future infrastructure, Karpathy's insights illuminate a path forward for developers, entrepreneurs, and technologists in the AI era.
As we stand at this inflection point, the message is clear: software is changing again, and the future belongs to those who learn to program the new computers, in English.
For further exploration, you can access Andrej Karpathy's slides here: Slides PDF
More content from Andrej Karpathy: YouTube Channel
Apply to Y Combinator: https://ycombinator.com/apply
Work at a startup: https://workatastartup.com