Our next SF event is AI UX 2024 - let’s see how the frontier of AI UX has moved since last year!
Last call: we are recording a preview of the AI Engineer World’s Fair with swyx and Ben Dunphy, so send us any questions you have about Speaker CFPs and Sponsor Guides!
Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an “ex-technical co-founder type”. Reach out to him for more!
David Luan has been at the center of the modern AI revolution: he was the ~30th hire at OpenAI, he led Google's LLM efforts and co-led Google Brain, and then in 2022 he started Adept, one of the leading companies in the AI agents space. In today's episode, we asked David for some war stories from his time in early OpenAI (including working with Alec Radford ahead of the GPT-2 demo with Sam Altman, which resulted in Microsoft’s initial $1b investment), and how Adept is building agents that can “do anything a human does on a computer” — his definition of useful AGI.
Why Google *couldn’t* make GPT-3
While we wanted to discuss Adept, we couldn’t talk to a former VP Eng of OpenAI and former LLM tech lead at Google Brain and not ask about the elephant in the room.
It’s often asked how Google had such a huge lead in 2017, with Vaswani et al. creating the Transformer and Noam Shazeer predicting trillion-parameter models, and yet it was David’s team at OpenAI that ended up making GPT-1/2/3.
David has some interesting answers:
“So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized…what they (should) have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too…
You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing. He's got this decoder only transformer that's probably going to get there before we do.
And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why. At the time, there was a thing called the Brain Credit Marketplace. Everyone's assigned a credit. So if you have a credit, you get to buy N chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused.”
Cloning HGI for AGI
Human intelligence got to where it is today through evolution. Some argue that to get to AGI, we will need to approximate all the “FLOPs” that went into that process, an approach most famously mapped out by Ajeya Cotra’s Biological Anchors report:
The early days of OpenAI were very reinforcement learning-driven with the Dota project, but that's a very inefficient way for these models to re-learn everything. (Kanjun from Imbue shared similar ideas in her episode).
David argues that there’s a shortcut. We can bootstrap from existing intelligence.
“Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there… I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs!”
LLMs today basically model intelligence using all (good!) written knowledge (see our Datasets 101 episode), and have now expanded to non-verbal knowledge (see our HuggingFace episode on multimodality). The SOTA self-supervised pre-training process is surprisingly efficient at taking large amounts of unstructured data and approximating reasoning from it, without overfitting.
But how do you cross the gap from the LLMs of today to building the AGI we all want?
This is why David & friends left to start Adept.
“We believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal” — ACT-1 Blogpost
Critical Path: Abstraction with Reliability
The AGI dream is fully autonomous agents, but there are levels of autonomy we are comfortable giving our agents, based on how reliable they are. In David’s framing, we always want higher levels of “abstraction” (aka autonomy), but our need for “reliability” is the practical limit on how high an abstraction we can use.
“The critical path for Adept is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow.
That's the critical path for the company. Everything we do is in service of that.”
We saw how Adept thinks about different levels of abstraction at the 2023 Summit:
The highest abstraction is the “AI Employee”, but we’ll get there with “AI enabled employees”. Alessio recently gave a talk about the future of work with “services as software” at this week’s Nvidia GTC (slides).
No APIs
Unlike a lot of large research labs, Adept's framing of AGI as "being able to use your computer like a human" carries with it a useful environmental constraint:
“Having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path (to economic value).”
This realization and conviction mean that multimodal models are the way to go. Instead of using function calling against APIs to build agents, which is what OpenAI and most of the open LLM industry have done to date, Adept wants to “drive by vision” (aka see the screen as a human sees it) and pinpoint where to click and type as a human does. No APIs needed, because most software doesn’t expose APIs.
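To make the contrast concrete, here is a minimal sketch of what a “drive by vision” agent loop looks like. This is our illustration, not Adept's actual stack: `propose_action` is a hypothetical stand-in for a screen-reading multimodal model, and `pyautogui` stands in for whatever sensors and actuators a production agent would use. The point is that the model only ever sees pixels and emits the same primitives a human has (clicks and keystrokes), with no per-app API integration.

```python
from dataclasses import dataclass

import pyautogui  # library for screenshots and mouse/keyboard control


@dataclass
class Action:
    kind: str        # "click" | "type" | "done"
    x: int = 0
    y: int = 0
    text: str = ""


def propose_action(screenshot, goal: str, history: list[Action]) -> Action:
    """Hypothetical stand-in for a screen-understanding multimodal model that
    maps (pixels, goal, past actions) -> the next UI action."""
    raise NotImplementedError("swap in your own vision-language model here")


def run_agent(goal: str, max_steps: int = 50) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        frame = pyautogui.screenshot()           # perceive: raw pixels only, no DOM, no API
        action = propose_action(frame, goal, history)
        if action.kind == "done":
            break
        if action.kind == "click":
            pyautogui.click(action.x, action.y)  # actuate the same way a human would
        elif action.kind == "type":
            pyautogui.write(action.text)
        history.append(action)
```

Everything app-specific lives in the model's weights and the pixels it sees, which is also why ordinary human usage of computers becomes usable training data.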
Extra context for readers: You can see the DeepMind SIMA model in the same light:
One system that learned to play a diverse set of games (instead of one dedicated model per game) using only pixel inputs and keyboard-and-mouse action outputs!
The OpenInterpreter team is working on a “Computer API” that does the same thing.
To do this, Adept had to double down on a special kind of multimodality for knowledge work:
“A giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents…
…I think one big hangover of the primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera… (but) where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cat and dogs.
And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that.”
With this context, you can now understand the full path of Adept’s public releases:
* ACT-1 (Sept 2022): a large Transformer model optimized for browser interactions. It has a custom rendering of the browser viewport that allows the model to better understand the page and take actions on it.
* Persimmon-8B (Sept 2023): a permissive open LLM (weights and code here)
* Fuyu-8B (Oct 2023): a small version of the multimodal model that powers Adept. It is a vanilla decoder-only transformer with no specialized image encoder, which allows it to handle input images of varying resolutions without downsampling (see the toy sketch after this list).
* Adept Experiments (Nov 2023): A public tool to build automations in the browser. This is powered by Adept's core technology but it's just a piece of their enterprise platform. They use it as a way to try various design ideas.
* Fuyu Heavy (Jan 2024): a new multimodal model designed specifically for digital agents and the world’s third-most-capable multimodal model (beating Gemini Pro on MMMU, AI2D, and ChartQA), “behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger”.
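For readers who want a picture of what “no specialized image encoder” means in the Fuyu-8B bullet above, here is a toy sketch (ours, not Adept's code) of the idea: raw image patches are linearly projected and dropped into the same decoder-only token stream as the text, so a higher-resolution screenshot simply becomes a longer sequence rather than being downsampled through a separate vision tower. Details like positional embeddings, image-newline tokens, and the real tokenizer are omitted, and the default sizes here are illustrative.

```python
import torch
import torch.nn as nn


class FuyuStyleToy(nn.Module):
    """Toy illustration of the patches-as-tokens idea: no image encoder, just a
    linear projection of raw patches into the decoder's token stream."""

    def __init__(self, vocab_size=32000, d_model=512, patch=30, layers=4, heads=8):
        super().__init__()
        self.patch = patch
        self.text_embed = nn.Embedding(vocab_size, d_model)
        # the entire "image encoder" is one linear layer over flattened RGB patches
        self.patch_proj = nn.Linear(3 * patch * patch, d_model)
        block = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(block, layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) at any resolution divisible by the patch size;
        # a bigger screenshot just means more patch tokens, never downsampling.
        B, C, H, W = image.shape
        p = self.patch
        patches = image.unfold(2, p, p).unfold(3, p, p)          # (B, C, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        img_tokens = self.patch_proj(patches)                    # (B, n_patches, d_model)
        txt_tokens = self.text_embed(text_ids)                   # (B, seq_len, d_model)
        seq = torch.cat([img_tokens, txt_tokens], dim=1)
        causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1)).to(seq.device)
        hidden = self.decoder(seq, mask=causal)                  # decoder-only style: causal attention
        return self.lm_head(hidden)                              # next-token logits over the joint sequence
```

Feeding this a 300x480 screenshot versus a 1080x1920 one changes only the number of image tokens the decoder sees, which is the property the Fuyu-8B bullet is describing.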
The Fuyu-8B post in particular shows a great number of examples of knowledge-work multimodality:
Why Adept is NOT a Research Lab
With OpenAI now worth >$90b and Anthropic >$18b, it is tempting to conclude that the AI startup metagame is to build a large research lab, and attract the brightest minds and highest capital to build AGI.
Our past guests Raza (see the Humanloop episode) and Kanjun (from Imbue) combined to ask the most challenging questions of the pod: with David/Adept’s deep research pedigree from DeepMind and OpenAI, why is Adept not building more general foundation models (like Persimmon) and playing the academic benchmarks game? Why is Adept so focused on commercial agents instead?
“I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from “Can we make a better agent”…
… I think pure play foundation model companies are just going to be pinched by how good the next couple of (Meta Llama models) are going to be… And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.”
And the commercial grounding is his answer to Kanjun too (whom we asked the inverse question, to compare with Adept):
“… the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is for example, our evaluations are not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to do, we can't do them at all. We've turned those into evals… I think that's a degree of practicality that really helps.”
And his customers seem pretty happy, because David didn’t need to come on to do a sales pitch:
David: “One of the things we haven't shared before is we're completely sold out for Q1.”
Swyx: “Sold out of what?”
David: “Sold out of bandwidth to onboard more customers.”
Well, that’s a great problem to have.
Show Notes
* Dextro at Data Driven NYC (2015)
* Adept
* ACT-1
* Persimmon-8B
* Adept Experiments
* Fuyu-8B
* Amelia Wattenberger talk at AI Engineer Summit
* Figure
Chapters
* [00:00:00] Introductions
* [00:01:14] Being employee #30 at OpenAI and its early days
* [00:13:38] What is Adept and how do you define AGI?
* [00:21:00] Adept's critical path and research directions
* [00:26:23] How AI agents should interact with software and impact product development
* [00:30:37] Analogies between AI agents and self-driving car development
* [00:32:42] Balancing reliability, cost, speed and generality in AI agents
* [00:37:30] Potential of foundation models for robotics
* [00:39:22] Core research questions and reasons to work at Adept
Transcripts
Alessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.
Swyx [00:00:15]: Hey, and today we have David Luan, CEO, co-founder of Adept in the studio. Welcome.
David [00:00:20]: Yeah, thanks for having me.
Swyx [00:00:21]: Been a while in the works. I've met you socially at one of those VC events and you said that you were interested in coming on and glad we finally were able to make this happen.
David: Yeah, happy to be part of it.
Swyx: So we like to introduce the speaker and then also just like have you talk a little bit about like what's not on your LinkedIn, what people should just generally know about you. You started a company in college, which was the first sort of real time video detection classification API that was Dextro, and that was your route to getting acquired into Axon where you're a director of AI. Then you were the 30th hire at OpenAI?
David [00:00:53]: Yeah, 30, 35, something around there. Something like that.
Swyx [00:00:56]: So you were VP of Eng for two and a half years to two years, briefly served as tech lead of large models at Google, and then in 2022 started Adept. So that's the sort of brief CV. Is there anything else you like want to fill in the blanks or like people should know more about?
David [00:01:14]: I guess a broader story was I joined OpenAI fairly early and I did that for about two and a half to three years leading engineering there. It's really funny, I think second or third day of my time at OpenAI, Greg and Ilya pulled me in a room and we're like, you know, you should take over our directs and we'll go mostly do IC work. So that was fun, just coalescing a bunch of teams out of a couple of early initiatives that had already happened. The company, the Dota effort was going pretty hard and then more broadly trying to put bigger picture direction around what we were doing with basic research. So I spent a lot of time doing that. And then I led Google's LLM efforts, but also co-led Google Brain was one of the brain leads more broadly. You know, there's been a couple of different eras of AI research, right? If we count everything before 2012 as prehistory, which people hate it when I say that, kind of had this like you and your three best friends write a research paper that changes the world period from like 2012 to 2017. And I think the game changed in 2017 and like most labs didn't realize it, but we at OpenAI really did. I think in large part helped by like Ilya's constant beating of the drum that the world would be covered in data centers. And I think-
Swyx [00:02:15]: It's causally neat.
David [00:02:16]: Yeah. Well, like I think we had conviction in that, but it wasn't until we started seeing results that it became clear that that was where we had to go. But also part of it as well was for OpenAI, like when I first joined, I think one of the jobs that I had to do was how do I tell a differentiated vision for who we were technically compared to, you know, hey, we're just smaller Google Brain, or like you work at OpenAI if you live in SF and don't want to commute to Mountain View or don't want to live in London, right? That's like not enough to like hang your technical identity as a company. And so what we really did was, and I spent a lot of time pushing this, is just how do we get ourselves focused on a certain class of like giant swings and bets, right? Like how do you flip the script from you just do bottom-up research to more about how do you like leave some room for that, but really make it about like, what are the big scientific outcomes that you want to show? And then you just solve them at all costs, whether or not you care about novelty and all that stuff. And that became the dominant model for a couple of years, right? And then what's changed now is I think the number one driver of AI products over the next couple of years is going to be the deep co-design and co-evolution of product and users for feedback and actual technology. And I think labs, every tool to go do that are going to do really well. And that's a big part of why I started Adept.
Alessio [00:03:20]: You mentioned Dota, any memories thinking from like the switch from RL to Transformers at the time and kind of how the industry was evolving more in the LLM side and leaving behind some of the more agent simulation work?
David [00:03:33]: Like zooming way out, I think agents are just absolutely the correct long-term direction, right? You just go to find what AGI is, right? You're like, Hey, like, well, first off, actually, I don't love AGI definitions that involve human replacement because I don't think that's actually how it's going to happen. Even this definition of like, Hey, AGI is something that outperforms humans at economically valuable tasks has kind of an implicit view of the world about what's going to be the role of people. I think what I'm more interested in is like a definition of AGI that's oriented around like a model that can do anything a human can do on a computer. If you go think about that, which is like super tractable, then agent is just a natural consequence of that definition. And so what did all the work we did on our own stuff like that get us? It got us a really clear formulation. Like you have a goal and you want to maximize the goal, you want to maximize reward, right? And the natural LLM formulation doesn't come with that out of the box, right? I think that we as a field got a lot right by thinking about, Hey, how do we solve problems of that caliber? And then the thing we forgot is de novo RL is like a pretty terrible way to get there quickly. Why are we rediscovering all the knowledge about the world? Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there. Right.
Swyx [00:04:44]: The biological basis theory. Right.
David [00:04:46]: So I think we are ignoring the fact that you have a giant shortcut, which is you can behavioral clone everything humans already know. And that's what we solved with LLMs. We've solved behavioral cloning, everything that humans already know. Right. So like today, maybe LLMs is like behavioral cloning every word that gets written on the internet in the future, the multimodal models are becoming more of a thing where behavioral cloning the visual world. But really, what we're just going to have is like a universal byte model, right? Where tokens of data that have high signal come in, and then all of those patterns are like learned by the model. And then you can regurgitate any combination now. Right. So text into voice out, like image into other image out or video out or whatever, like these like mappings, right? Like all just going to be learned by this universal behavioral cloner. And so I'm glad we figured that out. And I think now we're back to the era of how do we combine this with all of the lessons we learned during the RL period. That's what's going to drive progress.
Swyx [00:05:35]: I'm still going to pressure you for a few more early OpenAI stories before we turn to the Adept stuff. On your personal site, which I love, because it's really nice, like personal, you know, story context around like your history. I need to update it. It's so old. Yeah, it's so out of date. But you mentioned GPT-2. Did you overlap with GPT-1? I think you did, right?
David [00:05:53]: I actually don't quite remember. I think I was joining right around- Right around then?
Swyx [00:05:57]: I was right around that, yeah. Yeah. So what I remember was Alec, you know, just kind of came in and was like very obsessed with Transformers and applying them to like Reddit sentiment analysis. Yeah, sentiment, that's right. Take us through-
David [00:06:09]: Sentiment neuron, all this stuff.
Swyx [00:06:10]: The history of GPT as far as you know, you know, according to you. Ah, okay.
David [00:06:14]: History of GPT, according to me, that's a pretty good question. So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized, where like, again, you and your three best friends write papers, right? Okay. So zooming way out, right? I think about my job when I was a full-time research leader as a little bit of a portfolio allocator, right? So I've got really, really smart people. My job is to convince people to coalesce around a small number of really good ideas and then run them over the finish line. My job is not actually to promote a million ideas and never have critical mass. And then as the ideas start coming together and some of them start working well, my job is to nudge resources towards the things that are really working and then start disbanding some of the things that are not working, right? That muscle did not exist during my time at Google. And I think had they had it, what they would have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too.
Swyx [00:07:17]: He's talking about trillion parameter models in 2017.
David [00:07:20]: Yeah. So that's the core of the GPT story, right? Which is that, and I'm jumping around historically, right? But after GPT-2, we were all really excited about GPT-2. I can tell you more stories about that. It was the last paper that I even got to really touch before everything became more about building a research org. You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing, right? He's got this decoder only transformer that's probably going to get there before we do. And I was like, but like, please just like let this model finish, right? And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why, right? At the time, there was a thing called the brain credit marketplace. And did you guys know the brain credit marketplace? No, I never heard of this. Oh, so it's actually, it's a, you can ask any Googler.
Swyx [00:08:23]: It's like just like a thing that, that, I mean, look like, yeah, limited resources, you got to have some kind of marketplace, right? You know, sometimes it's explicit, sometimes it isn't, you know, just political favors.
David [00:08:34]: You could. And so then basically everyone's assigned a credit, right? So if you have a credit, you get to buy N chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused. And I think, again, that's like part of the narrative of like this phase one of AI, right? Of like this modern AI era to phase two. And I think in the same way, I think phase three companies are going to out execute phase two companies because of the same asymmetry of success.
Swyx [00:09:12]: Yeah. I think it's underrated how much NVIDIA works with you in the early days as well. I think maybe, I think it was Jensen. I'm not sure who circulated a recent photo of him delivering the first DGX to you guys.
David [00:09:24]: I think Jensen has been a complete legend and a mastermind throughout. I have so much respect for NVIDIA. It is unreal.
Swyx [00:09:34]: But like with OpenAI, like kind of give their requirements, like co-design it or just work of whatever NVIDIA gave them.
David [00:09:40]: So we work really closely with them. There's, I'm not sure I can share all the stories, but examples of ones that I've found particularly interesting. So Scott Gray is amazing. I really like working with him. He was on one of my teams, the supercomputing team, which Chris Berner runs and Chris Berner still does a lot of stuff in that. As a result, like we had very close ties to NVIDIA. Actually, one of my co-founders at Adept, Erich Elsen, was also one of the early GPGPU people. So he and Scott and Brian Catanzaro at NVIDIA and Jonah and Ian at NVIDIA, I think all were very close. And we're all sort of part of this group of how do we push these chips to the absolute limit? And I think that kind of collaboration helped quite a bit. I think one interesting set of stuff is knowing the A100 generation, that like quad sparsity was going to be a thing. Is that something that we want to go look into, right? And figure out if that's something that we could actually use for model training. Really what it boils down to is that, and I think more and more people realize this, six years ago, people, even three years ago, people refused to accept it. This era of AI is really a story of compute. It's really the story of how do you more efficiently map actual usable model flops to compute.
Swyx [00:10:38]: Is there another GPT 2, 3 story that you love to get out there that you think is underappreciated for the amount of work that people put into it?
David [00:10:48]: So two interesting GPT 2 stories. One of them was I spent a good bit of time just sprinting to help Alec get the paper out. And I remember one of the most entertaining moments was we were writing the modeling section. And I'm pretty sure the modeling section was the shortest modeling section of any ML, reasonably legitimate ML paper to that moment. It was like section three model. This is a standard vanilla decoder only transformer with like these particular things, those paragraph long if I remember correctly. And both of us were just looking at the same being like, man, the OGs in the field are going to hate this. They're going to say no novelty. Why did you guys do this work? So now it's funny to look at in hindsight that it was pivotal kind of paper, but I think it was one of the early ones where we just leaned fully into all we care about is solving problems in AI and not about, hey, is there like four different really simple ideas that are cloaked in mathematical language that doesn't actually help move the field forward?
Swyx [00:11:42]: Right. And it's like you innovate on maybe like data set and scaling and not so much the architecture.
David [00:11:48]: We all know how it works now, right? Which is that there's a collection of really hard won knowledge that you get only by being at the frontiers of scale. And that hard won knowledge, a lot of it's not published. A lot of it is stuff that's actually not even easily reducible to what looks like a typical academic paper. But yet that's the stuff that helps differentiate one scaling program from another. You had a second one? So the second one is, there's like some details here that I probably shouldn't fully share, but hilariously enough for the last meeting we did with Microsoft before Microsoft invested in OpenAI, Sam Altman, myself and our CFO flew up to Seattle to do the final pitch meeting. And I'd been a founder before. So I always had a tremendous amount of anxiety about partner meetings, which this basically this is what it was. I had Kevin Scott and Satya and Amy Hood, and it was my job to give the technical slides about what's the path to AGI, what's our research portfolio, all of this stuff, but it was also my job to give the GPT-2 demo. We had a slightly bigger version of GPT-2 that we had just cut maybe a day or two before this flight up. And as we all know now, model behaviors you find predictable at one checkpoint are not predictable in another checkpoint. And so I'd spent all this time trying to figure out how to keep this thing on rails. I had my canned demos, but I knew I had to go turn it around over to Satya and Kevin and let them type anything in. And that just, that really kept me up all night.
Swyx [00:13:06]: Nice. Yeah.
Alessio [00:13:08]: I mean, that must have helped you talking about partners meeting. You raised $420 million for Adept. The last round was a $350 million Series B, so I'm sure you do great in partner meetings.
Swyx [00:13:18]: Pitchers meetings. Nice.
David [00:13:20]: No, that's a high compliment coming from a VC.
Alessio [00:13:22]: Yeah, no, I mean, you're doing great already for us. Let's talk about Adept. And we were doing pre-prep and you mentioned that maybe a lot of people don't understand what Adept is. So usually we try and introduce the product and then have the founders fill in the blanks, but maybe let's do the reverse. Like what is Adept? Yeah.
David [00:13:38]: So I think Adept is the least understood company in the broader space of foundational models plus agents. So I'll give some color and I'll explain what it is and I'll explain also why it's actually pretty different from what people would have guessed. So the goal for Adept is we basically want to build an AI agent that can do, that can basically help humans do anything a human does on a computer. And so what that really means is we want this thing to be super good at turning natural language like goal specifications right into the correct set of end steps and then also have all the correct sensors and actuators to go get that thing done for you across any software tool that you already use. And so the end vision of this is effectively like I think in a couple of years everyone's going to have access to like an AI teammate that they can delegate arbitrary tasks to and then also be able to, you know, use it as a sounding board and just be way, way, way more productive. Right. And just changes the shape of every job from something where you're mostly doing execution to something where you're mostly actually doing like these core liberal arts skills of what should I be doing and why. Right. And I find this like really exciting and motivating because I think it's actually a pretty different vision for how AGI will play out. I think systems like Adept are the most likely systems to be proto-AGIs. But I think the ways in which we are really counterintuitive to everybody is that we've actually been really quiet because we are not a developer company. We don't sell APIs. We don't sell open source models. We also don't sell bottom up products. We're not a thing that you go and click and download the extension and like we want more users signing up for that thing. We're actually an enterprise company. So what we do is we work with a range of different companies, some like late stage multi-thousand people startups, some Fortune 500s, et cetera. And what we do for them is we basically give them an out of the box solution where big complex workflows that their employees do every day could be delegated to the model. And so we look a little different from other companies in that in order to go build this full agent thing, the most important thing you got to get right is reliability. So initially zooming way back when, one of the first things that Adept did was we released this demo called Act One, right? Act One was like pretty cool. It's like kind of become a hello world thing for people to show agent demos by going to Redfin and asking to buy a house somewhere because like we did that in the original Act One demo and like showed that, showed like Google Sheets, all this other stuff. Over the last like year since that has come out, there's been a lot of really cool demos and you go play with them and you realize they work 60% of the time. But since we've always been focused on how do we build an amazing enterprise product, enterprises can't use anything that isn't in the nines of reliability. And so we've actually had to go down a slightly different tech tree than what you might find in the prompt engineering sort of plays in the agent space to get that reliability. And we've decided to prioritize reliability over all else. So like one of our use cases is crazy enough that it actually ends with a physical truck being sent to a place as the result of the agent workflow. And if you're like, if that works like 60% of the time, you're just blowing money and poor truck drivers going places.
Alessio [00:16:30]: Interesting. One of the, our investment teams has this idea of services as software. I'm actually giving a talk at NVIDIA GTC about this, but basically software as a service, you're wrapping user productivity in software with agents and services as software is replacing things that, you know, you would ask somebody to do and the software just does it for you. When you think about these use cases, do the users still go in and look at the agent kind of like doing the things and can intervene or like are they totally removed from them? Like the truck thing is like, does the truck just show up or are there people in the middle checking in?
David [00:17:04]: I think there's two current flaws in the framing for services as software, or I think what you just said. I think that one of them is like in our experience, as we've been rolling out Adept, the people who actually do the jobs are the most excited about it because they don't go from, I do this job to, I don't do this job. They go from, I do this job for everything, including the shitty rote stuff to I'm a supervisor. And I literally like, it's pretty magical when you watch the thing being used because now it parallelizes a bunch of the things that you had to do sequentially by hand as a human. And you can just click into any one of them and be like, Hey, I want to watch the trajectory that the agent went through to go solve this. And the nice thing about agent execution as opposed to like LLM generations is that a good chunk of the time when the agent fails to execute, it doesn't give you the wrong result. It just fails to execute. And the whole trajectory is just broken and dead and the agent knows it, right? So then those are the ones that the human then goes and solves. And so then they become a troubleshooter. They work on the more challenging stuff. They get way, way more stuff done and they're really excited about it. I think the second piece of it that we've found is our strategy as a company is to always be an augmentation company. And I think one out of principle, that's something we really care about. But two, actually, if you're framing yourself as an augmentation company, you're always going to live in a world where you're solving tasks that are a little too hard for what the model can do today and still needs a human to provide oversight, provide clarifications, provide human feedback. And that's how you build a data flywheel. That's how you actually learn from the smartest humans how to solve things models can't do today. And so I actually think that being an augmentation company forces you to go develop your core AI capabilities faster than someone who's saying, ah, okay, my job is to deliver you a lights off solution for X.
Alessio [00:18:42]: Yeah. It's interesting because we've seen two parts of the market. One is we have one company that does agents for SOC analysts. People just don't have them, you know, and just they cannot attract the talent to do it. And similarly, in a software development, you have Copilot, which is the augmentation product, and then you have sweep.dev and you have these products, which they just do the whole thing. I'm really curious to see how that evolves. I agree that today the reliability is so important in the enterprise that they just don't use most of them. Yeah. Yeah. No, that's cool. But it's great to hear the story because I think from the outside, people are like, oh, Adept, they do Act One, they do Persimmon, they do Fuyu, they do all this stuff. Yeah, it's just the public stuff.
Swyx [00:19:20]: It's just public stuff.
David [00:19:21]: So one of the things we haven't shared before is we're completely sold out for Q1. And so I think...
Swyx [00:19:26]: Sold out of what?
David [00:19:27]: Sold out of bandwidth to go on board more customers. And so we're like working really hard to go make that less of a bottleneck, but our expectation is that I think we're going to be significantly more public about the broader product shape and the new types of customers we want to attract later this year. So I think that clarification will happen by default.
Swyx [00:19:43]: Why have you become more public? You know, if the whole push has... You're sold out, you're my enterprise, but you're also clearly putting effort towards being more open or releasing more things.
David [00:19:53]: I think we just flipped over that way fairly recently. That's a good question. I think it actually boils down to two things. One, I think that, frankly, a big part of it is that the public narrative is really forming around agents as being the most important thing. And I'm really glad that's happening because when we started the company in January 2022, everybody in the field knew about the agents thing from RL, but the general public had no conception of what it was. They were still hanging their narrative hat on the tree of everything's a chatbot. And so I think now one of the things that I really care about is that when people think agent, they actually think the right thing. All sorts of different things are being called agents. Chatbots are being called agents. Things that make a function call are being called agents. To me, an agent is something that you can give a goal and get an end step workflow done correctly in the minimum number of steps. And so that's a big part of why. And I think the other part is because I think it's always good for people to be more aware of Adept as they think about what the next thing they want to do in their careers. The field is quickly pivoting in a world where foundation models are looking more and more commodity. And I think a huge amount of gain is going to happen from how do you use foundation models as the well-learned behavioral cloner to go solve agents. And I think people who want to do agents research should really come to Adept.
Swyx [00:21:00]: When you say agents have become more part of the public narrative, are there specific things that you point to? I'll name a few. Bill Gates in his blog post mentioning that agents are the future. I'm the guy who made OSes, and I think agents are the next thing. So Bill Gates, I'll call that out. And then maybe Sam Altman also saying that agents are the future for open AI.
David [00:21:17]: I think before that even, I think there was something like the New York Times, Cade Metz wrote a New York Times piece about it. Right now, in a bid to differentiate, I'm seeing AI startups that used to just brand themselves as an AI company, but now brand themselves as an AI agent company. It's just like, it's a term I just feel like people really want.
Swyx [00:21:31]: From the VC side, it's a bit mixed. Is it? As in like, I think there are a lot of VCs where like, I would not touch any agent startups because like- Why is that? Well, you tell me.
Alessio [00:21:41]: I think a lot of VCs that are maybe less technical don't understand the limitations of the-
Swyx [00:21:46]: No, that's not fair.
Alessio [00:21:47]: No, no, no, no. I think like- You think so? No, no. I think like the, what is possible today and like what is worth investing in, you know? And I think like, I mean, people look at you and say, well, these guys are building agents. They needed 400 million to do it. So a lot of VCs are maybe like, oh, I would rather invest in something that is tacking on AI to an existing thing, which is like easier to get the market and kind of get some of the flywheel going. But I'm also surprised a lot of funders just don't want to do agents. It's not even the funding. Sometimes we look around and it's like, why is nobody doing agents for X? Wow.
David [00:22:17]: That's good to know actually. I never knew that before. My sense from my limited perspective is there's a new agent company popping up every day.
Swyx [00:22:24]: So maybe I'm- They are. They are. But like I have advised people to take agents off of their title because it's so diluted.
David [00:22:31]: It's now so diluted.
Swyx [00:22:32]: Yeah. So then it doesn't stand for anything. Yeah.
David [00:22:35]: That's a really good point.
Swyx [00:22:36]: So like, you know, you're a portfolio allocator. You have people know about Persimmon, people know about Fuyu and Fuyu Heavy. Can you take us through like how you think about that evolution of that and what people should think about what that means for adepts and sort of research directions? Kind of take us through the stuff you shipped recently and how people should think about the trajectory of what you're doing.
David [00:22:56]: The critical path for Adept is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow. That's the critical path for the company. Everything we do is in service of that. So if you go zoom way, way back to Act One days, right? Like the core thing behind Act One is can we teach a large model basically how to even actuate your computer? And I think we're one of the first places to have solved that and shown it and shown the generalization that you get when you give it various different workflows and texts. But I think from there on out, we really realized was that in order to get reliability, companies just do things in various different ways. You actually want these models to be able to get a lot better at having some specification of some guardrails for what it actually should be doing. And I think in conjunction with that, a giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents. Back then we had to do a ton of research basically on how do we actually make that possible? Well, first off, like back in, I forget exactly when in '23, like there were no multimodal models really that you could use for things like this. And so we pushed really hard on stuff like the Fuyu architecture. I think one big hangover of the primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera. COCO. Yeah, right. And the COCO is awesome. Like I love COCO. I love TY. Like it's really helped the field. Right. But like that's the build one thing. I actually think it's really clear today. Multimodal models are the default foundation model, right? It's just going to supplant LLMs. Like you just train a giant multimodal model. And so for that though, like where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cat and dogs. Right. And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that. And so the public Fuyu models and stuff aren't trained on our actual corpus, it's trained on some other stuff. But you take a lot of that data and then you make it really fast and make it really good at things like dense OCR on screens. And then now you have the right like raw putty to go make a good agent. So that's kind of like some of the modeling side, we've kind of only announced some of that stuff. We haven't really announced much of the agent's work, but that if you put those together with the correct product form factor, and I think the product form factor also really matters. 
I think we're seeing, and you guys probably see this a little bit more than I do, but we're seeing like a little bit of a pushback against the tyranny of chatbots as form factor. And I think that the reason why the form factor matters is the form factor changes what data you collect in the human feedback loop. And so I think we've spent a lot of time doing full vertical integration of all these bits in order to get to where we are.
Swyx [00:25:44]: Yeah. I'll plug Amelia Wattenberger’s talk at our conference, where she gave a little bit of the thinking behind like what else exists other than chatbots that if you could delegate to reliable agents, you could do. I was kind of excited at Adept experiments or Adept workflows, I don't know what the official name for it is. I was like, okay, like this is something I can use, but it seems like it's just an experiment for now. It's not your product.
David [00:26:06]: So you basically just use experiments as like a way to go push various ideas on the design side to some people and just be like, yeah, we'll play with it. Actually the experiments code base underpins the actual product, but it's just the code base itself is kind of like a skeleton for us to go deploy arbitrary cards on the side.
Swyx [00:26:22]: Yeah.
Alessio [00:26:23]: Makes sense. I was going to say, I would love to talk about the interaction layer. So you train a model to see UI, but then there's the question of how do you actually act on the UI? I think there was some rumors about open app building agents that are kind of like, they manage the end point. So the whole computer, you're more at the browser level. I read in one of your papers, you have like a different representation, kind of like you don't just take the dome and act on it. You do a lot more stuff. How do you think about the best way the models will interact with the software and like how the development of products is going to change with that in mind as more and more of the work is done by agents instead of people?
David [00:26:58]: This is, there's so much surface area here and it's actually one of the things I'm really excited about. And it's funny because I've spent most of my time doing research stuff, but there's like a whole new ball game that I've been learning about and I find it really cool. So I would say the best analogy I have to why Adept is pursuing a path of being able to use your computer like a human, plus of course being able to call APIs and being able to call APIs is the easy part, like being able to use your computer like a human is a hard part. It's in the same way why people are excited about humanoid robotics, right? In a world where you had T equals infinity, right? You're probably going to have various different form factors that robots could just be in and like all the specialization. But the fact is that humans live in a human environment. So having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path. I think because it's the most practical path, I think a lot of success will come from going down this path. I kind of think about this early days of the agent interaction layer level is a little bit like, do you all remember Windows 3.1? Like those days? Okay, this might be, I might be, I might be too old for you guys on this. But back in the day, Windows 3.1, we had this transition period between pure command line, right? Being the default into this new world where the GUI is the default and then you drop into the command line for like programmer things, right? The old way was you booted your computer up, DOS booted, and then it would give you the C colon slash thing. And you typed Windows and you hit enter, and then you got put into Windows. And then the GUI kind of became a layer above the command line. The same thing is going to happen with agent interfaces is like today we'll be having the GUI is like the base layer. And then the agent just controls the current GUI layer plus APIs. And in the future, as more and more trust is built towards agents and more and more things can be done by agents, if more UIs for agents are actually generative in and of themselves, then that just becomes a standard interaction layer. And if that becomes a standard interaction layer, what changes for software is that a lot of software is going to be either systems of record or like certain customized workflow execution engines. And a lot of how you actually do stuff will be controlled at the agent layer.
Alessio [00:29:19]: And you think the rabbit interface is more like it would like you're not actually seeing the app that the model interacts with. You're just saying, hey, I need to log this call on Salesforce. And you're never actually going on salesforce.com directly as the user. I can see that being a model.
David [00:29:33]: I think I don't know enough about what using Rabbit in real life will actually be like to comment on that particular thing. But I think the broader idea that, you know, you have a goal, right? The agent knows how to break your goal down into steps. The agent knows how to use the underlying software and systems of record to achieve that goal for you. The agent maybe presents you information in a custom way that's only relevant to your particular goal, all just really leads to a world where you don't really need to ever interface with the apps underneath unless you're a power user for some niche thing.
Swyx [00:30:03]: General question. So first of all, I think like the sort of input mode conversation. I wonder if you have any analogies that you like with self-driving, because I do think like there's a little bit of how the model should perceive the world. And you know, the primary split in self-driving is LiDAR versus camera. And I feel like most agent companies that I'm tracking are all moving towards camera approach, which is like the multimodal approach, you know, multimodal vision, very heavy vision, all the Fuyu stuff that you're doing. You're focusing on that, including charts and tables. And do you find that inspiration there from like the self-driving world? That's a good question.
David [00:30:37]: I think sometimes the most useful inspiration I've found from self-driving is the levels analogy. I think that's awesome. But I think that our number one goal is for agents not to look like self-driving. We want to minimize the chances that agents are sort of a thing that you just have to bang your head at for a long time to get to like two discontinuous milestones, which is basically what's happened in self-driving. We want to be living in a world where you have the data flywheel immediately, and that takes you all the way up to the top. But similarly, I mean, compared to self-driving, like two things that people really undervalue is like really easy to driving a car down highway 101 in a sunny day demo. That actually doesn't prove anything anymore. And I think the second thing is that as a non-self-driving expert, I think one of the things that we believe really strongly is that everyone undervalues the importance of really good sensors and actuators. And actually a lot of what's helped us get a lot of reliability is a really strong focus on actually why does the model not do this thing? And the non-trivial amount of time, the time the model doesn't actually do the thing is because if you're a wizard of ozzing it yourself, or if you have unreliable actuators, you can't do the thing. And so we've had to fix a lot of those problems.
Swyx [00:31:43]: I was slightly surprised just because I do generally consider the Waymos that we see all around San Francisco as the most, I guess, real case of agents that we have in very material ways.
David [00:31:55]: Oh, that's absolutely true. I think they've done an awesome job, but it has taken a long time for self-driving to mature from when it entered the consciousness and the driving down 101 on a sunny day moment happened to now. Right. So I want to see that more compressed.
Swyx [00:32:07]: And I mean, you know, Cruise, you know, RIP. And then one more thing on just like, just going back on this reliability thing, something I have been holding in my head that I'm curious to get your commentary on is I think there's a trade-off between reliability and generality, or I want to broaden reliability into just general like sort of production readiness and enterprise readiness scale. Because you have reliability, you also have cost, you have speed, speed is a huge emphasis for Adept. The tendency or the temptation is to reduce generality to improve reliability and to improve cost, improve speed. Do you perceive a trade-off? Do you have any insights that solve those trade-offs for you guys?
David [00:32:42]: There's definitely a trade-off. If you're at the Pareto frontier, I think a lot of folks aren't actually at the Pareto frontier. I think the way you get there is basically how do you frame the fundamental agent problem in a way that just continues to benefit from data? I think one of the main ways of being able to solve that particular trade-off is you basically just want to formulate the problem such that every particular use case just looks like you collecting more data to go make that use case possible. I think that's how you really solve. Then you get into the other problems like, okay, are you overfitting on these end use cases? You're not doing a thing where you're being super prescriptive for the end steps that the model can only do, for example.
Swyx [00:33:17]: Then the question becomes, do you have one house model that you can then customize for each customer and you're fine-tuning them on each customer's specific use case?
David [00:33:25]: Yeah.
Swyx [00:33:26]: We're not sharing that. You're not sharing that. It's tempting, but that doesn't look like AGI to me. You know what I mean? That is just you have a good base model and then you fine-tune it.
David [00:33:35]: For what it's worth, I think there's two paths to a lot more capability coming out of the models that we all are training these days. I think one path is you figure out how to spend, compute, and turn it into data. In that path, I consider search, RL, all the things that we all love in this era as part of that path, like self-play, all that stuff. The second path is how do you get super competent, high intelligence demonstrations from humans? I think the right way to move forward is you kind of want to combine the two. The first one gives you maximum sample efficiency for a little second, but I think that it's going to be hard to be running at max speed towards AGI without actually solving a bit of both.
Swyx [00:34:16]: You haven't talked much about synthetic data, as far as I can tell. Probably this is a bit too much of a trend right now, but any insights on using synthetic data to augment the expensive human data?
David [00:34:26]: The best part about framing AGI as being able to help people do things on computers is you have an environment.
Swyx [00:34:31]: Yes. So you can simulate all of it.
David [00:34:35]: You can do a lot of stuff when you have an environment.
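That exchange is the synthetic-data argument in miniature: once the agent acts inside an environment, every rollout yields labeled trajectories you can filter and train on. A minimal sketch of that loop, using an entirely hypothetical toy environment and placeholder policy rather than Adept's actual stack, might look like this:

```python
# Hypothetical sketch: harvesting synthetic agent trajectories from a simulated
# software environment. These classes are stand-ins; they only illustrate
# "you have an environment, so you can generate data".
import random
from dataclasses import dataclass, field

@dataclass
class Step:
    observation: str   # e.g. a serialized screen or DOM snapshot
    action: str        # e.g. "click(#submit)" or "type(#name, 'Jane')"

@dataclass
class Trajectory:
    goal: str
    steps: list = field(default_factory=list)
    succeeded: bool = False

class ToyFormEnv:
    """Stand-in environment: a form that must be filled in and submitted."""
    def __init__(self, goal: str):
        self.goal, self.filled, self.submitted = goal, False, False
    def observe(self) -> str:
        return f"form(filled={self.filled}, submitted={self.submitted})"
    def act(self, action: str) -> None:
        if action.startswith("type"):
            self.filled = True
        elif action == "click(#submit)" and self.filled:
            self.submitted = True
    def done(self) -> bool:
        return self.submitted

def propose_action(observation: str) -> str:
    """Placeholder policy; in practice this would be the agent model."""
    return random.choice(["type(#name, 'Jane')", "click(#submit)", "scroll()"])

def rollout(goal: str, max_steps: int = 10) -> Trajectory:
    env, traj = ToyFormEnv(goal), Trajectory(goal)
    for _ in range(max_steps):
        obs = env.observe()
        action = propose_action(obs)
        env.act(action)
        traj.steps.append(Step(obs, action))
        if env.done():
            traj.succeeded = True
            break
    return traj

# Keep only successful rollouts as synthetic training data.
dataset = [t for t in (rollout("fill and submit the form") for _ in range(100)) if t.succeeded]
print(f"collected {len(dataset)} successful synthetic trajectories")
```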
Alessio [00:34:37]: We were having dinner for our one-year anniversary. Congrats. Yeah. Thank you. Raza from HumanLoop was there, and we mentioned you were coming on the pod. This is our first-
Swyx [00:34:45]: So he submitted a question.
Alessio [00:34:46]: Yeah, this is our first, I guess, mailbag question. He asked: when you started, GPT-4 didn't exist. Now you have GPT-4 vision to help you build a lot of those things. How do you think about what's unique to you as Adept, and, going back to the research direction you want to take the team and what you want people to come work on at Adept, versus what has maybe now become commoditized that you didn't expect everybody would have access to?
David [00:35:11]: Yeah, that's a really good question. I think implicit in that question, and I wish he were here too so he could push back on my assumption about it, is a calculus of where advantage accrues in the overall ML stack. And maybe part of the assumption is that advantage accrues solely to base model scaling. But I actually believe pretty strongly that the way you really win is that you have to go build an agent stack that is much more than the base model itself. So I think that is always going to be a giant advantage of vertical integration. It lets us do things like have a really, really fast base model that is really good at agent things. It's pretty good at cat and dog photos; it's not, like, SOTA at cat and dog photos, right? So we're allocating our capacity wisely. That's one thing you really get to do. I also think the other thing that is pretty important now in the broader foundation modeling space is that, despite any potential concerns about how good agents are as a startup area, like we were talking about earlier, I feel super good that we're doing foundation models in service of agents, and all of the reward within Adept is flowing from: can we make a better agent? Because right now I think we all see that if you're training on publicly available web data, you put in the FLOPs and you do reasonable things, then you get decent results. And if you just double the amount of compute, then you get predictably better results. So I think pure-play foundation model companies are just going to be pinched by how good the next couple of Llamas are going to be, and the next good open-source thing. And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models is, I think, going to commoditize a lot of the regular LLMs, and soon regular multimodal models. So I feel really good that we're just focused on agents.
Swyx [00:36:56]: So you don't consider yourself a pure play foundation model company?
David [00:36:59]: No, because if we were a pure play foundation model company, we would be training general foundation models that do summarization and all this other...
Swyx [00:37:06]: You're dedicated towards the agent. Yeah.
David [00:37:09]: And our business is an agent business. We're not here to sell you tokens, right? And I think like selling tokens, unless there's like a...
Swyx [00:37:14]: Not here to sell you tokens. I love it.
David [00:37:16]: It's like, if you have a particular area of specialty, then you won't get caught in the fact that everyone's just scaling to ridiculous levels of compute. But if you don't have a specialty, I think it's going to be a little tougher.
Swyx [00:37:27]: Interesting. Are you interested in robotics at all? Just a...
David [00:37:30]: I'm personally fascinated by robotics. I've always loved robotics.
Swyx [00:37:33]: Embodied agents as a business, you know. Figure is like a big, also sort of OpenAI-affiliated company that raised a lot of money.
David [00:37:39]: I think it's cool. I think, I mean, I don't know exactly what they're doing, but...
Swyx [00:37:44]: Robots. Yeah.
David [00:37:46]: Well, I mean, that's a...
Swyx [00:37:47]: Yeah. What question would you ask? If we had them on, what would you ask them?
David [00:37:50]: Oh, I just want to understand what their overall strategy is going to be between now and when there's reliable stuff to be deployed. But honestly, I just don't know enough about it.
Swyx [00:37:57]: And if I told you, hey, fire your entire warehouse workforce and, you know, put robots in there, isn't that a strategy? Oh yeah.
David [00:38:04]: Yeah. Sorry, I'm not questioning whether they're doing smart things. I genuinely don't know what they're doing as much, but I think there are two things. One, I'm so excited for someone to train a foundation model for robots. I think it's just going to work. I will die on this hill. I mean, again, this whole time we've been on this podcast, we've just been continually saying these models are basically behavioral cloners. Right? So let's go behavioral clone all this robot behavior, and then you figure out everything else you have to do in order to teach it how to solve a new problem. That's going to work. I'm super stoked for that. But I think, unlike what we're doing with helping humans with knowledge work, it just sounds like a more zero-sum job-replacement play, and I'm personally less excited about that.
Alessio [00:38:46]: We had Kanjun from Imbue on the podcast. We asked her why people should go work there and not at Adept.
Swyx [00:38:52]: Oh, that's so funny.
Alessio [00:38:54]: Well, she said, you know, there's space for everybody in this market, we're all doing interesting work. And she said they're really excited about building an operating system for agents, and for her the biggest research thing was getting models better at reasoning and planning for these agents. The reverse question to you: why should people be excited to come work at Adept instead of Imbue? And maybe, what are the core research questions that people should be passionate about to have fun at Adept?
David [00:39:22]: First off, and I'm sure you guys believe this too: the AI space, to the extent there's an AI space, and the AI agent space are both, exactly as she likely said, colossal opportunities. People are just going to end up winning in different areas, and a lot of companies are going to do well. So I really don't feel that it's zero-sum at all. But to change the zero-sum framing: why should you be at Adept? I think there are two huge reasons. One is that everything we do is in the service of useful agents. We're not a research lab. We do a lot of research in service of that goal, but we don't think of ourselves as a classic research lab at all. And the second reason to work at Adept is: if you believe that actually having customers, and a reward signal from customers, lets you build AGI faster, which we really believe, then you should come here. An example of why that's true is our evaluations. They're not academic evals. They're not simulator evals. They're: okay, we have a customer that really needs us to do these particular things. We can do some of them; these other ones they want, we can't do at all. We've turned those into evals. Now solve them, right? I think that's really cool. Everybody knows a lot of these evals are pretty saturated, and even for the new ones that aren't, you look at them and wonder: is this actually useful? I think that's a degree of practicality that really helps. We're equally excited about the same problems around reasoning and planning and generalization and all of this stuff, but they're very grounded in actual needs right now, which is really cool.
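To make the "customer needs become evals" loop concrete, here is a compressed, hypothetical sketch: a list of stand-in customer workflows, a placeholder agent run, and the failures kept as the next targets. None of the task names or the success check come from Adept.

```python
# Hypothetical sketch of customer-derived evals: real workflows a customer needs,
# scored by whether the agent completes them, with failures kept as open targets.
from typing import Callable

customer_tasks = [
    "extract invoice totals into the ERP",        # illustrative workflow names
    "file a freight quote and dispatch a truck",
    "reconcile this week's purchase orders",
]

def run_agent(task: str) -> bool:
    """Placeholder for actually running the agent end-to-end on the workflow."""
    return len(task) % 2 == 0  # dummy outcome so the sketch runs

def evaluate(tasks: list[str], agent: Callable[[str], bool]) -> None:
    results = {task: agent(task) for task in tasks}
    print(f"pass rate: {sum(results.values())}/{len(tasks)}")
    for task, ok in results.items():
        if not ok:
            print("still open:", task)   # these become the next eval targets

evaluate(customer_tasks, run_agent)
```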
Swyx [00:40:45]: Yeah. This has been a wonderful dive. I wish we had more time, but I would just leave it open to you. I think you have broad thoughts, not just about the agent space but about the general AI space. Any sort of rants or things that are just top of mind for you right now?
David [00:40:57]: Any rants?
Swyx [00:40:59]: Mining you for just general...
David [00:41:01]: Wow. Okay. So Amelia has already made the rant better than I have, but: not just chatbots is kind of rant one. And two is that AI has really been the story of compute, and compute plus data, and the ways in which you could trade one for the other. As much as our research community is really smart and we have made many, many advancements, and that's going to continue to be important, I think the game is increasingly changing and the rapid industrialization era has begun. And I think we, unfortunately, have to embrace it.
Swyx [00:41:30]: Yep.
Alessio [00:41:31]: Excellent. Awesome, David. Thank you so much for your time.
David [00:41:34]: Cool. Thanks guys.
Get full access to Latent Space at www.latent.space/subscribe
[by:whisper.cpp]
[00:00.00]大家好,歡迎大家來到「Lit and Space Pockest」
[00:02.50]我是Alessio,會員,和CTO在職業的職業會員
[00:05.74]我是Makojo Swicks, founder of SmallAI
[00:08.84]今天我們有David Luan, co-founder of ADEPT,在工作室,歡迎
[00:12.98]謝謝你
[00:14.10]一段時間在工作,我遇到你在VC的社交平台上
[00:17.98]你也說了,你很興奮,我們終於能夠做到這件事了
[00:21.88]對,很高興認識你
[00:23.88]我們想介紹你的職業,然後再說一下你剛才說了什麼,在你的連結,什麼人應該知道你
[00:32.02]你開始了一間公司,是第一次在實際視頻的視頻研究,例如DEXTRO,那是你的路,在你導致的AI,你開始了XON,然後你開始了30年,你開始了OpenAI?
[00:47.06]對,30、35年,或是在那裡,或是在那裡,VP Avenge,兩年半,兩年半後,
[00:53.48]我們在2022年開始了一個大型模式的創新
[00:57.08]然後在2022年開始了一個大型模式的創新
[01:00.32]所以那是一個短暫的CV
[01:02.98]是否有其他東西?
[01:03.98]對,是否有其他東西?
[01:04.98]你覺得要做什麼?
[01:05.98]或是人們應該知道更多?
[01:07.98]我猜是一個比較大的故事
[01:09.48]是加入OpenAI比較早期的
[01:11.98]然後就做了兩、三個月的研究
[01:15.48]那是很有趣的
[01:16.48]第二或第三天的我的時間在OpenAI
[01:18.98]Gregg and Ilya 找我住在房間,我們說要拿到我們的創新,我們會去…
[01:23.98]我看過很多創新的工作
[01:25.98]所以那是很有趣的
[01:26.98]就在結合了一堆團隊
[01:28.98]有幾個早期的領導人已經有了
[01:30.98]公司的資料項目是很努力的
[01:32.98]然後再多次地在大型研究中放大型的圖案
[01:35.98]我們在做基本研究
[01:36.98]所以我花了很多時間在做這個
[01:37.98]然後我再加上Google的LM項目
[01:39.98]但也加上Google的Brain
[01:41.98]是一個Brain的領導人,更多次地
[01:42.98]你知道,有幾個不同的領導人在AI的研究
[01:46.98]我們在2012 before prehistory
[01:48.98]很多人很討厭我
[01:50.98]我跟你們三個最好的朋友
[01:51.98]寫了一個研究的文件
[01:53.98]從2012到2017
[01:56.98]我覺得遊戲的改善在2017
[01:58.98]然後很多學生都沒有發現
[01:59.98]但是我們在OpenAI上真的做了
[02:01.98]我想大部分的幫助是
[02:02.98]Ilya的 constant beating of the drum
[02:04.98]讓世界被遮蓋在data centers
[02:06.98]還有其他人需要…
[02:07.98]對,我覺得我們有確定在那裡
[02:10.98]但沒有到我們開始看到
[02:11.98]結果的結果,那是我們要去的
[02:14.98]但也有一個部分
[02:15.98]是在OpenAI上
[02:16.98]我第一次加入
[02:17.98]我認為一件事我必須要做
[02:19.98]是如何告訴我們
[02:20.98]我們是否有不同的觀點
[02:22.98]比起我們是更小的GoogleBrain
[02:25.98]或是我們在OpenAI上
[02:26.98]只要生活在SF
[02:27.98]然後不想接受Mountain View
[02:28.98]或不想要生活在London
[02:29.98]那是不足夠的
[02:31.98]利用你的技術活動
[02:33.98]所以我們真的…
[02:34.98]我花了很多時間在推廣這個
[02:36.98]就是我們要怎麼
[02:37.98]要專注在
[02:38.98]一個大學生的大學生
[02:41.98]你從最底下的研究
[02:44.98]變成了
[02:45.98]如何讓你放棄這個環境
[02:47.98]而讓你覺得
[02:48.98]什麼是大學生的大學生
[02:50.98]想要展現
[02:51.98]然後你把他們解決
[02:52.98]所有的財困
[02:53.98]不管是否要在創意
[02:54.98]創作什麼
[02:55.98]這就變成了
[02:56.98]大學生的大學生
[02:57.98]對嗎
[02:58.98]然後現在的改變
[02:59.98]是我認為
[03:00.98]第一次加入AiPrice
[03:01.98]在下一幾年
[03:02.98]會是最深的
[03:03.98] co-design
[03:04.98]和 co-evolution
[03:05.98]產品和資料
[03:07.98]和實際技術
[03:08.98]而我認為
[03:09.98]每個技術的技術
[03:10.98]都會做得很好
[03:11.98]那是一大部分
[03:12.98]為何我開始深入
[03:13.98]你提及Dota
[03:14.98]哪些記憶在想
[03:16.98]從RL 和 Transformers
[03:18.98]在時間中
[03:19.98]然後我認為
[03:20.98]製造的工具
[03:21.98]更加在LM 上
[03:23.98]然後離開
[03:24.98]更多的Agent Simulation
[03:25.98]工作
[03:26.98]像在移動的道路
[03:27.98]我覺得Agent
[03:28.98]是一個
[03:29.98]完全正確的長途
[03:30.98]你只要去找
[03:31.98]AGI 是吧
[03:32.98]你會說
[03:33.98]首先
[03:34.98]我其實不喜歡AGI
[03:35.98]用人的改變
[03:36.98]因為我真的不想
[03:37.98]這樣會發生
[03:38.98]我認為這個改變
[03:39.98]AGI 是一些
[03:40.98]人們表現的
[03:41.98]非常值得的技術
[03:43.98]是一個
[03:44.98]極端的看法
[03:45.98]和人的改變
[03:46.98]我認為
[03:47.98]我比較有興趣
[03:48.98]AGI 的改變
[03:49.98]就是
[03:50.98]一個模式
[03:51.98]可以做任何的
[03:52.98]人能做的
[03:53.98]如果你想到
[03:54.98]超級有趣
[03:55.98]Agent
[03:56.98]是一種
[03:57.98]自然的
[03:58.98]改變
[03:59.98]所以
[04:00.98]所有的工作
[04:01.98]我們在RL
[04:02.98]這些技術
[04:03.98]導致我們
[04:04.98]有很清楚的
[04:05.98]形容
[04:06.98]你需要增加
[04:07.98]你需要增加
[04:08.98]對
[04:09.98]而自然的LM
[04:10.98]形容
[04:11.98]沒有出現
[04:12.98]我認為
[04:13.98]我們
[04:14.98]在這個場地
[04:15.98]有很多想法
[04:16.98]想想
[04:17.98]我們如何解決
[04:18.98]問題的問題
[04:19.98]然後
[04:20.98]我們忘記
[04:21.98]我們在RL
[04:22.98]是一個
[04:23.98]很不容易的
[04:24.98]方式
[04:25.98]我們為何
[04:26.98]我們在世界
[04:27.98]找到所有的
[04:28.98]知識
[04:29.98]我們在一年
[04:30.98]和一位
[04:31.98]伯克里斯教授
[04:32.98]教授
[04:33.98]我們會拿到
[04:34.98]AGI
[04:35.98]他的觀點
[04:36.98]對
[04:37.98]他的理想
[04:38.98]對
[04:39.98]所以
[04:40.98]我們都在
[04:41.98]記錄
[04:42.98]我們會
[04:43.98]解決
[04:44.98]我們已經解決
[04:45.98]LM
[04:46.98]我們已經解決
[04:47.98]我們已經解決
[04:48.98]我們已經解決
[04:49.98]我們已經解決
[04:50.98]我們已經解決
[04:51.98]我們已經解決
[04:52.98]我們已經解決
[04:53.98]我們已經解決
[04:54.98]我們已經解決
[04:55.98]我們已經解決
[04:56.98]我們已經解決
[04:57.98]我們已經解決
[04:58.98]我們已經解決
[04:59.98]我們已經解決
[05:00.98]我們已經解決
[05:01.98]我們已經解決
[05:02.98]每一句
[05:03.98]文字
[05:04.98]然後所有的圖案都會學習到模式
[05:07.94]然後你能夠合作任何的組織
[05:10.14]例如寫進、聲音、畫面、其他畫面、影片等等
[05:14.42]這些都是圖案的圖案,可以學習到這類的動作
[05:18.50]所以我希望我們能夠解決這件事
[05:20.10]然後我們回到當時的歷史
[05:22.74]我們如何跟我們一起學習這些圖案的學習
[05:27.06]這就是我們要去進行的進步
[05:28.62]我還要向大家提醒你多多的明年開放的故事
[05:31.30]我們再回到大陸的故事
[05:32.90]在你的個人網站,我愛的,因為是一個很好的個人的故事
[05:37.38]故事的內容,像你的歷史
[05:39.38]我需要更新,因為太老了
[05:42.38]但是你提及GPC2,你忘記了GPC1嗎?我認為你忘記了,對吧?
[05:46.18]我其實不太記得,我記得在那邊,我記得在那邊
[05:50.70]對,《Canonical Story》是阿力的故事,他很擔心傳播者和傳播者
[05:58.74]傳播者和傳播者和傳播者的訊息
[06:01.38]對,你帶我們去… 拿我們傳播者和傳播者和傳播者的訊息
[06:03.66]GPC的歷史,你也知道,對你來說
[06:07.46]對我來說,歷史和GPC的歷史是一個很好的問題
[06:10.02]所以我認為《Canonical Story》的故事,GPC的歷史是在谷歌上,對吧?
[06:14.30]因為那是關於傳播者的故事
[06:17.30]而我認為最驚訝的一件事,是…
[06:21.26]這是一個成績,例如在谷歌設立,你跟你的最好的朋友寫文章,對吧?
[06:26.26]好,所以在調查,我認為我的工作,當我當了學校的學長,是一個領導的領導人,對吧?
[06:33.02]所以我真的有很好的朋友,我的工作是把人們的小數目和好幾個好意義,然後向他們進行完結的工作
[06:41.10]我的工作不是在提供一百萬個意義,然後沒有任何股份的資料
[06:45.54]然後當我的想法開始合作,然後我開始工作,我的工作是向他們扭動資料,向他們做好工作
[06:52.50]然後開始將一些不正確的工作拆除,對吧?
[06:56.06]那股股份並沒有存在在我的時間在谷歌上
[06:59.34]如果他們有做好工作,他們會說:
[07:02.06]“喂,你真棒,你懂這些東西的效果嗎?”
[07:05.98]“這裡是所有的我們的TPUs,然後我認為他們會殺掉我們”
[07:09.94]他肯定是想要的,他在2017年也說了一百萬公升的計劃
[07:13.18]對,所以我認為這回合是在關於GPT的故事,對嗎?
[07:15.98]就是我正在跳舞歷史,對嗎?
[07:18.38]但在GPT2之後,我們都很期待GPT2,我可以告訴你更多的故事
[07:22.50]這是我最後的一篇文章,我甚至真的受到觸碍了,所以我變成了研究研究研究員
[07:27.70]每天每天我們進行GPT3,我會醒來,然後感到緊張
[07:32.38]我感到緊張,因為…你只要看看Fax,對嗎?
[07:35.54]Google有所有的帖子,Google有所有的人 who invented all of these underlying technologies
[07:40.74]有一個人叫Noam,他很聰明,他已經做了這個討論,他想要一百萬的計劃模式
[07:46.54]我認為我們可能只是在做一些複雜的研究,對嗎?他有這個扣子,只有轉換模式,他可能會在我們之前進行的
[07:54.66]我心想,拜託,讓這個模式結束,對嗎?
[07:57.90]然後,整個時間都變成了他們沒有得到股票的資金
[08:01.62]所以,我年紀中,我帶了Google的LM的活動,我當時是一名手機的,我變得很清楚為什麼,對嗎?
[08:06.98]那時候,有一個東西叫做"Brain Credit Marketplace"
[08:11.06]你記得Brain Credit Marketplace嗎?
[08:13.26]沒有,我沒聽過這說法
[08:14.30]其實,你會問任何Google,就像一件事,對嗎?
[08:18.58]對,有限定資訊,你必須有一個市場的市場,對嗎?
[08:23.06]你可能,有些時候是貧富,有些時候是政治欺負
[08:27.34]你可能,所以,基本上,每個人都要給錢,對嗎?
[08:30.10]如果你有錢,你必須買N-CHIPS,按照貿易和責任的方式
[08:33.74]如果你想做一個大職業,你可能有19、20個朋友不願意去工作
[08:38.86]如果這就是它們的效果
[08:40.74]它們很難得獲得
[08:42.14]當中的肺炎
[08:43.86]去學習這些東西
[08:44.98]而 Google 的團隊
[08:45.86]正在打架
[08:47.02]但我們只能打擊它們
[08:48.22]因為我們拿了大大的肺炎
[08:50.62]然後我們注射
[08:51.42]然後我認為
[08:52.30]這就像是一部分的故事
[08:53.54]像是一部分的歷史
[08:54.34]像是一部分的歷史
[08:55.62]像是一部分的歷史
[08:57.58]像是一部分的歷史
[08:58.90]我認為同樣的
[09:00.22]我認為一部分的
[09:01.02]三部分會成為
[09:01.90]一部分的歷史
[09:03.22]因為是一部分的
[09:04.18]一部分的成績
[09:05.62]對
[09:06.30]我覺得這部分的內容是如何的
[09:07.70]和影片也有關的
[09:09.02]在前一天的情況下
[09:10.06]我認為可能
[09:11.10]我認為是Jensen
[09:11.90]不確定是誰
[09:12.86]把最近的照片
[09:13.90]給大家看過的
[09:15.26]他在第一張DGX的照片中
[09:17.66]我覺得Jensen 已經是
[09:19.06]一個完美的
[09:21.22]技術
[09:21.94]和精神的一切
[09:24.10]我對NVIDIA的尊敬有多大關注
[09:26.22]是不實際的
[09:26.94]但我會打開
[09:27.74]我給他們的需要
[09:29.46]讓他們構思一下
[09:30.30]或者
[09:31.34]你只要用任何NVIDIA給他們的東西
[09:33.70]所以我們很接近他們的工作
[09:35.38]我不確定能分享所有的故事
[09:37.62]但例子是我找到的
[09:39.42]特別有趣的
[09:40.14]所以 Scott Gray 是很棒的
[09:41.54]我很喜歡他
[09:42.22]他在我的隊伍中
[09:43.30]是一名超級電腦隊伍
[09:45.62]就是Chris Burner 做的
[09:46.74]Chris Burner 還做了很多東西
[09:48.82]結果
[09:49.70]我們有很接近NVIDIA的 ties
[09:52.62]其實我的 co-founder
[09:53.70]在Adept Eric Elson
[09:54.74]是一位以前的GPGPU人士
[09:56.78]所以他和Scott
[09:57.82]和Brian Kanzaro
[09:58.86]NVIDIA
[09:59.66] and Jonah
[10:00.26] and Ian at NVIDIA
[10:01.14]我覺得我們全都很接近
[10:02.54]我們是一部分的組織
[10:03.70]我們如何推動這些股票的限度
[10:05.82]我覺得那種組織
[10:07.42]幫助了我們
[10:08.38]我想有趣的部分
[10:09.50]是 knowing the A100 generation
[10:11.22]那個Quadsbar city
[10:12.26]會是一件事
[10:12.98]是我們想找到的
[10:14.50]來解決
[10:15.22]這是我們可以利用的
[10:16.50]模特兒訓練
[10:17.14] really what it boils down to
[10:18.50]是
[10:19.22]我認為更多人
[10:20.06]知道這件事
[10:21.26]6 年前
[10:22.34]甚至3 年前
[10:23.34]人們拒絕接受
[10:24.98]這個AI 是一件故事
[10:27.02]是一件故事
[10:27.62]如何讓你更能复入
[10:29.22]實際使用模特兒
[10:30.38]使用模特兒
[10:31.66]還有GPT 2 3 故事嗎
[10:35.78]你喜歡在外面
[10:37.78]我認為是
[10:38.78]很欣賞
[10:39.86]這個模特兒的作用
[10:41.66]有趣的GPT 2 故事
[10:43.66]我花了很長的時間
[10:45.86]幫Alex使用模特兒
[10:48.58]我記得
[10:49.82]最有趣的一刻
[10:52.22]是我們寫了模特兒
[10:54.70]我確定模特兒
[10:56.22]是一個最短的模特兒
[10:57.70]有任何ML
[10:58.70]像是最理想的
[10:59.90]ML 模特兒
[11:01.42]是三個模特兒
[11:03.18]這是一種模特兒
[11:04.54]Vanilla 模特兒
[11:05.58]只有轉換的模特兒
[11:06.38]這些特別的東西
[11:07.34]我記得是在《ParaGraph》裡
[11:08.58]我記得是在《ParaGraph》裡
[11:09.42]我們都在看這件事
[11:11.02]我認為是很難看的模特兒
[11:11.82]OGs 在廣場上
[11:13.02]會很討厭這個模特兒
[11:14.02]他們會說沒有創意
[11:15.50]為什麼你們要做這個作用
[11:16.94]現在是很有趣的
[11:18.02]在後期的看法是
[11:19.54]一件很刺激的作用
[11:20.82]但我覺得是一件很早的事
[11:22.54]我們完全遲到
[11:24.42]我們都要關心的問題是 AI 和不關的
[11:27.58]是否有四種不同的想法
[11:29.34]是否有一個很簡單的想法
[11:30.34]是否有一個很簡單的想法
[11:31.34]是否有一個很簡單的想法
[11:32.34]是否有一個很簡單的想法
[11:33.34]是否有一個很簡單的想法
[11:34.34]是否有一個很簡單的想法
[11:35.34]是否有一個很簡單的想法
[11:36.34]是否有一個很簡單的想法
[11:37.34]是否有一個很簡單的想法
[11:38.34]是否有一個很簡單的想法
[11:39.34]是否有一個很簡單的想法
[11:40.34]是否有一個很簡單的想法
[11:41.34]是否有一個很簡單的想法
[11:42.34]是否有一個很簡單的想法
[11:43.34]是否有一個很簡單的想法
[11:44.34]是否有一個很簡單的想法
[11:45.34]是否有一個很簡單的想法
[11:46.34]是否有一個很簡單的想法
[11:47.34]是否有一個很簡單的想法
[11:48.34]是否有一個很簡單的想法
[11:49.34]是否有一個很簡單的想法
[11:50.34]是否有一個很簡單的想法
[11:51.34]是否有一個很簡單的想法
[11:52.34]是否有一個很簡單的想法
[11:53.34]是否有一個很簡單的想法
[11:54.34]是否有一個很簡單的想法
[11:55.34]是否有一個很簡單的想法
[11:56.34]是否有一個很簡單的想法
[11:57.34]是否有一個很簡單的想法
[11:58.34]是否有一個很簡單的想法
[11:59.34]是否有一個很簡單的想法
[12:00.34]是否有一個很簡單的想法
[12:01.34]是否有一個很簡單的想法
[12:02.34]是否有一個很簡單的想法
[12:03.34]是否有一個很簡單的想法
[12:04.34]是否有一個很簡單的想法
[12:05.34]是否有一個很簡單的想法
[12:06.34]是否有一個很簡單的想法
[12:07.34]是否有一個很簡單的想法
[12:08.34]是否有一個很簡單的想法
[12:09.34]之前 Microsoft invested in OpenAI
[12:11.34]Sam Altman, myself, and our CFO
[12:13.34] flew up to Seattle
[12:14.34] to do the final pitch meeting
[12:16.34] and I’d been a founder before
[12:17.34] so I always had a tremendous amount of anxiety
[12:19.34] about partner meetings
[12:21.34] which this basis is what it was
[12:22.34] it was like Kevin Scott
[12:23.34] and Satya and Amy Hood
[12:25.34] and it was my job to give the technical slides
[12:27.34] about what’s the path to AGI
[12:29.34] what’s our research portfolio
[12:30.34] all of this stuff
[12:31.34] but it was also my job to give the GPT-2 demo
[12:34.34] we had a slightly bigger version of GPT-2
[12:36.34] that we had just cut
[12:38.34] maybe a day or two before this flight up
[12:40.34] and as we all know now
[12:42.34]Model behaviors you find predictable
[12:44.34] at one checkpoint
[12:45.34] are not predictable in another checkpoint
[12:46.34] and so like I spent all this time
[12:48.34] trying to figure out how to keep this thing on rails
[12:50.34] I had my canned demos
[12:51.34] but I knew I had to go
[12:52.34] turn it around over to Satya and Kevin
[12:54.34] and let them type anything in
[12:56.34] and that just that really kept me up all night
[12:58.34]Nice, yeah
[13:00.34]I mean that must have helped you
[13:01.34] talking about partners meeting
[13:03.34]You raised 420 million for ADAPT
[13:06.34]The last round was a $350 million series B
[13:09.34]So I’m sure you do great
[13:10.34]Pitching and painting
[13:12.34]Nice
[13:13.34]No, that’s a high compliment coming from a VC
[13:15.34]Yeah, I mean you’re doing great
[13:17.34]Let’s talk about ADAPT
[13:19.34]and we were doing pre prep
[13:21.34]and you mentioned that maybe a lot of people
[13:22.34]don’t understand what ADAPT is
[13:23.34]So usually we try and introduce the product
[13:26.34]and then have the founders fill in the blanks
[13:27.34]but maybe let’s do the reverse
[13:28.34]Like what is ADAPT?
[13:30.34]Yeah, so I think ADAPT
[13:31.34]is the least understood company
[13:34.34]in the broader space of foundation models
[13:36.34]plus agents
[13:37.34]So I’ll give some color
[13:39.34]and I’ll explain what it is
[13:40.34]and I’ll explain also
[13:41.34]why it’s actually pretty different
[13:43.34]from what people would have guessed
[13:44.34]So the goal for ADAPT
[13:46.34]is we basically want to build an AI agent
[13:48.34]that can do
[13:49.34]that can basically help humans
[13:50.34]do anything a human does on a computer
[13:51.34]and so what that really means is
[13:53.34]we want this thing to be super good
[13:55.34]at turning natural language
[13:56.34]like goal specifications
[13:58.34]right into the correct set of end steps
[14:00.34]and then also have all the correct sensors
[14:02.34]and actuators
[14:03.34]to go get that thing done for you
[14:04.34]across any software tool
[14:05.34]that you already use
[14:06.34]and so the end vision of this
[14:07.34]is effectively like
[14:08.34]I think in a couple years
[14:09.34]everyone’s going to have access
[14:10.34]to an AI teammate
[14:11.34]that they can delegate arbitrary tasks to
[14:14.34]and then also be able to use it
[14:16.34]to a sounding board
[14:17.34]and just be way, way, way more productive
[14:19.34]right and just changes the shape
[14:21.34]of every job
[14:22.34]from something where you’re mostly
[14:23.34]doing execution
[14:24.34]to something where you’re mostly
[14:25.34]actually doing these core liberal arts skills
[14:26.34]of what should I be doing and why
[14:28.34]right and
[14:29.34]I find this like really exciting
[14:31.34]motivating because
[14:32.34]I think it’s actually
[14:33.34]pretty different vision
[14:34.34]for how AI will play out
[14:36.34]I think systems like ADAPT
[14:37.34]are the most likely systems
[14:38.34]to be proto-AGI’s
[14:40.34]but I think the ways in which
[14:41.34]we are really counterintuitive
[14:42.34]to everybody
[14:43.34]is that
[14:44.34]we’ve actually been really quiet
[14:45.34]because we are
[14:46.34]not a developer company
[14:47.34]we don’t sell APIs
[14:48.34]we don’t sell open source models
[14:50.34]we also don’t sell bottom-up products
[14:52.34]we’re not a thing
[14:53.34]that you go and click
[14:54.34]and download the extension
[14:55.34]and like we want more users
[14:56.34]signing up for that thing
[14:57.34]we’re actually an enterprise company
[14:58.34]so what we do is
[14:59.34]we work with a range
[15:00.34]of different companies
[15:01.34]some like late-stage
[15:02.34]multi-thousand people start-ups
[15:04.34]some Fortune 500s etc
[15:06.34]and what we do for them
[15:07.34]is we basically give them
[15:09.34]an out-of-the-box solution
[15:11.34]where big complex workflows
[15:12.34]that their employees
[15:13.34]do every day
[15:14.34]could be delegated to the model
[15:15.34]and so we look a little
[15:16.34]different from other companies
[15:17.34]in that in order
[15:18.34]to go build this
[15:19.34]full agent thing
[15:20.34]the most important thing
[15:21.34]you gotta get right
[15:22.34]is reliability
[15:23.34]so initially zooming
[15:24.34]way back when
[15:25.34]one of the first things
[15:26.34]debt did was we released
[15:27.34]this demo called Act 1
[15:28.34]act 1 was like pretty cool
[15:30.34]it’s kind of become
[15:31.34]a hello world thing
[15:32.34]for people to show
[15:33.34]agent demos
[15:34.34]by going to redfin
[15:35.34]and asking to buy a house
[15:36.34]somewhere
[15:37.34]because like we did that
[15:38.34]in the original Act 1 demo
[15:39.34]and like showed that
[15:40.34]showed like Google Sheets
[15:41.34]all this other stuff
[15:42.34]over the last like year
[15:44.34]since that has come out
[15:45.34]there’s been a lot
[15:46.34]of really cool demos
[15:47.34]and you go play with them
[15:48.34]and you realize
[15:49.34]they work 60% of the time
[15:50.34]but since we’ve always
[15:51.34]been focused on
[15:52.34]how do we build
[15:53.34]an amazing enterprise product
[15:54.34]enterprises can’t use
[15:55.34]anything
[15:56.34]the reliability
[15:57.34]and so we’ve
[15:58.34]actually had to go down
[15:59.34]a slightly different
[16:00.34]tech tree than what you
[16:01.34]might find in the
[16:02.34]prompt engineering
[16:03.34]sort of plays in
[16:04.34]the agent space
[16:05.34]to get that reliability
[16:06.34]and we’ve decided
[16:07.34]to prioritize reliability
[16:08.34]over all else
[16:09.34]so like one of our use
[16:10.34]cases is crazy enough
[16:11.34]that it actually ends
[16:12.34]with a physical truck
[16:13.34]being sentto a place
[16:15.34]as the result
[16:16.34]of the agent workflow
[16:17.34]and if you’re like
[16:18.34]if that works like 60%
[16:19.34]of the time
[16:20.34]you’re just blowing money
[16:21.34]and poor truck drivers
[16:22.34]going places
[16:23.34]interesting
[16:24.34]one of the
[16:25.34]common teams
[16:26.34]has this idea of services
[16:27.34]as software
[16:28.34]I’m actually giving a talk
[16:29.34]at nvidia gtc
[16:30.34]about this
[16:31.34]but basically
[16:32.34]software as a service
[16:33.34]you’re wrapping
[16:34.34]user productivity
[16:35.34]in software
[16:36.34]with agents
[16:37.34]and services as software
[16:38.34]is replacing things
[16:39.34]that you know
[16:40.34]you would ask somebody
[16:41.34]to do
[16:42.34]and the software
[16:43.34]just does it for you
[16:44.34]when you think
[16:45.34]about these usecases
[16:46.34]do the users
[16:47.34]still go in
[16:48.34]and look at the agent
[16:49.34]kindof like
[16:50.34]doing the things
[16:51.34]and can intervene
[16:52.34]or likeare they slowly
[16:53.34]remove from them
[16:54.34]are there people
[16:55.34]in the middle
[16:56.34]checking in
[16:57.34]I think there’s two current flaws
[16:58.34]in the framing
[16:59.34]for services
[17:00.34]as software
[17:01.34]or I think what you just said
[17:02.34]I think that one of them
[17:03.34]is likein our experience
[17:04.34]as we’ve been rolling
[17:05.34]out adept
[17:06.34]the people who actually
[17:07.34]do the jobs
[17:08.34]are the most excited
[17:09.34]about it
[17:10.34]because they don’t go from
[17:11.34]I do this job
[17:12.34]to I don’t do this job
[17:13.34]they go from
[17:14.34]I do this job
[17:15.34]for everything
[17:16.34]including the shitty
[17:17.34]wrote stuff
[17:18.34]to I’m a supervisor
[17:19.34]and I literally
[17:20.34]likeit’s pretty magical
[17:21.34]when you watch the thing
[17:22.34]being used
[17:23.34]sequentially by hand
[17:24.34]as a human
[17:25.34]and you can just click
[17:26.34]in any one of them
[17:27.34]be like hey I want to watch
[17:28.34]the trajectory
[17:29.34]the agent went through
[17:30.34]to go solve this
[17:31.34]and the nice thing
[17:32.34]about agent execution
[17:33.34]as opposed to
[17:34.34]like LLM generations
[17:35.34]is that
[17:36.34]a good chunk of the time
[17:37.34]when the agent
[17:38.34]fails to execute
[17:39.34]it doesn’t give you
[17:40.34]the wrong result
[17:41.34]it just fails to execute
[17:42.34]and the whole trajectory
[17:43.34]is just broken and dead
[17:44.34]and the agent knows it
[17:45.34]right so then
[17:46.34]those are the ones
[17:47.34]that the human
[17:48.34]then goes and solves
[17:49.34]and so then they become
[17:50.34]a troubleshooter
[17:51.34]they work on the more
[17:52.34]present piece
[17:53.34]of it
[17:54.34]that we found
[17:55.34]is our strategy
[17:56.34]as a company
[17:57.34]is to always be
[17:58.34]an augmentation company
[17:59.34]and I think
[18:01.34]one out of principle
[18:02.34]that’s something
[18:03.34]we really care about
[18:04.34]but two
[18:05.34]actually if you’re
[18:06.34]framing yourself
[18:07.34]as an augmentation
[18:08.34]company
[18:09.34]you’re always going to
[18:10.34]live in the world
[18:11.34]where you’re solving
[18:12.34]tasks that are a little
[18:13.34]too hard for what
[18:14.34]the model can do today
[18:15.34]and still needs a human
[18:16.34]to provide oversight
[18:17.34]provide clarifications
[18:18.34]provide human feedback
[18:19.34]and that’s how you
[18:20.34]build a data flywheel
[18:21.34]smart as humans
[18:22.34]how to solve
[18:23.34]things models
[18:24.34]can’t do today
[18:25.34]and so I actually
[18:26.34]think that
[18:27.34]being an augmentation
[18:28.34]company
[18:29.34]forces you to go
[18:30.34]develop your core
[18:31.34]AI capabilities
[18:32.34]faster than someone
[18:33.34]who’s saying
[18:34.34]ah okay
[18:35.34]my job’s like
[18:36.34]deliver you
[18:37.34]a lights off
[18:38.34]solution for X
[18:39.34]it’s interesting
[18:40.34]because we’ve seen
[18:41.34]two parts
[18:42.34]of the market
[18:43.34]one is
[18:44.34]we have one company
[18:45.34]that does
[18:46.34]agents for
[18:47.34]sock analysts
[18:48.34]people just
[18:49.34]don’t have them
[18:50.34]which is
[18:51.34]the augmentation product
[18:52.34]and then you have
[18:53.34]sweep.dev
[18:54.34]any of these products
[18:55.34]which they just
[18:56.34]do the whole thing
[18:57.34]I’m really curious
[18:58.34]to see how that evolves
[18:59.34]I agree that today
[19:00.34]the reliability is
[19:01.34]so important
[19:02.34]in the enterprise
[19:03.34]that they just
[19:04.34]don’t use
[19:05.34]most of them
[19:06.34]that’s cool
[19:07.34]but it’s great
[19:08.34]to hear the story
[19:09.34]because I think
[19:10.34]from the outside
[19:11.34]people are like
[19:12.34]oh that
[19:13.34]they do act one
[19:14.34]they do person on
[19:15.34]they do foo you
[19:16.34]they do all these
[19:17.34]it’s just the public stuff
[19:18.34]it’s just the public stuff
[19:19.34]我們想要更多的客人來領導
[19:22.20]所以我們想要更多的客人來領導
[19:26.08]但我們希望我們會更多的客人來領導
[19:29.32]我們想要更多的客人來領導
[19:31.48]我們想要更多的客人來領導
[19:33.68]所以這次我們想要更多的客人來領導
[19:36.70]為什麼你變得更多的客人?
[19:38.78]如果整個推動…
[19:40.12]你已經領導了你的公司
[19:41.82]但是你也會更加努力去領導更多的客人來領導
[19:46.20]我覺得我們剛剛領導過那一步
[19:48.14]因為我最近還沒有領導過那一步
[19:49.14]這是一個好問題
[19:50.14]我認為這兩件事其實是很重要的
[19:51.14]一件事我認為是…
[19:53.14]坦白說,大部分是公共的歷史
[19:56.14]在公司中的公司中的歷史是最重要的
[19:58.14]我非常高興這件事發生
[20:00.14]因為當我們開始公司在2022年代
[20:03.14]大家都在社會中知道歷史的歷史
[20:06.14]但公司中的歷史沒有任何意義
[20:08.14]他們還會把所有的歷史都放在桌上
[20:11.14]所以我認為現在
[20:13.14]我真的要注意的是
[20:15.14]當人們認為歷史
[20:16.14]他們會認為是對的
[20:17.14]對,所有各種各樣的東西都會被引起
[20:19.14]會被引起的電話電話電話電話
[20:20.14]會被引起的東西都會被引起的東西
[20:21.14]或是被引起的電話電話電話
[20:22.14]我認為電話電話電話
[20:23.14]是一個可以給你一個目標
[20:25.14]再次進行的工作
[20:27.14]並且在最少數個步驟中
[20:28.14]所以這就是一個大部分的原因
[20:30.14]我認為其中一個部分
[20:31.14]是因為我認為更好讓人們
[20:33.14]更加 aware of the depth
[20:34.14]他們想要做的事情
[20:35.14]他們的生意
[20:36.14]這塊地是在世界中
[20:38.14]在於在更多的利益
[20:40.14]我認為大量的利益
[20:43.14]會發生從
[20:44.14]你使用的研究模式
[20:46.14]作為大量學童的學童
[20:49.14]去解決這些事
[20:50.14]我認為那些人
[20:51.14]想要做的研究
[20:52.14]應該有所改善
[20:53.14]當你提到
[20:54.14]研究已經變成
[20:55.14]更多的一部分
[20:56.14]有什麼特別的東西
[20:57.14]你會問我嗎
[20:58.14]我會給你一個名字
[20:59.14] Bill Gates 在 his blog post
[21:00.14]提及「Agent of the Future」
[21:02.14]我是那個人 who made OSs
[21:04.14]我認為「Agent of the Next Thing」
[21:05.14]所以 Bill Gates
[21:07.14]我會叫他出來
[21:08.14]然後 Sam Altman 也會說
[21:09.14]「Agent of the Future for Open AI」
[21:10.14]我認為之前
[21:11.14]我認為
[21:12.14]有些人在《紐約 Times》
[21:13.14]Kade Metz 也在《紐約 Times》
[21:15.14]對於現在
[21:16.14]在一些不同的
[21:17.14]我看過 AI 開始的
[21:18.14]使用的研究模式
[21:19.14]是 AI 公司
[21:20.14]現在的 AI 公司
[21:21.14]是 AI 公司
[21:22.14]只是我認為
[21:23.14]是一段時間
[21:24.14]從 VC 開始
[21:25.14]是有點混合
[21:26.14]是嗎
[21:27.14]我認為有很多 VC
[21:28.14]會說我不會
[21:29.14]觸碰 any agent start-ups
[21:30.14]因為
[21:31.14]為什麼
[21:32.14]你告訴我
[21:33.14]我認為有很多 VC
[21:35.14]比較少技術
[21:37.14]不懂得
[21:38.14]限制的東西
[21:39.14]不不不
[21:40.14]你會這樣嗎
[21:41.14]不不
[21:42.14]我認為
[21:43.14]今天的可能性
[21:44.14]是否適用
[21:46.14]我認為
[21:47.14]人們會看你
[21:48.14]然後說
[21:49.14]這傢伙
[21:50.14]需要 400 億元
[21:51.14]去做
[21:52.14]所以有很多 VC
[21:53.14]都會說
[21:54.14]我會再加上
[21:55.14]有些東西
[21:56.14]協助 AI
[21:57.14]有些東西
[21:58.14]是比較容易
[21:59.14]進行
[22:00.14]進行的
[22:01.14]但我還驚訝
[22:02.14]有些 funders
[22:03.14]不想做 agent
[22:04.14]不只是 funding
[22:05.14]有時候
[22:06.14]我們在看
[22:07.14]為什麼沒有人
[22:08.14]做 agent for acts
[22:09.14]那是好
[22:10.14]其實
[22:11.14]我從沒知道
[22:12.14]我的觀點
[22:13.14]是
[22:14.14]有新的 agent company
[22:16.14]在進行
[22:17.14]所以可能
[22:18.14]他們也有
[22:19.14]但我提供人員
[22:20.14]去取消 agent
[22:21.14]他們的名字
[22:22.14]是因為
[22:23.14]他們的名字
[22:24.14]他們的名字
[22:25.14]所以
[22:26.14]他們不等待
[22:27.14]對
[22:28.14]那是好處
[22:29.14]你的 portfolio allocator
[22:31.14]有些人
[22:32.14]知道 about persimmon
[22:33.14]一些人知道
[22:34.14]for you and for you heavy
[22:35.14]你覺得
[22:36.14]怎麼想
[22:37.14]那個 evolution of that
[22:38.14]什麼人
[22:39.14]想想
[22:40.14]那是
[22:41.14]a depth
[22:42.14]搜尋個案
[22:43.14] kind of take us
[22:44.14]through the stuff
[22:45.14]you should recently
[22:46.14]and how people
[22:47.14]should think about
[22:48.14]the trajectory
[22:49.14]what you’re doing
[22:50.14]the critical path
[22:51.14]for adept
[22:52.14]is we want to build
[22:53.14]agents that can do
[22:54.14]a higher and higher
[22:55.14]level of abstraction
[22:56.14]things over time
[22:57.14]all while keeping
[22:58.14]insanely
[22:59.14]high reliability standard
[23:00.14]because that’s
[23:01.14]what turns this from
[23:02.14]research into something
[23:03.14]that customers want
[23:04.14]and if you build
[23:05.14]agents with really
[23:06.14]high reliability standard
[23:07.14]your users
[23:08.14]how to get that
[23:09.14]next level of
[23:10.14]straction faster
[23:11.14]so that’s how
[23:12.14]you actually build
[23:13.14]the data level
[23:14.14]that’s the critical path
[23:15.14]for the company
[23:16.14]everything we do
[23:17.14]is in service of that
[23:18.14]so you go zoom
[23:19.14]way way back to
[23:20.14]act one days right
[23:21.14]like the core thing
[23:22.14]behind act one
[23:23.14]is can we teach
[23:24.14]large model basically
[23:25.14]how to even
[23:26.14]actuate your computer
[23:27.14]and I think we’re
[23:28.14]one of the first places
[23:29.14]to have solved that
[23:30.14]and shown it
[23:31.14]and shown the generalization
[23:32.14]that you get when you
[23:33.14]give it various different
[23:34.14]workflows and texts
[23:35.14]but I think from
[23:36.14]these models
[23:37.14]to be able to
[23:38.14]get a lot better
[23:39.14]at having some
[23:40.14]specificationof some
[23:41.14]guardrails for what it
[23:42.14]actually should be doing
[23:43.14]and I think in conjunction
[23:44.14]with that a giant thing
[23:45.14]that was really
[23:46.14]necessaryis really
[23:47.14]fast multimodal models
[23:48.14]that are really good
[23:49.14]at understanding
[23:50.14]knowledge work
[23:51.14]and really good
[23:52.14]at understanding screens
[23:53.14]and that needs to
[23:54.14]kind of be the base
[23:55.14]for some of these
[23:56.14]agentsback then
[23:57.14]we had to do a ton
[23:58.14]ofresearchbasically
[23:59.14]on how do we
[24:00.14]actually make that
[24:01.14]possiblewell first off
[24:02.14]back in
[24:03.14]free at exact
[24:04.14]one month of 23
[24:05.14]and then
[24:06.14]we had to
[24:07.14]get a lot better
[24:08.14]at the first place
[24:09.14]and then
[24:10.14]we had to
[24:11.14]get a lot better
[24:12.14]at the first place
[24:13.14]and then
[24:14.14]we had to
[24:15.14]get a lot better
[24:16.14]at the first place
[24:17.14]and then
[24:18.14]we had to
[24:19.14]get a lot better
[24:20.14]at the first place
[24:21.14]and then
[24:22.14]we had to
[24:23.14]get a lot better
[24:24.14]at the first place
[24:25.14]and then
[24:26.14]we had to
[24:27.14]get a lot better
[24:28.14]at the first place
[24:29.14]and then
[24:30.14]we had to
[24:31.14]get a lot better
[24:32.14]at the first place
[24:33.14]and then
[24:34.14]we had to
[24:35.14]get a lot better
[24:36.14]at the first place
[24:37.14]and then
[24:38.14]we had to
[24:39.14]get a lot better
[24:40.14]at the first place
[24:41.14]and then
[24:42.14]we had to
[24:43.14]get a lot better
[24:44.14]at the first place
[24:45.14]and then
[24:46.14]we had to
[24:47.14]get a lot better
[24:48.14]at the first place
[24:49.14]and then
[24:50.14]we had to
[24:51.14]get a lot better
[24:52.14]at the first place
[24:53.14]and then
[24:54.14]we had to
[24:55.14]get a lot better
[24:56.14]at the first place
[24:57.14]and then
[24:58.14]we had to
[24:59.14]get a lot better
[25:00.14]at the first place
[25:01.14]and then
[25:02.14]we had to
[25:03.14]get a lot better
[25:04.12]at the first place
[25:05.12]and then
[25:06.12]we had to
[25:07.12]get a lot better
[25:08.12]at the first place
[25:09.12]and then
[25:10.12]we had to
[25:11.12]get a lot better
[25:12.12]at the first place
[25:13.12]and then
[25:14.12]we had to
[25:15.12]get a lot better
[25:16.12]at the first place
[25:17.12]and then
[25:18.12]we had to
[25:19.12]get a lot better
[25:20.12]at the first place
[25:21.12]and then
[25:22.12]we had to
[25:23.12]get a lot better
[25:24.12]at the first place
[25:25.12]and then
[25:26.12]we had to
[25:27.12]get a lot better
[25:28.12]at the first place
[25:29.12]and then
[25:30.12]we had to
[25:31.12]get a lot better
[25:32.12]at the first place
[25:33.12]and then
[25:34.12]we had to
[25:35.12]get a lot better
[25:36.12]at the first place
[25:37.12]and then
[25:38.12]we had to
[25:39.12]get a lot better
[25:40.12]at the first place
[25:41.12]and then
[25:42.12]we had to
[25:43.12]get a lot better
[25:44.12]at the first place
[25:45.12]and then
[25:46.12]we had to
[25:47.12]get a lot better
[25:48.12]at the first place
[25:49.12]and then
[25:50.12]we had to
[25:51.12]get a lot better
[25:52.12]at the first place
[25:53.12]and then
[25:54.12]we had to
[25:55.12]get a lot better
[25:56.12]at the first place
[25:57.12]and then
[25:58.12]we had to
[25:59.12]get a lot better
[26:00.12]at the first place
[26:01.12]and then
[26:02.12]we had to
[26:03.12]get a lot better
[26:04.12]at the first place
[26:05.12]and then
[26:06.12]we had to
[26:07.12]get a lot better
[26:08.12]at the first place
[26:09.12]and then
[26:10.12]we had to
[26:11.12]get a lot better
[26:12.12]at the first place
[26:13.12]and then
[26:14.12]we had to
[26:15.12]get a lot better
[26:16.12]at the first place
[26:17.12]and then
[26:18.12]we had to
[26:19.12]get a lot better
[26:20.12]at the first place
[26:21.12]and then
[26:22.12]we had to
[26:23.12]get a lot better
[26:24.12]at the first place
[26:25.12]and then
[26:26.12]we had to
[26:27.12]get a lot better
[26:28.12]at the first place
[26:29.12]and then
[26:30.12]we had to
[26:31.12]get a lot better
[26:32.12]at the browser level
[26:33.12]I really want
[26:34.12]at your papers
[26:35.12]you have like a different representation
[26:36.12]kind of like
[26:37.12]you don’t just take the dome
[26:38.12]and act on it
[26:39.12]you do a lot more stuff
[26:40.12]how do you think about
[26:41.12]the best way
[26:42.12]the models will interact
[26:43.12]with the software
[26:44.12]and like how
[26:45.12]the development of products
[26:46.12]is going to change
[26:47.12]with that in mind
[26:48.12]as more and more
[26:49.12]the work is done by agents
[26:50.12]instead of people
[26:51.12]this is
[26:52.12]there’s so much surface area here
[26:53.12]and it’s actually one of the things
[26:54.12]I’m really excited about
[26:55.12]and it’s funny because
[26:56.12]I’ve spent most of my time
[26:57.12]doing research stuff
[26:58.12]but this is like a whole
[26:59.12]new ball game that I’ve been
[27:00.12]doing about
[27:01.12]and I find it
[27:02.12]really cool
[27:03.12]so I would say
[27:04.12]the best analogy
[27:05.12]I have to
[27:06.12]why ADAPT
[27:07.12]is pursuing a path
[27:08.12]of being able to
[27:09.12]use your computer
[27:10.12]like a human
[27:11.12]plus of course
[27:12.12]being able to call
[27:13.12]APIs
[27:14.12]being able to call
[27:15.12]APIs is the easy part
[27:16.12]like being able to
[27:17.12]use your gear like humans
[27:18.12]is a hard part
[27:19.12]it’s in the same way
[27:20.12]why people are excited
[27:21.12]about humanoid robotics
[27:22.12]right
[27:23.12]in a world where
[27:24.12]you had t=infinity
[27:25.12]right you’re probably
[27:26.12]gonna have various
[27:27.12]different form factors
[27:28.12]that robots
[27:29.12]do
[27:30.12]without changing
[27:31.12]everything along the way
[27:32.12]it’s the same thing
[27:33.12]for software
[27:34.12]right
[27:35.12]if you go itemize out
[27:36.12]the number of things
[27:37.12]you wanna do on your computer
[27:38.12]for which every step
[27:39.12]has an api
[27:40.12]those numbers
[27:41.12]will workflows add up
[27:42.12]pretty close to zero
[27:43.12]and so then many
[27:44.12]points along the way
[27:45.12]you need the ability
[27:46.12]to actually control
[27:47.12]your computer like a human
[27:48.12]it also lets you learn
[27:49.12]from human usage
[27:50.12]of computers
[27:51.12]as a source of training
[27:52.12]data that you don’t get
[27:53.12]if you have to somehow
[27:54.12]figure out how every
[27:55.12]particular step needs to be
[27:56.12]some particular custom
[27:57.12]private api thing
[27:58.12]it’s the most practical path
[27:59.12]i think a lot of
[28:00.12]success will come
[28:01.12]from going down
[28:02.12]this path
[28:03.12]i kinda think about this
[28:04.12]early days of the agent
[28:05.12]interaction layer
[28:06.12]level is a little bit
[28:07.12]like do y’all remember
[28:08.12]windows 3.1
[28:10.12]like those days
[28:11.12]this might be
[28:12.12]i might be too old
[28:13.12]for you guys on this
[28:14.12]but back in the day
[28:15.12]windows 3.1
[28:16.12]we had this transition period
[28:17.12]between pure command line
[28:18.12]right
[28:19.12]being the default
[28:20.12]into this new world
[28:21.12]with the gui is the default
[28:22.12]and then you drop into the
[28:23.12]command line for like
[28:24.12]programmer things
[28:25.12]the old way was
[28:26.12]you booted your computer up
[28:27.12]and then it would
[28:28.12]give you the c colon
[28:29.12]slash thing
[28:30.12]and you typed windows
[28:31.12]and you hit enter
[28:32.12]and then you got
[28:33.12]put into windows
[28:34.12]and then the gui
[28:35.12]kind of became a layer
[28:36.12]above the command line
[28:37.12]the same thing
[28:38.12]is gonna happen
[28:39.12]with agent interfaces
[28:40.12]is like today
[28:41.12]what we have in the gui
[28:42.12]is like the base layer
[28:44.12]and then the agent
[28:45.12]just controls
[28:46.12]the current gui
[28:47.12]layer plus apis
[28:48.12]and in the future
[28:50.12]as more and more
[28:51.12]trust is built towards
[28:52.12]agents and more and more
[28:53.12]things can be done by
[28:54.12]agents and more UIs
[28:55.12]for agents are actually
[28:56.12]users
[28:57.12]then that just becomes
[28:58.12]a standard
[28:59.12]interaction layer
[29:00.12]and if that becomes
[29:01.12]a standard
[29:02.12]interaction layer
[29:03.12]what changes for
[29:04.12]software is that
[29:05.12]a lot of software
[29:06.12]is gonna be
[29:07.12]either systems
[29:08.12]or record
[29:09.12]or like certain
[29:10.12]customized
[29:11.12]workflow
[29:12.12]execution engines
[29:13.12]and a lot of
[29:14.12]how you actually
[29:15.12]do stuff will be
[29:16.12]controlled at the
[29:17.12]agent layer
[29:18.12]and you think the
[29:19.12]rabbit interface
[29:20.12]is more like
[29:21.12]it would like
[29:22.12]you’re not actually
[29:23.12]seeing the app
[29:24.12]that the model
[29:25.12]I can see that
[29:26.12]being a model
[29:27.12]I think
[29:28.12]I don’t know
[29:29.12]enough about
[29:30.12]what using
[29:31.12]rabbit in real life
[29:32.12]will actually be like
[29:33.12]to comment on
[29:34.12]that particular
[29:35.12]thing but I think
[29:36.12]the broader idea
[29:37.12]that you know
[29:38.12]you have a goal
[29:39.12]the agent knows
[29:40.12]how to break
[29:41.12]your goal down into steps
[29:42.12]the agent knows
[29:43.12]how to use
[29:44.12]the underlying
[29:45.12]software
[29:46.12]and systems
[29:47.12]or record
[29:48.12]to achieve
[29:49.12]that goal for you
[29:50.12]the agent may presents
[29:51.12]you information
[29:52.12]in a custom way
[29:53.12]that’s only
[29:54.12]you’re a power
[29:55.12]user
[29:56.12]for some niche thing
[29:57.12]general question
[29:58.12]so first of all
[29:59.12]I think like
[30:00.12]the sort of input
[30:01.12]mode conversation
[30:02.12]I wonder if you have
[30:03.12]any analogies
[30:04.12]that you like
[30:05.12]with self-driving
[30:06.12]because I do think
[30:07.12]there’s a little bit
[30:08.12]of how the model
[30:09.12]should perceive the world
[30:10.12]and you know
[30:11.12]the primary split
[30:12.12]in self-driving
[30:13.12]is LiDAR
[30:14.12]versus camera
[30:15.12]and I feel like
[30:16.12]most agent companies
[30:17.12]that I’m tracking
[30:18.12]are all moving towards
[30:19.12]camera approach
[30:20.12]which is like
[30:21.12]the multimodal approach
[30:22.12]that we’re doing
[30:23.12]you’re
[30:24.12]focusing on that
[30:25.12]including charts
[30:26.12]and tables
[30:27.12]and do you find
[30:28.12]inspiration there
[30:29.12]from the self-driving
[30:30.12]world?
[30:31.12]that’s a good question
[30:32.12]I think sometimes
[30:33.12]the most useful
[30:34.12]inspiration I’ve found
[30:35.12]from self-driving
[30:36.12]is the levels analogy
[30:37.12]I think that’s awesome
[30:38.12]but I think that
[30:39.12]our number one
[30:40.12]goals for agents
[30:41.12]not to look like
[30:42.12]self-driving
[30:43.12]we want to minimize
[30:44.12]the chances
[30:45.12]that agents are sort
[30:46.12]of a thing
[30:47.12]that you just
[30:48.12]have to bang
[30:49.12]your head at
[30:50.12]for a long time
[30:51.12]to get to like
[30:52.12]completely
[30:53.12]and that takes you
[30:54.12]all the way
[30:55.12]up to the top
[30:56.12]but similarly
[30:57.12]I mean
[30:58.12]compared to self-driving
[30:59.12]like two things
[31:00.12]that people really
[31:01.12]undervalue
[31:02.12]that’s like really
[31:03.12]easy to driving
[31:04.12]a car down
[31:05.12]highway 101
[31:06.12]in a sunny day
[31:07.12]demo
[31:08.12]that actually
[31:09.12]doesn’t prove anything
[31:10.12]anymore
[31:11.12]and I think
[31:12.12]the second thing
[31:13.12]is that
[31:14.12]as a non-self-driving
[31:15.12]expert
[31:16.12]I think one of the things
[31:17.12]that we believe
[31:18.12]really strongly
[31:19.12]is that
[31:20.12]everyone under
[31:21.12]get a lot
[31:22.12]of reliability
[31:23.12]is a really
[31:24.12]strong focus on
[31:25.12]actually why
[31:26.12]does the model
[31:27.12]not do this thing
[31:28.12]and the non-trivial amount
[31:29.12]of time
[31:30.12]the time the model
[31:31.12]doesn’t actually
[31:32.12]do the thing
[31:33.12]is because if
[31:34.12]you’re a wizard
[31:35.12]of ozing it yourself
[31:36.12]or if you have
[31:37.12]unreliable actuators
[31:38.12]you can’t do the thing
[31:39.12]and so we’ve
[31:40.12]had to fix
[31:41.12]a lot of those problems
[31:42.12]I was slightly
[31:43.12]surprised just because
[31:44.12]I do generally
[31:45.12]consider the way
[31:46.12]most that we see
[31:47.12]all around San Francisco
[31:48.12]as the most
[31:49.12]I guess real case
[31:50.12]it’s a big
[31:51.12]job but it has taken
[31:52.12]a long time
[31:53.12]for self-driving
[31:54.12]temperature from
[31:55.12]when it entered
[31:56.12]the consciousness
[31:57.12]and the driving down
[31:58.12]when it went on a sunny
[31:59.12]day moment
[32:00.12]happened to now.
[32:01.12]so I want to see
[32:02.12]the more compressed
[32:03.12]cruise, you know,
[32:04.12]R.I.P.
[32:05.12]recently.
[32:06.12]and then one more thing
[32:07.12]on just like
[32:08.12]just going back on
[32:09.12]this reliability
[32:10.12]thing, something
[32:11.12]I have been holding
[32:12.12]in my head
[32:13.12]that I’m curious
[32:14.12]to get your commentary on
[32:15.12]is I think there’s a
[32:16.12]treatup between
[32:17.12]reliability and generality
[32:18.12]or I want to broaden
[32:19.12]because you have
[32:20.12]reliability also have
[32:21.12]cost of speed
[32:22.12]speed is a huge emphasis
[32:23.12]for a debt
[32:24.12]the tendency or the
[32:25.12]attemptation is to reduce
[32:26.12]generalityto improve
[32:27.12]reliability
[32:28.12]and to improve
[32:29.12]cost improve speed
[32:30.12]do you perceive a tradeoff
[32:31.12]do you have any
[32:32.12]insights that
[32:33.12]solve those tradeoffs
[32:34.12]for you guys
[32:35.12]there’s definitely a tradeoff
[32:36.12]if you’re at
[32:37.12]the predo frontier
[32:38.12]I think a lot of folks
[32:39.12]aren’t actually
[32:40.12]at the predo frontier
[32:41.12]I think the way you get
[32:42.12]there is basically
[32:43.12]how do you frame
[32:44.12]the fundamental
[32:45.12]agent problem in a way
[32:46.12]that just continues
[32:47.12]to benefit from data
[32:48.12]I think one of
[32:49.12]the main ways
[32:50.12]of being able to solve
[32:51.12]that particular tradeoff
[32:52.12]is you basically
[32:53.12]just want to formulate
[32:54.12]the problem such that
[32:55.12]every particular use
[32:56.12]case just looks like
[32:57.12]you collecting more
[32:58.12]data to go make
[32:59.12]that use case possible
[33:00.12]I think that’s how
[33:01.12]you really solve it
[33:02.12]then you get into the
[33:03.12]other problems like
[33:04.12]are you overfitting
[33:05.12]on these end use cases
[33:06.12]right but like you’re
[33:07.12]not doing a thing
[33:08.12]where you’re like
[33:09.12]being super prescriptive
[33:10.12]for the end steps
[33:11.12]that the model can
[33:12.12]only do for example
[33:13.12]then the question becomes
[33:14.12]kind of do you have
[33:15.12]one sort of house model
[33:16.12]they customize
[33:17.12]the customer’s
[33:18.12]specific use case
[33:19.12]we’re not sharing
[33:20.12]we’re not sharing
[33:21.12]it’s tempting
[33:22.12]but that doesn’t
[33:23.12]look like AGI to me
[33:24.12]you know what I mean
[33:25.12]that is just
[33:26.12]you have a good
[33:27.12]base model
[33:28.12]and then
[33:29.12]you fine tune it
[33:30.12]for what it’s worth
[33:31.12]I think there’s
[33:32.12]two paths
[33:33.12]to a lot more
[33:34.12]capability coming out
[33:35.12]of the models
[33:36.12]that we
[33:37.12]all are training
[33:38.12]these days
[33:39.12]one path
[33:40.12]is you figure out
[33:41.12]how to spend
[33:42.12]compute and turn
[33:43.12]into data
[33:44.12]and so in that
[33:45.12]path I consider
[33:46.12]off play
[33:47.12]all that stuff
[33:48.12]the second path
[33:49.12]is how do you
[33:50.12]get super
[33:52.12]competent
[33:53.12]high intelligence
[33:54.12]demonstrations
[33:55.12]from humans
[33:56.12]and I think
[33:57.12]the right way
[33:58.12]to move forward
[33:59.12]is you kind of
[34:00.12]want to combine the two
[34:01.12]the first one
[34:02.12]gives you maximum
[34:03.12]sample efficiency
[34:04.12]for the second
[34:05.12]but I think
[34:06.12]that is going to be
[34:07.12]hard to be running
[34:08.12]at max speed
[34:09.12]towards AGI
[34:10.12]without actually
[34:11.12]solving a bit of both
[34:12.12]you haven’t talked
[34:13.12]much about synthetic
[34:14.12]data as far as I can
[34:15.12]any insights
[34:16.12]on using synthetic
[34:17.12]data to augment
[34:18.12]the expensive
[34:19.12]human data
[34:20.12]the best part
[34:21.12]about framing AGI
[34:22.12]is being able
[34:23.12]to help people do
[34:24.12]things on computers
[34:25.12]is you have an environment
[34:26.12]yes
[34:27.12]so you can
[34:28.12]simulate all of it
[34:29.12]you can do a lot
[34:30.12]of stuff
[34:31.12]when you have an environment
[34:32.12]we were having dinner
[34:33.12]for our one year
[34:34.12]anniversary
[34:35.12]the other round
[34:36.12]thank you
[34:37.12]Raza from human
[34:38.12]loop was there
[34:39.12]and we mentioned
[34:40.12]you were coming on
[34:41.12]the pod
[34:42.12]this is our first
[34:43.12]so he submitted a question
[34:44.12]now you had
[34:45.12]gbd4 vision
[34:46.12]and help you
[34:47.12]building a lot
[34:48.12]of those things
[34:49.12]how do you think
[34:50.12]about the things
[34:51.12]that are unique to you
[34:52.12]as a depth
[34:53.12]and like going back
[34:54.12]to like the maybe
[34:55.12]research direction
[34:56.12]that you want to take
[34:57.12]the team and what you
[34:58.12]want people to come
[34:59.12]work on at a depth
[35:00.12]versus what is maybe
[35:01.12]not become commoditized
[35:02.12]that you didn’t expect
[35:03.12]everybody would
[35:04.12]have access to
[35:05.12]yeah that’s
[35:06.12]a really good question
[35:07.12]I think implicit
[35:08.12]in that question
[35:09.12]and I wish he were
[35:10.12]tier two so he can
[35:11.12]push back on my
[35:12.12]assumption about his
[35:13.12]questionbut I think
[35:14.04]is calculus of where
[35:16.04]does advantage a crew
[35:18.04]in the overall
[35:19.04]ML stack
[35:20.04]and maybe part
[35:21.04]of the assumption
[35:22.04]is that advantage
[35:23.04]a crew is solely
[35:24.04]to base model scaling
[35:25.04]but I actually
[35:26.04]believe pretty strongly
[35:27.04]that the way
[35:28.04]that you really
[35:29.04]win is that you
[35:30.04]have to go build
[35:31.04]an agent stack
[35:32.04]that is much more
[35:33.04]than that
[35:34.04]of the base model itself
[35:35.04]and so I think
[35:36.04]like that is
[35:37.04]always going to be
[35:38.04]a giant advantage
[35:39.04]of vertical integration
[35:40.04]I think like
[35:41.04]it lets us do things
[35:42.04]like have a really
[35:43.04]bad cat and dog
[35:44.04]photo
[35:45.04]it’s pretty good
[35:46.04]at cat and dog
[35:47.04]photo
[35:48.04]it’s not like
[35:49.04]soda at cat
[35:50.04]and dogphoto
[35:51.04]so like we’re allocating
[35:52.04]our capacity wisely
[35:53.04]is like one thing
[35:54.04]that you
[35:55.04]really get to do
[35:56.04]I also think that
[35:57.04]the other thing
[35:58.04]that is pretty
[35:59.04]important now
[36:00.04]in the broader
[36:01.04]foundation modeling
[36:02.04]space is
[36:03.04]I feel despite any
[36:04.04]potential concerns
[36:05.04]about how good
[36:06.04]is agents as
[36:07.04]like a startup area
[36:08.04]like we were talking
[36:09.04]about earlier
[36:10.04]I feel super good
[36:11.04]that we’re
[36:12.04]cap just flowing
[36:13.04]from can we make
[36:14.04]a better agent
[36:15.04]because right now
[36:16.04]I think we all see
[36:17.04]that you know
[36:18.04]if you’re training
[36:19.04]on publicly available
[36:20.04]web data
[36:21.04]you put in the
[36:22.04]flops and you do
[36:23.04]reasonable things
[36:24.04]then you get
[36:25.04]decent results
[36:26.04]and if you just
[36:27.04]double the amount
[36:28.04]of compute
[36:29.04]then you get
[36:30.04]predictably
[36:31.04]better results
[36:32.04]and so I think
[36:33.04]pure play foundation
[36:34.04]model companies
[36:35.04]are just going to be
[36:36.04]pinched by how
[36:37.04]good the next couple
[36:38.04]lamas are going to be
[36:39.04]and the next
[36:40.04]what good open source
[36:41.04]on these base foundation
[36:42.04]models I think it’s
[36:43.04]gonna commoditize a lot
[36:44.04]of the regular llms
[36:45.04]and soon regular
[36:46.04]multimodal models
[36:47.04]so I feel really good
[36:48.04]that we’re just focused
[36:49.04]on agents so you
[36:50.04]don’t consider yourself
[36:51.04]a pure play foundation
[36:52.04]model company no
[36:53.04]because if we were pure
[36:54.04]play foundation model
[36:55.04]company we would be
[36:56.04]traininggeneral foundation
[36:57.04]models that do
[36:58.04]summarization and
[36:59.04]all this dedicated
[37:00.04]towards the agent
[37:01.04]yeah and our business
[37:02.04]is an agent business
[37:03.04]we’re not here to
[37:04.04]sell you tokens right
[37:05.04]and I think like
[37:06.04]selling tokens unless
[37:07.04]there’s like yeah I
[37:08.04]love it there’s like
[37:09.04]if you have a particular
[37:10.04]area of specialty
[37:11.04]right then you won’t
[37:13.04]get caught in the fact
[37:14.04]that everyone’s just
[37:15.04]scaling to ridiculous
[37:16.04]levels of compute
[37:17.04]but if you don’t have a
[37:18.04]specialty I find that
[37:19.04]I think it’s gonna be
[37:20.04]a little tougher
[37:21.04] Interesting. Are you interested in robotics at all?
[37:23.04] Just personally fascinated by robotics — I have always loved robotics. Embodied agents as a business, you know — Figure is a big one, also the OpenAI-affiliated company that raised a lot of money. I think it's cool. I mean, I don't know exactly what they're doing, but robots, yeah.
[37:38.04] Well, I mean, that's the question — like, if we had them on, what would you ask them?
[37:44.04] Oh, I just want to understand what their overall strategy is going to be between now and when there's reliable stuff to be deployed. But honestly, I just don't know enough about it.
[37:53.04] And if I told you, hey, fire your entire warehouse workforce and, you know, put robots in there — isn't that a strategy?
[37:59.04] Oh yeah, yeah — sorry, I'm not questioning whether they're doing smart things; I genuinely just don't know what they're doing as much. But I think there's two things. One, I think it's just going to work. I will die on this hill. Again, this whole time we've been on this podcast I've just been continually saying these models are basically behavioral cloners, right? So let's go behavioral clone all this robot behavior, and then you figure out everything else you have to do in order to teach it how to solve new problems. That's going to work — I'm super stoked for that. But unlike what we're doing with helping humans with knowledge work, I'm personally less excited about that.
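Behavioral cloning here just means supervised learning on logged demonstrations: predict the demonstrated action from the observed state. A minimal sketch under that reading — the observation/action sizes, the tiny MLP, and the random stand-in data are all hypothetical, not anyone's actual robotics stack:

```python
# Minimal behavioral-cloning sketch (illustrative only).
# Treat recorded (observation, action) pairs from demonstrations as a plain
# supervised dataset and regress the demonstrated action from the state.
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 7  # e.g. proprioception + task features -> joint targets
policy = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for a real demonstration log (random data here).
demo_obs = torch.randn(4096, obs_dim)
demo_act = torch.randn(4096, act_dim)

for step in range(200):
    idx = torch.randint(0, demo_obs.shape[0], (256,))
    pred = policy(demo_obs[idx])
    loss = nn.functional.mse_loss(pred, demo_act[idx])  # clone the demonstrated action
    optim.zero_grad()
    loss.backward()
    optim.step()
```

The "everything else" David mentions — recovering from states the demonstrations never visited, generalizing to new tasks — is exactly what plain cloning does not give you for free.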
[38:39.04] We had Kanjun from Imbue on the podcast, and we asked her why people should go work there and not at Adept. She said, you know, there's space for everybody in this market and we're all doing interesting work, and that they're really excited about building an operating system for agents; for her, the biggest research thing was getting models to be better at reasoning and planning for these agents. So the reverse question: why should people be excited to come work at Adept instead of Imbue, and maybe, what are the core research questions that people should be passionate about to have fun at Adept?
[39:16.04] Yeah, first off, I think that — and I'm sure you guys believe this too — the AI space, and the AI agent space within it, are both, exactly as she likely said, colossal opportunities, and people are just going to end up winning in different areas, and a lot of companies are going to do well. But to be at Adept — I think there's two huge reasons to be at Adept.
[39:40.04] I think one of them is that everything we do is in the service of useful agents. We're not a research lab; we do a lot of research in service of that goal, but we don't think about ourselves as a classic research lab at all. And I think the second reason to work at Adept is: if you believe that actually having customers, and a reward signal from customers, lets you build AGI faster — which we really believe — then you should come here. And I think the examples are evaluations. They're not academic evals, they're not simulator evals; they're: okay, we have a customer that really needs us to do these particular things, we can do some of them, these other ones they want us to do we can't do at all — we've turned those into evals. Solve it.
[40:22.04] I think that's really cool. Everybody knows a lot of these evals are pretty saturated, and even the new ones that aren't saturated, you look at one and you're like, is this actually measuring anything? But these are very grounded in actual needs right now, which is really cool.
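A rough sketch of what "turning a customer need into an eval" could look like: each case is a concrete workflow with a programmatic pass/fail check, and the suite score is what you drive up until the customer is unblocked. The case names and the run_agent hook are hypothetical, not Adept's actual harness:

```python
# Hypothetical customer-grounded eval suite (illustrative only).
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    name: str
    instruction: str
    check: Callable[[str], bool]  # did the agent's final output satisfy the need?

cases = [
    EvalCase("invoice_total", "Extract the total from the attached invoice",
             check=lambda out: "1,284.00" in out),
    EvalCase("crm_update", "Log the call notes against the right account",
             check=lambda out: "account updated" in out.lower()),
]

def run_suite(run_agent: Callable[[str], str]) -> float:
    """run_agent is whatever executes the agent end-to-end and returns its output."""
    passed = sum(case.check(run_agent(case.instruction)) for case in cases)
    return passed / len(cases)  # the number to move, unlike a saturated benchmark
```

Unlike an academic benchmark, the cases start out partly unsolved by construction, which is the point.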
[40:38.04] Yeah, this has been a wonderful dive. I wish we had more time, but I'll just leave it kind of open to you — I think you have broad thoughts about the agent space, but also the general AI space. Any sort of rants, or things that are just top of mind for you right now?
[40:51.04] Any rants? We're just mining you for general rants.
[40:54.04] Wow, okay. So Amelia's already made the rant better than I have, but "not just chatbots" is kind of rant one. Rant two is: AI has really been the story of compute, and compute plus data, and the ways in which you can trade one for the other. And I think, as much as our research community is really smart and we have made many, many advancements — and that's going to continue to be important — now I think the game is increasingly changing, and the rapid industrialization era has begun. And I think we, unfortunately, have to embrace it.
[41:25.04] Excellent. Awesome. David, thank you so much for your time.
[41:28.04] Cool, yeah — thanks guys, this was fun.
[41:30.04] Thank you.