Scaling AI: Where We Are Today
As frontier models continue to get larger and more capable, Katherine and Anna discuss the challenges and breakthroughs in AI architecture, energy consumption and data requirements that are shaping the future of technology.
Katherine Forrest: Good morning, everyone, and welcome to today's episode of “Waking Up With AI,” a Paul, Weiss podcast. I'm Katherine Forrest.
Anna Gressel: And I'm Anna Gressel. So, Katherine, you know I've been in Abu Dhabi having a really fun set of meetings.
Katherine Forrest: I know you have been there, I think, for some time. And the difference in the time zones is killing us in terms of coordinating work. So it's good that you don't really need any sleep.
Anna Gressel: I kind of sleep odd hours like a bat. But anyway, I've been really thinking a lot about some of the challenges we're seeing across the globe with respect to the continued scaling of AI.
Katherine Forrest: So let's pause and tell our audience what we mean by the word scaling, because that word scaling is used a lot now in the AI area.
Anna Gressel: Yeah, I'll talk about a particular meaning. So for me, I mean the ability to increase in reach and scope, like the phrase to scale up.
Katherine Forrest: Got it. And so you were saying...
Anna Gressel: Well, I'm starting from the premise that we're seeing all of these extraordinary developments in AI models.
Katherine Forrest: Right, like last week we did an episode on why AI is not in a hype cycle as part of that, and we mentioned in passing some of the newest highly capable models such as the Llama Herd of models, OpenAI's o1 model, Falcon 2 and there are others. But just to name a few, there are these extraordinary developments in AI models.
Anna Gressel: Right, and the question on a lot of minds these days is whether AI models can continue to scale up or really scale across the globe.
Katherine Forrest: And by that you really mean increase in capability.
Anna Gressel: That's one of the issues. But also to advance capabilities, they have to add robustness to their existing capabilities.
Katherine Forrest: Well, and we do know that certain things are needed for these powerful AI models, and we start with an architecture that has to have an inherent ability to scale.
Anna Gressel: Yeah, and the transformer architecture is certainly able to scale.
Katherine Forrest: Right, and the transformer architecture is what the public knows as the GPT family of models or what the OpenAI o1 model is based on, the Llama herd of models.
Anna Gressel: Yeah, and Falcon 2 is based on Mamba, a different architecture we discussed in a prior episode. It's a structured state space sequence architecture that has a number of differences, but one is a different attention mechanism.
Katherine Forrest: But for all of these models, after we have the architecture, you have to have what we call “compute.” And I'm actually going to take compute and break it into two pieces. People often think about it only as chips, but I want to add in also the energy component that powers the whole thing. So I'm going to sort of take compute and talk about it both as energy that's needed to help scale up these AI models and chip processing.
Anna Gressel: Yeah, and we all know that to train these models takes an extraordinary amount of energy. Right now, most of that energy is from the electrical grid based on fossil fuels, but it can come from solar, wind, water, potentially nuclear energy.
Katherine Forrest: Right, and there's one study that was recently published that, as you know, I keep waving around to you. It's like the OpenAI o1 system card, which I wave around all the time. Right now, I'm really hot on the Epoch AI paper called, “Can AI Scaling Continue Through 2030?” And I highly recommend it to people. But it actually puts compute as expanding at a rate of four times a year.
Anna Gressel: That is incredible. And there's a question about where all of this energy is going to come from. There are power plants being built specifically to support the energy that AI training and processing needs.
Katherine Forrest: All right, and this all takes an enormous amount of investment and financial resources.
Anna Gressel: Yeah, absolutely. And then the other aspect of compute, as many of us know, relates to chips. Chips’ capacity and their availability are two really significant inputs into scaling AI. And to get the processing levels that we have mentioned in prior episodes for frontier models, you need continued chip developments in packaging and memory, among other things.
Katherine Forrest: And there are some really amazing companies out there that are making serious advancements in these areas.
Anna Gressel: And with increased access to energy and chip developments, the Epoch AI article predicts that models could have training runs that are 5,000 to 250,000 times larger than GPT-4. I have to even just like pause to process that in my head. And perhaps some of those are going to be by companies we're familiar with today.
Katherine Forrest: Or new entrants. But hold on before we go on. Did I call it “eh-pik” AI and it's actually “ee-pok” AI? Which one of us got that wrong?
Anna Gressel: I have no idea.
Katherine Forrest: Okay, well somebody will tell us.
Anna Gressel: Maybe they can sub in whoever got it right. We can both say it both ways now.
Katherine Forrest: No, let's just keep this whole colloquy in because maybe our audience will know.
Anna Gressel: These are the deep cuts on our banter that get cut normally.
Katherine Forrest: Yeah, don't cut this one out. Okay, but there really could be new entrants who will come in.
Anna Gressel: For sure, there will be companies that we have not even heard of yet today that will be entrants into the AI development area, but they'll either need access to enormous amounts of private capital in order to access the resources that are needed or to state-backed resources.
Katherine Forrest: Right, but there's another really, really large constraint on scaling that we have to mention as well, and that's data.
Anna Gressel: Yeah, definitely. GenAI only learns about the world through data that we feed it.
Katherine Forrest: And its ability to make conceptual connections and to have really some of the emerging capabilities that are making these models really special and that are surprising us comes from the enormous amount of data that's being fed into the models along with their extraordinary architecture. And the data is being fed in during the training process, and it takes an enormous amount of it to actually complete one of those training runs.
Anna Gressel: Yeah, and so the question that a lot of folks are asking is, is there even enough data in the world to continue to scale these models?
Katherine Forrest: Right, and one of the things that technologists are debating is whether we can actually use AI tools themselves to create what they're calling synthetic data and then feed that data, which is sort of created data (that's why they call it synthetic), into another model to train that other model.
Anna Gressel: Yeah, and one of the concerns with synthetic data is if that kind of synthetic data that's created by models contains errors of any sort, it will sort of bake those in permanently.
Katherine Forrest: Or that there'll be a kind of a white noise or fuzz around synthetic data that ends up with a lower quality data.
Anna Gressel: Yeah, and we know that models perform better when they're trained on higher quality data.
Katherine Forrest: I've started to hear of several companies that are actually contracting out for the creation of data by humans for the specific purpose of creating data that would then be used to train AI models.
Anna Gressel: Definitely, like assigning a PhD dissertation, but the real purpose of that dissertation is actually to train the model.
Katherine Forrest: Let's turn to multimodal models because they actually present a slightly different circumstance. We've talked about multimodal models or MLLMs in prior episodes, and we know that the “data” that they need for training includes a lot of different types of data — different modes of data if you will — video, audio, things like that.
Anna Gressel: Yeah, they have an enormous appetite for video. So there'll be a need to use that video in order to scale up model training, but also things like audio, still images, thermal data, GPS data. I mean, you and I could go on about multimodal data training all day.
Katherine Forrest: For all of these models, we've got energy, we have chips, we have data and those are just a few of the really big gating factors to scaling.
Anna Gressel: Yeah, but there's a lot that goes along with those things, like buildings and physical infrastructure where the training runs occur and where the servers reside for processing. So we're just kind of talking about the beginning of all this infrastructure that's needed.
Katherine Forrest: But the bottom line, I think, and my big takeaway, Anna, is that these resources are, while enormous, they're achievable.
Anna Gressel: Yes, but they do cost a lot of money.
Katherine Forrest: Right, that is absolutely correct, and that's going to be something that will be talked about, I think, by a lot of folks around the world. But it is achievable, so that scaling AI to these incredible levels is also achievable.
Anna Gressel: Right, and we've talked about frontier models being 10^25 or 10^26 FLOPs, depending on the regulatory regime that pulls in some additional reporting obligations. But by 2030, we could actually have models that are something like 2 x 10^29 FLOPs. As Epoch AI states in its paper, that would represent like a 10,000-fold scale up in the next six years.
Katherine Forrest: Okay, “ee-pok” AI or “eh-pik” AI? It's for you, the audience, to determine. Okay, well, but what we know is that a 10,000-fold scale up in the next six years, it's a lot.
Anna Gressel: Yeah, and there would be a lot that we would start to see in terms of capabilities if we get to that point, presumably.
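Editor's note: The 10,000-fold figure and the four-times-a-year growth rate mentioned in this episode can be checked against each other with some quick arithmetic. The sketch below is illustrative only. It assumes a GPT-4-scale baseline of roughly 2e25 FLOP (an outside estimate, not a number given in the episode) and a 2030 projection of roughly 2e29 FLOP. The function names are made up for this note.

```python
import math

# Assumed figures for illustration (the baseline is not stated in the episode):
BASELINE_FLOP = 2e25    # rough estimate for a GPT-4-scale training run
PROJECTED_FLOP = 2e29   # rough 2030 projection discussed in the episode
GROWTH_PER_YEAR = 4.0   # "four times a year" growth in compute


def scale_up(baseline: float, projected: float) -> float:
    """Return how many times larger the projected run is than the baseline."""
    return projected / baseline


def years_to_reach(factor: float, growth_per_year: float) -> float:
    """Return the years of compounding growth needed to reach a given factor."""
    return math.log(factor) / math.log(growth_per_year)


factor = scale_up(BASELINE_FLOP, PROJECTED_FLOP)
years = years_to_reach(factor, GROWTH_PER_YEAR)

print(f"Scale-up over baseline: {factor:,.0f}x")
print(f"Years of 4x-per-year growth needed: {years:.1f}")
```

Under these assumptions, the scale-up comes out to about 10,000 times, and sustaining four-times-a-year growth would get there in a little over six and a half years, which is broadly in line with the "next six years" framing in the conversation.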
Katherine Forrest: Right, okay. Well, that's all we've got time for today. I'm Katherine Forrest.
Anna Gressel: And I'm Anna Gressel. Thanks for joining us.