Episode Summary

Up until very recently, most of the software and services using artificial intelligence haven’t gotten much public attention, and that’s because the primary users of AI historically have been governments, militaries, and giant corporations, all of which have one thing in common: huge amounts of data and huge responsibilities. In November of last year, however, the AI industry leaped into the media spotlight with the launch of ChatGPT, a text-based large language model program that can rapidly generate massive quantities of text in the form of articles, stories, programming code, and even poetry in response to typed human input.

The attention has been nothing less than massive. The website for ChatGPT received 1.1 billion visits in January, according to web statistics firm Similarweb. The technology industry is now in an AI arms race as Google, Amazon, and others have raced to get into the gold rush that Microsoft and its partners at OpenAI created with the launch of ChatGPT.

Artificial intelligence, or at least what we’re calling it, is everywhere now. But what does AI even mean and how does it work? It’s uncharted territory, and it seems that the technologies are racing ahead in a manner that is unplanned and certainly unregulated.

We don’t know what the future will hold for these “large language model” (LLM) systems but do we even understand how they’re impacting things in the present? How much does it matter that programs like ChatGPT don’t understand the words they generate?

There’s a lot to talk about here, and we’re going to be discussing it with two panelists in this episode. Gary N. Smith is a professor of economics at Pomona College and the author of the new book, Distrust: Big Data, Data-Torturing, and the Assault on Science. Jeff Schatten is an associate professor at Washington and Lee University in Virginia.



MATTHEW SHEFFIELD: Thanks for being here, gentlemen.

JEFF SCHATTEN: Glad to be here, thanks.

SHEFFIELD: All right. So let’s lay down the groundwork here first by talking about what AI is. There’s a lot of confusion, I think, in terms of what people think it is versus what it actually is. So let’s start with you, Gary. Why don’t you tell us all about it here?

GARY N. SMITH: Well, the term AI, artificial intelligence, was coined back in 1956 at a summer conference at Dartmouth. And there were a lot of luminary people, very smart people, and they were talking about how the human brain inputs information, processes, and then outputs results. And so why can’t computers do that?

And so these people were promising that within a few years, computers would be as smart as humans. In a few years, computers would take over all the jobs that humans do. And of course, that hasn’t happened. We’re nearly 70 years later, and we’re still struggling. And the human brain turns out to be a lot more complicated than they anticipated.

And also, we don’t understand exactly how the human brain does all the amazing things it does. And so AI has gone through these cycles of “AI winters” where they promised things and underdelivered and they lost funding. And then they came back, and then they over promised underdelivered, and (laughs) then another AI winter, and then they came back.

And recently there’s been an upsurge as you noted, in interest in AI, and I think it was 2019, the National Association of Advertisers in the US said AI was the marketing word of the year. And all these companies were saying ‘we do AI,’ and they’re basically just doing software programs. And it wasn’t really any, anything special, but as you noted the real bombshell that’s happened was in November when ChatGPT was released.

And it is absolutely astonishing what it does. And particularly given what it is, all it is, it takes a large database of text, looks for statistical patterns, and then generates text by looking at what are the most likely sequences of words to output. And it can do just amazing things that are absolutely astonishing.

But it isn’t an intelligence in any meaningful sense of the word. And it can’t, doesn’t know what any of the words mean. And so it flounders greatly when it has to do anything that requires actual intelligence.

SHEFFIELD: Yeah, that’s right. And in terms of how it functions though, let’s maybe talk about that. Like how does ChatGPT choose the words? And people have made the analogy, and I think it’s a good one, that it functions actually very similar to the way that your autocorrect on your phone works to choose what word comes next.

SMITH: I believe that’s actually the origin of it. They were trying to improve the auto complete and there are things like if you say ‘I fell,’ well, probably down is a likely next word that you might have. And it just, it does that, but it takes it into the context and it doesn’t just look at the last word. It looks at several words and it doesn’t just do it one sentence at a time.

It remembers previous sentences. And so it is much, much improved in terms of the auto complete function, but it isn’t much more than that in terms of making coherent, even articulate sentences that it generates, it can generate sentences and generate paragraphs and generate essays, which sound like they’re written by humans.

But underneath the fluff, it often makes terrible mistakes. It often says biased things. It often goes off the rails. And because it doesn’t know what it’s saying, it doesn’t know what any of those words mean. And so it can’t judge whether what it says is true or false. It wasn’t trained to do that.

And so it has no way of judging whether its statements are true or false. It has no way of judging whether it’s racist or sexist. It has no way of judging whether it’s inflammatory or unhinged. And so that’s the fatal flaw in AI. And it’s been the, it’s been the fatal flaw, I’m sorry. It’s the fatal flaw of ChatGPT, and it’s been the fatal flaw of AI since the beginning in that these programs look for statistical patterns, statistical regularities, and they base their output on those regularities, but they have no way of judging whether it’s correlation or causation.

And so they’ll come up with statistical patterns which are temporary, fleeting, meaningless.
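The autocomplete analogy discussed above can be made concrete with a very small sketch. This is not how ChatGPT is implemented; it is a toy word-count model, assuming a made-up corpus and hypothetical function names (`train`, `predict_next`), that shows what "pick the most likely next word given the last few words" means in its simplest form.

```python
from collections import Counter, defaultdict


def train(text, context_size=2):
    """Count which word follows each run of `context_size` words."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for i in range(len(words) - context_size):
        context = tuple(words[i:i + context_size])
        counts[context][words[i + context_size]] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent follower of `context`, or None if unseen."""
    followers = counts.get(tuple(w.lower() for w in context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, purely for illustration.
corpus = "i fell down the stairs . i fell down again . i fell asleep early ."
model = train(corpus)

# "down" follows "i fell" twice and "asleep" once, so "down" wins.
print(predict_next(model, ["i", "fell"]))
```

Nothing in the counts records what "fell" or "down" means. The model only knows which words tended to follow which, and that is the limitation Smith is describing.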


SCHATTEN: I have a question. I actually have a question for you, Gary. Yeah. So at what point is the system so effective? So let’s say the next stage of ChatGPT is going to be where a lot of the errors are, actually the errors become fewer and fewer.

SMITH: Yeah. Yeah.

SCHATTEN: And so stage one will be just a more sophisticated system that’s just predicting the next word. I mean, that’s all it’s doing.

SMITH: Yeah.

SCHATTEN: As you said, it doesn’t actually understand anything. So is there a point where that becomes so compelling and the error rate is down that we don’t actually care if it understands anything at all, if it’s really effective?

And then the second question is what if we do get to a point where there is understanding and what does that look like? Do we end up with are we, how far are we from an AGI?

SMITH: I’ll answer the second question first, which I don’t think scaling up the thing is ever going to create understanding as long as we don’t know work–

SHEFFIELD: Let’s, I’m sorry. Can we, let’s define AGI first for people who don’t know that acronym.

SMITH: Yeah. So AGI is so artificial intelligence that works right now, it does really well, does so on very specific focused tasks like winning a game of Go, or winning a game of chess, or scanning barcodes or robots on an assembly line.

But AGI is, well, intelligence in general is a highly debated thing. What, how do you measure intelligence? How do you quantify intelligence? Because there’s often many different kinds of intelligence. But the idea of AGI is you can learn things in one context and apply them in a different context. It’s what the human brain does.

And so we observe things. We make, try to make sense of the world because we live in the real world. And then we see something and we apply what we’ve learned previously, and we apply that knowledge to the current thing.

SCHATTEN: And so, so do you think we, do you think we get there to an artificial general intelligence where the machines are simply better at pretty much everything than human beings?

Or do you think I don’t have, so we’ve seen estimates, we’ve seen estimates that maybe it’s 2030 and maybe it’s 2080. Where do you come down on that?

SMITH: I don’t know where it’s going to come down. I mean, Doug Hofstadter is a guy, one of the early pioneers who got frustrated with trying to model the human brain because the human brain is just so absolutely astonishing.

And he says it won’t be in his lifetime or his children’s lifetime, maybe his grandkids lifetime. In principle it could happen, but it’s not going to happen through just predicting words. And so I really–

SCHATTEN: But my first, but that’s my first question, which is at what point does the prediction mechanism get so good that the fact that it actually understands nothing falls off from being relevant?

SMITH: Yeah.

SCHATTEN: Let’s just imagine we have, so ChatGPT, so I started tinkering with the stuff a couple years ago. And with the early versions of GPT, it was ridiculously low level. They couldn’t even do middle school, couldn’t even do middle school writing.

SMITH: Yeah. Yep.

SCHATTEN: And then when I see what has happened in the last 18 months, which has culminated in ChatGPT, that’s what’s gotten all the headlines. But the advances were clearly happening week by week by week with GPT-3, the advances were astonishing.

So I was not actually surprised by ChatGPT, because I could see all the developments that have been happening. And we could have had this conversation 18 months ago, and you would’ve said, look, this thing doesn’t even write at the level of a middle schooler. And I would’ve said the same thing at that point, which is, no, it doesn’t.

But look at the rate of change. Look at how fast it is progressing, and maybe it doesn’t matter if it doesn’t understand anything. And now we’re a year, 18 months later, and now it can do a lot of writing I can’t do.

The sophistication, whether that’s in poetry, whether that’s in prose, whether that’s figuring out what I should cook for dinner and the recipe that I should use, or figuring out where I should travel in Europe this summer, I mean, I’m just going through some examples. I mean, it really lets you iterate back and forth.

At what point do we say, eh, it doesn’t understand anything, but it’s damn good. And for that reason it is.

SMITH: Yeah. It’s astonishing. It’s astonishing.

SHEFFIELD: Yeah. And actually, sorry, can I interject? Sorry, one thing to, to your point, Jeff, what you’re basically talking about here is kind of like a, it is a popular version of the Turing Test, which is named after the first real computer scientist, Alan Turing, who basically said that you could declare that a machine was intelligent if it could simulate output that a human would generate, and somebody would think that it was a human or couldn’t tell.

But Gary, you have written explicitly on this idea of whether that is an adequate test. Why don’t you talk (crosstalk) there in that regard.

SMITH: Yeah. Ironically, I think it’s actually too smart to pass the Turing Test, and you ask it questions and it’ll answer them.

SCHATTEN: It’s too smart.

SMITH: Smart, better than any human could ever answer them. So you’d have to dumb it down to pass the Turing Test oddly enough. And put in grammatical mistakes, and put in misspellings, and not answer questions, or answer them incorrectly in order to pass that test. And so it’s kind of an ironic thing.

Going back to Jeff’s point, I don’t like the word intelligence so much because it’s such a vague, amorphous thing. And I like the word competence.

And so you say something like, somebody applies for a job, 100 people apply for a job. Are you going to trust AI to decide who gets the job? Based on we look at the facial expressions, we look at the words on the resume, and does it have the competency to make a decision like that? Or to decide who gets a loan?

Or if somebody is convicted of a crime, to decide how long the prison sentence should be. Text generators can’t do that.

SCHATTEN: I mean, I mean, for all of those examples, for every one of those examples, we’ve already shown that computers are better than humans, or in combination with humans, you have the ideal output.

So we can start with the sentencing question. And this has been, this is, this was covered in Malcolm Gladwell’s new book, Talking to Strangers, where he goes through, I mean, all of the evidence of, it’s not that it’s perfect, it’s that should an offender be given parole, right? Should they be given parole? Should an offender, should they be given bail?

I mean, these are the main questions, core questions in our criminal justice system. And it’s already shown that it’s not that if you feed it into an algorithm that it’s perfect. It’s that it’s much better than the judges and it’s not as racist.

So it’s already been shown in that example on hiring, on mortgage applications, yes.

And if you take something like Lending Club, which it’s a small company that does, that does much more sophisticated lending inputs for consumers.

And it’s taking in a hundred different data points, and there’s really not a person involved. And yes, their decisions on who gets a loan, who doesn’t, is better than the person that’s sitting across from you who’s filled with bias and filled with their opinion of you and your character as opposed to the hundred different data points that are on paper.

I mean, we could pretty much point to anything at this point and show that either a combination of algorithms and humans or algorithms by themselves is superior.

SMITH: Yeah. I think the evidence is exactly the opposite. Like in the sentencing thing, the evidence I’ve seen is that algorithms don’t do any better than looking at the age of the person, and the gender of the person, and how many times they’ve been arrested. And in addition, it is terribly biased, biased against Black people in particular.


SCHATTEN: There was, I had, so I had a, I had a, so just to kind of take your side as well to show that yeah, obviously these things are very complicated.

I had a podcast a couple years ago that I did for a couple years, and I had the CEO of Progressive Insurance on my podcast, and I asked her, I really wanted to drill down on their use of AI and algorithms for insurance purposes, for Progressive.

And she just shut it down and wouldn’t even discuss it. Because there actually are a lot of reports, especially with that company, in terms of bias in the algorithms.

Look, algorithms are only as good as garbage in, garbage out. It’s only as good as the information coming in. So if information coming in is biased, you’re going to have bias with them.

I don’t mean to stand to defend algorithms at this point. That’s why, I think the current best practice though is this combination of AI and humans. And I think of the best example that I’ve come across, and this is not in terms of deciding the fate of people.

But Beethoven’s 10th Symphony is a combination, right? So Beethoven wrote nine symphonies and he died. And when he died, the 10th symphony was 5 percent of the way done. And so that’s it, right? So for hundreds of years, we’ve had 5 percent of a symphony of Beethoven. And that got fed into a predictive algorithm that produced thousands of different potential symphonies, each of which was flawed, right?

So it’s pumping out this, it’s pumping out that, each one in and of itself has lots of flaws. So they took Beethoven scholars, and the Beethoven scholars pulled the best parts from the symphonies that the AI put out, and put it together for a full symphony that has been played by the Berlin Philharmonic.

And it’s astounding. I mean, it’s really amazing. And AI right now, sitting on its own is not always that compelling. It really does, for many things, take a person to curate it. My question is, we can see where this is going. 20 months ago it was, let’s say 80% humans, 20% AI for a lot of applications.

Today, maybe we’re at 50 50, but we can see the trend, right? We can see that for many areas of our society, we’re moving to where it’s more and more AI and less human. I’ll give you one other example. I’m a huge fan of ChatGPT and the AI art generative programs like Midjourney. So I had ChatGPT write a children’s story for my kids about how Sam and Milo were turned into chocolate by a magical dinosaur.

And then I had to take the input from the text that ChatGPT put out, and I created all of these images around it and I turned it into a kid’s book. Yeah. I’m a business professor and now I’ve written a kids’ book, right? (laughter)

Now, so that’s, but I had to curate it. And that was two months ago. Today that can be done without me.

I mean there are already forms where you can take the images, the text from ChatGPT on a story, and it automatically puts out these Midjourney-type AI images. And that’s just two months later. I’m just illustrating the trend.

SMITH: Yeah. Yeah.

SCHATTEN: So much getting rid of the, a lot of things,

SMITH: A lot of things to talk about there.

A couple years ago, I think it was one year ago, Admiral Insurance, which is Britain’s largest insurance company, said it was going to use AI to price car insurance. And they had it all set up to go, and they were going to base it on the words you used on your Facebook. For example, did you like Leonard Cohen or did you like Michael Jordan?

SCHATTEN: Definitely Leonard Cohen. (laughter)

SMITH: And they boasted how it’s not a static thing, that they’re constantly changing the words that they used to price insurance, which tells me it was, they were just basing it on coincidental correlations. They weren’t basing it on anything real. If it was real, it wouldn’t be changing all the time.

And it’s utter nonsense. And they actually didn’t do it. The day it was going to launch, Facebook stepped in and said, we have ‘Rule 34 subsection BC’ which prohibits anyone from using Facebook posts to price insurance.

SCHATTEN: Hmm, interesting.

SMITH: But the idea that you could use statistical correlations between words you use on Facebook and your driving habits is silly to me.

And it’s also likely to be discriminatory. I mean, whether you like Leonard Cohen or Michael Jordan has something to do maybe with your gender, and your race, and things like that. So that’s discrimination.

SCHATTEN: So the fact that there are going to be some really bad algorithms, I think is a given. We can look at the Zillow algorithm where they used very sophisticated modeling to figure out what to purchase, what to pay to purchase homes across the country.

They overpaid and they almost went bankrupt doing it.

SMITH: Yeah.

SCHATTEN: It’s a bad algorithm. And so yeah, we certainly will see many really bad algorithms. I don’t think that changes the trajectory of what we’re seeing, where the good ones are pretty compelling and yeah, there’s going to be lots of failed experiments in the process, but I don’t think that changes the fact that we’re seeing this, like this split between humans and computers is shifting in one direction and that’s moving more and more to computers. Yeah.

SMITH: Oh, Zillow. Zillow is an example of how they were convinced, even though it was a bad algorithm, they were convinced it was real. Because it’s AI. If it’s AI, it’s got to be right. Same thing, Amazon had created these algorithms for scanning resumes for job applications, and then they had to abandon it when they figured out that it was sexist.

And it was looking through the resumes, and if you went to an all-women’s college, you got dinged. If you played on an all-women’s sports team, you got dinged because there weren’t a lot of Amazon engineers who went to all women’s colleges or played on women’s sports teams.

And my fear is that people are going to think ChatGPT is so human-like, we can just let any old algorithm go out there and decide who gets insurance, who gets a job, how many years you go to prison, price, things like that, that people have an overestimation of how good AI is right now.

SCHATTEN: Well, I mean, I mean, I think that’s going to be the challenge going forward, right? I mean–


SCHATTEN: They’re only as good as their inputs are. I mean, I’ll give an example. Drawing from my discipline, I went down a rabbit hole with ChatGPT for an organizational behavior question.

And it took two core articles and I was able to ask it about statistical significance. I was able to ask it about theory development, about which authors agreed with this article, which authors disagreed with it, right?

And so, I had a full conversation with it, except one of the articles doesn’t exist.

So one of the articles exists, and the conversation was actually pretty good. And one of the articles just didn’t exist. And it was saying that it existed in one of the top management journals. And the author didn’t exist. So right now, yes, that is the current state of it. And another example, I asked ChatGPT, what is 5+5, and ChatGPT says 10.


SCHATTEN: And I said, no, it’s 8. And ChatGPT says, I’m sorry I was wrong. 5+5=8.

SMITH: It’s really terrible. It’s really terrible at math. The amazing math. It’s worse. It can’t do any math at all. It’s not trained on math. I mean,

SHEFFIELD: Well, actually, no, that, that’s a, that’s an interesting topic to talk about, Gary, for a second. Because yeah, the way that AI, both image and text generation, works is that they don’t actually process words. They process tokens. So they convert the words, parts of words.

SMITH: It could be a part of the word or whole word.

SHEFFIELD: Yeah, that’s right, into a part of a word. So they break it down and then they check for statistical significance with these tokens. And that does create fundamental problems for math.

So like, I’ve had some fun, and I’ve talked about it on Twitter, about how ChatGPT cannot even count the number of characters in your input in a valid way. Like, it will even give you a different answer each time. You can ask it, “How many characters are in the following?” And sometimes it will say 6, or sometimes it will say 10.
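To illustrate why counting characters is awkward for a system that sees tokens rather than letters, here is a toy sketch. It assumes a tiny hypothetical vocabulary (`VOCAB`) and a simple greedy longest-match rule; real systems use learned subword vocabularies and different algorithms, so treat this only as an illustration of the idea that text arrives as chunks.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of `text` into pieces found in `vocab`.

    Falls back to a single character when no vocabulary piece matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary, invented for this example.
VOCAB = {"count", "ing", " char", "act", "ers", " the"}

text = "counting characters"
pieces = tokenize(text, VOCAB)

print(pieces)       # the chunks the model would "see"
print(len(pieces))  # number of tokens
print(len(text))    # number of characters
```

In this sketch the phrase is 19 characters long but arrives as 5 chunks, so the character count is not directly visible in what the model processes.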

But at the same time to what Jeff’s saying, it is true that you can make exceptions. The way that LLMs process mathematical instructions isn’t the same way, but you can create exceptions to fork out to another function.

And so if you do ask ChatGPT to solve calculus problems for you, it actually can do it correctly. And it can solve trigonometry and other things like that. But if they haven’t made exceptions, then it can’t get it correct.

So there’s this kind of very basic fake encryption called ROT-13, which is where you just replace each letter of a statement with the letter 13 places later in the alphabet.

And so it’s like a very simple cipher of how to change text. And OpenAI never made an exception for ROT-13. So it actually can’t even do that, but at the same time, it can do calculus because they created the functions for it.
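For readers who have not seen ROT-13, the rule described above fits in a few lines of ordinary code. This is a minimal sketch with a hypothetical helper name (`rot13`), assuming that letters wrap around from Z back to A and that anything other than the 26 English letters is left unchanged.

```python
import string


def rot13(text):
    """Shift each English letter 13 places, wrapping around the alphabet."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)


print(rot13("Hello"))         # Uryyb
print(rot13(rot13("Hello")))  # Hello
```

Because the alphabet has 26 letters, applying the shift twice returns the original text, which is why the same function both encodes and decodes.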

So the question is, and this gets into the other question of bias and the inputs because it is going to be possible that if you have this corpus of data that has trillions of words, we’ll say, and you can create, you can find statistical significance in predicting the next word.

Then if that output can be cross-referenced with a list of known facts, then it can be rejected if it contradicts with those known facts.

In other words, if the next words are ‘dalmatian dogs are brown,’ it crosschecks that against known facts and it will say, no, that’s wrong. I will not generate that. And it goes back and does it again.

And so it is possible, to what Jeff’s saying, that it is possible to validate these inputs, but the question is, who is determining what is a known fact? And what is a known fact?
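The cross-checking idea described above can be sketched as a simple filter. This is only an illustration of the concept, not a description of how any real product works: the fact table, the claim format, and the function names (`contradicts`, `first_acceptable`) are all hypothetical.

```python
# Hypothetical table of "known facts": (subject, attribute) -> value.
# Who gets to fill in this table is exactly the open question.
KNOWN_FACTS = {
    ("dalmatian", "color"): "white with black spots",
}


def contradicts(claim, facts):
    """True if `claim` disagrees with an entry in `facts`."""
    subject, attribute, value = claim
    known = facts.get((subject, attribute))
    return known is not None and known != value


def first_acceptable(candidates, facts):
    """Return the first candidate claim that no known fact contradicts."""
    for claim in candidates:
        if not contradicts(claim, facts):
            return claim
    return None


# Candidate outputs in the order a generator might propose them.
candidates = [
    ("dalmatian", "color", "brown"),
    ("dalmatian", "color", "white with black spots"),
]

print(first_acceptable(candidates, KNOWN_FACTS))
```

Note that a claim about anything missing from the table passes unchecked, so the filter is only as good as the list of facts someone has chosen to put in it.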

That’s where we get philosophical in all of this, right, Gary?

SMITH: Which is why I like competence rather than intelligence. Let me give a couple examples here.


SMITH: One’s a business example. So I’ve got a colleague here at Claremont. He’s at Claremont McKenna College, Matthew Kyle. And one of the things he does as a sideline is make economic predictions for the Inland Empire, which is Riverside County, San Bernardino County out here.

And so he asked ChatGPT to write a 500-word summary of the current economic condition in the Inland Empire and projections. And it generated a 500-word essay and was full of a bunch of economic words, and most of them were a bunch of BS.

And at one point it said the unemployment rate in the Inland Empire is 7.4%. And the very next sentence, it said, the biggest problem of the Inland Empire is the labor shortage. And so it had found two different things and it put them together, because it didn’t understand that high unemployment means there’s not a labor shortage.

It doesn’t know what the words mean. And so it made this egregious error. And I don’t see how training on larger databases is ever going to allow it to recognize that, that logical reasoning, that high unemployment does not mean labor shortage.

SCHATTEN: So Gary, let me ask you this. Do you think if you and I, if the three of us return to meet together in five years, will these models be better than they are today or worse than they are today?

SMITH: They’ll be better conversational for sure, but I don’t think they’ll have logical reasoning. Let me give you another example.

SCHATTEN: Wait. But will they be better? Will they be making more errors or fewer errors in five years?

SMITH: Fewer errors, fewer errors for sure. And part of it is they got these human handlers in there who’re fine tuning.

And so you do so,

SCHATTEN: So my assumption, so, so let me just give you, so there’s 175 billion parameters, with the “B,” that ChatGPT is trained on.

We don’t know what GPT 4 is going to have. It’s estimated it’s going to be potentially 10 trillion.


SCHATTEN: And that, so between the expanse in the number of parameters, but then also the learning, the mass learning that’s going on from the billions and billions of dollars that’s being spent by Google, by now, Elon Musk.


SCHATTEN: By OpenAI, by China, by Russia. What I mean is every single organization and entity on the planet right now is spending hoards of cash on developing these models. I don’t mean that whether the valuations are there or not, but I mean, the money that’s going into this technology is going to make it more and more compelling.

And I need look no further than the advances that I’ve seen just in 18 months to see the trajectory that we are on. And every time that I think that these things can’t do something, I turn around and it’s actually doing them pretty well. And so I, I do think we can spend our time, and I imagine that in five years we’ll still look at things and say, oh, wow, it’s missing this, it’s missing that.

But we’ve effectively gone from crawling to being in a spaceship in just like 18 months. And so yeah, there’s many things that I wanted to work on. So one is just the factual part. We’ve also touched on a lot of the bias issues that are going to be inherent in all of these models.

But the part that I find so interesting is just the capacity to change our society that we’re looking at right now feels like the invention of electricity, for example.

I mean, AI, it feels like electricity to me. You don’t, we actually don’t know what the outcomes are going to be.

In my mind, I think that we’re going to be shocked and amazed at some of the amazing, compelling outcomes. I imagine that, we, I don’t see why this, why all of the AI that’s going into genomics doesn’t make a world where my children live to be 130, 150, where we do start turning this onto energy issues, into all the, a lot of the pressing problems in society.

And at the same time, I imagine that this gets in the hands of people that are very scary. And they’re able to create pandemics that make Covid look like nothing. That it will be turned on weapons systems to create far more destructive weapons.

And it’s stuff like that why I think it’s like electricity. It empowers people to such a degree that we can only surmise at this point in the hands of 8 billion people what that looks like.
