I am still surprised that LLMs work at all.
Think of the problem they solve: with no prior knowledge of the world and no hints about grammar, an LLM inhales vast quantities of text. After this process, it is able to use language in the manner of an intelligent human being. How? I don't mean how LLMs work on a technical level (though that is a fascinating question); I mean how on earth is it possible at all to learn language and culture just from a lot of text?
I'm stuck on just this one point. That is not what I thought language was. I didn't think that the concept of "apple" could be abstracted simply from the use of the word "apple" in ten thousand contexts. I thought that the thing was important, that my whole sensory experience of apples was the real thing and that the word was just a shorthand for that experience. I would have thought that you could glean something from statistical language patterns, but that it would be limited.
What I'm trying to get at here is that I am astonished that culture is embedded in text in a way that can, even in principle, be decoded by a system. There is a Chinese section in my local library. Given infinite amounts of time, I would not be able to learn to write coherent paragraphs in Chinese just by passing my eyes over the text, though I might learn something, glean some patterns. What large language models do is crack the code of language not just without a dictionary, but without the common embedding in reality that we have as humans. Is language a closed system that can be understood without reference to any world outside text? The success of LLMs seems to point to the answer "yes". Our patterns of thought are literally there, encoded in the way we put them into language. Although an LLM's concept of "apple" is necessarily devoid of sensory experience, it can reason coherently about apples.
I don't know how to get any purchase on this question of what LLMs mean for language, other than to just keep asking it. I feel that this is an interesting moment where we have these pure language models. Soon multimodal systems will be all the rage, and the moment to marvel at what it means to crack language by itself will be lost.
On the "apple" tangent, I gave GPT-3.5, GPT-4, and Pi AI a riddle I saw on Twitter the other day: "How do you divide two apples fairly between three people with just one cut?" GPT-4 got it in five tries, GPT-3.5 floundered, and Pi AI had me going with some creative responses but ultimately couldn't get it.