Reading · Level: C2 · 22 min

Generative AI — Hype and Reality

Distinguish what large language models actually do from what their boosters and detractors variously claim, and consider the gap between fluency and understanding.

Tags: reading, C2, technology, AI, epistemology

Read the Text

Few technologies in living memory have arrived accompanied by such an oversupply of certainty. The advent of large language models has produced, almost simultaneously, the conviction that the machines are about to displace the white-collar economy entirely and the conviction that they are sophisticated autocomplete with no future of consequence. Both positions are stated with the authority of insider knowledge; both proceed largely by extrapolation from the same handful of demonstrations; and both, on inspection, mistake fluency for the sort of understanding only the most hedged claims have any business invoking. The honest stance, less rhetorically satisfying than either extreme, requires distinguishing what these systems actually do from what they appear to do, and treating the gap between the two as the central object of analysis rather than an embarrassment to be managed.

What a language model does, in the technically unromantic version, is predict probable continuations of text given vast statistical exposure to prior text. The remarkable fact is not that this procedure can produce strings indistinguishable from human writing — that, at sufficient scale, was perhaps less surprising than it felt at the time — but that the same procedure, with no architectural changes, can be coaxed into solving novel problems, summarising arguments, and explaining its own outputs in plausible terms. The temptation, faced with so capable a performance, is to attribute to the system the inner life one would expect of a human producing the same outputs. This temptation should be resisted not because it is impolite but because it is, at present, unsupported. Plausibility of output is not evidence of comprehension; it is, in the first instance, evidence only that the training distribution contained material similar to what the user requested.

Yet the cynical position — that fluency is mere surface, masking nothing — is no better supported by the evidence. Models trained at sufficient scale develop internal representations that, when probed, support genuinely surprising behaviours: arithmetic carried out without a calculator, transitive inferences over chains of fictional characters, the apparent extraction of stylistic conventions from a single example. Whether any of this amounts to ‘understanding’ in the philosophically demanding sense is precisely the open question, and to declare it settled, in either direction, is to mistake one’s prior for the available evidence. Were the systems merely stitching memorised fragments, certain failures should not occur; were they fully comprehending, certain other failures should not occur. The actual error pattern fits neither caricature.

The economic forecasts inherit the same imprecision. Predictions of mass displacement assume that current capability scales smoothly to the tasks that constitute most of human employment, while predictions of negligible impact assume the opposite without quite saying so. What can be observed already, however, is more interesting than either extreme: a sharply uneven pattern in which routine textual labour is partially absorbed into the systems while tasks requiring sustained context, accountability, or physical presence have so far proved more stubborn. The displacement, where it has occurred, has tended to be partial and reorganising rather than total — fewer junior copywriters, more senior editors checking machine drafts; fewer first-pass paralegals, more lawyers triaging machine outputs. Whether this pattern will hold or give way to a more dramatic discontinuity is, again, an open question, and the temptation to substitute confidence for analysis remains the chief impediment to thinking clearly about it.

What the situation requires, then, is the discipline of holding two propositions in mind at once: that the systems are genuinely impressive and that the gap between fluency and understanding is real, important, and not closed by any current method. To attend to one without the other is to slip into a familiar pattern in technology discourse, in which each cycle’s enthusiasts overstate near-term capability while its sceptics understate long-term trajectory, and the more accurate description — incremental, uneven, transformative in some domains and irrelevant in others — falls between two confident factions. Were the cost of getting this wrong purely intellectual, the matter could be left to academic argument. But policy decisions about education, employment, and the allocation of trust are being made now, and they are poorly served by the genres of certainty currently on offer.

Questions

1 / 12

What does the writer identify as the error common to both extreme positions on AI in the first paragraph?