Thought Leadership · April 26, 2026 · 14 min read

Anaphora: from "I have a dream" to induction heads, the figure the Transformer can imitate mechanically

By Aléaume Muller

Fourth article in the series on rhetorical figures in the age of AI. After correctio, the mechanism of negation in LLMs, and the tricolon, the series turns to anaphora: the figure of opening repetition that structures both "I have a dream" and the outputs generated by large models.

Washington, August 28, 1963. Martin Luther King steps up to speak before the Lincoln Memorial. Two hundred and fifty thousand people have come. He reads the speech he prepared. Mahalia Jackson, two meters away from him, cries out: "Tell them about the dream, Martin!" King leaves his notes behind. He improvises.

Eight times in eleven minutes, he utters the same formula: "I have a dream." Eight identical openings, each followed by a different image -- his children judged by the content of their character, the red hills of Georgia, the little black girl and the little white boy holding hands.

Within the same speech, two other anaphoras layer over the first: "Now is the time" four times, "Let freedom ring" ten times. The entire speech is a scaffolding of opening repetitions -- twenty-two in all across these three formulas, not counting the minor variations.

That August 28, 1963, is not an isolated feat of oratory. It is the application of a figure that Cicero was teaching two thousand years earlier, that the attention mechanism of a Transformer reproduces today in near-literal fashion, and that modern AI uses massively without always mastering it: anaphora.


The figure, as the Greeks named it

The word comes from the Greek ἐπαναφορά (epanaphora) -- literally "to carry back," to bring back, to take up again. Aristotle discusses it in the Rhetoric (III, 9) as a means of rhythmic parallelism. The Rhetorica ad Herennium -- long attributed to Cicero, now considered anonymous -- gives the canonical definition in book IV, 13, 19: "cum continenter ab uno atque eodem verbo in rebus similibus et diversis principia sumuntur." That is: when successive segments, on similar or different matters, begin with one and the same word.

Quintilian, in the Institutio Oratoria (IX, 3, 30-31), rigorously distinguishes anaphora from three neighboring figures that are often confused with it.

Epiphora -- epistrophe in its Greek variant, the two terms coexist -- does exactly the opposite of anaphora: it repeats the same word or the same structure at the end of successive segments. A classic example: "Who decided it? They decided it. Who voted for it? They voted for it. Who bears the responsibility for it? They bear the responsibility for it." The refrain falls at the close, not at the opening.

Symploce combines the two -- anaphora at the start, epiphora at the end -- to produce a double echo, very present in religious litanies and the refrains of popular songs.

Anadiplosis, finally, takes up at the start of a new clause the word that ended the previous one, creating a chain of echoes ("Fear leads to anger, anger leads to hate, hate leads to suffering..." -- a structure used by Yoda in The Phantom Menace).

Of the four, anaphora remains the most common in the history of oratory, because it is the simplest to execute and the most accessible to the ear. It consists in saying the same thing in the same place, again and again, changing only what comes after.

Heinrich Lausberg, in his Handbook of Literary Rhetoric (Brill, 1998, §§ 629-630), catalogs hundreds of examples across two thousand five hundred years. The figure is stable. The figure is universal. The figure is, rightly, over-represented in the speeches that have marked history.


A distinction to establish at the outset: rhetorical vs. referential

Before going further, a terminological trap deserves to be named. In contemporary linguistics, the word "anaphora" designates two different things.

Rhetorical anaphora -- the one in this article, MLK's, Cicero's -- is a deliberate repetition at the start of successive segments. It is a stylistic figure.

Referential anaphora -- the one studied by Halliday and Hasan in Cohesion in English (Longman, 1976) -- is a grammatical mechanism of reference through pronouns ("John arrived. He was late."). It is a figure of cohesion.

Both use the word. Both concern repetition. They obey neither the same rules nor the same functions. The remainder of this article deals exclusively with the first.


The cognitive foundation: why the brain likes repetition

Opening repetition is not a mere phonic ornament. It fulfills a precise cognitive function, documented by psycholinguistics.

Amit Almor, in Noun-phrase anaphora and focus: The informational load hypothesis (Psychological Review, 1999, vol. 106, no. 4), formalizes the principle. Each time a reader or listener encounters a new element in a sentence, they must allocate working memory to construct its meaning. This allocation is costly. It occupies resources that are no longer available to understand the rest.

When the opening structure of a sentence is already known -- because it was laid down a first time and then repeated identically -- the cost of allocation drops drastically. The reader already knows where to expect the variable information. They can focus their cognitive resources on what changes -- the content of the image, the angle, the nuance -- rather than on reconstructing the syntactic frame.

Morton Ann Gernsbacher, in Language Comprehension as Structure Building (Erlbaum, 1990), names this mechanism structure building. The first element of an anaphoric series builds the structure; the elements that follow reuse it. The human brain, sparing of its resources, loves this architecture.

A direct consequence: a well-composed anaphora does not tire the reader. It relieves them. It leaves them the cognitive bandwidth they need to feel the crescendo that the successive images construct.


The great anaphoras of history

The speeches that have marked history almost all exploit the figure.

Cicero, In Catilinam I, 1, before the Roman Senate in 63 BC: "Quousque tandem abutere, Catilina, patientia nostra? Quamdiu etiam furor iste tuus nos eludet? Quem ad finem sese effrenata iactabit audacia?" -- how long still, for how much longer, to what end. Three rhetorical questions hammering the same pressure.

Abraham Lincoln, Gettysburg Address, November 19, 1863: "government of the people, by the people, for the people" -- strictly speaking a tricolon closed by epiphora rather than an anaphora, since the preposition varies while the repeated core, "the people," falls at the end of each segment.

Winston Churchill, House of Commons, June 4, 1940 (Hansard vol. 361, cc787-798). Britain has just evacuated its troops from Dunkirk. He delivers the sentence that will become iconic:

"We shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender."

Four "we shall fight" in this excerpt alone, part of a longer run within a single syntactic period. Then the close -- "we shall never surrender" -- which breaks the repetition to drive home the resolve. The final break in the pattern is what gives the crescendo its force.

Martin Luther King, Lincoln Memorial, August 28, 1963. The phrase "I have a dream" appears eight times, interwoven with "Now is the time" (four times) and "Let freedom ring" (ten times). Complete rhetorical analysis in Keith Miller, Voice of Deliverance (1992).

François Hollande, runoff debate of the French presidential election, May 2, 2012, facing Nicolas Sarkozy. In the middle of an exchange that had been conventional until then, Hollande launches the tirade that will structure the rest of his campaign: "As President of the Republic, I will not... As President of the Republic, I will not treat... As President of the Republic, I will see to it that..." Fifteen occurrences of "As President of the Republic" in the space of three minutes. In terms of content, each clause sets out a behavioral commitment. In rhetorical terms, each repetition boxes Sarkozy a little further into the opposite role -- the one who, implicitly, did what Hollande says he will not do. Anaphora becomes a dialectical weapon. The next day, the press headlines the sequence, which will enter the manuals of political communication as a textbook case.

Barack Obama, New Hampshire concession speech, January 8, 2008. "Yes we can" closes each paragraph of the second third of the speech. This is not anaphora in the strict sense -- the repetition falls at the end of segments, not at the start -- but the sister figure, epiphora. The two structures produce a similar effect, by mirror-image means.

None of these speeches would have entered history on flat prose. Anaphora is what transforms a series of arguments into a melodic line, a melodic line into emotion, an emotion into lasting memory.


The mechanism on the LLM side: induction heads

Here is the fact that ties this series of articles together. When a Transformer -- the architecture behind all large language models -- processes text, it has a specific attention circuit, particularly sensitive to repetitive patterns, called an induction head.

Nelson Elhage and his colleagues at Anthropic identified it in 2021 in A Mathematical Framework for Transformer Circuits. Catherine Olsson et al. formalized it in 2022 in In-context Learning and Induction Heads (arXiv:2209.11895). The principle is precise.

An induction head is one half of a two-head attention circuit: a first head records which token preceded each position, and the induction head uses that record to detect, then to complete, patterns of the form [A][B] ... [A] → [B]. In other words: when the model has already seen token A followed by token B in the context, and it encounters A again, it predicts B with increased probability.

Applied to anaphora, the mechanism is near-literal. Once the model has seen "I have a dream that my four little children...", it has learned locally that the opening "I have a dream" can be followed by a description of a vision. At the next occurrence of "I have a dream", the induction head actively pushes the probability distribution toward a structured continuation -- a second description of a vision -- rather than an arbitrary output.
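The [A][B] ... [A] → [B] rule can be sketched as a plain lookup over a list of words. This is only a toy illustration: a real induction head works through learned attention over vector representations and shifts probabilities rather than returning a single answer. The function name `induction_predict` is hypothetical and comes from no library.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy version of the [A][B] ... [A] -> [B] rule.

    Finds the most recent earlier occurrence of the final token and
    returns the token that followed it, or None if the final token
    has not been seen before in the context.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the final token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


context = "I have a dream that one day I have a".split()
prediction = induction_predict(context)  # "a" was followed by "dream" earlier
```

Given the opening of a second "I have a...", the lookup returns "dream", which is the kind of completion the circuit makes more likely.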

Clark, Khandelwal, Levy, and Manning, in What Does BERT Look At? (arXiv:1906.04341, BlackboxNLP 2019), had already identified attention heads that track coreference. Vig and Belinkov, in Analyzing the Structure of Attention in a Transformer Language Model (arXiv:1906.04284, 2019), mapped how attention heads specialize across the layers of a model. Olsson and her colleagues went a step further and described a mechanism: the induction head is a fundamental component of in-context learning, the ability of large models to learn patterns on the fly without modifying their weights.

Anaphora is, from a Transformer's point of view, the structure easiest to reproduce. It has a dedicated circuit for it. It has every incentive to use it.


The saturation characteristic of AI

Modern LLMs exploit the induction head at the output stage. They spontaneously produce anaphoric structures, often without the user requesting them. The recurring formulations "You need to... You want to... You are trying to..." at the end of ChatGPT messages are a direct illustration. So are bulleted lists all beginning with the same verb: "Analyze... Structure... Deliver..."

Liang et al. (2024, Monitoring AI-Modified Content at Scale, arXiv:2403.07183) document stable stylometric markers of AI-modified text at corpus scale, chiefly shifts in word frequency. Juzek and Ward (2024, Why Does ChatGPT 'Delve' So Much?, arXiv:2412.11385) apply a similar methodology to lexical tics. Both point the same way: generated text carries a measurable maker's mark. Structured repetition is a plausible part of that mark, although neither study measures anaphora directly.
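To make the idea of a repetition marker concrete, here is a minimal sketch that counts runs of consecutive sentences sharing the same opening words. It is not the method of either study cited above, nor that of any real detector. The function names, the three-word window and the naive sentence splitter are all assumptions chosen for illustration.

```python
import re
from typing import List, Optional, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def anaphora_runs(text: str, n_words: int = 3, min_run: int = 2) -> List[Tuple[str, int]]:
    """Return (opening, count) for each run of consecutive sentences
    that share the same first `n_words` words (case-insensitive)."""
    runs: List[Tuple[str, int]] = []
    prev_opening: Optional[str] = None
    count = 0
    for sentence in split_sentences(text):
        words = re.findall(r"[\w']+", sentence.lower())
        # Sentences shorter than the window cannot carry the opening.
        opening = " ".join(words[:n_words]) if len(words) >= n_words else None
        if opening is not None and opening == prev_opening:
            count += 1
            continue
        # The run (if any) ends here: record it, then start a new one.
        if prev_opening is not None and count >= min_run:
            runs.append((prev_opening, count))
        prev_opening = opening
        count = 1 if opening is not None else 0
    if prev_opening is not None and count >= min_run:
        runs.append((prev_opening, count))
    return runs


sample = "You need to plan. You need to act. You need to deliver. Then rest."
found = anaphora_runs(sample)
```

On the sample, the sketch reports one run of three sentences opening with "you need to". It says nothing about whether the run builds toward anything, which is the harder question.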

Why? Two hypotheses layer over each other.

The first is architectural. The induction head, optimized by training, naturally favors parallel patterns. The model does not decide to stack three anaphoras -- it does so because its attention circuit pushes it to follow the slope of least perplexity.

The second is pedagogical. The human annotators who evaluate model outputs during alignment by reinforcement learning from human feedback (RLHF) tend to prefer responses that are structured, listable, symmetrical. This preference, propagated by the reward signal, anchors anaphora in the model's rewarded behaviors. To date, no published study formally demonstrates it, but the convergence with the stylometric observations is striking.


Flat anaphora or crescendo anaphora

Not all anaphoras are equal. Jeanne Fahnestock, in Rhetorical Figures in Science (Oxford University Press, 1999, chapter 4), draws the crucial distinction between repetition that builds to a crescendo and repetition that flatlines into monotony.

A crescendo anaphora uses the fixed structure to free up cognitive resources that feed a semantic progression. The images that follow "I have a dream" do not merely fill a slot -- they rise in power, in generality, in emotion. The two children holding hands follow the red hills, which followed contemporary injustice. The form does not change; the substance climbs.

A monotone anaphora, by contrast, repeats the structure without progression. The variables that follow are on the same level, in the same register, with no elevation. The reader quickly understands that nothing is happening -- and loses interest.

AI mostly produces monotone anaphoras. Not because the architecture forces it to, but because semantic progression demands an authorial intent that a statistical prediction model does not naturally carry. It reproduces the form. It does not build the crescendo.

It is precisely there, in the gap between formal repetition and semantic progression, that the difference between masterful human writing and raw AI output is lodged. Anaphora is an invitation to climb. Not climbing turns the invitation into tiresome insistence.


Practical implications

For professional writing, three rules emerge.

Use anaphora only when a semantic progression accompanies the formal repetition. If the elements that follow are all equal in intensity, in level, in register, the figure adds nothing. It turns into a repetitive drumbeat that wears the reader down.

Calibrate the length. Three occurrences are generally enough. Five maximum for a short text. Beyond that, only a sustained oral performance carried by rhythm and intonation can bear the load -- which is why MLK stacks eight of them, but he does so with his voice, before two hundred and fifty thousand people. The written word does not forgive the same density.

Break the anaphora before the end to create resolution. Churchill's "we shall never surrender," which breaks the cadence of the repeated "we shall fight," is the signature of a masterful anaphora. The listener expects the repetition; the final break wins their assent. AI, deprived of this intent, repeats all the way through. Manually adding the break after the fact is often what turns a generated anaphora into a mastered figure.
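Two of these rules, the length limit and the closing break, are mechanical enough to check in a draft. The sketch below is one hypothetical way to do so. The name `review_anaphora` and its parameters are assumptions, and the limit of five is simply the figure given above for a short written text. The first rule, semantic progression, is a matter of judgment and is not checked.

```python
from typing import List


def review_anaphora(clauses: List[str], opening: str, max_written: int = 5) -> List[str]:
    """Check a series of clauses against the length and closing-break rules.

    Returns a list of warnings, empty if neither rule is violated.
    """
    key = opening.lower()
    # True for each clause that starts with the repeated opening.
    flags = [clause.strip().lower().startswith(key) for clause in clauses]
    warnings: List[str] = []
    repeats = sum(flags)
    if repeats > max_written:
        warnings.append(f"{repeats} repetitions exceed the written limit of {max_written}")
    if flags and flags[-1]:
        warnings.append("final clause repeats the opening: no closing break")
    return warnings


churchill = [
    "we shall fight on the beaches",
    "we shall fight on the landing grounds",
    "we shall fight in the fields and in the streets",
    "we shall fight in the hills",
    "we shall never surrender",
]
churchill_warnings = review_anaphora(churchill, "we shall fight")
```

The Churchill excerpt passes both checks: four repetitions, then a final clause that departs from the opening. A generated list such as "You need to plan / You need to act / You need to ship" would be flagged for the missing break.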


What anaphora teaches us

Anaphora is one of the rare rhetorical figures for which a near-literal mechanical correspondence can be drawn between a Transformer's attention circuit and a human's practice of oratory. Both exploit the same mechanics: a fixed opening structure frees up cognitive resources to process the variation that follows.

Cicero had observed it. Martin Luther King had internalized it to the point of making it the central improvisation of his best-known speech. Anthropic's engineers discovered it while mapping the internal circuits of their models -- and gave the mechanism a technical name that would have made Quintilian smile: induction head. The vocabulary changes. The principle is the same.

What this tells those of us who write in the age of LLMs is simple. Recognizing the figure, naming it, understanding what it does to the brain and to the model, lets you use it with precision. A well-placed anaphora is worth a thousand adjectives. A flat anaphora betrays an automatic production that no one reread.

The point is not to ban the figure. The point is to give it the progression it demands.


The next article in the series will explore the chiasmus -- the figure that inverts the order of terms to achieve an effect of reversed symmetry. A structure that AI produces far less naturally than anaphora -- and which, for that precise reason, deserves a closer look.


Main sources

  • Aristotle, Rhetoric, III, 9 (1410a).
  • Rhetorica ad Herennium (anonymous, 1st c. BC), IV, 13, 19.
  • Quintilian, Institutio Oratoria, IX, 3, 30-31.
  • Lausberg, H. (1998). Handbook of Literary Rhetoric. Brill. §§ 629-630.
  • Lanham, R. (1991). A Handlist of Rhetorical Terms. 2nd ed.
  • Halliday, M. A. K. & Hasan, R. (1976). Cohesion in English. Longman. (Referential anaphora, not to be confused.)
  • Almor, A. (1999). Noun-phrase anaphora and focus: The informational load hypothesis. Psychological Review, 106(4), 748-765.
  • Gernsbacher, M. A. (1990). Language Comprehension as Structure Building. Erlbaum.
  • Givón, T. (1983). Topic Continuity in Discourse. Benjamins.
  • Jakobson, R. (1960). Linguistics and Poetics, in Style in Language (Sebeok, ed.).
  • Fahnestock, J. (1999). Rhetorical Figures in Science. Oxford University Press, ch. 4.
  • Miller, K. (1992). Voice of Deliverance: The Language of Martin Luther King, Jr., and Its Sources. Free Press.
  • Houck, D. & Dixon, D. (2006). Rhetoric, Religion and the Civil Rights Movement. Baylor University Press.
  • Clark, K., Khandelwal, U., Levy, O. & Manning, C. (2019). What Does BERT Look At? An Analysis of BERT's Attention. BlackboxNLP. arXiv:1906.04341.
  • Vig, J. & Belinkov, Y. (2019). Analyzing the Structure of Attention in a Transformer Language Model. arXiv:1906.04284.
  • Elhage, N. et al. (2021). A Mathematical Framework for Transformer Circuits. Anthropic. transformer-circuits.pub/2021/framework/.
  • Olsson, C. et al. (2022). In-context Learning and Induction Heads. arXiv:2209.11895.
  • Liang, W. et al. (2024). Monitoring AI-Modified Content at Scale. Stanford. arXiv:2403.07183.
  • Juzek, T. & Ward, Z. (2024). Why Does ChatGPT 'Delve' So Much? arXiv:2412.11385.
  • Historical speeches: Churchill (Hansard vol. 361, June 4, 1940); Lincoln (Gettysburg Address, November 19, 1863); MLK (August 28, 1963, archives.gov); Obama (New Hampshire, January 8, 2008); Cicero, In Catilinam I.
