Optimal Cognitive Encoding: High-Precision Prompt Engineering
This article builds on The Information Revolution, where we laid out the theoretical framework of signal and noise in tenders. Here we go a step deeper: how do you encode information so that an LLM processes it with maximum precision? The answer lies at the intersection of Shannon, Grice, and the transformer architecture.
Beyond Recipes
Give a role. Provide context. Show an example. These tips are useful but superficial. They don't address the fundamental question: why do some formulations produce radically better results than others?
An LLM doesn't understand your intent. It calculates, token by token, the probability distribution of the next token, conditioned on all the previous tokens. Every word you write distorts the probabilistic landscape of the response. Optimizing a prompt is sculpting a field of probabilities.
This guide synthesizes the theoretical foundations into eight operational principles — for anyone who wants to move from the craft of prompting to its rigorous engineering.
Principle 0 — What the Model "Sees"
The LLM doesn't receive words. It receives tokens — text fragments cut by a statistical algorithm (BPE, SentencePiece). These tokens are converted into numerical vectors in a space of thousands of dimensions. Your entire prompt forms a context matrix that the transformer's attention mechanism processes in a non-linear way.
Direct consequence: a poorly chosen word not only creates local ambiguity. It propagates a distortion across the entire generation, token after token. The error compounds, like a systematic bias in a calculation chain.
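The compounding can be made concrete with a toy calculation: a sequence's probability is the product of its per-token conditional probabilities, so a small per-step drift decays the whole chain geometrically. The numbers below are invented for illustration, not measured from any model.

```python
# Toy illustration: sequence probability is a product of per-token
# conditional probabilities, so a small per-token drift compounds
# geometrically over the generation. Probabilities are invented.

def sequence_probability(per_token_p: float, n_tokens: int) -> float:
    """Probability of a whole sequence if each token has probability p."""
    return per_token_p ** n_tokens

well_specified = sequence_probability(0.99, 200)  # sharp prompt
slightly_noisy = sequence_probability(0.95, 200)  # one ambiguous word shifts every step

print(f"{well_specified:.4f}")  # ~0.134
print(f"{slightly_noisy:.6f}")  # ~0.000035
```

Two hundred tokens are enough for a 4-point per-step drift to cost nearly four orders of magnitude: this is the "systematic bias in a calculation chain" in numbers.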
Principle 1 — Reduce Conditional Entropy, Not Raw Entropy
Most practitioners confuse "being precise" with "using many descriptive words". In information theory (Shannon, 1948), the relevant metric is not the raw entropy of the prompt, but the conditional entropy — the uncertainty that remains in the space of possible responses after the model has processed your prompt.
False Precision:
"Give me a detailed, exhaustive, and in-depth analysis of the geopolitical situation."
Each adjective adds tokens but doesn't help the model converge. "Detailed", "exhaustive", and "in-depth" are near-synonyms in the embedding space; the informative signal is nil.
True Precision:
"Analyze the geopolitical situation from the angle of Russia-EU energy flows since 2022. Structure: causes → current state → three scenarios over a five-year horizon."
Here, each token constrains the space of possible responses. "Energy flows" excludes domestic politics, military, culture. "Since 2022" bounds the time frame. The imposed structure eliminates organizational uncertainty.
The Mental Test: for each word in your prompt, ask yourself: "Does this word eliminate responses I don't want?" If the answer is no, the word is noise. Remove it.
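The mechanics can be sketched with Shannon's formula and an invented response-topic distribution: the constraining phrase collapses the probability mass onto one topic, and the entropy of the response space drops accordingly.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical topic distribution for the vague prompt: anything goes.
vague = {"energy": 0.25, "military": 0.25, "culture": 0.25, "domestic": 0.25}

# The constraining phrase ("energy flows since 2022") collapses the mass.
# These numbers are invented to illustrate the direction of the effect.
constrained = {"energy": 0.94, "military": 0.03, "culture": 0.02, "domestic": 0.01}

print(f"H(vague)       = {entropy(vague):.2f} bits")        # 2.00 bits
print(f"H(constrained) = {entropy(constrained):.2f} bits")  # ~0.42 bits
```

The mental test above is exactly this computation done by eye: a word earns its place only if it moves probability mass away from responses you don't want.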
Principle 2 — Exploit the Tree-Like Structure of Attention
An LLM generates sequentially, but its "comprehension" mechanism is hierarchical thanks to the multi-head attention layers. It is more efficient to structure information from general to specific, in a tree-like fashion, rather than in a flat list.
Attention mechanisms allocate their resources based on positional and semantic relevance. Framing information placed first "colors" the interpretation of everything that follows. The same information buried in the middle of a list will be underweighted.
Optimal Prompt Hierarchy:
| Level | Function | Example |
|---|---|---|
| 1. Ontological | What the requested thing is | "Produce a strategic memo" |
| 2. Teleological | Why it is needed | "for the executive committee" |
| 3. Bounding | What is included AND excluded | "Scope: EU only; exclude Asia" |
| 4. Form | Structure, length, format | "2 pages, 3 sections, bullet points" |
| 5. Calibration | Level of detail, register | "Factual tone, expert level, no vulgarization" |
This order follows the logic of progressive reduction of the possible space: each level divides the remaining space. Inverting this order forces the model into costly retroactive adjustments via the attention mechanism.
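One way to operationalize the hierarchy is to treat the five levels as fields of a small spec object that always renders them in order, general to specific. This is an illustrative sketch with invented names, not TenderGraph's actual code.

```python
from dataclasses import dataclass

@dataclass
class PromptSpec:
    """Five-level prompt hierarchy; fields are listed general-to-specific."""
    ontological: str   # what the requested thing is
    teleological: str  # why it is needed
    bounding: str      # what is included AND excluded
    form: str          # structure, length, format
    calibration: str   # level of detail, register

    def render(self) -> str:
        # Emit levels in order so the earliest tokens frame all later ones.
        return "\n".join([
            self.ontological,
            self.teleological,
            f"Scope: {self.bounding}",
            f"Format: {self.form}",
            f"Register: {self.calibration}",
        ])

spec = PromptSpec(
    ontological="Produce a strategic memo",
    teleological="for the executive committee",
    bounding="EU only; exclude Asia",
    form="2 pages, 3 sections, bullet points",
    calibration="factual tone, expert level, no vulgarization",
)
print(spec.render())
```

Encoding the order in the renderer, rather than trusting the writer, is the point: the tree-like reduction of the possible space becomes a structural invariant instead of a habit.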
Principle 3 — Define by Exclusion
In information theory, a signal is more informative the more it excludes alternatives. Counterintuitive corollary: saying what you don't want is often more informative than saying what you want.
The model, during generation, is pulled toward attractors — response patterns overrepresented in its training data. An open-ended request like "explain quantum mechanics" will almost invariably converge on the Schrödinger's-cat analogy and wave-particle duality, because these motifs are statistically dominant.
Negative constraints ("without using the Schrödinger's cat analogy; start from the Hilbert space formalism") block low-value attractors and force the model towards less probable but cognitively richer paths.
Negative constraints have a higher information-to-token ratio than positive constraints when they target the model's statistical attractors.
This is exactly what a good technical specification does: the most discriminating requirements are often those that exclude — "no SaaS solution", "no subcontracting for lot 2".
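A toy model of the mechanism: represent the dominant explanation patterns as a next-topic distribution, mask the banned attractors, and renormalize what remains. The probabilities are made up for illustration; a real model does something analogous implicitly when a negative constraint suppresses a pattern.

```python
def apply_exclusion(dist, banned):
    """Zero out banned outcomes and renormalize the survivors."""
    kept = {k: v for k, v in dist.items() if k not in banned}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}

# Invented distribution over explanation patterns for "explain quantum mechanics".
topics = {
    "schrodinger_cat":    0.55,  # statistically dominant attractor
    "wave_particle":      0.30,  # second attractor
    "hilbert_formalism":  0.10,  # richer, less probable path
    "measurement_theory": 0.05,
}

shifted = apply_exclusion(topics, banned={"schrodinger_cat", "wave_particle"})
print(shifted)  # hilbert_formalism ~0.67, measurement_theory ~0.33
```

Two short negative clauses move two-thirds of the probability mass onto the path you actually wanted, which is why their information-to-token ratio is so high.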
Principle 4 — Minimize Anchor Semantic Distance
The model organizes its knowledge in a vector space where semantically close concepts form clusters. If you use a term at the boundary of two clusters, you introduce ambiguity that propagates noise into the response.
The goal is to use anchor terms — words that sit at the center of a dense semantic cluster:
- Canonical technical terms in a domain (exact names of theories, methods)
- Proper nouns (authors, named frameworks, reference publications)
- Terms frequently encountered in coherent, unambiguous contexts
Saying "the Kahneman thing about the two modes of thought" is semantically fuzzy. Saying "the System 1 / System 2 dual-process theory (Kahneman, Thinking, Fast and Slow)" anchors the model on a precise cluster. The token overhead is marginal; the gain in precision is disproportionate.
Derived Rule: Jargon as Compression. Technical jargon is not noise. It's a high-density encoding: a technical word compresses an entire definition into one or two tokens. In bid management, saying "MECE" activates a dense semantic network that "exhaustive and mutually exclusive structuring" takes 30 tokens to describe.
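The intuition can be sketched with hand-made 3-dimensional "embeddings" (real embedding spaces have thousands of dimensions; every vector here is invented): the canonical reference sits squarely in one cluster, while the fuzzy paraphrase lands nearly equidistant from two.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Invented cluster centroids in a toy 3-d space.
kahneman_cluster = (1.0, 0.1, 0.0)   # dual-process theory region
generic_psych    = (0.1, 1.0, 0.3)   # vague "modes of thought" region

anchor = (0.95, 0.15, 0.05)  # "System 1 / System 2 (Kahneman)"
fuzzy  = (0.6, 0.6, 0.15)    # "the Kahneman thing about two modes"

print(cosine(anchor, kahneman_cluster))  # very high: unambiguous
print(cosine(fuzzy, kahneman_cluster), cosine(fuzzy, generic_psych))
# the fuzzy phrase scores nearly the same against both clusters
```

An anchor term leaves the model no competing cluster to drift toward; a boundary term splits its attention between two and lets noise into everything downstream.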
Principle 5 — Control Implicit Temperature via Syntax
Beyond the explicit temperature parameter, the very syntactic structure of your prompt influences the "effective temperature" of the response — the degree of exploration versus convergence of the model.
In the training corpus, factual texts use short declarative sentences in the present tense (SVO), while speculative texts employ complex conditional structures. The model has internalized these correlations.
| Syntactic Structure | Effect on Generation | Optimal Use |
|---|---|---|
| Imperative / Present Indicative | Convergence, determinism | Facts, lists, data |
| Paratactic Sentences (juxtaposed) | Concision, focus | Instructions, specifications |
| Conditional, Subordinates | Exploration, nuance | Analysis, creativity |
| Modals and hedges (may, might, seems) | High entropy, vagueness permitted | Avoid when precision is required |
For Maximum Precision: present indicative, paratactic structures, zero modals. Each "maybe" in your prompt is a permission given to the model to be vague.
This is what distinguishes a punchy executive summary from one that obfuscates. Syntax is an implicit instruction with no token cost.
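The explicit temperature parameter gives a precise picture of what "effective temperature" means: dividing the logits by a higher temperature flattens the softmax distribution and raises its entropy. The logits below are invented; the mechanism is the standard softmax-with-temperature computation.

```python
import math

def softmax(logits, temperature=1.0):
    """Standard softmax with temperature scaling."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits of a probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [3.0, 1.5, 1.0, 0.2]  # invented next-token scores

for t in (0.3, 1.0, 2.0):
    p = softmax(logits, t)
    print(f"T={t}: entropy = {entropy_bits(p):.2f} bits")
```

The claim of this principle is that paratactic, indicative syntax nudges the model toward the low-entropy regime, and hedged, conditional syntax toward the high-entropy one, without ever touching the sampling parameter.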
Principle 6 — Non-Uniform Positional Attention
The transformer architecture (Vaswani et al., 2017) processes the whole context in parallel, but empirical work shows that its attention is not uniformly distributed. Models exhibit a marked bias toward the beginning and end of the context (a primacy-recency effect), with a significant dip in the middle that is particularly pronounced in long contexts (Liu et al., "Lost in the Middle", 2023).
Implications for Your Prompts:
- The most critical instruction must be at the very beginning OR recalled at the very end
- Voluminous context information (reference documents, raw data) goes in the middle
- Never bury a crucial instruction in a long block of context
This phenomenon has major consequences in tender response systems that inject hundreds of pages of tender documents into an LLM's context. Information buried in the middle is mechanically underweighted by the attention distribution — a P0 requirement on page 37 of a 150-page CCTP (the tender's technical specifications document) has less chance of being processed than a trivial requirement on page 3.
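A toy U-shaped weight profile captures the qualitative shape of the effect; the curve below is invented for illustration, not fitted to the Liu et al. measurements.

```python
def positional_weight(pos: int, n: int) -> float:
    """Toy attention-weight profile: high at the context edges, low in the middle."""
    x = pos / (n - 1)                    # normalized position in [0, 1]
    return 0.4 + 0.6 * (2 * x - 1) ** 2  # 1.0 at the edges, 0.4 at the center

n = 11
weights = [round(positional_weight(i, n), 2) for i in range(n)]
print(weights)  # U-shape: 1.0 at both ends, 0.4 at position 5
```

Reading the curve as an editor: the critical instruction belongs at position 0 or n-1, and a requirement sitting at the bottom of the U needs to be hoisted or restated, never left where it is.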
Principle 7 — Prompt-Response Isomorphism
The model tends to reproduce the structure of what it receives. This is a powerful statistical bias that can be strategically exploited.
The effect goes beyond mere format imitation. The level of abstraction, the granularity, and the lexical register of the prompt directly calibrate those of the response. If you ask a question in academic vocabulary, the model will reply at that level; the same question phrased in high-school lingo will produce a correspondingly simplified answer.
The register of your prompt is an implicit instruction with no token cost. It's the most underestimated lever in prompt engineering.
Corollary: if your prompt is a disorderly stream of consciousness, the response will inherit that disorder. If your prompt is structured in clearly delineated sections with explicit markers, the response will adopt comparable rigor. This is why a technical memo written by a well-trained agent is structurally different from one produced by a generic chatbot — the prompt acts as a mold.
Principle 8 — Semantic Compression via Named Reference
The most powerful principle for experts. Rather than describing a concept, name it. Each named reference — a theorem, a framework, a canonical author — is an extreme semantic compression: two or three tokens activate in the model a dense network of thousands of associated knowledge.
| Long Formulation (~30 tokens) | Compressed Reference (~5 tokens) | Ratio |
|---|---|---|
| Structure the analysis in an exhaustive and mutually exclusive way, without omissions or overlaps | Use the MECE framework | 6:1 |
| Update beliefs in proportion to the strength of new evidence, in a Bayesian manner | Reason in a Bayesian way | 6:1 |
| Assume agents are rational and maximize their expected utility in a context of strategic interactions | Game theory framework | 5:1 |
Efficacy Condition: this mechanism only works if the reference is well-represented in the training data. For obscure concepts, combine the reference with a brief operational definition: "Use the MECE framework (exhaustiveness + mutual exclusivity of categories)".
Meta-Principle — The Optimal Prompt Is a Program, Not a Conversation
The synthesis of the eight principles leads to a paradigm shift. The optimal prompt does not resemble natural language conversation. It is akin to a declarative program: it specifies a desired state (the output), constraints, exclusions, a priority order, and a result structure.
This does not mean writing in pseudocode. But each sentence must have an identifiable function:
| Function | Informational Role | Example |
|---|---|---|
| Frame | Define the ontological space | "Produce a summary note" |
| Constrain | Reduce the possible space | "800 words, expert register" |
| Exclude | Block attractors | "No clichés or simplistic analogies" |
| Structure | Impose the output architecture | "Structure: diagnosis → options → recommendation" |
| Calibrate | Adjust level and tone | "For a senior data scientist audience" |
If a sentence in your prompt doesn't serve any of these functions, it is informational noise. Remove it.
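The "prompt as declarative program" view can be sketched as a list of (function, sentence) pairs compiled into a prompt, where any sentence with no declared function is rejected as noise. All names and sentences are illustrative.

```python
# Sketch: every prompt sentence must declare one of the five functions;
# anything else is treated as informational noise and rejected.
VALID_FUNCTIONS = {"frame", "constrain", "exclude", "structure", "calibrate"}

def compile_prompt(program):
    """Compile (function, sentence) pairs into a prompt string."""
    for func, text in program:
        if func not in VALID_FUNCTIONS:
            raise ValueError(f"sentence with no informational role: {text!r}")
    return " ".join(text for _, text in program)

prompt_program = [
    ("frame",     "Produce a summary note."),
    ("constrain", "800 words, expert register."),
    ("exclude",   "No cliches or simplistic analogies."),
    ("structure", "Structure: diagnosis -> options -> recommendation."),
    ("calibrate", "For a senior data scientist audience."),
]

print(compile_prompt(prompt_program))
```

The compile step is the mental test of Principle 1 made executable: a sentence that cannot name its function does not ship.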
Conclusion: The Prompt as an Act of Thought
These eight principles are not "tips and tricks". They are the logical consequences of transformer architecture and the mathematical theory of information. Applying them is about moving from a naive relationship with the model ("I talk to it like a human") to an instrumented one ("I configure an information processing system").
The final paradox is elegant: to extract the maximum from an artificial intelligence, one must first rigorously exercise one's own. The quality of a prompt reflects the quality of the thought that precedes it. No model, however powerful, will compensate for a fuzzy technical specification.
The optimal prompt does not request. It specifies. It does not suggest. It constrains. It does not chatter. It encodes.
What TenderGraph Does with These Principles
The eight principles described in this article are not just theory for us. They are in the code. Every instruction our system sends to the model is built according to these rules — conditional entropy reduction, ontological hierarchy, negative constraints, semantic anchoring, syntactic temperature control, attentional positioning, structural isomorphism, reference-based compression.
But TenderGraph goes further. Our architecture applies mechanisms this article doesn't address: adaptive editorial compression of source documents, strategic context pre-injection into each work phase, prompt caching to maintain coherence across hundreds of iterations, working memory management with fresh context per phase and persistence of user decisions.
The result: a cognitive system that reads a 200-page tender document, extracts the strategic signal, constructs a value proposition grounded in the facts, and writes a technical memo where each argument is traceable to a requirement, each commitment is substantiated, each section is calibrated to maximize the score.
This is the difference between a tool that generates text and a system that thinks the tender. And it's why the proposals produced with TenderGraph bear no resemblance to what the market currently offers.
Read also:
- The Information Revolution: Why AI Amplifies Noise as Much as it Can Eliminate It
- Case Study: When AI Produces Industrial Mediocrity
- How to Write a Technical Memo That Wins Tenders
Theoretical References:
- Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal.
- Zipf, G. K. (1935). The Psycho-Biology of Language. Houghton Mifflin.
- Grice, H. P. (1975). Logic and Conversation. In Syntax and Semantics, Vol. 3.
- Levy, R. & Jaeger, T. F. (2007). Speakers optimize information density through syntactic reduction. NIPS.
- Vaswani, A. et al. (2017). Attention Is All You Need. NeurIPS.
- Deletang, G. et al. (2024). Language Modeling Is Compression. ICLR.
- Liu, N. F. et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. arXiv.