The Hallucination Control Playbook for Mining Professionals

If hallucinations are the reason you refuse to use AI in mining, you should probably stop trusting engineers too.

Every mining capital project execution system is built around the assumption that humans make mistakes. That is why every deliverable runs the gauntlet — discipline review, lead review, interdisciplinary review, project management review, client review. The QA/QC stack is not a courtesy. It is operational reality, and it exists because we already know humans misread scope, carry the wrong revision forward, mistype values, miscalculate, and miss critical details. None of that is controversial. It is normal engineering execution.

Treat AI the same way.

The right question is not “can AI hallucinate?” It is “can we prompt and review AI well enough to materially improve the workflow?” The answer is yes — but only when prompting is treated as operational control, not as typing fast into a chat box.

This article is the framework I use to reduce hallucinations to the point where AI outputs survive the same level of technical review an engineer’s work survives.

Where hallucinations actually come from

Before the rules, the diagnosis. Most hallucinations I see in mining work trace to one of five causes:

No grounding context. The user asks the model to recall a number, a flowsheet detail, or a regulatory citation from memory instead of pasting the source. The model fills the gap with a confident-sounding fabrication.

Vague scope. The prompt does not define the deliverable, the audience, the boundaries, or the acceptable level of uncertainty. The model decides for you, and decides badly.

Implicit assumptions. The user assumes the model knows what “typical” means in a porphyry copper context, or what “standard” looks like for an AACE Class 4 estimate. The model picks generic defaults that look right and aren’t.

No instruction to refuse. The model has not been told it is allowed to say “I do not know.” Faced with a question, it answers. That is the default behaviour. You have to override it.

Compound questions. The user stacks three or four asks into one prompt. The model gets the first one right and drifts on the rest.

Every rule that follows is a counter-pressure on one of these five causes.

The six core rules

These are the six rules I put at the top of any mining prompt where accuracy matters more than fluency. They are short and declarative, and they materially change output behaviour.

Do not invent facts

The plainest instruction in the stack. It tells the model: if something is not in the context I have given you, treat it as unknown. This single rule prevents most fabricated citations, made-up project names, and invented testwork results.

Never fabricate values

Specific to mining work and far stronger than “Do not invent facts” alone. Numbers are where fabrication does the most damage. A model will confidently produce a recovery, a head grade, a CAPEX intensity, or a reagent consumption rate that looks plausible and isn’t. Telling it explicitly not to invent values shifts the behaviour: the model becomes far more likely to return “not provided in source” than an attractive guess.

If information is missing, state it explicitly

The partner rule to “Never fabricate values.” Telling the model not to invent values is half the work. Telling it what to do instead is the other half. “If a value is missing, return ‘not provided in source’ and continue” gives the model a clean failure mode that does not look like silent omission.
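The clean failure mode is easier to enforce when the downstream step checks for it explicitly. The following is a minimal Python sketch, assuming the model has been told to return the exact sentinel string for missing values. The function name parse_extracted_value and the sample answers are hypothetical.

```python
from typing import Optional

# Sentinel string the prompt instructs the model to return for missing values.
MISSING = "not provided in source"


def parse_extracted_value(raw: str) -> Optional[float]:
    """Return a float, or None when the model reported the value as missing.

    Anything else raises ValueError, so a malformed answer is never silently
    treated as a number or as a gap.
    """
    text = raw.strip()
    if text.lower() == MISSING:
        return None
    # float() raises ValueError on non-numeric text such as "about 90".
    return float(text.replace(",", ""))


# Hypothetical model answers for two extracted fields.
print(parse_extracted_value("92.0"))
print(parse_extracted_value("not provided in source"))
```

A gap comes back as None and a guess dressed up in words fails loudly, which is the behaviour a reviewer wants.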

Cite the assumption behind every value

For technical work, this is the rule that catches the most errors during review. Every recovery, every cost, every throughput figure should be traceable to either a source document or a stated assumption. The model is fully capable of writing “assumes 92% Cu recovery per Table 14-3 of the PFS” or “assumes AACE Class 4 contingency of 25% applied to direct costs.” Force it to.

State a confidence level

Borrowed from how senior engineers actually talk. High confidence — source-backed. Medium confidence — interpolated from analogous data or peer projects. Low confidence — extrapolated from generic industry ranges. Confidence labels do two things at once: they make weak claims visible during review, and they discipline the model to think about its own basis before answering.
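If the labels come back in a structured form, they can drive review order. A small Python sketch, assuming each claim arrives paired with its label; the Confidence enum, the review_order helper, and the sample claims are all hypothetical.

```python
from enum import IntEnum


class Confidence(IntEnum):
    LOW = 1     # extrapolated from generic industry ranges
    MEDIUM = 2  # interpolated from analogous data or peer projects
    HIGH = 3    # source-backed


def review_order(claims):
    """Sort (claim, Confidence) pairs so the weakest claims are reviewed first."""
    return sorted(claims, key=lambda pair: pair[1])


# Hypothetical claims, for illustration only.
claims = [
    ("Cu recovery per PFS Table 14-3", Confidence.HIGH),
    ("Reagent consumption from generic ranges", Confidence.LOW),
    ("Grinding power from peer projects", Confidence.MEDIUM),
]
for text, level in review_order(claims):
    print(f"{level.name:<6} {text}")
```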

Refuse to draw conclusions the data does not support

The most under-used rule in mining prompting. Models default to producing a conclusion because they have been trained to be helpful. For technical work, the right answer is often “the data provided is insufficient to conclude.” Tell the model that this is an acceptable answer. Otherwise it will manufacture one.

The four mining-specific rules

The six above cover the core. The next four are where mining work gets specific, and where I see the most expensive errors get through.

Quote the source when extracting a value

For NI 43-101, SK-1300, and JORC work, every extracted number should travel with its source quote. “Initial CAPEX of US$412 M, per Section 21.1, Table 21-1.” If the model cannot produce the surrounding sentence, it probably invented the number.
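A reviewer can test the quote mechanically. The sketch below, in Python, checks whether a supporting quote appears verbatim in the source text once whitespace and case are normalised. The function names and the sample passage are hypothetical, and the check assumes the source is available as plain text.

```python
import re


def _normalise(text: str) -> str:
    """Collapse whitespace and lowercase so PDF line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_in_source(quote: str, source_text: str) -> bool:
    """True when the model's supporting quote appears verbatim in the source."""
    needle = _normalise(quote)
    if not needle:
        return False  # an empty quote proves nothing
    return needle in _normalise(source_text)


# Hypothetical source passage and model quotes, for illustration only.
source = "The initial capital cost is estimated at\nUS$412 M, as summarised in Table 21-1."
print(quote_is_in_source("estimated at US$412 M", source))
print(quote_is_in_source("estimated at US$421 M", source))
```

A transposed digit fails the check, which is the kind of error that slips past a quick visual read.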

Distinguish between source values and derived values

Mining studies are full of derived numbers — NSR, cut-off grade, capital intensity per annual tonne, all-in sustaining cost. The model should label which numbers are direct quotes from the source and which are calculated from other inputs. A derived value built on a wrong input compounds quickly, and the only way to catch it during review is to see the calculation laid out.
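One way to picture the labelling is as two record types, where a derived value always carries its inputs. This is a hedged Python sketch, not a prescribed schema: the class names, the capital_intensity helper, and the throughput figure are hypothetical, and the CAPEX figure simply echoes the example earlier in this article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceValue:
    name: str
    value: float
    unit: str
    citation: str  # where the value was quoted from


@dataclass(frozen=True)
class DerivedValue:
    name: str
    value: float
    unit: str
    formula: str
    inputs: Tuple[SourceValue, ...]  # every input stays attached for review


def capital_intensity(capex_usd: SourceValue, tonnes_per_year: SourceValue) -> DerivedValue:
    """Initial CAPEX divided by annual throughput, with both inputs kept visible."""
    if tonnes_per_year.value <= 0:
        raise ValueError("annual throughput must be positive")
    return DerivedValue(
        name="Capital intensity",
        value=capex_usd.value / tonnes_per_year.value,
        unit="US$ per annual tonne",
        formula=f"{capex_usd.name} / {tonnes_per_year.name}",
        inputs=(capex_usd, tonnes_per_year),
    )


# The throughput figure is hypothetical and not taken from any report.
capex = SourceValue("Initial CAPEX", 412e6, "US$", "Section 21.1, Table 21-1")
throughput = SourceValue("Annual throughput", 10e6, "t/a", "hypothetical, for illustration")
result = capital_intensity(capex, throughput)
print(f"{result.name}: {result.value:.1f} {result.unit} [derived: {result.formula}]")
for item in result.inputs:
    print(f"  input: {item.name} = {item.value:,.0f} {item.unit} ({item.citation})")
```

With the inputs printed next to the result, a wrong throughput or a wrong CAPEX revision is visible at a glance.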

Flag every unit, currency, and basis conversion

USD vs CAD. Short tons vs metric tons. Dry basis vs wet basis. Recovered metal vs contained metal. LOM total vs annual average. The model will silently convert if you let it. Force it to declare every conversion and the conversion factor used. In my own review of AI outputs on mining content, this is the single highest-frequency error category.
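What a declared conversion looks like can be sketched in a few lines of Python. The short-ton factor follows from the exact definitions (1 short ton = 2,000 lb, 1 lb = 0.45359237 kg). The helper names are hypothetical, and the moisture fraction is assumed to be on a wet-mass basis.

```python
from dataclasses import dataclass

# 1 short ton = 2000 lb and 1 lb = 0.45359237 kg exactly,
# so 1 short ton = 0.90718474 metric tonnes.
SHORT_TON_TO_TONNE = 0.90718474


@dataclass(frozen=True)
class Conversion:
    value: float
    unit: str
    note: str  # human-readable declaration of what was converted and how


def short_tons_to_tonnes(short_tons: float) -> Conversion:
    tonnes = short_tons * SHORT_TON_TO_TONNE
    note = f"{short_tons:,.0f} st x {SHORT_TON_TO_TONNE} t/st = {tonnes:,.0f} t"
    return Conversion(tonnes, "t", note)


def wet_to_dry_tonnes(wet_tonnes: float, moisture_fraction: float) -> Conversion:
    """Assumes moisture_fraction is mass of water divided by wet mass."""
    if not 0.0 <= moisture_fraction < 1.0:
        raise ValueError("moisture_fraction must be in [0, 1)")
    dry = wet_tonnes * (1.0 - moisture_fraction)
    note = f"{wet_tonnes:,.0f} wmt x (1 - {moisture_fraction}) = {dry:,.0f} dmt"
    return Conversion(dry, "dmt", note)


# Hypothetical quantities, for illustration only.
print(short_tons_to_tonnes(1000).note)
print(wet_to_dry_tonnes(1000, 0.08).note)
```

The note string is the point: the factor travels with the number, so the reviewer never has to guess which basis was used.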

Time-stamp every reference where time matters

Commodity prices, regulatory references, peer project disclosures, and tax rates all have an as-of date. The model should attach a date to every value where time is material. “Cu price of US$4.20/lb, LME spot, 15 May 2026.” Without the date, the value is unreviewable.
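The date also makes staleness checkable. A minimal Python sketch, assuming each time-sensitive value is captured with an optional as-of date. The DatedValue record, the review_flag helper, the review date, and the age threshold are hypothetical, and the copper price repeats the example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatedValue:
    label: str
    value: float
    unit: str
    basis: str
    as_of: Optional[date]  # None means the model did not supply a date


def review_flag(item: DatedValue, review_date: date, max_age_days: int) -> str:
    """Classify a dated value as OK, stale, or unreviewable."""
    if item.as_of is None:
        return "REJECT: no as-of date"
    age = (review_date - item.as_of).days
    if age < 0:
        return "REJECT: as-of date is after the review date"
    if age > max_age_days:
        return f"CHECK: {age} days old"
    return "OK"


# The review date and threshold are hypothetical.
cu_price = DatedValue("Cu price", 4.20, "US$/lb", "LME spot", date(2026, 5, 15))
print(review_flag(cu_price, review_date=date(2026, 6, 14), max_age_days=90))
```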

The production-ready anti-hallucination block

This is the block I paste at the top of any mining prompt where accuracy matters. Copy it, tune it to the task, keep it standing in your prompt library.

Operating rules for this task:
1. Do not invent facts. If something is not in the context provided, treat it as unknown.
2. Never fabricate numerical values. If a value is missing, return "not provided in source" and continue.
3. For every value cited, state the source: either the document and section, or the assumption.
4. Distinguish between values quoted directly from source documents and values derived through calculation. For derived values, show the inputs.
5. Declare every unit, currency, and basis conversion explicitly, including the conversion factor used.
6. Time-stamp every reference where time matters (commodity prices, regulatory citations, peer project disclosures, tax rates).
7. State a confidence level for any interpretive answer: High (source-backed), Medium (interpolated from analogous data), Low (generalized industry estimate).
8. Where the data provided is insufficient to support a conclusion, say so explicitly. Do not manufacture a conclusion to be helpful.
9. If any part of this brief is unclear before you start, ask. Do not assume.
10. Ask clarifying questions if required.
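For teams that assemble prompts in code rather than by hand, the block can be kept as a constant and prepended to every task. The Python sketch below only builds the prompt text and makes no call to any model API. The build_prompt function and the "Source material" and "Task" section labels are assumptions of this sketch, not part of the block itself.

```python
# The ten operating rules, kept in one place so every prompt uses the same wording.
OPERATING_RULES = (
    "Do not invent facts. If something is not in the context provided, treat it as unknown.",
    'Never fabricate numerical values. If a value is missing, return "not provided in source" and continue.',
    "For every value cited, state the source: either the document and section, or the assumption.",
    "Distinguish between values quoted directly from source documents and values derived "
    "through calculation. For derived values, show the inputs.",
    "Declare every unit, currency, and basis conversion explicitly, including the conversion factor used.",
    "Time-stamp every reference where time matters (commodity prices, regulatory citations, "
    "peer project disclosures, tax rates).",
    "State a confidence level for any interpretive answer: High (source-backed), "
    "Medium (interpolated from analogous data), Low (generalized industry estimate).",
    "Where the data provided is insufficient to support a conclusion, say so explicitly. "
    "Do not manufacture a conclusion to be helpful.",
    "If any part of this brief is unclear before you start, ask. Do not assume.",
    "Ask clarifying questions if required.",
)


def build_prompt(task: str, source_text: str) -> str:
    """Prepend the operating rules to the pasted source and the task."""
    if not source_text.strip():
        # The rules assume grounding context; refuse to build a prompt without it.
        raise ValueError("no source text supplied")
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(OPERATING_RULES, start=1))
    return (
        "Operating rules for this task:\n"
        f"{numbered}\n\n"
        "Source material:\n"
        f"{source_text.strip()}\n\n"
        "Task:\n"
        f"{task.strip()}"
    )


# Hypothetical task and source excerpt, for illustration only.
prompt = build_prompt(
    task="Extract initial CAPEX and state its source.",
    source_text="Section 21.1: Initial CAPEX is summarised in Table 21-1.",
)
print(prompt)
```

Refusing to build a prompt with no source text is a deliberate choice: it closes off the first cause of hallucination, asking the model to answer from memory.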

This costs nothing to paste and changes outputs materially from the first response.

Why these rules work

A short note for the engineers who want to know the mechanism.

Large language models are trained to be coherent and helpful. Those two objectives can quietly outrank accuracy when the prompt does not push back. Every rule above is a counter-pressure: it gives the model permission, and instruction, to break coherence in favour of correctness. “I do not know” is a coherent answer only when the model has been told it is allowed to give it.

This is the same mechanism behind why a junior engineer who has been told “if you are not sure, flag it” outperforms one who has been told only “deliver by Friday.” The instruction set shapes the failure mode. A model with no permission to be uncertain will manufacture certainty. A model with explicit permission to flag gaps will flag them.

Where prompting alone is not enough

Prompt rules reduce hallucinations. They do not eliminate them, and they cannot manufacture knowledge the model does not have.

For technical mining work, the next layer of control is retrieval. The model should be working from your source documents, not from its training data. That is the architecture behind MineGPT, and it is why purpose-built mining systems will increasingly outperform general-purpose chat for technical workflows. When the model is constrained to answer from a defined corpus of NI 43-101, SK-1300, and JORC reports, the class of “confident fabrication from memory” shrinks sharply. What remains is mostly interpretive error, and that is what the ten rules above and the technical review stack are designed to catch.

Until that retrieval architecture is in place for the workflow in front of you, the prompt rules are the most reliable lever you have.

The review still happens

The closing point returns to where this started.

AI outputs go to review. So do engineer outputs, consultant outputs, and vendor outputs. The QA/QC stack does not disappear because the input is AI. It absorbs AI as another contributor — one that is fast, broadly capable, and prone to a specific class of error that prompting can substantially mitigate.

Refusing to engage with AI because it makes mistakes, while operating inside workflows specifically designed to catch human mistakes, is a position that gets harder to defend every quarter.

The companies that build a working discipline around AI prompting will not replace their engineers. They will produce more, faster, with the same review rigour the industry already runs. That is the only standard that matters.

One closing move

If you are starting today, paste the production-ready block into your next prompt and watch what changes in the first three outputs. That is where the discipline begins. The rest is iteration.

Author

Francisca Lombard — Founder, LOMexcel · Mining Consulting and AI Advisory.

Strategic Engagement

LOMexcel runs working sessions and prompt-design clinics for mining teams that want AI outputs to survive technical review the first time, not the third.