Article 12 Logging Is a Build-Time Obligation, Not a Documentation One
Article 12 is an engineering obligation, not a filing one. The logs it requires are a property of the running code. If the code does not write them, they do not exist, and no document describing the logs that should have been written changes that. This is the compliance-automation thesis applied to its most concrete example: compliance is design work, not paperwork.
The deadline for high-risk systems already on the market is August 2, 2026. Retrofitting Article 12 into a system that was never designed for it takes a quarter, not a sprint. Teams treating logging as a documentation task in spring 2026 will discover the problem in summer 2026, with no time to fix it.
What Article 12 actually requires
Article 12(1) requires high-risk AI systems to “technically allow for the automatic recording of events (logs) over the lifetime of the system.” Three phrases do most of the work.
Automatic. The system records events itself. Logs that require a human to remember them, or that only exist when someone exports a report, are not Article 12 logs. The recording has to run whether anyone is watching or not.
Events. Not just errors. Not just outputs. The Act expects a record of what the system did, when, what it was given, what it produced, and how confident it was. The breadth of “events” is what most teams underestimate.
Over the lifetime. From placing on the market until withdrawal, continuously. This is not a debug facility you enable for audits.
Article 12(2) requires “a level of traceability of the AI system’s functioning that is appropriate to the intended purpose.” That is a design standard, not a template. A log line that says “inference complete” is not traceability.
Article 12(3) adds specific content requirements: the period of each use, the reference database against which inputs were checked (where relevant), the input data, and the natural persons involved in verifying results. For systems under Annex III point 1(a) (remote biometric identification), the requirements are tighter still.
None of this can be added by writing a policy about it. It has to be in the code.
Why you cannot paper this over
A logging policy describes the logs you intend the system to produce. Article 12 requires the logs themselves. An auditor who reads your policy and then asks for a week of production logs is performing the most basic verification: do the two match? If the policy says the system records confidence scores and the log rows contain no confidence field, you have a finding.
This is how post-market monitoring under Article 72, serious incident investigations under Article 73, and market surveillance under Article 74 will all work. They consume logs. Thin logs, thin investigations.
Writing a page that says “we log inputs, outputs, confidence scores, overrides, and anomalies” takes twenty minutes. Instrumenting an inference service to capture all of that reliably, persist it durably, and keep it retrievable for six months takes a quarter. Teams that have written the page and not done the project are walking into 2026 convinced they are ready.
What “events” actually covers
The regulation does not enumerate every event type; the appropriate set depends on the system. But the practical floor looks like this:
Inputs. A reference to the input data, respecting data protection law. For many systems you will want a hash plus structured metadata rather than the input itself — six months of raw inputs will violate data-minimisation norms under the GDPR. The GDPR–AI Act overlap is a live design constraint here.
Outputs and decisions. What the system produced, in a reconstructable form. For a classifier, the class and the probability vector. For a generative system, the output text or a content-addressed reference to it. For a scoring system, the score and thresholds applied.
Confidence levels and probability scores. The numerical values actually used to make or inform a decision — not the user-facing summary.
Anomalies, errors, and unexpected behaviour. Anything that would be “warn” or “error” in your observability stack, plus model-level anomalies: inputs outside the training distribution, outputs that tripped safety filters, fallbacks to defaults.
Start and end of each period of use. Required for Article 12(3), and for reconstructing who was using the system when.
Human oversight actions. Overrides, interventions, escalations, approvals. The field that connects Article 12 to Article 14, and the one most often missing.
Each is a decision in the code — schema, storage, retention, redaction. None of them appears because you wrote a policy; the sketch below shows one possible record shape.
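As a concrete illustration, here is a minimal sketch of such a record in Python. The InferenceEvent type and its field names are assumptions for illustration, not a prescribed schema; the point is that each event type in the list above maps to a field someone has to design, populate, and test.

```python
# A hypothetical Article 12 event record. Field names and types are
# illustrative assumptions, not a prescribed schema.
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional


def input_ref(raw_input: bytes) -> str:
    """Content-addressed reference to the input: store the hash, not the
    payload, so six months of logs stay compatible with data minimisation."""
    return hashlib.sha256(raw_input).hexdigest()


@dataclass(frozen=True)
class InferenceEvent:
    event_type: str                    # e.g. "inference", "override", "anomaly"
    timestamp: str                     # UTC, ISO 8601
    session_id: str                    # ties the event to a period of use
    input_hash: str                    # reference to the input data
    input_metadata: dict               # structured, redacted descriptors
    output: dict                       # the decision, in reconstructable form
    confidence: Optional[float]        # the value actually used, not a summary
    anomaly_flags: list = field(default_factory=list)  # OOD input, filter trips
    oversight: Optional[dict] = None   # override/approval by a natural person

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


event = InferenceEvent(
    event_type="inference",
    timestamp=datetime.now(timezone.utc).isoformat(),
    session_id="session-0142",
    input_hash=input_ref(b"<serialised model input>"),
    input_metadata={"modality": "text", "length": 482},
    output={"class": "refer_to_human", "probabilities": [0.61, 0.39]},
    confidence=0.61,
)
print(event.to_json())
```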
The Article 14 connection
Article 14 requires high-risk systems to be designed so natural persons can effectively oversee them. Effective oversight depends on logs. An overseer who cannot see why the system produced a given output, what confidence it had, or what the alternatives were, cannot oversee it. An overseer whose interventions are not recorded cannot prove they intervened, or be audited on whether they intervened correctly.
The Article 12 log schema has to be designed together with the Article 14 oversight interface. The fields the overseer needs are the fields the system has to log. If oversight is designed late, the log schema will not contain the data the overseer needs, and the interface will be built around the limitations of the logs — a common and avoidable failure mode.
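What recording an intervention might look like, as a short sketch: the function names, fields, and the record_event() writer are hypothetical, not a standard API. The design point is that the override is a first-class log event, so the Article 14 intervention is visible in the Article 12 trail.

```python
# Hypothetical sketch: a human override recorded as a first-class log event.
# record_event() stands in for the append-only log writer; its schema is an
# assumption, not a library API.
from datetime import datetime, timezone


def record_event(event: dict) -> None:
    # Stand-in for the durable log pipeline.
    print(event)


def log_override(session_id: str, original: dict, replacement: dict,
                 reviewer_id: str, reason: str) -> None:
    """Persist an overseer's intervention with enough detail to audit it."""
    record_event({
        "event_type": "human_override",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "original_output": original,        # what the system decided
        "replacement_output": replacement,  # what the human decided instead
        "reviewer_id": reviewer_id,         # the natural person involved
        "reason": reason,
    })


log_override("session-0142",
             original={"class": "deny", "confidence": 0.58},
             replacement={"class": "refer_to_human"},
             reviewer_id="reviewer-17",
             reason="confidence below oversight threshold")
```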
Article 19 and what retention implies
Article 19 requires providers of high-risk AI systems to keep Article 12(1) logs for at least six months — longer where Union or national law requires it. Six months is the floor, not the target.
Retention cannot be bolted on after the fact: logs that were never written cannot be produced later. Large log volumes and expensive storage make six months a capacity-planning decision. Logs containing personal data make it a lawful-basis question under the GDPR, and the architecture has to support minimisation and subject-access rights without losing the audit trail.
Teams routinely discover, at six months and one day, that they cannot produce the logs for month one because the log aggregation service had a 30-day retention default nobody reconfigured. The policy said six months. The infrastructure said thirty days. The infrastructure won.
Retention also implies immutability. Logs that can be silently edited are not evidence. Serious implementations have tamper-evident storage, write-once retention, and access control separating operators from auditors.
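One way to make tamper-evidence concrete is a hash chain: each record commits to the hash of its predecessor, so a silent edit anywhere breaks verification from that point on. A minimal sketch, with illustrative function names; production systems would get the same guarantee from WORM storage or a managed ledger rather than rolling their own.

```python
# Minimal hash-chain sketch of tamper-evident logging. Illustrative only:
# it demonstrates the property, not a production implementation.
import hashlib
import json


def chain_hash(prev_hash: str, record: dict) -> str:
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(log: list, record: dict) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    log.append({"record": record, "prev": prev,
                "hash": chain_hash(prev, record)})


def verify(log: list) -> bool:
    prev = "genesis"
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != chain_hash(prev, entry["record"]):
            return False  # the chain breaks at the first edited record
        prev = entry["hash"]
    return True


log = []
append(log, {"event_type": "inference", "confidence": 0.61})
append(log, {"event_type": "human_override", "reviewer_id": "reviewer-17"})
assert verify(log)

log[0]["record"]["confidence"] = 0.99  # a silent edit...
assert not verify(log)                 # ...is detectable
```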
What build-time actually looks like
None of this is exotic engineering. It is the standard observability stack applied with Article 12’s event types and Article 19’s retention. A defensible implementation has most of the following:
Structured logging at the inference boundary. Every call into the model is wrapped in a layer emitting a structured record with a stable, validated schema (see the sketch after this list). Not printf statements. Not whatever the framework happened to log.
An event taxonomy versioned with the system. Event types are documented, versioned, and tested. New ones are reviewed like schema migrations.
Persistent, append-only storage. Logs flow into a store guaranteeing durability, ordering, and tamper-evidence for the retention period. The technology matters less than the guarantees.
Retention aligned to Article 19. Six months of queryable logs, with a plan for what happens after (archive, redact, delete). Configured as infrastructure, not as an operational practice.
Access control separating operators from auditors. People who can write to the logs cannot silently edit them. Auditors can read them without running SQL on the production database.
A redaction layer for personal data. Logs support subject-access and erasure requests under the GDPR without losing the Article 12 audit trail.
Tests that verify logging. Unit tests asserting every inference path emits the expected event. End-to-end tests exercising the oversight flow and confirming the override was captured. This is where “automatic” becomes enforceable internally.
A rehearsed runbook for authority requests. A documented process for producing a specified date range of logs in a specified format. If you have never done it, you cannot do it under the time pressure of an Article 73 investigation.
Each item is a task on an engineering backlog. Each takes real time. This is the work a workflow tool does not do for you.
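A minimal sketch of the inference boundary, assuming a hypothetical emit() writer into the append-only store; the decorator pattern is one illustrative way, not the only way, to enforce the invariant that no inference path bypasses logging.

```python
# Sketch of structured logging at the inference boundary: every model call
# passes through one wrapper that emits a validated record. emit() and the
# field set are assumptions; swap in your own log pipeline.
import functools
import hashlib
import json
import time
import uuid
from datetime import datetime, timezone


def emit(record: dict) -> None:
    # Stand-in for the durable, tamper-evident store; never just stdout
    # in production.
    print(json.dumps(record, sort_keys=True))


def logged_inference(model_fn):
    @functools.wraps(model_fn)
    def wrapper(raw_input: bytes, **kwargs):
        event_id = str(uuid.uuid4())
        started = time.monotonic()
        try:
            output = model_fn(raw_input, **kwargs)
            emit({
                "event_type": "inference",
                "event_id": event_id,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "input_hash": hashlib.sha256(raw_input).hexdigest(),
                "output": output,
                "latency_ms": round((time.monotonic() - started) * 1000, 1),
            })
            return output
        except Exception as exc:
            # Failures are events too: errors belong in the Article 12 log,
            # not only in the APM tool.
            emit({"event_type": "error", "event_id": event_id,
                  "timestamp": datetime.now(timezone.utc).isoformat(),
                  "error": repr(exc)})
            raise
    return wrapper


@logged_inference
def classify(raw_input: bytes) -> dict:
    return {"class": "approve", "confidence": 0.93}  # toy model


classify(b"<serialised input>")
```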
Common traps
Logs that exist but are unreadable. Free-text lines in a pile of files with no schema, no index, no search. An inspector asking for “all decisions for this user in the last three months” will not accept “we have the files.”
Logs with gaps. Retries not logged. Fallbacks not logged. The cache hit path skips logging because it was optimised. The oversight override captured in the UI but not persisted. Every gap is an audit finding.
Logs deleted by mistake. A migration drops the log table. A rotation policy truncates at 30 days because the default never got updated. A GDPR deletion job erases the log record along with the personal data it referenced. Retention is a property you have to test.
Logs that contain data they should not. Raw inputs with special-category personal data under GDPR Article 9. Full user text for a sensitive application. Six months of this is a data-protection incident waiting to be discovered.
Logs that work in staging and fail in production. Enabled in development, disabled by a feature flag in production, and nobody notices for a year. Or enabled, but the log-shipping container keeps crashing and backfilling never gets reconciled. Observability of the logging itself is part of the obligation; a canary sketch follows this list.
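A sketch of a scheduled canary that tests the pipeline itself, catching the last two traps above: it writes a known event, confirms it is retrievable, and checks the age of the oldest record against the Article 19 floor. write_event, query_log, and oldest_record_age_days are hypothetical interfaces to whatever log store you run, which is why they are passed in as parameters.

```python
# Hypothetical canary for the logging pipeline. The injected functions are
# assumptions standing in for your log store's real interfaces.
from datetime import datetime, timezone

RETENTION_FLOOR_DAYS = 183  # six months, the Article 19 floor


def run_canary(write_event, query_log, oldest_record_age_days) -> list:
    failures = []

    marker = f"canary-{datetime.now(timezone.utc).isoformat()}"
    write_event({"event_type": "canary", "marker": marker})
    if not query_log(marker):
        failures.append("canary event not retrievable: log shipping is broken")

    # Only meaningful once the system has run past the floor, but it catches
    # the 30-day default that nobody reconfigured.
    if oldest_record_age_days() < RETENTION_FLOOR_DAYS:
        failures.append("oldest record is younger than the retention floor")

    return failures  # page the on-call on any failure


# In-memory fakes to demonstrate the canary end to end.
_store = []
assert run_canary(
    write_event=_store.append,
    query_log=lambda m: [e for e in _store if e.get("marker") == m],
    oldest_record_age_days=lambda: 200,
) == []
```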
What a policy document can legitimately do
This is not an argument against documentation, but against documentation as a substitute for engineering. A good logging policy describes the event schema, retention, redaction rules, access control, and production process for authority requests, and references the design documents and tests that prove the system implements them. It is short, because it describes a concrete thing rather than inventing one. A policy that stands alone, with no artefact behind it, is a liability — under Article 16 for providers and Article 26 for deployers, documented commitments the system fails to honour are not harmless.
The takeaway
Article 12 is enforced by reading logs, not policies. The gap between a logging policy and a logging implementation is the difference between a paragraph and a quarter of engineering work, and an auditor can close it in a single query. If the logs are not being written by the production system, they are not being written at all. A policy that claims otherwise is a record of the discrepancy the auditor will find.