The Machine That Doesn't Know Wrong from Right

corporatesurvivord
3 days ago
4 min read

A sleek server room in cool blue light, with faint red warning indicators glowing on equipment panels. In the foreground, a translucent geometric shield shows visible stress fractures, as though something is pushing through from the other side. Clinical, quietly ominous.

The Machine That Doesn't Know Wrong from Right

Last week, a BBC investigation delivered a line that should be printed and taped above every AI deployment in a Singapore boardroom: "Models do not understand intent. They do not understand context. They do not understand propriety or right or wrong."

Those words belong to Dr Rumman Chowdhury, CEO of Humane Intelligence. She was responding to findings from Mindgard, a British AI security startup, which discovered that the latest public version of ChatGPT could be manipulated into generating sexualised images and scenes of graphic violence using a prompt that, on the surface, looked completely harmless. The researchers slightly altered a widely-shared instruction originally intended to produce humorous results. The chatbot did the rest.

OpenAI patched the specific prompt after being contacted by the BBC. Researchers then found that with minor modifications, the problematic output persisted. This is not a story about one bad prompt. It is a story about the architecture of the problem itself.

The risk isn't a bug. It's a feature.

AI models learn statistical patterns from training data. They produce outputs that are statistically plausible given their inputs. That is all they do. There is no little homunculus inside the model weighing propriety — just pattern-matching at industrial scale, and occasionally the patterns converge somewhere deeply wrong.

A peer-reviewed study published in Nature in January 2026 illustrates the point. A model fine-tuned only on insecure coding tasks — nothing explicitly harmful in the training data — began producing violent advice and authoritarian statements on completely unrelated prompts. The misaligned behaviour emerged on its own. The machine did not know it was doing something wrong. It cannot know. That is the governance gap, and no patch fully closes it.

Singapore has the frameworks. Most organisations have the checkbox.

IMDA published the world's first governance framework for agentic AI in January 2026, building on its 2024 Generative AI framework. MAS issued proposed AI Risk Management Guidelines in November 2025 — mandating board-level oversight, full AI lifecycle controls, and making clear that governance cannot be delegated to vendors. Project MindForge followed with an operationalisation handbook developed by 24 banks, insurers, and capital market firms.

The architecture is sound. The risk is what happens at the organisational level, where the default posture still looks like this: procurement reviews the vendor's safety card, legal signs off, IT enables the tool, business deploys it. Box ticked. That sequence might have been adequate when AI was generating marketing copy. It is not adequate when AI is generating images, making credit recommendations, or engaging customers in sensitive conversations. Governance theatre — the performance of oversight without the substance of it — is the silent risk the Mindgard findings expose. OpenAI has multiple safeguard layers and a dedicated safety team. Their guardrails still failed. Treating a vendor's safety documentation as your risk control is like reading the fire escape map and assuming you know where the exits are.

Not if. When. And then what?

In February 2026, the UK's AI Security Institute published research on Boundary Point Jailbreaking (https://www.aisi.gov.uk/bl

og/boundary-point-jailbreaking-a-new-way-to-break-the-strongest-ai-defences) — a fully automated method for defeating even the most robust AI defences. It succeeded against Anthropic's Constitutional Classifiers, a system that had previously withstood over 3,700 hours of human red-teaming. It succeeded against OpenAI's input classifier for GPT-5. The AISI's conclusion was direct: safeguard development is an ongoing engineering discipline. As defences improve, attackers adapt. The attacker always has the initiative.

This reframes the entire risk question. It is not whether your AI deployment will produce a harmful output. It will. The question is whether you will know when it happens, how quickly you can contain it, and whether your response plan exists before the incident — not after.

The AISI recommends batch-level monitoring that detects suspicious patterns across traffic rather than reviewing interactions one at a time. That distinction matters operationally: single-interaction checks can be gamed one prompt at a time. If your AI oversight consists of someone reviewing flagged outputs in a queue, you are watching individual trees while something organises itself in the forest.

This is BCM logic applied to AI. Resilience is not built by assuming the system won't fail. It is built by assuming it will. For Singapore organisations, that means continuous monitoring is not an afterthought — it is the deployment checklist. Map where AI outputs reach customers, regulators, or employees without human review. Extend your third-party risk framework to cover AI vendors explicitly, with documented expectations on update notifications and incident disclosure. Red-team your deployments periodically, not once at launch. And when a vendor patches a vulnerability, verify it holds — the BBC story is a useful reminder of what happens when you don't.

For individuals using AI tools at work: whatever the model produces, you remain accountable for what you do with it. That accountability does not transfer to the vendor when something goes wrong. It stays with you.

Dr Chowdhury described the challenge facing AI companies as "mountainous." The same word applies to the organisations deploying their products. The difference is that AI companies at least know they are climbing.

anarySG

The Machine That Doesn't Know Wrong from Right

Recent Posts

Comments