Garbage In, Garbage Out: Why Data Quality Is the Biggest Unsolved Problem in AML
Institutions spend millions on detection technology and comparatively little on the data infrastructure that determines whether that technology can perform. The hierarchy needs to be inverted.
Priya Naidoo
Senior Data Scientist
The Unsexy Problem
Data quality is not a glamorous topic. It does not appear on conference keynote slides. It is not the subject of vendor press releases. And yet, in our experience working with financial institutions across multiple continents, it is consistently the single largest determinant of whether a financial crime intelligence programme succeeds or fails.
The pattern is depressingly familiar: an institution invests in a sophisticated detection platform, deploys it against its transaction data, and discovers that the results are far below expectations. The platform is blamed. The vendor is blamed. Sometimes both are replaced.
In the majority of cases, the underlying issue is data quality.
What Bad Data Looks Like in AML
Data quality problems in financial crime detection manifest in several ways:
Missing or incorrect customer data: If a customer's date of birth, address, or identity information is missing or incorrect, risk scoring and KYC processes are fundamentally compromised. Network analysis that depends on linking related parties requires accurate data about those parties.
Inconsistent transaction categorisation: Many institutions have multiple core banking systems with different transaction coding conventions. Without harmonisation, the same type of transaction may appear differently across systems, creating blind spots in behavioural analysis.
Stale entity data: Customer risk profiles based on data that was accurate at account opening but has not been updated are a liability, not an asset. Periodic review cycles are insufficient for high-risk customer segments.
Fragmented counterparty data: Most institutions have a clear view of their own customers. Far fewer have a consistent, structured view of counterparties — the external parties their customers transact with. This is a critical gap for network analysis.
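Two of these failure modes lend themselves to simple mechanical checks. The sketch below illustrates them in Python; every field name, system name, and transaction code is invented for illustration, and real schemas will differ.

```python
from datetime import date, timedelta

def quality_flags(customer: dict, last_review: date, today: date) -> list[str]:
    """Flag two failure modes: missing customer data and stale risk profiles."""
    flags = []
    # Missing or incorrect customer data: required identity fields unpopulated.
    for field in ("date_of_birth", "address", "national_id"):
        if not customer.get(field):
            flags.append(f"missing:{field}")
    # Stale entity data: high-risk profiles not reviewed within a year
    # (the one-year window is an illustrative threshold, not a standard).
    if customer.get("risk_rating") == "high" and today - last_review > timedelta(days=365):
        flags.append("stale:risk_profile")
    return flags

# Inconsistent transaction categorisation: map each source system's codes
# onto one canonical scheme so behavioural analysis sees a single vocabulary.
CODE_MAP = {
    ("system_a", "TRF"): "wire_transfer",
    ("system_b", "01"): "wire_transfer",
    ("system_a", "CSH"): "cash_deposit",
    ("system_b", "07"): "cash_deposit",
}

def harmonise(system: str, code: str) -> str:
    """Return the canonical category, or 'uncategorised' for unknown codes."""
    return CODE_MAP.get((system, code), "uncategorised")
```

The point of the harmonisation map is not the mapping itself but its residual: records that fall through to `uncategorised` are exactly the blind spots described above.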
The Data Governance Prerequisite
Before investing in detection technology, institutions should honestly assess their data governance maturity:
- Is customer data systematically captured, validated, and maintained?
- Are there documented standards for transaction data — mandatory fields, coding conventions, completeness requirements?
- Is there a process for identifying and remediating data quality exceptions?
- Are data lineage and provenance documented, such that data can be traced from its source to its use in compliance decisions?
Without affirmative answers to these questions, detection technology investment is premature.
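The lineage question in the checklist above is the one most often left undocumented. One lightweight way to make provenance concrete is to attach minimal lineage metadata to every record as it enters the compliance pipeline. The structure below is a hypothetical sketch, not a prescribed schema; the field names are invented.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Lineage:
    """Minimal provenance: where a record came from and how it was produced."""
    source_system: str      # e.g. the originating core banking system
    extracted_at: datetime  # when the record was pulled from source
    transform_version: str  # version of the harmonisation/transform logic applied

@dataclass
class Record:
    """A data record carried through the pipeline together with its lineage."""
    payload: dict
    lineage: Lineage
```

With this in place, any value used in a compliance decision can be traced back to a named source system, an extraction time, and a specific version of the transformation logic, which is the substance of the lineage question above.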
How Intellidata Approaches This
We are not neutral on this topic. We have had enough conversations with institutions operating Themis to know that data quality is where programmes live or die.
Our implementation methodology includes a structured data quality assessment phase before deployment. We assess completeness, consistency, and currency across all data sources that will feed Themis. Where gaps are identified, we work with institutions to define remediation plans — because deploying sophisticated detection technology against poor data produces sophisticated-looking poor results.
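An assessment of this kind can be reduced to per-source metrics. The functions below are a hedged sketch of two of the three dimensions named above, completeness and currency; the field names, thresholds, and scoring are illustrative assumptions, not Intellidata's actual methodology.

```python
from datetime import date

def completeness(records: list[dict], required: list[str]) -> float:
    """Fraction of records in which every required field is populated."""
    if not records:
        return 0.0
    ok = sum(all(r.get(f) for f in required) for r in records)
    return ok / len(records)

def currency(review_dates: list[date], today: date, max_age_days: int = 365) -> float:
    """Fraction of profiles reviewed within the allowed window.

    The 365-day default is an illustrative threshold, not a regulatory figure.
    """
    if not review_dates:
        return 0.0
    fresh = sum((today - d).days <= max_age_days for d in review_dates)
    return fresh / len(review_dates)
```

Scoring each source feeding the detection platform on metrics like these, before deployment, turns "assess data quality" from an aspiration into a number that can be tracked through remediation.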
This is not an obstacle to implementation. It is the work. And institutions that invest in data quality before detection technology consistently outperform those that do not.