
AML Risk Scoring Engines: How Automated STR Recommendations Work

April 18, 2026

Every compliance officer in a busy East African bank knows the feeling of opening the Monday morning transaction monitoring queue. Hundreds of alerts. Maybe thousands during a high-volume period around month-end, public holidays, or during tax season when cash movements spike. Somewhere in that queue are the two or three genuinely suspicious cases that deserve a Suspicious Transaction Report to the Financial Intelligence Unit. The rest are noise — legitimate transactions that happened to trigger a rule threshold.

The false positive rate in conventional transaction monitoring systems runs between 85 and 95 percent. That means for every hundred alerts a compliance team investigates, 85 to 95 of them will be documented as legitimate business activity and closed without further action. The remaining 5 to 15 percent may warrant an STR. This is the alert fatigue problem, and it has real consequences: compliance teams burn their analytical capacity on noise, genuine suspicious activity takes longer to identify, and the quality of STR submissions suffers because officers are exhausted.

Risk scoring engines exist to invert this equation. Instead of treating every alert as equally deserving of manual investigation, a risk scoring engine assigns each alert a quantified risk score based on multiple factors, filters out low-risk noise algorithmically, and directs human attention only to the alerts that the evidence suggests genuinely warrant it. Done correctly, this reduces the false positive investigation burden by 50 to 70 percent while improving the identification rate for true positives.

This article explains how AML risk scoring engines work, how they drive STR recommendations, and what it takes to calibrate them for the East African financial environment.


The Alert Fatigue Problem in AML

Typical False Positive Rates in Transaction Monitoring

The 85–95 percent false positive rate in rule-based transaction monitoring is not an anomaly — it is the expected outcome of how threshold-based monitoring works. A rule that flags all cash transactions above, say, USD 7,500 equivalent will catch much of the genuine structuring that occurs below Kenya's USD 15,000 CTR threshold, but it will also flag every legitimate large cash withdrawal by a business owner, every real estate agent receiving a deposit, and every market trader who happens to deal primarily in cash.

Rules are designed to be over-inclusive because the cost of a missed true positive (failing to file an STR for genuine money laundering) is considered worse than the cost of investigating a false positive. The result is a system where the analytical burden is systematically front-loaded onto human investigators, who must rule out legitimate activity for the vast majority of alerts.

For a compliance team of three officers managing a mid-sized Kenyan bank, processing 500 alerts per month at 30 minutes per alert consumes 250 hours of combined investigative time. At an 8 percent true positive rate, 40 of those investigations result in STR consideration. The remaining 460 investigations — roughly 230 hours — are spent documenting that normal business occurred.

Cost of Investigating Every Alert Manually

The economic cost of manual alert investigation in East Africa is substantial. A qualified AML analyst with certification (CAMS or equivalent) commands a salary of KES 150,000 to 250,000 per month in Nairobi. Including benefits, office cost, and management overhead, the fully burdened hourly cost of an AML investigator is approximately $18 to $28.

At roughly 230 hours per month spent on false positive investigations, the direct wasted cost is approximately $4,100 to $6,400 monthly — $50,000 to $77,000 annually — simply to arrive at the conclusion that most transactions are legitimate. This does not account for the opportunity cost of the genuine risk analysis work that was not done because staff were occupied with noise.
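The arithmetic behind these figures can be checked directly. The inputs (500 alerts per month, 30 minutes per alert, an 8 percent true positive rate, and an $18–$28 fully burdened hourly cost) come from the worked example in this section; the script simply chains them together.

```python
# Back-of-envelope cost of false positive investigations, using the
# figures from the worked example above.

ALERTS_PER_MONTH = 500
MINUTES_PER_ALERT = 30
TRUE_POSITIVE_RATE = 0.08
HOURLY_COST_LOW, HOURLY_COST_HIGH = 18, 28  # USD, fully burdened

false_positives = int(ALERTS_PER_MONTH * (1 - TRUE_POSITIVE_RATE))
fp_hours = false_positives * MINUTES_PER_ALERT / 60

monthly_low = fp_hours * HOURLY_COST_LOW
monthly_high = fp_hours * HOURLY_COST_HIGH

print(f"{false_positives} false positives -> {fp_hours:.0f} hours/month")
print(f"Wasted cost: ${monthly_low:,.0f} to ${monthly_high:,.0f} per month")
print(f"Annualised: ${monthly_low * 12:,.0f} to ${monthly_high * 12:,.0f}")
```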

How Regulators View Alert Management

FATF Recommendation 20 requires financial institutions to file STRs when they suspect money laundering or terrorist financing, but it also requires that institutions have systematic processes for identifying those suspicions. Regulators increasingly examine not just whether institutions are filing STRs, but whether their alert management process is rational, risk-based, and capable of identifying the alerts that matter.

An institution that generates 10,000 alerts and investigates all of them without prioritisation is not demonstrating risk-based compliance — it is demonstrating that it lacks the analytical sophistication to distinguish high-risk from low-risk. FATF Recommendation 1 requires a risk-based approach; a risk scoring engine is the technological implementation of that requirement in transaction monitoring.


What Is an AML Risk Scoring Engine?

Definition: Quantified Risk Assessment for Transactions and Cases

An AML risk scoring engine is a system that takes a set of input factors — characteristics of a transaction, the customer behind it, their behavioural history, geographic context, and relevant typology indicators — and produces a numerical risk score that represents the probability that the transaction or case warrants compliance action.

Unlike a binary rule (triggered / not triggered), a risk score is continuous. A transaction scoring 82 out of 100 almost certainly warrants investigation. A transaction scoring 31 is almost certainly legitimate and is logged for completeness. A transaction scoring 56 sits in the grey zone where human judgement adds the most value.

Difference Between Transaction Monitoring Rules and Risk Scoring Engines

Transaction monitoring rules ask binary questions: "Did this transaction exceed KES 500,000?" "Did this customer make more than 10 transactions today?" "Did this account receive funds from a high-risk jurisdiction?" Each question produces a yes or no. When the answer is yes, an alert is generated.

A risk scoring engine asks: "Given everything we know about this transaction, this customer, this channel, this timing, and this pattern, how suspicious is this?" It aggregates multiple weak signals into a single quantified assessment. A transaction that barely exceeds a threshold in a normal customer context scores low. The same amount, from an account opened three weeks ago, transacted through an agent in a rural area at an unusual time, with a pattern that resembles structuring, scores much higher.

The power of risk scoring is that it can identify suspicious patterns that individually would not trigger any single rule — but collectively constitute a strong signal.

Input Factors for Risk Scoring

Risk scoring engines draw on four categories of input data:

Transaction data: Amount, currency, type, channel, timing, counterparty, originating location, and frequency within the current period.

Customer profile data: Account age, KYC tier, occupation category, declared income or business revenue, historical transaction baseline, prior STR filings, PEP status, and adverse media flags.

Geographic data: Country risk classification for transaction counterparties, branch or agent location risk, and cross-border transaction patterns.

Behavioural analytics: Deviation from the customer's own historical transaction pattern (statistical anomaly detection), velocity changes, channel switching, and structuring pattern detection.
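The four input categories above might be gathered into a single scoring input record along the following lines. This is a minimal sketch: the class and every field name are illustrative assumptions, not a product schema.

```python
from dataclasses import dataclass

# Illustrative grouping of the four input categories into one scoring
# input record. All field names are hypothetical.
@dataclass
class ScoringInput:
    # Transaction data
    amount: float
    currency: str
    channel: str                    # e.g. "branch", "agent", "mobile"
    counterparty_country: str
    # Customer profile data
    account_age_days: int
    kyc_tier: int
    is_pep: bool
    avg_transaction_amount: float   # customer's historical baseline
    # Geographic data
    agent_location_risk: str        # "low" / "medium" / "high"
    # Behavioural analytics
    baseline_std_dev: float         # std dev of 90-day transaction amounts

    def amount_ratio(self) -> float:
        """Transaction amount relative to the customer's own baseline."""
        return self.amount / max(self.avg_transaction_amount, 1.0)

tx = ScoringInput(amount=420_000, currency="KES", channel="agent",
                  counterparty_country="KE", account_age_days=45,
                  kyc_tier=1, is_pep=False, avg_transaction_amount=100_000,
                  agent_location_risk="high", baseline_std_dev=25_000)
print(tx.amount_ratio())  # 4.2
```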


Weighted Risk Factor Architecture

Risk Factor Categories

A well-designed risk scoring engine organises its input factors into weighted categories, each representing a dimension of AML risk:

Transaction Risk covers the nature of the transaction itself: its amount relative to the customer's profile, the transaction type (cash is higher risk than electronic), and the channel used.

Customer Risk covers who is conducting the transaction: PEP status, high-risk industry classification (money service businesses, jewellers, car dealers), account age, and prior compliance history.

Geographic Risk covers where the money is coming from and going: FATF high-risk or grey-listed country involvement, high-risk corridor transactions, and agent location risk.

Behavioural Risk covers whether the pattern of activity is anomalous for this specific customer: is this a sudden spike in activity? A change in transaction type? A pattern that matches known typologies like structuring or layering?

How Weights Are Assigned and Calibrated

Weights are assigned based on the relative predictive power of each factor for the institution's specific customer base and business model. Initial weights are typically informed by FATF typology guidance, regional intelligence reports, and the institution's own historical STR data — which factors were consistently present in cases that resulted in STRs?

Calibration is an ongoing process. After initial deployment, the risk scoring engine should be evaluated quarterly: what percentage of high-score alerts resulted in STRs? What percentage of low-score alerts were later found to be suspicious? If high-score alerts are rarely genuine, the weights need adjusting downward for the dominant scoring factors. If low-score alerts are frequently later found suspicious, a factor is missing from the model.
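The two quarterly questions above reduce to simple precision-style metrics over historical dispositions. The sketch below assumes a list of (score, resulted-in-STR) pairs; the sample data and the 75/50 thresholds are illustrative.

```python
# Quarterly calibration check: compare scores against case outcomes.
# (score, resulted_in_str) pairs are illustrative sample data.
history = [
    (88, True), (91, True), (79, False), (82, True),    # ESCALATE band
    (60, False), (55, True), (68, False),               # REVIEW band
    (30, False), (12, False), (41, False), (22, True),  # below REVIEW
]

ESCALATE, REVIEW = 75, 50

high = [hit for score, hit in history if score >= ESCALATE]
low = [hit for score, hit in history if score < REVIEW]

high_precision = sum(high) / len(high)  # share of ESCALATE alerts that became STRs
low_miss_rate = sum(low) / len(low)     # share of low-score alerts later found suspicious

# If high_precision is low, the dominant factors are over-weighted;
# if low_miss_rate is high, a predictive factor is missing from the model.
print(f"ESCALATE precision: {high_precision:.0%}, low-band miss rate: {low_miss_rate:.0%}")
```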

Threshold Bands: ESCALATE / REVIEW / MONITOR / IGNORE

Score Band | Recommendation | Action Required
75–100 | ESCALATE | Automatic STR recommendation; priority queue
50–74 | REVIEW | Compliance officer investigation; human decision
25–49 | MONITOR | Logged and monitored; low-priority queue
0–24 | IGNORE | Auto-closed; audit record retained

These bands are configurable per institution and should be calibrated based on the institution's capacity, risk appetite, and historical false positive rates. An institution with a large, experienced compliance team may set a lower ESCALATE threshold to capture more potential cases. A smaller institution with limited capacity may set a higher threshold to focus only on the clearest cases.

Score Normalisation

All raw risk factor scores are normalised to a 0–100 scale before combination. This prevents any single factor from dominating the composite score simply because its raw value is on a different magnitude scale. The weighted combination of normalised factor scores produces the composite score, also on a 0–100 scale, which maps directly to the threshold bands above.
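The weighted combination and band mapping described above can be sketched as follows. The weights mirror the factor table in this article and the thresholds match the band table; the factor values in the example are illustrative.

```python
# Weighted combination of normalised factor scores into a 0-100
# composite, mapped to the threshold bands described above.
WEIGHTS = {
    "amount_vs_profile": 0.25,
    "geographic":        0.20,
    "frequency_anomaly": 0.20,
    "structuring":       0.15,
    "pep_status":        0.10,
    "account_age":       0.10,
}

def composite_score(factors: dict[str, float]) -> float:
    """Combine per-factor scores (each already normalised to 0-100)."""
    for name, value in factors.items():
        if not 0 <= value <= 100:
            raise ValueError(f"factor {name!r} is not normalised: {value}")
    return sum(WEIGHTS[name] * value for name, value in factors.items())

def band(score: float) -> str:
    if score >= 75:
        return "ESCALATE"
    if score >= 50:
        return "REVIEW"
    if score >= 25:
        return "MONITOR"
    return "IGNORE"

factors = {"amount_vs_profile": 90, "geographic": 85, "frequency_anomaly": 80,
           "structuring": 70, "pep_status": 0, "account_age": 100}
score = composite_score(factors)
print(score, band(score))  # 76.0 ESCALATE
```

Because every factor is already on a 0–100 scale and the weights sum to 1, the composite is guaranteed to stay within 0–100 and map cleanly onto the bands.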

Risk Factor Scoring Table

Risk Factor | Weight | Scoring Logic
Transaction amount vs. customer profile | 25% | Greater than 3x the customer's average transaction = high; 1–3x = medium; below average = low
Geographic risk (FATF high-risk country involvement) | 20% | Any counterparty in FATF grey-listed or high-risk country = high; regional high-risk corridor = medium
Transaction frequency anomaly | 20% | Greater than 2 standard deviations above 90-day baseline = high; 1–2 SDs = medium
Mobile money structuring pattern | 15% | Detected fragmentation pattern below CTR threshold across multiple days = high
Customer PEP status | 10% | Confirmed PEP = high; PEP-adjacent (family/associate) = medium
Account age | 10% | Account less than 6 months old with high-value transactions = elevated
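Two rows of the table above can be translated into code as follows. The mapping of high/medium/low to 100/60/20 is an illustrative assumption; the thresholds themselves come from the table.

```python
# Per-factor scoring for two rows of the table above. Mapping
# high/medium/low to 100/60/20 is an illustrative assumption.
LEVEL_SCORE = {"high": 100, "medium": 60, "low": 20}

def score_amount_vs_profile(amount: float, customer_avg: float) -> int:
    """Greater than 3x the customer's average = high; 1-3x = medium; below = low."""
    ratio = amount / max(customer_avg, 1.0)
    if ratio > 3:
        level = "high"
    elif ratio >= 1:
        level = "medium"
    else:
        level = "low"
    return LEVEL_SCORE[level]

def score_frequency_anomaly(count: float, baseline_mean: float,
                            baseline_sd: float) -> int:
    """More than 2 SDs above the 90-day baseline = high; 1-2 SDs = medium."""
    z = (count - baseline_mean) / max(baseline_sd, 1e-9)
    if z > 2:
        level = "high"
    elif z >= 1:
        level = "medium"
    else:
        level = "low"
    return LEVEL_SCORE[level]

print(score_amount_vs_profile(420_000, 100_000))  # 4.2x average -> 100
print(score_frequency_anomaly(18, 6, 4))          # z = 3.0      -> 100
```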

How Risk Scores Drive STR Recommendations

Score 75 or Above: ESCALATE

When a composite risk score reaches 75 or above, the engine generates an automatic STR recommendation. This does not mean an STR is automatically filed — it means a pre-populated STR case is created in the case management system, assigned to a compliance officer, and flagged as a priority investigation. The case contains all the supporting data that contributed to the score: the specific transactions, the customer profile, the factor scores, and suggested indicator codes from the FIU's typology library.

The compliance officer reviews the pre-populated case, adds context that the system cannot know (customer relationship background, recent interactions, supporting documents), makes the final determination on whether to file, and either approves the STR for submission or closes the case with documented reasoning.

Score 50–74: REVIEW

Cases in the REVIEW band are assigned to the compliance queue for human investigation, but at normal priority rather than elevated priority. These are the cases where the evidence is mixed and professional judgement adds the most value. The risk scoring report explains which factors contributed to the score and why, enabling the compliance officer to conduct a focused investigation on the highest-signal areas rather than a broad-based trawl through all available data.

Score Below 50: IGNORE or MONITOR

Low-scoring alerts are auto-resolved without manual investigation. An audit record is retained for every alert, including its score, the factors that drove it, and the auto-resolution decision. This audit trail is essential for two purposes: demonstrating to regulators that the institution has a systematic, documented process for alert disposition; and enabling retrospective analysis when a customer later becomes high-risk (the prior alerts provide context).
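An auto-resolution audit record of the kind described above might look like the following. The field names and JSON layout are assumptions for illustration, not a prescribed FIU or goAML schema.

```python
import json
from datetime import datetime, timezone

# Illustrative audit record for an auto-resolved low-score alert.
# Field names are assumptions, not a prescribed schema.
def auto_resolution_record(alert_id: str, score: float,
                           factor_breakdown: dict[str, float]) -> str:
    record = {
        "alert_id": alert_id,
        "score": score,
        "factor_breakdown": factor_breakdown,
        "disposition": "AUTO_CLOSED",
        "disposition_reason": "composite score below MONITOR threshold",
        "decided_by": "risk-scoring-engine",
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)

rec = auto_resolution_record("ALT-2026-000123", 18.5,
                             {"amount_vs_profile": 20, "geographic": 20,
                              "frequency_anomaly": 20, "structuring": 0,
                              "pep_status": 0, "account_age": 20})
print(rec)
```

Retaining the factor breakdown alongside the disposition is what makes the later retrospective analysis possible: when a customer becomes high-risk, their prior auto-closed alerts can be re-read with full context.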

Human-in-the-Loop: Technology Recommends, Humans Decide

A well-designed risk scoring system is emphatically not an auto-filing system. No STR is ever submitted to the FIU without a qualified compliance officer reviewing the case and making the professional judgement that the transaction or pattern is suspicious. The technology narrows the field and prioritises the work. The human makes the compliance decision.

This distinction matters legally and practically. A compliance officer who signs off on an STR is attesting that they have reviewed the evidence and formed a professional belief that the activity warrants reporting. No automated system can substitute for that professional judgement. What automation can do is ensure that the officer's time is spent on the cases that genuinely need their attention — not on documenting that the market trader's monthly cash deposit is, in fact, their monthly cash deposit.


Calibrating Your Risk Engine for East African Typologies

Kenya-Specific High-Risk Factors

East Africa has typological patterns that differ meaningfully from those addressed by global transaction monitoring vendor rule sets. Key Kenya-specific risk factors that should be weighted explicitly in the scoring engine include:

Mobile money layering: A pattern of receiving multiple M-PESA payments from different senders below the KES 150,000 informal reporting threshold, aggregated to a total that significantly exceeds it, followed by bank deposit or RTGS transfer. This is the dominant structuring typology in Kenya and is systematically missed by systems that treat M-PESA and bank transactions as separate data streams.

Cross-border cash: High-volume cash transactions in border towns (Busia, Malaba, Namanga, Taveta) warrant elevated geographic risk scoring, particularly for customers without declared cross-border business activity.

Informal sector cash concentration: Kenyan SME customers in informal trade sectors legitimately transact in cash at volumes that would be flagged as suspicious in other contexts. The risk scoring engine must account for customer industry context — a grain trader in Eldoret with high cash turnover is different from a salaried employee in Nairobi with the same transaction pattern.
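The mobile money layering pattern described above reduces to an aggregation check: many sub-threshold receipts from distinct senders within a short window, summing well above the threshold. The KES 150,000 figure comes from the description; the window, sender count, and aggregate multiple are illustrative tuning parameters.

```python
from datetime import date

THRESHOLD = 150_000     # KES, per-transaction informal reporting threshold
WINDOW_DAYS = 5         # illustrative lookback window
MIN_SENDERS = 4         # illustrative minimum distinct senders
AGGREGATE_MULTIPLE = 3  # flag if total exceeds 3x the single-tx threshold

def structuring_signal(receipts: list[tuple[date, str, int]]) -> bool:
    """receipts: (value_date, sender_id, amount_kes) credits to one account."""
    sub_threshold = [r for r in receipts if r[2] < THRESHOLD]
    if not sub_threshold:
        return False
    days = {r[0] for r in sub_threshold}
    senders = {r[1] for r in sub_threshold}
    total = sum(r[2] for r in sub_threshold)
    return ((max(days) - min(days)).days < WINDOW_DAYS
            and len(senders) >= MIN_SENDERS
            and total > AGGREGATE_MULTIPLE * THRESHOLD)

# Five KES 140,000 receipts from five senders over three days: each is
# individually below the threshold, but together they total 700,000.
receipts = [(date(2026, 3, d), f"sender-{i}", 140_000)
            for i, d in enumerate([2, 2, 3, 3, 4], start=1)]
print(structuring_signal(receipts))  # True
```

Note that this check only works if M-PESA receipts and bank-side activity are visible in the same data stream, which is exactly the integration gap the paragraph above describes.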

Country Profiles Configure Risk Weights Differently

A Zambian bank's risk scoring configuration must reflect Zambia-specific typologies — mining sector cash, cross-border flows with the DRC, and copper trade financing — with different weights than a Kenyan bank's configuration. Country profiles in the risk scoring engine allow these weights to be configured per country without requiring separate codebases.

Machine Learning vs. Rules-Based Scoring

Rules-based scoring engines, where each factor and weight is explicitly defined and documented, are the dominant approach in East African AML compliance for a compelling reason: explainability. When a regulator asks why a specific transaction was escalated as high-risk, a rules-based scoring system can produce a complete, auditable explanation: "The composite score was 82 due to: transaction amount 4.2x customer average (high, weight 25%), FATF grey-listed counterparty country (high, weight 20%), frequency 2.3 SDs above baseline (high, weight 20%)."
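An explanation like the one quoted above can be generated mechanically from the factor breakdown. This is a sketch of the idea, not any specific product's report format; the function and its inputs are illustrative.

```python
# Generating a factor-level explanation from a score breakdown.
# Structure and labels are illustrative.
def explain(composite: float,
            contributions: list[tuple[str, str, float]]) -> str:
    """contributions: (description, level, weight) per triggered factor."""
    parts = [f"{desc} ({level}, weight {weight:.0%})"
             for desc, level, weight in contributions]
    return f"The composite score was {composite:.0f} due to: " + "; ".join(parts)

text = explain(82, [
    ("transaction amount 4.2x customer average", "high", 0.25),
    ("FATF grey-listed counterparty country", "high", 0.20),
    ("frequency 2.3 SDs above baseline", "high", 0.20),
])
print(text)
```

Because the explanation is assembled from the same factor records that produced the score, it cannot drift out of sync with the scoring logic, which is the property examiners are probing for.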

Machine learning approaches can improve predictive accuracy, particularly on complex pattern recognition, but they introduce model explainability challenges. If the model cannot explain its recommendation in terms a compliance officer and a regulator can evaluate, the recommendation is professionally and legally suspect. For institutions considering ML-enhanced scoring, explainable AI (XAI) techniques — particularly feature importance attribution — are mandatory, not optional.


Explainability — Showing Your Work to Regulators

Every Risk Score Must Be Auditable

Regulatory examiners conducting AML/CFT effectiveness assessments now routinely ask compliance officers to walk them through specific alert dispositions. "This customer made three transactions last month that were flagged as high-risk. Show me how you decided each one." The ability to produce a clear, documented explanation for every alert disposition — why it scored as it did, what the compliance officer found, what decision was made, and why — is a core competency of effective AML compliance.

Risk scoring engines that produce unexplained outputs or black-box scores cannot satisfy this requirement. Every score should be accompanied by a factor breakdown showing the contribution of each scoring element to the composite result, the raw data values that drove each factor score, and the recommendation that resulted.

Audit Trail Requirements

The audit trail for risk scoring and STR recommendation decisions must capture: the alert trigger event, the score calculated at that time (with factor breakdown), any score recalculations if new information arrived, the compliance officer who reviewed the case, the actions taken, the decision made, and the date and time of each step. This audit trail must be immutable — once written, it cannot be altered. It forms part of the institution's AML compliance evidence in regulatory examinations.
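One common way to make an audit trail tamper-evident is to chain each entry to the hash of its predecessor, so that any later alteration breaks verification. This is a generic sketch of the technique under that assumption, not a description of any specific product's storage layer.

```python
import hashlib
import json

# Tamper-evident audit trail: each entry embeds the SHA-256 hash of the
# previous entry, so altering any earlier record breaks the chain.
def append_entry(trail: list[dict], event: dict) -> None:
    prev_hash = trail[-1]["entry_hash"] if trail else "0" * 64
    body = {"event": event, "prev_hash": prev_hash}
    entry_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    trail.append({**body, "entry_hash": entry_hash})

def verify(trail: list[dict]) -> bool:
    prev_hash = "0" * 64
    for entry in trail:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != recomputed:
            return False
        prev_hash = entry["entry_hash"]
    return True

trail: list[dict] = []
append_entry(trail, {"step": "alert_triggered", "score": 82})
append_entry(trail, {"step": "officer_review", "decision": "file_str"})
print(verify(trail))                 # True
trail[0]["event"]["score"] = 12      # tampering with an earlier record...
print(verify(trail))                 # ...breaks the chain: False
```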

Model Risk Management for AI/ML

Institutions deploying statistical or ML-based risk scoring are subject to model risk management requirements under CBK guidance. This requires documented model governance including: model design documentation, validation testing results, ongoing performance monitoring, change management processes, and fallback procedures if the model fails. The compliance team and the risk management function must jointly own the model governance framework, not just the IT team that built it.


ROI of Risk Scoring Automation

Reduction in False Positives

Institutions deploying well-calibrated risk scoring engines consistently report reductions in the false positive investigation burden from the industry-wide 85–95 percent down to 40–50 percent. This does not mean the system generates fewer alerts — it means fewer alerts require manual investigation, because the scoring engine resolves the clearly low-risk cases automatically. The compliance team's investigation queue shrinks by 50 to 70 percent, while coverage of genuinely high-risk alerts improves.

Faster STR Triage

In a manual alert investigation process, the time from alert generation to STR submission decision ranges from several days to several weeks, depending on investigation complexity and compliance team capacity. With risk scoring and pre-populated case management, the triage step — "is this worth a full investigation?" — happens automatically. For high-score alerts that clearly warrant investigation, the compliance officer begins with a structured case, pre-populated data, and a suggested indicator code set. The time from alert to STR draft drops from days to hours.

Compliance Officer Capacity

The combination of false positive reduction and faster triage effectively increases compliance team capacity without adding headcount. An institution with a team of three compliance officers handling 500 alerts per month at 85 percent false positive rate is spending most of their time on noise. The same team, with risk scoring reducing their investigation queue to 200 genuinely ambiguous cases per month, can conduct deeper investigations, produce higher-quality STR narratives, and build stronger relationships with the FIU — the activities that deliver genuine compliance value.


Take the Next Step

Creodata's goAML AML Reporting Platform includes a fully integrated risk scoring engine calibrated for East African typologies. Weighted scoring across transaction, customer, geographic, and behavioural risk factors drives automatic STR recommendations with complete audit trails and factor-level explainability — giving your compliance team the intelligence to focus on what matters.

See the risk scoring engine in a live demonstration: Request a Demo at creodata.com/demo