"Ghost farmers" are people listed as beneficiaries who never received the seedlings, or who were double-enrolled across districts. They are the single biggest reputational risk for institutional funders of agroforestry programmes. When EU and World Bank auditors ask "where did the money go?", what they are really asking is "did the people you said exist actually exist, and did they get what you said they got?"
Most distribution programmes can't answer that question. Records sit in parish books, beneficiary lists are entered into Excel after the fact, and there's no cross-referencing across programmes that operate in the same district. Nurseryz.io's ghost-farmer detector closes that gap with five rules that run automatically every hour.
The five detection rules
Each rule is implemented as a separate Rule class under App\Services\Ghost\Rules\, so we can tune thresholds per programme without touching the detector orchestrator. The five rules in production:
| Rule | Signal | Severity |
|---|---|---|
| Duplicate phone | Same phone number on ≥3 distributions in one programme | Medium |
| Duplicate national ID | Same NID across two or more distributions | Critical |
| Uncontacted >60 days | Distribution with no survival report after 60 days | Medium |
| Calendar anomaly | Distribution date outside the programme's start/end window | Critical |
| Suspicious concentration | ≥3 distributions to the same farmer in one programme | Medium |
The two critical rules, duplicate national ID and calendar anomaly, cannot be resolved by programme managers. Only a super-admin can close or dismiss them, because these are the rules that most directly indicate either fraud or data corruption that an external auditor would flag.
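To make the "one class per rule" structure concrete, here is a minimal sketch of two of the five rules and a simple orchestrator. It is written in Python purely for illustration; the production classes live under App\Services\Ghost\Rules\ and are not shown in this post. Every name below (`Distribution`, `Flag`, `DuplicatePhoneRule`, `CalendarAnomalyRule`, `run_detector`) is an assumption, not the real API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Distribution:
    # Hypothetical record shape; the real schema is not shown in this post.
    id: int
    programme_id: int
    farmer_id: int
    phone: str
    national_id: str
    distributed_on: date


@dataclass(frozen=True)
class Flag:
    rule_type: str
    severity: str
    programme_id: int
    farmer_id: int
    distribution_id: int


class DuplicatePhoneRule:
    """Medium: same phone number on >= min_count distributions in one programme."""

    rule_type = "duplicate_phone"
    severity = "medium"

    def __init__(self, min_count=3):
        self.min_count = min_count  # table default; tunable per programme

    def evaluate(self, distributions):
        by_phone = defaultdict(list)
        for d in distributions:
            by_phone[(d.programme_id, d.phone)].append(d)
        # Flag every distribution that belongs to an over-threshold group.
        return [
            Flag(self.rule_type, self.severity, d.programme_id, d.farmer_id, d.id)
            for group in by_phone.values()
            if len(group) >= self.min_count
            for d in group
        ]


class CalendarAnomalyRule:
    """Critical: distribution date outside the programme's start/end window."""

    rule_type = "calendar_anomaly"
    severity = "critical"

    def __init__(self, windows):
        self.windows = windows  # programme_id -> (start_date, end_date)

    def evaluate(self, distributions):
        flags = []
        for d in distributions:
            start, end = self.windows[d.programme_id]
            if not (start <= d.distributed_on <= end):
                flags.append(
                    Flag(self.rule_type, self.severity, d.programme_id, d.farmer_id, d.id)
                )
        return flags


def run_detector(rules, distributions):
    """Orchestrator: runs each rule independently and collects the flags."""
    return [flag for rule in rules for flag in rule.evaluate(distributions)]
```

Because each rule owns its threshold, changing `min_count` for one programme means constructing that rule differently; the orchestrator itself does not change.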
False positives, and what we do about them
Across the first 200 flags raised by the live detector, roughly 8% were genuine fraud, 24% were legitimate (e.g. the same farmer enrolled in two programmes through different cooperatives), and 68% were data-entry errors — phone numbers transposed, NIDs duplicated due to copy-paste, dates entered in the wrong format.
That last bucket is the most valuable. Catching data-entry errors before they reach the donor PDF is the entire point. If a programme has a 6% data-error rate on enrolment, catching those errors early is the difference between a donor signing off on the report and an audit team sending it back six months later.
The resolution flow
Every flag is in one of four states: open, verified (confirmed fraud, escalated), resolved (fixed and closed), or false positive (rule fired but data is legitimate). Each state change requires a resolution note that lands on the audit trail and surfaces in the donor PDF.
- Programme managers can resolve the three medium-severity rules.
- Super-admin is required for critical rules — duplicate NID and calendar anomaly.
- Funders see the entire queue read-only, including resolution notes.
The complete audit chain — when the flag was raised, what rule fired, what evidence it was based on, who resolved it, and what note they left — is queryable per programme and exportable to CSV for donor compliance.
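The permission and audit rules described in this section can be sketched as a small state-change function. This is a Python illustration under stated assumptions, not the production code: the role names (`programme_manager`, `super_admin`, `funder`), the rule identifiers, the audit columns, and every function name are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime, timezone

STATES = ("open", "verified", "resolved", "false_positive")
CRITICAL_RULES = {"duplicate_national_id", "calendar_anomaly"}
WRITE_ROLES = {"programme_manager", "super_admin"}  # funders are read-only
AUDIT_COLUMNS = ["flag_id", "rule_type", "from_state", "to_state", "actor", "note", "at"]


@dataclass
class FlagRecord:
    flag_id: int
    rule_type: str
    state: str = "open"
    audit_trail: list = field(default_factory=list)


def change_state(flag, new_state, actor, role, note):
    """Apply a state change if the role allows it, and log it with its note."""
    if new_state not in STATES:
        raise ValueError(f"unknown state: {new_state}")
    if not note.strip():
        raise ValueError("a resolution note is required for every state change")
    if role not in WRITE_ROLES:
        raise PermissionError(f"role {role!r} has read-only access")
    if flag.rule_type in CRITICAL_RULES and role != "super_admin":
        raise PermissionError("critical flags require a super-admin")
    flag.audit_trail.append({
        "flag_id": flag.flag_id,
        "rule_type": flag.rule_type,
        "from_state": flag.state,
        "to_state": new_state,
        "actor": actor,
        "note": note.strip(),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    flag.state = new_state
    return flag


def export_audit_csv(flags):
    """Flatten the audit trails of the given flags into one CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=AUDIT_COLUMNS)
    writer.writeheader()
    for flag in flags:
        writer.writerows(flag.audit_trail)
    return buffer.getvalue()
```

Rejecting the change before anything is written means a refused attempt leaves both the flag state and the audit trail untouched. Whether refused attempts should themselves be logged is a design decision this sketch leaves open.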
Calibration per programme
The thresholds in the table above are defaults. Each programme can tune them up or down via the super-admin Programme view. For example, a Lamwo District programme during the long rainy season might want the "uncontacted >60 days" threshold relaxed to 90 days, because rainy-season access disruptions can legitimately delay first survival reports. Calibration changes are themselves logged.
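One way to model per-programme calibration is a set of overrides layered on top of the defaults, with every change appended to a log. The sketch below is a Python illustration only. The threshold keys and the `ProgrammeCalibration` class are hypothetical names, and the default values are the ones from the rules table.

```python
from datetime import datetime, timezone

# Defaults taken from the rules table; the key names are assumptions.
DEFAULT_THRESHOLDS = {
    "duplicate_phone_min_distributions": 3,
    "duplicate_nid_min_distributions": 2,
    "uncontacted_days": 60,
    "concentration_min_distributions": 3,
}


class ProgrammeCalibration:
    """Per-programme overrides on top of the defaults, with a change log."""

    def __init__(self, programme_id):
        self.programme_id = programme_id
        self.overrides = {}
        self.change_log = []

    def effective(self):
        # Overrides win; anything not overridden falls back to the default.
        return {**DEFAULT_THRESHOLDS, **self.overrides}

    def set_threshold(self, key, value, actor):
        if key not in DEFAULT_THRESHOLDS:
            raise KeyError(f"unknown threshold: {key}")
        if value < 1:
            raise ValueError("thresholds must be positive")
        old = self.effective()[key]
        self.overrides[key] = value
        # Calibration changes are themselves logged.
        self.change_log.append({
            "programme_id": self.programme_id,
            "key": key,
            "old": old,
            "new": value,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        })
```

For the rainy-season example, `set_threshold("uncontacted_days", 90, actor=...)` changes only that programme's value and records the move from 60 to 90. The shared defaults stay as they are.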
Detection runs on a 60-minute cron schedule and can also be triggered on demand with the "Re-scan now" button on the funder Ghost Flags page. Re-runs are idempotent: the same (programme, rule_type, farmer_id, distribution_id) tuple never produces duplicate flags.
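A common way to get this kind of idempotency is a uniqueness constraint on the identifying tuple, plus an insert that skips rows that already exist. The sketch below shows the technique with Python's built-in `sqlite3` module. The table name, column names, and helper functions are assumptions, and the production system may enforce idempotency differently.

```python
import sqlite3


def open_store():
    """Create an in-memory flag table keyed on the identifying tuple."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE ghost_flags (
            programme_id    INTEGER NOT NULL,
            rule_type       TEXT    NOT NULL,
            farmer_id       INTEGER NOT NULL,
            distribution_id INTEGER NOT NULL,
            state           TEXT    NOT NULL DEFAULT 'open',
            UNIQUE (programme_id, rule_type, farmer_id, distribution_id)
        )
        """
    )
    return conn


def count_flags(conn):
    return conn.execute("SELECT COUNT(*) FROM ghost_flags").fetchone()[0]


def record_flags(conn, flags):
    """Insert flags, skipping tuples already present. Returns rows added."""
    before = count_flags(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO ghost_flags "
        "(programme_id, rule_type, farmer_id, distribution_id) "
        "VALUES (?, ?, ?, ?)",
        flags,
    )
    conn.commit()
    return count_flags(conn) - before


if __name__ == "__main__":
    conn = open_store()
    scan = [(10, "duplicate_phone", 100, 1), (10, "calendar_anomaly", 102, 3)]
    print(record_flags(conn, scan))  # first scan adds both flags
    print(record_flags(conn, scan))  # re-scan adds nothing
```

Skipping existing rows also leaves their `state` untouched, so a re-scan does not reopen a flag that someone has already resolved.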
What this isn't
The detector doesn't determine intent. It surfaces data inconsistency for human review. It's not an AI; it doesn't predict; it doesn't score farmers. It runs five deterministic rules and produces a queue of records that humans then triage.
That's by design. Institutional buyers — the EU's Green Deal partnerships, the World Bank's Agri-Industrialisation framework — explicitly require that automated systems making accountability decisions be auditable and explainable. Rule-based detection is auditable; ML scoring is not, at least not yet at the bar these partners require.