Methodology
This dashboard is a statistical screening tool, not a finding of wrongdoing. Each provider is scored against five signals identified during a working session with Secretary Allison Bragg, AR Department of Inspector General, on May 5, 2026.
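The per-signal caps listed under "The five signals" (35, 22, 18, 12, and 13 points) add up to 100. The combination rule is not spelled out here. The following is a minimal sketch that assumes the five signals are simply summed after each is clamped to its cap. The dictionary keys are hypothetical identifiers.

```python
# Per-signal caps taken from "The five signals" section; they sum to 100.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
SIGNAL_CAPS = {
    "street_view": 35,
    "contact_cluster": 22,
    "capacity_ratio": 18,
    "multi_town": 12,
    "facebook": 13,
}


def total_score(signal_points: dict) -> float:
    """Add up per-signal points, clamping each signal to [0, cap].

    Assumption: signals combine additively. A signal that was not computed
    (e.g. Facebook "Not yet checked") is simply absent and contributes 0.
    """
    total = 0.0
    for name, cap in SIGNAL_CAPS.items():
        points = signal_points.get(name, 0.0)
        total += min(max(points, 0.0), float(cap))
    return total


# Example: a Street View flag plus a "not found" Facebook result.
print(total_score({"street_view": 35, "facebook": 13}))  # 48.0
```

Because each signal is clamped separately, one noisy signal cannot push a provider past that signal's share of the 100-point scale.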
Data source
Provider records are pulled directly from the public AR DHS Childcare Licensing portal. For every active center we capture the name, address, primary contact, licensed capacity, complaint count, corrective-action count, and license status. The data is publicly licensed and is refreshed by a scripted scrape.
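The scraped fields can be held in a small record type. This is a sketch only. The field names, the types, and the assumption that the portal reports status as a free-text label such as "Active" are illustrative and do not describe the portal's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderRecord:
    """One scraped provider row. Field names are illustrative, not the portal's schema."""

    name: str
    address: str
    primary_contact: str
    capacity: int                 # licensed capacity (children)
    complaint_count: int
    corrective_action_count: int
    license_status: str           # assumed free-text label, e.g. "Active"

    def is_active(self) -> bool:
        # Assumption: status labels vary only in case and surrounding whitespace.
        return self.license_status.strip().lower() == "active"


row = ProviderRecord(
    name="Example Learning Center",   # hypothetical provider
    address="100 Main St, Example, AR",
    primary_contact="Jane Doe",
    capacity=60,
    complaint_count=2,
    corrective_action_count=1,
    license_status=" Active ",
)
print(row.is_active())  # True
```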
The five signals
1. Street View AI
Up to 35 points
We pull a Google Street View image of each licensed address and ask Gemini 3 Flash to classify the visible structure into one of ten categories. A facility classified as a residential house, abandoned/vacant building, industrial warehouse, or vacant lot with confidence above 0.6 is flagged. School-based and church-based programs, detected via license subtype plus a name regex, are discounted 80% on this signal because misclassifications are common (a school playing field read as a 'vacant lot', for example).
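The Street View rule can be sketched as a small scoring function. This is a hedged illustration. The category label strings, the license subtypes, and the name regex are all hypothetical, since the real ten-category taxonomy and detection rules are not published here. The sketch also assumes that a flag earns the full 35 points before any discount and that either the subtype or the name alone is enough to trigger the school/church discount.

```python
import re

# Hypothetical labels for the four flagged categories; the actual ten-category
# label set used with Gemini 3 Flash is not published here.
FLAG_CATEGORIES = {"residential_house", "abandoned_vacant", "industrial_warehouse", "vacant_lot"}
CONFIDENCE_THRESHOLD = 0.6      # flag only when confidence is strictly above this
MAX_POINTS = 35.0
SCHOOL_CHURCH_DISCOUNT = 0.80   # an 80% discount keeps 20% of the points

# Hypothetical detection rules standing in for "license subtype + name regex".
SCHOOL_CHURCH_SUBTYPES = {"school-based", "church-based"}
SCHOOL_CHURCH_NAME_RE = re.compile(
    r"\b(school|elementary|academy|church|baptist|methodist|ministry|ministries)\b",
    re.IGNORECASE,
)


def is_school_or_church(name: str, license_subtype: str) -> bool:
    # Assumption: either the subtype or the name pattern is enough to qualify.
    return (license_subtype.strip().lower() in SCHOOL_CHURCH_SUBTYPES
            or SCHOOL_CHURCH_NAME_RE.search(name) is not None)


def street_view_points(category: str, confidence: float, name: str, license_subtype: str) -> float:
    """Points for one facility; assumes a flag earns the full 35 before any discount."""
    if category not in FLAG_CATEGORIES or confidence <= CONFIDENCE_THRESHOLD:
        return 0.0
    points = MAX_POINTS
    if is_school_or_church(name, license_subtype):
        points *= 1.0 - SCHOOL_CHURCH_DISCOUNT
    return points


print(street_view_points("vacant_lot", 0.82, "Sunny Days Child Care", "center"))  # 35.0
```

With the same image, a hypothetical "Grace Baptist Church Preschool" would score about 7 points instead of 35.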
2. Multi-facility contact clustering
Up to 22 points
We group providers by normalized primary contact (name plus phone). A contact tied to two or more facilities flags every facility in the cluster, with the weight scaling by cluster size. School-district, Head Start, and Boys & Girls Club coordinators legitimately run dozens of branches under one name, so their clusters are discounted sharply.
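A minimal sketch of the clustering step follows. Contacts are normalized and used as a grouping key, and every member of a multi-facility cluster receives the same score. The normalization rules, the linear ramp that saturates at four facilities, the 90% institutional discount (standing in for "discounted sharply"), the institutional name pattern, and the provider dict keys are all hypothetical choices for illustration.

```python
import re
from collections import defaultdict

MAX_POINTS = 22.0
SATURATION_SIZE = 4           # hypothetical: cluster size at which the signal maxes out
INSTITUTIONAL_DISCOUNT = 0.9  # hypothetical stand-in for "discounted sharply"
INSTITUTIONAL_RE = re.compile(r"head start|school district|boys\s*&\s*girls club", re.IGNORECASE)


def normalize_contact(name: str, phone: str) -> tuple:
    """Lowercase the name, drop punctuation, collapse spaces; keep the last 10 phone digits."""
    clean_name = " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())
    digits = re.sub(r"\D", "", phone)[-10:]
    return clean_name, digits


def cluster_points(providers: list) -> dict:
    """Map provider id -> clustering points.

    Each provider dict needs 'id', 'name', 'contact_name', 'contact_phone' (hypothetical keys).
    """
    clusters = defaultdict(list)
    for p in providers:
        clusters[normalize_contact(p["contact_name"], p["contact_phone"])].append(p)

    points = {}
    for members in clusters.values():
        size = len(members)
        if size < 2:
            score = 0.0
        else:
            # Linear ramp: 2 facilities -> 1/3 of the cap, 4 or more -> full cap.
            score = MAX_POINTS * min(1.0, (size - 1) / (SATURATION_SIZE - 1))
            if any(INSTITUTIONAL_RE.search(m["name"]) for m in members):
                score *= 1.0 - INSTITUTIONAL_DISCOUNT
        for m in members:
            points[m["id"]] = score
    return points


providers = [
    {"id": "a", "name": "Little Stars", "contact_name": "Jane  Doe", "contact_phone": "(870) 555-0101"},
    {"id": "b", "name": "Bright Start", "contact_name": "jane doe", "contact_phone": "870.555.0101"},
    {"id": "c", "name": "Other Care", "contact_name": "John Roe", "contact_phone": "870-555-0199"},
]
print(cluster_points(providers))
```

Normalizing before grouping is what lets "Jane  Doe / (870) 555-0101" and "jane doe / 870.555.0101" land in the same cluster.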
3. Capacity vs. town population
Up to 18 points
A facility's licensed capacity divided by the population of its town (Census ACS 2022 place-level estimates) is compared against the statewide distribution of that ratio. Facilities above the 95th percentile are flagged (the AR statewide p95 is 6.55%). The canonical pattern is the Arkansas case Secretary Bragg prosecuted: a facility in Marianna (population ~3,800) claiming 150 children daily.
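The capacity-to-population comparison is sketched below using only the standard library. The town names and figures in the example are hypothetical. The sketch assumes the signal is binary, awarding the full 18 points above the threshold, and uses the "inclusive" interpolation method of `statistics.quantiles`. The dashboard's exact percentile method is not stated.

```python
import statistics

MAX_POINTS = 18.0


def capacity_ratios(facilities, town_population):
    """Capacity / town population for each facility.

    facilities: iterable of (facility_id, town, capacity) tuples.
    town_population: town -> population (e.g. ACS 2022 place-level estimates).
    Facilities whose town has no (or a zero) population figure are skipped.
    """
    return {
        fid: capacity / town_population[town]
        for fid, town, capacity in facilities
        if town_population.get(town)
    }


def p95(values):
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(values, n=100, method="inclusive")[94]


def capacity_points(ratio, threshold):
    # Assumption: binary scoring, full points strictly above the threshold.
    return MAX_POINTS if ratio > threshold else 0.0


ratios = capacity_ratios(
    [("f1", "Alpha", 100), ("f2", "Beta", 100), ("f3", "Gamma", 50)],
    {"Alpha": 1_000, "Beta": 20_000},  # hypothetical towns; Gamma has no estimate
)
print(ratios)  # {'f1': 0.1, 'f2': 0.005}
print(capacity_points(ratios["f1"], 0.0655))  # 18.0
```

In a full run, the threshold would be `p95(list(ratios.values()))` over every facility in the state rather than the fixed 6.55% used in this example.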
4. Repeat contact across multiple towns
Up to 12 points
Where a contact's cluster spans two or more distinct towns, especially small towns, we add weight on top of the basic clustering signal. This matches the typical fraud workflow Bragg described, in which one operator opens fake facilities in adjacent towns.
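One possible shape for the multi-town bonus is sketched below. The source says only that weight grows with the number of distinct towns and that small towns count more. The 5,000-resident cutoff and the per-town and small-town weights are hypothetical values chosen for illustration. The result is capped at the signal's 12 points.

```python
MAX_POINTS = 12
SMALL_TOWN_POPULATION = 5_000  # hypothetical cutoff for a "small town"
POINTS_PER_EXTRA_TOWN = 3      # hypothetical weight per additional distinct town
SMALL_TOWN_BONUS = 2           # hypothetical extra weight per small town in the span


def multi_town_points(cluster_towns, town_population):
    """Extra points for one contact cluster spanning several towns.

    cluster_towns: town names of the facilities in the cluster.
    town_population: lowercase town name -> population.
    """
    towns = {t.strip().lower() for t in cluster_towns}
    if len(towns) < 2:
        return 0
    score = POINTS_PER_EXTRA_TOWN * (len(towns) - 1)
    # Towns with no population figure are not counted as small.
    score += SMALL_TOWN_BONUS * sum(
        1 for t in towns if town_population.get(t, float("inf")) < SMALL_TOWN_POPULATION
    )
    return min(score, MAX_POINTS)


print(multi_town_points(["Alpha", "beta ", "ALPHA"], {"alpha": 2_000, "beta": 40_000}))  # 5
```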
5. Facebook presence
Up to 13 points
For each provider, we run a Google search for `"facility name" "city, AR" site:facebook.com` (falling back to Brave Search), score the top results by token-set similarity to the licensed name, and visit the matched page on www.facebook.com to read its follower count and its "talking about this" figure (Facebook's 7-day rolling activity proxy). Outcomes:
- Not found (no plausible page): 13 pts
- Stale (likes ≥ 50 but 0 talking about this, or talking about this below 0.5% of likes): 11 pts
- Tiny (fewer than 50 followers for a center with capacity 30+): 7 pts
- Weak match: 5 pts
- Active or uncheckable: 0 pts
School-, Head Start-, and church-operated providers are discounted 50% on this signal. Coverage caveat: search engines aggressively rate-limit automated checking, so this build covers a representative 32-provider sample. The remaining providers display "Not yet checked" and are not penalized on this signal.
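The Facebook outcome rules can be sketched as a classifier followed by a points lookup. Several details here are assumptions. A plain Jaccard overlap of word tokens stands in for token-set similarity; libraries such as RapidFuzz provide a `token_set_ratio` scorer for that purpose. The similarity cutoffs are hypothetical. Likes and followers are treated as one count. The order in which outcomes are tested is also assumed, including the choice that "stale" requires at least 50 likes, so that small pages fall through to "tiny".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical similarity cutoffs; the production values are not published.
NOT_FOUND_BELOW = 0.5
WEAK_MATCH_BELOW = 0.8

OUTCOME_POINTS = {
    "not_found": 13, "stale": 11, "tiny": 7, "weak_match": 5,
    "active": 0, "uncheckable": 0, "not_yet_checked": 0,
}


def token_set_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens (a simplified stand-in)."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class PageStats:
    title: str
    followers: int       # treated as the "likes" count in the rules above
    talking_about: int   # the page's "talking about this" figure


def facebook_outcome(licensed_name: str, capacity: int, page: Optional[PageStats],
                     checked: bool = True, uncheckable: bool = False) -> str:
    # Assumption: the outcomes are tested in this order.
    if not checked:
        return "not_yet_checked"
    if uncheckable:
        return "uncheckable"
    if page is None:
        return "not_found"
    similarity = token_set_similarity(licensed_name, page.title)
    if similarity < NOT_FOUND_BELOW:
        return "not_found"
    if similarity < WEAK_MATCH_BELOW:
        return "weak_match"
    # Assumption: "stale" needs at least 50 likes, so small pages fall to "tiny".
    if page.followers >= 50 and (page.talking_about == 0
                                 or page.talking_about < 0.005 * page.followers):
        return "stale"
    if page.followers < 50 and capacity >= 30:
        return "tiny"
    return "active"


def facebook_points(outcome: str, school_or_church: bool = False) -> float:
    points = float(OUTCOME_POINTS[outcome])
    return points * 0.5 if school_or_church else points  # 50% discount


name = "Little Stars Learning Center"  # hypothetical provider
page = PageStats(title="Little Stars Learning Center", followers=200, talking_about=0)
outcome = facebook_outcome(name, 60, page)
print(outcome, facebook_points(outcome))  # stale 11.0
```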
Limitations
- Street View imagery may be outdated, occluded, or unavailable. Where Google has no image, the provider is flagged for manual review but receives no automated score.
- Geocoding can place rural addresses imprecisely. We use US Census Geocoder as the primary source and Google Geocoding as a fallback.
- The score is intended to surface facilities for additional review, not as evidence. Many flagged facilities will turn out to be entirely legitimate.
- In-home (family) daycares are licensed at residential addresses by design. The Street View signal applies only to center-type licenses unless other signals also flag.
- The Facebook check uses unauthenticated scraping. Search engines (Google, Brave) rate-limit automated requests aggressively, capping each session at roughly 30 queries before a cooldown is required. This build's Facebook signal is therefore applied to a 32-provider sample; the rest of the catalog displays "Not yet checked" and is not penalized. Full 1,845-provider coverage would require either the paid Google Custom Search API (~$9 one-time) or a residential proxy pool to rotate IPs; both are out of scope for this MVP.
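The session cap explains why only a sample gets checked. The sketch below assumes that each provider costs exactly one search query and that `check` is a hypothetical callable wrapping the search and page visit. It spends the per-session budget, leaves the remaining providers as "Not yet checked" (0 points), and returns them so a later session can resume after the cooldown.

```python
from typing import Callable, Dict, Iterable

SESSION_QUERY_BUDGET = 30  # approximate per-session ceiling described above


def run_session(provider_ids: Iterable[str], check: Callable[[str], str],
                budget: int = SESSION_QUERY_BUDGET) -> Dict[str, str]:
    """Run one search-backed check per provider until the query budget is spent.

    `check` is a hypothetical callable that performs the search plus page visit
    and returns an outcome label. Providers reached after the budget runs out
    are recorded as "not_yet_checked", which scores 0 points.
    """
    results: Dict[str, str] = {}
    used = 0
    for pid in provider_ids:
        if used >= budget:
            results[pid] = "not_yet_checked"
            continue
        results[pid] = check(pid)
        used += 1
    return results


ids = [f"p{i}" for i in range(32)]  # hypothetical 32-provider sample
first = run_session(ids, check=lambda pid: "active")
pending = [pid for pid, label in first.items() if label == "not_yet_checked"]
print(len(pending))  # 2
```

After the cooldown, a second call such as `run_session(pending, check)` would cover the remainder, and its results can be merged into the first session's dictionary.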
What is not (yet) included
Future signals identified in the same working session: complaint-history NLP, corrective-action recidivism, and phone-based enrollment verification (calling each facility to ask whether spots are available next semester). These require a Twilio/Vapi integration and an additional review workflow, and are tracked separately.