Methodology

This dashboard is a statistical screening tool, not a finding of wrongdoing. Each provider is scored against five signals identified during a working session with Secretary Allison Bragg, AR Department of Inspector General, on May 5, 2026.

Data source

Provider records are pulled directly from the public AR DHS Childcare Licensing portal. We capture name, address, primary contact, capacity, complaint count, corrective action count, and license status for every active center. Data is publicly licensed and refreshed via a scripted scrape.

The five signals

1. Street view AI

up to 35 points

We pull a Google Street View image of each licensed address and ask Gemini 3 Flash to classify the visible structure into one of ten categories. A facility classified, with confidence above 0.6, as a residential house, abandoned/vacant building, industrial warehouse, or vacant lot flags. School-based and church-based programs are detected via license subtype plus a name regex and are discounted 80% on this signal, because mistaken classifications (a school playing field reading as 'vacant lot') are common.
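The flagging rule above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the category labels are hypothetical names for four of the ten classes, and awarding the full 35 points on any flag is an assumption (the text only says "up to 35 points").

```python
# Hypothetical labels for the four flag-worthy categories named in the text.
FLAG_CATEGORIES = {
    "residential_house",
    "abandoned_or_vacant_building",
    "industrial_warehouse",
    "vacant_lot",
}
MAX_POINTS = 35.0              # signal ceiling stated in the text
CONFIDENCE_THRESHOLD = 0.6     # "confidence above 0.6"
INSTITUTIONAL_DISCOUNT = 0.80  # school-based / church-based discount


def street_view_points(category: str, confidence: float,
                       is_school_or_church: bool) -> float:
    """Return the Street View signal score for one provider."""
    # No flag unless the category is suspicious AND confidence exceeds 0.6.
    if category not in FLAG_CATEGORIES or confidence <= CONFIDENCE_THRESHOLD:
        return 0.0
    points = MAX_POINTS  # assumption: a flag earns the full ceiling
    if is_school_or_church:
        points *= 1.0 - INSTITUTIONAL_DISCOUNT
    return round(points, 2)
```

Under these assumptions, a church-based program misread as a vacant lot contributes 7 points rather than 35.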

2. Multi-facility contact clustering

up to 22 points

We group providers by normalized primary contact (name plus phone). A contact tied to two or more facilities flags every facility in the cluster, with weight scaling by cluster size. School-district / Head Start / Boys & Girls Club coordinators legitimately run dozens of branches under one name and are discounted sharply.
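A minimal sketch of the grouping step, assuming each provider record is a dict with hypothetical keys `id`, `contact_name`, and `contact_phone`. The normalization rules shown (lowercasing, stripping punctuation, keeping the last ten phone digits) are illustrative guesses, and neither the cluster-size weighting nor the institutional discount is modeled here.

```python
import re
from collections import defaultdict


def normalize_contact(name: str, phone: str) -> tuple[str, str]:
    """Reduce a contact to a comparable (name, phone) key."""
    clean_name = re.sub(r"[^a-z ]", "", name.lower())
    clean_name = " ".join(clean_name.split())  # collapse repeated spaces
    digits = re.sub(r"\D", "", phone)[-10:]    # drop formatting and country code
    return clean_name, digits


def cluster_by_contact(providers: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group provider ids by normalized contact; keep clusters of two or more."""
    clusters = defaultdict(list)
    for provider in providers:
        key = normalize_contact(provider["contact_name"], provider["contact_phone"])
        clusters[key].append(provider["id"])
    return {key: ids for key, ids in clusters.items() if len(ids) >= 2}
```

Every id in a returned cluster would flag on this signal; how the weight grows with cluster size is left to the scoring layer.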

3. Capacity vs. town population

up to 18 points

A facility's licensed capacity divided by the population of its town (Census ACS 2022 place-level estimates) is compared against the statewide distribution. Facilities above the 95th percentile flag (the AR statewide p95 is 6.55%). The canonical pattern is the Arkansas case Secretary Bragg prosecuted: a facility in Marianna (population ~3,800) claiming 150 children daily.
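The ratio test can be sketched as below. The 6.55% threshold comes from the text; the function names are hypothetical, and `percentile_95` uses Python's `statistics.quantiles` with the inclusive method, which may differ slightly from whatever percentile definition the dashboard actually uses.

```python
import statistics

AR_STATEWIDE_P95 = 0.0655  # statewide 95th percentile quoted in the text


def capacity_ratio(licensed_capacity: int, town_population: int) -> float:
    """Licensed capacity as a fraction of the town's population."""
    if town_population <= 0:
        raise ValueError("town population must be positive")
    return licensed_capacity / town_population


def percentile_95(ratios: list[float]) -> float:
    """95th percentile of a list of ratios (last of 19 cut points for n=20)."""
    return statistics.quantiles(ratios, n=20, method="inclusive")[-1]


def capacity_flags(licensed_capacity: int, town_population: int,
                   threshold: float = AR_STATEWIDE_P95) -> bool:
    """True when the facility sits above the threshold ratio."""
    return capacity_ratio(licensed_capacity, town_population) > threshold
```

For instance, a hypothetical center licensed for 150 children in a town of 1,000 has a ratio of 15% and flags, while one licensed for 40 in a town of 5,000 (0.8%) does not.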

4. Repeat contact across multiple towns

up to 12 points

Where a contact's cluster spans two or more distinct towns, especially small towns, we add weight on top of the basic clustering signal. This matches the typical fraud workflow described by Bragg, in which one operator opens fake facilities in adjacent towns.
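As a rough sketch of the town-span check, assuming each facility in a cluster is a dict with hypothetical keys `town` and `town_population`. The 5,000-resident cutoff for a "small town" is an illustrative assumption; the text does not give one, nor does it say how the extra weight is computed.

```python
def town_span(cluster: list[dict], small_town_cutoff: int = 5000) -> dict:
    """Summarize how many distinct (and small) towns a contact cluster covers."""
    towns = {}
    for facility in cluster:
        # Normalize the town name so "Town A" and "town a " count once.
        key = facility["town"].strip().lower()
        towns[key] = facility["town_population"]
    small = sum(1 for population in towns.values() if population < small_town_cutoff)
    return {
        "distinct_towns": len(towns),
        "small_towns": small,          # assumption: cutoff is illustrative only
        "flags": len(towns) >= 2,      # "two or more distinct towns"
    }
```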

5. Facebook presence

up to 13 points

For each provider, we run a Google search for `"facility name" "city, AR" site:facebook.com` (with Brave as a fallback), score the top results by token-set similarity to the licensed name, and visit the best-matching page on www.facebook.com to read its follower count and its "talking about this" count (Facebook's 7-day rolling activity proxy). Each checked provider lands in one of the following outcomes:

Not found (no plausible page): the full 13 pts.
Stale (likes ≥ 50 but 0 talking, or talking-about-this < 0.5% of likes): 11 pts.
Tiny (< 50 followers for a center with capacity 30+): 7 pts.
Weak match: 5 pts.
Active or uncheckable: 0 pts.

School / Head Start / church-operated providers are discounted 50% on this signal. Coverage caveat: search engines aggressively rate-limit automated checking, so this build covers a representative 32-provider sample. Remaining providers display 'Not yet checked' and are not penalized on this signal.
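The outcome-to-points mapping above can be sketched as follows. The point values and thresholds come from the text; the `FacebookPage` type, the order in which outcomes are tested, and the treatment of "likes" and "followers" as a single count are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

POINTS = {"not_found": 13.0, "stale": 11.0, "tiny": 7.0,
          "weak_match": 5.0, "active": 0.0, "uncheckable": 0.0}


@dataclass
class FacebookPage:  # hypothetical container for scraped values
    followers: Optional[int]  # None when the count could not be read
    talking: Optional[int]    # "talking about this" count
    weak_match: bool = False  # name similarity too low to trust


def classify(page: Optional[FacebookPage], capacity: int) -> str:
    """Assign one outcome label; the check order is an assumption."""
    if page is None:
        return "not_found"
    if page.followers is None or page.talking is None:
        return "uncheckable"
    if page.weak_match:
        return "weak_match"
    if page.followers >= 50 and (page.talking == 0
                                 or page.talking / page.followers < 0.005):
        return "stale"
    if page.followers < 50 and capacity >= 30:
        return "tiny"
    return "active"


def facebook_points(page: Optional[FacebookPage], capacity: int,
                    institutional: bool = False) -> float:
    """Score the signal, halving it for school / Head Start / church providers."""
    points = POINTS[classify(page, capacity)]
    return points * 0.5 if institutional else points
```

Providers outside the checked sample would bypass this function entirely and display 'Not yet checked'.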

Limitations

Street View imagery may be outdated, occluded, or unavailable. Where Google has no image, the provider is flagged for manual review but receives no automated score.
Geocoding can place rural addresses imprecisely. We use US Census Geocoder as the primary source and Google Geocoding as a fallback.
The score is intended to surface facilities for additional review, not as evidence. Many flagged facilities will turn out to be entirely legitimate.
In-home (family) daycares are licensed at residential addresses by design. The Street View signal applies only to center-type licenses unless other signals also flag.
The Facebook check uses unauthenticated scraping. Search engines (Google, Brave) rate-limit automated requests aggressively, capping each session at roughly 30 queries before requiring a cooldown. This build's Facebook signal is therefore applied to a 32-provider sample; the rest of the catalog displays "Not yet checked" and is not penalized. Achieving full 1,845-provider coverage requires either the paid Google Custom Search API (~$9 one-time) or a residential proxy pool to rotate IPs, both of which are out of scope for this MVP.

What is not (yet) included

Future signals from the same conversation include complaint-history NLP, corrective-action recidivism, and phone-based enrollment verification (calling each facility to ask whether spots are available next semester). These require a Twilio/Vapi integration and an additional review workflow, and are tracked separately.