Definition
For each person, the Ethnicity Model generates a probability score for each of five ethnicity categories: AAPI (Asian American and Pacific Islander), Black, Hispanic, Native American, and White. Based on these scores, the model then assigns a predicted label corresponding to the category with the highest probability.
Technical Details
The model was trained on a sample of 3M ethnicity labels derived from county-reported ethnicity descriptions available in select states where voters self-report race/ethnicity information during voter registration. These labels classify voters into five categories: Asian American and Pacific Islander (AAPI), Black, Hispanic, Native American, and White.
The training dataset used stratified sampling with oversampling of non-White voters to give more balanced representation across all five ethnic categories during model training. Compared to the estimated registered voter population, the training sample undersamples the White category by 12 percentage points and oversamples the AAPI, Black, and Hispanic categories by 3.7, 5.6, and 2.7 percentage points, respectively. This sampling strategy significantly improved prediction accuracy for non-White ethnic categories while maintaining strong performance for White voters, as confirmed by the model's performance on the held-out validation set.
The model is a neural network that learns ethnicity patterns from voters' first, middle, and last names together with demographic features drawn from three sources: the Atlas by Murmuration dataset (age, gender, voter status, party affiliation, education, income, etc.), American Community Survey (ACS) census data at the block group level (racial composition, education levels, household income, employment rates, etc.), and PL94-171 census data at the block level (racial/ethnic proportions from the 2020 Census).
The model outputs six pieces of information for each voter: five probability scores (0-100 scale) indicating the likelihood that an individual belongs to each ethnic category (AAPI, Black, Hispanic, Native American, White), and a single predicted ethnicity label corresponding to the highest probability. For voters in states where self-reported ethnicity information is available through voter registration records, we use the reported ethnicity label directly, setting the corresponding probability score to 100 and all other probabilities to 0.
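The output logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the function name `finalize_scores`, the dictionary layout, and the tie-breaking rule (first category in list order wins) are all assumptions made for the example.

```python
# Hypothetical sketch of the label-assignment rule; names are illustrative.
CATEGORIES = ("AAPI", "Black", "Hispanic", "Native American", "White")


def finalize_scores(model_scores, reported_label=None):
    """Return (scores, label) for one voter.

    model_scores: dict mapping each category to a 0-100 probability score.
    reported_label: self-reported category from registration records, or None.
    """
    if reported_label is not None:
        if reported_label not in CATEGORIES:
            raise ValueError(f"unknown category: {reported_label!r}")
        # Self-reported ethnicity overrides the model: 100 for it, 0 elsewhere.
        scores = {c: (100 if c == reported_label else 0) for c in CATEGORIES}
        return scores, reported_label
    # Otherwise the label is the category with the highest score.
    # Assumption: ties go to the first category in CATEGORIES order.
    label = max(CATEGORIES, key=lambda c: model_scores[c])
    return dict(model_scores), label


example = {"AAPI": 5, "Black": 10, "Hispanic": 70, "Native American": 1, "White": 14}
predicted_scores, predicted_label = finalize_scores(example)
reported_scores, reported_label = finalize_scores(example, reported_label="Black")
```

In this sketch the predicted case leaves the five scores unchanged, while the self-reported case replaces them entirely.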
We validated the model's accuracy using a held-out dataset of 35,959 respondents from historic Murmuration surveys whose data was not used during model development. The holdout set is 59% White, 17% Black, 15% Hispanic, 9% AAPI, and 1% Native American (rounded percentages, so they sum to 101%). The model achieved 88.5% accuracy in correctly predicting self-reported ethnicity on this holdout set.
Model quality is often summarized by how much better the model is than random chance for the top 20% of people it ranks. This approach can be misleading for smaller demographic groups, simply because they make up a smaller share of the overall population (for more details, please refer to the Technical Appendix). A more appropriate metric for these groups is precision at the population prevalence level, that is, the share of true group members among the top-scoring segment whose size equals the group's base rate:
- AAPI (9% base rate): Among voters in the top 9% of AAPI probability scores, 85.8% are actually AAPI, representing 10.0× the random baseline
- Black (17% base rate): Among voters in the top 17% of Black probability scores, 88.3% are actually Black, representing 5.3× the random baseline
- Hispanic (15% base rate): Among voters in the top 15% of Hispanic probability scores, 81.3% are actually Hispanic, representing 5.4× the random baseline
- Native American (1% base rate): Among voters in the top 1% of Native American probability scores, 21.6% are actually Native American, representing 21.7× the random baseline
- White (59% base rate): Among voters in the top 59% of White probability scores, 91.7% are actually White, representing 1.6× the random baseline
This approach ensures fair comparison across groups regardless of their population size, showing how well the model concentrates each group in its highest-scoring segment of equivalent size.
These results indicate strong discriminative ability across all five ethnic categories, with the largest lift over the random baseline for the smallest groups (21.7× for Native American and 10.0× for AAPI).
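For readers who want to reproduce this kind of metric on their own labeled data, the following Python sketch computes precision at the prevalence level and the corresponding lift for one category. The function name and the toy data are hypothetical, and this is one straightforward reading of the metric rather than the exact evaluation code behind the figures above.

```python
def precision_at_prevalence(scores, is_member):
    """Precision among the top-k scores, where k = number of true members.

    scores: probability scores for ONE category, one per person.
    is_member: booleans, True if the person self-reports that category.
    Returns (base_rate, precision, lift).
    """
    n = len(scores)
    if n == 0 or n != len(is_member):
        raise ValueError("scores and is_member must be non-empty and aligned")
    members = sum(is_member)
    base_rate = members / n
    # Top segment has the same size as the group itself (at least 1 person).
    k = max(1, members)
    ranked = sorted(zip(scores, is_member), key=lambda pair: pair[0], reverse=True)
    hits = sum(member for _, member in ranked[:k])
    precision = hits / k
    lift = precision / base_rate if base_rate > 0 else float("nan")
    return base_rate, precision, lift


# Toy data (hypothetical): 10 people, 3 of whom belong to the category.
toy_scores = [95, 90, 80, 40, 30, 20, 10, 5, 3, 1]
toy_members = [True, True, False, True, False, False, False, False, False, False]
base_rate, precision, lift = precision_at_prevalence(toy_scores, toy_members)
```

Because the top segment always has the same size as the group, a perfect ranking scores 100% precision for every group, which is what makes the comparison across groups of different sizes meaningful.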
Partner Guidance
The Ethnicity Model provides both a predicted ethnicity label (AAPI, Black, Hispanic, Native American, or White) and five probability scores (0-100) showing confidence in each category. In states where voters self-report ethnicity during registration, we use that self-reported data directly. For all other voters, the label and scores are model predictions. We recommend using:
- the ethnicity label for demographic reporting, ensuring representative outreach across communities, or when you need a single classification for each voter
- the probability scores when precision matters, for example when targeting specific communities for culturally relevant messaging or finding voters likely to belong to multiple communities
Use Cases
- Culturally Targeted Campaign Outreach: Reach voters from specific ethnic communities with culturally relevant messaging, multilingual materials, or community-specific issue priorities. Set a high probability threshold (80+) for the target ethnicity to focus on voters most likely to be members of that community, then layer with other scores based on your campaign goal, for example:
  - GOTV: Hispanic probability >80 and Turnout Score 40-70 to target Hispanic voters who vote sometimes and need a turnout nudge
  - Persuasion: Black probability >80 and Turnout Score >70 to target Black voters who are likely to vote and may be persuadable on issues or candidates (you can layer in any issue score, e.g., Education Voter Score 70+ for education-focused messaging, to further refine your targeting within each community)
- Coalition Building and Endorsement Strategy: Identify neighborhoods or precincts with high concentrations of specific ethnic communities for coalition outreach or endorsement events. Aggregate probability scores at the precinct level to find areas where 40%+ of voters likely belong to a particular ethnic group. Example: Find precincts where the average Hispanic probability is above 60, then layer with high Education Voter Score (70+) to identify Hispanic-majority areas where education issues resonate strongly.
- Representative Sampling for Focus Groups and Ensuring Inclusive Outreach:
  - Build a demographically representative sample that reflects your district's diversity. Use the ethnicity labels to stratify your sample proportionally across all five groups, or weight by probability scores if you want to account for prediction uncertainty. Example: In a district that is 15% Black, 10% Hispanic, and 70% White, select voters proportionally from each label category to ensure your focus group reflects the community.
  - Review your contact lists to ensure you're not inadvertently excluding communities from your campaign. Use ethnicity labels to audit your volunteer lists, door-knocking universes, or digital ad targeting to confirm you're reaching all communities proportionally. If you're significantly under-reaching any group, expand your targeting to include more voters from that category.
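Two of the use cases above, the GOTV filter and the precinct-level aggregation, can be sketched in Python as follows. The field names (`hispanic_prob`, `turnout_score`, `precinct`) and the sample records are hypothetical and will differ from the actual data export; treating the mean probability as the expected share of a precinct is one reasonable way to aggregate scores, not necessarily the only one.

```python
from collections import defaultdict


def gotv_universe(voters, prob_field="hispanic_prob"):
    """Voters with target-ethnicity probability >80 and Turnout Score 40-70."""
    return [
        v for v in voters
        if v[prob_field] > 80 and 40 <= v["turnout_score"] <= 70
    ]


def expected_share_by_precinct(voters, prob_field="hispanic_prob"):
    """Mean probability per precinct, expressed as a share between 0 and 1."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for v in voters:
        totals[v["precinct"]] += v[prob_field]
        counts[v["precinct"]] += 1
    # Scores are on a 0-100 scale, so divide by 100 to get an expected share.
    return {p: totals[p] / (100 * counts[p]) for p in counts}


# Hypothetical sample records.
sample_voters = [
    {"id": 1, "precinct": "A", "hispanic_prob": 92, "turnout_score": 55},
    {"id": 2, "precinct": "A", "hispanic_prob": 85, "turnout_score": 90},
    {"id": 3, "precinct": "A", "hispanic_prob": 3, "turnout_score": 50},
    {"id": 4, "precinct": "B", "hispanic_prob": 10, "turnout_score": 60},
    {"id": 5, "precinct": "B", "hispanic_prob": 20, "turnout_score": 45},
]

gotv_ids = [v["id"] for v in gotv_universe(sample_voters)]
shares = expected_share_by_precinct(sample_voters)
high_concentration = sorted(p for p, s in shares.items() if s >= 0.40)
```

The same pattern extends to the persuasion example by changing the probability field and the turnout condition.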
Targeting Guidelines
When targeting specific ethnic communities, select probability score thresholds based on your desired precision level using the tables below. The tables show the expected precision (percentage of voters who actually belong to the target ethnicity) at different probability score ranges for each ethnic group. For example, among voters with an AAPI ethnicity probability of 90 or above, 92% actually self-identify as AAPI. These precision estimates are based on a national validation sample; actual performance may vary by state or region due to differences in demographic composition and data availability.
| AAPI | | |
| Score range | Count | Precision |
| <50 | 32805 | 1% |
| 50-69 | 253 | 44% |
| 70-89 | 402 | 64% |
| 90+ | 2499 | 92% |
| Black | | |
| Score range | Count | Precision |
| <50 | 29896 | 2% |
| 50-69 | 474 | 51% |
| 70-89 | 794 | 71% |
| 90+ | 4795 | 94% |
| Hispanic | | |
| Score range | Count | Precision |
| <50 | 30161 | 3% |
| 50-69 | 503 | 53% |
| 70-89 | 1186 | 71% |
| 90+ | 4109 | 85% |
| Native American | | |
| Score range | Count | Precision |
| <50 | 35825 | 1% |
| 50-69 | 41 | 15% |
| 70-89 | 20 | 20% |
| 90+ | 73 | 78% |
| White | | |
| Score range | Count | Precision |
| <50 | 15793 | 14% |
| 50-69 | 1186 | 71% |
| 70-89 | 2700 | 86% |
| 90+ | 16280 | 97% |
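The precision tables can also be used programmatically. The Python sketch below transcribes the counts and precision values from the tables and shows two hypothetical helpers: one looks up the expected precision for a single score, and the other pools the bands at or above a minimum score to estimate the precision of a threshold-based universe. The helper names are illustrative, and the pooled figure is only approximate because the per-band precisions are rounded.

```python
# Lower edges of the 50-69, 70-89, and 90+ bands.
BAND_EDGES = (50, 70, 90)

# (count, precision) per band, in order: <50, 50-69, 70-89, 90+.
PRECISION_TABLE = {
    "AAPI": [(32805, 0.01), (253, 0.44), (402, 0.64), (2499, 0.92)],
    "Black": [(29896, 0.02), (474, 0.51), (794, 0.71), (4795, 0.94)],
    "Hispanic": [(30161, 0.03), (503, 0.53), (1186, 0.71), (4109, 0.85)],
    "Native American": [(35825, 0.01), (41, 0.15), (20, 0.20), (73, 0.78)],
    "White": [(15793, 0.14), (1186, 0.71), (2700, 0.86), (16280, 0.97)],
}


def band_index(score):
    """Index of the score band (0 for <50, up to 3 for 90+)."""
    return sum(score >= edge for edge in BAND_EDGES)


def band_precision(group, score):
    """Expected precision for one voter's score, read from the table."""
    return PRECISION_TABLE[group][band_index(score)][1]


def pooled_precision(group, min_score):
    """Count-weighted precision over all bands at or above min_score's band.

    Most meaningful when min_score is a band edge (50, 70, or 90).
    """
    bands = PRECISION_TABLE[group][band_index(min_score):]
    total = sum(count for count, _ in bands)
    return sum(count * precision for count, precision in bands) / total


aapi_90 = band_precision("AAPI", 95)
black_70_plus = pooled_precision("Black", 70)
```

Pooling matters when choosing a threshold: a universe defined as "Black probability 70 or above" mixes the 70-89 and 90+ bands, so its overall precision falls between the two band values, weighted toward the larger band.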
Targeting Tables
We report targeting tables indicating what score threshold to set to target the top X% of the population. Please note that for ethnic groups with lower population prevalence (AAPI, Native American, Hispanic, and Black), the score thresholds for broader targeting tiers (e.g., top 30-50%) will be relatively low. This is expected: when a group represents 5% of the population, targeting the top 10% of scores will necessarily include voters with moderate probabilities alongside the highest-confidence predictions. For precision targeting of these groups, we recommend using the confidence-based thresholds shown in the previous tables rather than percentage-of-population cutoffs.
The tables below show the minimum score value associated with each decile for each of the five ethnicity scores, to make targeting easier. Note that these score cutoffs will differ in your local districts, depending on the ethnic composition of the areas where you are working.
Targeting Table - AAPI Registered Voters
| To target the top... | Set the minimum score value as... |
| 10% | 9 |
| 20% | 3 |
| 30% | 1 |
| 40% | 1 |
| 50% | 0 |
| 60% | 0 |
| 70% | 0 |
| 80% | 0 |
| 90% | 0 |
| 100% | 0 |
Targeting Table - Black Registered Voters
| To target the top... | Set the minimum score value as... |
| 10% | 84 |
| 20% | 9 |
| 30% | 2 |
| 40% | 1 |
| 50% | 0 |
| 60% | 0 |
| 70% | 0 |
| 80% | 0 |
| 90% | 0 |
| 100% | 0 |
Targeting Table - White Registered Voters
| To target the top... | Set the minimum score value as... |
| 10% | 100 |
| 20% | 99 |
| 30% | 98 |
| 40% | 96 |
| 50% | 93 |
| 60% | 81 |
| 70% | 29 |
| 80% | 3 |
| 90% | 0 |
| 100% | 0 |
Targeting Table - Native American Registered Voters
| To target the top... | Set the minimum score value as... |
| 10% | 3 |
| 20% | 2 |
| 30% | 1 |
| 40% | 1 |
| 50% | 1 |
| 60% | 0 |
| 70% | 0 |
| 80% | 0 |
| 90% | 0 |
| 100% | 0 |
Targeting Table - Hispanic Registered Voters
| To target the top... | Set the minimum score value as... |
| 10% | 83 |
| 20% | 3 |
| 30% | 1 |
| 40% | 0 |
| 50% | 0 |
| 60% | 0 |
| 70% | 0 |
| 80% | 0 |
| 90% | 0 |
| 100% | 0 |
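Because these cutoffs depend on local ethnic composition, partners may want to recompute them for their own district. The Python sketch below shows one way to derive a "top X%" minimum score from a list of local scores; the function name and the toy input are hypothetical, and the rounding convention (include at least X% of voters) is an assumption.

```python
def top_share_thresholds(scores, percents=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    """Minimum score needed to capture the top X% of voters, for each X.

    scores: one ethnicity probability score (0-100) per voter in the district.
    Returns a dict mapping each percent to its minimum score value.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    thresholds = {}
    for pct in percents:
        # Integer ceiling of pct% of n, so at least pct% of voters are included.
        k = max(1, -(-pct * n // 100))
        thresholds[pct] = ranked[k - 1]
    return thresholds


# Toy district (hypothetical): 100 voters with scores 1 through 100.
local_thresholds = top_share_thresholds(list(range(1, 101)))
```

When many voters share the same score at a cutoff, which is common at low values such as 0 or 1, applying the minimum score will select more than the intended percentage of voters.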