Election Data FAQ
Immediately before and after elections, there is intense competition to create and dispute narratives about why candidates and parties performed well or poorly, or whether they beat expectations. We strongly urge interested parties to take these immediate reactions with a grain of salt and to compare multiple data sources over time, with particular emphasis on analysis conducted after election administrators publish vote history data in the weeks and months following an election.
To that end, we have compiled some brief explainers about what different types of election data can and can’t tell us, organized by when that data generally becomes available.
Polling
When is it available? Throughout the election season.
Where does it come from? Media outlets, academic researchers, polling firms and campaigns. Some analysts aggregate results from multiple polls to produce a polling average.
What does it tell us? Polls sample voters and weight the responses based on projected demographic and turnout trends. They are often a reliable indicator of partisan preferences, engagement in elections, and broad public sentiment, but every estimate carries a margin of error.
What are the limits? Response rates for polling have dropped dramatically in recent years and can vary significantly over time based on respondents’ partisan preferences, availability and interest in responding to a poll. Pollsters are trying many approaches to weighting samples and results to account for these changes, including matching polling data to voter files, which can give pollsters greater insight into which respondents are registered voters, their partisanship and demographic information. In close elections, perceived polling “misses” are also more likely, because results can fall just within or just outside a poll’s margin of error. Margins of error are large for smaller polls and larger still for demographic and geographic subgroups within a poll. On top of this, the reported margin of error for most polls is substantially smaller than the “true” margin of error in the poll. Poll aggregates are helpful for developing bigger-picture views of polling results, but vary in the quality and types of polls they include. Finally, polls are susceptible to question wording bias, respondents’ tendency to answer “yes” to questions, and many other factors.
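To make the point about sample size concrete, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level; the sample sizes are hypothetical and `margin_of_error` is an illustrative helper, not any pollster's actual method. This formula captures sampling error only, which is one reason a reported margin of error can understate the total uncertainty in a poll.

```python
import math

def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Hypothetical poll: 1,000 respondents overall, 100 of them in one subgroup.
full_sample = margin_of_error(1000)
subgroup = margin_of_error(100)

print(f"Full sample: +/- {full_sample * 100:.1f} points")
print(f"Subgroup:    +/- {subgroup * 100:.1f} points")
```

Under these assumptions the subgroup's margin of error is roughly three times the full sample's, so a subgroup shift of a few points can easily be noise.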
Voter Registration Data
When is it available? All year, with large spikes in voter registration ahead of major elections.
Where does it come from? State and local election offices regularly publish updated voter registration data.
What does it tell us? Voter registration data tell us which eligible voters have registered to vote and include information that voters share with election offices. Nearly every state publicly reports voters’ names, addresses, age and gender. Some states also report voters’ self-identified race. Analysts can use these data to gauge how engaged various demographic groups are with elections ahead of Election Day, though doing so with precision is difficult.
What are the limits? Registering to vote does not guarantee that someone will actually vote. In some states, registration rates are higher due to automatic voter registration, usually through state departments of motor vehicles. Registration data early in an election cycle can also be biased toward older and wealthier registrants, while younger and less wealthy registrants tend to register in much larger numbers in the final weeks before an election. In 2018, for instance, 73% of new voter registrations for the cycle occurred between September and November. Finally, many states allow voters to register on Election Day, meaning new voter registrations won’t be evident until after votes are already cast.
Early Voting Data
When is it available? Before Election Day.
Where does it come from? State and local governments, depending on state.
What does it tell us? Who you vote for is private, but whether you vote and the method you use to vote are public information. State and local governments release information about who voted early in person or by mail ahead of Election Day. As Election Day approaches, campaigns use this data to focus their efforts on people who have not yet voted. Organizations with voter files can produce modeled estimates of early voting, including results by party preference, age, race and other factors. These estimates allow practitioners to identify audiences for get-out-the-vote efforts and allow analysts to estimate how engaged groups of early voters are compared to other elections.
What are the limits? These data can’t reliably predict overall election results. Early voting data include only people who have voted early in person or by mail, two audiences that in recent elections have leaned toward Democrats. Historically, this was not the case, but in 2020, Democrats promoted early and mail voting in response to the COVID-19 pandemic while Republicans criticized the practice. The partisan differences in voting method have not been as strong in subsequent elections, but they persist and are changing as rules around early voting and the response to the pandemic evolve. Early voting rates also vary significantly by state, often based on a state’s history and local laws.
Exit Polls
When is it available? On Election Day, after polls close, then updated over time to reflect the estimated composition of the actual electorate.
Where does it come from? Media outlets and research firms contribute to the exit poll by interviewing voters who have already cast ballots early or by mail, as well as voters outside polling locations on Election Day.
What does it tell us? For decades, exit polls have been the fastest way to get data to pundits and analysts and to help them understand the election at a time when everyone is desperately seeking information. Because of this, they are often the default first step in understanding demographic trends in elections. Media outlets also use exit poll data to support election night projections.
What are the limits? Providing data on election night is a massively difficult task. As with other polling, exit polls should be used in tandem with other data sources (such as precinct-level election results, AP VoteCast, pre-election surveys, and voter file analysis) to truly understand demographic trends in elections. Early exit polls offer an immediate snapshot of poll respondents, but are not weighted to actual election results, something that analysts do over time. It’s often a mistake to over-interpret specific results on election night before a significant amount of actual voting data is available. This is particularly true for demographic subgroups, whose margins of error are very high. However, exit polls are often the only information available about the composition of the electorate on election night.
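The idea of weighting exit poll respondents to actual results can be illustrated with a deliberately simplified sketch. It assumes a single weighting variable (vote choice) and uses made-up respondent counts and vote shares; real exit poll weighting uses more variables and more elaborate methods, so treat every name and number below as hypothetical.

```python
# Hypothetical exit-poll tallies: (age_group, candidate) -> number of respondents.
respondents = {
    ("under_45", "A"): 300,
    ("under_45", "B"): 150,
    ("45_plus", "A"): 240,
    ("45_plus", "B"): 310,
}
# Hypothetical final vote shares from counted ballots.
actual_share = {"A": 0.50, "B": 0.50}

def candidate_share(counts, candidate):
    """Share of all (possibly weighted) respondents who chose this candidate."""
    matching = sum(n for (_, cand), n in counts.items() if cand == candidate)
    return matching / sum(counts.values())

def group_share(counts, group):
    """Share of all (possibly weighted) respondents who belong to this group."""
    matching = sum(n for (grp, _), n in counts.items() if grp == group)
    return matching / sum(counts.values())

# Scale each respondent so weighted candidate totals match the counted result.
weights = {
    cand: actual_share[cand] / candidate_share(respondents, cand)
    for cand in actual_share
}
weighted = {key: n * weights[key[1]] for key, n in respondents.items()}

for group in ("under_45", "45_plus"):
    raw = group_share(respondents, group)
    adjusted = group_share(weighted, group)
    print(f"{group}: raw {raw:.3f}, weighted {adjusted:.3f}")
```

In this toy example the raw sample overstates candidate A, and correcting for that also changes the estimated age composition of the electorate, which is why early exit poll demographics can move as actual results come in.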
Precinct Results
When is it available? Shortly after polls close and over the course of several weeks as all votes are received and counted, with additional time needed in many jurisdictions to receive and count mail ballots.
Where does it come from? State and local governments count votes and aggregate results for public reporting, including votes cast by mail, early in-person, and on Election Day. These results are also compiled by media outlets and elections analysts.
What does it tell us? Precincts are relatively small areas where people can cast votes, so they’re helpful for shedding light on results that are deeply tied to geography. For instance, American voters are often geographically sorted by race, income and education level, so examining subsets of precinct-level results can help analysts deduce national-level shifts in turnout and partisan preferences, such as shifts among predominantly white, Latino, Black, or Asian-American and Pacific Islander precincts across the country. Additionally, the subset of young people who attend four-year colleges is concentrated in campus precincts.
What are the limits? Precinct results don’t shed light on demographics such as gender and marital status, since those groups are not as geographically separated. Precincts can also contain a mix of voters, and their composition changes over time, so representative precincts in one election may become less representative in the next. Analysts also have to account for ways in which combined data from multiple similar precincts can mask underlying trends among individual voters or jurisdictions. For instance, a group of super-majority Latino districts that appear to have significant shifts in voter turnout or partisan preference over time could actually be showing a trend driven by disproportionate changes among non-Latinos in those districts. Comparing precinct data over time can also be difficult, as precinct lines change and data cleaning and organization are done at the local level.
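The masking problem described above can be shown with a small worked example. The sketch below uses an entirely hypothetical precinct that is 80% Latino; the group-level figures are invented for illustration, and in real precinct results only the combined totals would be visible.

```python
from dataclasses import dataclass

@dataclass
class GroupTurnout:
    registered: int
    voted_first: int   # ballots cast in the earlier election
    voted_second: int  # ballots cast in the later election

# Hypothetical 80% Latino precinct. Latino turnout is flat between elections,
# while non-Latino turnout rises sharply.
precinct = {
    "latino": GroupTurnout(registered=800, voted_first=400, voted_second=400),
    "non_latino": GroupTurnout(registered=200, voted_first=100, voted_second=160),
}

total_registered = sum(g.registered for g in precinct.values())
overall_first = sum(g.voted_first for g in precinct.values()) / total_registered
overall_second = sum(g.voted_second for g in precinct.values()) / total_registered

# Change in turnout rate within each group, which precinct totals do not reveal.
group_change = {
    name: (g.voted_second - g.voted_first) / g.registered
    for name, g in precinct.items()
}

print(f"Precinct turnout: {overall_first:.0%} -> {overall_second:.0%}")
for name, change in group_change.items():
    print(f"{name}: turnout change of {change * 100:+.0f} points")
```

Here the precinct-wide turnout rises even though Latino turnout did not change at all, so reading the aggregate as a Latino trend would be a mistake.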
Voter Files
When is it available? States publish updated voter files over time. These updates cover new registrations, changes to existing registrations, party registration, participation in primary and general elections, and other data points that vary by state and jurisdiction. It takes several weeks, and sometimes months, after a national election for enough jurisdictions to update data to allow for a statistically reliable national estimate of the electorate.
Where does it come from? Core voter files are produced by state and local governments. The two major parties, data firms, and academic researchers compile this data into their own voter files for analysis and supplement it with Census, commercial and field data, such as information from canvassing, texting and phone calls. Catalist operates the longest-running voter file outside the two major parties.
What does it tell us? Voter files are the most robust tool we have for understanding the electorate, including demographics, vote choice, and party preferences. Importantly, they include vote history for millions of voters, allowing analysts to measure trends over time, including changes to party registration, voters moving across states and jurisdictions, and shifting levels of participation in specific elections. Voter files can be used to build universes for campaign communication as well as to study the electorate. Voter files have evolved from paper records maintained by the parties to robust digital databases with more than a decade of data.
What are the limits? Voter-file-based analysis takes time and is often only available after partisan narratives have already been established based on limited data. Voter files are generally very good at representing data such as age, since one’s date of birth is publicly reported, with some exceptions. Other demographics, such as race, are available via a combination of direct reporting by voters and modeling by analysts. While voter files do not have margins of error like polls do, estimates for smaller demographic groups and small geographic areas carry more uncertainty.