What Happened™ in 2022
Constituency Reports Data Guide
The What Happened 2022 constituency reports include several types of data sourced from publicly available voter files as well as modeled data based on polling and other sources.
Registration Data
Voter registration data is public and routinely compiled by political parties and voter file companies, including Catalist.
Registration data on voter files. Generally, states publish voter rolls that include basic data such as names, addresses, voting precincts, age, and gender. However, voter registration laws and practices vary widely by state. For instance, some states require voters to proactively register and re-register for elections after moving. Meanwhile, other states automatically register voters as they interact with state offices, such as the department of motor vehicles, and also assist voters with moving their official address when they move in-county or in-state.
Age data on voter files. Voting is an age-based right and 41 states report dates of birth, birth years or age on their public voter files. For jurisdictions that do not report age data, age is often represented via commercial or other public data and can also be estimated through modeling.
Generational cohorts in electoral analysis. Catalist tends to rely on birth years and generational cohorts for its analysis. By contrast, most polls divide voters into specific age buckets, such as 18 to 29 or 65+, meaning different generational cohorts will fit into different age categories over time. Normally, these approaches do not yield very different conclusions, but the Baby Boomer generation is very large, so the raw number of people moving into the 65+ category over the past several cycles has increased. This does not mean that older voters are becoming more engaged overall, but rather that more people are moving into that age category. Similarly, voters between the ages of 18 and 29 enter the electorate unevenly, because presidential election years have higher turnout but occur only every four years, which makes apples-to-apples comparisons among the youngest voters across midterms more difficult.
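As a rough illustration of why birth-year cohorts and age buckets diverge over time, the sketch below maps a birth year to a poll-style age bucket for a given election year. The age_bucket helper and the middle "30-64" bucket are assumptions made for this example (only the 18 to 29 and 65+ buckets are named above), and age is approximated as election year minus birth year.

```python
def age_bucket(birth_year, election_year):
    """Return a poll-style age bucket, or None if under voting age."""
    # Approximation: ignores whether the birthday falls before Election Day.
    age = election_year - birth_year
    if age < 18:
        return None
    if age <= 29:
        return "18-29"
    if age <= 64:
        return "30-64"  # hypothetical middle bucket, not from the report
    return "65+"


# The same birth year lands in different buckets across midterms.
for year in (2014, 2018, 2022):
    print(year, age_bucket(1957, year))
```

Someone born in 1957 stays in the same birth-year cohort in every cycle but crosses into the 65+ bucket by 2022, which is the kind of movement that can inflate the raw size of the oldest age category.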
Gender data on voter files. Gender is well-represented on voter files, with 30 states providing gender data on their public voter files. In states without such data, gender is represented via commercial data as well as modeled based on surveys, naming conventions, and other inputs. A few states allow voters to identify as non-binary when they register to vote or get a driver’s license, but these data are not reported on voter files.
Marriage data on voter files. Marriage data is modeled based on name and address conventions, polling, and other factors.
Population density. Population density is derived from Census data and calculated at the Census tract level. Catalist counts the bottom 25% of tracts as rural, the middle 50% as suburban and the top 25% as urban, relative to each state and county.
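The 25% / 50% / 25% split can be sketched as a simple rank-based classification within one geography. This is only an illustration under stated assumptions: the classify_tracts function, the tract identifiers and the density values are hypothetical, and the report does not say how ties or density units are handled.

```python
def classify_tracts(density_by_tract):
    """Label each tract rural, suburban or urban by its density rank.

    density_by_tract maps a tract id to a population density for tracts
    in a single geography (the report ranks tracts relative to each
    state and county). Ties are broken by sort order, an assumption.
    """
    ranked = sorted(density_by_tract, key=density_by_tract.get)
    n = len(ranked)
    labels = {}
    for position, tract in enumerate(ranked):
        share_below = position / n  # fraction of tracts ranked lower
        if share_below < 0.25:
            labels[tract] = "rural"      # bottom 25% of tracts
        elif share_below >= 0.75:
            labels[tract] = "urban"      # top 25% of tracts
        else:
            labels[tract] = "suburban"   # middle 50% of tracts
    return labels


# Hypothetical densities for eight tracts in one county.
example = {f"tract-{i}": density for i, density in enumerate(
    [10, 20, 30, 40, 50, 60, 70, 80], start=1)}
labels = classify_tracts(example)
```

Because the cut points are relative, a tract labeled urban in a sparsely populated state could be less dense than a tract labeled suburban elsewhere.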
Education. Voter files can reliably model aggregate education levels thanks to robust Census data and the degree to which college graduates and non-college graduates geographically sort themselves. Education is not a direct stand-in for class, income or wealth. A college-educated teacher with a great deal of student debt, for instance, may have lower income and wealth than a non-college graduate who owns a small business.
Race data on voter files. Seven states collect self-identified race data on their voter registration forms and include that data on their public voter files: Alabama, Florida, Georgia, Louisiana, North Carolina, South Carolina and Tennessee. Additionally, voter files include historical self-reported race data from Kentucky and Mississippi, which no longer include this data on their public files. Why these states and not others? Surprisingly, there is no comprehensive historical record as to why this is the case in each state; however, some of these states used to hold segregated primary elections, which were ruled unconstitutional by the Supreme Court in 1944. During the Civil Rights Era, these data were useful for measuring the impact of the Voting Rights Act in covered jurisdictions, but were not mandated by the act itself. [1] Today, these data remain valuable for precisely estimating the racial composition of the electorate; some state legislators have suggested adding race data to their registration forms to help shed light on how representative their electorates are. When self-identified data is available, Catalist prioritizes that information over modeled data, including when a voter identifies their race in a state that collects race data and then moves to a state that does not. When self-reported data is not available, Catalist’s Race Model is used. It relies on Census data, naming conventions, poll results and other factors to estimate and assign a race to a given record. Modeling race on voter files has allowed campaigns to reach more voters of color outside of so-called “majority-minority” precincts, an important development as suburbs have become more diverse.
[1] These practices are discussed in Hacking the Electorate by Eitan Hersh (2015).
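The precedence rule described above (self-identified race first, modeled race as a fallback) can be sketched as follows. This is a minimal illustration, not Catalist’s implementation: the resolve_race function, its arguments and the category labels are all hypothetical.

```python
def resolve_race(self_reported_history, modeled_race):
    """Pick the race value for a record and say where it came from.

    self_reported_history lists self-identified values from the states
    where the voter has been registered, most recent first, with None
    for states that do not collect race on their registration forms.
    """
    for value in self_reported_history:
        if value is not None:
            # Any self-identified value wins, even from a prior state.
            return value, "self_reported"
    # No self-identified data anywhere: fall back to the model estimate.
    return modeled_race, "modeled"


# A voter who self-identified in one state, then moved to a state
# that does not collect race data (hypothetical values).
moved = resolve_race([None, "Category A"], modeled_race="Category B")
never_asked = resolve_race([None], modeled_race="Category B")
```

Returning the source alongside the value keeps it visible downstream whether a given record rests on self-identification or on a model estimate.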
What Happened 2022 Report Series
About Catalist
Catalist operates as a trust, with a board composed of major unions, progressive organizations and data specialists with years of experience in Democratic and progressive campaigns. Catalist is the first and only unionized data firm in its field (Catalist Union, Communications Workers of America Local 2336).
About Voter Files
As long as there have been lists of people qualified to vote, there have been voter files. In the pre-digital era, election administrators and state parties kept physical lists of voters tucked away in filing cabinets. After the 2000 election, with support from federal legislation, states began to digitize their voter registration records, allowing election administrators to standardize and digitize registration data across local and county-level geographies. The major parties, as well as independent voter file companies like Catalist, have used these public data as the baseline for their own voter files and have combined them with commercial, Census and other data sources to glean insights about the electorate. Catalist has compiled an overview of different types of election data. Further reading includes Hacking the Electorate (2015) by Eitan Hersh and The Victory Lab (2013) by Sasha Issenberg.
Modeled Data and Other Sources
Registered Population Estimates
Calculating registration rates requires comparing the registered population to the total population.
While estimates for age and gender remain robust, estimates by race have grown more complicated. Starting in 2020, the Census changed its questionnaire, encouraging respondents to list their “origin” as part of their race self-reporting. This change led to the Census coding many more respondents as biracial, which is usually coded as “Other Race” in election analysis. This makes comparisons over time difficult and fraught with potential problems. The 2020 Census was also affected by COVID, which may have led to undercounting, including disproportionate undercounting of people of color. Catalist will integrate newer Census data into its election analysis over time and will revisit this topic in greater depth in future reports.
Vote share
Vote share estimates rely on Census data, polling and other sources to calculate the portion of the electorate made up of a specific group. Vote share data are provided at the national and statewide level for groups and sub-groups. Because vote share is based on multiple high-quality data sources, Catalist has high confidence in these estimates, including for relatively small population sizes, such as the portion of Gen Z or Millennial Black voters in New Hampshire (0.45% of the 2022 New Hampshire general electorate) or AAPI men in Ohio (0.55% of the 2022 Ohio general electorate). By contrast, modeled support for these groups is much more uncertain.
Modeled Support
Modeled support is expressed as two-way Democratic vote share, meaning the percentage of the vote Democrats won among all the votes cast for Democrats and Republicans. This modeled support is developed through precinct-level results, polling data and other sources. Catalist is careful to caveat estimates or trends that may not be statistically exact. For small population sizes (for instance, AAPI women voters in Michigan), variation in polling samples from cycle to cycle may show more movement in modeled support than is actually evident in real-world election results.
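The two-way calculation itself is simple arithmetic, sketched below. The function name and the vote counts are hypothetical; the point is only that votes for other candidates are left out of the denominator.

```python
def two_way_dem_share(dem_votes, rep_votes):
    """Democratic share of the votes cast for Democrats and Republicans."""
    major_party_total = dem_votes + rep_votes
    if major_party_total == 0:
        raise ValueError("no major-party votes to compare")
    return dem_votes / major_party_total


# Hypothetical counts: 520 Democratic, 480 Republican, 50 for others.
# The 50 third-party votes are ignored by construction.
share = two_way_dem_share(dem_votes=520, rep_votes=480)
print(f"{share:.1%}")
```

With these counts the Democratic candidate has 52% two-way support, even though their share of all 1,050 ballots would be lower.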
When interpreting two-way Democratic support, it is important to note that shifts from election to election can result from individuals switching their vote from one party to another as well as compositional changes to the electorate, including Republican or Democratic voters in a subgroup turning out in different numbers from election to election.
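A small worked example, using made-up numbers, shows how two-way support can move with no vote switching at all. Here the only change between the two hypothetical elections is that fewer of a subgroup’s Democratic voters turn out.

```python
def two_way_dem_share(dem_votes, rep_votes):
    """Democratic share of the votes cast for Democrats and Republicans."""
    return dem_votes / (dem_votes + rep_votes)


# Hypothetical subgroup; nobody changes party between the two elections.
first = two_way_dem_share(dem_votes=600, rep_votes=400)
# Second election: 100 fewer Democratic voters turn out, Republicans unchanged.
second = two_way_dem_share(dem_votes=500, rep_votes=400)

shift = second - first
print(f"{first:.3f} -> {second:.3f} (shift {shift:+.3f})")
```

Support in this subgroup falls by roughly 4.4 points purely because of turnout, which is why a shift in two-way support cannot by itself be read as persuasion.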
Clients and partners who are interested in analyzing more sub-group level data should contact Catalist.
House vote and Heavily Contested vs. Less Contested Statewide Races
Normally, midterm elections involve a national swing in one direction or another, usually against an incumbent party. But the 2022 election broke from this pattern, with Democrats overperforming in states with heavily contested statewide elections. Catalist’s analysis calculates modeled support for House races, which largely reflect national trends, as well as top-of-ticket statewide races for governor and Senate. While there is no perfect definition of heavily contested races, this analysis relies on the final pre-election ratings from the non-partisan Cook Political Report. We define “heavily contested” elections as those which were rated Tossup or Lean for either party. These ratings draw on data about redistricting, polling, fundraising and on-the-ground reporting and have a strong historical track record. [2]
These races include 64 House races; Senate races in Arizona, Georgia, Colorado, Nevada, New Hampshire, Ohio, North Carolina, Pennsylvania and Wisconsin; and gubernatorial races in Arizona, Georgia, Kansas, Maine, Michigan, Nevada, New Mexico, Oregon and Wisconsin.
[2] The Cook Political Report ratings for the House, Senate and gubernatorial races included nearly all of the closest races in the country and the ones where parties heavily contested the outcome. We also examined other methods for identifying the closest races, including replicating our analysis using different cut points from election forecasting models, most notably from FiveThirtyEight. While some results change, the main contours of our analysis remain consistent using any reasonable definition. Further, it’s worth noting that any definition formulated before the election itself will miss outlier races, such as Democrat Adam Frisch coming within a few hundred votes of unseating Republican Rep. Lauren Boebert (CO-03) in a district that would otherwise be expected to lean 7 points in Republicans’ favor according to the Cook Partisan Voting Index. While it’s tempting to include such races in an analysis, doing so can unintentionally bias findings toward electorates that had exceptionally weak or strong candidates. Finally, it’s always difficult to analyze counterfactuals. Perceptions of a race as highly contested often lead to it being even more heavily contested as parties and outside groups seek to tip the race. Conversely, when races seem less contested, parties and outside groups tend to invest less, further fueling perceptions that other races should be the priority.
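The “heavily contested” definition above amounts to a filter on pre-election ratings, sketched here. The rating strings, race identifiers and function name are assumptions for illustration; actual rating labels should be matched to however the ratings data spells them.

```python
# Assumed label spellings: Tossup, or Lean for either party.
HEAVILY_CONTESTED_RATINGS = {"Tossup", "Lean D", "Lean R"}


def split_by_contestedness(ratings_by_race):
    """Split races into (heavily contested, less contested) lists."""
    heavily, less = [], []
    for race, rating in ratings_by_race.items():
        if rating in HEAVILY_CONTESTED_RATINGS:
            heavily.append(race)
        else:
            less.append(race)
    return heavily, less


# Hypothetical races and ratings, not actual 2022 ratings.
example = {
    "race-A": "Tossup",
    "race-B": "Likely R",
    "race-C": "Lean D",
    "race-D": "Solid D",
}
heavily, less = split_by_contestedness(example)
```

Because the classification uses only ratings fixed before the election, a race that turned out unexpectedly close stays in the less contested group, consistent with the approach described above.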
Primary Participation
Voter files reliably record which elections people cast ballots in. These records allow us to analyze primary electorates as well as general electorates. For program managers, primary participation is a valuable touch point for relatively engaged voters. Relatively high or low primary participation does not necessarily help campaigns predict overall turnout in a general election, but primary voters are much more likely to participate in general elections than non-primary voters.
Vote Method
Vote methods have shifted dramatically in the past few election cycles, due to expanded early and mail voting as well as the response to the COVID-19 pandemic. These shifts have important implications for groups conducting voter registration and get-out-the-vote efforts as they concentrate end-of-cycle resources on potential voters who have not yet cast their ballots.
Generally, program managers focus on how voters interact with the electoral system: for instance, Election Day voting at a local polling place, early voting at a designated location, and voting by mail. However, vote method is not recorded consistently across states. For instance, in some states, early voting and Election Day voting are recorded similarly as casting an in-person ballot on a specific date. But in other states, early voting may take the form of filling out an absentee ballot in person at an election office, which means an early in-person vote is recorded similarly to requesting an absentee ballot via mail. Catalist combines multiple sources of data, including Absentee Voting and Early Voting (AVEV) data from election offices and the U.S. Election Assistance Commission’s Election Administration and Voting Survey (EAVS), to estimate vote methods across the voter file.
Cross Cycle Comparisons
Different electoral cycles are relevant for evaluating changes to the shape of the electorate and changes in Democratic support. When considering how the shape of the electorate has changed, recent midterm elections are most relevant because midterms and presidential elections draw different electorates. In this report, we draw comparisons to the 2014 and 2018 midterms – with the former representing a modest national “red wave” election and the latter representing a major national “blue wave” election. By contrast, when considering changes to Democratic support, the 2020 presidential election is the most relevant comparison because the central focus is understanding how the Democratic coalition has shifted over the first two years of the Biden / Harris administration.
Contributors
Lead authors. Kirsten Walters, Data Scientist; Ben Gross, Analyst
Project lead. Hillary Anderson, Director of Analytics; Haris Aqeel, Senior Advisor
Editor. Aaron Huertas, Communications Director
Graphics and data engineering. Kirsten Walters, Data Scientist
Catalist Executives. Michael Frias, CEO; Molly Norton, Chief Client Officer
Catalist Analytics Team. Janay Cody PhD, Senior Advisor for Data Equity; Jonathan Robinson, Director of Research
Catalist Data Team. Russ Rampersad, Chief Data Officer; Lauren O’Brien, Deputy Chief Data Officer; Dan Buttrey, Director of Data Acquisition
Many current and former staff members have also contributed to this report through their work building and maintaining the Catalist file. These insights would not be possible without the long-term investment Catalist has made in people and data since 2006.
Finally, Catalist is deeply grateful to the clients, partners and other community leaders who offered thoughtful review and feedback throughout the process, especially:
- EMILY’S List: Melissa Williams, Vice President Independent Expenditure
- EquisLabs: Carlos Odio, Co-Founder and Senior Vice President of Research; and team
- HIT Strategies: Terrance Woodbury, CEO and Co-Founder; and team
- NAACP in partnership with GSSA: Derrick Johnson, President and CEO NAACP; Dr. Albert Yates, Principal GSSA, Catalist board member
- Planned Parenthood Action Fund: Rachel Hall, Director, Data, Analytics and Research
- Strategic Victory Fund: Stephanie Schriock
What Happened™ © Catalist LLC. All Rights Reserved.
Proprietary data and analysis not for reproduction or republication.
Catalist’s actual providing of products and services shall be pursuant to the terms and conditions of a mutually executed Catalist Data License and Services Agreement.