It’s common knowledge that political parties in the United States collect information about potential voters, but exactly how comprehensive is the data they collect?

To explore that question, we set out to find detailed descriptions of 176 data points the Republican National Committee (RNC) has been gathering about voters since at least 2008. The data points were published on June 19 by a cybersecurity company after one of its analysts found a trove of voter records that had been inadvertently left on a public Amazon server. Our search led us to previously unreported data sources, providing a more complete view of what the Republican party tracks about American voters.

The voter records were discovered on June 12 by Chris Vickery, a risk analyst at the security firm UpGuard. While scanning the internet for misconfigured systems, Vickery came across an 11-terabyte cache of election-related data that he later learned had been compiled by three Republican contractors: TargetPoint Consulting, the Data Trust, and Deep Root Analytics. In a public statement, Deep Root took responsibility for leaving the data on the Amazon server where Vickery found it, saying the data had only been used to “inform local television ad buying.”

Included within the files Vickery found were 102 massive spreadsheets: two for every state and the District of Columbia. For each state, one file contained voter data based on the 2008 election, and the other contained data based on the 2012 election. In its blog post, UpGuard listed the 176 categories that made up the column headers in those spreadsheets. Some were self-explanatory, such as “FirstName” and “OfficialParty,” but others were not, such as “VH12PP” and “RNCCalcParty.” A few were somewhat clear, such as “ModeledEthnicGroup,” which suggests data about a voter’s ethnic group as determined by predictive modeling, but what those groups were was less clear. UpGuard declined to share further details about the data, citing the inherent privacy violations that could come with such a disclosure.

However, we were able to match up the categories revealed by UpGuard with other sources and obtain detailed descriptions of most of them.

First, we found that the unique field names listed in UpGuard’s blog post match up with those used in a now-offline API that appears to have been built by the Data Trust for the RNC. The RNC’s API, which was previously hosted at docs.api.gop.com, is no longer online, and cached versions of it only show an Amazon AWS login page. But very specific Google searches, such as site:docs.api.gop.com VH12G, matched 137 of the 176 categories UpGuard listed, and most of those revealed the categories’ descriptions. Some fields were slightly different, listed in UpGuard’s post as “RegistrationAddr1” and in the API as “Registration_Addr1,” for example, but the added underscores were the only inconsistencies.

Google A Google search revealing the description of “VH12G”
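The underscore-insensitive comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the process we actually ran: the function names and the short sample lists are hypothetical, and ignoring letter case as well as underscores is an added assumption.

```python
def normalize(name: str) -> str:
    """Drop underscores and ignore case so spelling variants compare equal.

    Ignoring case is an assumption; the article only reports added underscores.
    """
    return name.replace("_", "").casefold()


def match_fields(upguard_fields, api_fields):
    """Pair each UpGuard column name with an API field name, if one matches."""
    api_by_key = {normalize(field): field for field in api_fields}
    matched, unmatched = {}, []
    for field in upguard_fields:
        key = normalize(field)
        if key in api_by_key:
            matched[field] = api_by_key[key]
        else:
            unmatched.append(field)
    return matched, unmatched


# Hypothetical sample lists built from field names quoted in this article.
upguard_sample = ["RegistrationAddr1", "VH12G", "RNCCalcParty", "PG01"]
api_sample = ["Registration_Addr1", "VH12G", "RNCCalcParty"]

matched, unmatched = match_fields(upguard_sample, api_sample)
```

With these sample lists, “RegistrationAddr1” pairs with “Registration_Addr1,” while “PG01” has no counterpart and lands in the unmatched list.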

Additionally, a GitHub account owned by the Data Trust includes a repository called “direct-api-examples” that also references many of the field names, and includes example uses of what appears to be an early version of the API, which it calls the “GOP Data Trust API.”

GitHub A repository in the Data Trust’s GitHub account

Of course, the link between this API and the data found by Vickery is unclear and unconfirmed, but it is apparent that the matching fields describe the same data. Asked for a comment about the API, the Data Trust referred us to the RNC, and the RNC did not respond to our questions.

Further insight into the nature of the data came from a post on Stack Overflow that included JSON data, which also used many of the field names. The provenance of the data is unclear, but nearly all of the 59 categories it contains match the categories in the RNC’s API and those shared by UpGuard, including uniquely named fields like “RNCCalcParty” and “MADR_LastCleanse.” Because the data on Stack Overflow contained actual values, it helped us to expand the descriptions of some of the columns.

Stack Overflow A post on Stack Overflow containing JSON data that shares field names with the voter data found by Vickery

In aggregate, these clues allowed us to compile the lists below. They contain descriptions of 137 data points the Republican party knows, or at least wants to know, about every American voter. (According to UpGuard, the database that Vickery found did not contain data in every field for every voter.) All of the descriptions came from the RNC’s API, except in cases where the category names had a match in the API, but where the descriptions of those categories did not show up. In those cases, we put the descriptions we inferred from the field names in italics. Some descriptions include “sample data,” which came from the data posted on Stack Overflow. The 39 fields we were unable to identify were those ranging from “PG01” to “PG39.”

Your likely religion and ethnicity

The RNC and the Democratic National Committee both pay millions of dollars to data analysis firms like Deep Root to combine information provided by states with data gathered from cold-calls, canvassing efforts, campaign contributions, and social media. Then those data points are synthesized to determine how you’re likely to vote and what kind of messaging you’ll respond to. The fields below refer to data derived through that kind of analysis. Notably, the codes for “ModeledEthnicGroup” are limited to “H” for Hispanic and “B” for black, but the field in the Stack Overflow data was populated with a “Z.”

RNCCalcParty: RNC Calculated Partisan score: 1=Hard Rep, 2=Lean Rem [SIC], 3=Swing/Ind, 4=Lean Dem, 5=Hard Dem

StateCalcParty: Likely a state-level partisanship score similar to RNCCalcParty

ModeledEthnicity: Modeled Ethnicity – Ethnicity Code. See supplemental documention [SIC] for code values. Sample data: “E1”

ModeledReligion: Modeled Religion – Ethnicity Religious Affiliation Code: B = Buddhist, C = Catholic, G = Greek Orthodox, H = Hindu, I = Islamic, J = Jewish, K = Sikh, L = Lutheran […information cuts off here]. Sample data: “P”

ModeledEthnicGroup: Modeled Ethnic Coding (H=Hispanic, B=Black). Sample data: “Z”
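To make the coded values above concrete, here is a minimal Python sketch of how such codes might be turned into labels. The lookup tables contain only the codes documented in the descriptions above; the table and function names are hypothetical, and storing the partisan score as an integer is an assumption.

```python
# Codes as documented in the API descriptions quoted above.
# The source spells code 2 "Lean Rem"; "Lean Rep" is assumed to be intended.
RNC_CALC_PARTY = {
    1: "Hard Rep",
    2: "Lean Rep",
    3: "Swing/Ind",
    4: "Lean Dem",
    5: "Hard Dem",
}

MODELED_ETHNIC_GROUP = {"H": "Hispanic", "B": "Black"}


def decode(table, code):
    """Return the documented label, or flag a code the descriptions omit."""
    return table.get(code, f"undocumented code {code!r}")


swing = decode(RNC_CALC_PARTY, 3)
mystery = decode(MODELED_ETHNIC_GROUP, "Z")
```

A lookup like this surfaces the gap noted above: the “Z” found in the Stack Overflow data has no documented meaning, so it is flagged rather than guessed at.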

Your voting history

The voting data retained by each state varies, but is generally considered public information. These fields list which party citizens voted for in each election going back to 2002.

LastActiveDate (last_activedate): Last Active Date – Date of Last Voter Activity (if provided on source data)

VoterStatus: Voter Status – Current Status of registration as observed by jurisdiction. A – Active, I – Inactive, C – Cancelled, D – Deceased.

VH12G: Vote History 2012 General – 2012 General Election

VH12P: Vote History 2012 Primary – 2012 Primary Election

VH12PP: Vote History 2012 Presidential – 2012 Presidential Primary Election

VH11G: Vote History 2011 General – 2011 General Election

VH11P: Vote History 2011 Primary – 2011 Primary Election

VH10G: Vote History 2010 General – 2010 General Election

VH10P: Vote History 2010 Primary – 2010 Primary Election

VH09G: Vote History 2009 General – 2009 General Election

VH09P: Vote History 2009 Primary – 2009 Primary Election

VH08G: Vote History 2008 General – 2008 General Election

VH08P: Vote History 2008 Primary – 2008 Primary Election

VH08PP: Vote History 2008 Presidential – 2008 Presidential Primary Election

VH07G: Vote History 2007 General – 2007 General Election

VH07P: Vote History 2007 Primary – 2007 Primary Election

VH06G: Vote History 2006 General – 2006 General Election

VH06P: Vote History 2006 Primary – 2006 Primary Election

VH05G: Vote History 2005 General – 2005 General Election

VH05P: Vote History 2005 Primary – 2005 Primary Election

VH04G: Vote History 2004 General – 2004 General Election

VH04P: Vote History 2004 Primary – 2004 Primary Election

VH04PP: Vote History 2004 Presidential – 2004 Presidential Primary Election

VH03G: Vote History 2003 General – 2003 General Election

VH03P: Vote History 2003 Primary – 2003 Primary Election

VH02G: Vote History 2002 General – 2002 General Election

VH02P: Vote History 2002 Primary – 2002 Primary Election
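The vote-history field names follow a regular pattern: “VH,” a two-digit year, and a suffix of “G,” “P,” or “PP.” As a rough sketch, a few lines of Python can unpack that pattern. The function name is hypothetical, and treating every two-digit year as falling in the 2000s is an assumption that holds only for the fields listed here.

```python
import re

# "VH" + two-digit year + election suffix, as seen in the field names above.
VH_PATTERN = re.compile(r"^VH(\d{2})(PP|G|P)$")

ELECTION_TYPES = {
    "G": "General",
    "P": "Primary",
    "PP": "Presidential Primary",
}


def parse_vote_history_field(name):
    """Return (year, election type) for a VH field name, or None otherwise."""
    found = VH_PATTERN.match(name)
    if found is None:
        return None
    two_digit_year, suffix = found.groups()
    # Assumption: all years belong to the 2000s (the listed fields span 2002-2012).
    return 2000 + int(two_digit_year), ELECTION_TYPES[suffix]


presidential_primary = parse_vote_history_field("VH12PP")
earliest_general = parse_vote_history_field("VH02G")
```

Names that do not fit the pattern, such as “RNCCalcParty,” return None, so the same function can be used to pick the vote-history columns out of the full list of headers.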

What messages you’ll respond to

These fields are a bit ambiguous, but are clearly based on a micro-targeting campaign conducted in 2010, which appears to have examined voter sentiment on several factors.

MT10_Party: MT10 Party – 2010 Regional Microtargeting – Party Model.

MT10_GenericBallot: MT10 Generic Ballot – 2010 Regional Microtargeting – Generic Ballot Model

MT10_Turnout: MT10 Turnout – 2010 Regional Microtargeting – Turnout Model

MT10_ObamaDisapproval: MT10 Obama Disapproval – 2010 Regional Microtargeting – Obama Disapproval Model

MT10_Jobs: MT10 Jobs – 2010 Regional Microtargeting – Jobs Model

MT10_Healthcare: MT10 Healthcare – 2010 Regional Microtargeting – Healthcare Model

MT10_SoCo: MT10 SoCo – 2010 Regional Microtargeting – Social Conservative Model

What kind of voter you are, where you live, and how to contact you

Each state keeps track of its citizens’ voting records, party registrations, and contact details, and all of that data is generally considered public information. Some states sell the information to campaigns and other organizations; others give it away for free. The fields listed below include that kind of data, which every voter should assume their state keeps track of. One notable discovery here is that when voters move, there’s a field that describes whether it’s an “individual” or “family” move, presumably to account for cases where children move out of their parents’ home. Another is that telephone numbers appear to be obtained or otherwise verified with reverse-lookups using voters’ addresses.