Hugo Chávez dominated the Venezuelan electoral landscape from his first presidential victory in 1998 until his death in 2013. Nobody doubts that he always received considerable voter support in the numerous elections held during his mandate. However, the integrity of the electoral system has been in question since the 2004 Presidential Recall Referendum. From then on, different sectors of society have systematically alleged electoral irregularities or biases in favor of the incumbent party. We have carried out a thorough forensic analysis of the national-level Venezuelan electoral processes held during the 1998–2012 period to assess these complaints. The second-digit Benford's law and two statistical models of vote distributions, recently introduced in the literature, are reviewed and used in our case study. In addition, we discuss a new method to detect irregular variations in the electoral roll. The outputs obtained from these election forensic tools are examined taking into account the substantive context of the elections and referenda under study. We reach two main conclusions. Firstly, all the tools uncover anomalous statistical patterns, which are consistent with election fraud from 2004 onwards. Although our results are not conclusive proof of fraud, they signal the Recall Referendum as a turning point in the integrity of the Venezuelan elections. Secondly, our analysis calls into question the reliability of the electoral register since 2004. In particular, we found irregular variations in the electoral roll that were decisive in winning the 50% majority in the 2004 Referendum and in the 2012 Presidential Elections.

Funding: This work was supported by the Spanish Ministry of Economy and Competitiveness (Projects ECO2011-25706 and CSO2012-35852). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2014 Jiménez, Hidalgo. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

This paper proceeds as follows. In the next section we describe the election data under study. Then, we apply a battery of election fraud forensic tests, which provide consistent and complementary results. Thereafter, we turn to a discussion on the integrity of Venezuelan elections and present some final conclusions.

Some electoral irregularities may leave traces in the form of numerical anomalies. If this is the case, they can be detected by appropriate statistical methods. The main idea underlying these methods is the comparison between observed values of statistics based on the vote count and their expected values. By expected value, we usually mean the regular value in a free and fair election. Therefore, large discrepancies between observed values and expected ones (outliers) are usually interpreted as statistical evidence against the fairness of an election. Benford's test [4] and many other tools used in election forensics [5] are examples of these methods. The application of statistical mechanics concepts has helped notably in the understanding of statistical regularities in the vote count [6]–[8], providing new insights for the forensic analysis of elections [9]. But the mere presence of outliers is not proof of fraud, even less of an outcome-determinative fraud, “where the fraud affects the outcome of the election such that the winners and losers are different from what they would have been had the fraud not be committed” [3]. Elections are complex processes where errors and unforeseen events frequently occur. Some of them may even constitute serious irregularities and may generate outliers without affecting aggregate results. The presence of electoral irregularities that systematically favor one electoral option is a different matter. The political implications may be serious when the overall results are affected. For this reason, we are interested not only in detecting outliers that may be the trace of fraud, but also in evaluating whether they are correlated with a bias in the vote count and whether this could have been a determining factor in Chávez's electoral victories.
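As an illustration of comparing observed statistics with their expected values, the following minimal Python sketch computes the expected second-digit frequencies under Benford's law and a Pearson chi-square discrepancy for a list of vote counts. The function names and the choice of a plain chi-square statistic are assumptions made for this illustration; they are not taken from the analysis reported here.

```python
import math
from collections import Counter


def benford_second_digit_probs():
    """Expected frequency of each second digit 0-9 under Benford's law."""
    return [
        sum(math.log10(1 + 1 / (10 * first + d)) for first in range(1, 10))
        for d in range(10)
    ]


def second_digit_chi2(counts):
    """Pearson chi-square of observed second digits against Benford's law.

    `counts` is an iterable of non-negative integer vote counts. Values
    below 10 have no second digit and are skipped.
    """
    digits = [int(str(c)[1]) for c in counts if c >= 10]
    if not digits:
        raise ValueError("no vote counts with at least two digits")
    observed = Counter(digits)
    total = len(digits)
    expected = benford_second_digit_probs()
    return sum(
        (observed.get(d, 0) - total * expected[d]) ** 2 / (total * expected[d])
        for d in range(10)
    )
```

A large value of the statistic flags a discrepancy between observed and expected digit frequencies; as stressed above, such an outlier is not by itself proof of fraud.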

Despite the frequent use of the term, there is ambiguity regarding what is and what is not electoral fraud. What may constitute fraud in one country, or at a particular moment, may not be considered as such in another. Nonetheless, any irregular action that is performed with the intention of altering the development of an election or election-related materials, with the aim of affecting its results, may be considered fraud [3]. In Venezuela, allegations of fraud are not new, but they have become more frequent since 2004. Apart from allegations of manipulation of the vote count, the opposition has made other claims, including manipulation of the electoral register, coercion of public servants and the electorate, and misuse of public resources and funds for electioneering. There have also been some accusations of polling station violations and the destruction of electoral material. A summary of the alleged electoral irregularities under Chavismo can be found at http://www.americasquarterly.org/electoral-irregularities-under-chavismo-tally. Links to several dozen documents about them are available at http://esdata.info and http://www.sumate.org.

The electoral law, approved in Venezuela in 1997, established the automation of the vote count. In the period between 1998 and 2000, the vote count was carried out both manually and automatically. However, since 2004 the results have come exclusively from a computer center, where the data from the voting machines distributed throughout the country are centralized. Another important characteristic that differentiates the electoral processes before and after 2004 is the composition of the governing body of the elections, the National Electoral Council (CNE in Spanish). The National Assembly, which was controlled by the ruling coalition, appointed an openly pro-government management body. Four of the five current CNE rectors lean strongly towards the ruling party and only one towards the opposition forces. Although the CNE has improved the transparency and reliability of the electoral system, particularly since 2006, the fact is that the Venezuelan electoral authority has taken controversial decisions that have only ever favored the government and never the opposition [2].

Hugo Chávez was elected President of Venezuela in 1998 and ruled the country until his death in 2013. He won four consecutive presidential elections (1998, 2000, 2006 and 2012) and a recall referendum (2004), convened against him by opposition forces. He also proposed several major reforms that were approved in national referenda (two held in 1999, one in 2000 and another in 2009). In addition, his party won an overall majority in the National Assembly in three parliamentary elections that took place during his presidency (2000, 2005 and 2010), and in all regional and local elections. His sole election defeat came in the 2007 constitutional referendum, when he attempted a radical socio-political reform. This electoral record could be overshadowed, however, by the allegations of fraud made by opposition sectors since the 2004 Recall Referendum [1].

For each election, we consolidated these data in one set, labeled with the year of the election, except for the 1999 and 2007 referenda and the 2010 Parliamentary elections, for which there are two data sets. 1999a, 1999b, 2007a and 2007b are the abbreviations for the data associated with the two questions considered in the referenda of 1999 and 2007. The 2010 Parliamentary elections were preceded by an electoral reform. Under the approved system, 70% of the 165 deputies of the National Assembly were elected on a first-past-the-post system and 30% on a party list. The results are considered in two separate sets, labeled 2010a and 2010b, respectively. Each polling center is identified by a code. These codes were re-labeled. We used the old labels for elections and referenda before 2005 and the new ones for elections and referenda from 2005 onwards. The conversion table and the election data under consideration are available at http://elecionforensincs.com.es/ . Table 1 shows the percentages of votes for Chávez and the voter turnout of the elections under study.

Unlike in an earlier version of this paper [11], where we analyzed only some of the elections under consideration, we do not distinguish between data from automated and non-automated polling stations. However, we look at the same variables per electoral unit. Namely:

For our analysis, we have taken into account data at the lowest level of aggregation. The polling cluster that collects these data has been named differently across elections: voting table, electoral notebook, voting machine, etc. To avoid confusion, we will refer to it as the electoral unit [10]. For all the presidential elections and referenda, a small number of electoral units outside the country were excluded. We did this to standardize the data sets. On the one hand, these units were peculiar and negligible for the total results. On the other hand, there were no electoral units abroad in parliamentary elections. We also excluded a very small number of electoral units with missing data or without valid votes, which could arise from technical problems. As a result, the average number of registered voters per electoral unit is very similar across the data sets under study. This figure is roughly 500, except for the 2000 Presidential Elections, where it is 1126. However, the number of electoral units almost doubled between 1998 and 2012, from 20,026 units to 38,853, showing a strong growth in voter registration.

Therefore, we took into account every year of national-level elections from the time Chávez first won the presidency of Venezuela until his death. However, for the 2000 general elections, known as ‘Mega-elections’ because every elected office was contested, we only considered data from the presidential elections. In 1999 there were two referenda, one in April and one in December, and one election in July for the seats of the National Constituent Assembly (NCA). In the April referendum, two questions were posed: one about the convening of the NCA to draft a new constitution and one about the approval of the basis for this constituent process. In December, the new constitution was adopted by national referendum. We only considered the April referendum due to the lack of available data for the July elections and the December referendum at the level of breakdown we require for our analysis. The official data (available at http://www.cne.gov.ve/web/index.php ) have been downloaded and stored in spreadsheets at http://esdata.info/ , where the reader can also find additional information on each election.

Data Analysis

Statistical detection of irregular support

As we have already discussed [21], Venezuelan voters can choose the polling center where they vote. However, in polling centers with two or more electoral units, the voters are assigned to the units according to a pseudorandom criterion. Therefore, conditional on the results by polling center, the number of votes per electoral unit follows a Hypergeometric distribution. Specifically:

Denote by $V$ the number of votes favoring Chávez in a given electoral unit.

Let $p$ be the proportion of votes favoring Chávez over the number of registered voters at the center to which the unit belongs.

Let $n$ and $m$ be the number of voters registered in the electoral unit and in the polling center, respectively.

Then, given $p$, $n$, and $m$, $V$ follows a Hypergeometric distribution with expected value $pn$ and variance

$$\operatorname{Var}(V) = n\,p\,(1-p)\,\frac{m-n}{m-1}.$$

Thus, a standardized measure of regularity of the number of votes favoring Chávez in the electoral unit is the Z-score

$$Z = \frac{V - pn}{\sqrt{n\,p\,(1-p)\,\dfrac{m-n}{m-1}}}.$$

Z-scores far from zero imply irregular support in the electoral unit, no matter how “special” or “standard” the polling center to which the unit belongs is. When $n$ is large, and $m$ is much larger than $n$, the distribution of the Z-score should be approximately a standard normal N(0,1). However, some irregularities may generate large values of $Z$, outside any normal confidence interval. Examples of these irregularities are ad hoc decisions on the final allocation of voters, taken on election day to deal with failures of touch-screen machines. We will call a non-fraudulent irregularity any unforeseen action that affects the vote distribution of the electoral units in a polling center without significantly affecting the vote distribution at the center. Non-fraudulent irregularities may occur with high probability due to the complexity of the electoral processes. Therefore, the distribution of the Z-scores should have heavier tails than the normal distribution. In fact, the Z-scores of the elections collapse onto a Student's t distribution. With the possible exception of the 2000 elections, the goodness of fit is extremely good for a Student's t distribution with 3 degrees of freedom (Fig. 9), hereafter denoted by t(3). As we commented, 2000 was a mega-election, where every elected office in the country was contested. Thus, we expect more non-fraudulent irregularities in these elections than in any other and, consequently, a heavier-tailed distribution for their Z-scores. But, leaving aside some minor loss of accuracy for the 2000 case, we can assume that the Z-scores of any election are approximately distributed according to a t(3).

This fit will be used to simulate Z-scores for a bootstrap model, which is employed only to illustrate the asymptotic normality of the test statistics that we discuss below. These statistics, which we will call standardized differences, are based on the Z-scores, but their asymptotic distribution does not depend on the goodness of fit of the t(3) distribution.
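The Z-score computation can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard hypergeometric mean $pn$ and variance $np(1-p)(m-n)/(m-1)$; the function name and argument names are hypothetical and do not come from the authors' code.

```python
import math


def hypergeometric_z(unit_votes, unit_registered, center_votes, center_registered):
    """Z-score of the votes for one option in an electoral unit.

    unit_votes        V: votes for the option in the unit
    unit_registered   n: voters registered in the unit
    center_votes      votes for the option in the whole polling center
    center_registered m: voters registered in the polling center

    Returns None when the variance is zero, for example in a polling
    center with a single electoral unit (m == n).
    """
    p = center_votes / center_registered
    n, m = unit_registered, center_registered
    if m <= n:
        return None
    variance = n * p * (1 - p) * (m - n) / (m - 1)
    if variance <= 0:
        return None
    return (unit_votes - p * n) / math.sqrt(variance)
```

For instance, in a hypothetical center with two units of 500 registered voters each and 200 and 250 votes for the option, the two Z-scores are about -3.18 and +3.18: the units deviate symmetrically from the 225 votes expected in each.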

Figure 9. The distributions of Z-scores of different elections collapse onto a Student's t distribution with 3 degrees of freedom. Only the 2000 elections show slightly heavier tails. https://doi.org/10.1371/journal.pone.0100884.g009

If an election is fair, which includes election resources being distributed equitably among the polling centers, the Z-scores farthest from zero should be the product of chance. This covers extreme Z values generated by non-fraudulent irregularities on a random set of electoral units. Hence we consider:

The set of the $k$ electoral units with Z-scores farthest from zero, which we will denote by $M_k$.

The null hypothesis $H_1$: all the electoral units have the same probability of being in $M_k$.

We propose a test for $H_1$ based on one developed for the study of the 2004 Recall Referendum [21]. It relies on the classical confidence intervals for the ratio estimator [25]. Let $R_k$ be the proportion over valid votes of votes for Chávez on $M_k$. Denote by $R$ the same proportion but computed on all the electoral units under study. Let $T_i$ be the total valid votes at electoral unit $i$, $V_i$ the number of votes for Chávez, and

$$R_k = \frac{\sum_{i \in M_k} V_i}{\sum_{i \in M_k} T_i}.$$

Denote by $K$ the total number of electoral units, and by $\bar{T}$ the average of valid votes per electoral unit. Now consider the estimated variance of $R_k$ defined by

$$\widehat{\operatorname{Var}}(R_k) = \frac{1 - k/K}{k\,\bar{T}^{2}} \cdot \frac{1}{k-1} \sum_{i \in M_k} \left(V_i - R_k T_i\right)^{2}.$$

Then, if $k$ is large, $K-k$ is much larger than $k$, and $H_1$ is true, the standardized difference

$$\frac{R_k - R}{\sqrt{\widehat{\operatorname{Var}}(R_k)}}$$

is distributed approximately as a standard normal N(0,1) [25]. We test $H_1$ by computing the standardized difference for large values of $k$. Values far outside the normal confidence intervals, for a wide range of large values of $k$, are considered strong evidence against $H_1$. In estimating proportions, standard large sample sizes ($k$, in our case) are above 1000. We consider values of $k$ between 500 and 1500, thus covering standard large sample sizes from below and from above. In all cases, $K-k$ is large enough.
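A minimal Python sketch of the test statistic follows. It assumes the classical ratio-estimator variance with a finite population correction, in the textbook form associated with [25]; the exact expression used by the authors may differ in detail. The function and argument names are hypothetical.

```python
import math


def standardized_difference(z_scores, chavez_votes, valid_votes, k):
    """Standardized difference between the vote share on the k units with
    the most extreme Z-scores (the set M_k) and the overall vote share.

    All three lists are indexed by electoral unit and have length K.
    The variance is the classical ratio-estimator form (an assumption).
    """
    K = len(z_scores)
    if not 2 <= k < K:
        raise ValueError("k must satisfy 2 <= k < K")
    # M_k: indices of the k units whose Z-score is farthest from zero.
    extreme = sorted(range(K), key=lambda i: abs(z_scores[i]), reverse=True)[:k]
    overall_share = sum(chavez_votes) / sum(valid_votes)             # R
    extreme_share = (sum(chavez_votes[i] for i in extreme)
                     / sum(valid_votes[i] for i in extreme))         # R_k
    mean_valid = sum(valid_votes) / K                                # T-bar
    residual_ss = sum((chavez_votes[i] - extreme_share * valid_votes[i]) ** 2
                      for i in extreme)
    variance = (1 - k / K) * residual_ss / ((k - 1) * k * mean_valid ** 2)
    if variance <= 0:
        raise ValueError("estimated variance is zero")
    return (extreme_share - overall_share) / math.sqrt(variance)
```

Evaluating this function over a range of $k$ (for example 500 to 1500) gives a curve that can be compared with the normal confidence band; values persistently outside the band count against the null hypothesis.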
We also illustrate the asymptotic normality of the standardized difference under $H_1$ using a model of fair elections based on a hierarchical bootstrap. Specifically, we generate random samples of size $K$ of Z-scores from a t(3) distribution. Then we assign the $k$ Z-scores farthest from zero to a random sample of units. The standardized difference is then computed from the above equations, keeping the observed values of $p$, $m$, $n$ and $T_i$ per electoral unit and polling center in each election or referendum. Figures 10 and 11 display the standardized differences computed from the official results of all the elections and referenda. For each year, we also consider the standardized differences of 100 fair elections computed from the bootstrap model discussed above. Fig. 10 shows the 1999 referendum and the 1998, 2000 and 2005 elections. Fig. 11 shows the rest. We also plotted the 99% normal confidence interval (±2.58) in all the figures. The simulations show regular fluctuations, as we expect under $H_1$. Although some of them go outside the confidence interval, they are mainly embedded within it. The curves based on the official results in Fig. 10 show a similar behavior. Even the 2000 elections, which make an excursion above the 2.58 level at moderate values of $k$, are well embedded within the confidence interval at large sample sizes. The standardized difference series from the official results in Fig. 11 reach values higher than any simulation. They are well above the confidence interval, providing strong evidence against $H_1$ for the elections of this group. Except for 2005, we firmly reject $H_1$ from 2004 onwards.
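One step of such a simulation can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it draws an independent t(3) Z-score for every unit (which places the extreme scores on a random set of units) and converts it back to a vote count using the hypergeometric mean and variance. The clamping of simulated votes to the range from 0 to $n$ and all function names are assumptions introduced here.

```python
import math
import random


def sample_t3(rng):
    """One draw from Student's t with 3 degrees of freedom, built as a
    standard normal divided by sqrt(chi-square(3) / 3)."""
    normal = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return normal / math.sqrt(chi2 / 3)


def simulate_fair_votes(units, rng):
    """Simulate one fair election.

    `units` is a list of (p, n, m) tuples, one per electoral unit, with
    the observed center-level proportion p and the registered voters n
    (unit) and m (center). Returns (z_scores, simulated_votes).
    """
    z_scores, votes = [], []
    for p, n, m in units:
        z = sample_t3(rng)
        # Hypergeometric variance; zero for single-unit centers (m == n).
        variance = n * p * (1 - p) * (m - n) / (m - 1) if m > n else 0.0
        simulated = p * n + z * math.sqrt(variance)
        # Assumption: keep the simulated count inside the feasible range.
        votes.append(min(max(simulated, 0.0), float(n)))
        z_scores.append(z)
    return z_scores, votes
```

Repeating the simulation, for example 100 times per election with `random.Random(seed)` as the generator, and computing the standardized difference on each replicate gives a reference set of curves for a fair election.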

Figure 10. The standardized differences of the 1999 referenda and the 1998, 2000 and 2005 elections (wide black lines) are well embedded within the 99% normal confidence interval at large sample sizes. Standardized differences of fair elections computed from a hierarchical bootstrap model (thin blue lines) also show the expected behavior under $H_1$. https://doi.org/10.1371/journal.pone.0100884.g010

Figure 11. The standardized differences based on official results from 2004 onwards reach values higher than any simulation. They are well above the 99% normal confidence interval. These elections provide strong evidence against $H_1$. The most irregular distributions of votes occurred in electoral units where the vote count was significantly favorable to Chávez. https://doi.org/10.1371/journal.pone.0100884.g011

The alternative hypothesis to $H_1$ does not necessarily imply that there were fraudulent irregularities in the units with outlier values of $Z$; it implies only that the extreme results occurred on a non-random set of electoral units. On this set, the vote count has a significant bias in favor of Chávez. It is possible that there are non-fraudulent mechanisms that can explain this phenomenon. In fact, it is not unreasonable to think that some electoral districts have a greater chance of presenting non-fraudulent irregularities than others. But it is suspicious that this bias is only observed from 2004 onwards, with the sole exception of the 2005 parliamentary election. Inevitably, this points again to the 2004 Recall Referendum as a watershed regarding the integrity of the Venezuelan electoral processes.