Before delving into a detailed discussion, let us make the distinction between parameters, which are part of the model being evolved, and hyper-parameters (also called meta-parameters), which are not part of the model and need to be set by the user before running the evolutionary process, either manually or algorithmically. Examples of parameters are synaptic weights in deep neural networks and ephemeral random constants (ERCs) in genetic programming (GP), while examples of hyper-parameters include the number of hidden layers in deep neural networks and several standard parameters in GP: population size, generation count, crossover rate, mutation probability, and selection type.

Brest et al. [8] noted that there are two major forms of setting (hyper-)parameter values: parameter tuning and parameter control. Tuning refers to the common practice of seeking good values for the parameters before running the algorithm, then running the algorithm using these values, which remain fixed during the run. Parameter control means that values for the parameters are changed during the run.

An early work by [9] looked into the problem of VLSI layout, aiming to minimize the overall connection length between the components (cells) of a circuit. They devised a genetic algorithm that used three operators: crossover (order, cycle, or partially mapped—PMX), mutation (pairwise interchange of genes), and inversion (taking a random segment in a solution string and flipping it). They used a meta-GA to optimize crossover rate, inversion rate, and mutation rate. The individuals in the meta-GA population consisted of three integers in the range [0,20] (the overall search space was thus quite small, comprising 8000 combinations). An individual’s fitness was defined as the quality of the best layout found when a GA was run with the parameters it embodied. The meta-GA was run on four test circuits, had a population size of 20, ran for 100 generations, and used uniform crossover (select parameter from either parent at random) and mutation (add random noise; no inversion used). They noted that crossover rate converged to values in the range 20–40%, with little difference in best fitness as long as the rate was within this range. The mutation rate evolved by the meta-GA was 0.5–1.5%, and the inversion rate was 0–30%. The authors then adopted a crossover rate of 33%, an inversion rate of 15%, and a mutation rate of 0.5%. These were used to run the optimizing GA on circuits of interest (different from those used in the meta-GA phase) and compare it with other techniques.
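The meta-GA scheme just described can be sketched in a few lines. The sketch below is a minimal illustration, not a reconstruction of [9]: the inner layout GA is replaced by a hypothetical smooth surrogate (we have no VLSI benchmark here), the integer-to-rate mapping and the truncation selection are assumptions, and only the reported population size (20), generation count (100), genome of three integers in [0,20], uniform crossover, and noise mutation follow the paper.

```python
import random

# Hypothetical surrogate standing in for the inner layout GA of [9]:
# returns "best layout quality" for a given parameter triple; its optimum
# is placed near the rates reported in the paper, purely for illustration.
def run_inner_ga(crossover_rate, inversion_rate, mutation_rate):
    return -((crossover_rate - 0.33) ** 2
             + (inversion_rate - 0.15) ** 2
             + (mutation_rate - 0.005) ** 2)

def meta_fitness(ind):
    c, i, m = ind  # three integers in [0, 20], as in the meta-GA of [9]
    # the integer-to-rate scaling below is an assumption for illustration
    return run_inner_ga(c * 0.02, i * 0.02, m * 0.001)

def uniform_crossover(p1, p2):
    # select each parameter from either parent at random
    return [random.choice(pair) for pair in zip(p1, p2)]

def mutate(ind):
    # add random noise to one parameter, clamped to the legal range
    j = random.randrange(len(ind))
    child = list(ind)
    child[j] = min(20, max(0, child[j] + random.choice([-2, -1, 1, 2])))
    return child

random.seed(0)
pop = [[random.randint(0, 20) for _ in range(3)] for _ in range(20)]
for _ in range(100):  # population size 20, 100 generations, as in [9]
    pop.sort(key=meta_fitness, reverse=True)
    parents = pop[:10]  # simple truncation selection (a stand-in; the
                        # selection scheme of [9] is not detailed here)
    pop = parents + [mutate(uniform_crossover(*random.sample(parents, 2)))
                     for _ in range(10)]
best = max(pop, key=meta_fitness)
```

The expensive step is `meta_fitness`: in the real setting each evaluation is a full run of the inner GA, which is what makes meta-evolution costly.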

Another early work by [10] described meta-evolutionary programming. Their flavor of evolutionary programming used mutation only to evolve solutions to two functions in \(\mathbb {R}^{2}\). Mutation took a single parent and generated an offspring by adding Gaussian random noise with zero mean and variance equal to F(x,y)—the function under investigation; this was applied to all members of the population. The meta-algorithm attached a perturbation term to each individual in the population, which was used as variance during mutation. This term was then evolved along with the solution. They compared both these algorithms (meta- and non-meta) with each other, and with a standard, Holland-style genetic algorithm, concluding that both versions of evolutionary programming outperformed the GA. They also concluded that the standard evolutionary-programming method attained better results over the two functions studied, but that the meta-version was more robust to changes in the function.
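The meta-version can be sketched as follows. This is a generic self-adaptive EP sketch under stated assumptions, not the exact algorithm of [10]: the test function (a sphere in \(\mathbb{R}^2\)), the perturbation-update rule, and the (μ+μ) truncation selection are simplifications; the key idea retained is that each individual carries its own perturbation term, which mutates along with the solution.

```python
import math
import random

def sphere(x, y):
    # stand-in for the R^2 test functions of [10] (minimization)
    return x * x + y * y

def mutate(ind):
    """Self-adaptive mutation: the perturbation term sigma is part of the
    genome and is itself perturbed (the exact update rule in [10] differs
    in detail; this is a common self-adaptation scheme)."""
    x, y, sigma = ind
    child_sigma = max(1e-6, sigma + random.gauss(0, math.sqrt(sigma)))
    return (x + random.gauss(0, math.sqrt(child_sigma)),
            y + random.gauss(0, math.sqrt(child_sigma)),
            child_sigma)

random.seed(1)
pop = [(random.uniform(-5, 5), random.uniform(-5, 5), 1.0)
       for _ in range(50)]
for _ in range(200):
    offspring = [mutate(ind) for ind in pop]   # one child per parent
    union = pop + offspring
    # truncation over parents + offspring (EP proper uses stochastic
    # tournament selection here; truncation keeps the sketch short)
    union.sort(key=lambda ind: sphere(ind[0], ind[1]))
    pop = union[:50]
best = pop[0]
```

Because selection acts on fitness alone, useful perturbation sizes survive only indirectly, by producing fit offspring; this is the mechanism that makes the meta-version more robust to changes in the function.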

Wu and Chow [11] applied a genetic algorithm to nonlinear constrained mixed discrete-integer optimization problems, using a meta-GA to optimize population size, crossover probability, mutation probability, and crossover operator. The total number of parameter combinations was 19,200. The fitness of an individual in the meta-GA population was taken as the optimum found by a GA run with the parameters defined by the individual. Their findings showed insensitivity to crossover rate but high sensitivity to mutation rate. Four-point crossover outperformed one-, two-, and three-point crossover.

Hinterding et al. [12] (see also [13]) noted that, “it is natural to expect adaptation to be used not only for finding solutions to a problem, but also for tuning the algorithm to the particular problem.” Their paper provided a short survey of adaptation techniques in evolutionary computation. They defined four categories of adaptation: static—constant throughout run and tuned externally; and dynamic, refined further into deterministic—parameter altered by a deterministic rule, adaptive—some feedback from the evolutionary algorithm determines the change in the parameter, and self-adaptive—the parameters to be adapted are encoded into the chromosomes and undergo crossover and mutation. They also defined four levels of adaptation: environment—such as changes to the fitness function (e.g., weights), population—parameters that apply to the entire population are adapted, individual—parameters held within an individual affecting only that individual, and component—parameters specific to a component or gene within an individual (such as self-adaptation of component-level mutation step sizes and rotation angles in evolution strategies).

Ong and Keane [14] presented meta-Lamarckian learning in the context of memetic algorithms (MA), which incorporate local improvement procedures within traditional GAs. Their paper investigated the adaptive choice of local search (LS) methods to ensure robustness in MA search. In addition to Darwinian evolution they also studied Lamarckian learning, in which the genotype reflects the result of local improvement: the locally improved individual is placed back into the population to compete for reproductive opportunities. They studied two adaptive meta-Lamarckian learning strategies, a heuristic approach based on subproblem decomposition, and a stochastic approach based on a biased roulette wheel. They tested their system on continuous parametric benchmark test problems and on a wing-design problem. They concluded that, “the strategies presented are effective in producing search performances that are close to the best traditional MA with a LS chosen to suit the problem in hand. Given that such knowledge is often not available a priori, this ability to tackle new problems in a robust way is of significant value.”

Ramos et al. [15] proposed using logistic regression to tune the parameters of a “transgenetic” algorithm—an evolutionary algorithm that maintains both a population of chromosomes and a population of transgenetic vectors. They cited as their inspiration symbiogenesis, a theory of evolution according to which new cell organelles, new bodies, new organs, and new species arise from symbiosis, wherein independent organisms merge to form composites. The chromosomes of a transgenetic algorithm do not share genetic material directly: there are no crossover and mutation operations, but rather transgenetic vectors that obtain and insert information into the chromosomes. They used logistic regression to set two main parameters of their algorithm (population size and maximum length of transgenetic vector) that met certain constraints, their problem of interest being the Traveling Salesman Problem. They showed that their algorithm outperformed a standard memetic algorithm.

Brest et al. [8], mentioned above, described an efficient technique for adapting control parameter settings associated with differential evolution (DE). DE uses a floating-point encoding for global optimization over continuous spaces, creating new candidate solutions by combining the parent individual and several other individuals of the same population. A candidate replaces the parent only if it has better fitness. DE has three parameters: amplification factor of the difference vector, crossover control parameter, and population size. In [8], the parameter control technique was based on the self-adaptation of the first two parameters, which were encoded within an individual’s genome. Their testbed consisted of twenty-one benchmark functions from [16]. They concluded that self-adaptive DE is better or comparable to the original DE and some other evolutionary algorithms they examined.
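The self-adaptation mechanism of [8] can be sketched as follows. This is a simplified sketch in the spirit of their scheme (often called jDE), under stated assumptions: each individual carries its own amplification factor F and crossover parameter CR, occasionally resampled with small probabilities (here `tau1`, `tau2`); the objective (a sphere function), bounds, and all numeric settings are illustrative choices, not values from the paper.

```python
import random

def sphere(v):
    # illustrative objective; [8] used benchmark functions from [16]
    return sum(x * x for x in v)

def jde_step(pop, fit, F, CR, bounds=(-5.0, 5.0), tau1=0.1, tau2=0.1):
    """One generation of self-adaptive DE (DE/rand/1/bin variation):
    control parameters are encoded per individual and survive only when
    the trial vector they produce replaces its parent."""
    n, dim = len(pop), len(pop[0])
    for i in range(n):
        # self-adapt F and CR before producing the trial vector
        Fi = random.uniform(0.1, 1.0) if random.random() < tau1 else F[i]
        CRi = random.random() if random.random() < tau2 else CR[i]
        a, b, c = random.sample([j for j in range(n) if j != i], 3)
        jrand = random.randrange(dim)  # at least one component mutates
        trial = [pop[a][d] + Fi * (pop[b][d] - pop[c][d])
                 if (random.random() < CRi or d == jrand) else pop[i][d]
                 for d in range(dim)]
        trial = [min(bounds[1], max(bounds[0], x)) for x in trial]
        f = sphere(trial)
        if f <= fit[i]:  # greedy replacement; Fi, CRi survive with child
            pop[i], fit[i], F[i], CR[i] = trial, f, Fi, CRi

random.seed(2)
dim, n = 5, 30
pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
fit = [sphere(v) for v in pop]
F, CR = [0.5] * n, [0.9] * n  # common DE defaults as starting values
for _ in range(300):
    jde_step(pop, fit, F, CR)
```

The coupling between parameter survival and candidate survival is the whole trick: good F and CR values propagate because they produce winning trial vectors, with no external feedback rule.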

De Jong [17]—in his chapter in the book Parameter Setting in Evolutionary Algorithms ([18])—provided a thirty-year perspective of parameter setting in evolutionary computation. He wrote that, “It is not surprising, then, that from the beginning EA practitioners have wanted to know the answers to questions like:

Are there optimal settings for the parameters of an EA in general?

Are there optimal settings for the parameters of an EA for a particular class of fitness landscapes?

Are there robust settings for the parameters of an EA that produce good performance over a broad range of fitness landscapes?

Is it desirable to dynamically change parameter values during an EA run?

How do changes in a parameter affect the performance of an EA?

How do landscape properties affect parameter value choices?”

He went on to review static parameter-setting strategies, where he mentioned a two-level EA, the top level of which evolved the parameters of the lower-level EA. [17] stated that the, “key insight from such studies is the robustness of EAs with respect to their parameter settings. Getting ‘in the ball park’ is generally sufficient for good EA performance.” Our study herein not only confirms this observation through numerous experiments, but also presents the novel finding that the ballpark can be quite large. Of dynamic parameter-setting strategies he opined that, “it is difficult to say anything definitive and general about the performance improvements obtained through dynamic parameter setting strategies.” He added, interestingly, “My own view is that there is not much to be gained in dynamically adapting EA parameter settings when solving static optimization problems. The real payoff for dynamic parameter setting strategies is when the fitness landscapes are themselves dynamic...” [17] also discussed the different aspects of setting various standard parameters: parent population size, offspring population size, selection, reproductive operators; adapting the representation; and parameterless EAs.

Kramer [19] provided a survey of self-adaptive parameter control in evolutionary computation, where—as noted above—control parameters are added to the (evolving) genome. He complemented the taxonomy offered by [13], dividing parameter setting into two main categories: tuning and control. Tuning was further divided into tuning by hand, tuning by design of experiments, and tuning by meta-evolution; while control was divided as previously into deterministic, adaptive, and self-adaptive. This paper mainly focused on function optimization techniques, such as the covariance matrix self-adaptation evolution strategy (CMSA-ES). A mention of meta-evolution noted that they, “have to be chosen problem-dependent, which is the obvious drawback of the approach.” He concluded with the observation that most theoretical work on self-adaptation concentrated on mutation, stating that, “A necessary condition for the success of self-adaptation is a tight link between strategy parameters and fitness.”

Eiben and Smit [6] (see also [20, 21]) presented a conceptual framework for parameter tuning based on a three-tier hierarchy of: a problem, an evolutionary algorithm (EA), and a tuner. They argued that parameter tuning could be considered from two different perspectives, that of configuring an evolutionary algorithm by choosing parameter values that optimize its performance, and that of analyzing an evolutionary algorithm by studying how its performance depends on its parameter values. Furthermore, they distinguished between analyzing an evolutionary algorithm by studying how its performance depends on the problems it is solving, and analyzing an evolutionary algorithm by studying how its performance varies when executing independent repetitions of its run. They noted the existence of two types of parameters, qualitative (e.g., crossover type) and quantitative (e.g., crossover rate). They opined that, “using tuning algorithms is highly rewarding. The efforts are moderate and the gains in performance can be very significant. Second, by using tuning algorithms one does not only obtain superior parameter values, but also much information about parameter values and algorithm performance. This information can be used to obtain a deeper understanding of the algorithm in question.” The paper discussed a wide range of tuning algorithms, which they classified as sampling methods, model-based methods, screening methods, and meta-evolutionary algorithms. Of interest in their discussion of meta-evolutionary GAs was [22], an early (possibly first) though limited work; and the description of multi-objective meta-GAs, which tuned for more than a single objective, e.g., speed and accuracy. They opined that, “parameter tuning in EC has been a largely ignored issue for a long time... 
In the current EC practice parameter values are mostly selected by conventions, ad hoc choices, and very limited experimental comparisons.” [italics added] This latter observation—with which we wholly concur—forms part of our motivation for the current study.

Arcuri and Fraser [23] carried out “the largest empirical analysis so far on parameter tuning in search-based software engineering.” They performed experiments in the domain of test generation for object-oriented software using genetic algorithms. The objective was to derive sets of test cases (suites) for a given class, such that the test suite maximized a chosen coverage criterion while minimizing the number of tests and their length. A test case in this domain was a sequence of method calls that constructed objects and called methods on them. Because their goal was to study the effects of tuning, they analyzed all the possible combinations of the selected parameter values. They concluded that, “tuning can improve performance, but default values coming from the literature can be already sufficient.”

Veček et al. [24] introduced a new tuning method—CRS-Tuning—that is based on meta-evolution and their novel method for comparing and ranking evolutionary algorithms, Chess Rating System for Evolutionary Algorithms (CRS4EAs). They discussed the approach’s advantages over other tuning methods.

Bergstra and Bengio [25] studied neural networks, showing that random experiments were more efficient than grid experiments for hyper-parameter optimization in the case of several learning algorithms on several datasets. They wrote that, “random experiments are more efficient because not all hyperparameters are equally important to tune... Random experiments are also easier to carry out than grid experiments for practical reasons related to the statistical independence of every trial.” This paper partly motivated our choice of random search in the “Searching for parameters using random search” section.
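The low-effective-dimensionality argument of [25] is easy to illustrate. Below is a toy sketch, not an experiment from the paper: the hypothetical validation score depends strongly on one hyper-parameter (`lr`) and barely on the other (`reg`). With the same budget of 16 trials, a 4×4 grid probes only four distinct values along the important axis, while random search probes sixteen.

```python
import random

def score(lr, reg):
    # hypothetical validation score: strongly sensitive to lr, barely
    # to reg, mimicking the low effective dimensionality argued in [25]
    return -((lr - 0.37) ** 2) - 0.001 * ((reg - 0.5) ** 2)

# Grid search: 4 x 4 = 16 trials, but only 4 distinct lr values tested
grid = [(lr, reg) for lr in (0.0, 0.33, 0.66, 1.0)
                  for reg in (0.0, 0.33, 0.66, 1.0)]
best_grid = max(score(lr, reg) for lr, reg in grid)

# Random search: 16 trials, 16 distinct values along the important axis
random.seed(3)
trials = [(random.random(), random.random()) for _ in range(16)]
best_rand = max(score(lr, reg) for lr, reg in trials)
```

When the unimportant axis is collapsed, the grid wastes most of its budget on duplicate projections; random search does not, which is why it tends to win once some hyper-parameters matter far more than others.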

Smit and Eiben [26] is perhaps the paper most relevant to our current research, presenting a meta-EA called REVAC (Relevance Estimation and Value Calibration), which they used on a suite of 25 real-valued benchmark functions (real-parameter optimization functions defined for the CEC 2005 Special Session on Real-Parameter Optimization, including five unimodal functions and twenty multimodal functions [27]). They chose to improve G-CMA-ES, which they considered a hard-to-improve evolutionary algorithm, cycling through parent selection, recombination, mutation, survivor selection, and evaluation over a population of G-CMA-ES parameter vectors. They were indeed successful in improving the algorithm’s performance.

Our aim is to go further, casting our net much wider in terms of problem domains, seeking to better understand parameter space.