Simulations

A simulation run comprised of a learning period with a million trials (training phase) where clusters updated their positions in relation to the agent’s position as it explored the environment. After learning, we quantitatively assessed the regularity of the cluster position arrangements (test phase). We ran 1000 simulation runs for each condition (number of clusters).

Simulation procedure and model specifications

At the beginning of the learning phase of each simulation run, we set the number of clusters, number of learning trials, the environment (square, circle), the learning rate, and the learning update batch size. The number of clusters were set (ranging from 10 to 30) and were initiated at random locations in the environment. The shape of the environment was defined by a set of points that could be visited by the agent. The square environment was 50 by 50, where each point was a location specified by a value on the x- and y-axis. The circular environment was defined by drawing a circle in Matlab with a radius of 50, and selecting the points within the bounds of the circle. The starting position and movement trajectory of the agent was then determined as a random walk over one million trials. The agent started at a random position and steps in the horizontal and vertical axes were computed separately. On each trial, the agent could go up, down, or stay on the vertical axis, and left, right, or stay on the horizontal axis. The step was sampled from [−4, −2, −1, −1, 0, 1, 1, 2, 4], where negative values are steps to the left, positive steps are steps to the right, and zero means stay. Movement on the vertical dimension was determined in the same way, but negative values were upward steps and positive values were downwards steps. If the generated step brought the agent out of the environment, the step was cancelled and a new step was generated as above.

We considered a simple winner-take-all network in which only the cluster at position pos i closest to stimulus x (agent’s location) had a non-zero activation. Bold type is reserved for vectors. The distance between pos i and x is defined as:

$${\mathbf{dist}}_{\mathbf{i}} = \left\| {{\mathbf{pos}}_{\mathbf{i}} - {\mathbf{x}}} \right\|$$ (1)

In the Kohonen learning rule, cluster i updates its position pos i to move toward stimulus x according to:

$$\Delta {\mathbf{pos}}_{\mathbf{i}} = {\eta }_{t} \cdot ({\mathbf{x}} - {\mathbf{pos}}_{\mathbf{i}}),$$ (2)

where η t is the learning rate at time t. In the present simulations, we used batch updating to increase numerical stability in which 200 updates were performed simultaneously. The learning rate η for batch time t followed an annealing schedule:

$${\eta }_{t} = \frac{{{\eta }}_0}{{1 + {\rho } \cdot {t}}},$$ (3)

where η 0 is the initial learning rate set to 0.25 and ρ is the annealing rate set to 0.02.

Assessing regularity of cluster positions

To assess whether cluster positions formed a regular hexagonal structure with learning in a comparable manner way to grid cells found in the medial entorhinal cortex (mEC), we followed the method of Hafting et al.5 and Perez-Escobar et al.35.

In Hafting et al.5, rodents traversed circular and square environments whilst they recorded electrophysiological signals from mEC neurons. They found cells that displayed multiple firing fields and resembled a grid of regularly tessellating triangles spanning the recorded environment. To assess this regularity quantitatively, they computed the spatial autocorrelogram of the firing rate map. If the fields were arranged in a regular grid, the center peak of the autocorrelogram should be surrounded by six equidistance peaks, forming a regular hexagon. The spatial autocorrelogram was computed as follows. With \(\lambda _1\left( {x,y} \right)\) denoting the cluster activation at location \(\left( {x,y} \right)\), the autocorrelation with spatial lags of τ x and τ y was estimated as:

$$r( {\tau _x,\tau _y}) = \frac{{\begin{array}{*{20}{c}} {n {\sum } \lambda _{1}( {x,y} )\lambda _{2} ( {x - \tau _x,y - \tau _y})} \\ { - {\sum } \lambda _1( {x,y} ) {\sum } \lambda _2( {x - \tau _x,y - \tau _y} )} \end{array}}}{{\begin{array}{*{20}{c}} {\sqrt {n {\sum } \lambda _{1} ( {x,y})^{2} - ({\sum }\lambda_{1}( {x,y}))^{2}} } \\ { \times \sqrt {n {\sum } \lambda _{2} ( {x - \tau _x,y - \tau _y} )^{2} - ( { {\sum } \lambda _{2}( {x - \tau _x,y - \tau _y})} )^{2}} } \end{array}}},$$ (4)

where \(r( {\tau _x,\;\tau _y} )\) is the autocorrelation between bins offset of τ x and τ y , \({\uplambda}_1\left( {x,y} \right)\) and \({\uplambda}_2\left( {x,y} \right)\) are equivalent for an autocorrelation indicates the average firing rate of the cell in each location \(\left( {x,y} \right)\), and n is the number of spatial bins over which the estimation was made.

To quantify the degree of this regularity, a ‘grid score’ is commonly used35 by computing the correlation between the center region of the spatial autocorrelogram (a masked region including the six surrounding peaks but excluding the centre peak) and a 60° and 120° rotated version (to assess the six-fold hexagonal symmetry) minus the correlation between the spatial autocorrelograms and a 30°, 90°, and 150° rotated version (where there should be a low correlation):

$$\frac{{({r}_{60^\circ } + {r}_{120^\circ })}}{2} - \frac{{({r}_{30^\circ } + {r}_{90^\circ } + {r}_{150^\circ })}}{3}$$ (5)

To assess the regularity of the cluster positions in a given environment in the current study and compare our results with empirical findings, we followed the method described above. We first computed activation maps to emulate firing rate maps in empirical neuronal recordings, and computed the spatial autocorrelogram to obtain the grid score.

Assessing change in gridness during and after learning

To characterize how cluster positions changed over time in the learning phase, activation maps were computed over trials during learning in a set of 200 simulation runs. Trials were binned into 20 equally spaced time bins with 50,000 trials in each time bin. We assumed that the activation strength of the winning cluster was a Gaussian function of distance from the agent:

$${act}_{i} = \frac{1}{{\sqrt {{2{\pi }}^2}} }{e}^{ - \frac{1}{2}{dist}_{i}^2},$$ (6)

where act i is cluster i’s activation strength. To compute activation maps for each time bin, activations were computed at each location and normalized by the number of visits by the agent (as done in empirical studies) to create a normalized activation map. The maps were smoothed (Gaussian kernel, SD = 1), spatial autocorrelograms were computed, and grid scores were computed for each time bin. As the clusters moved continuously over time (not defined by the time bins), activation maps changed over each time bin.

To test whether gridness increased over time, we used a linear model to estimate the slope (beta value) of the grid score of activation maps over each time bin (20 bins) for each simulation run during the learning phase. For each condition (number of clusters), we estimated the slope for 200 simulation runs, giving 200 beta values. We computed the mean and bootstrapped 95% confidence intervals (CIs) over all conditions and simulation runs to test if the grid score increased over time. We also computed the mean and bootstrapped 95% CIs over the 200 beta values for each condition.

To assess gridness at the end of learning, a new movement trajectory was generated with 100,000 trials and cluster positions were fixed. Grid score after learning was assessed for all 1000 simulation runs. The activations and normalized activation map were computed over all test trials, the activation map was smoothed (Gaussian kernel, SD = 1) and the spatial autocorrelogram of the activation map was computed following Hafting et al.5, except firing rates were replaced with normalized cluster activation values at each location. Grid scores were then computed based on the spatial autocorrelograms using Eq. (5). We computed the mean grid scores and bootstrap 95% CIs over all conditions and simulation runs. We also computed the mean and bootstrap 95% CIs over each condition.

Classification and percentage of grid cell-like maps

To assess whether activation maps showed a regular hexagonal pattern that would be classified as a ‘grid cell’ according to criteria set in empirical studies, and to compare the percentage of grid-like activation maps from our clustering model to the percentage of grid cells found in the mEC, we used a shuffling procedure to find the statistical threshold of the grid score that passes the criterion for a ‘grid cell’ described in Wills et al.52.

The procedure was performed on spatial autocorrelograms of the activation maps produced on the test phase, where cluster positions were fixed. Since cluster activations were generated in relation to the agent’s location during movement, they were temporally correlated. Therefore, to break the location-activation relationship, the vector of activations were randomly shuffled in time, and we ensured that each location was at least 20 trials from its original position. The activation map was smoothed (Gaussian kernel, SD = 1) then the grid score was computed. For each condition, this shuffling procedure was performed 500 times on each simulation run (on a subset of 200 simulations). The threshold was defined as the 95th percentile of the 500 shuffled grid scores, giving 200 threshold values (from each simulation run) per condition (number of clusters). The highest threshold value (most conservative) was used as the threshold for each condition. In the figure in the main text (Fig. 3d, e), the thresholds plotted are the highest (most conservative) thresholds across all conditions in that particular environment.

For each condition, we computed the percentage of activation maps that exceeded the shuffled grid score threshold. We computed the percentage of ‘grid cells’ for each condition (number of clusters) separately and then computed the mean percentage across conditions.

Gridness in trapezoid environments

To simulate the effect of asymmetric boundaries in a trapezoid enclosure on gridness36, we took cluster positions from simulations after learning in square environments, and ran an additional learning phase for 250,000 trials. In this new learning phase, the shape of the environment was now a trapezoid (the agent could only move to those locations), and the annealed learning rate schedule continued (starting at 0.0025, reducing to 0.002 at the end). The trapezoid dimensions were 5 × 24 × 50 pixels, closely matching the proportions in36 (0.2 × 0.9 × 1.9 meters; multiplied by (50/1.9) equals to 5.26, 23.7, and 50).

In order to test whether the asymmetric boundaries of the trapezoid affected gridness, the trapezoid was split into two halves and we computed the grid score for the spatial autocorrelogram on the left (wide) and right (narrow) side of the shape. Due to discretization, we split it as close to equal as possible. The wide half extended from the leftmost pixels to the 17th pixel (338 pixels), and the narrow side extended from the 18th pixel to the 50th pixel (339 pixels).

Due to the asymmetrical shape of the trapezoid environment, the procedure for generating a movement trajectory above leads to a slightly biased sampling of the wide part of the trapezoid, and less exploration of the middle and top parts of the shape. To deal with this, we made a slight change to the possible steps after generating a step that brings the agent out of the environment, described below. For each trial, the step was generated as before. If the generated step was out of the environment, the step was cancelled, and the next step was determined as follows. If the step generated would have brought the agent out of the bottom of the trapezoid, the next step was sampled from [0, 0, 1, 1] (stay or up). If the step brings the agent out to the top, the next step was sampled from [−1, −1, 0, 0] (down or stay). When the step takes the agent out of the left of the trapezoid, then the next step to be sampled on the horizontal axis were [0, 1, 1, 2, 4], towards the inner portion of the environment. If the step took the agent out of the right side of the trapezoid, the next step was generated as before, from [−4, −2, −1, −1, 0, 1, 1, 2, 4]. This is because when the agent is out of the trapezoid on the horizontal (left-right) axis, the agent could still be in the middle of the shape on the vertical axis, since the shape becomes more narrow as it reaches the right. Finally, when it lands exactly in the middle of the horizontal axis, but is out of the shape (on the horizontal axis), the next step to be sampled from on the vertical axis is [−1, 0, 1].

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.