Correlation maps obtained from the last two transfers in the MNIST experiment illustrate a typical transfer operation. The target weight W_transfer that we attempt to write into the PCM devices is not exactly the overall weight W, but instead W_transfer = W − offset − [g(V = 0.5 V) − g_shared(V = 0.5 V)]. The final two terms (in brackets) are the residual difference between the conductances of the g and g_shared devices even when both are initialized to the same voltage; including them allows the PCM devices to compensate partially for CMOS variability during transfer. The offset, equal to 2 μS, is added because g devices are not equally good at compensating positive and negative conductance errors. At the initialization voltage of 0.5 V, device conductance is relatively small (see Extended Data Fig. 10a), which leaves less dynamic range for moving to smaller conductances and hence for correcting PCM devices programmed to weights that are too positive. The initial 0.5 V was chosen carefully to accommodate substantial ‘decay’ towards 0.8 V, providing much more dynamic range for increasing 3T1C conductance. A positive offset value strongly favours negative errors, allowing us to exploit the capability for g values to increase. When W_transfer is positive but smaller than the offset, we reset both PCM devices and use g to correct the residual error. a, Correlation between the weight portion encoded in PCMs before transfer, that is, F(G+ − G−), and W_transfer. Here we expect a difference because the neural-network training has changed the weights: we now need to checkpoint these weight changes from volatile storage on the 3T1C devices into non-volatile storage on the PCM devices. b, Correlation between the desired W_transfer conductance differences and the actual F(G+ − G−) values obtained after the PCM programming operation. With perfect devices and no offset, this should be a diagonal line along y = x. 
The variability we see is caused partly by PCM programming error (unintended), partly by the intentional offset and partly by CMOS initialization mismatch (where we are intentionally aiming for a ‘wrong’ PCM conductance difference to help compensate for our flawed CMOS devices). c, Correlation between the weights before (W_pre) and after (W_post) transfer, after post-transfer tuning of g to compensate for the programming errors in b. The goal of the transfer operation is to obtain W_post = W_pre, which would correspond to all points falling on the diagonal y = x. The effect of post-transfer tuning is clear from comparing the variability in b with the near-ideal behaviour in c. d–f, As in a–c, but for negative-polarity transfer. Because the polarity of g is inverted, the offset is negative, and so the large dynamic range can be used to increase g to compensate for positive errors in PCM weight.
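
As an illustration only, the transfer-target rule stated in this caption for the positive-polarity case can be sketched as follows. The function names, the example conductance values and the assumption that all quantities are expressed in microsiemens are hypothetical; only the 2 μS offset, the expression for the transfer target and the 'reset both PCM devices' condition come from the caption. The sketch does not model the negative-polarity case or the PCM programming itself.

```python
# Minimal sketch of the transfer-target rule described in the caption.
# Assumption: every quantity is a conductance in microsiemens (uS).

OFFSET_US = 2.0  # offset quoted in the caption (positive-polarity transfer)


def transfer_target(w, g_init, g_shared_init, offset=OFFSET_US):
    """Return W_transfer = W - offset - [g(0.5 V) - g_shared(0.5 V)].

    g_init and g_shared_init are the conductances of the g and g_shared
    devices measured after both are initialized to 0.5 V.
    """
    return w - offset - (g_init - g_shared_init)


def needs_double_reset(w_transfer, offset=OFFSET_US):
    """True when W_transfer is positive but smaller than the offset.

    In that case the caption states that both PCM devices are reset and
    g alone corrects the residual error.
    """
    return 0.0 < w_transfer < offset


# Hypothetical example values: (W, g at 0.5 V, g_shared at 0.5 V), in uS.
examples = [(10.0, 1.2, 1.0), (3.5, 1.0, 1.0), (-4.0, 0.9, 1.0)]

targets = [transfer_target(w, g, gs) for w, g, gs in examples]
reset_flags = [needs_double_reset(t) for t in targets]

for (w, g, gs), t, flag in zip(examples, targets, reset_flags):
    print(f"W = {w:6.2f} uS -> W_transfer = {t:6.2f} uS, reset both PCMs: {flag}")
```

In the second example the target falls between zero and the offset, so the sketch flags the pair for a double reset; the other two targets would be written into the PCM pair and then fine-tuned with g.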