Let $c,d\in [a,b]$, with $c<d$. It suffices to prove that $$ f(d)-f(c)>-(4+d-c)\varepsilon, $$ for every $\varepsilon>0$, since letting $\varepsilon\to 0$ then gives $f(d)\ge f(c)$. Fix such an $\varepsilon$.

We enumerate $A$ as $A=\{\alpha_n\}_{n\in\mathbb N}$ and choose $\delta_n>0$ such that $$ x\in(\alpha_n-\delta_n,\alpha_n+\delta_n)\quad\Longrightarrow\quad|f(x)-f(\alpha_n)| <\frac{\varepsilon}{2^n} $$ for all $n\in\mathbb N$. Such $\delta_n$'s exist because $f$ is continuous at each $\alpha_n$. Set $I_n=(\alpha_n-\delta_n,\alpha_n+\delta_n)$. In particular, $$ y_1,\,y_2\in I_n\,\,\,\Longrightarrow\,\,\, f(y_2)>f(y_1)-\frac{\varepsilon}{2^{n-1}}. \tag{1} $$
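
The step from the choice of $\delta_n$ to $(1)$ is the triangle inequality through $f(\alpha_n)$. Spelled out, for $y_1,y_2\in I_n$:

$$ f(y_2)-f(y_1)=\big(f(y_2)-f(\alpha_n)\big)-\big(f(y_1)-f(\alpha_n)\big) >-\frac{\varepsilon}{2^n}-\frac{\varepsilon}{2^n}=-\frac{\varepsilon}{2^{n-1}}. $$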

Let $x\in [a,b]\setminus A$. Since $f$ is differentiable at $x$, there exists an $\eta_x>0$ such that $$ 0<|y-x|<\eta_x\quad\Longrightarrow\quad -\varepsilon |y-x|<f(y)-f(x)-(y-x)f'(x)< \varepsilon|y-x|, $$ and hence whenever $y_1,y_2\in J_x=(x-\eta_x,x+\eta_x)$, with $y_1\le x\le y_2$ and $y_1<y_2$, we have that $$ f(y_2)-f(y_1)-(y_2-y_1)f'(x)> -\varepsilon(|y_1-x|+|y_2-x|)=-\varepsilon(y_2-y_1), $$ and since $f'(x)\ge 0$, we finally obtain that

$$ f(y_2)>f(y_1)-\varepsilon(y_2-y_1). \tag{2} $$
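
One way to see $(2)$, for $y_1\le x\le y_2$ in $J_x$ with $y_1<y_2$, is to split the increment at $x$:

$$ f(y_2)-f(y_1)=\big[f(y_2)-f(x)-(y_2-x)f'(x)\big]-\big[f(y_1)-f(x)-(y_1-x)f'(x)\big]+(y_2-y_1)f'(x). $$

The first bracket is at least $-\varepsilon(y_2-x)$ and the second is at most $\varepsilon(x-y_1)$, with strict inequality whenever the corresponding $y_i\ne x$ (a bracket vanishes when $y_i=x$). Since at least one of $y_1,y_2$ differs from $x$,

$$ f(y_2)-f(y_1)>-\varepsilon(y_2-x)-\varepsilon(x-y_1)+(y_2-y_1)f'(x)\ge-\varepsilon(y_2-y_1). $$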

We shall use the following result (for a proof see here):

Cousin's Lemma. Let $\mathcal C$ be a full cover of $[a, b]$, that is, a collection of closed subintervals of $[a, b]$ with the property that for every $x\in[a, b]$ there exists a $\delta>0$ such that $\mathcal C$ contains all closed subintervals of $[a, b]$ which contain $x$ and have length smaller than $\delta$. Then there exists a partition of $[a, b]$ into non-overlapping intervals $\{I_1,\,I_2,\ldots,I_m\}\subset\mathcal C$, where $I_i=[x_{i-1}, x_i]$ for all $1\le i\le m$ and $a=x_0 < x_1 <\cdots <x_m=b$.

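We define $\mathcal C$ as the collection of all closed subintervals $K$ of $[c,d]$ such that either $K\subset I_n$ and $\alpha_n\in K$ for some $n\in\mathbb N$, or $K\subset J_x$ and $x\in K$ for some $x\in [a,b]\setminus A$. This $\mathcal C$ is a full cover of $[c,d]$: every point of $[c,d]$ is either some $\alpha_n$ or some $x\in [a,b]\setminus A$, and every sufficiently short closed subinterval of $[c,d]$ containing that point lies in $I_n$ or in $J_x$, respectively. Cousin's Lemma, applied to $[c,d]$, provides points $c=x_0<x_1<\cdots<x_m=d$ such that the closed intervals $$ K_1=[x_0,x_1],\, K_2=[x_1,x_2],\ldots,K_m=[x_{m-1},x_m] $$ belong to $\mathcal C$.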

From the construction of $\mathcal C$, each $K_j$ is a subinterval of some $I_n$ or of some $J_x$, and possibly of more than one such interval. To every $K_j$ we assign exactly one such interval. In particular, to every $j\in\{1,\ldots,m\}$ we assign either a unique $n\in\mathbb N$ such that $\alpha_n\in K_j\subset I_n$, which we denote by $n_j$, or a unique $x\in [a,b]\setminus A$ such that $x\in K_j\subset J_x$. This mapping is not necessarily one-to-one: if $\alpha_n$ is the common endpoint of $K_j$ and $K_{j+1}$, it is possible that $n_j=n_{j+1}$. Thus some of the $I_n$'s may have been assigned to two $K_j$'s, but to no more than two, since at most two of the $K_j$'s contain any given point.

We split $S=\{1,\ldots,m\}$ as a union of two disjoint sets. $S_1$ shall be the set of those $j\in S$, to which an $n\in\mathbb N$ has been assigned (i.e., $\alpha_n\in K_j\subset I_n=I_{n_j}$) while $S_2=S\setminus S_1$. If $j\in S_2$, then an $x\in [a,b]\setminus A$ has been assigned to $j$ and $x\in K_j\subset J_x$.

If $j\in S_1$, so that $K_j\subset I_{n_j}$, then $(1)$ gives $f(x_j)-f(x_{j-1})>-\dfrac{\varepsilon}{2^{n_j-1}}$, while if $j\in S_2$, then $(2)$ gives $f(x_j)-f(x_{j-1})>-\varepsilon (x_j-x_{j-1})$.

We now have that $$ \begin{aligned} f(d)-f(c)&=\sum_{j=1}^m \big(f(x_j)-f(x_{j-1})\big)= \sum_{j\in S_1} \big(f(x_j)-f(x_{j-1})\big)+\sum_{j\in S_2} \big(f(x_j)-f(x_{j-1})\big) \\ &\ge -\sum_{j\in S_1} \frac{\varepsilon}{2^{n_j-1}}-\sum_{j\in S_2}\varepsilon(x_j-x_{j-1}) > -4\varepsilon-\varepsilon(d-c)=-(4+d-c)\varepsilon. \end{aligned} $$ The last inequality holds because $\sum_{j\in S_2}(x_j-x_{j-1})\le d-c$ and, in the first sum, $\sum_{j\in S_1} \dfrac{1}{2^{n_j-1}}< 2\sum_{n=1}^\infty \dfrac{1}{2^{n-1}}=4$, since each term $\dfrac{1}{2^{n-1}}$ may appear at most twice, namely when $\alpha_n$ is the common endpoint of two neighboring $K_j$'s.
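
The strictness of the bound on the first sum can be checked with a finite geometric sum. As a sketch, assume $S_1\ne\emptyset$ and write $N=\max_{j\in S_1} n_j$ (the symbol $N$ is introduced here only for illustration). Since each index $n$ occurs as $n_j$ for at most two values of $j$,

$$ \sum_{j\in S_1}\frac{1}{2^{n_j-1}}\le 2\sum_{n=1}^{N}\frac{1}{2^{n-1}}=2\Big(2-\frac{1}{2^{N-1}}\Big)=4-\frac{1}{2^{N-2}}<4. $$

If $S_1=\emptyset$, the first sum is $0$ and the bound is immediate. For the second sum, the intervals $K_j$ with $j\in S_2$ are non-overlapping subintervals of $[c,d]$, so

$$ \sum_{j\in S_2}(x_j-x_{j-1})\le\sum_{j=1}^{m}(x_j-x_{j-1})=d-c. $$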