We tend to not use higher derivative theories. It turns out that there is a very good reason for this, but that reason is rarely discussed in textbooks. We will take, for concreteness, $L\left(q,\dot q, \ddot q\right)$, a Lagrangian which depends on the 2nd derivative in an essential manner. Inessential dependences are terms such as $q\ddot q$ which may be partially integrated to give ${\dot q}^2$. Mathematically, this is expressed through the necessity of being able to invert the expression $$P_2 = \frac{\partial L\left(q,\dot q, \ddot q\right)}{\partial \ddot q},$$ and get a closed form for $\ddot q \left(q, \dot q, P_2 \right)$. Note that usually we also require a similar statement for $\dot q \left(q, p\right)$, and failure in this respect is a sign of having a constrained system, possibly with gauge degrees of freedom.

In any case, the non-degeneracy leads to the Euler-Lagrange equations in the usual manner: $$\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot q} + \frac{d^2}{dt^2}\frac{\partial L}{\partial \ddot q} = 0.$$ This is then fourth order in $t$, and so require four initial conditions, such as $q$, $\dot q$, $\ddot q$, $q^{(3)}$. This is twice as many as usual, and so we can get a new pair of conjugate variables when we move into a Hamiltonian formalism. We follow the steps of Ostrogradski, and choose our canonical variables as $Q_1 = q$, $Q_2 = \dot q$, which leads to \begin{align} P_1 &= \frac{\partial L}{\partial \dot q} - \frac{d}{dt}\frac{\partial L}{\partial \ddot q}, \\ P_2 &= \frac{\partial L}{\partial \ddot q}. \end{align} Note that the non-degeneracy allows $\ddot q$ to be expressed in terms of $Q_1$, $Q_2$ and $P_2$ through the second equation, and the first one is only necessary to define $q^{(3)}$.

We can then proceed in the usual fashion, and find the Hamiltonian through a Legendre transform: \begin{align} H &= \sum_i P_i \dot{Q}_i - L \\ &= P_1 Q_2 + P_2 \ddot{q}\left(Q_1, Q_2, P_2\right) - L\left(Q_1, Q_2,\ddot{q}\right). \end{align} Again, as usual, we can take time derivative of the Hamiltonian to find that it is time independent if the Lagrangian does not depend on time explicitly, and thus can be identified as the energy of the system.

However, we now have a problem: $H$ has only a linear dependence on $P_1$, and so can be arbitrarily negative. In an interacting system this means that we can excite positive energy modes by transferring energy from the negative energy modes, and in doing so we would increase the entropy — there would simply be more particles, and so a need to put them somewhere. Thus such a system could never reach equilibrium, exploding instantly in an orgy of particle creation. This problem is in fact completely general, and applies to even higher derivatives in a similar fashion.