
If I understand your question correctly, then I think there is a simple explanation, at least for the case of pairwise alignment.

I believe the key insight is that a mismatch should always score better than a gap.*

This makes biological sense because the insertion/deletion (indel) rate is roughly one-tenth of the substitution rate (i.e. the rate of single-nucleotide changes), at least in vertebrates. (The ratio varies across the tree of life, but I think the substitution rate virtually always exceeds the indel rate.)

To understand why this matters, consider an example:

    ATG-AG
    ATGT-G

This is an 'impossible alignment' under the probabilities you gave, because it contains a direct transition from a gap-residue column to a residue-gap column.
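To make "a gap-residue column followed by a residue-gap column" concrete, here is a minimal Python sketch that flags every such transition in a pairwise alignment. The function names and state labels are my own, hypothetical choices. The sketch assumes the two rows are equal-length strings, that `-` is the gap character, and that no column has a gap in both rows.

```python
from __future__ import annotations

GAP = "-"


def column_state(a: str, b: str) -> str:
    """Classify one alignment column: 'M' (residue/residue),
    'gap_top' (gap in the first row) or 'gap_bottom' (gap in the second row)."""
    if a == GAP and b == GAP:
        raise ValueError("column has a gap in both rows")
    if a == GAP:
        return "gap_top"
    if b == GAP:
        return "gap_bottom"
    return "M"


def opposing_gap_transitions(top: str, bottom: str) -> list[int]:
    """Return 0-based indices i where column i is a gap in one row and
    column i + 1 is a gap in the other row (the 'impossible' transition)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    states = [column_state(a, b) for a, b in zip(top, bottom)]
    return [
        i
        for i in range(len(states) - 1)
        if {states[i], states[i + 1]} == {"gap_top", "gap_bottom"}
    ]


print(opposing_gap_transitions("ATG-AG", "ATGT-G"))  # [3]
print(opposing_gap_transitions("ATGAG", "ATGTG"))    # []
```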

However, under our assumption that mismatches are more likely biologically than indels, the correct alignment should be:

    ATGAG
    ATGTG

Indeed, the latter does look like a better alignment.
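One way to see why the second alignment wins is to score both under an affine gap penalty. Below is a minimal Python sketch. The parameter values (match +1, mismatch -1, gap open -4, gap extend -1) are illustrative assumptions only; they are not taken from your question or from any particular tool. A gap that switches from one row to the other is scored as two separate gap openings. So for isolated single-column gaps, collapsing the pair into one mismatch column improves the score whenever `mismatch > 2 * gap_open`, which is true for these numbers.

```python
GAP = "-"


def affine_score(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                 gap_open: int = -4, gap_extend: int = -1) -> int:
    """Score a pairwise alignment with an affine gap penalty.

    The first column of a gap scores gap_open, and each further column of the
    same gap (same row) scores gap_extend. A gap that switches rows starts a
    new gap. The default parameter values are illustrative assumptions only.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    prev_gap_row = None  # which row held the gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("column has a gap in both rows")
        if a == GAP or b == GAP:
            gap_row = "top" if a == GAP else "bottom"
            score += gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


print(affine_score("ATG-AG", "ATGT-G"))  # -4: two separate gap openings
print(affine_score("ATGAG", "ATGTG"))    # 3: one mismatch instead
```

With a different choice, for example a very cheap gap opening, the inequality can flip. That is the sense in which the claim depends on the scoring scheme reflecting the substitution/indel rate ratio.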

The same applies to more complex examples, so this:

    ATG--AAG
    ATGTT-AG

Becomes this:

    ATG-AAG
    ATGTTAG

Or this:

    ATGA-AG
    ATGTTAG
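The rewrite from the gapped version to the mismatch version can be sketched as a small, purely local transformation. Whenever a gap in one row is immediately followed by a gap in the other, collapse the two columns into one column that pairs the two residues. This Python sketch uses a hypothetical function name, `merge_opposing_gaps`. It is a single left-to-right pass, not a realignment, and it only inspects adjacent columns. For the longer example it therefore produces the first of the two rewrites shown, not the second.

```python
from __future__ import annotations

GAP = "-"


def merge_opposing_gaps(top: str, bottom: str) -> tuple[str, str]:
    """Collapse each adjacent pair of columns where a gap in one row is
    immediately followed by a gap in the other row into a single column
    pairing the two residues. Single left-to-right pass; not a realignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    cols = list(zip(top, bottom))
    if any(a == GAP and b == GAP for a, b in cols):
        raise ValueError("column has a gap in both rows")
    merged = []
    i = 0
    while i < len(cols):
        if i + 1 < len(cols):
            (a1, b1), (a2, b2) = cols[i], cols[i + 1]
            if a1 == GAP and b2 == GAP:  # (-, y) then (x, -)  ->  (x, y)
                merged.append((a2, b1))
                i += 2
                continue
            if b1 == GAP and a2 == GAP:  # (x, -) then (-, y)  ->  (x, y)
                merged.append((a1, b2))
                i += 2
                continue
        merged.append(cols[i])
        i += 1
    return "".join(a for a, _ in merged), "".join(b for _, b in merged)


print(merge_opposing_gaps("ATG-AG", "ATGT-G"))      # ('ATGAG', 'ATGTG')
print(merge_opposing_gaps("ATG--AAG", "ATGTT-AG"))  # ('ATG-AAG', 'ATGTTAG')
```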

* Strictly, I mean that a substitution should score better than an indel (including the associated gap-opening and gap-extension penalties). In fact, for the assumption to always hold, even a run of mismatches should still score better than a single indel. That may not always be a correct assumption. Consider the example below: is the true alignment case 1), case 2), or something else? Or is a global alignment in fact inappropriate here, so that this should be split into two local alignments? Is there a likely biological mutational event that could explain it? I raise these questions only to point out that this is not black-and-white; I don't have clear answers.

1)

    CGTACGTAGAGGAATGCCCCCCCCC--------AGCAACGTAGCAT
    CGTACGTAGAGGAATG---------TTTTTTTTAGCAACGTAGCAT

2)