The World Phonotactics Database was launched recently. I've noticed some errors in its entries for languages I'm familiar with, but it's still a useful place to check.

First off, is there a trend across languages in terms of syllable structure? To start with, let's look at how many consonants can be in an onset. Their sample contains 2,338 languages. Of these, 2 do not allow onsets, 1,363 allow at most 1 consonant in onset position, 727 allow at most 2 onset consonants, 229 allow at most 3, 15 allow at most 4, and 2 allow at most 6 (note that no language in their sample has a maximum of exactly 5 onset consonants).

Let's also look at how many consonants can be in a coda. They again sample from 2,338 languages. Of these, 485 languages do not allow for coda consonants, 1,373 allow for at most 1 coda consonant, 358 allow for at most 2 coda consonants, 107 allow for at most 3 coda consonants, 12 allow for at most 4 coda consonants, and 3 allow for at most 5 coda consonants.
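To make the proportions easier to see, here is a minimal sketch that turns the counts quoted above into cumulative shares. The numbers are copied from the figures in this answer; the variable names and the `cumulative_shares` helper are my own and are not part of the database.

```python
# Counts copied from the figures quoted above (World Phonotactics Database sample).
# Keys are the maximum number of consonants a language allows in that position.
onset_counts = {0: 2, 1: 1363, 2: 727, 3: 229, 4: 15, 6: 2}
coda_counts = {0: 485, 1: 1373, 2: 358, 3: 107, 4: 12, 5: 3}


def cumulative_shares(counts):
    """Hypothetical helper: map each maximum size to the share of languages
    that allow at most that many consonants."""
    total = sum(counts.values())
    running = 0
    shares = {}
    for size in sorted(counts):
        running += counts[size]
        shares[size] = running / total
    return shares


for label, counts in (("onset", onset_counts), ("coda", coda_counts)):
    print(label, "total:", sum(counts.values()))
    for size, share in cumulative_shares(counts).items():
        print(f"  at most {size} consonant(s): {share:.1%}")
```

On these figures, roughly 58% of the sampled languages allow no more than one onset consonant, and just under 80% allow no more than one coda consonant.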

With this in mind, I think the impression that languages generally allow complex onsets and codas, which comes from looking at "familiar" languages, is actually a result of confirmation bias. Indo-European languages, for instance, with the exception of highly divergent languages like English or Russian, really only allow at most 2 or 3 onset consonants (2s include Armenian, Spanish, and Punjabi; 3s include Albanian, Irish, and Lithuanian).

In most of the world's languages, a CV structure is preferred. This is, as far as I'm aware, a very old typological generalization. The Universals Archive cites Jakobson and Halle (1956) as the first statement of this structure being a linguistic universal.