The reason is that call_user_func_array carries overhead: it adds an extra function call. Typically this is in the range of microseconds, but it can become important in two cases:

1. Recursive Function Calls: Since it's adding another call to the stack, it will double the amount of stack usage. So you can run into issues (with xdebug, or memory constraints) that will cause your application to crash if you run out of stack. In applications (or parts of them), using the switch-style approach can reduce your stack usage by as much as 33%, which can be the difference between an application running and crashing.

2. Performance: If you're calling the function a lot, then those microseconds can add up significantly. Since this is in a framework (it looks like something done by Lithium), it will likely be called tens, hundreds or even thousands of times in the lifetime of the application. So, even though each individual call is a micro-optimization, the effect adds up.

So yes, you can remove the switch and replace it with call_user_func_array, and it will be 100% the same with respect to functionality. But you'll lose the two optimization benefits mentioned above.
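To make the trade-off concrete, here is a minimal sketch of the switch-style dispatch being discussed. The helper name `invoke` and the cut-off at three arguments are assumptions for illustration, not the framework's actual code. It also assumes the callback is a function name or closure (array-style callables such as `array($obj, 'method')` can only be called as `$callback()` on PHP 5.4 or later).

```php
<?php
// Hypothetical helper: calls $callback directly for short argument lists
// and only falls back to call_user_func_array() for longer ones.
function invoke($callback, array $args = array())
{
    // Re-index so the positional access below is safe for keyed or sparse arrays.
    $args = array_values($args);

    switch (count($args)) {
        case 0:
            return $callback();
        case 1:
            return $callback($args[0]);
        case 2:
            return $callback($args[0], $args[1]);
        case 3:
            return $callback($args[0], $args[1], $args[2]);
        default:
            // Generic path: same result, but with one extra function call.
            return call_user_func_array($callback, $args);
    }
}

// Usage: both paths return the same value.
echo invoke('strtoupper', array('abc')), "\n";   // direct call via the switch
echo invoke('max', array(1, 2, 3, 4, 9)), "\n";  // falls back to call_user_func_array
```

Replacing the whole switch with the `default` branch alone would behave the same; the cases only exist to skip the extra call for the common short argument lists.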

EDIT And to prove the performance difference:

I decided to do a benchmark myself. Here's a link to the exact source that I used:

http://codepad.viper-7.com/s32CSb (also included at the bottom of this answer for reference)

Now, I tested it on a Linux system, a Windows system and codepad's site (2 command-line runs, 1 online, and 1 more with XDebug enabled), all running PHP 5.3.6 or 5.3.8.

Conclusion

Since the results are rather long, I'll summarize first.

If you're calling this a lot, doing this is not a micro-optimization. Sure, for an individual call the difference is insignificant. But if it's going to be used a lot, it can save quite a bit of time.

Now, it's worth noting that all except one of these tests are run with XDebug off. This is extremely important, as xdebug appears to significantly alter the results of the benchmark.

Here are the raw results (test1 and testObj1 go through call_user_func_array; test2 and testObj2 go through the switch):

Linux

    With 0 Arguments:
    test1 in 0.0898239612579 Seconds
    test2 in 0.0540208816528 Seconds
    testObj1 in 0.118539094925 Seconds
    testObj2 in 0.0492739677429 Seconds

    With 1 Arguments:
    test1 in 0.0997269153595 Seconds
    test2 in 0.053689956665 Seconds
    testObj1 in 0.137704849243 Seconds
    testObj2 in 0.0436580181122 Seconds

    With 2 Arguments:
    test1 in 0.0883569717407 Seconds
    test2 in 0.0551269054413 Seconds
    testObj1 in 0.115921974182 Seconds
    testObj2 in 0.0550417900085 Seconds

    With 3 Arguments:
    test1 in 0.0809321403503 Seconds
    test2 in 0.0630970001221 Seconds
    testObj1 in 0.124716043472 Seconds
    testObj2 in 0.0640230178833 Seconds

    With 4 Arguments:
    test1 in 0.0859131813049 Seconds
    test2 in 0.0723040103912 Seconds
    testObj1 in 0.137611865997 Seconds
    testObj2 in 0.0707349777222 Seconds

    With 5 Arguments:
    test1 in 0.109707832336 Seconds
    test2 in 0.122457027435 Seconds
    testObj1 in 0.201376914978 Seconds
    testObj2 in 0.217674016953 Seconds

(I actually ran it about a dozen times, and the results are consistent.) So you can clearly see that on that system, it's significantly faster to use the switch for functions with 3 or fewer arguments. For 4 arguments, it's close enough to qualify as a micro-optimization. For 5 it's slower (due to the overhead of the switch statement).

Now, objects are another story. For objects, it's significantly faster to use the switch statement even with 4 arguments. With 5 arguments, the switch is slightly slower.

Windows

    With 0 Arguments:
    test1 in 0.078088998794556 Seconds
    test2 in 0.040416955947876 Seconds
    testObj1 in 0.092448949813843 Seconds
    testObj2 in 0.044382095336914 Seconds

    With 1 Arguments:
    test1 in 0.084033012390137 Seconds
    test2 in 0.049020051956177 Seconds
    testObj1 in 0.098193168640137 Seconds
    testObj2 in 0.055608987808228 Seconds

    With 2 Arguments:
    test1 in 0.092596054077148 Seconds
    test2 in 0.059282064437866 Seconds
    testObj1 in 0.10753011703491 Seconds
    testObj2 in 0.06486701965332 Seconds

    With 3 Arguments:
    test1 in 0.10003399848938 Seconds
    test2 in 0.073707103729248 Seconds
    testObj1 in 0.11481595039368 Seconds
    testObj2 in 0.072822093963623 Seconds

    With 4 Arguments:
    test1 in 0.10518193244934 Seconds
    test2 in 0.076627969741821 Seconds
    testObj1 in 0.1221661567688 Seconds
    testObj2 in 0.080114841461182 Seconds

    With 5 Arguments:
    test1 in 0.11016392707825 Seconds
    test2 in 0.14898705482483 Seconds
    testObj1 in 0.13080286979675 Seconds
    testObj2 in 0.15970706939697 Seconds

Again, just as with Linux, the switch is faster in every case except 5 arguments (which is expected). So nothing out of the ordinary here.

Codepad

    With 0 Arguments:
    test1 in 0.094165086746216 Seconds
    test2 in 0.046183824539185 Seconds
    testObj1 in 0.088129043579102 Seconds
    testObj2 in 0.046132802963257 Seconds

    With 1 Arguments:
    test1 in 0.093621969223022 Seconds
    test2 in 0.054486036300659 Seconds
    testObj1 in 0.11912703514099 Seconds
    testObj2 in 0.053775072097778 Seconds

    With 2 Arguments:
    test1 in 0.099776029586792 Seconds
    test2 in 0.072152853012085 Seconds
    testObj1 in 0.10576200485229 Seconds
    testObj2 in 0.065294027328491 Seconds

    With 3 Arguments:
    test1 in 0.11053204536438 Seconds
    test2 in 0.088426113128662 Seconds
    testObj1 in 0.11045718193054 Seconds
    testObj2 in 0.073081970214844 Seconds

    With 4 Arguments:
    test1 in 0.11662006378174 Seconds
    test2 in 0.085783958435059 Seconds
    testObj1 in 0.11683893203735 Seconds
    testObj2 in 0.081549882888794 Seconds

    With 5 Arguments:
    test1 in 0.12763905525208 Seconds
    test2 in 0.15642619132996 Seconds
    testObj1 in 0.12538290023804 Seconds
    testObj2 in 0.16010403633118 Seconds

This shows the same picture as with Linux. With 4 or fewer arguments, it's significantly faster to run it through the switch. With 5 arguments, it is significantly slower with the switch.

Windows With XDebug

    With 0 Arguments:
    test1 in 0.31674790382385 Seconds
    test2 in 0.31161189079285 Seconds
    testObj1 in 0.40747404098511 Seconds
    testObj2 in 0.32526516914368 Seconds

    With 1 Arguments:
    test1 in 0.32827591896057 Seconds
    test2 in 0.33025598526001 Seconds
    testObj1 in 0.38013815879822 Seconds
    testObj2 in 0.3494348526001 Seconds

    With 2 Arguments:
    test1 in 0.33168315887451 Seconds
    test2 in 0.35207295417786 Seconds
    testObj1 in 0.37523794174194 Seconds
    testObj2 in 0.38242697715759 Seconds

    With 3 Arguments:
    test1 in 0.33901619911194 Seconds
    test2 in 0.36867690086365 Seconds
    testObj1 in 0.41470503807068 Seconds
    testObj2 in 0.3860080242157 Seconds

    With 4 Arguments:
    test1 in 0.35170817375183 Seconds
    test2 in 0.39288783073425 Seconds
    testObj1 in 0.39424705505371 Seconds
    testObj2 in 0.39747595787048 Seconds

    With 5 Arguments:
    test1 in 0.37077689170837 Seconds
    test2 in 0.59246301651001 Seconds
    testObj1 in 0.41220307350159 Seconds
    testObj2 in 0.60260510444641 Seconds

Now this tells a different story. In this case, with XDebug enabled (but no coverage analysis, just the extension turned on), it's almost always slower to use the switch optimization. This is curious, since many benchmarks are run on dev boxes with XDebug enabled, yet production boxes usually don't run with XDebug. So it's a lesson in running benchmarks in an environment that matches production.
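One way to guard against this when timing code is to check for the extension before trusting the numbers. The following is a small sketch, not part of the benchmark source; the function name `benchmarkWarnings` is hypothetical, and it relies only on PHP's built-in `extension_loaded()`.

```php
<?php
// Hypothetical helper: lists reasons the current runtime may skew timings.
function benchmarkWarnings()
{
    $warnings = array();

    // XDebug hooks into function calls, so call-heavy micro-benchmarks
    // can look very different with it loaded.
    if (extension_loaded('xdebug')) {
        $warnings[] = 'xdebug is loaded; function-call timings may be distorted';
    }

    return $warnings;
}

// Print any warnings before running the actual benchmark.
foreach (benchmarkWarnings() as $warning) {
    echo "Warning: $warning\n";
}
```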

Source