The BFL Single produces high stale rates on P2Pool because of the way its firmware is designed. The BFL works through all 2^32 nonces of a work unit and only then reports any shares it found. That scan takes about 5 seconds.
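As a rough check on those numbers, here is a minimal sketch that assumes the scan really takes 5 seconds and that a share is equally likely to turn up at any point during it. The function names are made up for illustration; nothing here comes from the BFL firmware.

```python
NONCE_RANGE = 2 ** 32   # hashes per work unit (the full 32-bit nonce space)
SCAN_SECONDS = 5.0      # approximate scan time quoted above (assumption)

def implied_hash_rate(nonce_range=NONCE_RANGE, scan_seconds=SCAN_SECONDS):
    """Hashes per second implied by scanning the whole nonce range in scan_seconds."""
    return nonce_range / scan_seconds

def mean_report_delay(scan_seconds=SCAN_SECONDS):
    """Average wait between finding a share and reporting it.

    Assumes the share is found at a uniformly random moment in the scan
    and is only reported when the scan ends.
    """
    return scan_seconds / 2

print(f"implied hash rate: {implied_hash_rate() / 1e6:.0f} MH/s")
print(f"mean report delay: {mean_report_delay()} s")
```

Under those assumptions a share waits about 2.5 seconds on average before it is reported.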

With solo mining or typical pools, the work unit you get is valid for several minutes or until a new block is found (on average, one every ten minutes). So the average delay of 2-1/2 seconds (half of the 5-second scan) between when you find a share and when it's reported has very little effect. But P2Pool operates with an effectively much higher block rate because you're building blocks for the P2Pool chain.

Essentially, it comes down to which races you're trying to win and whether an average 2-1/2 second handicap matters.

Solo mining: You're trying to find a block before someone else does. That happens on average every 10 minutes. So your 2-1/2 second handicap costs you about 0.4%.

Conventional mining pool: You're trying to find a share before the work unit it came from becomes invalid. You can find multiple shares inside the same work unit. Work units typically last at least 4 minutes. Worst case, your 2-1/2 second handicap costs you about 1% (usually less, especially if the pool has a forgiving stale policy).

P2Pool: You're trying to mine a P2Pool block before anyone else finds a share. Multiple shares in a work unit are useless as only one block can come next on the P2Pool chain. The average time to find a new P2Pool block is 10 seconds. Your 2-1/2 second handicap (plus loss of additional shares per block) costs you about 25%.
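The three percentages above are just the handicap divided by the length of the race. A minimal sketch of that arithmetic, using the intervals quoted above (the dictionary keys and function name are illustrative, and the extra P2Pool loss from additional shares per block is not modelled):

```python
HANDICAP_SECONDS = 2.5  # average reporting delay

# Average length of each "race", in seconds, as quoted above.
RACE_INTERVAL_SECONDS = {
    "solo": 600.0,    # one Bitcoin block every 10 minutes
    "pool": 240.0,    # work units last at least 4 minutes (worst case)
    "p2pool": 10.0,   # one P2Pool block every 10 seconds
}

def handicap_cost(handicap_seconds, interval_seconds):
    """Fraction of the race lost to a fixed reporting delay."""
    return handicap_seconds / interval_seconds

costs = {
    name: handicap_cost(HANDICAP_SECONDS, interval)
    for name, interval in RACE_INTERVAL_SECONDS.items()
}

for name, cost in costs.items():
    print(f"{name}: {cost:.1%}")
# solo: 0.4%
# pool: 1.0%
# p2pool: 25.0%
```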

Unfortunately, all these penalties should be doubled because the same penalty hits you when a new block is first found. All your units spend, on average, an extra 2-1/2 seconds finishing their work on the old block. (Effectively, the entire 5 seconds spent on a work unit while a new block is found is wasted. Any shares found before the five seconds are over are wasted since it's too late to use them, and any shares found after the five seconds are over are wasted since they were based on the wrong block.)

The fix would be a simple firmware change to make the FPGA report a share as soon as one is found. That would solve half the problem. The next change would be to allow the FPGA to start working on a new work unit as soon as one is available, without having to finish the previous one. That would solve the other half of the problem.
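To see how the two halves add up, here is a rough model of the argument. It assumes the two losses are equal (half a scan each) and simply add, and it ignores network latency and everything else. The flag names are hypothetical and do not correspond to real firmware options.

```python
SCAN_SECONDS = 5.0  # time to work through one full work unit (assumption)

def lost_fraction(interval_seconds,
                  reports_immediately=False,
                  aborts_stale_work=False,
                  scan_seconds=SCAN_SECONDS):
    """Approximate fraction of mining time lost per race interval.

    report_delay: average wait before a found share is reported.
    stale_tail:   average time spent finishing old work after a new block.
    Each is half a scan unless the corresponding firmware fix is in place.
    """
    report_delay = 0.0 if reports_immediately else scan_seconds / 2
    stale_tail = 0.0 if aborts_stale_work else scan_seconds / 2
    return (report_delay + stale_tail) / interval_seconds

P2POOL_INTERVAL = 10.0  # seconds per P2Pool block, as quoted above

print(f"stock firmware:   {lost_fraction(P2POOL_INTERVAL):.0%}")
print(f"immediate report: {lost_fraction(P2POOL_INTERVAL, reports_immediately=True):.0%}")
print(f"both fixes:       {lost_fraction(P2POOL_INTERVAL, True, True):.0%}")
```

With a 10-second P2Pool interval this gives 50% lost on stock firmware, 25% with immediate reporting alone, and nothing (in this simplified model) with both changes.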