From: Linus Torvalds <>
Date: Fri, 29 Dec 2017 17:00:01 -0800
Subject: Re: 4.14.9 doesn't boot (regression)



On Fri, Dec 29, 2017 at 4:10 PM, Andy Lutomirski <luto@amacapital.net> wrote:

>

> Double faults use IST, so a double fault that double faults will effectively just start over rather than eventually running out of stack and triple faulting.

>

> But check out the registers. We have RSP = ...28fd8 and CR2 = ...27f08.

> IOW the double fault stack is ...28000 - ...28fff and we're somehow getting

> a failed page fault a couple hundred bytes below the bottom of the IST stack.

> IOW, I think we're just stuck in a neverending loop of stack overflows.



Ahh, good catch. This feels like it might finally be explaining things.



> (Also, Josh, the oops code should have printed the contents of the struct pt_regs at the top of the DF stack. Any idea why it didn't?)

>

> Toralf, can you send the complete output of:

>

> objdump -dr arch/x86/kernel/traps.o

>

> From the build tree of a nonworking kernel?



Alexander made one of his failing kernels available earlier:



https://www.dropbox.com/s/yesupqgig3uxf73/linux-4.15-rc5%2B.tar.xz?dl=0



and yes, there's something seriously wrong there. Doing a disassembly

on "do_double_fault()" shows:



ffffffff8101bda0 <do_double_fault>:
ffffffff8101bda0:  41 54                   push   %r12
ffffffff8101bda2:  55                      push   %rbp
ffffffff8101bda3:  53                      push   %rbx
ffffffff8101bda4:  48 81 ec 20 10 00 00    sub    $0x1020,%rsp
ffffffff8101bdab:  48 83 0c 24 00          orq    $0x0,(%rsp)
ffffffff8101bdb0:  48 81 c4 20 10 00 00    add    $0x1020,%rsp



WTF? That's bogus crap, and not ok in the kernel. Doing a stack probe

below the stack by subtracting 4128 from the stack pointer and then

oring it, and then resetting the stack pointer again is just crazy.

And it's definitely not ever going to work for the kernel that has a

limited stack.
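
To make the arithmetic concrete, here is a minimal sketch (not kernel code) of why a probe of that size cannot work on a one-page stack. It assumes the 4096-byte double fault stack described in the quoted text and uses only the low bits of the addresses from the report, since the full addresses are elided there. The helper names are hypothetical.

```python
STACK_SIZE = 0x1000     # one page: the ...28000 - ...28fff range quoted above
PROBE_OFFSET = 0x1020   # from "sub $0x1020,%rsp" in the disassembly


def probe_address(rsp, offset=PROBE_OFFSET):
    """Address touched by 'sub $offset,%rsp; orq $0x0,(%rsp)'."""
    return rsp - offset


def probe_escapes_stack(stack_bottom, rsp, offset=PROBE_OFFSET):
    """True if the probe lands below the lowest valid stack address."""
    return probe_address(rsp, offset) < stack_bottom


# Assumption: only the low bits of the addresses are known from the report.
stack_bottom = 0x28000
stack_top = stack_bottom + STACK_SIZE

# Try every 8-byte-aligned stack pointer from the bottom to the top of the page.
always_escapes = all(
    probe_escapes_stack(stack_bottom, rsp)
    for rsp in range(stack_bottom, stack_top + 1, 8)
)

print(PROBE_OFFSET, always_escapes)   # probe size in decimal, and the verdict
print(stack_bottom - 0x27f08)         # how far below the stack CR2 pointed
```

Because the probe offset (4128) is larger than the stack itself (4096), the probe lands below the stack no matter where the stack pointer sits inside the page, which matches a CR2 a couple hundred bytes below the bottom.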



So yes, it's a terminally broken compiler from hell. I assume Gentoo

has applied some completely broken security patch to their compiler,

turning said compiler into complete garbage.



Doing some trivial grepping on the disassembly in that vmlinux file,

there's tons of those "let's probe more than a page below the stack"

issues. The biggest offset I found was 0x1400.



That one happened to be in do_sys_poll().
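
The kind of grepping described above could be sketched roughly as follows. This is an illustration, not the command that was actually run: it assumes `objdump -d` style text as input, and the function name, regular expressions, and one-page threshold are all hypothetical choices. It reports every "sub $N,%rsp" that is immediately followed by an "or $0x0,(%rsp)" probe with N larger than a page.

```python
import re

PAGE_SIZE = 0x1000

# Patterns for objdump -d text output (AT&T syntax); assumptions, not a spec.
FUNC = re.compile(r"^[0-9a-f]+ <([^>]+)>:")
SUB_RSP = re.compile(r"\bsub\s+\$0x([0-9a-f]+),%rsp")
PROBE = re.compile(r"\borq?\s+\$0x0,\(%rsp\)")


def find_stack_probes(lines, min_offset=PAGE_SIZE):
    """Return (function, offset) pairs for probes deeper than min_offset."""
    hits = []
    func = None
    pending = None  # offset of a 'sub $N,%rsp' seen on the previous line
    for line in lines:
        header = FUNC.match(line.strip())
        if header:
            func = header.group(1)
            pending = None
            continue
        sub = SUB_RSP.search(line)
        if sub:
            pending = int(sub.group(1), 16)
            continue
        if pending is not None and pending > min_offset and PROBE.search(line):
            hits.append((func, pending))
        pending = None
    return hits


# The do_double_fault() excerpt from this mail, used as sample input.
sample = """\
ffffffff8101bda0 <do_double_fault>:
ffffffff8101bda0: 41 54 push %r12
ffffffff8101bda2: 55 push %rbp
ffffffff8101bda3: 53 push %rbx
ffffffff8101bda4: 48 81 ec 20 10 00 00 sub $0x1020,%rsp
ffffffff8101bdab: 48 83 0c 24 00 orq $0x0,(%rsp)
ffffffff8101bdb0: 48 81 c4 20 10 00 00 add $0x1020,%rsp
""".splitlines()

for name, offset in find_stack_probes(sample):
    print(name, hex(offset))
```

Run over a full vmlinux disassembly, taking the maximum of the reported offsets would give the largest probe, which is how a figure like 0x1400 could be found.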



> Also, you wouldn't happen to be using Gentoo perchance?



Yes, several people involved are using Gentoo. Maybe everybody.



> I already have two reports of a Gentoo system miscompiling the vDSO

> due to Gentoo enabling -fstack-check and GCC generating stack check

> code that is highly suboptimal, actively incorrect, and doesn't even

> manage to check the stack in a particularly helpful way.



Yes. Good. I think you root-caused it.



Good. I was not feeling so happy about this bug report, but now I can

firmly just blame the Gentoo compiler for having some shit-for-brains

"feature".



Linus



