Body:

No Text

Maynard Handley (name99.delete@this.name99.org) on June 25, 2014 6:48 pm wrote: > Suppose you were designing a system incorporating CPUs+GPUs from scratch, without 20+ years of history. > How you would do this? I ask to see the gap between where we are today and where we should be. What I > have in mind is more immediately applicable to mobile, but desktop will be there in a few years. > > So obviously you'd have the CPU and GPU on the same SoC, serviced by the same high-speed > interconnect. However (if you were starting from scratch) here's a list of things > I expect you'd want to do. I'd be curious to see comments on them: > > - there'd be a single system-wide logical-physical memory map (for CPUs, GPU, and all IO). > Each device (like a GPU) could cache its mapping locally, but when those need to change, it > would consult the single locus of truth for new mappings. I think we're already here. > > - GPUs would be treated as a differently ISA'd CPU on the bus, just like the primary CPU. The OS would > use a process model to control it. (That is, the work running on the GPU at any one time, along with its > globs of memory) would be owned by a process. This carries a bunch of implications. It implies that our > MMU/TLB in the GPU is no longer just holding mappings; it's also holding permissions and ASIDs. This means > that accesses to the TLB can fault --- which means we need a way to handle that. Most likely it's easier > to pause the GPU and "reflect" the fault to a primary CPU than to force the GPU to handle the fault --- > but we probably still need some minor machinery to retry the faulting instruction+warp. > > Since we're doing all this anyway, this now means we can allow RAM accessed by the GPU (instructions or > data) to be paged out or compressed. This may not seem to valuable, but I'm thinking to a system beyond > video games, where OpenCL is common and so large, long-running tasks could be occupying the GPU. > > - the GPU could take interrupts (at the very least something like the timer interrupt that drives context switching > on a CPU). My expectation is that one uses thread priorities and the soft real-time technologies already extant > in all OSs of interest to schedule the CPU, and that most of the time this will not require context switching; > but forced context switching is available --- once again for long running OpenCL tasks. > > - finally I've talked about "the GPU" but the usual situation seems to be that "the GPU" consists > of 6 (or 12 or 20) identical units. Each such unit has its TLB, its register set, its HW thread > scheduling. Presumably we'd like our heterogenous-ISA-aware OS to see this as 6 (or 12, or 20) > distinct GPUs, and to have the flexibility, just like on a multicore system, to schedule things > as appropriate across these different GPU cores. (So now, e.g., we might [on the fly, not dedicated] > devote 4 cores to playing a game in a window, one to updating the system UI, and one (plus whatever > slack is available on the other 5) to handling a long-running OpenCL task). > > Does this seem like a reasonable expectation for the future? Or are there well-founded reasons (as opposed to > prejudice and "this is the way we've always done it") for keeping the GPU as a very distinct object on the SoC, > programmed in a batch mode by giving it very coarse-grained long-running commands rather than having it contribute > (in a more or less virtualized way) to the pool of compute available just like CPUs contribute.