R0H1T No, it's not. In fact, EMIB doesn't solve Intel's problems with inter-die high-speed interconnect à la IF for AMD. That's why they hired Keller: EMIB is a physical implementation, but they still need something like IF so that the bridge (EMIB) isn't a bottleneck when transferring vast amounts of data between the chips. IF is rated at a max of 512 GB/s and is accordingly the fastest interconnect out there, faster than NVLink IIRC.

TheGuruStud I figured it would be a fancy hardware scheduler/fetch with a big buffer on the front, and whatever you call the stuff on the back end, with a large cache.

FordGT90Concept Exactly. It's going to require something far more complex than Infinity Fabric to do properly; hence, "active interposer" maybe?

RejZoR Actually, the first one was 3dfx with its VSA-100 chips. They could basically stack as many as they needed. 3dfx's main problem was that they were so far ahead of their time that it ultimately became their demise. If they were doing today what they were doing back then, things would probably work out great. Vega at 7 nm is a refresh of the last series, but with Navi it is predicted that they are using the same approach they used for Ryzen: stacking smaller GPU cores to work as one. I frankly think that is indeed the future, because the huge GPUs we make now have really crappy yields, and that's just bad. Smaller cores have very high yields, even on new nodes. It's just up to them to solve the inter-core communication, and AMD has quite some experience even with that, from Ryzen and also from their older products, where they used special "ring bus" designs to circulate the necessary processing data between core logics. We'll see. But I hope it works out for them, because that could change the face of the gaming (and compute) industry quite dramatically.

MCM in TR/EPYC works because the intended market/niche does not care about latency. Their implementation is technically not that much different from a multi-socket server. That is the same reason thousands of GPUs in supercomputers work just fine together. This is fine for CPUs, as the work is generalized and granular enough. GPUs are massively parallel when it comes to computation units, but there are control functions, as well as hardware like TMUs/ROPs/etc., performing rendering stages that cannot easily be distributed to different chips over a reasonable link (not even a wide IF or NVLink), at least not without some paradigm change in how work in GPUs is distributed. Current understandings and implementations of an MCM GPU would fall into the CrossFire/SLI category, solutions that manufacturers are actively moving away from.

EMIB resolves the problem of interposer size. Patents and implementation details will undoubtedly be an interesting fight that we will see soon enough.

Infinity Fabric seems to have received more hype than it's worth. It is definitely awesome, but it is not the only contemporary scalable interconnect out there.

AMD has Infinity Fabric: AMD has stated the range to be 30 GB/s to 512 GB/s. Implementation details seem to be somewhat different depending on the type of interconnect - in Zen's case, within a chip (2x32 bits) or between chips (2x16 wires). Actual implementation: in EPYC/TR/Ryzen, each link within a chip has a bandwidth of 42 GB/s bidirectionally, and between chips 38 GB/s bidirectionally. This is with spec DDR4-2666 memory and a 1333 MHz fabric clock.

Nvidia has NVLink: a channel (link) is 32 wires - 8-bit differential pairs (lanes), bidirectional, at 20 GT/s (or 25 GT/s for NVLink 2.0). Assuming NVLink 2.0, that is 25 GB/s per link, 50 GB/s bidirectional, and it can be scaled up with more links. NVLink's NVSwitch has 18 ports and a total bandwidth of 900 GB/s, so NVLink by itself does seem to be scalable. Actual implementation: big Pascal has 4 NVLink 1.0 links - 80/160 GB/s; big Volta has 6 NVLink 2.0 links - 150/300 GB/s.

Intel has QPI (and soon UPI): these are Intel's multi-CPU interconnects, which have been evolving over time. Historically these tend to favor latency over bandwidth, as that has been Intel's need. Currently, dual-socket Xeon systems actually have inter-CPU latencies that are not far off from the inter-chip IF in EPYC CPUs. As an interesting note, QPI is a 20-lane link (40 data wires) logically divided into 4 quadrants of 5 lanes each; these quadrants can be managed separately. UPI is stated to run at 10.4 GT/s with 2-3 UPI links per CPU, which brings some management and power improvements but performance-wise is just the same interconnect as QPI running at a max of 5.2 GHz. Actual implementation: the initial implementation in Nehalem ran at 3.2 GHz with a resulting bandwidth of 25.6 GB/s; Haswell's implementation runs at 4.8 GHz with a bandwidth of 38.4 GB/s. UPI will bring the bandwidth per link up to 41.6 GB/s. (A rough sketch of where these per-link numbers come from is below.)
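A minimal back-of-the-envelope sketch of that per-link arithmetic, in Python. The transfer rates and data widths in the table are my own reading of public material (NVLink: 8 data lanes per direction; QPI/UPI: 16 data bits per transfer at double data rate), not figures stated in this thread, but they reproduce the bandwidths quoted above:

```python
# Per-direction link bandwidth: transfers per second * data bits per transfer / 8.
def link_bw_gbs(rate_gts: float, width_bits: int) -> float:
    return rate_gts * width_bits / 8

links = {
    # name: (transfer rate in GT/s, data bits per transfer per direction)
    "NVLink 1.0 link":        (20.0, 8),   # 8 differential data lanes per direction
    "NVLink 2.0 link":        (25.0, 8),
    "QPI (Nehalem, 3.2 GHz)": (6.4, 16),   # double data rate, 16 of 20 lanes carry data
    "QPI (Haswell, 4.8 GHz)": (9.6, 16),
    "UPI (10.4 GT/s)":        (10.4, 16),
}

for name, (rate, width) in links.items():
    one_way = link_bw_gbs(rate, width)
    print(f"{name:25s} {one_way:5.1f} GB/s per direction, {2 * one_way:5.1f} GB/s bidirectional")

# Scaling with link count, e.g. big Volta with 6 NVLink 2.0 links: 150/300 GB/s.
volta_one_way = 6 * link_bw_gbs(25.0, 8)
print(f"GV100 (6 links): {volta_one_way:.0f} GB/s per direction, {2 * volta_one_way:.0f} GB/s total")
```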
This is just bandwidth. There are several other important considerations: latency is one, power consumption is another, and the die space consumed becomes a problem at some number of links as well. I think power is the primary consideration here. Powering the links on/through an interposer conserves power compared to links on/through a PCB. AMD has declared power efficiency as one of the wins of Infinity Fabric, and it does consume a notable amount of power in TR/EPYC. Intel, in the same manner, is trying to reduce power consumption in going from QPI to UPI.

The problem is not the link so much as managing traffic on a bunch of high-speed links. The interconnect itself will still be Infinity Fabric, but the active part on the interposer will have to be a switch of some type. And this is still a physical(ish)-layer problem; software issues sit on top of all this.

3dfx's approach to multiple chips was SLI (Scan-Line Interleave), which was employed both for multiple chips on Voodoo 4/5 cards and for linking two separate Voodoo2 cards. Each chip simply rendered the next row of pixels (a toy sketch of that split is at the end of this post). While that specific method has not been used for a long time, this type of work distribution is easily doable by both AMD and Nvidia today, with CrossFire and SLI respectively. Unfortunately, GPU functionality as well as game engines have become much more complex, so these naive methods are too twitchy. The interconnect is not the problem here, unless one resorts to the brute-force method of an extremely wide, low-latency bus, which is not technically viable.

AMD's GPU chief has now confirmed that Navi will not be an MCM approach, and while they would like to (and most certainly do) explore this, it is not in the cards for the foreseeable future.
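Purely as an illustration of the kind of static work split Scan-Line Interleave used - this is a toy sketch of the idea, not how any real driver or the VSA-100 actually implemented it:

```python
# Toy model of Scan-Line Interleave: chip i owns every num_chips-th row of the frame.
def assign_scanlines(frame_height: int, num_chips: int) -> dict:
    """Map each chip index to the list of row indices it rasterizes."""
    return {chip: list(range(chip, frame_height, num_chips)) for chip in range(num_chips)}

# Two chips splitting a 480-line frame: chip 0 takes the even rows, chip 1 the odd rows.
# Each chip still sees the full geometry but only rasterizes the rows it owns,
# which is why the split was simple and needed little inter-chip traffic.
work = assign_scanlines(frame_height=480, num_chips=2)
for chip, rows in work.items():
    print(f"chip {chip}: {len(rows)} rows, starting with {rows[:4]}")
```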