JoeRambo said: Only intra-CCX latencies are low (and at a size of 4 cores, that is a very small unit of computing).

JoeRambo said: Between CCXes on the same chip the latencies are already high (140 versus 100 in your posted pics) and higher than for Intel on the same chip.

JoeRambo said: Between chips on the same socket, latencies are very comparable to what Intel has between sockets (STH tested a 4-socket system; we have 2-socket systems that show less latency in our tests).

JoeRambo said: And sub-NUMA clustering? It is for very specialized, NUMA-aware software.

JoeRambo said: We run multiple JVMs that are pinned to NUMA nodes, but there was no benefit from going to sub-NUMA clustering in our testing. The drawbacks of half the memory per node and half the memory bandwidth are huge compared to the tiny gains in latency. (It's more complicated than that, as an ideal OS could allocate memory on the sibling NUMA node on the same chip first and maybe get nice performance, but the way things are for us, it's all or nothing*.)

JoeRambo said: Your blabbering about "quite noisy latencies" and sub-NUMA clustering is like your other posts here: you read something somewhere and post it as an item of supposed AMD superiority, when in fact the opposite is true.

There are people on this board who actually run server-grade hardware (including me, on a smaller scale and not with top-end parts). And our findings do not agree with your bs.

JoeRambo said: *It is all or nothing because we use numactl to bind CPUs and memory to nodes (as in numactl --cpunodebind=0 --membind=0) and we force the JVM to allocate all memory up front with THP: -XX:+AlwaysPreTouch -XX:+UseTransparentHugePages. We found this workload and setup fits the Skylake-EP architecture like a glove: the JVMs allocate and work with a ton of memory, and everything stays local thanks to the large L2 and non-inclusive L3. With the previous architecture we had a lot of problems with response-time variability if more than one JVM was running on a CPU and one (or more) started to garbage collect.
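
For anyone following along, the pinning described in that footnote could be scripted roughly as below. This is only a sketch under stated assumptions: the numactl options and the two -XX flags are the ones quoted in the post, while the helper names, the app.jar path, the heap size, and the matching -Xms/-Xmx settings are hypothetical placeholders, not the poster's actual setup. It only builds the command lines and does not launch anything.

```python
import shlex

# JVM flags quoted in the post; everything else in this file is a placeholder.
JVM_FLAGS = ["-XX:+AlwaysPreTouch", "-XX:+UseTransparentHugePages"]


def build_pinned_jvm_command(node, jar_path, heap_gb):
    """Build the argv for one JVM whose CPUs and memory are bound to one NUMA node."""
    if node < 0:
        raise ValueError("NUMA node index must be non-negative")
    # Bind both the CPUs and the memory allocations to the same node.
    numactl = ["numactl", f"--cpunodebind={node}", f"--membind={node}"]
    # Assumption: a fixed-size heap, so pre-touching commits the whole heap on the bound node.
    heap = [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]
    return numactl + ["java"] + heap + JVM_FLAGS + ["-jar", jar_path]


def commands_for_nodes(node_count, jar_path, heap_gb):
    """Build one pinned command per NUMA node, for nodes 0..node_count-1."""
    return [build_pinned_jvm_command(n, jar_path, heap_gb) for n in range(node_count)]


if __name__ == "__main__":
    # Hypothetical two-socket box with one JVM per node.
    for argv in commands_for_nodes(2, "app.jar", 32):
        print(shlex.join(argv))
```

Because --membind is strict, a JVM pinned this way cannot spill onto another node's memory, which is why the setup is "all or nothing" once sub-NUMA clustering halves the memory behind each node.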

Small for who exactly? A large number of CPUs (thanks, Intel) still only have 4 cores / 8 threads. It's actually a pretty standardized number for clustering, and Intel's biggest-selling sub-segment of processors in Xeon Silver is a [...]. So a 4-core CCX is a perfect building block from desktop to enterprise. AMD baked this directly into their architecture via the CCX, within which you can achieve lower latencies than on Intel's mesh, so it is one of the best features of AMD's approach, not the worst, as you seem to claim for your boutique all-or-nothing JVM workload.

Yes, this is the tradeoff of CCX/MCM. Lower cost of production and scalability have their drawbacks. With AMD's very low intra-CCX latencies, the performance is established. How you scale this relates to software design, which is the whole principle of NUMA. You can't simply speak from an anecdotal developer's perspective on a narrow class of software and make claims about what kind of architecture should be pursued. This architectural [...] is going to cost Intel tens if not hundreds of billions of dollars.

If it isn't broke, don't fix it. With gate shrinks, you can pack more cores on a CCX and this downside becomes muted with time. You can spend more time on software development and mute the impact of the high-latency cross-CCX transactions. Not to mention, as I have stated multiple times, mesh has variable latency depending on which core talks to which, and it approaches AMD's MCM latency when this occurs. While it is still better latency than MCM, the issue is the wide-ranging variance. This might not be an issue for your development, but it's a nightmare to others.

Welcome to reality and the future.

I'm in the bare-metal C/ASM camp. I don't deny heightened latency in the AMD approach, but it is also far more consistent and performant for my use cases. There's a segment of benchmarks where Ryzen actually beats Intel.
There's a reason for this, and it's the future IMO.

There's nothing blabbering about my post. I posted sub-NUMA clustering for a reason. Do you deny why it exists? Or are you going to pretend there's no use case? You run JVMs on server-grade hardware? I work at a much lower level. Those people's findings relate to a broad scope of software packages that are more performant on Intel's architecture. I have a different set of considerations, and Ryzen is more performant. If mesh weren't such a gaffe, Intel wouldn't be reorganizing for MCM architectures. In my prior work, I worked at the hardware level on custom ASICs and solutions with throughput and performance requirements that make even the highest-end Xeon look like a joke. AMD tapped into this ecosystem and understood the potential of scaling a CCX. Intel went off into la-la land and yet again arrived at a rather expensive architecture that hits the performance requirements for one segment at a ridiculous cost point and with many cons. The whole ecosystem of computing, and even HPC, is shifting wildly to a new architecture beyond AMD's and Intel's influence. Storage is becoming very low latency and is taking a front seat. Software design will evolve with it.

Your particular company and others found better performance with Intel's current class of server chips. Congrats! It is going to cost you a pretty penny to settle in on it, which is why Intel stays in business and why server sales will be a hard thing for AMD to crack into. That's a completely tangential discussion from the technicals I highlighted. Not everything is glory with Intel's mesh architecture. It is an expensive monolithic design with a large number of shortcomings beyond performance, and in some cases it underperforms AMD's design. So there is 0% blabbering in my post. Your use case isn't everyone's, and I am likely far more seasoned in both hardware and low-level software than you are, which is why I cut right to the point and beyond the bullshit.
Not everything is about a performance use case, which is why far bigger companies are investing in EPYC platforms.

Xeon Silver makes a lot of sales, especially the Silver 4112. What's the core count? 4 cores... What's the core count in a CCX?

You thought I was someone without knowledge or industry experience. You're wrong. I don't need to drill into super-technical details because I'm seasoned enough to pick out the relevant ones and talk about them at a high level. I restate: Intel's mesh architecture is an absolute [...] as an overall approach and baseline for chip design.

I have no doubt that some boutique software groups are comparing every nanosecond of performance they can squeeze out of a particular chip and have no concern about how much it will cost them to invest in such a platform. I'm speaking about the broader, evolving market and much more serious compute loads where AMD's architecture actually beats Intel's.

You tried to insult someone who's in a completely different league, and you're factually wrong in a number of ways.