NVIDIA ConnectX
Sometimes you just want to go fast. We have been discussing 400Gbps networking recently as a new capability that PCIe Gen5 x16 slots can handle. Today, we are going to take a look at setting that up using NDR 400Gbps InfiniBand/ 400GbE.
A special thanks to PNY. We did not know this a year ago, but PNY not only sells NVIDIA workstation GPUs but also NVIDIA's networking components. We were working on a 400GbE switch, and in those discussions, it came up that we should review these cards as part of that process. That may sound easy enough, but it is a big jump from 100GbE to 400GbE networking, and the MCX75310AAS-NEAT cards are hot commodities right now because of how many folks are looking to deploy high-end networking gear.
The ConnectX-7 (MCX75310AAS-NEAT) is a PCIe Gen5 x16 low-profile card. We took photos with the full-height bracket, but there is a low-profile bracket in the box.
Something that should make folks take notice is the size of the cooling solution. Just to give some sense of how early we are in this, we looked up the power specs on the ConnectX-7 and could not find them. We asked NVIDIA through official channels for the specs. We are publishing this piece without them since it seems as though NVIDIA is unsure of the figure at the moment. It is a bit strange that NVIDIA does not just publish the power specs for these cards in its data sheet.
Here is the back of the card with a fun heatsink backplate.
Here is a side view of the card looking from the PCIe Gen5 x16 connector.
Here is another view looking from the top of the card.
Here is a view looking from the direction airflow is expected to travel in most servers.
For some quick perspective here, this is a low-profile single-port card running at 400Gbps speeds. That is an immense amount of bandwidth.
With a card like this, one of the most important aspects is getting it installed in a system that can utilize the speed.
Luckily, we installed these in our Supermicro SYS-111C-NR 1U and Supermicro SYS-221H-TNR 2U servers, and they worked without issue.
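If you want to confirm the card actually trained at Gen5 x16 before chasing anything else, a quick sysfs read is enough. Here is a minimal Python sketch; the interface name is a placeholder for whatever netdev your ConnectX-7 enumerates as on your system, and the exact link-speed string can vary slightly between kernel versions.

```python
from pathlib import Path

# Placeholder interface name -- substitute whatever your ConnectX-7 shows up
# as in `ip link` on your system.
IFACE = "enp1s0np0"

dev = Path("/sys/class/net") / IFACE / "device"

# Negotiated PCIe link. A healthy Gen5 x16 slot should report a speed of
# "32.0 GT/s PCIe" (string format varies slightly by kernel) and a width of
# 16; anything lower means the card is link-limited before traffic ever
# hits the wire.
speed = (dev / "current_link_speed").read_text().strip()
width = (dev / "current_link_width").read_text().strip()

print(f"{IFACE}: PCIe link {speed} x{width}")
if "32.0 GT/s" not in speed or width != "16":
    print("Warning: not a Gen5 x16 link -- check slot wiring and BIOS settings.")
```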
The SYS-111C-NR made us appreciate single-socket nodes since we did not have to avoid socket-to-socket traversal when we set up the system. At 10/40Gbps speeds, and even 25/50Gbps speeds, we hear folks discuss traversing socket-to-socket links as a performance challenge. With 100GbE, the issue became more acute, and it became very common to have one network adapter per CPU to avoid traversal. With 400GbE speeds, the impact is significantly worse. If you are using dual-socket servers with a single 400GbE card, it might be worth looking into the multi-host adapters that can connect directly to each CPU. A quick way to check which socket a card hangs off of is shown below.
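To see whether your traffic will have to cross that socket-to-socket link, you can ask the kernel which NUMA node and CPUs are local to the adapter. Again, this is a rough sketch with a placeholder interface name; pinning a traffic generator to the reported CPU list is one way to keep the data path on the local socket.

```python
from pathlib import Path

# Placeholder interface name again -- use the netdev your ConnectX-7 creates.
IFACE = "enp1s0np0"

dev = Path("/sys/class/net") / IFACE / "device"

# NUMA node the adapter's PCIe slot is attached to (-1 means the kernel
# reports no affinity, which is common on single-socket boards).
numa_node = (dev / "numa_node").read_text().strip()

# CPUs local to that node; keeping a benchmark on these avoids crossing
# the socket-to-socket link.
local_cpus = (dev / "local_cpulist").read_text().strip()

print(f"{IFACE}: NUMA node {numa_node}, local CPUs {local_cpus}")
if numa_node != "-1":
    print(f"Example pinning: numactl --cpunodebind={numa_node} --membind={numa_node} <benchmark>")
```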
Once the cards were installed, we had the next challenge. The cards use OSFP cages. Our 400GbE switch uses QSFP-DD.
The two standards are a bit different in terms of their power levels and physical design. One can adapt QSFP-DD to OSFP, but not the other way around. If you have never seen an OSFP optic or DAC, they have their own thermal management solution. QSFP-DD relies on heatsinks on the QSFP-DD cages, while OSFP modules often include the cooling solution on the module itself, which is the case on our lab's OSFP DACs and optics.
That brought us to a few days of panic. The $500 Amphenol OSFP DACs, as well as the OSFP to QSFP-DD DACs we had on hand, utilized the heatsink cooling solution. We sent everything off to the lab to get hooked up, only to get a note back that the OSFP ends of the DACs did not fit into the OSFP ports of the ConnectX-7 cards because of the direct cooling on the DACs.
The reason NVIDIA is using OSFP is likely the higher power level. OSFP allows for 15W optics while QSFP-DD is limited to 12W. Early in adoption cycles, a higher power ceiling makes adoption easier, which is one of the reasons there are things like 24W CFP8 modules. On the other hand, we have already looked at FS 400Gbase-SR8 400GbE QSFP-DD optics, so that market is moving.
A few calls later, we had cables that would work. Our key takeaway, whether you are using ConnectX-7 OSFP adapters today or reading this article five years from now when they are inexpensive second-hand gear, is to mind the heatsink size on the OSFP end you plug into the ConnectX-7. If you are used to QSFP/ QSFP-DD, where everything just plugs in and works, running into silly issues like connector sizes is a bigger challenge. On the other hand, if you are a solution provider, this is an opportunity for professional services support. NVIDIA and resellers like PNY also sell LinkX cables, which would have been an easier route. That is a great lesson learned.
Also, thank you to the anonymous STH reader who helped us out with getting the cables/ optics for a few days on loan. They wished to remain anonymous since they were not supposed to loan the 400G cables/ optics they had.
Next, let us get this all set up and working.