
Infiniband Fabric

 


The clusters differ in how their Infiniband network fabric is optimized. Both use what is referred to as a "fat tree" topology: individual nodes are connected to "leaf" switches, and the leaf switches are connected to "core" switches. The clusters differ in the amount of bandwidth available between leaf switches.

An ideal fat tree has no over-subscription, in other words it has full bisection bandwidth: for any arbitrary division of the cluster into pairs of nodes, all pairs can communicate simultaneously at full bandwidth. This means that every leaf switch must have as much bandwidth into the core as is needed for all of the nodes connected to it. For current-generation nodes, which are able to completely fill one link, this means that on a 36-port switch (as we have) only 18 nodes can be connected, and the other 18 ports connect into the core. If you connect 24 nodes and have only 12 uplinks to the core, we refer to that as 2:1 over-subscribed.
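The over-subscription ratio described above is simple arithmetic on the ports of one leaf switch. The following is a minimal sketch, assuming every link runs at the same speed and every node can fill its link; the function name `leaf_subscription` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction

SWITCH_PORTS = 36  # port count of the switches described above


def leaf_subscription(nodes, uplinks, switch_ports=SWITCH_PORTS):
    """Describe how one leaf switch is subscribed.

    Assumes all links have equal bandwidth, so the ratio of node ports
    to uplink ports is the over-subscription ratio.
    """
    if nodes < 0 or uplinks <= 0:
        raise ValueError("need at least one uplink and a non-negative node count")
    if nodes + uplinks > switch_ports:
        raise ValueError("nodes plus uplinks exceed the ports on the switch")
    ratio = Fraction(nodes, uplinks)  # reduced automatically, e.g. 24/12 -> 2/1
    if ratio <= 1:
        return "not over-subscribed"
    return f"{ratio.numerator}:{ratio.denominator} over-subscribed"


print(leaf_subscription(18, 18))  # not over-subscribed
print(leaf_subscription(24, 12))  # 2:1 over-subscribed
```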

The 12s cluster is not over-subscribed. There are 276 nodes attached to 16 leaf switches (16 to 18 nodes per switch). Each leaf switch has 16 to 18 uplinks, divided into pairs of uplinks going to 9 small core switches (the same hardware as the leaf switches, so 36 ports each). Each core switch could therefore accept connections from up to 18 leaf switches. Traffic from one node to a node on another leaf switch therefore passes through 3 switches over 4 links: node=>leaf=>core=>leaf=>node, and the switching fabric is referred to as a 3-layer fabric. A 3-layer fabric with no over-subscription requires 3 switch ports and 2 cables per host node. "Pruning" the fat tree, that is, giving each leaf switch fewer uplinks than it has nodes, is often done for cost reasons, to reduce the number of switch ports and cables required.

All uplinks are in pairs to provide fault tolerance against the failure of a single switch port or cable.
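The port and cable counts quoted for the 12s fabric can be checked with a short calculation. This is a rough sketch under stated assumptions: 36-port switches, 9 core switches, 2 uplinks from each leaf switch to each core switch, and equal-speed links. The names `fabric_limits` and `ports_and_cables` are hypothetical helpers, not part of any cluster tooling.

```python
def fabric_limits(core_switches=9, switch_ports=36, links_per_pair=2):
    """Upper bounds for a leaf/core fat tree with paired uplinks.

    Assumes every leaf switch runs `links_per_pair` uplinks to every
    core switch and that the fabric is not over-subscribed.
    """
    uplinks_per_leaf = core_switches * links_per_pair
    if uplinks_per_leaf >= switch_ports:
        raise ValueError("no ports left on the leaf switch for nodes")
    # A core switch spends `links_per_pair` ports on each leaf switch.
    max_leaf_switches = switch_ports // links_per_pair
    # Without over-subscription, nodes per leaf cannot exceed its uplinks.
    nodes_per_leaf = min(uplinks_per_leaf, switch_ports - uplinks_per_leaf)
    return {
        "uplinks_per_leaf": uplinks_per_leaf,
        "max_leaf_switches": max_leaf_switches,
        "max_nodes": max_leaf_switches * nodes_per_leaf,
    }


def ports_and_cables(nodes, uplinks):
    """Switch ports and cables used by one leaf switch and its uplinks.

    Each node uses 1 leaf port and 1 cable; each uplink uses 1 leaf port,
    1 core port, and 1 cable.
    """
    return nodes + 2 * uplinks, nodes + uplinks


limits = fabric_limits()
print(limits["max_leaf_switches"])  # 18 leaf switches per core switch

ports, cables = ports_and_cables(nodes=18, uplinks=18)
print(ports / 18, cables / 18)      # 3.0 ports and 2.0 cables per node

ports, cables = ports_and_cables(nodes=24, uplinks=12)
print(ports / 24, cables / 24)      # 2.0 ports and 1.5 cables per node (2:1 pruned)
```

The last call shows why pruning saves money: at 2:1 over-subscription each node needs one fewer switch port and half a cable less than in the non-over-subscribed case.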