With the growing use of the internet, more people are storing data online. As a result, the number of data centers is increasing day by day. As data centers multiply, the services running in them, such as data analysis, processing, and storage, are also multiplying rapidly. This expansion makes the complexity of data center infrastructure a significant consideration.
The number of applications using the cloud is also increasing rapidly. These applications are written to be deployed across tens of thousands of machines, but in doing so, they put a strain on the data center’s networking fabric. Data processing software like MapReduce, Bigtable, and Dryad shuffles a significant quantity of data between machines. Similarly, distributed file systems like GFS move vast amounts of data between end systems. For maximum flexibility when deploying new applications, it is critical that any machine can play any role without creating hotspots in the network fabric.
Nowadays, large data centers contain hundreds of thousands of switches and servers, with data kept on the servers. Each node in a large data center network carries multiple flows. As a result, structuring massive data centers so that data access is simple and reliable is extremely difficult. Because the data is distributed, storing it across different data centers in an atomic and consistent manner is also a difficult task.
With many users, the large and unpredictable network traffic can cause congestion and load imbalance in particular portions of the data center network. Throughput and latency suffer as a result, which degrades application performance. In latency-sensitive systems such as web search, it also affects correctness and revenue. Therefore, this problem of network load dynamics needs to be addressed.
Different physical network topologies, such as FatTree, BCube, and VL2, are being used to address this problem. The FatTree and BCube topologies have been shown to improve bandwidth utilization and reduce congestion on bottleneck links in large data centers. However, because a Transmission Control Protocol (TCP) connection between two nodes uses only a single path, it cannot use the bandwidth of all the paths available between those nodes simultaneously in a data center network (DCN).
Multipath Transmission Control Protocol (MPTCP)
Multipath Transmission Control Protocol (MPTCP) addresses TCP’s inability to use the available bandwidth of several paths at once in a DCN. It was proposed as a replacement for TCP that transfers data reliably and efficiently. A significant advantage of this technology is that the integrated congestion controller in each MPTCP end system can act on very short timescales to move its traffic from more congested paths to less congested ones. Theory suggests that such behavior can be both stable and effective at load balancing the entire network.
It’s a drop-in replacement for TCP that, in theory, can improve data center application performance by aggregating network bandwidth across multiple paths, increasing resilience against network failures, and improving performance under congestion. MPTCP transfers data over multiple subflows simultaneously, so a connection can use the bandwidth of all the paths available to a node, which results in better throughput.
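To illustrate what "drop-in" can mean for an application, here is a minimal sketch that asks the operating system for an MPTCP socket and falls back to plain TCP when MPTCP is unavailable. It assumes a Linux host: `socket.IPPROTO_MPTCP` is exposed by Python 3.10 and later, and the numeric fallback 262 is the Linux protocol number. The helper name `open_stream_socket` is hypothetical and used only for illustration.

```python
import socket

# Python 3.10+ on Linux exposes IPPROTO_MPTCP; 262 is the Linux protocol
# number, used here as a fallback for older interpreters (assumption: Linux).
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return (sock, used_mptcp): an MPTCP socket if possible, else plain TCP."""
    try:
        sock = socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
        return sock, True
    except OSError:
        # Kernel without MPTCP support, MPTCP disabled, or a non-Linux platform.
        sock = socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        return sock, False


sock, used_mptcp = open_stream_socket()
print("MPTCP" if used_mptcp else "plain TCP", "socket created")
sock.close()
```

The rest of the application code (connect, send, receive) stays the same as with TCP; whether additional subflows are actually created depends on the kernel configuration and on the peer also supporting MPTCP.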
Components of data center network architectures:
There are four major components of data center network architectures:
- Physical topology
- Routing over the topology
- Selection between the paths supplied by routing
- Congestion control of traffic on the selected paths
Typically, data centers have been built with hierarchical topologies: racks of hosts link to a top-of-rack switch, which connects to aggregation switches, which in turn connect to a core switch. Such topologies make sense if the majority of traffic flows in and out of the data center. If most traffic stays within the data center, however, the bandwidth is distributed unevenly. More recent topologies such as FatTree and VL2 use multiple core switches to provide full bandwidth between all pairs of hosts. FatTree uses a large number of lower-speed links, while VL2 uses fewer, higher-speed links.
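To give a feel for how many switches and parallel paths a FatTree involves, the sketch below computes the element counts of a k-ary fat tree. It assumes the common construction in which every switch has k ports and the network is organized into k pods; the function name `fat_tree_sizes` and the dictionary keys are illustrative, not taken from any library.

```python
def fat_tree_sizes(k):
    """Element counts for a k-ary fat tree built from identical k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,          # k/2 edge switches per pod
        "aggregation_switches": k * half,   # k/2 aggregation switches per pod
        "core_switches": half * half,       # (k/2)^2 core switches
        "hosts": k * half * half,           # k^3/4 hosts
        # Each core switch gives one distinct path between hosts in different pods.
        "paths_between_pods": half * half,
    }


for k in (4, 48):
    print(k, fat_tree_sizes(k))
```

With k = 4 this gives 16 hosts and 4 core switches; with k = 48 it gives 27,648 hosts. The number of equal-cost paths between pods grows as (k/2)^2, which is the path diversity that a single TCP connection cannot exploit.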
In large data centers, there are multiple paths between each pair of hosts. Selecting a lightly loaded path among several congested ones is therefore quite challenging. To deal with this situation, different routing techniques are used for better path selection in a DCN. For example, Equal-Cost Multi-Path (ECMP) is one of the algorithms used for routing in DCNs. It spreads load by choosing one of the equal-cost paths for each flow effectively at random, typically by hashing the flow’s header fields. The limitation of ECMP is that this choice ignores the current load, so it does not give good results when traffic rises, whether over time or on demand.
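The per-flow choice made by ECMP can be sketched as a hash of the flow identifier taken modulo the number of equal-cost paths. This is a simplified model: real switches use their own hardware hash functions, and the flow tuples, the use of SHA-256, and the name `ecmp_path` are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def ecmp_path(flow, n_paths):
    """Pick a path index by hashing the flow's five-tuple (static, load-unaware)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths


# Hypothetical flows: (src_ip, dst_ip, src_port, dst_port, protocol)
flows = [("10.0.0.1", "10.0.1.1", 40000 + i, 80, "tcp") for i in range(8)]
load = Counter(ecmp_path(f, n_paths=4) for f in flows)
print(dict(load))  # flows per path; an even spread is not guaranteed
```

Because the mapping depends only on the header fields, two large flows that hash to the same path stay there even if other paths are idle, which is the limitation described above.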
Several congestion control algorithms are available for MPTCP, such as the linked increase algorithm (LIA), the opportunistic linked increase algorithm (OLIA), and balanced linked adaptation (BALIA). All of them aim to solve congestion problems while making good use of the available paths. In single-path TCP, each source maintains a single congestion window for transferring data and updates it as acknowledgments arrive. MPTCP builds on the same basis as single-path TCP: it transfers data and updates a congestion window for each subflow.
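As a baseline for the multipath algorithms, the single-path window update can be sketched as additive increase, multiplicative decrease in the congestion avoidance phase. The sketch measures the window in segments and ignores slow start, timeouts, and fast recovery; the function names and the floor of one segment are simplifying assumptions.

```python
def on_ack(cwnd):
    """Congestion avoidance: grow by about one segment per round trip."""
    return cwnd + 1.0 / cwnd


def on_loss(cwnd):
    """Multiplicative decrease: halve the window, keeping at least one segment."""
    return max(cwnd / 2.0, 1.0)


cwnd = 10.0
for _ in range(10):      # roughly one round trip worth of ACKs
    cwnd = on_ack(cwnd)
print(round(cwnd, 2))    # close to 11 segments
cwnd = on_loss(cwnd)
print(round(cwnd, 2))    # about half of the previous value
```

The MPTCP algorithms keep this per-ACK and per-loss structure for every subflow and differ mainly in how large the per-ACK increase is.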
Linked Increase Algorithm (LIA):
In the LIA MPTCP algorithm, the congestion window of a subflow is increased on receiving each acknowledgment during the congestion avoidance phase. LIA continues to adjust the window size for each path, with the increase on one subflow coupled to the windows of the others. Because LIA keeps growing the windows, including on congested paths, performance can decrease. Its main limitation is that it may fail to balance load properly.
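The coupled increase of LIA can be written down compactly. The sketch below follows the increase rule of RFC 6356, simplified to windows measured in segments with one segment acknowledged per ACK; the function names are illustrative, and the loss response (halving the affected subflow's window, as in TCP) is omitted.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor alpha shared by all subflows of one connection."""
    total = sum(cwnds)
    best = max(w / (rtt * rtt) for w, rtt in zip(cwnds, rtts))
    denom = sum(w / rtt for w, rtt in zip(cwnds, rtts)) ** 2
    return total * best / denom


def lia_on_ack(cwnds, rtts, i):
    """Return the windows after one ACK arrives on subflow i."""
    alpha = lia_alpha(cwnds, rtts)
    # Coupled increase, capped so a subflow is never more aggressive than TCP.
    increase = min(alpha / sum(cwnds), 1.0 / cwnds[i])
    updated = list(cwnds)
    updated[i] += increase
    return updated


# Two subflows with equal windows (segments) and equal RTTs (seconds).
print(lia_on_ack([10.0, 10.0], [0.1, 0.1], 0))
```

With a single subflow, alpha evaluates to 1 and the increase reduces to the usual 1/cwnd of TCP. With two identical subflows, each ACK increases the window by a quarter of what an independent TCP flow would add, so the connection as a whole is not more aggressive than one TCP flow on a shared bottleneck.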
Opportunistic Linked Increase Algorithm (OLIA):
OLIA is another algorithm for MPTCP that addresses the load-balancing problem of LIA. It improves load balancing but can be unresponsive to network changes. OLIA classifies the paths of an MPTCP connection into three sets: the Best paths, the Max paths (those with the largest congestion windows), and the Collected paths (Best paths that are not Max paths). It then adjusts the congestion window of each path according to the set it belongs to.
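The role of the three path sets can be sketched as follows. This is a simplified reading of the published OLIA description, with windows in segments: the best paths are taken to be those maximizing l^2/RTT, where l is the amount of data sent between the last two losses on a path. The function names and the `inter_loss` argument are assumptions made for illustration, not an implementation from any kernel.

```python
def olia_alphas(cwnds, rtts, inter_loss):
    """Per-path correction terms derived from the Best, Max, and Collected sets."""
    n = len(cwnds)
    quality = [l * l / rtt for l, rtt in zip(inter_loss, rtts)]
    best = {i for i in range(n) if quality[i] == max(quality)}
    max_w = {i for i in range(n) if cwnds[i] == max(cwnds)}
    collected = best - max_w          # good paths that still have small windows
    alphas = [0.0] * n
    if collected:
        for i in collected:           # speed up under-used good paths
            alphas[i] = 1.0 / (n * len(collected))
        for i in max_w:               # slow down paths holding the largest window
            alphas[i] = -1.0 / (n * len(max_w))
    return alphas


def olia_on_ack(cwnds, rtts, inter_loss, r):
    """Return the windows after one ACK arrives on path r."""
    alphas = olia_alphas(cwnds, rtts, inter_loss)
    denom = sum(w / rtt for w, rtt in zip(cwnds, rtts)) ** 2
    increase = (cwnds[r] / rtts[r] ** 2) / denom + alphas[r] / cwnds[r]
    updated = list(cwnds)
    updated[r] += increase
    return updated


# Path 1 is the better path but has the smaller window, so it gets a boost.
print(olia_alphas([10.0, 4.0], [0.1, 0.1], [100.0, 400.0]))
```

In this example path 0 holds the largest window and path 1 is the best path, so the correction terms are -0.5 and 0.5: traffic is shifted toward the better path. When the best paths already hold the largest windows, all terms are zero.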
Balanced Linked Adaptation (BALIA):
Balanced linked adaptation (BALIA) is the most recent of these MPTCP algorithms; it generalizes both LIA and OLIA.
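A sketch of the BALIA window updates is given below, following the published description of the algorithm with windows in segments and x = cwnd / RTT as the rate of each path. The function names and the floor of one segment after a loss are assumptions made for illustration.

```python
def balia_on_ack(cwnds, rtts, r):
    """Return the windows after one ACK arrives on path r."""
    rates = [w / rtt for w, rtt in zip(cwnds, rtts)]
    alpha = max(rates) / rates[r]     # 1 on the fastest path, larger elsewhere
    increase = (
        rates[r] / (rtts[r] * sum(rates) ** 2)
        * ((1 + alpha) / 2)
        * ((4 + alpha) / 5)
    )
    updated = list(cwnds)
    updated[r] += increase
    return updated


def balia_on_loss(cwnds, rtts, r):
    """Return the windows after a loss is detected on path r."""
    rates = [w / rtt for w, rtt in zip(cwnds, rtts)]
    alpha = max(rates) / rates[r]
    decrease = (cwnds[r] / 2) * min(alpha, 1.5)
    updated = list(cwnds)
    updated[r] = max(cwnds[r] - decrease, 1.0)  # assumed floor of one segment
    return updated


print(balia_on_ack([10.0, 4.0], [0.1, 0.1], 1))
print(balia_on_loss([10.0, 4.0], [0.1, 0.1], 0))
```

With a single path, alpha equals 1, so the increase is 1/cwnd and the decrease is cwnd/2, which matches single-path TCP.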
Compared with single-path TCP, MPTCP is a simple approach that can effectively use the dense parallel network topologies proposed for modern data centers, considerably improving throughput and fairness. MPTCP combines path selection and congestion control, directing traffic to where capacity is available. This adaptability enables the design of better and more cost-effective network topologies.