WebRTC

Table of Contents

Introduction

You may be wondering how web applications like Google Meet enables video chat functionality, third-party APIs? Turns out that modern browsers has a native set of APIs that enables peer-to-peer communications, called WebRTC. You may be curious about what WebRTC is and its application in integrating video chat into web applications. Allow me to guide you through its basics and uses.

WebRTC equips both browsers and mobile apps with the capability of real-time communication (RTC) through straightforward APIs. A standout characteristic of WebRTC is its peer-to-peer architecture, which enables data transmission directly between users, bypassing the need for a server. This approach significantly boosts both speed and efficiency.

Signaling Server

In order to establish a peer-to-peer connection, you will first need a central server so that the peers could communicate first. The signalling server essentially acts as a proxy, user send activities to it and it will broadcast to the rest of the peers as signals. This is usually achieved with WebSocket. For example, when a user joins a room, the user makes a request to the server and then is broadcasted with the peers, so the peers within the same channels are notified that the user has joined.

Example signal

{
  type: "EXCHANGE",
  from: userId,
  to: otherUserId,
  sdp: JSON.stringify(pc.localDescription),
}

ICE candidates

You might be wondering now why do we need a signaling server? The peer-to-peer connection can only be established after the peers have exchanged their ICE candidates, and that’s why we need a signaling server. ICE (Interactive Connectivity Establishment) is an essential part of WebRTC that enables real-time communication even in the face of complex network scenarios. An ICE candidate is a potential method that two peers on a WebRTC session can use to communicate. Each candidate is a combination of a transport address (a combination of IP address and a port number), a transport protocol (usually UDP or TCP), and a type that reflects the candidate’s network topology.

ICE candidates are collected for each type of media in the connection (be it audio, video, or data). The process of gathering these candidates is known as “ICE gathering.”

There are three primary types of ICE candidates:

Once a peer gathers an ICE candidate, it sends a message over the signaling channel to the other peer. This message includes the candidate’s transport address, protocol, and type. The receiving peer adds the candidate to its RTCPeerConnection by calling the addIceCandidate() method. The process continues until all candidates have been gathered and sent to the other peer.

Ex: Broadcasting your ICE candidates

peerConnection.addEventListener('icecandidate', event => {
    if (event.candidate) {
      this.signalingChannel.send({
        type: EXCHANGE,
        from: userId,
        to: otherUserId,
        sdp: JSON.stringify(event.candidate),
      })
    }
});

webrtc-overview

A diagram on how ICE candidates are retrived and exchanged.

STUN and TURN servers

STUN and TURN servers play a crucial role in ICE:

Why you need a TURN server? Sometimes, you will need a server in between to bypass restrictions such as firewalls. This is common if a user is on Cellular as opposed to Wi-Fi where Cellular has more restrictions. Thus the connection has to be established through a TURN server as it usually runs on port 80 or 443 to bypass the restrictions. There are several solutions on the internet, but they all come with a cost. There is a way around this is to host your own TURN server with coturn.

turn-server-21d0a088ff33ba667aa62b5101226d57-2 Image from https://www.metered.ca/tools/openrelay/assets/images/turn-server-21d0a088ff33ba667aa62b5101226d57.png

Signalling and Establishing Connection

Establishing a WebRTC connection involves several steps and operations. Here’s a more detailed explanation of each step.

Streaming

After the connection is established and say you receive the stream from the other user, this is where the on ontrack event handler comes in place. With the event handler, you can now start streaming audio and video between the peers.

const remoteVideo = document.querySelector('#remoteVideo');

peerConnection.addEventListener('track', async (event) => {
    const [remoteStream] = event.streams;
    remoteVideo.srcObject = remoteStream;
});

Example from https://webrtc.org/getting-started/remote-streams

Other Architectures

The peer-to-peer (P2P) model in WebRTC, while advantageous for small-scale interactions, faces significant scalability challenges as the number of participants in a group call increases. This exponential growth in connections can quickly become impractical and resource-intensive. For instance, in a 100-person video conference, the necessity to establish 4950 unique connections (calculated as n(n-1)/2 for n participants) demonstrates the limitation of the P2P model for large groups. This is where alternative architectures like SFU (Selective Forwarding Unit) and MCU (Multipoint Control Unit) come into play, offering more scalable solutions for WebRTC applications.

Additional Resources