How To Use Web Sockets (Socket IO) With Digital Ocean Load Balancers And Kubernetes With Ingress Nginx

02/13/2020

Web sockets are awesome, although they’re not exactly new technology. You would assume there’s a wealth of information out there on the internet covering every possible use case you can imagine. However, I recently came across a problem that I had assumed would have a heavily documented solution online, but as it turns out, I found very little.

The problem I was trying to solve was running a multi server, web socket application (using Socket IO) within Kubernetes, on Digital Ocean’s hosted K8S solution, with a Digital Ocean load balancer attached to an Nginx ingress controller. (That’s ingress-nginx, the Kubernetes community controller, not NGINX’s own ingress controller.)

This should be fine... right? Well, as I soon discovered, there are some tricky gotchas here that caught me out. I wasted hours tweaking configuration, so I’m writing this blog post so you don’t have to do the same.

First, the problem and a shameless plug.
We run a dashboard management software product called VuePilot.

VuePilot is multiscreen dashboard rotation management software for your workplace.

One of the core pieces of functionality VuePilot offers is the ability to remotely manage and control your dashboards and TV screens mounted around your office.

We offer a centralised dashboard that you can use to start, stop, update and configure the dashboard screens in your organisation, basically allowing you to be lazy and not get out of your seat to update the screens at the other end of the building.

So, when you click that button to take over a screen and display some new dashboard on it, that happens by way of web sockets. Machines are always online and available for commands from the user at any time.

More info
How To Remotely Manage Office Dashboard Screens
How To Manage Multiple Dashboard Screens From One Machine

Recently we’ve seen a large uptick in users and have decided to break the app apart and move everything into Kubernetes for greater control over our scaling.
For example, the service that handles this remote management behaviour now lives as its own service that purely handles web socket connections.

So, back to the point of the article, rather than offer a “step by step” guide to setting up load balancers and Kubernetes on Digital Ocean (which would be long) I’m just going to run over the sticking points that you will likely hit when you attempt to do it yourself.

I’d like to point out that if anyone feels like correcting me or pointing out another solution on any of these points, please do so; however, this is what worked for me.

Assuming you’ve got your cluster up and running and you’ve configured your ingress-nginx controller with a service of type “LoadBalancer”, which has created your Digital Ocean load balancer, what should you be aware of?
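
You can confirm the load balancer was provisioned and has picked up an external IP with kubectl (the service name and namespace here match the examples later in this post)

# The EXTERNAL-IP column should show the Digital Ocean load balancer's IP
kubectl get svc ingress-nginx -n ingress-nginx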

Use HTTP & HTTPS, Not TCP As Load Balancer Protocols

This one confused me for a while, and I still don’t quite understand why. Here’s what I found.

When Kubernetes provisions the load balancer for you, by default the protocol will be set to TCP with the relevant ports (most likely 80 and 443) being routed to the random ports Kubernetes has assigned to the service.

I was unable to get anything working at all with TCP set as the protocol, which is in contrast to how an AWS ELB works, where TCP is always listed as the protocol and works fine.

Switching to HTTP and HTTPS solved this for me. I suspect playing around with “proxy protocol” may also solve this, but I wasn’t able to get it working for my use case.
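
For reference, if you want to experiment with proxy protocol yourself, the relevant knobs (as I understand them) are an annotation on the load balancer service and a setting in the ingress-nginx ConfigMap, though again, I couldn’t get this combination working for my use case.

# On the ingress controller service: tells the DO load balancer to
# wrap connections in the proxy protocol
service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"

# In the ingress-nginx ConfigMap: tells nginx to expect the proxy protocol header
use-proxy-protocol: "true"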

Using the HTTPS protocol on the load balancer has the added benefit (if you want it) of offloading TLS / SSL termination at the load balancer level, which is not possible when using TCP as the load balancer protocol. You can of course use the SSL Passthrough option if you’d rather terminate SSL at the pod level.
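
If you do go down the SSL Passthrough route, be aware that ingress-nginx needs the feature explicitly enabled on the controller, and the relevant ingress resource annotated to use it. Roughly

# Argument added to the ingress-nginx controller
--enable-ssl-passthrough

# Annotation on the ingress resource being passed through
nginx.ingress.kubernetes.io/ssl-passthrough: "true"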

You can also change the protocol from the user interface, but you should configure it in your Kubernetes manifests to ensure you can restore the configuration if need be.

Configuring HTTPS

Within your ingress controller service configuration, setting the

service.beta.kubernetes.io/do-loadbalancer-certificate-id

annotation to the ID of the certificate you want to use (from Digital Ocean’s certificate manager) will automatically switch your TCP 443 routing to HTTPS 443.

You can create this certificate from the Digital Ocean dashboard under Account > Security. Annoyingly, you then need to use doctl with the command

doctl compute certificate list

to get the certificate ID, as it’s not visible in the dashboard for some reason.
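
If you have a lot of certificates, you can trim the output down to just the columns you need

doctl compute certificate list --format ID,Name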

Configuring HTTP

Within your ingress controller service configuration, setting the

service.beta.kubernetes.io/do-loadbalancer-protocol: http

annotation will switch your TCP 80 routing to HTTP 80.

Here’s an example ingress service config

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: http
    service.beta.kubernetes.io/do-loadbalancer-tls-ports: "443"
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"
    # Use "doctl compute certificate list" to get this ID
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "xxxx-xxxxx-xxxxx"

Side note: You do not need to change the TCP protocol settings in your ingress controller’s port definitions. Mine still look like

ports:
  - name: http
    containerPort: 80
    protocol: TCP
  - name: https
    containerPort: 443
    protocol: TCP

Long Life Certificates For CloudFlare Users

Digital Ocean offers the ability to generate LetsEncrypt certificates for you, providing you host your app’s DNS records with them, which is awesome, unless you are using a service like CloudFlare, in which case it sucks because your DNS will be hosted at CloudFlare.

CloudFlare already provides me with managed certificates and, well, I’m not going back to managing them myself. So how do we get around this?

We only need to secure the transmission between CloudFlare and the DO load balancer; the end user will only ever see the CloudFlare certificate. So really, we just need a valid cert for secure transport, which could even be self signed. Self signed certs, however, just don’t feel right, and CloudFlare has another option we can use.

It’s not completely managed, but CloudFlare’s Origin CA certificates allow us to generate a certificate, signed by CloudFlare, for our domains, with a 15 year expiry, which we can then add to Digital Ocean and assign to our load balancer. If you add another domain you will need to regenerate the certificate to include it, but this is a pretty simple task.

More info here
https://blog.cloudflare.com/cloudflare-ca-encryption-origin/
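
Once you’ve generated the Origin CA certificate and key from the CloudFlare dashboard, you can upload them to Digital Ocean as a custom certificate with doctl. Something like this should do it (the file names are just whatever you saved the cert and key as)

doctl compute certificate create \
  --name cloudflare-origin \
  --type custom \
  --leaf-certificate-path origin-cert.pem \
  --private-key-path origin-key.pem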

Multi Server “Session ID unknown” Disconnect Errors / Broken Long Polling

Chances are, you’ll want to run more than one pod serving your Socket IO servers, in which case you’ll likely use something like Redis as a PubSub backend for communication between the pods (a rough sketch of that wiring is below). When you do, you’re going to hit one ugly problem.
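
This sketch assumes Socket IO 2.x with the socket.io-redis adapter and a Redis service reachable inside the cluster at redis:6379

const io = require('socket.io')(8080);
const redisAdapter = require('socket.io-redis');

// Broadcasts are published through Redis so they reach clients
// connected to other pods
io.adapter(redisAdapter({ host: 'redis', port: 6379 }));

io.on('connection', (socket) => {
  // Events emitted here are relayed to every pod via Redis
});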

Your server logs will show your clients repeatedly connecting and disconnecting every few seconds and your client console will be blasted with “Session ID unknown” errors.

“What the hell?!” I can hear you say... Well, that’s the nice version of what I was saying.

This is the result of multiple levels of load balancing, at both the DO load balancer and the Kubernetes service level, routing you onto different pods.

Socket IO will start by long polling the endpoint, then issue an HTTP upgrade request (answered with a 101 Switching Protocols response) to “upgrade” your connection to web sockets. The problem here is that the follow up request doesn’t land on the same pod, and so… “Session ID unknown”

There are two ways to solve this

Solution 1: Use Session Affinity

Session affinity essentially means sticky sessions: any follow up requests from the same user will be routed to the same pod.

This will allow you to keep using Socket IO’s default behaviour of long polling first, then upgrading to web sockets.

Just be aware that heavy traffic users making many requests will not be load balanced to other pods. This is fine for web sockets, but if you are serving other static content and API requests from the same app, individual users’ requests will not be spread across your pods, which may screw a little with your load balancing strategy.

To do this, you simply set the “affinity” annotation at the ingress level. This will set a “route” cookie containing a hash, which nginx remembers and uses to route follow up requests to the same upstream pod.

Here’s an example ingress definition

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
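
You can sanity check that the affinity cookie is being set with a quick curl against your ingress (substituting your own host)

# Look for a "route" cookie in the response headers
curl -s -o /dev/null -D - https://ws.myapp.com/ | grep -i set-cookie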

Solution 2: Disable Long Polling

You can disable long polling altogether and go straight to web sockets, which prevents this issue whilst not interfering with the natural load balancing. You’ll want to be sure that your users’ browsers will be fine with this before enabling it, as the long poll upgrade is a fairly nice feature of Socket IO and offers a fallback in case of web socket failure.

To remove long polling and force web sockets only, simply set the transports property in your client to “websocket”

const ioSocket = io('https://ws.myapp.com', {
  transports: ['websocket'],
});
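
If you control the server, you can enforce the same restriction there too, so polling requests are rejected outright. Again, a rough sketch assuming Socket IO 2.x

// Server side: only accept the websocket transport
const io = require('socket.io')(8080, {
  transports: ['websocket'],
});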

Excessive Client Reconnects

By default, our clients will reconnect every 60 seconds, as per the default nginx “proxy-read-timeout” configuration. This is pretty excessive, so let’s make it something longer, like an hour (3600 seconds).

Again, we can configure this in the ingress annotation definition

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"

Secure Web Sockets (WSS)

With the above protocol changes in place and the load balancer terminating TLS on 443 with our new certificate, we can now force upgrades to WSS instead of WS connections, which will be encrypted all the way to the entry point of our cluster. As mentioned above, you can also use SSL Passthrough if you want to terminate at the pod level.

An example of how you can force WSS from your client side code

const ioSocket = io("https://ws.myapp.com", {
  secure: true,
});

The usage of HTTPS in the URL will also tell Socket IO to upgrade to secure transmission.

Redirecting HTTP To HTTPS

Rather than do this at the app level, we can do it at the load balancer level by setting the

do-loadbalancer-redirect-http-to-https

annotation to true in our ingress controller service definition.

Example

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"

Enable CORS

Depending on your application you’ll possibly want to enable CORS (Cross Origin Resource Sharing) on your ingress to allow clients to connect from other domains.

Again, this is done at the ingress resource level with annotations. Here’s an example CORS configuration that essentially opens the ingress to all origins.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "PUT, GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Access-Control-Allow-Origin: $http_origin";

The configuration snippet echoes the requesting origin back in the Access-Control-Allow-Origin header. This is needed because browsers will reject a wildcard origin when credentials are allowed.

The Final Configuration

Most of what’s been mentioned above happens in the ingress controller service and ingress resource definitions.
Here’s what the final configuration files look like

Ingress Resource

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: vuepilot-node
  namespace: vuepilot
  annotations:
    #kubernetes.io/ingress.class: nginx-general
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "PUT, GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Access-Control-Allow-Origin: $http_origin";
spec:
  rules:
    - host: ws.vuepilot.com
      http:
        paths:
          - path: /
            backend:
              serviceName: vuepilot-node
              servicePort: 8080

Ingress Controller Service

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: http
    service.beta.kubernetes.io/do-loadbalancer-tls-ports: "443"
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"
    # Use "doctl compute certificate list" to get this ID
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "xxx-xxx-xxx"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 80
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
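
Apply both with kubectl and you should be in business (the file names here are just placeholders)

kubectl apply -f ingress-nginx-service.yaml
kubectl apply -f vuepilot-ingress.yaml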

Here’s hoping these tips save you some time and hassle 🙂