OpenStack Load Balancers

MetaKube allows you to easily publish your Service behind a load balancer with an external IP address.
For clusters on SysEleven Stack, the load balancers are provided by OpenStack Octavia.

Kubernetes Service of type LoadBalancer

The OpenStack cloud controller manager will automatically create an Octavia load balancer for each Service with spec.type: LoadBalancer.

Minimal Service manifest

A minimal Kubernetes manifest that creates a load balancer:

apiVersion: v1
kind: Service
metadata:
  name: ingress-controller
spec:
  type: LoadBalancer
  selector:
    app: ingress-controller
  ports:
    - port: 80
      targetPort: http
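
After applying the manifest (assuming it is saved as service.yaml), the cloud controller manager provisions the load balancer in the background; you can watch the Service until the external IP appears:

kubectl apply -f service.yaml
# Wait until the EXTERNAL-IP column changes from <pending> to an address
kubectl get service ingress-controller --watch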

Limitations

  • MetaKube does not support Layer 7 (HTTP) Octavia load balancers

    See Ingress Controllers for more information.

  • MetaKube does not support UDP Octavia load balancers.

    A Service that exposes a UDP port will stay in status Pending, since the cloud controller manager will not provision a load balancer for it.

Ingress Controllers

To route HTTP traffic to different applications in your cluster, create an ingress controller behind your LoadBalancer Service.
It is dynamically configured through Ingress objects and can also handle things like TLS termination.
For more information, see the official Kubernetes documentation.

The Create an Ingress Controller tutorial guides you through installing an Nginx Ingress Controller with automatic Let's Encrypt TLS certificate support.
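
As a rough sketch of what this dynamic configuration looks like, an Ingress object along these lines (hostname and backend Service name are placeholders) tells the ingress controller where to route matching HTTP requests:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80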

Configuration

The load balancer is configured through fields of the Service resource and through annotations on it.

externalTrafficPolicy

We strongly recommend setting externalTrafficPolicy: Local.

There are two possible values for spec.externalTrafficPolicy: Cluster (default) and Local.
The option determines to which endpoints connections arriving at the Service's nodePort are load balanced.

  1. Cluster (default)

    Every packet is forwarded to a Pod from the Service's Endpoints (likely on a different node), always involving DNAT and SNAT.
    A packet may therefore traverse two nodes before it reaches the Pod.

    The additional port allocation for SNAT can cause port collisions and thus failures to add entries to the node's conntrack table.
    This typically leads to TLS handshake timeouts: the connection with the load balancer is established, but the load balancer fails to contact a backend.

  2. Local (recommended)

    The node only forwards packets directly to Pods on the same node.
    If the node doesn't run a Pod of the Service, traffic to the NodePort is dropped.

    Since this also applies to the load balancer's health check, nodes (the load balancer's backend members)
    without Pods of the Service are excluded from the load balancer's active backend pool and receive no further traffic.

    Since the packet stays on the node, SNAT is not necessary.
    This reduces latency (the TCP and TLS handshakes alone involve about six round trips) and avoids the port allocation issues described above.

    If you don't specify spec.healthCheckNodePort, the service controller allocates a port from the cluster's NodePort range (see the official Kubernetes documentation).
    If your security groups block ports from that range, you need to set healthCheckNodePort to an allowed port, as in the sketch below, otherwise the health checks won't succeed.
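
    A minimal sketch of the recommended configuration, extending the manifest from above; the healthCheckNodePort value 32000 is only an example of a port your security groups allow:

    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-controller
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local
      # Example value: choose a port that your security groups allow
      healthCheckNodePort: 32000
      selector:
        app: ingress-controller
      ports:
        - port: 80
          targetPort: http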

Caveats with Local

  • The setting doesn't preserve the original client IP.

    Destination hosts see the source IP of the load balancer, not that of the client, which is usually not desirable.
    The original client IP is "lost" at the load balancer, since it uses a separate TCP connection to the backend.

    To preserve the original client IP with a load balancer, you may use Proxy Protocol.

  • The setting also applies to internal traffic.

    Kube-proxy in IPVS mode (default and highly recommended) doesn't differentiate between internal and external traffic here.
    To get around this, deploy your proxy as a DaemonSet, so that there's a valid endpoint for the Service on each node (see the sketch below).
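
    A minimal sketch of such a DaemonSet; the image is a placeholder for your proxy of choice, and the labels match the Service selector from the manifest above:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ingress-controller
    spec:
      selector:
        matchLabels:
          app: ingress-controller
      template:
        metadata:
          labels:
            app: ingress-controller
        spec:
          containers:
            - name: proxy
              # Placeholder image: deploy the reverse proxy you actually use
              image: traefik:v2.10
              ports:
                - name: http
                  containerPort: 80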

Proxy Protocol

The Proxy Protocol is an industry standard to pass client connection information through load balancers to the destination server.
Activating the proxy protocol allows you to retain the original client IP address and see it in your application.

To use Proxy Protocol, specify the loadbalancer.openstack.org/proxy-protocol: "true" annotation in your Service.

Changing the annotation after the creation of the load balancer does not reconfigure the load balancer!
To migrate a Service of type LoadBalancer to Proxy Protocol, you need to recreate the Service.
To avoid losing the associated Floating IP, see Keeping load balancer floating IP.
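
A minimal sketch of a new Service with the annotation set:

apiVersion: v1
kind: Service
metadata:
  name: ingress-controller
  annotations:
    loadbalancer.openstack.org/proxy-protocol: "true"
spec:
  type: LoadBalancer
[...]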

Caveat: cluster-internal traffic

This issue does not affect every proxy: some proxies (e.g. Traefik) can accept connections both with and without a Proxy Protocol header.

Proxy Protocol prepends a header at the beginning of the connection, and your reverse proxy (Ingress Controller) will usually expect it.
Connections where the header is missing from the payload will likely cause a parsing error.

This causes issues in conjunction with hairpin NAT:

Let's say a Pod sends a request to the address of the load balancer, behind which sits a typical reverse proxy with Proxy Protocol enabled.
Kubernetes (kube-proxy) intercepts the packets, because it knows their final destination (the reverse proxy Pods), and NATs them directly.
The request therefore skips the hop over the load balancer, which would normally add the Proxy Protocol header, so the header is omitted.
Your reverse proxy returns an error, because it expects the Proxy Protocol header and fails to parse the payload.

Typical situations where this problem appears are:

  • cert-manager with http01 challenges
  • Prometheus blackbox exporter
  • CMS server-side rendered previews

A fix for these issues in Kubernetes is underway, but has already been postponed multiple times.

Workarounds
  • Use a proxy that can handle connections both with and without a Proxy Protocol header, e.g. Traefik.

  • The issue can be circumvented by adding another Ingress Controller behind a load balancer without Proxy Protocol.
    All cluster-internal traffic then needs to be sent to that Service.

  • At your own risk, we can offer a workaround that prevents kube-proxy from adding the problematic rules.
    This may in turn break other applications that rely on the .status.loadBalancerIP field of Services or Ingresses.

    Please contact us if you're interested.

loadBalancerSourceRanges

OpenStack adds a security group rule for each CIDR you specify in spec.loadBalancerSourceRanges and thus filters traffic to your load balancer.

Caveats

  • With IPVS, the setting also applies to internal traffic.

    Kube-proxy in IPVS mode (default and highly recommended) also blocks internal traffic going to a Service when spec.loadBalancerSourceRanges is set.
    To circumvent this, you must also allow traffic from the node subnet CIDR and the Pod subnet CIDR.

Keeping load balancer floating IP

When a Service of type LoadBalancer is created without further configuration, it gets an ephemeral IP address from the IP pool of the cluster's external network (usually ext-net).
Deleting the Service releases that floating IP back into the pool, where it becomes available to others.
There is no guarantee that the IP will still be available afterwards.

To retain a floating IP in your OpenStack project even when the Service gets deleted, set the loadbalancer.openstack.org/keep-floatingip: "true" annotation.
To reuse the floating IP afterwards, specify it in the spec.loadBalancerIP field in the Service.
The floating IP must exist in the same region as the LB.
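
A minimal sketch; the IP address is a placeholder for a floating IP that already exists in your project:

apiVersion: v1
kind: Service
metadata:
  name: ingress-controller
  annotations:
    loadbalancer.openstack.org/keep-floatingip: "true"
spec:
  type: LoadBalancer
  # Placeholder: a floating IP retained in your OpenStack project
  loadBalancerIP: 203.0.113.10
[...]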

Allow ingress only from specific subnets

Octavia supports the spec.loadBalancerSourceRanges option in the LoadBalancer Service to block traffic from non-matching source IPs.
Specify an array of subnets in CIDR notation.
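
A minimal sketch; the first CIDR is a placeholder for your allowed client network, and the node and Pod subnet CIDRs are included because of the caveat described under loadBalancerSourceRanges above:

apiVersion: v1
kind: Service
metadata:
  name: ingress-controller
spec:
  type: LoadBalancer
  loadBalancerSourceRanges:
    - 198.51.100.0/24  # placeholder: your clients' network
    - 192.168.1.0/24   # node subnet (default)
    - 172.25.0.0/16    # placeholder: your cluster's Pod subnet CIDR
[...]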

Known limitations

  • With many rules (~50), it might take a couple of minutes for all to take effect.

  • Cluster-internal traffic also gets blocked

    If an application sends requests to a LoadBalancer Service from within the cluster and the source IP does not match any of the specified ranges, the packets are dropped.
    This is because kube-proxy intercepts the traffic and applies the rules as well.
    To circumvent this, also include the Pod and Node subnet CIDRs.

Troubleshooting

I created a load balancer, but I can't reach the application

This can have multiple reasons. Typical problems are:

  • Your application is not reachable:

    Try to use kubectl port-forward svc/$SVC_NAME 8080:$SVC_PORT and check if you can reach your application locally on port 8080.

  • The service node port is not reachable.

    Run a temporary Pod and test the connection to the node port:

    IP="$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}')"
    PORT="$(kubectl -n "${NAMESPACE}" get service "${SVC}" -o jsonpath='{.spec.ports[0].nodePort}')"
    
    kubectl run --rm --restart=Never -it --image=busybox test -- nc -w 1 -z "${IP}" "${PORT}" && echo "success" || echo "failure"

  • The load balancer can't reach the worker nodes.

    Make sure that your nodes' security group has opened the port range 30000-32767 for the node network (default 192.168.1.0/24).
    On clusters without advanced configuration, we create this rule automatically.
    To list the security group rules run:

    openstack security group rule list "metakube-${CLUSTER_ID}"
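
    If the rule is missing, a sketch of how to create it with the OpenStack CLI (same security group name as above):

    openstack security group rule create \
      --protocol tcp \
      --dst-port 30000:32767 \
      --remote-ip 192.168.1.0/24 \
      "metakube-${CLUSTER_ID}"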

Client connections timeout after 50 seconds

By default, the SysEleven Stack's LBaaS closes idle connections after 50s.
If you encounter timeouts at 50s, you may configure higher timeout values with annotations on the Service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    # Set timeouts to 5min
    loadbalancer.openstack.org/timeout-client-data: "300000"
    loadbalancer.openstack.org/timeout-member-data: "300000"
spec:
  type: LoadBalancer
[...]
