Load Balancer

Octavia Load Balancers

MetaKube clusters come with built-in support for external load balancers (LBs) through OpenStack Octavia.
This allows you to easily get an external IP address for a Service.
Traffic to this IP address will be automatically load balanced over all available pods of the exposed service.
See the Create a Load Balancer tutorial for instructions.

The OpenStack cloud controller manager will automatically create an Octavia LB for each Service of type LoadBalancer.
These LBs are L4 (TCP) LBs and may be configured to use Proxy Protocol.
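
As a sketch, a minimal Service that provisions such an LB could look like this (the name, selector, and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app          # placeholder name
spec:
  type: LoadBalancer    # triggers the cloud controller manager to create an Octavia LB
  selector:
    app: my-app         # placeholder label selector
  ports:
    - protocol: TCP     # only TCP is supported, see the limitations below
      port: 80          # port exposed on the load balancer
      targetPort: 8080  # port your Pods listen on
```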

Important notes and limitations

  • We currently do not support creating L7 LBs with MetaKube.

    See Ingress Controllers for more information.

  • MetaKube does not support UDP load balancers.

    A Service that exposes a UDP port will stay in Pending, since the cloud controller manager will not provision a load balancer for it.

Ingress Controllers

To route HTTP traffic to different applications in your cluster, create an Ingress Controller behind your Load Balancer Service.
It is dynamically configured through Ingress objects and can also handle things like TLS termination.
For more information on Ingress resources and Ingress Controllers see the official Kubernetes documentation.
The Create an Ingress Controller tutorial guides you through installing an Nginx Ingress Controller with automatic Let's Encrypt TLS certificate support.
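
For illustration, a minimal Ingress object that routes a hostname to a Service might look like this (hostname, names, and port are placeholders; ingressClassName assumes an Nginx Ingress Controller such as the one from the tutorial):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app               # placeholder
spec:
  ingressClassName: nginx    # assumes an Nginx Ingress Controller is installed
  rules:
    - host: app.example.com  # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app # placeholder Service name
                port:
                  number: 80
```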

Configuration

The Load Balancer is configured through fields in the Service resource itself and through its annotations.

externalTrafficPolicy

We strongly recommend setting externalTrafficPolicy: Local.

There are two possible values for service.spec.externalTrafficPolicy: Cluster (default) and Local.
The option determines how kube-proxy handles packets that arrive at a Node on the Service's NodePort.

  1. Cluster (default)

    Every packet is forwarded to a Pod from the Service's endpoints (likely on a different node), always involving DNAT and SNAT.
    A packet may therefore traverse two nodes before it reaches the Pod.

    The additional port allocation related to SNAT can cause port collisions and thus failures to add entries in the node's conntrack table.

  2. Local (recommended)

    The Node will only forward a packet directly to a Pod on the same node.
    If the node doesn't have a Pod of the Service, traffic to the NodePort is dropped.

    Since this also applies to the LB's health check (the node is a backend member),
    nodes without Pods of the Service will be excluded from the LB's active backend pool and not receive further traffic.

    Since the packet stays on the node, SNAT is not necessary.
    This reduces latency (the TCP and TLS handshakes alone cause 6 trips) and avoids possible issues with the superfluous port allocation.
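
In the Service manifest, the recommended setting is a single field (the name is a placeholder):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                  # placeholder
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local  # forward only to Pods on the node that received the packet
[...]
```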

Caveats with Local

  • The setting alone doesn't preserve the original client IPs.

    As far as Kubernetes is concerned, the client IP is preserved, but that IP is the one of the LB.
    The original client IP is "lost" at the Load Balancer, since it opens a separate TCP connection to the backend.

    To preserve the original client IP with a Load Balancer, you may use Proxy Protocol.

  • The setting also applies to internal traffic.

    Kube-proxy in IPVS mode (default and highly recommended) doesn't differentiate between internal and external traffic here.
    To get around this, deploy your proxy as a DaemonSet, so that there is a valid endpoint for the Service on each node.

Proxy Protocol

The Proxy Protocol is an industry standard to pass client connection information through load balancers to the destination server.
Activating Proxy Protocol allows you to retain the original client IP address and see it in your application.

To use Proxy Protocol, specify the loadbalancer.openstack.org/proxy-protocol: "true" annotation in your Service.
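
For example (the Service name is a placeholder):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app  # placeholder
  annotations:
    loadbalancer.openstack.org/proxy-protocol: "true"
spec:
  type: LoadBalancer
[...]
```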

Changing the annotation after the creation of the load balancer does not reconfigure the load balancer!
To migrate a Service of type LoadBalancer to Proxy Protocol, you need to recreate the Service.
To avoid losing the associated Floating IP, see Keeping load balancer floating IP.

Caveat: cluster-internal traffic

Proxy Protocol adds a header at the beginning of each connection, and your reverse proxy (Ingress Controller) will usually expect it.
Connections where the header is not prepended will likely lead to a parsing error.
Note that this issue does not affect every proxy: some proxies (e.g. Traefik) can accept connections both with and without a Proxy Protocol header.

This causes issues for traffic inside a cluster:

Let's say a Pod sends a request to the address of the LB, behind which sits a typical reverse proxy with Proxy Protocol enabled.
Kubernetes (kube-proxy) will intercept the packets, because it knows their final destination, the reverse proxy Pods.
The request therefore skips the hop over the LB that would normally prepend the Proxy Protocol header.
Your reverse proxy will return an error, because it expects the Proxy Protocol header and fails to parse the payload.

Typical applications and use cases are:

  • cert-manager with http01 challenges
  • Prometheus blackbox exporter
  • CMS server-side rendered previews

A fix for these issues in Kubernetes is underway, but has already been postponed multiple times.

Workarounds
  • Use a proxy that can handle connections both with and without a Proxy Protocol header, e.g. Traefik.

  • The issue can be circumvented by adding another Ingress Controller behind a load balancer without Proxy Protocol.
    All cluster-internal traffic then needs to be sent to that Service.

  • We can offer a workaround, at your own risk, that prevents Kube-proxy from adding the IPVS rules.
    This may in turn break other applications that rely on the .status.loadBalancer.ingress field of the Service.

    Please contact us if you're interested.

loadBalancerSourceRanges

OpenStack will create a Security Group Rule for each CIDR you specify in spec.loadBalancerSourceRanges,
and thus filter traffic to your Load Balancer.

Caveats

  • With IPVS, the setting also applies to internal traffic.

    Kube-proxy in IPVS mode (default and highly recommended) will also block internal traffic going to a Service when
    spec.loadBalancerSourceRanges is set.
    To circumvent this, you must also allow traffic from the node subnet CIDR and the pod subnet CIDR.
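
A sketch of such a Service spec; all CIDRs here are examples that you must replace with your actual client, node, and pod subnets:

```yaml
spec:
  type: LoadBalancer
  loadBalancerSourceRanges:
    - 203.0.113.0/24  # example: external clients you want to allow
    - 192.168.1.0/24  # example: node subnet, keeps internal traffic working
    - 172.25.0.0/16   # example: pod subnet, check your cluster's actual CIDR
```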

Keeping load balancer floating IP

When a Service of type LoadBalancer is created without further configuration, it gets an ephemeral IP address from the OpenStack network pool of the cluster (usually ext-net).
Deleting the load balancer releases that floating IP back into the pool, where it becomes available to others.
There is no guarantee that the IP will still be available afterwards.

To retain a floating IP in your OpenStack project even when the LB gets deleted, set the loadbalancer.openstack.org/keep-floatingip: "true" annotation.
To reuse a specific floating IP afterwards, set the .spec.loadBalancerIP field in the Service.
The floating IP must exist in the same region as the LB.
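
Put together, a Service that keeps and pins a floating IP could look like this (the name and IP are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app  # placeholder
  annotations:
    loadbalancer.openstack.org/keep-floatingip: "true"
spec:
  type: LoadBalancer
  loadBalancerIP: 198.51.100.10  # placeholder: a floating IP that exists in your project
[...]
```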

Specify a custom floating IP network

If you need even more control over the IP addresses you use, you can also rent a dedicated IP space from us or bring your own IP space.
To use this floating IP network for your load balancer, specify the loadbalancer.openstack.org/floating-network-id: "<network id>" annotation.

To list networks and find the network ID, run:

openstack network list

Whitelist ingress from specific subnets

Octavia supports the .spec.loadBalancerSourceRanges option in the LoadBalancer Service to block traffic from non-matching source IPs.
Specify an array of subnets in CIDR notation.

Known limitations

  • With many rules (~50), it might take a couple of minutes for all to take effect.

  • Cluster-internal traffic also gets blocked

    If an application inside the cluster sends requests to a LoadBalancer Service,
    and none of the subnet ranges match the pod or node IPs, packets will be dropped.
    This is because the IPVS rules created by Kube-proxy intercept the traffic and apply the whitelist.
    To circumvent this, also include the pod and node subnet CIDRs.

Troubleshooting

I created a load balancer, but I can't reach the application

This can have several causes. Typical problems are:

  • Your application is not reachable:

    Try to use kubectl port-forward svc/$SVC_NAME 8080:$SVC_PORT and check if you can reach your application locally on port 8080.

  • The service node port is not reachable.

    Create a Pod with an interactive shell and test the connection to the node port:

    # Pick the first node's IP and the Service's node port
    IP="$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}')"
    PORT="$(kubectl -n "${NAMESPACE}" get service "${SVC}" -o jsonpath='{.spec.ports[0].nodePort}')"
    
    kubectl run --rm --restart=Never -it --image=busybox test -- nc -w 1 -z "${IP}" "${PORT}" && echo "success" || echo "failure"

  • The load balancer can't reach the worker nodes.

    Make sure that your nodes' security group has opened the port range 30000-32767 for the node network (default 192.168.1.0/24).
    On clusters without advanced configuration, we create this rule automatically.
    To list the security group rules run:

    openstack security group rule list "metakube-${CLUSTER_ID}"

I am getting connection timeouts after 50 seconds

By default, the SysEleven Stack's LBaaS closes idle connections after 50s.
If you encounter timeouts at 50s, you can configure higher timeout values with annotations on the LoadBalancer Service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    # Set timeouts to 5 minutes (values are in milliseconds)
    loadbalancer.openstack.org/timeout-client-data: "300000"
    loadbalancer.openstack.org/timeout-member-data: "300000"
spec:
  type: LoadBalancer
[...]

Further information