Known Issues

Idle TCP sessions being closed

This guide extends the Known Issues documentation for SysEleven OpenStack. The issue in MetaKube is closely related, but needs to be handled slightly differently.

Affected Regions:

CBK and DBL

(Region FES is not affected)

Problem Statement:

MetaKube worker nodes run as virtual machines on SysEleven OpenStack.

When a virtual machine establishes a TCP connection to a remote server, it uses a random TCP source port.
For return traffic to be allowed into a VM in OpenStack, the SDN (Software Defined Network) automatically creates a dynamic inbound security group rule that allows traffic to flow back to this random TCP port.
This dynamic rule will expire if the connection is idle for 60 seconds.

If the remote server stays quiet for longer than that, the rule has already expired by the time it answers, and its return traffic is dropped.

Solution:

Follow these steps to avoid this issue:

OpenStack:

Select whichever of the following OpenStack solutions applies to your setup:

  • Either add a rule to your existing MetaKube security group that explicitly allows returning traffic.
    With this rule in place, the SDN no longer needs to create dynamic rules. The Linux kernel option net.ipv4.ip_local_port_range configures the range from which the random source port is picked when a virtual machine initiates a connection.
    For example, if this value is 32768 - 60999, allowing all incoming traffic from the server to the client port range 32768 - 60999 solves the issue.
    To determine net.ipv4.ip_local_port_range for your worker nodes, log in to the VM via SSH or connect through a node-shell and run: sysctl net.ipv4.ip_local_port_range

  • Or, if both virtual machines (the client application and the server) run in the same OpenStack project and region, add a security group to the client that allows ingress traffic to high ports such as 32768 - 60999 from a security group that the server's ports are members of.
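
To make the port range used in the first option easier to check from a script, the following is a minimal Python sketch. It assumes it runs on the worker node (or anywhere the standard Linux file /proc/sys/net/ipv4/ip_local_port_range is readable). The names parse_port_range and read_port_range are illustrative and not part of any MetaKube or OpenStack tooling.

```python
from pathlib import Path

# Standard Linux procfs file backing the net.ipv4.ip_local_port_range sysctl.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse the two whitespace-separated port numbers exposed by the kernel."""
    low, high = (int(part) for part in text.split())
    return low, high


def read_port_range(path: Path = PORT_RANGE_FILE) -> tuple[int, int]:
    """Read the source port range of the current network namespace."""
    return parse_port_range(path.read_text())
```

Calling read_port_range() on a worker node returns the lower and upper bound as a tuple; these are the values the security group rule has to cover.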

MetaKube:

  • Your application must support enabling TCP keepalives. Turn them on with a keepalive idle time shorter than 60 seconds.

  • The following (or similar) sysctl parameters must be set in the network namespace of the container/Pod (not of the host):

    • net.ipv4.tcp_keepalive_intvl = 10
    • net.ipv4.tcp_keepalive_probes = 5
    • net.ipv4.tcp_keepalive_time = 10

    The kubelet does not consider these settings "safe" and will not allow containers to run if they are set in the
    podSecurityContext section of your Pod.
    Currently, MetaKube does not offer a way to change this.

    A valid alternative is to set them in an initContainer in the same Pod as your application.

    Example:

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        run: myapp-client
      name: myapp-client
    spec:
      containers:
      - name: myapp-client
        image: busybox
        command: ['sh', '-c', 'echo The app is running! && sleep 3600']
      initContainers:
      # Runs before the application container and sets the keepalive
      # sysctls in the network namespace of the Pod.
      - name: sysctl
        image: alpine:3.16
        command: [sh, -c]
        args:
          - |
            echo "10" > /proc/sys/net/ipv4/tcp_keepalive_intvl
            echo "5" > /proc/sys/net/ipv4/tcp_keepalive_probes
            echo "10" > /proc/sys/net/ipv4/tcp_keepalive_time
        securityContext:
          # Privileged root access is needed to write to /proc/sys.
          runAsUser: 0
          runAsGroup: 0
          privileged: true

    Verify the settings with: kubectl exec -it myapp-client -- sysctl -a | grep keepalive

    These sysctls only configure the keepalive timing; the kernel does not send keepalives on every TCP connection automatically. The application must still request kernel keepalives when it opens the TCP socket. Both MetaKube steps must therefore be used in combination to take effect.
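
As an illustration of the application side, here is a minimal Python sketch of requesting keepalives on a socket. The helper name enable_keepalive and its default values are assumptions chosen to mirror the sysctl values above. The per-socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not available on every platform, so they are guarded; on Linux, when they are set, they take precedence over the namespace-wide sysctls for that socket only.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 10,
                     interval: int = 10, probes: int = 5) -> None:
    """Ask the kernel to send TCP keepalives on this socket."""
    # Without SO_KEEPALIVE the kernel sends no probes at all,
    # whatever the tcp_keepalive_* sysctls are set to.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional per-socket timing overrides (hypothetical defaults that
    # mirror the sysctl values shown earlier).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Call enable_keepalive(sock) right after creating the socket. If the per-socket overrides are left out, the sysctl values set by the initContainer apply. In both cases the first probe is sent after 10 seconds of idle time, well before the 60-second dynamic rule expires.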

References