This guide extends the Known Issues documentation for SysEleven OpenStack. The issue in MetaKube is closely related, but needs to be handled slightly differently.
Affected Regions:
CBK and DBL
(Region FES is not affected)
Problem Statement:
MetaKube worker nodes run as virtual machines on SysEleven OpenStack.
When a virtual machine establishes a TCP connection to a remote server, it uses a random TCP source port.
For return traffic to be allowed to flow into a VM in OpenStack, the SDN (Software Defined Network) automatically creates a dynamic inbound security group rule that allows traffic to flow back to this random TCP port.
This dynamic rule expires once the connection has been idle for 60 seconds.
If the connection stays quiet for longer than that, any later return traffic from the remote server is dropped.
Solution:
Follow these steps to avoid this issue:
OpenStack:
Select one of the following OpenStack solutions that applies to your setup:
Either add a rule to your existing MetaKube security group that explicitly allows the return traffic, so that the SDN no longer needs to create dynamic rules.
The Linux kernel option net.ipv4.ip_local_port_range configures the range from which the random source port is picked when a virtual machine initiates a connection.
For example, if this value is set to 32768 - 60999, allowing all incoming traffic from the server to the client port range 32768 - 60999 solves the issue.
To determine net.ipv4.ip_local_port_range for your worker nodes, connect to the VM via SSH or through a node-shell and run: sysctl net.ipv4.ip_local_port_range
Or, if both virtual machines (the client application and the server) run in the same OpenStack project and region, add a security group to the client that allows ingress traffic to high ports such as 32768 - 60999 from another security group that the server's ports are members of.
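The port range needed for either option can also be read programmatically on the worker node. The following is a minimal sketch, assuming a Linux node where the value is exposed under /proc/sys; the helper names parse_port_range and read_port_range are hypothetical and only serve as illustration.

```python
from __future__ import annotations

from pathlib import Path

# Kernel file backing `sysctl net.ipv4.ip_local_port_range` on Linux.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(raw: str) -> tuple[int, int]:
    """Parse the two whitespace-separated port numbers the kernel exposes."""
    low, high = (int(part) for part in raw.split())
    return low, high


def read_port_range(path: Path = PORT_RANGE_FILE) -> tuple[int, int]:
    """Return (lowest, highest) ephemeral source port of this machine."""
    return parse_port_range(path.read_text())


if __name__ == "__main__":
    low, high = read_port_range()
    # These are the ports the security group rule has to open for return traffic.
    print(f"Allow ingress TCP from the server to ports {low}-{high}")
```

Run it on the worker node (or inside a node-shell) and use the printed range as the destination port range of the security group rule.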
MetaKube:
Your application must support enabling TCP keepalives. Turn them on with a timeout value shorter than 60 seconds.
The following (or similar) sysctl parameters must be set in the network namespace of the container/Pod (not of the host):
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 10
The kubelet does not consider these settings "safe" and will not allow the Pod to run if they are set in the podSecurityContext section of your Pod.
Currently, MetaKube does not offer a way to change this.
A valid alternative is to set them in an initContainer in the same Pod as your application.
Example:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: myapp-client
  name: myapp-client
spec:
  containers:
  - name: myapp-client
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - command: [sh, -c]
    args:
    - |
      echo "10" > /proc/sys/net/ipv4/tcp_keepalive_intvl
      echo "5" > /proc/sys/net/ipv4/tcp_keepalive_probes
      echo "10" > /proc/sys/net/ipv4/tcp_keepalive_time
    image: alpine:3.16
    name: sysctl
    securityContext:
      runAsUser: 0
      runAsGroup: 0
      privileged: true
Verify the settings by: kubectl exec -it myapp-client -- sysctl -a | grep keepalive
These settings only configure the keepalive timers; the kernel does not send keepalives on every TCP connection automatically. The application must still request kernel keepalives (socket option SO_KEEPALIVE) when it opens the TCP socket. Both MetaKube solution steps must therefore be used in combination to take effect.
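How an application requests keepalives depends on its language and libraries. As a minimal sketch, assuming a Python client on Linux, the socket option can be enabled as shown here. The function name enable_keepalive and its default values are hypothetical and simply mirror the sysctl values used above; the per-socket TCP_KEEP* options are Linux-specific and are skipped where the platform does not provide them.

```python
import socket


def enable_keepalive(sock, idle=10, interval=10, probes=5):
    """Request kernel keepalives on one TCP socket."""
    # Without SO_KEEPALIVE the kernel never sends probes on this socket,
    # regardless of the net.ipv4.tcp_keepalive_* sysctl values.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket overrides of the sysctl defaults (Linux constants).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(sock)
    # A non-zero value means keepalives are enabled on this socket.
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    sock.close()
```

Call enable_keepalive on the socket before connecting to the remote server. Many client libraries expose an equivalent keepalive setting of their own; check the documentation of the library your application uses.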