
k8s: forwarding from public VIP to clusterIP with

Posted 2020-03-30 07:30

Question:

I'm trying to understand in depth how forwarding from a publicly exposed load balancer's layer-2 VIP to a service's cluster-IP works. I've read a high-level overview of how MetalLB does it, and I've tried to replicate it manually by setting up a keepalived/ucarp VIP and iptables rules. I must be missing something, however, as it doesn't work ;-]

Steps I took:

  1. created a cluster with kubeadm consisting of a master + 3 nodes running k8s-1.17.2 + calico-3.12 on libvirt/KVM VMs on a single computer. All VMs are in the 192.168.122.0/24 virtual network.

  2. created a simple 2-pod deployment and exposed it as a NodePort service with externalTrafficPolicy set to Cluster:
    $ kubectl get svc dump-request
    NAME           TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
    dump-request   NodePort   10.100.234.120   <none>        80:32292/TCP   65s
    I've verified that I can reach it from the host machine on every node's IP on port 32292.

  3. created a VIP with ucarp on all 3 nodes:
    ucarp -i ens3 -s 192.168.122.21 -k 21 -a 192.168.122.71 -v 71 -x 71 -p dump -z -n -u /usr/local/sbin/vip-up.sh -d /usr/local/sbin/vip-down.sh (example from knode1)
    I've verified that I can ping the 192.168.122.71 VIP. I could even ssh through it to the VM that was currently holding it.
    Now, when kube-proxy was in iptables mode, I could also reach the service on its node-port through the VIP at http://192.168.122.71:32292. To my surprise, however, in ipvs mode this always resulted in the connection timing out.

  4. added an iptables rule on every node for packets incoming to 192.168.122.71 to be forwarded to the service's cluster-IP 10.100.234.120:
    iptables -t nat -A PREROUTING -d 192.168.122.71 -j DNAT --to-destination 10.100.234.120
    (Later I also tried narrowing the rule to just the relevant port, but it didn't change the results in any way:
    iptables -t nat -A PREROUTING -d 192.168.122.71 -p tcp --dport 80 -j DNAT --to-destination 10.100.234.120:80)
    A few checks to verify these steps from the host machine are sketched right after this list.
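
A few checks to verify the setup, assuming the VIP, node IP, node-port and DNAT rule shown above:

# on each node: check whether it currently holds the VIP (ucarp adds it as a secondary address on ens3)
ip -4 addr show dev ens3 | grep 192.168.122.71

# from the host machine: node-port reachability directly via a node IP vs. via the VIP
curl -m 5 http://192.168.122.21:32292/
curl -m 5 http://192.168.122.71:32292/

# on the VIP-holding node: the DNAT rule and its packet counters, to confirm traffic is actually hitting it
iptables -t nat -L PREROUTING -n -v --line-numbers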

Results:

In iptables mode, all requests to http://192.168.122.71:80/ resulted in the connection timing out.

In ipvs mode it worked partially:
If the 192.168.122.71 VIP was held by a node that had a pod on it, then about 50% of the requests succeeded, and they were always served by the local pod. The app was also seeing the real remote IP of the host machine (192.168.122.1). The other 50% (presumably the ones sent to the pod on another node) timed out.
If the VIP was held by a node without any pods, all requests timed out.

I've also checked whether it makes any difference to keep the rule on all nodes at all times vs. keeping it only on the node currently holding the VIP and deleting it when the VIP is released: the results were the same in both cases.
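
For completeness, a few diagnostics that can show where the packets get dropped in this kind of setup (assuming ipvsadm and conntrack-tools are installed on the nodes; the addresses are the ones from the question):

# in ipvs mode: the virtual servers programmed by kube-proxy, and the addresses bound to its dummy interface
ipvsadm -Ln
ip addr show kube-ipvs0

# conntrack entries for connections that originally targeted the VIP, to see whether replies
# come back through this node and get un-NATed here
conntrack -L -d 192.168.122.71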

Does anyone know why it doesn't work and how to fix it? I'd appreciate any help with this :)

Answer 1:

You need to add a MASQUERADE rule as well, so that the source address is rewritten: without it, replies from pods on other nodes go straight back to the client instead of through the node that did the DNAT, so they never get un-NATed and the connection times out. For example:
iptables -t nat -A POSTROUTING -j MASQUERADE

Tested with kube-proxy in ipvs mode.
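
The blanket rule above masquerades every NAT-ed flow leaving the node. If that is too broad, a narrower variant (just a sketch, using the iptables conntrack match and the VIP from the question) masquerades only connections whose original, pre-DNAT destination was the VIP:

# only masquerade flows that originally targeted the VIP and were DNAT-ed by the PREROUTING rule
iptables -t nat -A POSTROUTING -m conntrack --ctorigdst 192.168.122.71 -j MASQUERADE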