Gateway

This page covers diagnosing common issues with the Hedgehog Gateway, including connectivity problems and NAT issues.

Health Checks

Start by verifying the gateway has picked up its current configuration:

$ kubectl get gatewayagents
NAME          APPLIED             APPLIEDG   CURRENTG   VERSION   PROTOCOLIP   VTEPIP   AGE
gateway-1     10 minutes ago      3          3          v1.2.0    ...          ...      2d

APPLIEDG should equal CURRENTG. If they differ, the gateway has not yet applied the latest configuration; check the dataplane pod logs.
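With several gateways, the generation comparison can be scripted. The following is a minimal sketch, not part of the Hedgehog tooling: check_generations is a hypothetical helper name, and it assumes the default table layout shown above, where kubectl left-aligns each value under its header. It locates APPLIEDG and CURRENTG by their position in the header row, because the APPLIED column ("10 minutes ago") contains spaces and defeats whitespace splitting.

```shell
# Hypothetical helper: reads `kubectl get gatewayagents` table output on stdin
# and prints every gateway whose APPLIEDG differs from CURRENTG.
# Exit status: 0 = all in sync, 1 = at least one mismatch, 2 = unexpected header.
check_generations() {
  awk '
    NR == 1 {
      # Column start offsets, taken from the header row.
      a = index($0, "APPLIEDG"); c = index($0, "CURRENTG")
      if (a == 0 || c == 0) {
        print "unexpected header: " $0 > "/dev/stderr"
        rc = 2
        exit
      }
      next
    }
    NF > 0 {
      # First token at each column offset is that column value.
      split(substr($0, a), af, " ")
      split(substr($0, c), cf, " ")
      if (af[1] != cf[1]) {
        printf "%s applied=%s current=%s\n", $1, af[1], cf[1]
        rc = 1
      }
    }
    END { exit rc + 0 }
  '
}

# Usage (assumes kubectl access to the cluster):
#   kubectl get gatewayagents | check_generations
```

No output and a zero exit status mean every listed gateway has applied its current generation.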

If the gateway is not reporting in at all, check that both pods are running:

$ kubectl get pods -n fab -l app.kubernetes.io/component=gateway
NAME                               READY   STATUS    RESTARTS   AGE
gw--gateway-1--dataplane-7v9ss     1/1     Running   0          12h
gw--gateway-1--frr-c9kwc           2/2     Running   0          12h

If either pod is not Running, inspect its logs:

$ kubectl logs -n fab gw--gateway-1--dataplane-7v9ss
$ kubectl logs -n fab gw--gateway-1--frr-c9kwc -c frr
$ kubectl logs -n fab gw--gateway-1--frr-c9kwc -c frr-agent
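To spot which pods need their logs inspected, the pod listing can be filtered. This is a sketch under assumptions: unhealthy_pods is a hypothetical helper name, and it relies on the default `kubectl get pods` column order (NAME, READY, STATUS) shown above. It reports any pod whose status is not Running or whose ready count is below the container count.

```shell
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# prints NAME, READY and STATUS for every pod that is not fully healthy.
# Exit status: 0 = all pods healthy, 1 = at least one unhealthy pod.
unhealthy_pods() {
  awk '
    NR == 1 { next }    # skip the header row
    NF < 3  { next }    # skip blank or truncated lines
    {
      split($2, ready, "/")    # READY looks like "1/2"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1, $2, $3
        rc = 1
      }
    }
    END { exit rc + 0 }
  '
}

# Usage (assumes kubectl access to the cluster):
#   kubectl get pods -n fab -l app.kubernetes.io/component=gateway | unhealthy_pods
```

A pod such as the FRR pod can be Running while only one of its two containers is ready, which is why the helper checks READY as well as STATUS.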

Common Issues

Traffic not flowing through gateway

  1. Check peering is configured: Verify the GatewayPeering object exists and is not rejected:

    $ kubectl get gatewaypeerings
    

  2. Check routes on the leaf: Verify gateway routes are installed on the leaf switches:

    $ kubectl fabric inspect vpc <vpc-name>
    
    Look for routes pointing to the gateway's VTEP IP.

  3. Check FRR is advertising routes: Use the FRR pod to verify BGP is advertising the peering prefixes (see FRR and BGP State).

  4. Check flow filter: In the dataplane CLI, run show flow-filter table to verify that the peering policy is loaded. If the flow filter is empty, the dataplane configuration may not have been applied yet; check the FRR agent logs.

NAT not working as expected

  1. Check flow table: Use show flow-table entries in the dataplane CLI to see if flows are being created. If the flow table is empty while traffic is flowing, the packets may be dropped by the flow filter before reaching the NAT stage.

  2. Check NAT state: Use show masquerading state, show static-nat rules, or show port-forwarding rules to verify the NAT configuration is loaded.

  3. Idle timeout: If connections work briefly then stop, the flow may be expiring. Check the idleTimeout setting in the GatewayPeering spec. Use TCP or application-layer keepalives for long-lived connections.
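For the idle-timeout case, a keepalive only helps if it fires before the flow expires. The sketch below is illustrative only: keepalive_fits is a hypothetical helper, both arguments are plain seconds (converting the idleTimeout value from the GatewayPeering spec to seconds is left to the caller, since its format is not shown here), and the fallback reads the Linux kernel's default TCP keepalive idle time. That kernel setting applies only to sockets that enable SO_KEEPALIVE, and applications can override it per socket.

```shell
# Hypothetical helper: succeeds if the keepalive idle time is shorter than
# the gateway idle timeout, i.e. a keepalive is sent before the flow expires.
#   $1 = idle timeout in seconds (assumed already converted by the caller)
#   $2 = keepalive idle time in seconds (default: the Linux kernel setting)
keepalive_fits() {
  idle_timeout_s=$1
  keepalive_s=${2:-$(cat /proc/sys/net/ipv4/tcp_keepalive_time)}
  [ "$keepalive_s" -lt "$idle_timeout_s" ]
}

# Usage, on the host that originates the long-lived connection:
#   keepalive_fits 300 || echo "keepalive would fire after the flow expires"
```

The Linux default keepalive idle time is 7200 seconds, so with a shorter idle timeout the default keepalive would not keep the flow alive on its own.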

Gateway failover

  1. Check both gateways are running: Verify that the dataplane and FRR pods of both gateways are Running (see Health Checks).

  2. Check gateway group membership:

    $ kubectl get gateways -o yaml
    
    Verify both gateways are members of the expected group with correct priorities.

  3. Check BGP on leaves: After a failover, the leaf switches should withdraw routes from the failed gateway and install routes from the backup. Use kubectl fabric inspect bgp to check.

Diagnostics

Dataplane CLI

The dataplane includes an interactive CLI for inspecting internal state. Access it by exec'ing into the dataplane pod:

$ kubectl exec -n fab -it gw--gateway-1--dataplane-7v9ss -- ./dataplane-cli

Key commands:

Command                      Description
show flow-filter table       Peering policy loaded on the dataplane
show flow-table entries      Active stateful NAT sessions
show masquerading state      Masquerade NAT configuration and pool state
show static-nat rules        Static NAT mappings
show port-forwarding rules   Port-forwarding rules
show ip fib                  IPv4 forwarding table
show config summary          Configuration generation and apply status
show tech                    Full diagnostic dump (for support)

Use help in the CLI to see all available commands.

FRR and BGP State

FRR runs in a separate pod. Use vtysh to inspect BGP state:

$ kubectl exec -n fab -it gw--gateway-1--frr-c9kwc -c frr -- vtysh

Check BGP neighbors:

gateway-1# show bgp summary

All neighbors should be in Established state. If a neighbor is in Active or Idle, the BGP session is not established; check physical connectivity and IP configuration.
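The neighbor check can also be run non-interactively by passing the command to vtysh with -c and filtering the result. This is a sketch that assumes the usual text layout of FRR's show bgp summary, in which the State/PfxRcd column holds a prefix count for an established session and a state name (such as Active or Idle) otherwise; bgp_down_neighbors is a hypothetical helper name.

```shell
# Hypothetical helper: reads `show bgp summary` text on stdin and prints each
# neighbor whose State/PfxRcd column is not a number (session not established).
# Exit status: 0 = all neighbors established, 1 = at least one is down.
bgp_down_neighbors() {
  awk '
    $1 == "Neighbor" {
      # Locate the State/PfxRcd column by name in the table header.
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "State/PfxRcd") col = i
      next
    }
    col && NF == 0 { col = 0; next }    # a blank line ends the neighbor table
    col && $col !~ /^[0-9]+$/ { print $1, $col; rc = 1 }
    END { exit rc + 0 }
  '
}

# Usage (assumes kubectl access to the cluster):
#   kubectl exec -n fab gw--gateway-1--frr-c9kwc -c frr -- \
#     vtysh -c 'show bgp summary' | bgp_down_neighbors
```

Because the column is found by header name, the helper handles one table per address family in the same output.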

Check routes advertised by the gateway:

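gateway-1# show ip route

This command lists the gateway's routing table, which is where the routes it should be advertising appear.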

VPC peering prefixes should appear as BGP routes pointing to the gateway's VTEP IP.

Check VRF routing tables:

gateway-1# show ip route vrf all

Metrics

The dataplane exposes Prometheus metrics scraped by the Alloy agent on the gateway node and forwarded to the Fabric Proxy.

Each metric is emitted with three label variants:

  • {total="<vpc>"}: all traffic in or out of the VPC
  • {drops="<vpc>"}: traffic dropped for the VPC
  • {from="<src>",to="<dst>"}: directional traffic between two VPCs

Available metrics:

Metric             Type    Description
vpc_packet_count   Gauge   Packet count
vpc_packet_rate    Gauge   Packet rate
vpc_byte_count     Gauge   Byte count
vpc_byte_rate      Gauge   Byte rate

To inspect metrics directly, run the following on the gateway node itself. The dataplane uses host networking, so the endpoint is reachable on the node at port 9442:

$ curl -s http://localhost:9442/metrics
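The raw output can be narrowed to the VPCs that are dropping packets. A minimal sketch follows, assuming the standard Prometheus text exposition format without trailing timestamps and the drops label variant described above; nonzero_drops is a hypothetical helper name.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the vpc_packet_count series of the drops variant whose value is
# greater than zero.
nonzero_drops() {
  awk '
    /^#/ { next }    # skip HELP and TYPE comment lines
    # The sample value is the last field on the line (no timestamp assumed).
    /^vpc_packet_count[{]drops="/ && $NF + 0 > 0 { print }
  '
}

# Usage, on the gateway node:
#   curl -s http://localhost:9442/metrics | nonzero_drops
```

Empty output means no VPC currently reports a non-zero drop count.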