Netgate Discussion Forum

    Unreliable gateway monitoring and recovery from (staged) failure

    Routing and Multi WAN
      Pentangle

      Hi all,

      I've just spent an hour beating my head against a brick wall.

      I have two pfSense instances at a customer site, running in HA (CARP). The site has a 600 Mbit/s FTTP connection and a 100 Mbit/s leased line. Because they're in HA, both pfSense boxes (essentially) share their configuration. The FTTP connection, though, is one of those weird ones where the ISP wouldn't give us any static IP addresses and wouldn't entertain us connecting via anything but DHCP. Accordingly, the secondary pfSense has the interface defined but doesn't have the physical FTTP connection.

      I think the fact that there are two pfSense boxes in HA is largely irrelevant here: my issue is with failover between the FTTP and the leased line on the primary pfSense, and CARP stays with the master throughout, so I won't expand the problem to include both boxes.

      There's a gateway group, containing the FTTP link as a Tier 1 and the leased line as a Tier 2.
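
As a rough illustration of the behaviour expected from that gateway group, here is a minimal Python sketch of tiered failover. It is a toy model under stated assumptions, not pfSense's implementation: the `Gateway` class, its `online` flag, the `active_gateways` function and the names `FTTP` and `LEASED` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    """Hypothetical model of one member of a gateway group."""
    name: str
    tier: int      # lower number = preferred
    online: bool   # what the gateway monitor currently reports


def active_gateways(group):
    """Return the names of the online gateways in the lowest-numbered tier.

    An empty list means no member of the group is considered usable.
    """
    online = [gw for gw in group if gw.online]
    if not online:
        return []
    best_tier = min(gw.tier for gw in online)
    return [gw.name for gw in online if gw.tier == best_tier]


# Both links up: Tier 1 (FTTP) carries the traffic.
both_up = [Gateway("FTTP", 1, True), Gateway("LEASED", 2, True)]

# FTTP flagged as down: Tier 2 (leased line) takes over.
fttp_down = [Gateway("FTTP", 1, False), Gateway("LEASED", 2, True)]
```

In this model the leased line only takes over once the monitor actually flags the FTTP gateway as not online, so a gateway left in an indeterminate state would have to be mapped to one side or the other before failover can be decided.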

      This site is connected via an IPsec VPN to a third pfSense at a datacentre, behind which sits the domain controller providing the DNS the users use (i.e. all their servers are in colo, with nothing physically on-prem aside from these pfSense boxes). The VPN connects using a dynamic DNS name from DuckDNS, which pfSense is configured to update.

      So, the testing:

      • When both connections are live, everything works.

      • When I remove the FTTP connection, one of three things happens:

      1. A single packet is dropped, then the leased line takes over, the VPN gets re-established, and everything's fine (this happens very rarely).
      2. A command prompt window's worth of packets is dropped, then the leased line takes over, but the VPN does not get re-established.
      3. Packets are dropped indefinitely and nothing comes back for the 10 minutes or so I let the test run.

      I believe my problems stem from:

      A) The gateway group not realising the primary gateway is down
      B) The Dynamic DNS service not realising something has happened that requires a new Dynamic DNS update
      C) Something else, I know not what.

      I would appreciate some help. Specifically, I'd love to know:

      • Why, if there's an X against the interface in the dashboard, does the gateway monitor still show "Pending" and "Gathering data"?
      • What is the impact of the gateway monitor IP address when there's no static IP address in the primary link path I can use (i.e. if I use 1.1.1.1 or 8.8.8.8 as the gateway monitor address, is this going to screw things up because it can be "seen" from the other interface(s))?
      • What can I do to ensure a Dynamic DNS update occurs following a gateway change so that the VPN can re-establish?
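
On the Dynamic DNS question, one possible workaround is to trigger the update from outside pfSense's own client. The sketch below builds a DuckDNS update request in Python, assuming the publicly documented DuckDNS update endpoint with its `domains`, `token` and `ip` query parameters. The function names and example values are hypothetical, and how such a script would be triggered on a gateway change is left open.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# DuckDNS HTTP update endpoint (assumed from the DuckDNS documentation).
DUCKDNS_UPDATE = "https://www.duckdns.org/update"


def duckdns_update_url(domain, token, ip=""):
    """Build the update URL for one DuckDNS subdomain.

    An empty `ip` asks DuckDNS to use the address the request arrives from,
    i.e. whichever WAN is currently carrying outbound traffic.
    """
    query = urlencode({"domains": domain, "token": token, "ip": ip})
    return f"{DUCKDNS_UPDATE}?{query}"


def send_update(url, timeout=10.0):
    """Send the update and report whether DuckDNS answered "OK"."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip() == "OK"


# Placeholder subdomain and token, for illustration only.
example_url = duckdns_update_url("example-site", "TOKEN")
```

Leaving `ip` empty avoids having to know the DHCP-assigned address in advance, at the cost of relying on the request leaving via the link that should be advertised.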

      Any help gratefully received, with virtual beers all round.

      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.