| 1 | +--- |
| 2 | +title: Palo Alto Failover |
| 3 | +weight: 10 |
| 4 | +last_reviewed_on: 2025-01-10 |
| 5 | +review_in: 4 months |
| 6 | +--- |
| 7 | +# <%= current_page.data.title %> |
| 8 | +## Overview |
| 9 | + |
| 10 | +For applications accessed via Cloud Gateway, we have pinned routes in place to route traffic through |
| 11 | +a specific Palo Alto firewall, to ensure the return traffic is routed back through the same firewall. |
| 12 | + |
| 13 | +In production, these pinned routes target 10.11.8.37 by default, which is [hmcts-hub-prod-int-palo-vm-1](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Compute/virtualMachines/hmcts-hub-prod-int-palo-vm-1/overview), |
| 14 | +and should be updated to 10.11.8.38, which is [hmcts-hub-prod-int-palo-vm-0](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Compute/virtualMachines/hmcts-hub-prod-int-palo-vm-0/overview), if a failover is needed in production. |
| 15 | + |
| 16 | +This means that if palo-vm-1 is unavailable, whether due to an incident or scheduled maintenance, traffic from Cloud Gateway will not be able to reach the applications until the pinned routes are updated. |
| 17 | + |
| 18 | +Applications affected by this are: |
| 19 | +* DARTs |
| 20 | +* LibraGoB |
| 21 | +* SDP |
| 22 | +* Juror Digital |
| 23 | +* CVP |
| 24 | +* BAIS |
| 25 | +* MI |
| 26 | +* Interim Hosting |
| 27 | + |
| 28 | +This document describes the process of failing over to the other firewall in the pair. |
| 29 | + |
| 30 | + |
| 31 | +## Failover Process |
| 32 | +Before starting the failover process, be aware that there may be a brief period of downtime while the failover is in progress. |
| 33 | + |
| 34 | +Where possible, contact James Drew and Kalyan Deevanapalli in advance of the failover. |
| 35 | + |
| 36 | +There are three places across two repositories where the IP address needs to be updated: |
| 37 | +* [aks-sds-deploy](https://github.com/hmcts/aks-sds-deploy) |
| 38 | +* [azure-platform-virtualwan](https://github.com/hmcts/azure-platform-virtualwan) |
| 39 | + |
| 40 | +### 1. Update the IP address in the following files in the aks-sds-deploy repository: |
| 41 | + |
| 42 | +* Pinned AKS routes in [prod-pinned-aks-routes.yaml](https://github.com/hmcts/aks-sds-deploy/blob/master/environments/01-network/prod-pinned-aks-routes.yaml#L3) |
| 43 | +* Pinned App Gateway routes in [prod-pinned-appgw-routes.yaml](https://github.com/hmcts/aks-sds-deploy/blob/master/environments/01-network/prod-pinned-appgw-routes.yaml#L3) |
| 44 | + |
| 45 | +See the example [PR for the DEMO environment](https://github.com/hmcts/aks-sds-deploy/pull/662/files). |
| 46 | +**Note:** the IPs are different for DEMO and PROD environments. |
| 47 | + |
| 48 | +### 2. Update the IP address in the following files in the azure-platform-virtualwan repository: |
| 49 | + |
| 50 | +* Static VNet routes in [prod.tfvars](https://github.com/hmcts/azure-platform-virtualwan/blob/b48a0cc40b52accfd6884b8811d2e79503d071be/environments/prod/prod.tfvars#L153-L156) |
| 51 | + |
| 52 | +See the example [PR for the DEMO environment](https://github.com/hmcts/azure-platform-virtualwan/pull/103/files). |
| 53 | +**Note:** the IPs are different for DEMO and PROD environments. |
| 54 | + |
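The IP change in steps 1 and 2 is the same substitution in each file. As a minimal sketch only, assuming the files contain the firewall address as plain text, the swap could be scripted as below. The helper name `swap_next_hop` and the sample text are hypothetical and are not taken from either repository; the two addresses are the production values from the Overview.

```python
import re

# Production addresses taken from the Overview section of this document.
PRIMARY_IP = "10.11.8.37"   # hmcts-hub-prod-int-palo-vm-1
FAILOVER_IP = "10.11.8.38"  # hmcts-hub-prod-int-palo-vm-0


def swap_next_hop(text: str, old_ip: str = PRIMARY_IP, new_ip: str = FAILOVER_IP) -> tuple[str, int]:
    """Replace whole-address occurrences of old_ip with new_ip.

    Returns the updated text and the number of replacements made.
    """
    # The lookarounds stop 10.11.8.37 matching inside longer values
    # such as 10.11.8.370 or 110.11.8.37.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)")
    return pattern.subn(new_ip, text)


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = "next_hop: 10.11.8.37\nunrelated: 10.11.8.370\n"
    updated, count = swap_next_hop(sample)
    print(count)  # prints 1
```

Review the resulting diff by hand before raising the PR, and remember that DEMO uses different addresses from PROD.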
| 55 | +### 3. Running the pipelines |
| 56 | + |
| 57 | +There are usually two reasons why you would want to fail over: |
| 58 | +* an incident or issue with a VM |
| 59 | +* scheduled maintenance, such as patching |
| 60 | + |
| 61 | +If the failover is urgent, for example during an incident, we recommend running the |
| 62 | +[aks-sds-deploy pipeline](https://dev.azure.com/hmcts/PlatformOperations/_build?definitionId=482&_a=summary) from |
| 63 | +your branch, because merging the PR and letting every pipeline stage run can take over 45 minutes. |
| 64 | +If you run from your branch and select only the required pipeline stages, it can complete in around 10 minutes. |
| 65 | + |
| 66 | +Ensure you still raise a PR, wait for checks to complete successfully, and then run the pipeline from your branch. |
| 67 | + |
| 68 | +The required stages for this are: |
| 69 | +* 'Precheck' |
| 70 | +* 'Checking Clusters for sbox' |
| 71 | +* '{Environment}: Genesis' |
| 72 | +* '{Environment}: Network' |
| 73 | + |
| 74 | +**Finally, your PR should still be merged.** This keeps the codebase in sync with the environment and prevents a future pipeline run from overwriting your changes. |
| 75 | + |
| 76 | +The [azure-platform-virtualwan pipeline](https://dev.azure.com/hmcts/PlatformOperations/_build?definitionId=478&_a=summary) is usually quicker to run, and can be run by merging your PR, following approval. |
| 77 | + |
| 78 | +### 4. Verify the failover has been successful. |
| 79 | + |
| 80 | +To verify the failover has been successful, you can check the following: |
| 81 | +* Your pipeline has completed successfully. |
| 82 | +* Check the [aks-prod-appgw-route-table](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/5ca62022-6aa2-4cee-aaa7-e7536c8d566c/resourceGroups/ss-prod-network-rg/providers/Microsoft.Network/routeTables/aks-prod-appgw-route-table/overview) has been updated. Check that the 'Next hop IP address' now reflects your PR. You can ignore any routes pointing to .36 addresses. |
| 83 | +* Check the [aks-prod-route-table](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/5ca62022-6aa2-4cee-aaa7-e7536c8d566c/resourceGroups/ss-prod-network-rg/providers/Microsoft.Network/routeTables/aks-prod-route-table/overview) for the same. |
| 84 | + |
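The route table checks above can also be done from exported JSON rather than by eye. The sketch below assumes the routes were exported with `az network route-table route list --resource-group ss-prod-network-rg --route-table-name aks-prod-appgw-route-table --output json`, and that each route carries `name` and `nextHopIpAddress` fields. The function name `summarise_next_hops` and the sample routes are hypothetical.

```python
import json


def summarise_next_hops(routes, old_ip, new_ip):
    """Return (stale, moved): names of routes still on old_ip and already on new_ip.

    Routes with any other next hop (for example the .36 addresses) are ignored.
    """
    stale = [r["name"] for r in routes if r.get("nextHopIpAddress") == old_ip]
    moved = [r["name"] for r in routes if r.get("nextHopIpAddress") == new_ip]
    return stale, moved


if __name__ == "__main__":
    # Hypothetical sample shaped like the CLI's JSON output; not real route data.
    sample = json.loads("""
    [
      {"name": "route-a", "nextHopIpAddress": "10.11.8.38"},
      {"name": "route-b", "nextHopIpAddress": "10.11.8.37"},
      {"name": "route-c", "nextHopIpAddress": "10.11.8.36"}
    ]
    """)
    stale, moved = summarise_next_hops(sample, old_ip="10.11.8.37", new_ip="10.11.8.38")
    print("still on old firewall:", stale)
    print("moved to new firewall:", moved)
```

An empty `stale` list suggests every pinned route has picked up the new address; any name left in it is worth checking in the portal.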
| 85 | +## FAQ |
| 86 | + |
| 87 | +#### How do I find the IPs for the firewalls? |
| 88 | + |
| 89 | +You can find out which IPs belong to which Palos by checking the backend pool of the Load Balancer in the Azure Portal. |
| 90 | + |
| 91 | +For example, the production backend pool can be found [here](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Network/loadBalancers/hmcts-hub-prod-int-palo-lb/backendPools). |
| 92 | + |
| 93 | +#### The terraform plans are showing a lot of unexpected changes, is this normal? |
| 94 | + |
| 95 | +Yes, this is normal; expect a lot of changes. The pipeline reorders resources and IPs within the plan, which creates a busy output that can be hard to interpret. If you are in any doubt, ask in #platform-operations on Slack. |