Commit f3483d9

DTSPO-22443 - Palo Alto failover documentation (#380)
* initial draft
* adding paths to file
* correcting filename
* updates
* including failover info in palo patching process
* updates
* updates
* updates
* updates
* spelling correction
* update
* spelling
* spelling
1 parent 68a9267 commit f3483d9

5 files changed

+104
-0
lines changed

.github/actions/spelling/expect.txt

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,8 @@
11
Bance
22
cdef
33
Chirag
4+
CVP
5+
deevanapalli
46
devrot
57
dsl
68
EDB
@@ -14,6 +16,7 @@ jfmd
1416
jfrou
1517
jfrt
1618
journalctl
19+
kalyan
1720
mdv
1821
MDVADMVPNHA
1922
MDVDMZJUMPL
@@ -35,3 +38,4 @@ totp
3538
TTLs
3639
utilisation
3740
Virtualbox
41+
virtualwan

source/network/index.html.md.erb

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ We host a number of Public & Private DNS zones in Azure.
7272
### Guides
7373
- [Connecting to a Palo Alto firewall](palo-alto/connecting-palos.html)
7474
- [Upgrading Palo Alto firewall Software](palo-alto/palos-upgrade.html)
75+
- [Palo Alto firewall Failover](palo-alto/palo-failover.html)
7576
- [Upgrading Panorama](palo-alto/panorama-upgrade.html)
7677

7778
### Troubleshooting

source/network/palo-alto/index.html.md.erb

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ weight: 70
1010
### Guides
1111
- [Connecting to a Palo Alto firewall](connecting-palos.html)
1212
- [Upgrading Palo Alto firewall Software](palos-upgrade.html)
13+
- [Palo Alto Failover](palo-failover.html)
1314
- [Upgrading Panorama](panorama-upgrade.html)
1415

1516
### Troubleshooting
source/network/palo-alto/palo-failover.html.md.erb

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
---
2+
title: Palo Alto Failover
3+
weight: 10
4+
last_reviewed_on: 2025-01-10
5+
review_in: 4 months
6+
---
7+
# <%= current_page.data.title %>
8+
## Overview
9+
10+
For applications accessed via Cloud Gateway, we have pinned routes in place to route traffic through
11+
a specific Palo Alto firewall, to ensure the return traffic is routed back through the same firewall.
12+
13+
In production, by default these pinned routes target 10.11.8.37 which is [hmcts-hub-prod-int-palo-vm-1](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Compute/virtualMachines/hmcts-hub-prod-int-palo-vm-1/overview)
14+
and should be updated to 10.11.8.38, which is [hmcts-hub-prod-int-palo-vm-0](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Compute/virtualMachines/hmcts-hub-prod-int-palo-vm-0/overview), if a failover is needed in production.
15+
16+
This means that if palo-vm-1 is unavailable, whether due to an incident or scheduled maintenance, traffic from Cloud Gateway will not be able to reach the applications until the pinned routes are updated.
17+
18+
Applications affected by this are:
19+
* DARTs
20+
* LibraGoB
21+
* SDP
22+
* Juror Digital
23+
* CVP
24+
* BAIS
25+
* MI
26+
* Interim Hosting
27+
28+
This document will describe the process of failing over to the other firewall in the pair.
29+
30+
31+
## Failover Process
32+
Before starting the failover process, be aware that there may be a brief period of downtime while the failover is in progress.
33+
34+
Where possible, James Drew ([email protected]) and Kalyan Deevanapalli ([email protected]) should be contacted in advance of the failover.
35+
36+
There are three places across two repositories where the IP address needs to be updated:
37+
* [aks-sds-deploy](https://github.com/hmcts/aks-sds-deploy)
38+
* [azure-platform-virtualwan](https://github.com/hmcts/azure-platform-virtualwan)
39+
40+
### 1. Update the IP address in the following files in the aks-sds-deploy repository:
41+
42+
* Pinned aks routes in [prod-pinned-aks-routes.yaml](https://github.com/hmcts/aks-sds-deploy/blob/master/environments/01-network/prod-pinned-aks-routes.yaml#L3)
43+
* Pinned app gateway routes [prod-pinned-appgw-routes.yaml](https://github.com/hmcts/aks-sds-deploy/blob/master/environments/01-network/prod-pinned-appgw-routes.yaml#L3)
44+
45+
See example [PR for DEMO environment](https://github.com/hmcts/aks-sds-deploy/pull/662/files)
46+
**Note:** the IPs are different for DEMO and PROD environments.
47+
48+
### 2. Update the IP address in the following files in the azure-platform-virtualwan repository:
49+
50+
* static vnet routes in [prod.tfvars](https://github.com/hmcts/azure-platform-virtualwan/blob/b48a0cc40b52accfd6884b8811d2e79503d071be/environments/prod/prod.tfvars#L153-L156)
51+
52+
See example [PR for DEMO environment](https://github.com/hmcts/azure-platform-virtualwan/pull/103/files)
53+
**Note:** the IPs are different for DEMO and PROD environments.
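
The IP swap described in steps 1 and 2 can be sketched as a small helper. This is a minimal illustration, not part of the repositories' tooling: the function name `swap_firewall_ip` and the sample key `next_hop_in_ip_address` are hypothetical, and the real files should still be reviewed by hand in the PR. It only shows how to replace whole IP addresses without touching longer addresses that merely contain the same digits.

```python
import re
from typing import Tuple


def swap_firewall_ip(text: str, old_ip: str, new_ip: str) -> Tuple[str, int]:
    """Replace whole occurrences of old_ip with new_ip.

    Returns the updated text and the number of replacements made.
    """
    # The lookbehind rejects matches preceded by a digit or dot (e.g. 110.11.8.37);
    # the lookahead rejects matches followed by a digit (e.g. 10.11.8.370).
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)")
    return pattern.subn(new_ip, text)


# Hypothetical file content; the real YAML and tfvars layouts may differ.
sample = "next_hop_in_ip_address: 10.11.8.37\nunrelated: 10.11.8.370\n"
updated, count = swap_firewall_ip(sample, "10.11.8.37", "10.11.8.38")
print(updated)
print(f"{count} replacement(s) made")
```

Checking the replacement count against the number of routes you expected to change is a quick sanity check before raising the PR.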
54+
55+
### 3. Running the pipelines
56+
57+
There are usually two reasons why you would want to fail over:
58+
* due to an incident / issue with a VM
59+
* scheduled maintenance, such as patching.
60+
61+
If the failover is required to be done urgently, during an incident for example, it is recommended to run the
62+
[aks-sds-deploy pipeline](https://dev.azure.com/hmcts/PlatformOperations/_build?definitionId=482&_a=summary) from
63+
your branch, as merging the PR and allowing all stages to run on the pipeline can take over 45 minutes to complete.
64+
If you run from your branch, and select only the required pipeline stages, it can be completed in around 10 minutes.
65+
66+
Ensure you still raise a PR, wait for checks to complete successfully, and then run the pipeline from your branch.
67+
68+
The required stages for this are:
69+
* 'Precheck'
70+
* 'Checking Clusters for sbox'
71+
* '{Environment}: Genesis'
72+
* '{Environment}: Network'
73+
74+
**Finally, your PR should still be merged.** This keeps the codebase in sync with the environment and prevents a future pipeline run from overwriting your changes.
75+
76+
The [azure-platform-virtualwan pipeline](https://dev.azure.com/hmcts/PlatformOperations/_build?definitionId=478&_a=summary) is usually quicker to run, and can be run by merging your PR, following approval.
77+
78+
### 4. Verify the failover has been successful
79+
80+
To verify the failover has been successful, you can check the following:
81+
* Your pipeline has completed successfully.
82+
* Check the [aks-prod-appgw-route-table](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/5ca62022-6aa2-4cee-aaa7-e7536c8d566c/resourceGroups/ss-prod-network-rg/providers/Microsoft.Network/routeTables/aks-prod-appgw-route-table/overview) has been updated: the 'Next hop IP address' should now reflect your PR. You can ignore any routes pointing to .36 addresses.
83+
* Check the [aks-prod-route-table](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/5ca62022-6aa2-4cee-aaa7-e7536c8d566c/resourceGroups/ss-prod-network-rg/providers/Microsoft.Network/routeTables/aks-prod-route-table/overview) for the same.
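
The route table checks above could also be scripted. The sketch below assumes you have exported the routes as JSON, for example with `az network route-table route list --resource-group ss-prod-network-rg --route-table-name aks-prod-route-table -o json`, and that each route object carries `name` and `nextHopIpAddress` fields. The function name and the sample route names are hypothetical.

```python
from typing import Dict, List


def routes_not_failed_over(routes: List[Dict], expected_ip: str) -> List[str]:
    """Return the names of routes whose next hop is neither expected_ip nor a .36 address."""
    stale = []
    for route in routes:
        next_hop = route.get("nextHopIpAddress")
        # Skip routes with no next-hop IP, and the .36 routes the guide says to ignore.
        if not next_hop or next_hop.endswith(".36"):
            continue
        if next_hop != expected_ip:
            stale.append(route["name"])
    return stale


# Hypothetical route entries for illustration only.
example_routes = [
    {"name": "pinned-a", "nextHopIpAddress": "10.11.8.38"},
    {"name": "pinned-b", "nextHopIpAddress": "10.11.8.37"},
    {"name": "default", "nextHopIpAddress": "10.11.8.36"},
]
print(routes_not_failed_over(example_routes, "10.11.8.38"))
```

An empty list would suggest every pinned route now targets the expected firewall; any names returned are routes still pointing elsewhere.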
84+
85+
## FAQ
86+
87+
#### How do I find out the IPs for the firewalls?
88+
89+
You can find out which IPs belong to which Palos by checking the backend pool of the Load Balancer in the Azure Portal.
90+
91+
For example, the production backend pool can be found [here](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Network/loadBalancers/hmcts-hub-prod-int-palo-lb/backendPools)
92+
93+
#### The terraform plans are showing a lot of unexpected changes, is this normal?
94+
95+
Yes, this is normal; expect a lot of changes. The pipeline reorders resources and IPs within the plan, which creates a busy output that can be hard to interpret. If you are in any doubt, ask #platform-operations on Slack.

source/network/palo-alto/palos-upgrade.html.md.erb

Lines changed: 3 additions & 0 deletions
@@ -15,6 +15,9 @@ The steps to take are outlined in the [PAN-OS Software Updates](https://docs.pal
1515
and are summarised and illustrated below as per the steps that have been recently taken
1616
while upgrading from the `v9.1.x` version to the `v10.0.x`
1717

18+
To ensure connectivity from Cloud Gateway sources is not left unavailable unnecessarily, you should complete a manual failover of the Palo Alto firewalls. More information can be found here: [Palo Alto Failover](palo-failover.html)
19+
20+
**Be aware that there may be a brief period of downtime while the failover is in progress.**
1821

1922
## Prerequisite
2023
