DTSPO-22443 - Palo Alto failover documentation (#380)
* initial draft

* adding paths to file

* correcting filename

* updates

* including failover info in palo patching process

* updates

* updates

* updates

* updates

* spelling correction

* update

* spelling

* spelling
JordanHoey96 authored Jan 14, 2025
1 parent 68a9267 commit f3483d9
Showing 5 changed files with 104 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .github/actions/spelling/expect.txt
@@ -1,6 +1,8 @@
Bance
cdef
Chirag
CVP
deevanapalli
devrot
dsl
EDB
@@ -14,6 +16,7 @@ jfmd
jfrou
jfrt
journalctl
kalyan
mdv
MDVADMVPNHA
MDVDMZJUMPL
@@ -35,3 +38,4 @@ totp
TTLs
utilisation
Virtualbox
virtualwan
1 change: 1 addition & 0 deletions source/network/index.html.md.erb
@@ -72,6 +72,7 @@ We host a number of Public & Private DNS zones in Azure.
### Guides
- [Connecting to a Palo Alto firewall](palo-alto/connecting-palos.html)
- [Upgrading Palo Alto firewall Software](palo-alto/palos-upgrade.html)
- [Palo Alto firewall Failover](palo-alto/palo-failover.html)
- [Upgrading Panorama](palo-alto/panorama-upgrade.html)

### Troubleshooting
1 change: 1 addition & 0 deletions source/network/palo-alto/index.html.md.erb
@@ -10,6 +10,7 @@ weight: 70
### Guides
- [Connecting to a Palo Alto firewall](connecting-palos.html)
- [Upgrading Palo Alto firewall Software](palos-upgrade.html)
- [Palo Alto Failover](palo-failover.html)
- [Upgrading Panorama](panorama-upgrade.html)

### Troubleshooting
95 changes: 95 additions & 0 deletions source/network/palo-alto/palo-failover.html.md.erb
@@ -0,0 +1,95 @@
---
title: Palo Alto Failover
weight: 10
last_reviewed_on: 2025-01-10
review_in: 4 months
---
# <%= current_page.data.title %>
## Overview

For applications accessed via Cloud Gateway, we have pinned routes in place to route traffic through
a specific Palo Alto firewall, to ensure the return traffic is routed back through the same firewall.

In production, these pinned routes target 10.11.8.37 by default, which is [hmcts-hub-prod-int-palo-vm-1](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Compute/virtualMachines/hmcts-hub-prod-int-palo-vm-1/overview),
and should be updated to 10.11.8.38, which is [hmcts-hub-prod-int-palo-vm-0](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Compute/virtualMachines/hmcts-hub-prod-int-palo-vm-0/overview), in the event that a failover is needed.

This means that if palo-vm-1 is unavailable, whether due to an incident or scheduled maintenance, traffic from Cloud Gateway will not be able to reach the applications.

Applications affected by this are:
* DARTs
* LibraGoB
* SDP
* Juror Digital
* CVP
* BAIS
* MI
* Interim Hosting

This document describes the process of failing over to the other firewall in the pair.


## Failover Process
Before starting the failover process, be aware that there may be a brief period of downtime while the failover is in progress.

Where possible, James Drew ([email protected]) and Kalyan Deevanapalli ([email protected]) should be contacted in advance of the failover.

There are three places across two repositories where the IP address needs to be updated:
* [aks-sds-deploy](https://github.com/hmcts/aks-sds-deploy)
* [azure-platform-virtualwan](https://github.com/hmcts/azure-platform-virtualwan)

### 1. Update the IP address in the following files in the aks-sds-deploy repository

* Pinned AKS routes in [prod-pinned-aks-routes.yaml](https://github.com/hmcts/aks-sds-deploy/blob/master/environments/01-network/prod-pinned-aks-routes.yaml#L3)
* Pinned application gateway routes in [prod-pinned-appgw-routes.yaml](https://github.com/hmcts/aks-sds-deploy/blob/master/environments/01-network/prod-pinned-appgw-routes.yaml#L3)

See the example [PR for the DEMO environment](https://github.com/hmcts/aks-sds-deploy/pull/662/files).

**Note:** the IPs are different for DEMO and PROD environments.
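If you would rather script the edit than change each file by hand, the following is a minimal sketch that swaps whole-IP occurrences of the active firewall's address for its peer's in a file's text. It assumes the production pair described in the overview (10.11.8.37 for palo-vm-1, 10.11.8.38 for palo-vm-0) and that the next hop appears as a plain IP string in the file. The function names and the sample route line are hypothetical and do not reflect the real file layout, so always review the resulting diff before raising the PR.

```python
import re

# Production firewall pair, taken from the overview of this runbook:
# palo-vm-1 is the default next hop and palo-vm-0 is the failover target.
PROD_PALO_PAIR = {
    "10.11.8.37": "10.11.8.38",  # palo-vm-1 -> palo-vm-0 (failover)
    "10.11.8.38": "10.11.8.37",  # palo-vm-0 -> palo-vm-1 (failing back)
}


def peer_ip(current_ip: str, pair: dict[str, str] = PROD_PALO_PAIR) -> str:
    """Return the other firewall in the pair for the given next-hop IP (hypothetical helper)."""
    try:
        return pair[current_ip]
    except KeyError:
        raise ValueError(f"{current_ip} is not a known firewall IP") from None


def swap_next_hop(text: str, old_ip: str, new_ip: str) -> tuple[str, int]:
    """Replace whole-IP occurrences of old_ip with new_ip (hypothetical helper).

    The lookarounds stop 10.11.8.37 from matching inside a longer
    address such as 110.11.8.37 or 10.11.8.370.
    Returns the updated text and the number of replacements made.
    """
    pattern = r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)"
    return re.subn(pattern, new_ip, text)


# Usage with a hypothetical route line; the real YAML keys may differ.
sample = "next_hop_ip_address: 10.11.8.37\n"
updated, count = swap_next_hop(sample, "10.11.8.37", peer_ip("10.11.8.37"))
print(count, updated, end="")  # 1 next_hop_ip_address: 10.11.8.38
```

Returning the replacement count makes it easy to spot a file where nothing matched, which usually means the file already points at the other firewall.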

### 2. Update the IP address in the following files in the azure-platform-virtualwan repository

* Static VNet routes in [prod.tfvars](https://github.com/hmcts/azure-platform-virtualwan/blob/b48a0cc40b52accfd6884b8811d2e79503d071be/environments/prod/prod.tfvars#L153-L156)

See the example [PR for the DEMO environment](https://github.com/hmcts/azure-platform-virtualwan/pull/103/files).

**Note:** the IPs are different for DEMO and PROD environments.

### 3. Run the pipelines

There are usually two reasons why you would want to fail over:
* an incident or issue with a VM
* scheduled maintenance, such as patching

If the failover is urgent, for example during an incident, it is recommended to run the
[aks-sds-deploy pipeline](https://dev.azure.com/hmcts/PlatformOperations/_build?definitionId=482&_a=summary) from
your branch, because merging the PR and letting every stage of the pipeline run can take over 45 minutes.
If you run it from your branch and select only the required stages, it can be completed in around 10 minutes.

Ensure you still raise a PR, wait for checks to complete successfully, and then run the pipeline from your branch.

The required stages for this are:
* 'Precheck'
* 'Checking Clusters for sbox'
* '{Environment}: Genesis'
* '{Environment}: Network'

**Finally, your PR should still be merged.** This keeps the codebase in sync with the environment and prevents a future pipeline run from overwriting your changes.

The [azure-platform-virtualwan pipeline](https://dev.azure.com/hmcts/PlatformOperations/_build?definitionId=478&_a=summary) is usually quicker to run, and can be triggered by merging your PR once it has been approved.

### 4. Verify that the failover has been successful

To verify the failover has been successful, you can check the following:
* Your pipeline has completed successfully.
* Check that the [aks-prod-appgw-route-table](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/5ca62022-6aa2-4cee-aaa7-e7536c8d566c/resourceGroups/ss-prod-network-rg/providers/Microsoft.Network/routeTables/aks-prod-appgw-route-table/overview) has been updated: the 'Next hop IP address' should now reflect your PR. You can ignore any routes pointing to .36 addresses.
* Check the [aks-prod-route-table](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/5ca62022-6aa2-4cee-aaa7-e7536c8d566c/resourceGroups/ss-prod-network-rg/providers/Microsoft.Network/routeTables/aks-prod-route-table/overview) in the same way.
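As an alternative to reading the route tables in the portal, you could export them with the Azure CLI, for example `az network route-table route list --resource-group ss-prod-network-rg --route-table-name aks-prod-appgw-route-table --output json`, and check the next hops programmatically. The following is a minimal sketch that assumes that JSON shape (a list of routes with `name` and `nextHopIpAddress` fields) and, following the note above, skips any next hop ending in `.36`. The function name and the sample routes are hypothetical.

```python
import json


def unexpected_next_hops(routes: list[dict], expected_ip: str) -> list[str]:
    """Return names of routes whose next hop is not the expected firewall.

    Routes without a next-hop IP (for example non-appliance next hop types)
    and routes pointing at a .36 address are ignored, as the runbook advises.
    """
    offenders = []
    for route in routes:
        hop = route.get("nextHopIpAddress")
        if not hop or hop.endswith(".36"):
            continue
        if hop != expected_ip:
            offenders.append(route.get("name", "<unnamed>"))
    return offenders


# Hypothetical CLI output after failing over to palo-vm-0 (10.11.8.38).
sample_json = """
[
  {"name": "route-a", "addressPrefix": "10.0.0.0/24",
   "nextHopType": "VirtualAppliance", "nextHopIpAddress": "10.11.8.38"},
  {"name": "route-b", "addressPrefix": "10.0.1.0/24",
   "nextHopType": "VirtualAppliance", "nextHopIpAddress": "10.11.8.37"},
  {"name": "route-c", "addressPrefix": "10.0.2.0/24",
   "nextHopType": "VirtualAppliance", "nextHopIpAddress": "10.11.8.36"}
]
"""
routes = json.loads(sample_json)
print(unexpected_next_hops(routes, "10.11.8.38"))  # ['route-b']
```

An empty list means every relevant route now points at the expected firewall. Any names returned are worth checking against your PR before closing off the failover.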

## FAQ

#### How do I find the IPs for the firewalls?

You can find out which IPs belong to which Palos by checking the backend pool of the Load Balancer in the Azure Portal.

For example, the production backend pool can be found [here](https://portal.azure.com/#@HMCTS.NET/resource/subscriptions/0978315c-75fe-4ada-9d11-1eb5e0e0b214/resourceGroups/hmcts-hub-prod-int/providers/Microsoft.Network/loadBalancers/hmcts-hub-prod-int-palo-lb/backendPools).

#### The Terraform plans are showing a lot of unexpected changes. Is this normal?

Yes, this is normal; expect a lot of changes. The pipeline reorders resources and IPs within the plan, which creates a busy output that can be hard to interpret. If you are in any doubt, ask in #platform-operations on Slack.
3 changes: 3 additions & 0 deletions source/network/palo-alto/palos-upgrade.html.md.erb
@@ -15,6 +15,9 @@ The steps to take are outlined in the [PAN-OS Software Updates](https://docs.pal
and are summarised and illustrated below, following the steps recently taken
while upgrading from `v9.1.x` to `v10.0.x`.

To ensure that connectivity from Cloud Gateway sources is not left unavailable unnecessarily, you should manually fail over the Palo Alto firewalls. More information can be found here: [Palo Alto Failover](palo-failover.html)

**Be aware that there may be a brief period of downtime while the failover is in progress.**

## Prerequisite


1 comment on commit f3483d9

@github-actions
@check-spelling-bot Report

🔴 Please review

See the 📜action log or 📝 job summary for details.

Unrecognized words (1)

enviornment

To accept these unrecognized words as correct, you could run the following commands

... in a clone of the git@github.com:hmcts/ops-runbooks.git repository
on the main branch (ℹ️ how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/v0.0.22/apply.pl' |
perl - 'https://github.com/hmcts/ops-runbooks/actions/runs/12767034792/attempts/1'
Warnings (1)

See the 📜action log or 📝 job summary for details.

ℹ️ Warnings Count
ℹ️ no-newline-at-eof 2

See ℹ️ Event descriptions for more information.


🖊️ Please consider adding a word to the allow list if it is flagged as a spelling error but is genuinely used within the project.
🤔 Think we might see a flagged mistake in another PR in the future? Please consider adding it as an expected pattern
