Commit 3a23ae8: Merge branch 'main' into feat/document-scs2 (parents 12bb038 and b7e5e2e)
---
sidebar_label: Pre-Installation Checklist
---

# Pre-Installation Checklist
:::warning

This checklist is currently **work in progress and incomplete**.

:::
This list describes some aspects (without claiming to be exhaustive) that should be clarified before a pilot installation and, at the latest, before a production installation.
The aim of this list is to reduce:

- projects that fall short of their potential
- long project waiting and implementation times
- unexpected errors or difficulties
- major restructuring work shortly after the system is put into operation
- unexpected issues that have a major impact on costs
For reasons of clarity, most topics are not described in enough detail for readers who are new to this environment to easily understand the context of each question. Rather, this list should be seen as a catalog of questions or a task list to be discussed, clarified, and worked through in the run-up to a project.
_Open source benefits from the collaboration of its users and its developers._

For this reason, we are collecting questions, important topics to be clarified, and hints that make it easier for users of the Sovereign Cloud Stack to succeed with it.
We would therefore be very pleased if users, implementers, and operators [contribute](https://github.com/SovereignCloudStack/docs/docs/01-getting-started/preinstall-checklist.md) their specific experiences to this list.
## General

### Availability and Support
- What requirements do you have for the availability of the system?
- How much downtime is acceptable for maintenance tasks?
- What are your expectations in terms of downtime, and what downtime is still within the tolerable range for you?
  (as you probably know, this has a significant impact on hardware and personnel requirements)
- What gradations or requirements are there for resolving problems, depending on the type of problem?
  - Example problem scenarios:
    - complete cloud service outage or downtime
    - performance problems
    - application problems
    - ...
- How do you want to manage the lifecycle of your systems?
  - Where do you want to test releases, configuration changes, and operational procedures before rolling them out to production environments?
  - Is it possible to have a testing environment that is comparable to production from a logical/architectural perspective?
  - Is it possible to have a testing environment that runs real-world workloads ("eat your own dogfood")?
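When discussing tolerable downtime, it helps to translate an availability target into a concrete downtime budget. A minimal sketch of that arithmetic; the targets shown are illustrative examples, not SCS requirements:

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 365.25 * 24) -> float:
    """Return the maximum downtime within one period for a given availability target."""
    return (1.0 - availability_percent / 100.0) * period_hours

# Illustrative targets and the yearly downtime budget they imply:
for target in (99.0, 99.9, 99.99):
    print(f"{target}% availability -> {allowed_downtime_hours(target):.2f} h/year")
```

Seeing that 99.99% leaves less than an hour per year makes it easier to discuss what the hardware and on-call staffing actually have to deliver.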
### General Hardware Conditions
- Are there defined hardware standards for the target data center?
  - Are there defined/standardized suppliers?
  - Are there defined/standardized manufacturers and server models?
- Does the defined hardware and vendor portfolio match the requirements for building a cloud system (cost structure, delivery times, support, automation options, quality, ...)?
  - It may be worthwhile to evaluate other suppliers or manufacturers whose properties are more suitable.
    (depending on how large the target systems are, this can quickly make sense)
- Instead of cost-intensive hardware support contracts, does it make sense to keep free additional capacity (already integrated into the cloud) or a stock of spare parts?
- What are the general conditions for the target data center?
- How should the systems be provisioned with an operating system?
  - Decide which base operating system is used (e.g. RHEL or Ubuntu) and whether it fits the hardware support, strategy, upgrade support, and cost structure.
- How many failure domains, environments, and availability zones are required?
### Required IP Networks

Estimate the expected number of IP addresses and plan sufficient reserves so that no adjustments to the networks become necessary at a later date.
The installation can be carried out via IPv4, IPv6, or a hybrid of both.
- Provider Networks: one or more dedicated public IP networks for services published by the cloud platform and its users
  - this is in most cases a public IPv4 network
  - at least TCP port 443 should be accessible on all addresses of this network from other networks (i.e. the internet)
- OpenStack Node Communication: a dedicated private IP address space/network for the internal communication between the nodes
  - every node needs a dedicated IP
  - a DHCP range for performing node installations might be useful, but is not mandatory
  - all nodes in this network should have access to the NTP server
  - all nodes should have access to public DNS servers and HTTP/HTTPS servers
  - In some cases, it may make sense to operate Ceph in a dedicated network or in multiple dedicated networks (public, cluster).
    Methods for high-performance and scalable access to the storage:
    - very high-performance routing (layer 3), for example via the switch infrastructure
    - dedicated network adapters in the compute nodes for direct access to the storage network
- Ceph Node Communication: a dedicated private IP address space/network for the internal communication between the Ceph nodes
- Management: a private IP address space/network for the out-of-band management of the nodes
  - every node needs a dedicated management IP
  - a DHCP range for installation might be useful, but is not mandatory
- Manager Access: dedicated IP addresses for access to the manager nodes
  - Every manager gets a dedicated external address for SSH and WireGuard access
  - The IP addresses should not be part of the "Frontend Access" network
  - At least ports 443/TCP and 51820/UDP should be reachable from external networks
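A planned address layout can be sanity-checked against the expected host counts before any hardware arrives, for example with Python's standard `ipaddress` module. The prefixes and host counts below are purely illustrative assumptions, not recommendations:

```python
import ipaddress

# Hypothetical address plan: network name -> (subnet, expected hosts incl. reserve).
plan = {
    "node_communication": ("10.10.0.0/22", 300),
    "management":         ("10.10.8.0/22", 300),
    "ceph_cluster":       ("10.10.16.0/22", 150),
}

for name, (cidr, expected_hosts) in plan.items():
    net = ipaddress.ip_network(cidr)
    usable = net.num_addresses - 2  # minus network and broadcast address
    assert usable >= expected_hosts, f"{name}: {cidr} too small for {expected_hosts} hosts"
    print(f"{name}: {cidr} provides {usable} usable addresses")
```

Running such a check early makes the "plan sufficient reserves" requirement concrete and catches subnets that would force renumbering later.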
### Identity Management of the Platform

How should access to the administration of the environment (e.g. OpenStack) be managed?

Should there be only local access, or should the system be linked to one or more identity providers via OIDC or SAML (identity brokering)?
### Network Configuration of Nodes and Tenant Networks

TBD:

- It must be decided how the networks of the tenants should be separated in OpenStack (Neutron).
- It must be decided how the underlay network of the cloud platform should be designed.
  (e.g. native layer 2, layer 2 underlay with tenant VLANs, layer 3 underlay)
  - Layer 3 underlay:
    - FRR routing on the nodes?
    - ASN naming scheme
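If a layer 3 underlay with BGP (e.g. via FRR) is chosen, a deterministic ASN scheme avoids collisions as the fabric grows. The following is a hypothetical sketch, not an SCS convention: it derives ASNs from the 32-bit private range defined in RFC 6996 (4200000000 to 4294967294), and the rack/node encoding is purely illustrative:

```python
PRIVATE_ASN_BASE = 4_200_000_000  # start of the 32-bit private ASN range (RFC 6996)

def node_asn(rack: int, node: int) -> int:
    """Derive a unique private ASN from rack and node position (hypothetical scheme)."""
    asn = PRIVATE_ASN_BASE + rack * 100 + node
    assert PRIVATE_ASN_BASE <= asn <= 4_294_967_294, "outside the private ASN range"
    return asn

# Example: node 3 in rack 12
print(node_asn(12, 3))
```

Whatever scheme is chosen, writing it down as a function like this makes it reproducible across the whole deployment tooling.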
### Domains and Hosts

- Cloud Domain: a dedicated subdomain used for the cloud environment
  (e.g. `*.zone1.landscape.scs.community`)
- Internal API endpoint: a hostname for the internal API endpoint which points to an address in the "Node Communication" network
  (e.g. `api-internal.zone1.landscape.scs.community`)
- External API endpoint: a hostname for the external API endpoint which points to an address in the "Frontend Access" network
  (e.g. `api.zone1.landscape.scs.community`)
### TLS Certificates

Since not all domains used for the environment will be publicly accessible, the use of "Let's Encrypt" certificates is not generally possible without problems. We therefore recommend having official TLS certificates available for at least the two API endpoints.
Either a multi-domain certificate (with SANs) or a wildcard certificate (a wildcard on the first level of the cloud domain) can be used for this.
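Note that a wildcard in a certificate name matches exactly one DNS label (per RFC 6125), so `*.zone1.landscape.scs.community` covers `api.zone1.landscape.scs.community` but not a deeper name. A minimal sketch of that matching rule, using the example hostnames from this document:

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Check whether a certificate name (possibly starting with '*') covers a hostname.

    A wildcard matches exactly one DNS label, as in RFC 6125.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    return all(p == "*" or p == h for p, h in zip(p_labels, h_labels))

assert wildcard_covers("*.zone1.landscape.scs.community",
                       "api.zone1.landscape.scs.community")
assert not wildcard_covers("*.zone1.landscape.scs.community",
                           "a.b.zone1.landscape.scs.community")
```

This is why the wildcard must sit on the first level of the cloud domain; deeper service names would need their own SAN entries.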
### Access to Installation Resources

For the download of installation data such as container images, operating system packages, etc., either access to publicly accessible networks must be provided, or a caching proxy or a dedicated repository server must be reachable directly from the "Node Communication" network.

The [Configuration Guide](https://docs.scs.community/docs/iaas/guides/configuration-guide/proxy) provides more detailed information on how this can be configured.
TBD:

- Proxy requirements
  - Are authenticated proxies possible?
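Whether an authenticated proxy works can be checked independently of the deployment tooling. A minimal sketch with Python's standard library; the proxy host, port, and credentials are placeholders to be replaced with the values actually provided for the environment:

```python
import urllib.request

# Placeholder proxy URL; replace host, port, and credentials with real values.
proxy_url = "http://deploy-user:secret@proxy.example.com:3128"

handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
opener = urllib.request.build_opener(handler)

# Uncomment to test real connectivity through the proxy:
# with opener.open("https://quay.io", timeout=10) as resp:
#     print(resp.status)

print(handler.proxies["https"])
```

If such a request fails, it is worth clarifying early whether the proxy strips credentials, restricts CONNECT targets, or intercepts TLS.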
### Git Repository

- A private Git repository for the [configuration repository](https://osism.tech/docs/guides/configuration-guide/configuration-repository)
### Access Management

- What requirements are needed or defined for the administration of the system?
- The public SSH keys of all administrators
### Monitoring and On-Call/On-Duty

- Connection to and integration into existing operational monitoring
- What kind of on-call/on-duty coverage do you need?
  - How quickly should work on a solution to a problem be started?
  - What downtimes are tolerable in extreme cases?
- Does a log aggregation system already exist, and does it make sense to use it for the new environment?
## NTP Infrastructure

- The deployed nodes should have permanent access to at least 3 NTP servers.
- It has turned out to be advantageous for the 3 control nodes to have access to external NTP servers
  and to act as NTP servers for the other nodes of the SCS installation.
- The NTP servers used should not run on virtual hardware.
  (Depending on the architecture and the virtualization platform, this can otherwise cause minor or major problems in special situations.)
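Whether a node actually reaches its NTP servers can be verified with a raw SNTP query before the installation starts. A minimal sketch using only the standard library; the server name in the usage comment is a placeholder. NTP timestamps count seconds since 1900, hence the epoch offset when converting to Unix time:

```python
import socket
import struct

NTP_UNIX_EPOCH_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01

def sntp_time(server: str, timeout: float = 2.0) -> int:
    """Query an NTP server (SNTP, RFC 4330) and return the Unix timestamp it reports."""
    # First byte: LI=0, VN=4, Mode=3 (client); rest of the 48-byte packet is zero.
    request = b"\x23" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        response, _ = sock.recvfrom(48)
    # Transmit timestamp: integer seconds at bytes 40-43 (big-endian, unsigned).
    (ntp_seconds,) = struct.unpack("!I", response[40:44])
    return ntp_seconds - NTP_UNIX_EPOCH_OFFSET

# Example (requires network access and a reachable NTP server):
# print(sntp_time("pool.ntp.org"))
```

Running this from each network segment also confirms that UDP port 123 is permitted by the firewalls involved.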
## OpenStack

### Hardware Concept

TBD:

- How many compute nodes are needed?
- Are local NVMe disks needed?
- Are GPUs needed?
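A first estimate of the number of compute nodes can be derived from the expected vCPU demand and the planned CPU overcommit ratio. A minimal sketch; all input values, the overcommit factor, and the reserve are illustrative assumptions, not SCS defaults:

```python
import math

def compute_nodes_needed(total_vcpus: int, cores_per_node: int,
                         overcommit: float = 3.0, reserve_nodes: int = 1) -> int:
    """Estimate the compute node count from vCPU demand (illustrative sizing only)."""
    vcpus_per_node = cores_per_node * overcommit
    return math.ceil(total_vcpus / vcpus_per_node) + reserve_nodes

# Example: 2000 vCPUs expected, 64 physical cores per node, 3:1 overcommit,
# plus one spare node for failover/maintenance.
print(compute_nodes_needed(2000, 64))
```

Memory and local disk usually need the same exercise; whichever resource runs out first determines the node count.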
## Ceph Storage

### General

TBD:

- Amount of usable storage
- External Ceph storage installation?
- What is the purpose of your storage?
  - Fast NVMe disks?
  - More read/write-intensive workloads, or mixed?
  - Huge amounts of data, but performance is a secondary requirement?
  - Object storage?
  - ...
- What kind of network storage is needed?
  - Spinners (HDDs)
  - NVMe/SSD
- Dedicated Ceph environment or hyperconverged setup?
- CRUSH / failure domain properties
  - Failure domains?
  - Erasure coding?
  - Inter-datacenter replication?
  - ...
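The "amount of usable storage" depends heavily on the data protection scheme: with 3x replication only a third of the raw capacity is usable, while erasure coding with k data and m coding chunks yields k/(k+m) of it. A minimal sketch of that arithmetic; the raw capacity is an illustrative value:

```python
def usable_replicated(raw_tib: float, replicas: int = 3) -> float:
    """Usable capacity under n-way replication."""
    return raw_tib / replicas

def usable_erasure_coded(raw_tib: float, k: int, m: int) -> float:
    """Usable capacity under erasure coding with k data and m coding chunks."""
    return raw_tib * k / (k + m)

raw = 600.0  # illustrative raw capacity in TiB
print(f"3x replication: {usable_replicated(raw):.0f} TiB usable")
print(f"EC 4+2:         {usable_erasure_coded(raw, 4, 2):.0f} TiB usable")
```

In practice some headroom must also be reserved for rebalancing after a failure domain is lost, so the usable figure should be planned below these theoretical values.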
### Disk Storage

- What use cases can be expected, and on what scale?

### Object Storage

- RADOS Gateway setup
