Commit bdcaf0c

Add SupportBundle collection functionalities (#140)
This PR adds support bundle collection functionalities to Theia Manager. A new API endpoint, apis/system.theia.antrea.io/v1alpha1/supportbundles, is added to trigger and retrieve log bundle collection.

The following logs will be collected as part of the operation:
- Flow Aggregator logs (copied from /var/log/antrea/flow-aggregator/)
- Theia Manager logs (local log files)
- ClickHouse server logs (copied from /var/log/clickhouse-server/)
- Grafana logs (copied from /var/log/grafana/)
- Spark operator, driver and executor logs, if available during PR job (via streaming console logs)
- ZooKeeper logs (via streaming console logs)

Logging configuration for ClickHouse and Grafana is also added to allow better control of log level and retention.

To trigger log collection, or retrieve a log bundle job that finished earlier, run:

$ theia supportbundle

The optional Theia CLI argument --since currently only supports filtering logs collected from Theia Manager and Flow Aggregator. Support for other components will be added in a future change.

Signed-off-by: Shawn Wang <[email protected]>
1 parent ff33a90 commit bdcaf0c

26 files changed

+1763
-6
lines changed

Diff for: build/charts/theia/README.md

+11
@@ -28,6 +28,9 @@ Kubernetes: `>= 1.16.0-0`
 | clickhouse.cluster.zookeeperHosts | list | `[]` | To use a pre-installed ZooKeeper for ClickHouse data replication, please provide a list of your ZooKeeper hosts. To install a customized ZooKeeper, refer to <https://github.com/Altinity/clickhouse-operator/blob/master/docs/zookeeper_setup.md> |
 | clickhouse.connectionSecret | object | `{"password":"clickhouse_operator_password","username":"clickhouse_operator"}` | Credentials to connect to ClickHouse. They will be stored in a secret. |
 | clickhouse.image | object | `{"pullPolicy":"IfNotPresent","repository":"projects.registry.vmware.com/antrea/theia-clickhouse-server","tag":""}` | Container image used by ClickHouse. |
+| clickhouse.logger.count | int | `4` | The number of archived log files that ClickHouse stores. |
+| clickhouse.logger.level | string | `"information"` | Logging level. Acceptable values: trace, debug, information, warning, error. |
+| clickhouse.logger.size | string | `"100M"` | Size of log files. Applies to log and errorlog. Once the file reaches size, ClickHouse archives and renames it, and creates a new log file in its place. |
 | clickhouse.monitor.deletePercentage | float | `0.5` | The percentage of records in ClickHouse that will be deleted when the storage grows above threshold. Vary from 0 to 1. |
 | clickhouse.monitor.enable | bool | `true` | Determine whether to run a monitor to periodically check the ClickHouse memory usage and clean data. |
 | clickhouse.monitor.execInterval | string | `"1m"` | The time interval between two round of monitoring. Can be a plain integer using one of these unit suffixes ns, us (or µs), ms, s, m, h. |
@@ -50,6 +53,14 @@ Kubernetes: `>= 1.16.0-0`
 | grafana.homeDashboard | string | `"homepage.json"` | Default home dashboard. |
 | grafana.image | object | `{"pullPolicy":"IfNotPresent","repository":"projects.registry.vmware.com/antrea/theia-grafana","tag":"8.3.3"}` | Container image used by Grafana. |
 | grafana.installPlugins | list | `["https://downloads.antrea.io/artifacts/grafana-custom-plugins/theia-grafana-sankey-plugin-1.0.2.zip;theia-grafana-sankey-plugin","https://downloads.antrea.io/artifacts/grafana-custom-plugins/theia-grafana-chord-plugin-1.0.1.zip;theia-grafana-chord-plugin","grafana-clickhouse-datasource 1.0.1"]` | Grafana plugins to install. |
+| grafana.log | object | `{"daily_rotate":"true","level":"info","log_rotate":"true","max_days":"7","max_lines":"1000000","max_size_shift":"27","mode":"console file"}` | Grafana logging options. |
+| grafana.log.daily_rotate | string | `"true"` | Enable daily rotation of files, valid options are false or true. Default is true. Only applicable when “file” used in [log] mode. |
+| grafana.log.level | string | `"info"` | Logging level. Options are “debug”, “info”, “warn”, “error”, and “critical”. Default is info. |
+| grafana.log.log_rotate | string | `"true"` | Enable automated log rotation, valid options are false or true. Default is true. When enabled use the max_lines, max_size_shift, daily_rotate and max_days to configure the behavior of the log rotation. Only applicable when “file” used in [log] mode. |
+| grafana.log.max_days | string | `"7"` | Maximum number of days to keep log files. Default is "7". Only applicable when “file” used in [log] mode. |
+| grafana.log.max_lines | string | `"1000000"` | Maximum lines per file before rotating it. Default is "1000000". Only applicable when “file” used in [log] mode. |
+| grafana.log.max_size_shift | string | `"27"` | Maximum size of file before rotating it. Default is "27", which means 1 << 27, 128MB. Only applicable when “file” used in [log] mode. |
+| grafana.log.mode | string | `"console file"` | Logging mode. Options are “console”, “file”, and “syslog”. Default is “console” and “file”. Use spaces to separate multiple modes, e.g. console file |
 | grafana.loginSecret | object | `{"password":"admin","username":"admin"}` | Credentials to login to Grafana. They will be stored in a Secret. |
 | grafana.securityContext | object | `{"fsGroup":472,"supplementalGroups":[0]}` | Set securityContext. Use a specific uid, gid for grafana. |
 | grafana.service.tcpPort | int | `3000` | TCP port number for the Grafana service. |
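The max_size_shift description above gives the rotation threshold as a bit shift. The short sketch below works through that arithmetic; the helper name `maxSizeBytes` is made up for illustration and is not part of Grafana or Theia.

```go
package main

import "fmt"

// maxSizeBytes returns the size threshold in bytes for a given
// max_size_shift value, computed as 1 << shift.
func maxSizeBytes(shift uint) uint64 {
	return uint64(1) << shift
}

func main() {
	b := maxSizeBytes(27)
	// 1 MiB is 1 << 20 bytes, so dividing gives the size in MiB.
	fmt.Printf("%d bytes = %d MiB\n", b, b/(1<<20))
}
```

With the default shift of 27 this gives 134217728 bytes, which is the 128MB quoted in the table.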

Diff for: build/charts/theia/templates/clickhouse/clickhouseinstallation.yaml

+4
@@ -42,6 +42,10 @@ spec:
 - host: {{ $host }}
 {{- end }}
 {{- end }}
+settings:
+logger/level: {{ .Values.clickhouse.logger.level }}
+logger/size: {{ .Values.clickhouse.logger.size }}
+logger/count: {{ .Values.clickhouse.logger.count }}
 defaults:
 templates:
 podTemplate: pod-template
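The settings keys added above use slash-separated paths. Assuming the ClickHouse operator turns each path into nested elements of the ClickHouse XML server configuration, the mapping can be sketched as follows. `settingToXML` is a hypothetical helper, not operator code, and it does no XML escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// settingToXML renders a slash-separated settings path and its value as
// nested XML elements, e.g. "logger/level" becomes <logger><level>...
// It is a simplified illustration and does not escape special characters.
func settingToXML(path, value string) string {
	parts := strings.Split(path, "/")
	var b strings.Builder
	for _, p := range parts {
		b.WriteString("<" + p + ">")
	}
	b.WriteString(value)
	// Close the elements in reverse order.
	for i := len(parts) - 1; i >= 0; i-- {
		b.WriteString("</" + parts[i] + ">")
	}
	return b.String()
}

func main() {
	fmt.Println(settingToXML("logger/level", "information"))
	fmt.Println(settingToXML("logger/size", "100M"))
	fmt.Println(settingToXML("logger/count", "4"))
}
```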

Diff for: build/charts/theia/templates/grafana/deployment.yaml

+14
@@ -61,6 +61,20 @@ spec:
 key: admin-password
 - name: GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH
 value: /opt/grafana/dashboards/{{ .Values.grafana.homeDashboard }}
+- name: GF_LOG_MODE
+value: '{{ .Values.grafana.log.mode }}'
+- name: GF_LOG_LEVEL
+value: '{{ .Values.grafana.log.level }}'
+- name: GF_LOG_FILE_LOG_ROTATE
+value: '{{ .Values.grafana.log.log_rotate }}'
+- name: GF_LOG_FILE_MAX_LINES
+value: '{{ .Values.grafana.log.max_lines }}'
+- name: GF_LOG_FILE_MAX_SIZE_SHIFT
+value: '{{ .Values.grafana.log.max_size_shift }}'
+- name: GF_LOG_FILE_DAILY_ROTATE
+value: '{{ .Values.grafana.log.daily_rotate }}'
+- name: GF_LOG_FILE_MAX_DAYS
+value: '{{ .Values.grafana.log.max_days }}'
 ports:
 - containerPort: 3000
 name: http-grafana
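The variable names in this hunk follow Grafana's GF_<SECTION>_<KEY> convention for overriding grafana.ini options, where the rotation options live in the [log.file] section. The sketch below shows how such names are derived; `grafanaEnvVar` is an illustrative helper written for this note, not a Grafana or Theia function.

```go
package main

import (
	"fmt"
	"strings"
)

// grafanaEnvVar builds the environment variable name that overrides `key`
// in `section` of grafana.ini. The name is upper-cased, and dots and
// dashes in the section name are replaced with underscores.
func grafanaEnvVar(section, key string) string {
	r := strings.NewReplacer(".", "_", "-", "_")
	return "GF_" + strings.ToUpper(r.Replace(section)) + "_" + strings.ToUpper(key)
}

func main() {
	fmt.Println(grafanaEnvVar("log", "mode"))
	fmt.Println(grafanaEnvVar("log", "level"))
	fmt.Println(grafanaEnvVar("log.file", "max_size_shift"))
	fmt.Println(grafanaEnvVar("log.file", "max_days"))
}
```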

Diff for: build/charts/theia/templates/theia-cli/clusterrole.yaml

+14
@@ -21,4 +21,18 @@ rules:
 - clickhouse
 verbs:
 - get
+- apiGroups:
+- system.theia.antrea.io
+resources:
+- supportbundles
+verbs:
+- get
+- create
+- delete
+- apiGroups:
+- system.theia.antrea.io
+resources:
+- supportbundles/download
+verbs:
+- get
 {{- end }}
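The two rules above cover the supportbundles resource and its download subresource. As a hedged sketch, the request paths they would authorize can be built as shown below, assuming the resource is cluster-scoped and follows the usual Kubernetes layout of /apis/<group>/<version>/<resource>/<name>/<subresource>. The bundle name "theia-manager" is a placeholder, not a name confirmed by this commit.

```go
package main

import (
	"fmt"
	"path"
)

const (
	apiGroup   = "system.theia.antrea.io"
	apiVersion = "v1alpha1"
	resource   = "supportbundles"
)

// bundlePath returns the API path for a support bundle. Empty name or
// subresource arguments are skipped, since path.Join ignores empty elements.
func bundlePath(name, subresource string) string {
	return path.Join("/apis", apiGroup, apiVersion, resource, name, subresource)
}

func main() {
	// "theia-manager" is a placeholder bundle name used for illustration.
	fmt.Println(bundlePath("", ""))
	fmt.Println(bundlePath("theia-manager", ""))
	fmt.Println(bundlePath("theia-manager", "download"))
}
```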

Diff for: build/charts/theia/templates/theia-manager/clusterrole.yaml

+7 -1
@@ -51,7 +51,13 @@ rules:
 verbs: ["update"]
 - apiGroups: [ "" ]
 resources: [ "pods" ]
-verbs: ["list"]
+verbs: ["get", "list"]
+- apiGroups: [ "" ]
+resources: [ "pods/exec" ]
+verbs: ["get", "create"]
+- apiGroups: [ "" ]
+resources: [ "pods/log" ]
+verbs: ["get"]
 - apiGroups: [ "" ]
 resources: [ "services", "secrets" ]
 verbs: ["get"]

Diff for: build/charts/theia/values.yaml

+31
@@ -4,6 +4,14 @@ clickhouse:
 repository: "projects.registry.vmware.com/antrea/theia-clickhouse-server"
 pullPolicy: "IfNotPresent"
 tag: ""
+logger:
+# -- Logging level. Acceptable values: trace, debug, information, warning, error.
+level: information
+# -- Size of log files. Applies to log and errorlog. Once the file reaches size,
+# ClickHouse archives and renames it, and creates a new log file in its place.
+size: 100M
+# -- The number of archived log files that ClickHouse stores.
+count: 4
 monitor:
 # -- Determine whether to run a monitor to periodically check the ClickHouse
 # memory usage and clean data.
@@ -126,6 +134,29 @@ grafana:
 # -- Determine whether to install Grafana. It is used as a data visualization
 # and monitoring tool.
 enable: true
+# -- Grafana logging options.
+log:
+# -- Logging mode. Options are “console”, “file”, and “syslog”. Default is “console” and “file”.
+# Use spaces to separate multiple modes, e.g. console file
+mode: "console file"
+# -- Logging level. Options are “debug”, “info”, “warn”, “error”, and “critical”. Default is info.
+level: info
+# -- Enable automated log rotation, valid options are false or true. Default is true.
+# When enabled use the max_lines, max_size_shift, daily_rotate and max_days to configure
+# the behavior of the log rotation. Only applicable when “file” used in [log] mode.
+log_rotate: "true"
+# -- Maximum lines per file before rotating it. Default is "1000000".
+# Only applicable when “file” used in [log] mode.
+max_lines: "1000000"
+# -- Maximum size of file before rotating it. Default is "27", which means 1 << 27, 128MB.
+# Only applicable when “file” used in [log] mode.
+max_size_shift: "27"
+# -- Enable daily rotation of files, valid options are false or true. Default is true.
+# Only applicable when “file” used in [log] mode.
+daily_rotate: "true"
+# -- Maximum number of days to keep log files. Default is "7".
+# Only applicable when “file” used in [log] mode.
+max_days: "7"
 # -- Set securityContext.
 # Use a specific uid, gid for grafana.
 securityContext:

Diff for: build/yamls/flow-visibility.yml

+18
@@ -6025,6 +6025,20 @@ spec:
 name: grafana-secret
 - name: GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH
 value: /opt/grafana/dashboards/homepage.json
+- name: GF_LOG_MODE
+value: console file
+- name: GF_LOG_LEVEL
+value: info
+- name: GF_LOG_FILE_LOG_ROTATE
+value: "true"
+- name: GF_LOG_FILE_MAX_LINES
+value: "1000000"
+- name: GF_LOG_FILE_MAX_SIZE_SHIFT
+value: "27"
+- name: GF_LOG_FILE_DAILY_ROTATE
+value: "true"
+- name: GF_LOG_FILE_MAX_DAYS
+value: "7"
 image: projects.registry.vmware.com/antrea/theia-grafana:8.3.3
 imagePullPolicy: IfNotPresent
 livenessProbe:
@@ -6287,6 +6301,10 @@ spec:
 replicasCount: 1
 shardsCount: 1
 name: clickhouse
+settings:
+logger/count: 4
+logger/level: information
+logger/size: 100M
 users:
 clickhouse_operator/k8s_secret_password: flow-visibility/clickhouse-secret/password
 clickhouse_operator/networks/ip: ::/0

Diff for: ci/jenkins/test-vmc.sh

+1 -1
@@ -337,7 +337,7 @@ function deliver_antrea {

 control_plane_ip="$(kubectl get nodes -o wide --no-headers=true | awk -v role="$CONTROL_PLANE_NODE_ROLE" '$3 ~ role {print $6}')"

-${GIT_CHECKOUT_DIR}/hack/generate-manifest.sh --ch-size 100Mi --ch-monitor-threshold 0.1 > ${GIT_CHECKOUT_DIR}/build/yamls/flow-visibility.yml
+${GIT_CHECKOUT_DIR}/hack/generate-manifest.sh --ch-size 100Mi --ch-monitor-threshold 0.1 --theia-manager > ${GIT_CHECKOUT_DIR}/build/yamls/flow-visibility.yml
 ${GIT_CHECKOUT_DIR}/hack/generate-manifest.sh --no-grafana --spark-operator --theia-manager > ${GIT_CHECKOUT_DIR}/build/yamls/flow-visibility-with-spark.yml
 ${GIT_CHECKOUT_DIR}/hack/generate-manifest.sh --no-grafana --theia-manager > ${GIT_CHECKOUT_DIR}/build/yamls/flow-visibility-ch-only.yml


Diff for: ci/kind/test-e2e-kind.sh

+1 -1
@@ -38,7 +38,7 @@ function print_usage {

 TESTBED_CMD=$(dirname $0)"/kind-setup.sh"
 YML_DIR=$(dirname $0)"/../../build/yamls"
-FLOW_VISIBILITY_CMD=$(dirname $0)"/../../hack/generate-manifest.sh --ch-size 100Mi --ch-monitor-threshold 0.1"
+FLOW_VISIBILITY_CMD=$(dirname $0)"/../../hack/generate-manifest.sh --ch-size 100Mi --ch-monitor-threshold 0.1 --theia-manager"
 FLOW_VISIBILITY_WITH_SPARK_CMD=$(dirname $0)"/../../hack/generate-manifest.sh --no-grafana --spark-operator --theia-manager"
 FLOW_VISIBILITY_CH_ONLY_CMD=$(dirname $0)"/../../hack/generate-manifest.sh --no-grafana --theia-manager"
 CH_OPERATOR_YML=$(dirname $0)"/../../build/charts/theia/crds/clickhouse-operator-install-bundle.yaml"

Diff for: cmd/theia-manager/theia-manager.go

+3
@@ -47,6 +47,7 @@ const informerDefaultResync = 12 * time.Hour

 func createAPIServerConfig(
 	client kubernetes.Interface,
+	kubeConfig *rest.Config,
 	selfSignedCert bool,
 	bindPort int,
 	cipherSuites []uint16,
@@ -92,6 +93,7 @@ func createAPIServerConfig(
 	return apiserver.NewConfig(
 		serverConfig,
 		client,
+		kubeConfig,
 		caCertController,
 		nprq,
 		chq), nil
@@ -135,6 +137,7 @@ func run(o *Options) error {

 	apiServerConfig, err := createAPIServerConfig(
 		kubeClient,
+		kubeConfig,
 		*o.config.APIServer.SelfSignedCert,
 		o.config.APIServer.APIPort,
 		cipherSuites,

Diff for: go.mod

+26
@@ -10,6 +10,7 @@ require (
 	github.com/google/uuid v1.1.2
 	github.com/kevinburke/ssh_config v0.0.0-20190725054713-01f96b0aa0cd
 	github.com/sirupsen/logrus v1.9.0
+	github.com/spf13/afero v1.9.2
 	github.com/spf13/cobra v1.4.0
 	github.com/spf13/pflag v1.0.5
 	github.com/stretchr/testify v1.7.1
@@ -21,12 +22,37 @@ require (
 	k8s.io/api v0.24.0
 	k8s.io/apimachinery v0.24.0
 	k8s.io/apiserver v0.24.0
+	k8s.io/cli-runtime v0.24.0
 	k8s.io/client-go v0.24.0
 	k8s.io/klog/v2 v2.60.1
 	k8s.io/kube-aggregator v0.24.0
+	k8s.io/kubectl v0.24.0
 	k8s.io/utils v0.0.0-20220210201930-3a6ce19ff2f9
 )

+require (
+	github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
+	github.com/MakeNowJust/heredoc v0.0.0-20170808103936-bb23615498cd // indirect
+	github.com/chai2010/gettext-go v0.0.0-20160711120539-c6fed771bfd5 // indirect
+	github.com/exponent-io/jsonpath v0.0.0-20151013193312-d6023ce2651d // indirect
+	github.com/fatih/camelcase v1.0.0 // indirect
+	github.com/fvbommel/sortorder v1.0.1 // indirect
+	github.com/go-errors/errors v1.0.1 // indirect
+	github.com/google/btree v1.0.1 // indirect
+	github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
+	github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7 // indirect
+	github.com/liggitt/tabwriter v0.0.0-20181228230101-89fcab3d43de // indirect
+	github.com/mitchellh/go-wordwrap v1.0.0 // indirect
+	github.com/moby/term v0.0.0-20210619224110-3f7ff695adc6 // indirect
+	github.com/monochromegane/go-gitignore v0.0.0-20200626010858-205db1a8cc00 // indirect
+	github.com/peterbourgon/diskv v2.0.1+incompatible // indirect
+	github.com/russross/blackfriday v1.5.2 // indirect
+	github.com/xlab/treeprint v0.0.0-20181112141820-a009c3971eca // indirect
+	go.starlark.net v0.0.0-20200306205701-8dd3e2ee1dd5 // indirect
+	sigs.k8s.io/kustomize/api v0.11.4 // indirect
+	sigs.k8s.io/kustomize/kyaml v0.13.6 // indirect
+)
+
 require (
 	antrea.io/libOpenflow v0.8.0 // indirect
 	antrea.io/ofnet v0.6.1 // indirect
