
Commit 33cc583

Support compression for Actions logs (#31761)
Support compression for Actions logs to save storage space and bandwidth. Inspired by #24256 (comment)

The biggest challenge is that the compression format must be [seekable](https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md). When users view only part of a log, Gitea should not have to download the whole compressed file and decompress it. That rules out gzip. My research turned up only a few candidates, such as bgzip and xz, and I think zstd is the most popular of them. It has Go implementations in [zstd](https://github.com/klauspost/compress/tree/master/zstd) and [zstd-seekable-format-go](https://github.com/SaveTheRbtz/zstd-seekable-format-go). It is also compatible with ordinary readers: a regular zstd reader can read a seekable-format zstd file.

This PR introduces a new package `zstd`. It combines and wraps the two packages behind a unified, easy-to-use API.

A new setting `LOG_COMPRESSION` is also added to the config. I don't see any reason not to use compression. Still, I think the default should stay `none` so the behavior matches older versions.

`LOG_COMPRESSION` applies only to new log files. It appends `.zst` to the file name, so Gitea can tell from the name whether a file needs decompression when reading it. Old files keep their format, since converting them isn't worth it. They will be cleared after #31735.

<img width="541" alt="image" src="https://github.com/user-attachments/assets/e9598764-a4e0-4b68-8c2b-f769265183c9">
1 parent 791d7fc commit 33cc583

15 files changed: +615 lines, -9 lines


assets/go-licenses.json (+10; generated file, diff not rendered)

custom/conf/app.example.ini (+6)

```diff
@@ -2687,6 +2687,12 @@ LEVEL = Info
 ;DEFAULT_ACTIONS_URL = github
 ;; Logs retention time in days. Old logs will be deleted after this period.
 ;LOG_RETENTION_DAYS = 365
+;; Log compression type, `none` for no compression, `zstd` for zstd compression.
+;; Other compression types like `gzip` are NOT supported, since a seekable stream is required for log view.
+;; It's always recommended to use compression when using local disk as log storage if CPU or memory is not a bottleneck.
+;; For object storage services like S3, which bill per request, each log view will cause 2 extra GET requests.
+;; But it will save storage space and network bandwidth, so it's still recommended to use compression.
+;LOG_COMPRESSION = none
 ;; Default artifact retention time in days. Artifacts could have their own retention periods by setting the `retention-days` option in `actions/upload-artifact` step.
 ;ARTIFACT_RETENTION_DAYS = 90
 ;; Timeout to stop the task which have running status, but haven't been updated for a long time
```

go.mod (+2)

```diff
@@ -20,6 +20,7 @@ require (
 	github.com/Azure/go-ntlmssp v0.0.0-20221128193559-754e69321358
 	github.com/ProtonMail/go-crypto v1.0.0
 	github.com/PuerkitoBio/goquery v1.9.2
+	github.com/SaveTheRbtz/zstd-seekable-format-go/pkg v0.7.2
 	github.com/alecthomas/chroma/v2 v2.14.0
 	github.com/blakesmith/ar v0.0.0-20190502131153-809d4375e1fb
 	github.com/blevesearch/bleve/v2 v2.4.2
@@ -209,6 +210,7 @@ require (
 	github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
 	github.com/golang/protobuf v1.5.4 // indirect
 	github.com/golang/snappy v0.0.4 // indirect
+	github.com/google/btree v1.1.2 // indirect
 	github.com/google/go-querystring v1.1.0 // indirect
 	github.com/google/go-tpm v0.9.0 // indirect
 	github.com/gorilla/css v1.0.1 // indirect
```

go.sum (+4)

```diff
@@ -80,6 +80,8 @@ github.com/RoaringBitmap/roaring v0.4.23/go.mod h1:D0gp8kJQgE1A4LQ5wFLggQEyvDi06
 github.com/RoaringBitmap/roaring v0.7.1/go.mod h1:jdT9ykXwHFNdJbEtxePexlFYH9LXucApeS0/+/g+p1I=
 github.com/RoaringBitmap/roaring v1.9.4 h1:yhEIoH4YezLYT04s1nHehNO64EKFTop/wBhxv2QzDdQ=
 github.com/RoaringBitmap/roaring v1.9.4/go.mod h1:6AXUsoIEzDTFFQCe1RbGA6uFONMhvejWj5rqITANK90=
+github.com/SaveTheRbtz/zstd-seekable-format-go/pkg v0.7.2 h1:cSXom2MoKJ9KPPw29RoZtHvUETY4F4n/kXl8m9btnQ0=
+github.com/SaveTheRbtz/zstd-seekable-format-go/pkg v0.7.2/go.mod h1:JitQWJ8JuV4Y87l8VsHiiwhb3cgdyn68mX40s7NT6PA=
 github.com/alecthomas/assert/v2 v2.7.0 h1:QtqSACNS3tF7oasA8CU6A6sXZSBDqnm7RfpLl9bZqbE=
 github.com/alecthomas/assert/v2 v2.7.0/go.mod h1:Bze95FyfUr7x34QZrjL+XP+0qgp/zg8yS+TtBj1WA3k=
 github.com/alecthomas/chroma/v2 v2.2.0/go.mod h1:vf4zrexSH54oEjJ7EdB65tGNHmH3pGZmVkgTP5RHvAs=
@@ -395,6 +397,8 @@ github.com/golang/snappy v0.0.1/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEW
 github.com/golang/snappy v0.0.2/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q=
 github.com/golang/snappy v0.0.4 h1:yAGX7huGHXlcLOEtBnF4w7FQwA26wojNCwOYAEhLjQM=
 github.com/golang/snappy v0.0.4/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q=
+github.com/google/btree v1.1.2 h1:xf4v41cLI2Z6FxbKm+8Bu+m8ifhj15JuZ9sa0jZCMUU=
+github.com/google/btree v1.1.2/go.mod h1:qOPhT0dTNdNzV6Z/lhRX0YXUafgPLFUh+gZMl761Gm4=
 github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
 github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
 github.com/google/go-cmp v0.4.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
```

models/actions/task.go (+7, -1)

```diff
@@ -502,7 +502,13 @@ func convertTimestamp(timestamp *timestamppb.Timestamp) timeutil.TimeStamp {
 }
 
 func logFileName(repoFullName string, taskID int64) string {
-	return fmt.Sprintf("%s/%02x/%d.log", repoFullName, taskID%256, taskID)
+	ret := fmt.Sprintf("%s/%02x/%d.log", repoFullName, taskID%256, taskID)
+
+	if setting.Actions.LogCompression.IsZstd() {
+		ret += ".zst"
+	}
+
+	return ret
 }
 
 func getTaskIDFromCache(token string) int64 {
```
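The naming rule can be checked on its own. The sketch below copies the `fmt.Sprintf` pattern from `logFileName`. It replaces the global `setting.Actions.LogCompression.IsZstd()` check with a hypothetical `zstd bool` parameter so it runs standalone.

```go
package main

import "fmt"

// logFileName follows the pattern from models/actions/task.go; the zstd
// parameter is a hypothetical stand-in for setting.Actions.LogCompression.IsZstd().
func logFileName(repoFullName string, taskID int64, zstd bool) string {
	// Logs are sharded into 256 directories by the low byte of the task ID, in hex.
	ret := fmt.Sprintf("%s/%02x/%d.log", repoFullName, taskID%256, taskID)
	if zstd {
		ret += ".zst"
	}
	return ret
}

func main() {
	fmt.Println(logFileName("owner/repo", 42, false)) // owner/repo/2a/42.log
	fmt.Println(logFileName("owner/repo", 42, true))  // owner/repo/2a/42.log.zst
	fmt.Println(logFileName("owner/repo", 300, true)) // owner/repo/2c/300.log.zst (300 % 256 = 44 = 0x2c)
}
```

The suffix is fixed when the name is generated. As a result, readers such as `OpenLogs` only need `strings.HasSuffix(filename, ".zst")` to choose a decoder, and files created before the setting was enabled keep their plain `.log` names.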

modules/actions/log.go (+47, -2)

```diff
@@ -15,6 +15,7 @@ import (
 	"code.gitea.io/gitea/models/dbfs"
 	"code.gitea.io/gitea/modules/log"
 	"code.gitea.io/gitea/modules/storage"
+	"code.gitea.io/gitea/modules/zstd"
 
 	runnerv1 "code.gitea.io/actions-proto-go/runner/v1"
 	"google.golang.org/protobuf/types/known/timestamppb"
@@ -28,6 +29,9 @@ const (
 	defaultBufSize = MaxLineSize
 )
 
+// WriteLogs appends logs to DBFS file for temporary storage.
+// It doesn't respect the file format in the filename like ".zst", since it's difficult to reopen a closed compressed file and append new content.
+// Why doesn't it store logs in object storage directly? Because it's not efficient to append content to object storage.
 func WriteLogs(ctx context.Context, filename string, offset int64, rows []*runnerv1.LogRow) ([]int, error) {
 	flag := os.O_WRONLY
 	if offset == 0 {
@@ -106,6 +110,17 @@ func ReadLogs(ctx context.Context, inStorage bool, filename string, offset, limi
 	return rows, nil
 }
 
+const (
+	// logZstdBlockSize is the block size for zstd compression.
+	// 128KB leads the compression ratio to be close to the regular zstd compression.
+	// And it means each read from the underlying object storage will be at least 128KB*(compression ratio).
+	// The compression ratio is about 30% for text files, so the actual read size is about 38KB, which should be acceptable.
+	logZstdBlockSize = 128 * 1024 // 128KB
+)
+
+// TransferLogs transfers logs from DBFS to object storage.
+// It happens when the file is complete and no more logs will be appended.
+// It respects the file format in the filename like ".zst", and compresses the content if needed.
 func TransferLogs(ctx context.Context, filename string) (func(), error) {
 	name := DBFSPrefix + filename
 	remove := func() {
```
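The `logZstdBlockSize` comment is a back-of-the-envelope estimate: 128KB × 30% ≈ 38KB fetched per block. The sketch below turns that estimate into a hypothetical helper, `approxFetch`. It assumes every block compresses at the same ratio, whereas real ratios vary per block.

It also shows a consequence the comment does not spell out: a read that straddles a block boundary has to fetch two blocks.

```go
package main

import "fmt"

// approxFetch is a hypothetical model: it estimates the compressed bytes fetched
// to serve n uncompressed bytes at offset off, assuming every blockSize-byte
// block compresses to about ratio*blockSize bytes.
func approxFetch(off, n, blockSize int64, ratio float64) (blocks int64, fetched float64) {
	first := off / blockSize
	last := (off + n - 1) / blockSize
	blocks = last - first + 1
	return blocks, float64(blocks*blockSize) * ratio
}

func main() {
	const block = 128 * 1024 // same value as logZstdBlockSize
	// A read inside one block: 131072 * 0.30 is about 39322 bytes, roughly 38KB.
	b, f := approxFetch(0, 100, block, 0.30)
	fmt.Printf("%d block(s), ~%.0f bytes\n", b, f)
	// A read that crosses a block boundary pulls in two blocks.
	b, f = approxFetch(block-50, 100, block, 0.30)
	fmt.Printf("%d block(s), ~%.0f bytes\n", b, f)
}
```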
modules/actions/log.go (continued)

```diff
@@ -119,7 +134,26 @@ func TransferLogs(ctx context.Context, filename string) (func(), error) {
 	}
 	defer f.Close()
 
-	if _, err := storage.Actions.Save(filename, f, -1); err != nil {
+	var reader io.Reader = f
+	if strings.HasSuffix(filename, ".zst") {
+		r, w := io.Pipe()
+		reader = r
+		zstdWriter, err := zstd.NewSeekableWriter(w, logZstdBlockSize)
+		if err != nil {
+			return nil, fmt.Errorf("zstd NewSeekableWriter: %w", err)
+		}
+		go func() {
+			defer func() {
+				_ = w.CloseWithError(zstdWriter.Close())
+			}()
+			if _, err := io.Copy(zstdWriter, f); err != nil {
+				_ = w.CloseWithError(err)
+				return
+			}
+		}()
+	}
+
+	if _, err := storage.Actions.Save(filename, reader, -1); err != nil {
 		return nil, fmt.Errorf("storage save %q: %w", filename, err)
 	}
 	return remove, nil
@@ -150,11 +184,22 @@ func OpenLogs(ctx context.Context, inStorage bool, filename string) (io.ReadSeek
 		}
 		return f, nil
 	}
+
 	f, err := storage.Actions.Open(filename)
 	if err != nil {
 		return nil, fmt.Errorf("storage open %q: %w", filename, err)
 	}
-	return f, nil
+
+	var reader io.ReadSeekCloser = f
+	if strings.HasSuffix(filename, ".zst") {
+		r, err := zstd.NewSeekableReader(f)
+		if err != nil {
+			return nil, fmt.Errorf("zstd NewSeekableReader: %w", err)
+		}
+		reader = r
+	}
+
+	return reader, nil
 }
 
 func FormatLog(timestamp time.Time, content string) string {
```

modules/packages/conda/metadata.go (+1, -2)

```diff
@@ -13,8 +13,7 @@ import (
 	"code.gitea.io/gitea/modules/json"
 	"code.gitea.io/gitea/modules/util"
 	"code.gitea.io/gitea/modules/validation"
-
-	"github.com/klauspost/compress/zstd"
+	"code.gitea.io/gitea/modules/zstd"
 )
 
 var (
```

modules/packages/conda/metadata_test.go (+2, -1)

```diff
@@ -10,8 +10,9 @@ import (
 	"io"
 	"testing"
 
+	"code.gitea.io/gitea/modules/zstd"
+
 	"github.com/dsnet/compress/bzip2"
-	"github.com/klauspost/compress/zstd"
 	"github.com/stretchr/testify/assert"
 )
 
```
modules/packages/debian/metadata.go (+1, -1)

```diff
@@ -14,9 +14,9 @@ import (
 
 	"code.gitea.io/gitea/modules/util"
 	"code.gitea.io/gitea/modules/validation"
+	"code.gitea.io/gitea/modules/zstd"
 
 	"github.com/blakesmith/ar"
-	"github.com/klauspost/compress/zstd"
 	"github.com/ulikunitz/xz"
 )
 
```
modules/packages/debian/metadata_test.go (+2, -1)

```diff
@@ -10,8 +10,9 @@ import (
 	"io"
 	"testing"
 
+	"code.gitea.io/gitea/modules/zstd"
+
 	"github.com/blakesmith/ar"
-	"github.com/klauspost/compress/zstd"
 	"github.com/stretchr/testify/assert"
 	"github.com/ulikunitz/xz"
 )
```

modules/setting/actions.go (+19)

```diff
@@ -17,6 +17,7 @@ var (
 		Enabled bool
 		LogStorage *Storage // how the created logs should be stored
 		LogRetentionDays int64 `ini:"LOG_RETENTION_DAYS"`
+		LogCompression logCompression `ini:"LOG_COMPRESSION"`
 		ArtifactStorage *Storage // how the created artifacts should be stored
 		ArtifactRetentionDays int64 `ini:"ARTIFACT_RETENTION_DAYS"`
 		DefaultActionsURL defaultActionsURL `ini:"DEFAULT_ACTIONS_URL"`
@@ -54,6 +55,20 @@ const (
 	// please consider to use `uses: https://the_url_you_want_to_use/username/action_name@version` instead.
 )
 
+type logCompression string
+
+func (c logCompression) IsValid() bool {
+	return c.IsNone() || c.IsZstd()
+}
+
+func (c logCompression) IsNone() bool {
+	return c == "" || strings.ToLower(string(c)) == "none"
+}
+
+func (c logCompression) IsZstd() bool {
+	return strings.ToLower(string(c)) == "zstd"
+}
+
 func loadActionsFrom(rootCfg ConfigProvider) error {
 	sec := rootCfg.Section("actions")
 	err := sec.MapTo(&Actions)
@@ -100,5 +115,9 @@ func loadActionsFrom(rootCfg ConfigProvider) error {
 	Actions.EndlessTaskTimeout = sec.Key("ENDLESS_TASK_TIMEOUT").MustDuration(3 * time.Hour)
 	Actions.AbandonedJobTimeout = sec.Key("ABANDONED_JOB_TIMEOUT").MustDuration(24 * time.Hour)
 
+	if !Actions.LogCompression.IsValid() {
+		return fmt.Errorf("invalid [actions] LOG_COMPRESSION: %q", Actions.LogCompression)
+	}
+
 	return nil
 }
```
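The validation rules are easiest to see with a few inputs. The sketch below copies the `logCompression` type from `modules/setting/actions.go`. It adds a hypothetical `parseLogCompression` helper that mirrors the check `loadActionsFrom` performs after `sec.MapTo(&Actions)`.

```go
package main

import (
	"fmt"
	"strings"
)

// logCompression is copied from modules/setting/actions.go so it runs standalone.
type logCompression string

func (c logCompression) IsValid() bool { return c.IsNone() || c.IsZstd() }

func (c logCompression) IsNone() bool { return c == "" || strings.ToLower(string(c)) == "none" }

func (c logCompression) IsZstd() bool { return strings.ToLower(string(c)) == "zstd" }

// parseLogCompression is a hypothetical helper mirroring the startup check.
func parseLogCompression(v string) (logCompression, error) {
	c := logCompression(v)
	if !c.IsValid() {
		return "", fmt.Errorf("invalid [actions] LOG_COMPRESSION: %q", c)
	}
	return c, nil
}

func main() {
	for _, v := range []string{"", "none", "ZSTD", "gzip"} {
		c, err := parseLogCompression(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%q: none=%v zstd=%v\n", v, c.IsNone(), c.IsZstd())
	}
}
```

The comparison is case-insensitive, so `ZSTD` and `zstd` both select compression, and an empty value means `none`. An unsupported value such as `gzip` makes `loadActionsFrom` return an error rather than silently falling back to no compression.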

modules/zstd/option.go (+46, new file)

```diff
@@ -0,0 +1,46 @@
+// Copyright 2024 The Gitea Authors. All rights reserved.
+// SPDX-License-Identifier: MIT
+
+package zstd
+
+import "github.com/klauspost/compress/zstd"
+
+type WriterOption = zstd.EOption
+
+var (
+	WithEncoderCRC = zstd.WithEncoderCRC
+	WithEncoderConcurrency = zstd.WithEncoderConcurrency
+	WithWindowSize = zstd.WithWindowSize
+	WithEncoderPadding = zstd.WithEncoderPadding
+	WithEncoderLevel = zstd.WithEncoderLevel
+	WithZeroFrames = zstd.WithZeroFrames
+	WithAllLitEntropyCompression = zstd.WithAllLitEntropyCompression
+	WithNoEntropyCompression = zstd.WithNoEntropyCompression
+	WithSingleSegment = zstd.WithSingleSegment
+	WithLowerEncoderMem = zstd.WithLowerEncoderMem
+	WithEncoderDict = zstd.WithEncoderDict
+	WithEncoderDictRaw = zstd.WithEncoderDictRaw
+)
+
+type EncoderLevel = zstd.EncoderLevel
+
+const (
+	SpeedFastest EncoderLevel = zstd.SpeedFastest
+	SpeedDefault EncoderLevel = zstd.SpeedDefault
+	SpeedBetterCompression EncoderLevel = zstd.SpeedBetterCompression
+	SpeedBestCompression EncoderLevel = zstd.SpeedBestCompression
+)
+
+type ReaderOption = zstd.DOption
+
+var (
+	WithDecoderLowmem = zstd.WithDecoderLowmem
+	WithDecoderConcurrency = zstd.WithDecoderConcurrency
+	WithDecoderMaxMemory = zstd.WithDecoderMaxMemory
+	WithDecoderDicts = zstd.WithDecoderDicts
+	WithDecoderDictRaw = zstd.WithDecoderDictRaw
+	WithDecoderMaxWindow = zstd.WithDecoderMaxWindow
+	WithDecodeAllCapLimit = zstd.WithDecodeAllCapLimit
+	WithDecodeBuffersBelow = zstd.WithDecodeBuffersBelow
+	IgnoreChecksum = zstd.IgnoreChecksum
+)
```
