
ObjectStore failure resulting in rabbit-virus-like behaviour, i.e. storage keeps filling up. #23570

@MNKagiri


Describe the bug

RisingWave keeps writing to the state store even when no queries or ingestion are running. The writes continue until there is no space left on the secondary storage device.

Error message/log

```
:recovery_attempt: risingwave_meta::barrier::context::recovery: post applied reschedule for jobs in offline scaling jobs=[63, 177, 13, 132, 178, 192, 189, 182, 162, 15]
2025-10-23T17:58:28.731216541Z  INFO      rw-standalone-meta failure_recovery{error=gRPC request to stream service failed: Internal error: failed to complete epoch: 6 4294967295 EpochPair { curr: 9437643190108160, prev: 9437643121360896 } Storage error: Hummock error: Other error: failed to sync: ObjectStore failed with IO error: Unexpected (persistent) at Writer::close, context: { timeout: 5 } => io operation timeout reached
```
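One way to confirm the runaway writes is to sample the size of the MinIO data directory while the cluster is idle. This is a diagnostic sketch only; the container name `minio-0` and the path `/data/hummock001` come from the compose file below, so adjust them if your deployment differs.

```shell
# Print the size of the Hummock bucket once a minute; steadily
# growing numbers while the cluster is idle confirm the behaviour.
while true; do
  docker exec minio-0 du -sh /data/hummock001
  sleep 60
done
```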

To Reproduce

1. Create a source, a materialized view, a table, and an external sink (e.g. to ClickHouse).
2. Periodically drop and recreate the sources and materialized views, often with a schema change.
3. Leave the machine idle for a few hours.
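A minimal script in the spirit of the steps above (a sketch, not the exact DDL from my deployment: the topic name, column list, and ClickHouse coordinates are placeholders, and the Kafka broker address assumes the `message_queue` service from the compose file below):

```sql
-- Hypothetical source reading from the Redpanda service in the compose file.
CREATE SOURCE events (id INT, payload VARCHAR, ts TIMESTAMP)
WITH (
    connector = 'kafka',
    topic = 'events',
    properties.bootstrap.server = 'message_queue:29092'
) FORMAT PLAIN ENCODE JSON;

CREATE TABLE dim (id INT PRIMARY KEY, name VARCHAR);

CREATE MATERIALIZED VIEW events_per_id AS
SELECT id, COUNT(*) AS cnt FROM events GROUP BY id;

-- Placeholder ClickHouse coordinates.
CREATE SINK events_sink FROM events_per_id
WITH (
    connector = 'clickhouse',
    type = 'upsert',
    primary_key = 'id',
    clickhouse.url = 'http://clickhouse:8123',
    clickhouse.user = 'default',
    clickhouse.password = '',
    clickhouse.database = 'default',
    clickhouse.table = 'events_per_id'
);

-- Then, periodically:
DROP SINK events_sink;
DROP MATERIALIZED VIEW events_per_id;
DROP SOURCE events;
-- ...and recreate them with a changed schema.
```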

Expected behavior

I did not expect the SSD backing the object store to fill up while the system was idle.

How did you deploy RisingWave?

via Docker Compose. Here it is:


```yaml
x-image: &image
  image: ${RW_IMAGE:-risingwavelabs/risingwave:v2.5.1}
services:
  risingwave-standalone:
    <<: *image
    command: "standalone --meta-opts=\" \
                    --listen-addr 0.0.0.0:5690 \
                    --advertise-addr 0.0.0.0:5690 \
                    --dashboard-host 0.0.0.0:5691 \
                    --prometheus-host 0.0.0.0:1250 \
                    --prometheus-endpoint http://prometheus-0:9500 \
                    --backend sql \
                    --sql-endpoint postgres://postgres:@postgres-0:5432/metadata \
                    --state-store hummock+minio://hummockadmin:hummockadmin@minio-0:9301/hummock001 \
                    --data-directory hummock_001 \
                    --config-path /risingwave.toml\" \
                 --compute-opts=\" \
                    --config-path /risingwave.toml \
                    --listen-addr 0.0.0.0:5688 \
                    --prometheus-listener-addr 0.0.0.0:1250 \
                    --advertise-addr 0.0.0.0:5688 \
                    --async-stack-trace verbose \
                    --parallelism 8 \
                    --total-memory-bytes 6442450944 \
                    --role both \
                    --meta-address http://0.0.0.0:5690 \
                    --memory-manager-target-bytes 6012954215\" \
                 --frontend-opts=\" \
                    --config-path /risingwave.toml \
                    --listen-addr 0.0.0.0:4566 \
                    --advertise-addr 0.0.0.0:4566 \
                    --prometheus-listener-addr 0.0.0.0:1250 \
                    --health-check-listener-addr 0.0.0.0:6786 \
                    --meta-addr http://0.0.0.0:5690 \
                    --frontend-total-memory-bytes=1073741824\" \
                 --compactor-opts=\" \
                    --listen-addr 0.0.0.0:6660 \
                    --prometheus-listener-addr 0.0.0.0:1250 \
                    --advertise-addr 0.0.0.0:6660 \
                    --meta-address http://0.0.0.0:5690 \
                    --compactor-total-memory-bytes=1073741824\""
    expose:
      - "6660"
      - "4566"
      - "5688"
      - "5690"
      - "1250"
      - "5691"
    ports:
      - "4566:4566"
      - "5690:5690"
      - "5691:5691"
      - "1250:1250"
    depends_on:
      - postgres-0
      - minio-0
    volumes:
      - "./risingwave.toml:/risingwave.toml"
    environment:
      RUST_BACKTRACE: "1"
      # If ENABLE_TELEMETRY is not set, telemetry will start by default
      ENABLE_TELEMETRY: ${ENABLE_TELEMETRY:-true}
      RW_TELEMETRY_TYPE: ${RW_TELEMETRY_TYPE:-"docker-compose"}
      RW_SECRET_STORE_PRIVATE_KEY_HEX: ${RW_SECRET_STORE_PRIVATE_KEY_HEX:-0123456789abcdef0123456789abcdef}
      RW_LICENSE_KEY: ${RW_LICENSE_KEY:-}
    container_name: risingwave-standalone
    healthcheck:
      test:
        - CMD-SHELL
        - bash -c 'printf "GET / HTTP/1.1\n\n" > /dev/tcp/127.0.0.1/6660; exit $$?;'
        - bash -c 'printf "GET / HTTP/1.1\n\n" > /dev/tcp/127.0.0.1/5688; exit $$?;'
        - bash -c '> /dev/tcp/127.0.0.1/4566; exit $$?;'
        - bash -c 'printf "GET / HTTP/1.1\n\n" > /dev/tcp/127.0.0.1/5690; exit $$?;'
      interval: 1s
      timeout: 5s
    restart: always
    deploy:
      resources:
        limits:
          memory: 8G
        reservations:
          memory: 8G

  postgres-0:
    image: "postgres:15-alpine"
    environment:
      - POSTGRES_HOST_AUTH_METHOD=trust
      - POSTGRES_USER=postgres
      - POSTGRES_DB=metadata
      - POSTGRES_INITDB_ARGS=--encoding=UTF-8 --lc-collate=C --lc-ctype=C
    expose:
      - "5432"
    ports:
      - "8432:5432"
    volumes:
      - "postgres-0:/var/lib/postgresql/data"
    healthcheck:
      test: [ "CMD-SHELL", "pg_isready -U postgres" ]
      interval: 2s
      timeout: 5s
      retries: 5
    restart: always

  grafana-0:
    image: "grafana/grafana-oss:latest"
    command: [ ]
    expose:
      - "3001"
    ports:
      - "3001:3001"
    depends_on: [ ]
    volumes:
      - "grafana-0:/var/lib/grafana"
      - "./grafana.ini:/etc/grafana/grafana.ini"
      - "./grafana-risedev-datasource.yml:/etc/grafana/provisioning/datasources/grafana-risedev-datasource.yml"
      - "./grafana-risedev-dashboard.yml:/etc/grafana/provisioning/dashboards/grafana-risedev-dashboard.yml"
      - "./dashboards:/dashboards"
    environment: {}
    container_name: grafana-0
    healthcheck:
      test:
        - CMD-SHELL
        - bash -c 'printf "GET / HTTP/1.1\n\n" > /dev/tcp/127.0.0.1/3001; exit $$?;'
      interval: 1s
      timeout: 5s
      retries: 5
    restart: always

  minio-0:
    image: "quay.io/minio/minio:latest"
    command:
      - server
      - "--address"
      - "0.0.0.0:9301"
      - "--console-address"
      - "0.0.0.0:9400"
      - /data
    expose:
      - "9301"
      - "9400"
    ports:
      - "9301:9301"
      - "9400:9400"
    depends_on: [ ]
    volumes:
      - "minio-0:/data"
    entrypoint: "
      /bin/sh -c '

      set -e

      mkdir -p \"/data/hummock001\"

      /usr/bin/docker-entrypoint.sh \"$$0\" \"$$@\"

      '"
    environment:
      MINIO_CI_CD: "1"
      MINIO_PROMETHEUS_AUTH_TYPE: public
      MINIO_PROMETHEUS_URL: "http://prometheus-0:9500"
      MINIO_ROOT_PASSWORD: hummockadmin
      MINIO_ROOT_USER: hummockadmin
      MINIO_DOMAIN: "minio-0"
    container_name: minio-0
    healthcheck:
      test:
        - CMD-SHELL
        - bash -c 'printf \"GET / HTTP/1.1\n\n\" > /dev/tcp/127.0.0.1/9301; exit $$?;'
      interval: 1s
      timeout: 5s
      retries: 5
    restart: always

  prometheus-0:
    image: "prom/prometheus:latest"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/usr/share/prometheus/console_libraries"
      - "--web.console.templates=/usr/share/prometheus/consoles"
      - "--web.listen-address=0.0.0.0:9500"
      - "--storage.tsdb.retention.time=30d"
    expose:
      - "9500"
    ports:
      - "9500:9500"
    depends_on: [ ]
    volumes:
      - "prometheus-0:/prometheus"
      - "./prometheus.yaml:/etc/prometheus/prometheus.yml"
    environment: {}
    container_name: prometheus-0
    healthcheck:
      test:
        - CMD-SHELL
        - sh -c 'printf "GET /-/healthy HTTP/1.0\n\n" | nc localhost 9500; exit $$?;'
      interval: 1s
      timeout: 5s
      retries: 5
    restart: always

  message_queue:
    image: "redpandadata/redpanda:latest"
    command:
      - redpanda
      - start
      - "--smp"
      - "1"
      - "--reserve-memory"
      - 0M
      - "--memory"
      - 4G
      - "--overprovisioned"
      - "--node-id"
      - "0"
      - "--check=false"
      - "--kafka-addr"
      - "PLAINTEXT://0.0.0.0:29092,OUTSIDE://0.0.0.0:9092"
      - "--advertise-kafka-addr"
      - "PLAINTEXT://message_queue:29092,OUTSIDE://localhost:9092"
    expose:
      - "29092"
      - "9092"
      - "9644"
    ports:
      - "29092:29092"
      - "9092:9092"
      - "9644:9644"
      - "8081:8081"
    depends_on: [ ]
    volumes:
      - "message_queue:/var/lib/redpanda/data"
    environment: {}
    container_name: message_queue
    healthcheck:
      test: curl -f localhost:9644/v1/status/ready
      interval: 1s
      timeout: 5s
      retries: 5
    restart: always

volumes:
  postgres-0:
    external: false
  grafana-0:
    external: false
  minio-0:
    external: false
  prometheus-0:
    external: false
  message_queue:
    external: false
```

The version of RisingWave

Version 2.5.1

Additional context

No response

Labels: type/bug