Automatic cron that trims events_log is never executed #65

stouch · 2025-02-06T20:01:07Z

My understanding of the Jitsu install on my Kubernetes cluster was that a cron job periodically trims events_log to keep only the latest rows (so we can browse recent events in the console without keeping the old ones).

But this trim never happens, so every two weeks I have to connect to the Jitsu pod in my cluster and run the following manually:

clickhouse client -u jitsu
# password

USE newjitsu_metrics;

ALTER TABLE events_log DELETE WHERE timestamp < toDateTime('<SOME_RECENT_DATE>');
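The manual cleanup above could be scripted instead of picking <SOME_RECENT_DATE> by hand. Below is a minimal Python sketch, assuming a fixed retention window: the helper name build_trim_statement and the 14-day default are hypothetical and not Jitsu settings. It only renders the same ALTER TABLE ... DELETE statement with a computed cutoff.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_trim_statement(retention_days: int = 14, now: Optional[datetime] = None) -> str:
    """Render the manual ALTER TABLE ... DELETE with a cutoff `retention_days` before `now`.

    Hypothetical helper: the retention window is an assumption, not a Jitsu option.
    """
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # ClickHouse parses this string in the server's default time zone;
    # adjust if the server is not running in UTC.
    cutoff_str = cutoff.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "ALTER TABLE newjitsu_metrics.events_log "
        f"DELETE WHERE timestamp < toDateTime('{cutoff_str}');"
    )
```

The printed statement could then be passed to clickhouse client via its --query option instead of being typed interactively.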

There is no reason to keep these logs, right? They have already been pushed to our connected databases.

Why does this trim task never run?


echozio · 2025-02-06T22:08:32Z

Do you not see the CronJob in your Kubernetes cluster, or does it run but not actually trim the events_log table? Chart versions v1.8.0 and later should create the CronJob unless explicitly disabled.

The trim job only does this, which you can also do manually:

curl --silent --output /dev/null --show-error -H "Authorization: Bearer $CONSOLE_AUTH_TOKEN" "http://jitsu-console:3000/api/admin/events-log-trim"

If that doesn't behave as you expect, the console logs might tell you why.
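For environments without curl, the same manual trigger can be sketched in Python with only the standard library. This is a hedged sketch: the jitsu-console:3000 address and the CONSOLE_AUTH_TOKEN bearer token come from the curl command, while the function names build_trim_request and trigger_trim are hypothetical.

```python
import urllib.request

# Service name and port as used in the curl command; adjust for your release.
DEFAULT_CONSOLE_URL = "http://jitsu-console:3000"


def build_trim_request(token: str, base_url: str = DEFAULT_CONSOLE_URL) -> urllib.request.Request:
    """Build the same authenticated GET request that the curl command sends."""
    url = base_url.rstrip("/") + "/api/admin/events-log-trim"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def trigger_trim(token: str, base_url: str = DEFAULT_CONSOLE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the response body.

    urlopen raises urllib.error.HTTPError on non-2xx responses, so failures
    surface as exceptions, roughly like curl's --show-error.
    """
    req = build_trim_request(token, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

From a pod that can resolve jitsu-console, calling trigger_trim(os.environ["CONSOLE_AUTH_TOKEN"]) would print the endpoint's JSON status.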

From a quick look at the code responsible for this, it doesn't seem to delete events older than a certain age but rather events beyond a set limit (default: 200k), though other factors may also be involved. Relevant upstream code is here: https://github.com/jitsucom/jitsu/blob/6ddde4fbda0a27d84ebe62cf092a4ac5beb391e0/webapps/console/pages/api/admin/events-log-trim.ts
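To illustrate how count-based trimming differs from age-based trimming, here is a toy Python model. It is not the upstream implementation: grouping by (actorId, type) and the names Event and trim_by_count are assumptions made for illustration. It keeps the newest `limit` events per group, so a quiet group keeps its old events indefinitely while a busy group only loses events once it exceeds the limit.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    actor_id: str
    event_type: str
    timestamp: int  # e.g. Unix seconds; only the ordering matters here


def trim_by_count(events: List[Event], limit: int) -> List[Event]:
    """Keep at most `limit` newest events per (actor_id, event_type) group.

    Age plays no role: a group below the limit keeps all of its events,
    however old they are. (Hypothetical model, not Jitsu's code.)
    """
    groups: Dict[Tuple[str, str], List[Event]] = defaultdict(list)
    for e in events:
        groups[(e.actor_id, e.event_type)].append(e)
    kept: List[Event] = []
    for group in groups.values():
        group.sort(key=lambda e: e.timestamp, reverse=True)  # newest first
        kept.extend(group[:limit])
    # Return survivors in chronological order for readability.
    return sorted(kept, key=lambda e: e.timestamp)
```

In this model the oldest event in the whole log can survive a trim simply because its group never reached the limit.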

stouch · 2025-02-17T13:14:32Z

I can't see the output of the CronJob in the cluster, but it does run (it finished 13h ago, during the night for us in France).

A GET on /api/admin/events-log-trim returns:

{
  "status": "ok"
}

Right now, SELECT timestamp FROM newjitsu_metrics.events_log ORDER BY timestamp ASC LIMIT 1; returns 2025-02-12 00:00:23 (because I recently deleted everything).

and SELECT COUNT(*) FROM newjitsu_metrics.events_log returns ~650K rows.

echozio · 2025-02-17T14:40:56Z

Hard to say from the count alone whether or not it should have trimmed anything. I'd suggest looking at the console logs right after calling the events-log-trim endpoint. A query you could try (from the code linked in my previous comment) is:

select actorId, type, count(*) from newjitsu_metrics.events_log group by actorId, type having count(*) > 250000;

If you get any results right after running a trim, something might be going wrong in the trim process. Otherwise, your events are likely spread across actorIds or types in such a way that no group exceeds the limit. In that case, lowering the limit by setting EVENTS_LOG_SIZE to something below 200,000 in the console's environment may get it behaving the way you want.
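The reasoning about groups can be checked with a small Python sketch. It assumes per-group row counts have already been fetched (for example with a GROUP BY actorId, type query) and that a group is trimmed only when its count strictly exceeds the limit, as in the query above; the function names are hypothetical, while EVENTS_LOG_SIZE and the 200,000 default come from this thread.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

DEFAULT_LIMIT = 200_000  # default limit mentioned in this thread


def events_log_limit(env: Optional[Mapping[str, str]] = None) -> int:
    """Read EVENTS_LOG_SIZE from the environment, falling back to 200,000."""
    env = os.environ if env is None else env
    raw = env.get("EVENTS_LOG_SIZE")
    return int(raw) if raw else DEFAULT_LIMIT


def groups_over_limit(
    counts: Mapping[Tuple[str, str], int], limit: int
) -> Dict[Tuple[str, str], int]:
    """Return the (actorId, type) groups whose row count exceeds `limit`."""
    return {group: n for group, n in counts.items() if n > limit}
```

For example, 650K rows split evenly across four hypothetical groups gives 162,500 rows each, so nothing exceeds the default 200,000 and no trim would happen; with EVENTS_LOG_SIZE=100000, all four groups would qualify.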
