Automatic cron that trim events_log is never executed #65

Open
stouch opened this issue Feb 6, 2025 · 3 comments

@stouch

stouch commented Feb 6, 2025

From my understanding of the Jitsu install on my Kubernetes cluster, I thought there was a cron job that periodically trims events_log to keep only the latest rows (so we can browse the most recent events in the console without keeping the old ones).

But this trim never seems to happen, so every two weeks I have to connect to the Jitsu pod in my cluster and manually run:

clickhouse client -u jitsu
# password

USE newjitsu_metrics;

ALTER TABLE events_log DELETE WHERE timestamp < toDateTime('<SOME_RECENT_DATE>');

There is no reason to keep these logs, right? They have already been pushed to our connected databases.

Why does this trim task never run?
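
For reference, the manual cleanup above could be scripted so the cutoff date does not have to be typed by hand each time. This is only a minimal sketch, not something the chart provides: `build_delete_query` and `trim_older_than` are hypothetical helper names, the `CLICKHOUSE_PASSWORD` variable is an assumption, and it reuses the `jitsu` user and `newjitsu_metrics.events_log` table from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an age-based DELETE for events_log.
# now() - INTERVAL n DAY is evaluated by ClickHouse, so no date is hard-coded.
build_delete_query() {
  local days="$1"
  # Refuse anything that is not a plain whole number, to avoid sending a malformed query.
  case "$days" in
    ''|*[!0-9]*) echo "days must be a whole number" >&2; return 1 ;;
  esac
  printf 'ALTER TABLE newjitsu_metrics.events_log DELETE WHERE timestamp < now() - INTERVAL %s DAY' "$days"
}

# Hypothetical helper: run the DELETE with the same client and user as above.
# ASSUMPTION: the password is available in CLICKHOUSE_PASSWORD.
trim_older_than() {
  local query
  query="$(build_delete_query "$1")" || return 1
  clickhouse client -u jitsu --password "$CLICKHOUSE_PASSWORD" --query "$query"
}
```

Calling `trim_older_than 14` from inside the pod would then remove rows older than 14 days; the retention period is just an example value.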

@echozio
Contributor

echozio commented Feb 6, 2025

Do you not see the CronJob in your Kubernetes cluster or does it just not actually trim the events_log table? Chart versions v1.8.0 and later should create the CronJob unless explicitly disabled.

The trim job only does this, which you can also do manually:

curl --silent --output /dev/null --show-error -H "Authorization: Bearer $CONSOLE_AUTH_TOKEN" "http://jitsu-console:3000/api/admin/events-log-trim"

If that doesn't behave as you expect, the console logs might tell you why.

From a quick look at the code responsible for this, it doesn't seem to delete events older than a certain age, but rather events beyond a set limit (default: 200k). There may also be other reasons. The relevant upstream code is here: https://github.com/jitsucom/jitsu/blob/6ddde4fbda0a27d84ebe62cf092a4ac5beb391e0/webapps/console/pages/api/admin/events-log-trim.ts
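
Since the job's curl command discards the response, a variant that keeps the body and HTTP status may make debugging easier. The following is a rough sketch under a few assumptions: `trim_url`, `run_trim`, and `recent_console_logs` are hypothetical helper names, and the console is assumed to run as a Deployment called `jitsu-console` (the hostname in the command above suggests this, but it is not confirmed).

```shell
#!/usr/bin/env bash
# Build the trim endpoint URL from a base URL (a trailing slash is tolerated).
trim_url() {
  printf '%s/api/admin/events-log-trim' "${1%/}"
}

# Call the endpoint and print the response body followed by the HTTP status code.
run_trim() {
  curl --silent --show-error \
    -H "Authorization: Bearer $CONSOLE_AUTH_TOKEN" \
    --write-out '\nHTTP %{http_code}\n' \
    "$(trim_url "$1")"
}

# Show what the console logged during the last few minutes.
# ASSUMPTION: the console runs as a Deployment named jitsu-console.
recent_console_logs() {
  kubectl logs deployment/jitsu-console --since=5m
}
```

`run_trim http://jitsu-console:3000` needs to run somewhere the service is reachable, and `recent_console_logs` somewhere `kubectl` is configured for the cluster.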

@stouch
Author

stouch commented Feb 17, 2025

I can't see the output of the CronJob in the cluster, but it does run (the last run finished 13h ago, during the night for us in France).

The GET on api/admin/events-log-trim returns:

{
  "status": "ok"
}

Right now, SELECT timestamp FROM newjitsu_metrics.events_log ORDER BY timestamp ASC LIMIT 1; gives 2025-02-12 00:00:23 (since I deleted everything recently)

and SELECT COUNT(*) FROM newjitsu_metrics.events_log gives ~650K

@echozio
Contributor

echozio commented Feb 17, 2025

Hard to say from the count alone whether or not it should have trimmed anything. I'd suggest looking at the console logs right after calling the events-log-trim endpoint. A query you could try (from the code linked in my previous comment) is:

select actorId, type, count(*) from newjitsu_metrics.events_log group by actorId, type having count(*) > 250000;

If you get any results right after running a trim, something may be going wrong in the trim process. Otherwise, your events are likely spread across actorIds or types in such a way that no single group exceeds the limit. In that case, lowering the limit by setting EVENTS_LOG_SIZE to something less than 200,000 in the console's environment may get it behaving the way you want.
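
Because the limit appears to apply per actorId and type rather than to the table as a whole, a breakdown of group sizes is likely more informative than the total row count. A possible sketch, where `group_sizes_query` and `show_group_sizes` are hypothetical helper names and `CLICKHOUSE_PASSWORD` is an assumed variable:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a query listing the largest (actorId, type) groups.
# The first argument is how many groups to show (defaults to 20).
group_sizes_query() {
  local top="${1:-20}"
  printf 'SELECT actorId, type, count(*) AS events FROM newjitsu_metrics.events_log GROUP BY actorId, type ORDER BY events DESC LIMIT %s' "$top"
}

# Hypothetical helper: run the breakdown with the same client and user as earlier in the thread.
# ASSUMPTION: the password is available in CLICKHOUSE_PASSWORD.
show_group_sizes() {
  clickhouse client -u jitsu --password "$CLICKHOUSE_PASSWORD" --query "$(group_sizes_query "$1")"
}
```

If no group in the output comes close to the limit, a total of ~650K rows would be consistent with the trim working as written, and lowering EVENTS_LOG_SIZE would be the thing to try.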
