Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: download processed-logs in batches of 50 each #3812

Merged
merged 1 commit into from
Jul 14, 2024

Conversation

trivikr
Copy link
Member

@trivikr trivikr commented Jul 7, 2024

The download stats stopped getting populated in #3697, likely as producing summaries took more than 1000 seconds. It likely happened as downloading and processing files took time.

This PR downloads files in batches of 50 each.
Tests show that time taken for producing summaries reduced from ~6 mins to ~26 seconds for 20240706 with negligible impact of CPU and memory consumption


Before

From logs

2024-07-07 04:00:03.059 PDT 20240706/
2024-07-07 04:06:19.196 PDT Upload complete: nodejs.org-access.log.20240706.json

Output:
nodejs.org-access.log.20240706.before.json

CPU usage: ~20%, memory usage: ~24% before

After

From logs

2024-07-07 16:32:24.123 PDT 20240706/
2024-07-07 16:32:50.183 PDT Upload complete: nodejs.org-access.log.20240706.json

Output:
nodejs.org-access.log.20240706.after.json

CPU usage: ~25%, memory usage: ~28% after

@trivikr trivikr force-pushed the summaries-download-parallel branch from 57a9829 to 62c66ce Compare July 14, 2024 02:11
@trivikr
Copy link
Member Author

trivikr commented Jul 14, 2024

Can this get some reviews?

This fix will ensure that producing summaries will taken less than 30 seconds. This avoids situations like in #3697 which likely happened as producing summaries took more than 1000 seconds.

@targos targos merged commit d97e82f into nodejs:main Jul 14, 2024
1 check passed
@trivikr trivikr deleted the summaries-download-parallel branch July 14, 2024 15:24
@trivikr
Copy link
Member Author

trivikr commented Jul 14, 2024

Deployed.

For this test case, producing summaries still took 7 minutes, despite batching.

Before

2024-07-14 04:00:03.396 PDT 20240713/
2024-07-14 04:07:20.718 PDT Upload complete: nodejs.org-access.log.20240713.json
CPU usage: ~32%, memory usage: ~28% before

After

2024-07-14 08:30:18.099 PDT 20240713/
2024-07-14 08:37:01.322 PDT Upload complete: nodejs.org-access.log.20240713.json
CPU usage: ~10%, memory usage: ~28% after

Let's still keep the batching fix, as it's expected to speed up producing summaries theoretically, and had shown results in preliminary testing.

@trivikr
Copy link
Member Author

trivikr commented Jul 19, 2024

As per the logs, producing summaries has been taking ~10 mins over the last three days which isn't expected post this change.

2024-07-16 04:00:03.743 PDT 20240715/
2024-07-16 04:10:24.959 PDT Upload complete: nodejs.org-access.log.20240715.json
...
2024-07-17 04:00:03.700 PDT 20240716/
2024-07-17 04:10:08.729 PDT Upload complete: nodejs.org-access.log.20240716.json
...
2024-07-18 04:00:03.517 PDT 20240717/
2024-07-18 04:11:25.209 PDT Upload complete: nodejs.org-access.log.20240717.json

I noticed it's because traffic wasn't served by the latest version.

I ran update-traffic to serve the latest version, and verified that it's processed in less than two minutes now

2024-07-18 17:49:34.830 PDT 20240715/
2024-07-18 17:50:52.032 PDT Upload complete: nodejs.org-access.log.20240715.json
...
2024-07-18 17:51:12.802 PDT 20240716/
2024-07-18 17:52:37.145 PDT Upload complete: nodejs.org-access.log.20240716.json
...
2024-07-18 17:53:08.170 PDT 20240717/
2024-07-18 17:54:27.877 PDT Upload complete: nodejs.org-access.log.20240717.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants