The download counts for updated produce summaries are 10X higher than the previous ones #3810

Closed
trivikr opened this issue Jul 6, 2024 · 6 comments · Fixed by #3811

trivikr commented Jul 6, 2024

Describe the bug

The download counts in the summaries regenerated by produce-summaries are about 10X higher than in the previous ones.

Steps to reproduce

Run the following test code:

import { request } from "undici";

// ISO date-only strings are parsed as UTC, matching the UTC output of toISOString().
const start = new Date("2024-04-10");
const end = new Date("2024-04-30");

const baseUrl = "http://storage.googleapis.com/access-logs-summaries-nodejs/";

for (let date = start; date <= end; date.setUTCDate(date.getUTCDate() + 1)) {
  // Format the date as YYYYMMDD, as used in the summary file names.
  const dateString = date.toISOString().split("T")[0].replace(/-/g, "");
  const fileName = `nodejs.org-access.log.${dateString}.json`;

  const response = await request(`${baseUrl}${fileName}`);
  const { os } = await response.body.json();
  // Sum the per-OS request counts to get the total for the day.
  const totalRequests = Object.values(os).reduce((acc, requests) => acc + requests, 0);
  console.log(`${dateString}: ${totalRequests}`);
}

Observed behavior

The counts are higher for some of the dates that were repopulated in #3697 (comment), although there is no clear pattern to which dates are affected.

20240410: 44443066
20240411: 50168573
20240412: 55149948
20240413: 1514636
20240414: 1506204
20240415: 12479084
20240416: 27070278
20240417: 41119517
20240418: 5719212
20240419: 4976855
20240420: 1444435
20240421: 1497210
20240422: 55920552
20240423: 5845360
20240424: 5616159
20240425: 5367285
20240426: 4942371
20240427: 1503351
20240428: 1480123
20240429: 5398854
20240430: 71739631

Expected behavior

Counts should be around seven digits per day.
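
As a rough check against that expectation, the observed totals can be filtered for values that spill into eight digits. This is only a sketch: the 10,000,000 cutoff is an assumed reading of "around seven digits", and `findInflatedDates` is a hypothetical helper name, not something from the repository.

```javascript
// Observed totals copied from the output above, keyed by date (YYYYMMDD).
const observed = {
  "20240410": 44443066, "20240411": 50168573, "20240412": 55149948,
  "20240413": 1514636, "20240414": 1506204, "20240415": 12479084,
  "20240416": 27070278, "20240417": 41119517, "20240418": 5719212,
  "20240419": 4976855, "20240420": 1444435, "20240421": 1497210,
  "20240422": 55920552, "20240423": 5845360, "20240424": 5616159,
  "20240425": 5367285, "20240426": 4942371, "20240427": 1503351,
  "20240428": 1480123, "20240429": 5398854, "20240430": 71739631,
};

// Assumption: "around seven digits" is read as "fewer than 10,000,000 requests".
const SEVEN_DIGIT_LIMIT = 10_000_000;

// Returns the dates whose total is at or above the limit.
function findInflatedDates(totalsByDate, limit = SEVEN_DIGIT_LIMIT) {
  return Object.entries(totalsByDate)
    .filter(([, total]) => total >= limit)
    .map(([date]) => date);
}
```

Calling `findInflatedDates(observed)` returns the eight dates in this range whose totals reach eight digits; the remaining thirteen stay within seven.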

Additional context

Noticed during cross verification in #3697 (comment)


trivikr commented Jul 6, 2024

I think the issue is caused by storing counts in a global variable:

const counts = { bytes: 0, total: 0 }

To save time, I had run queries for multiple dates in parallel in #3697 (comment), which would have resulted in the metrics from concurrent requests getting mixed.
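
The suspected mix-up can be reproduced in isolation. The sketch below is not the produce-summaries code; all function names are hypothetical, and `await Promise.resolve()` merely stands in for an asynchronous read of a log line. It contrasts a counter held at module level with one created per call.

```javascript
// Module-level state: shared by every call that is in flight at the same time.
const sharedCounts = { total: 0 };

// Hypothetical summarizer that counts lines into the shared object.
async function summarizeWithSharedState(lines) {
  for (const _line of lines) {
    await Promise.resolve(); // stands in for an async read; lets other calls interleave
    sharedCounts.total += 1;
  }
  // Snapshot of the shared counter, which may include lines from other calls.
  return { total: sharedCounts.total };
}

// Same logic, but the counter lives inside the call.
async function summarizeWithLocalState(lines) {
  const counts = { total: 0 };
  for (const _line of lines) {
    await Promise.resolve();
    counts.total += 1;
  }
  return counts;
}

// Runs two "days" in parallel with each variant, as the repopulation run did.
async function compare() {
  const dayA = new Array(3).fill("line");
  const dayB = new Array(5).fill("line");
  const shared = await Promise.all([
    summarizeWithSharedState(dayA),
    summarizeWithSharedState(dayB),
  ]);
  const local = await Promise.all([
    summarizeWithLocalState(dayA),
    summarizeWithLocalState(dayB),
  ]);
  return { shared, local };
}
```

With the per-call counter, `compare()` reports 3 and 5 lines. With the shared counter, both results are inflated by lines from the other call, which is consistent with totals growing when several dates are processed at once.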


trivikr commented Jul 7, 2024

Fix posted in #3811


For dates whose metrics were wildly different from the norm, the data was repopulated using the following script:

import { GoogleAuth } from "google-auth-library";
import { Agent } from "undici";

const cloudRunUrl = "https://produce-summaries-kdtacnjogq-uc.a.run.app";
const auth = new GoogleAuth();
const client = await auth.getIdTokenClient(cloudRunUrl);

globalThis[Symbol.for("undici.globalDispatcher.1")] = new Agent({
  headersTimeout: 30 * 60 * 1000,
});

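// ISO date-only strings are parsed as UTC, matching the UTC output of toISOString().
const start = new Date("2024-04-01");
const end = new Date("2024-04-04");

// Collect the dates to repopulate as YYYYMMDD strings.
const dates = [];
for (let date = start; date <= end; date.setUTCDate(date.getUTCDate() + 1)) {
  dates.push(date.toISOString().split("T")[0].replace(/-/g, ""));
}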

await Promise.all(
  dates.map(async (dateString) => {
    const startTime = Date.now();
    console.log(`Fetching for ${dateString}`);
    const url = `${cloudRunUrl}/date/${dateString}`;

    try {
      const token = await client.idTokenProvider.fetchIdToken(cloudRunUrl);
      const response = await fetch(url, {
        method: "POST",
        headers: { Authorization: `Bearer ${token}` },
      });

      if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
      }

      console.log(`Successful in ${Date.now() - startTime}ms for ${dateString}`);
    } catch (error) {
      console.error(`Error in ${Date.now() - startTime}ms for ${dateString}`, error);
    }
  })
);

This issue will be closed automatically when the fix is merged.


trivikr commented Jul 7, 2024

Verified that the download stats summarized by https://nodedownloads.nodeland.dev/ are in range.

[Screenshot 2024-07-06 at 7 25 24 PM]


trivikr commented Jul 7, 2024

Summaries were missing for the following past dates. I repopulated those too:

20210830
20211031
20211129
20211222
20220214
20220313
20220323
20220827
20230815
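
Gaps like these can be found by comparing the expected file names for a date range against the names actually present in the bucket. The following is a minimal sketch under assumptions: `findMissingDates` is a hypothetical helper, and `existingFileNames` is assumed to have been collected from the bucket listing beforehand. Only the file name pattern is taken from the test code in the issue description.

```javascript
// Hypothetical helper: lists the dates in [startIso, endIso] that have no summary file.
// Dates are handled in UTC so the YYYYMMDD strings do not depend on the local time zone.
function findMissingDates(startIso, endIso, existingFileNames) {
  const existing = new Set(existingFileNames);
  const missing = [];
  const end = new Date(endIso);
  for (let date = new Date(startIso); date <= end; date.setUTCDate(date.getUTCDate() + 1)) {
    const dateString = date.toISOString().split("T")[0].replace(/-/g, "");
    // File name pattern as used by the summaries bucket.
    if (!existing.has(`nodejs.org-access.log.${dateString}.json`)) {
      missing.push(dateString);
    }
  }
  return missing;
}
```

For example, passing `"2021-08-29"`, `"2021-08-31"`, and a listing that contains only the files for the 29th and the 31st returns `["20210830"]`.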


trivikr commented Jul 7, 2024

The content was available under processed-logs-nodejs for March 2021, so I populated the summaries for March 2021 too.

https://storage.googleapis.com/access-logs-summaries-nodejs/?marker=nodejs.org-access.log.20210228.json


trivikr commented Jul 10, 2024

The content was available under processed-logs-nodejs for Feb 2021, so I populated the summaries for that month too.

https://storage.googleapis.com/access-logs-summaries-nodejs/?marker=nodejs.org-access.log.20210131.json
