Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE #2736

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

Courey
Copy link
Contributor

@Courey Courey commented Nov 26, 2024

DO NOT MERGE. Please put approval in comment. ticket will be closed once approved.
Resolves #2647

Copy link

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 0% with 63 lines in your changes missing coverage. Please review.

Project coverage is 5.86%. Comparing base (356e2ac) to head (6321502).
Report is 146 commits behind head on main.

Files with missing lines Patch % Lines
app/backfill-eventId.js 0.00% 63 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main   #2736      +/-   ##
========================================
- Coverage   6.23%   5.86%   -0.37%     
========================================
  Files        167     172       +5     
  Lines       4248    4396     +148     
  Branches     472     486      +14     
========================================
- Hits         265     258       -7     
- Misses      3981    4136     +155     
  Partials       2       2              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +7 to +8
// THIS CODE IS CUT AND PASTED FROM THE APP CODE IN /routes/circulars/circulars.lib.ts
// running as a node script it would not import the module from the file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not comfortable copying and pasting this. If you are not able to import it from a standalone Node.js script, then please convert this to a .ts source and use esbuild to compile it down to JS. You should be able to hack something into https://github.com/nasa-gcn/gcn.nasa.gov/blob/main/esbuild.config.js to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain to me why? It's a literal cut and paste (though I will have to adapt the GRB matcher to include ones without a letter). Copying and pasting directly from the current app code and updating that subject matcher shouldn't be an issue. I'm not sure what compiling etc would buy us over a copy and paste. I also have the issue that I need to include eventIds on circulars that have GRBs without a letter, but they are not permissible anymore, so it shouldn't be added to the app subject matchers. I'm not sure what hacking around in the esbuild would get us that would be any better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is standalone and will only be run once on prod. It also requires changes compared to the regular subject matchers to account for changes in GRB naming over time. If there are no problems with the logic of how this is being implemented, let's just go with how this is currently implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain to me why? It's a literal cut and paste (though I will have to adapt the GRB matcher to include ones without a letter). Copying and pasting directly from the current app code and updating that subject matcher shouldn't be an issue. I'm not sure what compiling etc would buy us over a copy and paste. I also have the issue that I need to include eventIds on circulars that have GRBs without a letter, but they are not permissible anymore, so it shouldn't be added to the app subject matchers. I'm not sure what hacking around in the esbuild would get us that would be any better

For the same reasons why we try to avoid duplicate copies of all but the simplest code in general: there is the risk that the two copies will not be the same, that bugs in one won't be fixed in the other, that they will become incompatible, etc. Even though this PR contains code that we do not plan to reuse, there is still the very real risk that the new copy in the PR will not be the same as the original copy in the main source code.

We have added new regular expressions several times while this PR has been open. In order to review this properly, one would need to manually check that the two copies are the same in this PR, and then we would need to re-check it any time one makes changes to either this PR or to the original code on the main branch.

I do not think that this is an acceptable risk, but @jracusin should have the final say.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. I can just check the subject matchers before I run them. I have both in my current version. I also have to adapt the GRB matcher so it includes GRBs without a letter in them. Currently the subject matchers will only match on GRBs with a letter. So there is already going to be a difference than what would be imported.

Comment on lines 75 to 113
for await (const page of pages) {
if (limitHit) break
const chunked = chunk(page.Items || [], 25)

for (const chunk of chunked) {
if (limitHit) break
let writes = []

for (const record of chunk) {
if (limitHit) break

const circular = unmarshall(record)
const parsedEventId = parseEventFromSubject(circular.subject)

if (!circular.eventId && parsedEventId) {
circular.eventId = parsedEventId
writes.push(circular)
}
}

if (writes.length + totalWriteCount > writeLimit) {
const overage = writes.length + totalWriteCount - writeLimit
const writesToLimitTo = writes.length - overage
writes = writes.slice(0, writesToLimitTo)
limitHit = true
}

if (writes.length > 0) {
const command = new BatchWriteCommand({
RequestItems: {
[TableName]: writes.map((circ) => ({
PutRequest: {
Item: {
...circ,
},
},
})),
},
})
Copy link
Member

@lpsinger lpsinger Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite complicated. I would be more comfortable with the correctness of a functional programming implementation using TC39 Async Iterator Helpers (see polyfills available from core-js).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary to make this one-time use script functional? Does our other code comply with this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of loops and conditionals here, which makes the code too complicated to review quickly.

As for whether we have existing code like this, I don't know of any other examples where we have a paginated input, which is filtered, and then which must be paginated again for insertion. That is the source of the complexity.

@Courey
Copy link
Contributor Author

Courey commented Feb 20, 2025

I wish these major changes had been requested when I first asked for in depth review before I started running out of test data. I can still run this script on test, there is still data there to test on, but I am totally out of data pertaining to the synonym slug backfill, all that is left is production data. This will bloat the time for this ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backfill eventID in Circulars
3 participants