Skip to content

feat: add minimal implementation of changesStream for registry mirroring #1025

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

elrrrrrrr
Copy link

Hi, jsr team

This PR introduces a minimal implementation of the changesStream interface (#110) to enable registry mirroring capabilities. Given that 11 months have passed since our initial discussion and the feature remains unscheduled, I've taken the liberty to propose this implementation based on our proven approach in npmmirror.

Key implementation notes:

  1. Implements simple polling mechanism via application/json endpoint
  2. Follows npmmirror pattern (compared to npm's heavier stream interface)
  3. Maintains backward compatibility with existing infrastructure

For maintainers:

  • This implementation prioritizes simplicity and maintainability
  • Being new to Rust, I welcome guidance on code quality and best practices 😋

Context:
As developers in mainland China experience consistent network challenges accessing the JSR registry directly, this feature would significantly improve developer experience for one of the world's largest developer communities.

Thanks for building this vital infrastructure for the JavaScript ecosystem.
Looking forward to your feedback! 🙏🏻

@crowlKats crowlKats requested a review from lucacasonato April 2, 2025 09:48
Copy link
Member

@lucacasonato lucacasonato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the changes endpoint need to return changes in order?

If so, this doesn't work. There is no guaruantee in Postgres that a sequential table is actually sequential on insert because multiple transactions can be in flight at once, and sequential IDs are assigned at the start of a transaction, not once the transaction commits.

I suggest you change this as follows:

  • Instead of a seperate database query to insert changes into the change stream, do this using TRIGGER in postgres on the package_versions table. Add a trigger that on insert creates the changes entry.

  • Add a row_version column to the changes table that is typed as xid8 NOT NULL DEFAULT pg_current_xact_id().

  • On query, limit the returned results to WHERE row_id <= pg_snapshot_xmin(pg_current_snapshot()).

What this will do is ensure that we only show changes where all previous transactions have already committed or failed.

@elrrrrrrr elrrrrrrr marked this pull request as draft April 2, 2025 10:16
@elrrrrrrr elrrrrrrr marked this pull request as ready for review April 2, 2025 14:58
@elrrrrrrr elrrrrrrr marked this pull request as draft April 2, 2025 15:03
@elrrrrrrr
Copy link
Author

Does the changes endpoint need to return changes in order?

If so, this doesn't work. There is no guaruantee in Postgres that a sequential table is actually sequential on insert because multiple transactions can be in flight at once, and sequential IDs are assigned at the start of a transaction, not once the transaction commits.

I suggest you change this as follows:

  • Instead of a seperate database query to insert changes into the change stream, do this using TRIGGER in postgres on the package_versions table. Add a trigger that on insert creates the changes entry.
  • Add a row_version column to the changes table that is typed as xid8 NOT NULL DEFAULT pg_current_xact_id().
  • On query, limit the returned results to WHERE row_id <= pg_snapshot_xmin(pg_current_snapshot()).

What this will do is ensure that we only show changes where all previous transactions have already committed or failed.

There is no strict time order requirement for npm changes. When the npmmirror system receives a _change record, it will query the upstream registry for the full manifest based on the change.id.

Disorder or duplication within a certain period of time are acceptable.

There are some other changeTypes in npm, such as tag changes, maintainer changes, etc. These changes are not all triggered by db changes.

In npmmirror, we uniformly distribute events, and these behaviors are completely asynchronous.
https://github.com/cnpm/cnpmcore/blob/master/app/core/event/ChangesStream.ts#L68

@elrrrrrrr elrrrrrrr marked this pull request as ready for review April 2, 2025 15:57
@elrrrrrrr elrrrrrrr requested a review from lucacasonato April 2, 2025 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants