Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP-568: Resharding V3 #568

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

NEP-568: Resharding V3 #568

wants to merge 33 commits into from

Conversation

wacban
Copy link
Contributor

@wacban wacban commented Oct 24, 2024

NEP Status (Updated by NEP Moderators)

Status: Voting

SME reviews:

Protocol Work Group voting indications (❔ | 👍 | 👎 ):

@flmel flmel added the S-draft/needs-author-revision A NEP in the DRAFT stage that needs an author revision. label Oct 24, 2024
@flmel flmel changed the title NEP-XXX: Resharding v3 NEP-568: Resharding v3 Oct 24, 2024
@flmel
Copy link
Member

flmel commented Oct 28, 2024

Hi @wacban – thank you for starting this proposal. As the moderator, I labeled this PR as "Needs author revision" because we assume you are still working on it since you submitted it in "Draft" mode.
Please ping the @near/nep-moderators once you are ready for us to review it

Filling in the future possibilities section - and looking for more ideas :)
Filling the specification section about flat state.
@render render bot temporarily deployed to resharding - nomicon PR #568 November 6, 2024 15:43 Destroyed
Some beautification courtesy ChatGPT.

I double checked everything, we aren't changing any meaning.
@shreyan-gupta shreyan-gupta changed the title NEP-568: Resharding v3 NEP-568: Resharding V3 Dec 16, 2024
@walnut-the-cat walnut-the-cat added S-review/needs-sme-review A NEP in the REVIEW stage is waiting for Subject Matter Expert review. and removed S-draft/needs-author-revision A NEP in the DRAFT stage that needs an author revision. labels Dec 16, 2024
@walnut-the-cat
Copy link
Contributor

As the moderator,

I believe this proposal is ready for SME review.

@near/wg-protocol , could you help assign SMEs who can review the proposal?

Thank you.

@shreyan-gupta shreyan-gupta marked this pull request as ready for review December 17, 2024 04:59
@shreyan-gupta shreyan-gupta requested a review from a team as a code owner December 17, 2024 04:59
@flmel flmel added the WG-protocol Protocol Standards Work Group should be accountable label Dec 18, 2024
@PiVortex PiVortex mentioned this pull request Dec 23, 2024
9 tasks
@bowenwang1996
Copy link
Collaborator

As a working group member, I nominate @birchmd and @mfornet as SME reviewers.

Copy link
Contributor

@birchmd birchmd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reviewer I would like to thank the team for putting together such a well-written and detailed proposal! Designing this feature was no easy task, and the proposal here seems to address all the intricacies nicely.

I have left two questions, but otherwise the design looks good to me.


#### Flat State's Status Persistence

Every shard's Flat State has a status associated with it and stored in the database, called `FlatStorageStatus`. We propose extending the existing object by adding a new enum variant named `FlatStorageStatus::Resharding`. This approach has two benefits. First, the progress of any Flat State resharding is persisted to disk, making the operation resilient to a node crash or restart. Second, resuming resharding on node restart shares the same code path as Flat State creation (see `FlatStorageShardCreator`), reducing code duplication.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the node crashes during resharding then the HybridArena MemTrie objects also need to be reconstructed to resume block processing. How does this recovery happen? I assume it is easy to reconstruct the FrozenArena from the Flat state, but what about the changes on top of it? Are they persisted somewhere? If they are not then I suppose the node would need to recreate the MemTrie by re-applying the chunks that have happened so far in the epoch.

Copy link
Contributor Author

@wacban wacban Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Trisfald Can you reply please?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the HybridArena Memtrie will need to be reconstructed upon crash or even a simple restart.

I think at the moment stage of nearcore/master we don't support seamless node restarts yet. There are a couple possible solution to this problem since all changes to the state are always persisted, in a way or another. In increasing order of complexity:

  • Wait for flat storage to complete resharding and load the memtrie for children shards as usual
  • Recreate the memtrie of the parent shard as it was at the resharding block, split it again, and apply changes
  • Create memtries from current flat storage state of children (even if not complete) and apply remaining changes
  • More complex solutions which leverage decoupling of memtrie from flat storage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification @Trisfald . I brought this up here because crash recovery is mentioned as part of the motivation for the design. Maybe it is worth adding a section to the MemTrie specification about how restarts during resharding will be handled.

@walnut-the-cat
Copy link
Contributor

@mfornet , could you be able to review this NEP?

@walnut-the-cat
Copy link
Contributor

@mfornet , another ping to make sure it catches your eyes. As we are trying to release resharding v3 with upcoming release, it will be great if you could share your thoughts.

Thank you.

github-merge-queue bot pushed a commit to near/nearcore that referenced this pull request Jan 24, 2025
# Feature to stabilize

This PR stabilizes Resharding V3. It introduces a new implementation for
resharding and two new shard layouts for the production networks.

# Context

- NEP: near/NEPs#568
- Implementation: #11881

# Testing and QA
This feature was extensively tested in unit tests, testloop tests and in
forknet.

# Checklist
- [ ] Link to nightly nayduck run https://nayduck.nearone.org/#/run/1142
- [x] Update CHANGELOG.md to include this protocol feature in the
`Unreleased` section.
@render render bot temporarily deployed to resharding - nomicon PR #568 January 30, 2025 14:35 Destroyed
Copy link
Member

@mfornet mfornet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wacban, @walnut-the-cat, sorry for the delay reviewing this PR.

I really appreciate all the details layed out in this document, given the complexity of the proposal. The questions I was annotating were answered in the document itself.

The design looks good.

@walnut-the-cat walnut-the-cat added S-voting/needs-wg-voting-indication A NEP in the VOTING stage that needs the working group voting indication. and removed S-review/needs-sme-review A NEP in the REVIEW stage is waiting for Subject Matter Expert review. labels Feb 10, 2025
@flmel
Copy link
Member

flmel commented Feb 20, 2025

@near/wg-protocol Can you please fully read this NEP and comment in the thread if you are leaning towards approving or rejecting it? Please make sure to include your rationale and any feedback that you have for the author.

@birchmd
Copy link
Contributor

birchmd commented Feb 20, 2025

As a subject matter expert and working group member I lean towards approving this proposal. You can see more detailed comments from me in my SME review.

@bowenwang1996
Copy link
Collaborator

As a working group member I lean towards approving this proposal. Thank you for writing such a detailed proposal!

@mfornet
Copy link
Member

mfornet commented Feb 21, 2025

As a working group member I support approving this proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-NEP A NEAR Enhancement Proposal (NEP). S-voting/needs-wg-voting-indication A NEP in the VOTING stage that needs the working group voting indication. WG-protocol Protocol Standards Work Group should be accountable
Projects
Status: In Review 👀
Status: VOTING
Development

Successfully merging this pull request may close these issues.