Try nuking ShardLayout::V0 #12313

eagr · 2024-10-25T09:47:06Z

eagr · 2024-10-25T09:54:30Z

can you guys do something like cargo test -p near-chain-configs without dependency issues? @wacban

wacban · 2024-10-25T10:28:16Z

can you guys do something like cargo test -p near-chain-configs without dependency issues? @wacban

It fails for me, actually; that's not great. I typically run it on the whole workspace and just filter to the tests that I want. Also, we use the nextest framework rather than cargo test, though I have no clue why. It's suboptimal, but I never bothered to optimize this part of my workflow.

cargo nextest run <test>

wacban · 2024-10-25T10:29:05Z

If you feel like fixing it, go for it. It looks like it's only a matter of adding some dependencies to the cargo file.

wacban · 2024-10-25T10:29:57Z

JFYI this PR is marked as draft; please mark it as ready for review when it is.

eagr · 2024-10-26T09:07:05Z

If you feel like fixing it, go for it. It looks like it's only a matter of adding some dependencies to the cargo file.

It seems like this is expected behavior. If it's not bothering anyone else, I'm not sure it needs fixing. It can also be easily mitigated by adding the --all-features flag to the command.


eagr · 2024-10-28T07:05:27Z

core/primitives/src/shard_layout.rs

+    }
+
+    /// Construct a layout with given number of shards
+    pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self {


got a better idea for the fn name?

Maybe multi_shard, just to match the single_shard one?

That was my first thought but it could also be used to create a single-shard layout, so I changed my mind. But if you like that name I'm also down with it. :)
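The observation that the same constructor also yields a single-shard layout follows from how boundary-based layouts count shards: N shards need N - 1 boundary accounts, so N = 1 simply means no boundaries. Below is a minimal sketch of that relationship using a hypothetical, simplified `SimpleLayout` type (not nearcore's `ShardLayout`) and a made-up zero-padded `shardNNNNN` boundary naming; the real `of_num_shards`/`multi_shard` may choose boundaries differently.

```rust
/// Hypothetical, simplified stand-in for a boundary-account based layout.
/// Only illustrates how the shard count relates to the boundary accounts.
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleLayout {
    /// Sorted account ids separating consecutive shards.
    pub boundary_accounts: Vec<String>,
    pub version: u32,
}

impl SimpleLayout {
    /// Builds a layout with `num_shards` shards (N shards => N - 1 boundaries),
    /// so `num_shards == 1` produces a single-shard layout.
    pub fn multi_shard(num_shards: u64, version: u32) -> Self {
        assert!(num_shards >= 1, "a layout needs at least one shard");
        // Hypothetical naming; zero-padding keeps the boundaries sorted.
        let boundary_accounts = (1..num_shards).map(|i| format!("shard{i:05}")).collect();
        SimpleLayout { boundary_accounts, version }
    }

    pub fn num_shards(&self) -> u64 {
        self.boundary_accounts.len() as u64 + 1
    }

    /// An account belongs to the shard whose index equals the number of
    /// boundaries that are <= the account id.
    pub fn account_to_shard(&self, account: &str) -> u64 {
        self.boundary_accounts.partition_point(|b| b.as_str() <= account) as u64
    }
}
```

Zero-padding matters here: with unpadded names, `shard10` would sort before `shard2` and the binary-search lookup would give wrong answers.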

eagr · 2024-10-28T07:13:04Z

nearcore/src/config.rs

@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds(
        .map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed))
        .collect::<Vec<_>>();

-    let shard_layout = ShardLayout::v0(num_shards, 0);
+    let shard_layout = ShardLayout::of_num_shards(num_shards, 0);


This causes a sanity check to fail, as you can see from the CI logs. It looks like a JSON parsing issue. Not sure whether you'd like to keep it as it was or update the config somewhere else to make it work.

Let's try updating the config and if it doesn't work leave as is.

nearcore/src/config.rs

eagr · 2024-10-28T07:17:51Z

integration-tests/src/tests/client/features/stateless_validation.rs

-    let error_message = format!("{}", error).to_lowercase();
-    tracing::info!(target: "test", "error message: {}", error_message);
-    assert!(error_message.contains("shard"));
+    let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer);


There's a panic from get_shard_index() after switching to V2.

Ah that's pretty bad. Feel free to either:

1. Fix it (may be complicated / lots of code if you need to add error handling)
2. Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
3. Make the default shard layout V1 (hopefully this works?)

I'll try 3 (it should probably work, from the look of the code), which seems like a nice middle ground before finishing the transition to V2.
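Option 1 (error handling instead of a panic) could look roughly like the sketch below. `LayoutIndex`, `ShardLayoutError`, and the error text are hypothetical; the idea is that a fallible `get_shard_index` lets callers such as the test shown above keep asserting that the error message mentions "shard".

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type; the real code may use a different one.
#[derive(Debug, PartialEq)]
pub enum ShardLayoutError {
    InvalidShardId { shard_id: u64 },
}

impl fmt::Display for ShardLayoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ShardLayoutError::InvalidShardId { shard_id } => {
                write!(f, "invalid shard id {}", shard_id)
            }
        }
    }
}

impl std::error::Error for ShardLayoutError {}

/// Hypothetical holder of the id -> index mapping.
pub struct LayoutIndex {
    pub id_to_index_map: HashMap<u64, usize>,
}

impl LayoutIndex {
    /// Returns an error for unknown shard ids instead of panicking.
    pub fn get_shard_index(&self, shard_id: u64) -> Result<usize, ShardLayoutError> {
        self.id_to_index_map
            .get(&shard_id)
            .copied()
            .ok_or(ShardLayoutError::InvalidShardId { shard_id })
    }
}
```

The cost mentioned in option 1 is real: every caller of a fallible lookup has to propagate or handle the `Result`.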

wacban

looks nice, answered some questions

wacban · 2024-10-28T09:32:55Z

core/chain-configs/src/genesis_config.rs

+    // FIXME eagr what should be the default?
+    #[default(ShardLayout::v0(1, 0))]


Ideally it should use the single_shard method that returns the most recent (today it's V2) shard layout.

nit: The convention here seems to be to use the default_ function to provide the default value.
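A std-only sketch of the `default_*` convention, with hypothetical stand-in types: a named function returns the most recent single-shard layout, and the config's `Default` impl calls it. In the real struct the function would instead be referenced from the field attribute (serde, for example, supports `#[serde(default = "path")]`); that part is omitted here to keep the example free of external crates.

```rust
/// Hypothetical stand-in for the layout type.
#[derive(Debug, Clone, PartialEq)]
pub enum Layout {
    V2 { num_shards: u64 },
}

impl Layout {
    /// Mirrors the idea of `ShardLayout::single_shard()`: the most recent
    /// layout version with one shard.
    pub fn single_shard() -> Self {
        Layout::V2 { num_shards: 1 }
    }
}

/// Named default function following the `default_*` convention.
fn default_shard_layout() -> Layout {
    Layout::single_shard()
}

/// Hypothetical, trimmed-down config struct.
#[derive(Debug, Clone, PartialEq)]
pub struct GenesisConfigSketch {
    pub shard_layout: Layout,
}

impl Default for GenesisConfigSketch {
    fn default() -> Self {
        GenesisConfigSketch { shard_layout: default_shard_layout() }
    }
}
```

Keeping the value in one named function means the default tracks whatever `single_shard()` returns, instead of hard-coding a version.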


wacban · 2024-10-28T09:34:36Z

core/primitives/src/shard_layout.rs

+    }
+
+    /// Construct a layout with given number of shards
+    pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self {


Maybe multi_shard, just to match the single_shard one?

core/primitives/src/shard_layout.rs

wacban · 2024-10-28T09:39:29Z

integration-tests/src/tests/client/features/stateless_validation.rs

-    let error_message = format!("{}", error).to_lowercase();
-    tracing::info!(target: "test", "error message: {}", error_message);
-    assert!(error_message.contains("shard"));
+    let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer);


Ah that's pretty bad. Feel free to either:

1. Fix it (may be complicated / lots of code if you need to add error handling)
2. Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
3. Make the default shard layout V1 (hopefully this works?)


wacban · 2024-10-28T09:44:16Z

nearcore/src/config.rs

@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds(
        .map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed))
        .collect::<Vec<_>>();

-    let shard_layout = ShardLayout::v0(num_shards, 0);
+    let shard_layout = ShardLayout::of_num_shards(num_shards, 0);


Let's try updating the config and if it doesn't work leave as is.

tools/database/src/corrupt.rs

wacban

looks good.

I think serde doesn't like (de)serializing maps with non-string keys, like the ones in V2, and that breaks the tests. Feel free to fall back to V1 if it's too crazy to fix in this PR.

wacban · 2024-10-29T09:08:02Z

core/primitives/src/shard_layout.rs

-    #[test]
-    fn test_shard_layout_v0() {
-        let num_shards = 4;
-        let shard_layout = ShardLayout::v0(num_shards, 0);
-        let mut shard_id_distribution: HashMap<ShardId, _> =
-            shard_layout.shard_ids().map(|shard_id| (shard_id.into(), 0)).collect();
-        let mut rng = StdRng::from_seed([0; 32]);
-        for _i in 0..1000 {
-            let s: Vec<u8> = (&mut rng).sample_iter(&Alphanumeric).take(10).collect();
-            let s = String::from_utf8(s).unwrap();
-            let account_id = s.to_lowercase().parse().unwrap();
-            let shard_id = account_id_to_shard_id(&account_id, &shard_layout);
-            assert!(shard_id < num_shards);
-            *shard_id_distribution.get_mut(&shard_id).unwrap() += 1;
-        }
-        let expected_distribution: HashMap<ShardId, _> = [
-            (ShardId::new(0), 247),
-            (ShardId::new(1), 268),
-            (ShardId::new(2), 233),
-            (ShardId::new(3), 252),
-        ]
-        .into_iter()
-        .collect();
-        assert_eq!(shard_id_distribution, expected_distribution);
-    }


Please keep this one, the V0 may still be used when replaying some very old blocks.
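Why the old test still matters: a V0-style layout assigns an account by hashing its id and taking the result modulo the number of shards, so where historical accounts land depends on that exact hash. The sketch below only shows the shape of the computation. It swaps in `std`'s `DefaultHasher` as an assumption; that is not the hash V0 uses, and `DefaultHasher` output is not guaranteed stable across Rust releases, which is precisely what replaying old blocks cannot tolerate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// V0-style assignment sketch: hash(account_id) % num_shards.
/// ASSUMPTION: DefaultHasher stands in for the real V0 hash function.
pub fn account_to_shard_v0(account_id: &str, num_shards: u64) -> u64 {
    assert!(num_shards > 0, "num_shards must be positive");
    let mut hasher = DefaultHasher::new();
    account_id.hash(&mut hasher);
    hasher.finish() % num_shards
}
```

A test that pins the exact per-shard counts (like `test_shard_layout_v0`) is what catches an accidental change to the hash or the modulo step.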

codecov · 2024-10-30T18:31:40Z

Codecov Report

Attention: Patch coverage is 92.37288% with 9 lines in your changes missing coverage. Please review.

Project coverage is 71.23%. Comparing base (8e30ccd) to head (e964b75).
Report is 10 commits behind head on master.

Files with missing lines	Patch %	Lines
core/primitives/src/shard_layout.rs	91.11%	0 Missing and 8 partials ⚠️
chain/chain/src/test_utils.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #12313      +/-   ##
==========================================
+ Coverage   71.19%   71.23%   +0.03%     
==========================================
  Files         839      839              
  Lines      169743   170209     +466     
  Branches   169743   170209     +466     
==========================================
+ Hits       120851   121240     +389     
- Misses      43633    43688      +55     
- Partials     5259     5281      +22

Flag	Coverage Δ
backward-compatibility	`0.16% <0.00%> (-0.01%)`	⬇️
db-migration	`0.16% <0.00%> (-0.01%)`	⬇️
genesis-check	`1.26% <48.97%> (+0.03%)`	⬆️
integration-tests	`39.00% <47.45%> (+0.01%)`	⬆️
linux	`70.68% <92.37%> (+0.02%)`	⬆️
linux-nightly	`70.80% <92.37%> (+0.02%)`	⬆️
macos	`50.48% <90.67%> (+0.05%)`	⬆️
pytests	`1.57% <50.00%> (+0.03%)`	⬆️
sanity-checks	`1.38% <48.97%> (+0.03%)`	⬆️
unittests	`64.18% <90.67%> (+0.03%)`	⬆️
upgradability	`0.21% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.

eagr · 2024-10-30T19:08:44Z

The failing test could be fixed by adding some feature flags. But that failure comes from master; does it need to be fixed here?

wacban

Looks good to me, thank you for this contribution! Just a few final nits. Most are optional, I only really care about restoring the assertion in the resharding test.

wacban · 2024-10-31T08:37:27Z

core/primitives/src/shard_layout.rs

+            shards_split_map: None,
+            shards_parent_map: None,
+            version,
+        })
    }

    /// Return a V0 Shardlayout


Can you mark it as deprecated? I don't know how to do this properly in Rust; if it's not straightforward, a comment should do.

How about marking ShardLayout::V0 as deprecated? This way any usage of V0 would raise a deprecation warning including calling v0().

How about both? :)

On second thought, probably just on v0(): there are still many occurrences of V0 in the code, and deprecating the variant would raise deprecation warnings that would probably make clippy fail on CI (each usage would need something like #[allow(deprecated)] to suppress the warning).
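Rust does this with the built-in `#[deprecated]` attribute, and a call site opts out with `#[allow(deprecated)]`. The sketch below uses a hypothetical `LayoutSketch` enum to show the trade-off discussed above: only the constructor is deprecated, so matching on the `V0` variant stays warning-free, while code that still has to build a V0 layout suppresses the warning explicitly.

```rust
/// Hypothetical stand-in for the layout enum.
#[derive(Debug, Clone, PartialEq)]
pub enum LayoutSketch {
    V0 { num_shards: u64 },
    V1 { boundary_accounts: Vec<String> },
}

impl LayoutSketch {
    /// Deprecating only the constructor leaves the variant itself usable.
    #[deprecated(note = "V0 is only kept for replaying old blocks")]
    pub fn v0(num_shards: u64) -> Self {
        LayoutSketch::V0 { num_shards }
    }

    pub fn num_shards(&self) -> u64 {
        match self {
            // Matching on V0 does not trigger the deprecation lint.
            LayoutSketch::V0 { num_shards } => *num_shards,
            LayoutSketch::V1 { boundary_accounts } => boundary_accounts.len() as u64 + 1,
        }
    }
}

/// Legacy call site that must still construct V0: opt out explicitly.
#[allow(deprecated)]
pub fn legacy_layout() -> LayoutSketch {
    LayoutSketch::v0(4)
}
```

Putting `#[deprecated]` on the enum variant instead would flag every match arm and construction of `V0` as well, which is the CI noise the comment above wants to avoid.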

wacban · 2024-10-31T08:37:59Z

chain/chain/src/resharding/event_type.rs

-        // Shard layouts V0 and V1 are rejected.
-        assert!(ReshardingEventType::from_shard_layout(
-            &ShardLayout::v0_single_shard(),
-            block,
-            prev_block
-        )
-        .is_err());


Can we keep this?

wacban · 2024-10-31T08:38:52Z

chain/epoch-manager/src/tests/mod.rs

@@ -2294,7 +2294,7 @@ fn test_protocol_version_switch_with_shard_layout_change() {
        epoch_manager.get_epoch_info(&epochs[1]).unwrap().protocol_version(),
        new_protocol_version - 1
    );
-    assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::v0_single_shard(),);
+    assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::single_shard(),);


mini nit: remove the trailing comma

wacban · 2024-10-31T08:42:12Z

core/primitives/src/shard_layout.rs

+        let id_to_index_map =
+            layout.id_to_index_map.iter().map(|(k, v)| (k.to_string(), *v)).collect();


This is completely fine, but I would be tempted to write a generic function that converts a Map<ShardId, T> to Map<String, T> and use it for all the maps in the shard layout. We're looking at whopping potential savings of ~2 lines of code, so up to you if you think it's worth it :)
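A minimal sketch of such a helper, assuming `BTreeMap` (the real shard-layout maps may use another map type) and a hypothetical `ShardId` newtype that implements `Display`. Stringifying the keys is what makes the map serializable as a JSON object, since JSON object keys must be strings.

```rust
use std::collections::BTreeMap;
use std::fmt::{self, Display};

/// Hypothetical stand-in for the real ShardId type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ShardId(pub u64);

impl Display for ShardId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Generic over any Display key, so one helper serves every shard-keyed map.
pub fn stringify_keys<K: Display, V: Clone>(map: &BTreeMap<K, V>) -> BTreeMap<String, V> {
    map.iter().map(|(k, v)| (k.to_string(), v.clone())).collect()
}
```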

wacban · 2024-10-31T08:46:16Z

core/primitives/src/shard_layout.rs

+        let id_to_index_map = layout
+            .id_to_index_map
+            .into_iter()
+            .map(|(k, v)| Ok((k.parse::<u64>()?.into(), v)))
+            .collect::<Result<_, Self::Error>>()?;


Ditto about generic function for this but here it may actually make some sense because it's less trivial logic.
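The reverse direction is where a generic helper pays off, since each key parse can fail. A sketch under the same assumptions (`BTreeMap`, hypothetical helper name), reusing the boxed error type from the `TryFrom<SerdeShardLayoutV2>` impl in the diff; it stops at the first key that fails to parse.

```rust
use std::collections::BTreeMap;
use std::error::Error;
use std::str::FromStr;

/// Same boxed error type as in the TryFrom impl from the diff.
type BoxError = Box<dyn Error + Send + Sync>;

/// Parse string keys back into typed keys (e.g. u64 shard ids).
pub fn parse_keys<K, V>(map: BTreeMap<String, V>) -> Result<BTreeMap<K, V>, BoxError>
where
    K: FromStr + Ord,
    K::Err: Error + Send + Sync + 'static,
{
    map.into_iter()
        // Explicit return type lets `?` convert K::Err into BoxError.
        .map(|(k, v)| -> Result<(K, V), BoxError> { Ok((k.parse::<K>()?, v)) })
        .collect()
}
```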

wacban · 2024-10-31T08:47:37Z

core/primitives/src/shard_layout.rs

+impl TryFrom<SerdeShardLayoutV2> for ShardLayoutV2 {
+    type Error = Box<dyn std::error::Error + Send + Sync>;
+
+    fn try_from(layout: SerdeShardLayoutV2) -> Result<Self, Self::Error> {


mini nit: Maybe unpack the layout as the first step and then use the unpacked values directly? It may be a bit prettier, and it would make it more obvious that there isn't any unnecessary cloning.
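What "unpack first" could look like, using hypothetical, trimmed-down `SerdeLayout`/`Layout` structs rather than the real `SerdeShardLayoutV2`/`ShardLayoutV2`. Destructuring moves every field out exactly once, so it is visible at a glance that nothing is cloned.

```rust
use std::collections::BTreeMap;
use std::convert::TryFrom;

type BoxError = Box<dyn std::error::Error + Send + Sync>;

/// Hypothetical serde-friendly mirror with string keys.
pub struct SerdeLayout {
    pub boundary_accounts: Vec<String>,
    pub id_to_index_map: BTreeMap<String, usize>,
    pub version: u32,
}

/// Hypothetical in-memory layout with typed keys.
#[derive(Debug, PartialEq)]
pub struct Layout {
    pub boundary_accounts: Vec<String>,
    pub id_to_index_map: BTreeMap<u64, usize>,
    pub version: u32,
}

impl TryFrom<SerdeLayout> for Layout {
    type Error = BoxError;

    fn try_from(layout: SerdeLayout) -> Result<Self, Self::Error> {
        // Unpack first: each field is moved out once, no clones needed.
        let SerdeLayout { boundary_accounts, id_to_index_map, version } = layout;
        let id_to_index_map: BTreeMap<u64, usize> = id_to_index_map
            .into_iter()
            .map(|(k, v)| -> Result<(u64, usize), BoxError> { Ok((k.parse::<u64>()?, v)) })
            .collect::<Result<_, BoxError>>()?;
        Ok(Layout { boundary_accounts, id_to_index_map, version })
    }
}
```

A side benefit: adding a field to the mirror struct makes the destructuring pattern fail to compile until the new field is handled.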


wacban · 2024-10-31T08:49:01Z

core/primitives/src/shard_layout.rs

+    }
+}
+
+impl<'de> serde::Deserialize<'de> for ShardLayoutV2 {


nit: I think the convention is to call the lifetime 'a.

eagr · 2024-10-31T11:58:24Z

core/primitives/src/shard_layout.rs

+            shards_split_map: None,
+            shards_parent_map: None,
+            version,
+        })
    }

    /// Return a V0 Shardlayout


How about marking ShardLayout::V0 as deprecated? This way any usage of V0 would raise a deprecation warning including calling v0().

eagr · 2024-10-31T12:01:21Z

core/primitives/src/shard_layout.rs

+    }
+
+    /// Can be used to construct a multi-shard layout, mostly for test purposes
+    pub fn multi_shard(num_shards: NumShards, version: ShardVersion) -> Self {


how about n_shard() in the sense of creating an N-shard layout?

I'm not a fan tbh. How about just new or new_test?

wacban · 2024-10-31T12:05:51Z

I tried the pre-merge tests and unfortunately some are failing. Those are the most expensive tests, which only run before merging to master. I tried some simple debugging but couldn't fix it easily. I'm afraid we may need to restore kv_runtime to use V0 for now to make them pass. I'm testing this change again on a fork of your PR:
38ca3a1
test run reference for myself:
https://nayduck.nearone.org/#/run/564

wacban · 2024-11-01T15:54:49Z

@eagr Is it ready for merging or are you still working on anything?

wacban · 2024-11-01T16:24:09Z

Actually, it still needs fixes for the expensive tests. Can you cherry-pick the commits from here?
https://github.com/near/nearcore/tree/waclaw-deprec-shard-v0-1

wacban · 2024-11-01T16:27:03Z

I was able to push those here directly; it should be ready for merging now.

eagr · 2024-11-01T17:13:36Z

If it looks alright to you then let's go. Thanks man, for all the guidance.

wacban · 2024-11-02T14:30:46Z

Merged finally after some more CI issues. Thank you for your contribution!

eagr · 2024-11-02T19:11:26Z

It doesn't feel like the job is quite done, though. Let me know when it's time to really get rid of it. Meanwhile, any other issues you could point me to?

wacban · 2024-11-02T20:58:44Z

It doesn't feel like the job is quite done, though. Let me know when it's time to really get rid of it.

I'll be looking into it now as I need to get V2 to a good state for resharding V3.

Meanwhile, any other issues you could point me to?

Does any of the following sound interesting?

refactor the hacky BorshDeserialization for Receipt using the Read::chain trick
Remove Cow from StateStoredReceiptV0. If I recall correctly, Cow was used because in one place the receipt cannot be passed as owned, since it is used a second time for the balance check. Hopefully this can be refactored and all the Cow madness can be deleted.
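The "Read::chain trick" refers to `std::io::Read::chain`, which concatenates two readers. A common use is to read one byte to detect a format and, if that byte turns out to be payload, prepend it back so the rest of the decoder sees the original stream. The sketch shows only this generic technique; the tag value `0xFF`, the `Decoded` enum, and `decode` are hypothetical and do not reproduce nearcore's actual `Receipt` encoding.

```rust
use std::io::{self, Read};

/// Hypothetical marker byte tagging a newer encoding. Legacy data carries
/// no tag, so its first byte is ordinary payload.
const V1_TAG: u8 = 0xFF;

#[derive(Debug, PartialEq)]
pub enum Decoded {
    Legacy(Vec<u8>),
    V1(Vec<u8>),
}

/// Peek at the first byte to choose a format; if it is not the tag,
/// re-prepend it with `Read::chain` so the legacy path sees the full input.
pub fn decode<R: Read>(mut reader: R) -> io::Result<Decoded> {
    let mut first = [0u8; 1];
    reader.read_exact(&mut first)?;
    let mut payload = Vec::new();
    if first[0] == V1_TAG {
        reader.read_to_end(&mut payload)?;
        Ok(Decoded::V1(payload))
    } else {
        // &[u8] implements Read, so the consumed byte can be chained back in front.
        (&first[..]).chain(reader).read_to_end(&mut payload)?;
        Ok(Decoded::Legacy(payload))
    }
}
```

The approach only works if legacy data can never start with the tag byte, which is the kind of data-specific assumption that makes such code feel hacky.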

eagr · 2024-11-03T20:23:37Z

I guess there's no need to create issues for these? I could just link to the comment from the PR.

wacban · 2024-11-04T09:09:35Z

@eagr Up to you. I'm fine without issues, but I would also be happy to write down proper issues with some more context; just let me know.

eagr · 2024-11-06T04:18:14Z

refactor the hacky BorshDeserialization for Receipt using the Read::chain trick

This may need further discussion, maybe in a separate thread. Is there a particular reason it needs to change? It may be hacky, but it's also fairly common in serialization to leverage certain characteristics of the data to save bytes. If there are no practical (or potential) issues with it, then it feels like an if-it-ain't-broke-don't-fix-it situation, more so if the serialized data is being persisted.

eagr requested a review from a team as a code owner October 25, 2024 09:47

eagr requested a review from Longarithm October 25, 2024 09:47

eagr marked this pull request as draft October 25, 2024 09:50

eagr force-pushed the deprec-shard-v0 branch from 44c2b52 to 264738b October 25, 2024 09:59

eagr force-pushed the deprec-shard-v0 branch from 0d367db to e8a78cc October 27, 2024 04:58

v0_single_shard() -> single_shard()

d709cf9

eagr force-pushed the deprec-shard-v0 branch 3 times, most recently from 02b02f4 to 7f64b44 October 28, 2024 04:50

try removing v0()

7ba337a

eagr force-pushed the deprec-shard-v0 branch from 7f64b44 to 7ba337a October 28, 2024 04:58

create_localnet_configs_from_seeds()

4133145

eagr marked this pull request as ready for review October 28, 2024 06:18

init_configs()

4067758

eagr force-pushed the deprec-shard-v0 branch from e93bd9a to 4067758 October 28, 2024 06:24

eagr commented Oct 28, 2024

core/primitives/src/shard_layout.rs (resolved)

eagr commented Oct 28, 2024

nearcore/src/config.rs (outdated, resolved)

eagr commented Oct 28, 2024

wacban reviewed Oct 28, 2024

eagr added 3 commits October 29, 2024 04:31

todo for failing tests

5b7d17e

bring back V0 for database tools

af3296a

update genesis_config.json

2a2710d

wacban reviewed Oct 29, 2024


eagr added 2 commits October 31, 2024 02:08

update genesis config

0bf8d43

Merge branch 'master' into deprec-shard-v0

1c502af

wacban approved these changes Oct 31, 2024

eagr commented Oct 31, 2024

eagr added 3 commits November 1, 2024 16:17

refactor

114cf77

deprecate v0()

0bd8bfa

suppress v0 warnings

e964b75

eagr force-pushed the deprec-shard-v0 branch from e51bc13 to e964b75 November 1, 2024 14:47

wacban added 2 commits November 1, 2024 16:04

undo kv runtime

79b5228

fix state_parts_dump_check.py

21b937d

wacban enabled auto-merge November 1, 2024 16:27

allow deprecated

0eb0078

Merge branch 'master' into deprec-shard-v0

6ba3803

wacban added this pull request to the merge queue Nov 2, 2024

Merged via the queue into near:master with commit 864270f Nov 2, 2024
51 of 52 checks passed

eagr mentioned this pull request Nov 15, 2024

Nuke cows in Receipt #12470

Draft
