-
Notifications
You must be signed in to change notification settings - Fork 686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try nuking ShardLayout::V0 #12313
Try nuking ShardLayout::V0 #12313
Conversation
can you guys do something like |
It fails for me actually, that's not great. I typically run it on the whole workspace and just filter to the tests that I want. Also we use
|
If you feel like fixing it, go for it. It looks like it's only a matter of adding some dependencies to the cargo file. |
JFYI this PR is marked as draft, please make it as ready for review when it is. |
It seems like this is expected behavior. If it's not bothering anyone else, not sure if it needs fixing. And it could be easily mitigated by adding an |
02b02f4
to
7f64b44
Compare
core/primitives/src/shard_layout.rs
Outdated
} | ||
|
||
/// Construct a layout with given number of shards | ||
pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got a better idea for the fn name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe multi_shard
, just to mach the single_shard
one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first thought but it could also be used to create a single-shard layout, so I changed my mind. But if you like that name I'm also down with it. :)
nearcore/src/config.rs
Outdated
@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds( | |||
.map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed)) | |||
.collect::<Vec<_>>(); | |||
|
|||
let shard_layout = ShardLayout::v0(num_shards, 0); | |||
let shard_layout = ShardLayout::of_num_shards(num_shards, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would cause some sanity check to fail as you could see from the CI logs. It seems like some json parsing issue. Not sure whether if you'd like to keep it as it was or to update the config somewhere else to make it work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try updating the config and if it doesn't work leave as is.
let error_message = format!("{}", error).to_lowercase(); | ||
tracing::info!(target: "test", "error message: {}", error_message); | ||
assert!(error_message.contains("shard")); | ||
let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a panic from get_shard_index()
after switching to V2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's pretty bad. Feel free to either:
- Fix it (may be complicated / lots of code if you need to add error handling)
- Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
- Make the default shard layout V1 (hopefully this works?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try 3 (should probably work from the look of the code) which seems like a nice middle ground before finishing transition to V2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks nice, answered some questions
// FIXME eagr what should be the default? | ||
#[default(ShardLayout::v0(1, 0))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally it should use the single_shard
method that returns the most recent (today it's V2) shard layout.
nit: The convention here seems to be to use the default_
function to provide the default value.
core/primitives/src/shard_layout.rs
Outdated
} | ||
|
||
/// Construct a layout with given number of shards | ||
pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe multi_shard
, just to mach the single_shard
one?
let error_message = format!("{}", error).to_lowercase(); | ||
tracing::info!(target: "test", "error message: {}", error_message); | ||
assert!(error_message.contains("shard")); | ||
let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's pretty bad. Feel free to either:
- Fix it (may be complicated / lots of code if you need to add error handling)
- Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
- Make the default shard layout V1 (hopefully this works?)
nearcore/src/config.rs
Outdated
@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds( | |||
.map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed)) | |||
.collect::<Vec<_>>(); | |||
|
|||
let shard_layout = ShardLayout::v0(num_shards, 0); | |||
let shard_layout = ShardLayout::of_num_shards(num_shards, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try updating the config and if it doesn't work leave as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good,
I think serde doesn't like (de)serializing maps with non-string keys, like the ones in V2 and it breaks the tests. Feel free to fallback to V1 is it's too crazy to fix in this PR.
core/primitives/src/shard_layout.rs
Outdated
#[test] | ||
fn test_shard_layout_v0() { | ||
let num_shards = 4; | ||
let shard_layout = ShardLayout::v0(num_shards, 0); | ||
let mut shard_id_distribution: HashMap<ShardId, _> = | ||
shard_layout.shard_ids().map(|shard_id| (shard_id.into(), 0)).collect(); | ||
let mut rng = StdRng::from_seed([0; 32]); | ||
for _i in 0..1000 { | ||
let s: Vec<u8> = (&mut rng).sample_iter(&Alphanumeric).take(10).collect(); | ||
let s = String::from_utf8(s).unwrap(); | ||
let account_id = s.to_lowercase().parse().unwrap(); | ||
let shard_id = account_id_to_shard_id(&account_id, &shard_layout); | ||
assert!(shard_id < num_shards); | ||
*shard_id_distribution.get_mut(&shard_id).unwrap() += 1; | ||
} | ||
let expected_distribution: HashMap<ShardId, _> = [ | ||
(ShardId::new(0), 247), | ||
(ShardId::new(1), 268), | ||
(ShardId::new(2), 233), | ||
(ShardId::new(3), 252), | ||
] | ||
.into_iter() | ||
.collect(); | ||
assert_eq!(shard_id_distribution, expected_distribution); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep this one, the V0 may still be used when replaying some very old blocks.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12313 +/- ##
==========================================
+ Coverage 71.19% 71.23% +0.03%
==========================================
Files 839 839
Lines 169743 170209 +466
Branches 169743 170209 +466
==========================================
+ Hits 120851 121240 +389
- Misses 43633 43688 +55
- Partials 5259 5281 +22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
The failing test could be fixed by adding some feature flags. But that's from master, does it need to be fixed here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you for this contribution! Just a few final nits. Most are optional, I only really care about restoring the assertion in the resharding test.
shards_split_map: None, | ||
shards_parent_map: None, | ||
version, | ||
}) | ||
} | ||
|
||
/// Return a V0 Shardlayout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mark it as deprecated? I don't know how to do this properly in rust, if it's not straight forward then a comment should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about marking ShardLayout::V0
as deprecated? This way any usage of V0
would raise a deprecation warning including calling v0()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about both? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, probably just on v0()
as there're still many occurrences of V0
in the code which will probably cause clippy to fail on ci as it raises a deprecation warning (need something like #[allow(deprecated)]
to suppress the warning).
// Shard layouts V0 and V1 are rejected. | ||
assert!(ReshardingEventType::from_shard_layout( | ||
&ShardLayout::v0_single_shard(), | ||
block, | ||
prev_block | ||
) | ||
.is_err()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep this?
chain/epoch-manager/src/tests/mod.rs
Outdated
@@ -2294,7 +2294,7 @@ fn test_protocol_version_switch_with_shard_layout_change() { | |||
epoch_manager.get_epoch_info(&epochs[1]).unwrap().protocol_version(), | |||
new_protocol_version - 1 | |||
); | |||
assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::v0_single_shard(),); | |||
assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::single_shard(),); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mini nit: remove the trailing comma
core/primitives/src/shard_layout.rs
Outdated
let id_to_index_map = | ||
layout.id_to_index_map.iter().map(|(k, v)| (k.to_string(), *v)).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is completely fine but I would be tempted to write a generic function that converts a Map<ShardId, T> to Map<String, T> and use it for all the maps in the shard layout. We're looking at whooping potential savings of ~2 lines of code so up to you if you think it's worth it :)
core/primitives/src/shard_layout.rs
Outdated
let id_to_index_map = layout | ||
.id_to_index_map | ||
.into_iter() | ||
.map(|(k, v)| Ok((k.parse::<u64>()?.into(), v))) | ||
.collect::<Result<_, Self::Error>>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about generic function for this but here it may actually make some sense because it's less trivial logic.
impl TryFrom<SerdeShardLayoutV2> for ShardLayoutV2 { | ||
type Error = Box<dyn std::error::Error + Send + Sync>; | ||
|
||
fn try_from(layout: SerdeShardLayoutV2) -> Result<Self, Self::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mini nit: May unpack the layout as first step and then use the unpacked values directly? It may be a bit prettier and it would be more obvious that there isn't unnecessary cloning.
} | ||
} | ||
|
||
impl<'de> serde::Deserialize<'de> for ShardLayoutV2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the convention is to call the lifespan 'a
.
shards_split_map: None, | ||
shards_parent_map: None, | ||
version, | ||
}) | ||
} | ||
|
||
/// Return a V0 Shardlayout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about marking ShardLayout::V0
as deprecated? This way any usage of V0
would raise a deprecation warning including calling v0()
.
} | ||
|
||
/// Can be used to construct a multi-shard layout, mostly for test purposes | ||
pub fn multi_shard(num_shards: NumShards, version: ShardVersion) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about n_shard()
in the sense of creating an N-shard layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan tbh. How about just new
or new_test
?
I tried the pre-merge tests and unfortunately some are failing. Those are the most expensive tests that only run before merging to master. I tried a simple debug but I couldn't fix it easily. I'm afraid we may need to restore kv_runtime to use v0 for now to make it pass. I'm testing this change again on a fork from your PR: ![]() |
@eagr Is it ready for merging or are you still working on anything? |
Actually still needs fixes for the expensive tests, can you cherry pick the commits from here? |
I was able just to push those here directly, should be ready for merging now. |
If it looks alright to you then let's go. Thanks man, for all the guidance. |
Merged finally after some more CI issues. Thank you for your contribution! |
It doesn't feel the job is quite done though. Let me know when it's time to really get rid of it. Meanwhile, any other issues you could point me to? |
I'll be looking into it now as I need to get V2 to a good state for resharding V3.
Does any of the following sound interesting?
|
I guess no need to create issues for these? I could just link to the comment from the PR. |
@eagr Up to you, I'm fine without issues, I would also be happy to write down proper issues with some more context, just let me know. |
This may need further discussion, maybe in a separate thread. Is there a particular reason that it needs to change? It may be hacky but it's also kinda common in serialization to leverage certain characteristics of data to save bytes. If there's no practical (or potential) issues with this then it feels like an it-aint-broke-dont-fix-it situation, more so if the serialized data is being persisted. |
helps #12176