
confused with torchrec/distributed/sharding/dynamic_sharding.py #3395

@haolujun

Description

Example 1, NOTE the ordering by rank:
Rank 0 sends table_0, shard_0 to Rank 1.
Rank 2 sends table_1, shard_0 to Rank 1.
Rank 2 sends table_1, shard_1 to Rank 0
Rank 3 sends table_0, shard_1 to Rank 0

NOTE: table_1 comes first due to its source rank being 'first'
On Rank 0: output_tensor = [
    <table_1, shard_0>, # from rank 2
    <table_0, shard_1>  # from rank 3
]

On Rank 1: output_tensor = [
    <table_0, shard_0>, # from rank 0
    <table_1, shard_0>  # from rank 2
]

Maybe the following is right?

On Rank 0: output_tensor = [
    <table_1, shard_1>, # from rank 2
    <table_0, shard_1>  # from rank 3
]

If this is right, there are further errors in this commit to torchrec/distributed/sharding/dynamic_sharding.py.
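
To make the expected ordering concrete, here is a minimal standalone sketch (not torchrec code; only the send plan is taken verbatim from Example 1, everything else is illustrative): grouping the received shards per destination rank and sorting them by source rank yields <table_1, shard_1> on Rank 0, which matches the correction above rather than the docstring's output_tensor.

```python
# Minimal sketch, assuming the ordering rule is "received shards are ordered
# by source rank" as the docstring's NOTE states. Not torchrec code.
from collections import defaultdict

# (src_rank, dst_rank, table, shard) copied verbatim from Example 1.
sends = [
    (0, 1, "table_0", "shard_0"),
    (2, 1, "table_1", "shard_0"),
    (2, 0, "table_1", "shard_1"),
    (3, 0, "table_0", "shard_1"),
]

# Group the sends by destination rank.
received = defaultdict(list)
for src, dst, table, shard in sends:
    received[dst].append((src, table, shard))

# On each destination, order the received shards by source rank and print them.
for dst in sorted(received):
    ordered = sorted(received[dst], key=lambda item: item[0])
    print(f"Rank {dst} output_tensor order:")
    for src, table, shard in ordered:
        print(f"    <{table}, {shard}>  # from rank {src}")
```

Running this prints <table_1, shard_1> (from rank 2) followed by <table_0, shard_1> (from rank 3) for Rank 0, i.e. shard_1 of table_1, not shard_0 as the quoted docstring shows.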
