feat: sector indexing and data retrieval #688

Merged Feb 17, 2025 (40 commits)
Commits
1434303
feat: enable reader to skip blocks
cernicc Jan 28, 2025
b5542b1
remove comment
cernicc Jan 28, 2025
13ffd46
small change
cernicc Jan 28, 2025
9c8ec50
build fixes
cernicc Jan 28, 2025
0dc3265
add aditional tests
cernicc Jan 28, 2025
5449278
integrate retrieval server
cernicc Jan 28, 2025
cc5a9bc
fixes after rebase
cernicc Jan 28, 2025
56f5c7e
storage indexer
cernicc Feb 3, 2025
b1e4670
small fixes
cernicc Feb 3, 2025
034e0ff
uncomment
cernicc Feb 3, 2025
3c5e0ae
Merge commit '10295d74e411781c289713e7fca0ead1c6a98ce2' into feat/635…
cernicc Feb 3, 2025
d306919
small fixes
cernicc Feb 3, 2025
a0d220e
tests fix
cernicc Feb 3, 2025
5320371
remove unsealed sector
cernicc Feb 4, 2025
3e73a0c
small change
cernicc Feb 4, 2025
6b8dc39
Merge commit '4a160b67fb8b1094b0f730abe558bb2c448e130a' into feat/635…
cernicc Feb 4, 2025
26d6c61
small changes
cernicc Feb 4, 2025
45b5776
comment fix
cernicc Feb 4, 2025
e42d9e2
some changes
cernicc Feb 4, 2025
fce5781
change indexing
cernicc Feb 4, 2025
8b57f41
revert not needed changes
cernicc Feb 4, 2025
eb0e9c6
fix test
cernicc Feb 4, 2025
2ae00d8
Merge remote-tracking branch 'origin/develop' into feat/635/provider-…
cernicc Feb 5, 2025
4cae24b
test indexer
cernicc Feb 5, 2025
5b0d436
some changes
cernicc Feb 5, 2025
11a95a1
Merge remote-tracking branch 'origin/develop' into feat/635/provider-…
cernicc Feb 10, 2025
002eb4a
index on prove commit completed
cernicc Feb 10, 2025
bf75938
Merge remote-tracking branch 'origin/develop' into feat/635/provider-…
cernicc Feb 10, 2025
d502f04
after merge fix
cernicc Feb 10, 2025
dcd7ff4
Merge remote-tracking branch 'origin/develop' into feat/635/provider-…
cernicc Feb 14, 2025
364b213
merge fixes
cernicc Feb 14, 2025
817969a
Merge commit '1a3719974373cc5f8bb2f7958e7c27461f3c60aa' into feat/635…
cernicc Feb 17, 2025
abde832
remove mater utils
cernicc Feb 17, 2025
83fc15b
more review suggestions
cernicc Feb 17, 2025
835eb52
convert to symlink
cernicc Feb 17, 2025
fb5e86a
rename generic type
cernicc Feb 17, 2025
915fa14
change comment
cernicc Feb 17, 2025
78446ad
docs
cernicc Feb 17, 2025
4a53246
small change
cernicc Feb 17, 2025
751acbd
remove unused import
cernicc Feb 17, 2025
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

9 changes: 6 additions & 3 deletions mater/lib/src/cid.rs
@@ -3,8 +3,10 @@ use tokio::io::{AsyncRead, AsyncReadExt};

use crate::{async_varint::read_varint, IDENTITY_CODE};

/// Extension trait for [`Cid`](ipld_core::cid::Cid)
pub trait CidExt {
async fn read_bytes_async<R>(r: R) -> Result<(Self, usize), Error>
/// Reads the bytes from a byte stream.
fn read_bytes_async<R>(r: R) -> impl std::future::Future<Output = Result<(Self, usize), Error>>
where
Self: Sized,
R: AsyncRead + Unpin;
@@ -13,8 +15,10 @@ pub trait CidExt {
fn get_identity_data(&self) -> Option<&[u8]>;
}

/// Extension trait for [`Multihash`](ipld_core::cid::multihash::Multihash)
pub trait MultihashExt {
async fn read_async<R>(r: R) -> Result<(Self, usize), Error>
/// Reads the bytes from a byte stream.
fn read_async<R>(r: R) -> impl std::future::Future<Output = Result<(Self, usize), Error>>
where
Self: Sized,
R: AsyncRead + Unpin;
@@ -62,7 +66,6 @@ impl<const S: usize> MultihashExt for Multihash<S> {
/// https://github.com/multiformats/rust-multihash/blob/90a6c19ec71ced09469eec164a3586aafeddfbbd/src/multihash.rs#L271
async fn read_async<R>(mut r: R) -> Result<(Self, usize), Error>
where
Self: Sized,
R: AsyncRead + Unpin,
{
let (code, code_bytes_read): (u64, usize) = read_varint(&mut r).await?;
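The trait changes above swap `async fn` declarations for plain `fn` declarations that return `impl Future`, while the implementations stay `async fn`. The PR does not state the motivation; one plausible reason is the `async_fn_in_trait` lint, which warns on `async fn` in public traits. A minimal std-only sketch of the same shape follows. `ReadExt`, `read_len`, `NoopWaker` and `block_on` are hypothetical names for illustration and are not part of `mater`; the busy-polling executor is only suitable for futures that are ready immediately.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the extension traits in the diff.
pub trait ReadExt {
    /// Declared as a plain `fn` returning `impl Future`, like the new
    /// `read_bytes_async` / `read_async` declarations.
    fn read_len(bytes: &[u8]) -> impl Future<Output = Result<(Self, usize), String>>
    where
        Self: Sized;
}

impl ReadExt for u8 {
    // An `async fn` is accepted as the implementation of the method above.
    async fn read_len(bytes: &[u8]) -> Result<(Self, usize), String> {
        bytes
            .first()
            .map(|b| (*b, 1))
            .ok_or_else(|| "empty input".to_string())
    }
}

/// Waker that does nothing; enough for futures that never return `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor (assumption: used here only to avoid pulling in tokio).
/// It re-polls in a loop, so it would spin on a future that waits for I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With these definitions, `block_on(<u8 as ReadExt>::read_len(&[7, 8]))` evaluates to `Ok((7, 1))`, and an empty slice yields an `Err`.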
3 changes: 2 additions & 1 deletion mater/lib/src/lib.rs
@@ -21,11 +21,12 @@ mod v1;
mod v2;

// We need to re-expose this because `read_block` returns `(Cid, Vec<u8>)`.
pub use cid::{CidExt, MultihashExt};
pub use file_reader::CarExtractor;
pub use ipld_core::cid::Cid;
pub use multicodec::{DAG_PB_CODE, IDENTITY_CODE, RAW_CODE};
pub use stores::{create_filestore, Blockstore, Config, FileBlockstore};
pub use v1::{Header as CarV1Header, Reader as CarV1Reader, Writer as CarV1Writer};
pub use v1::{BlockMetadata, Header as CarV1Header, Reader as CarV1Reader, Writer as CarV1Writer};
pub use v2::{
verify_cid, Characteristics, Header as CarV2Header, Index, IndexEntry, IndexSorted,
MultihashIndexSorted, Reader as CarV2Reader, SingleWidthIndex, Writer as CarV2Writer,
49 changes: 36 additions & 13 deletions storage-provider/common/src/sector.rs
@@ -70,6 +70,11 @@ pub struct UnsealedSector {
/// Indexes match with corresponding deals in [`Sector::deals`].
pub piece_infos: Vec<PieceInfo>,

/// Tracks locations of the actual pieces added to the unsealed sector. This
/// vector does not contain padding pieces. It contains the actual pieces
/// corresponding with the deals from the users.
pub pieces_locations: Vec<(Commitment<CommP>, PathBuf)>,

/// Tracks all of the deals that have been added to the sector.
pub deals: Vec<(DealId, DealProposal)>,

@@ -97,6 +102,7 @@ impl UnsealedSector {
occupied_sector_space: 0,
piece_infos: vec![],
deals: vec![],
pieces_locations: vec![],
unsealed_path,
})
}
@@ -111,24 +117,29 @@ impl UnsealedSector {
) -> Result<(), SectorError> {
self.deals.push((deal_id, deal));

// would love to use something like scoped spawn blocking
let pieces = self.piece_infos.clone();
let unsealed_path = self.unsealed_path.clone();
let handle: JoinHandle<Result<(PieceInfo, u64), SectorError>> =
tokio::task::spawn_blocking(move || {
let unsealed_sector = std::fs::File::options().append(true).open(unsealed_path)?;

tracing::info!("Preparing piece...");
let (padded_reader, piece_info) = prepare_piece(piece_path, commitment)?;
tracing::info!("Adding piece...");
let occupied_piece_space =
add_piece(padded_reader, piece_info, &pieces, unsealed_sector)?;

Ok((piece_info, occupied_piece_space))
tokio::task::spawn_blocking({
let pieces = self.piece_infos.clone();
let unsealed_path = self.unsealed_path.clone();
let piece_path = piece_path.clone();

move || {
let unsealed_sector =
std::fs::File::options().append(true).open(unsealed_path)?;

tracing::info!("Preparing piece...");
let (padded_reader, piece_info) = prepare_piece(piece_path, commitment)?;
tracing::info!("Adding piece...");
let occupied_piece_space =
add_piece(padded_reader, piece_info, &pieces, unsealed_sector)?;

Ok((piece_info, occupied_piece_space))
}
});

let (piece_info, occupied_piece_space) = handle.await??;
self.piece_infos.push(piece_info);
self.pieces_locations.push((commitment, piece_path));
self.occupied_sector_space += occupied_piece_space;

Ok(())
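The hunk above moves the clones into a block expression that evaluates to the `move` closure, which keeps the original `piece_path` available for `pieces_locations.push(...)` after the task finishes. A minimal sketch of that pattern, assuming `std::thread::spawn` as a stand-in for `tokio::task::spawn_blocking` so the example needs no external crates. `Sector`, its fields, and the size arithmetic are hypothetical simplifications, not the real `UnsealedSector`.

```rust
use std::path::PathBuf;
use std::thread;

/// Hypothetical, trimmed-down stand-in for `UnsealedSector`.
pub struct Sector {
    pub piece_sizes: Vec<u64>,
    pub pieces_locations: Vec<(String, PathBuf)>,
    pub occupied: u64,
}

impl Sector {
    pub fn add_piece(
        &mut self,
        commitment: String,
        piece_path: PathBuf,
        size: u64,
    ) -> Result<(), String> {
        let handle = thread::spawn({
            // These clones shadow the outer names only inside this block,
            // so `self` and the original `piece_path` stay usable below.
            let pieces = self.piece_sizes.clone();
            let piece_path = piece_path.clone();

            move || -> Result<u64, String> {
                if piece_path.as_os_str().is_empty() {
                    return Err("empty piece path".to_string());
                }
                // Placeholder for the real work (prepare_piece / add_piece).
                Ok(pieces.iter().sum::<u64>() + size)
            }
        });

        // First `?` handles a panicked worker, second `?` the worker's own error.
        let occupied = handle
            .join()
            .map_err(|_| "worker panicked".to_string())??;

        self.piece_sizes.push(size);
        // The un-cloned `piece_path` is moved here, after the task is done.
        self.pieces_locations.push((commitment, piece_path));
        self.occupied = occupied;
        Ok(())
    }
}
```

Because state is only updated after the worker returns successfully, a failed call leaves `piece_sizes`, `pieces_locations` and `occupied` unchanged in this sketch.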
@@ -289,6 +300,11 @@ pub struct PreCommittedSector {
/// Indexes match with corresponding deals in [`Sector::deals`].
pub piece_infos: Vec<PieceInfo>,

/// Tracks locations of the actual pieces added to the unsealed sector. This
/// vector does not contain padding pieces. It contains the actual pieces
/// corresponding with the deals from the users.
pub pieces_locations: Vec<(Commitment<CommP>, PathBuf)>,

/// Tracks all of the deals that have been added to the sector.
pub deals: Vec<(DealId, DealProposal)>,

@@ -349,6 +365,7 @@ impl PreCommittedSector {
seal_proof: unsealed.seal_proof,
sector_number: unsealed.sector_number,
piece_infos: unsealed.piece_infos,
pieces_locations: unsealed.pieces_locations,
deals: unsealed.deals,
cache_path,
sealed_path,
@@ -503,6 +520,11 @@ pub struct ProvenSector {
/// Indexes match with corresponding deals in [`Sector::deals`].
pub piece_infos: Vec<PieceInfo>,

/// Tracks locations of the actual pieces added to the unsealed sector. This
/// vector does not contain padding pieces. It contains the actual pieces
/// corresponding with the deals from the users.
pub pieces_locations: Vec<(Commitment<CommP>, PathBuf)>,

/// Tracks all of the deals that have been added to the sector.
pub deals: Vec<(DealId, DealProposal)>,

@@ -526,6 +548,7 @@ impl ProvenSector {
Self {
sector_number: sector.sector_number,
piece_infos: sector.piece_infos,
pieces_locations: sector.pieces_locations,
deals: sector.deals,
cache_path: sector.cache_path,
sealed_path: sector.sealed_path,
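Across the three sector structs, `pieces_locations` is moved from one state to the next rather than recomputed, so the piece paths recorded at `add_piece` time are still available once the sector is proven. A minimal sketch of that hand-off, with the structs reduced to the relevant fields. The type alias `CommP`, the struct names, and the constructor names are hypothetical stand-ins for illustration only.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for `Commitment<CommP>`.
pub type CommP = [u8; 32];

pub struct Unsealed {
    pub sector_number: u64,
    pub pieces_locations: Vec<(CommP, PathBuf)>,
}

pub struct PreCommitted {
    pub sector_number: u64,
    pub pieces_locations: Vec<(CommP, PathBuf)>,
    pub sealed_path: PathBuf,
}

pub struct Proven {
    pub sector_number: u64,
    pub pieces_locations: Vec<(CommP, PathBuf)>,
    pub sealed_path: PathBuf,
}

impl PreCommitted {
    /// Consumes the unsealed state; the locations vector is moved, not cloned.
    pub fn from_unsealed(unsealed: Unsealed, sealed_path: PathBuf) -> Self {
        Self {
            sector_number: unsealed.sector_number,
            pieces_locations: unsealed.pieces_locations,
            sealed_path,
        }
    }
}

impl Proven {
    /// Consumes the pre-committed state and carries every field forward.
    pub fn from_precommitted(sector: PreCommitted) -> Self {
        Self {
            sector_number: sector.sector_number,
            pieces_locations: sector.pieces_locations,
            sealed_path: sector.sealed_path,
        }
    }
}
```

Taking the previous state by value means a caller cannot keep using a stale `Unsealed` after it has been converted.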
2 changes: 2 additions & 0 deletions storage-provider/server/Cargo.toml
@@ -18,12 +18,14 @@ opencl = ["polka-storage-proofs/opencl", "polka-storage-proofs/std", "polka-stor
mater = { workspace = true }
polka-storage-proofs = { workspace = true, default-features = false }
polka-storage-provider-common = { workspace = true, features = ["clap"] }
polka-storage-retrieval = { workspace = true }
primitives = { workspace = true, features = ["clap", "serde", "std"] }
storagext = { workspace = true, features = ["clap"] }

async-trait = { workspace = true }
axum = { workspace = true, features = ["macros", "multipart"] }
base64 = { workspace = true }
blockstore = { workspace = true }
chrono = { workspace = true, features = ["serde"] }
ciborium = { workspace = true }
cid = { workspace = true, features = ["serde", "std"] }
11 changes: 11 additions & 0 deletions storage-provider/server/src/config.rs
@@ -44,6 +44,12 @@ fn default_node_address() -> Url {
Url::parse(DEFAULT_NODE_ADDRESS).expect("DEFAULT_NODE_ADDRESS must be a valid Url")
}

fn default_retrieval_address() -> Multiaddr {
"/ip4/127.0.0.1/tcp/8002"
.parse()
.expect("multiaddress is correct")
}

#[derive(Debug, Clone, Deserialize, Args)]
#[group(multiple = true, conflicts_with = "config")]
#[serde(deny_unknown_fields)]
@@ -63,6 +69,11 @@ pub struct ConfigurationArgs {
#[arg(long, default_value_t = default_node_address())]
pub(crate) node_url: Url,

/// Storage provider retrieval service listen address.
#[serde(default = "default_retrieval_address")]
#[arg(long, default_value_t = default_retrieval_address())]
pub(crate) retrieval_listen_address: Multiaddr,

/// RocksDB storage directory.
/// Defaults to a temporary random directory, like `/tmp/<random>/deals_database`.
#[arg(long)]
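The new default listen address is a hard-coded multiaddr string, `/ip4/127.0.0.1/tcp/8002`, parsed once with an `expect`. To show what that string encodes, here is a std-only sketch that handles just the `/ip4/<addr>/tcp/<port>` shape. `parse_ip4_tcp` is a hypothetical helper written for this example; it is not the `Multiaddr` parser the PR uses, and real multiaddrs support many more protocols.

```rust
use std::net::{Ipv4Addr, SocketAddrV4};

/// Hypothetical helper: accepts only `/ip4/<addr>/tcp/<port>`.
pub fn parse_ip4_tcp(addr: &str) -> Result<SocketAddrV4, String> {
    // A leading '/' produces an empty first component after splitting.
    let parts: Vec<&str> = addr.split('/').collect();
    match parts.as_slice() {
        ["", "ip4", ip, "tcp", port] => {
            let ip = ip
                .parse::<Ipv4Addr>()
                .map_err(|e| format!("invalid IPv4 address: {e}"))?;
            let port = port
                .parse::<u16>()
                .map_err(|e| format!("invalid TCP port: {e}"))?;
            Ok(SocketAddrV4::new(ip, port))
        }
        _ => Err(format!("unsupported multiaddr: {addr}")),
    }
}

/// Mirrors the shape of `default_retrieval_address` in the diff: the literal
/// is known to be valid, so a parse failure is treated as a programming error.
pub fn default_retrieval_address() -> SocketAddrV4 {
    parse_ip4_tcp("/ip4/127.0.0.1/tcp/8002").expect("hard-coded default must parse")
}
```

The default therefore resolves to localhost on TCP port 8002; inputs with another transport (for example `/udp/`) or an out-of-range port are rejected by this sketch.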
@@ -8,11 +8,11 @@ use cid::{
multihash::{self, Multihash},
Cid,
};
use primitives::{sector::SectorNumber, DealId};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

pub mod rdb;
pub mod rdb_ext;

/// Convert a [`Multihash`] into a key (converts [`Multihash::digest`] to base-64).
///
@@ -182,16 +182,6 @@ impl Default for PieceInfo {
}
}

/// Identifier for a retrieval deal (unique to a client)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize)]
pub struct DealId(u64);

impl From<u64> for DealId {
fn from(value: u64) -> Self {
Self(value)
}
}

// TODO(@jmg-duarte,14/06/2024): validate miner address

/// The storage provider address.
@@ -217,19 +207,6 @@ impl From<String> for StorageProviderAddress {
}
}

/// Numeric identifier for a sector. It is usually relative to a storage provider.
///
/// For more information on sectors, see:
/// <https://spec.filecoin.io/#section-systems.filecoin_mining.sector>
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize)]
pub struct SectorNumber(u64);

impl From<u64> for SectorNumber {
fn from(value: u64) -> Self {
Self(value)
}
}

/// Information about a single *storage* deal for a given piece.
///
/// Source: