
Commit becf18b

testgroundbot authored and gitbook-bot committed
GITBOOK-255: change request with no subject merged in GitBook
1 parent fb36e84 commit becf18b

9 files changed, +141 -1 lines changed

Diff for: README.md

+1 -1
@@ -8,4 +8,4 @@ Boost exposes libp2p interfaces for making storage and retrieval deals, a web in

 ![Web UI - Storage Space screen](<.gitbook/assets/Boost - storage space.png>)

-![Web UI - Sealing Pipeline screen](<.gitbook/assets/Boost - sealing pipeline (1).png>)
+![Web UI - Sealing Pipeline screen](<.gitbook/assets/Boost - sealing pipeline.png>)

Diff for: SUMMARY.md

+3
@@ -32,6 +32,9 @@
 * [Troubleshooting](troubleshooting.md)
 * [Experimental Features](experimental-features/README.md)
 * [FVM Contract Deals](experimental-features/fvm-contract-deals.md)
+* [Local Index Directory](experimental-features/local-index-directory/README.md)
+* [Architecture](experimental-features/local-index-directory/architecture.md)
+* [Initialisation](experimental-features/local-index-directory/initialisation.md)
 * [GraphQL API](graphql-api.md)
 * [JSON-RPC API](json-rpc-api.md)
 * [FAQ](faq.md)

Diff for: experimental-features/local-index-directory/README.md

+75
@@ -0,0 +1,75 @@
---
description: >-
  This page describes the Local Index Directory component in Boost, what it is
  used for, how it works and how to start using it
---

# Local Index Directory

## Background

The Local Index Directory (_LID_) manages and stores indices of deal data so that it can be retrieved by a content identifier (_cid_).

Currently this task is performed by the _DAG store_ component. The DAG store keeps its indexes on disk on a single machine. LID replaces the DAG store and introduces a horizontally scalable backend database for storing the data: YugabyteDB.

LID is designed to provide a more intuitive experience for the user by surfacing problems and providing various repair tools.

To summarize, LID is the component that keeps fine-grained metadata about all the deals on Filecoin that a given Storage Provider stores. Without it, clients would only be able to retrieve full pieces, which are generally between 8GiB and 32GiB in size.

## Storing data on Filecoin

When a client uploads deal data to Boost, LID records the sector that the deal data is stored in and scans the deal data to create an index of all its blocks, keyed by block cid. This way clients can later retrieve subsets of the original deal data without retrieving the full deal data.

<figure><img src="../../.gitbook/assets/Screenshot 2023-05-18 at 13.39.53.png" alt=""><figcaption><p>How Boost stores deal data from clients</p></figcaption></figure>

## Retrieving data

When a client makes a request for data by cid, LID:\
\- checks which piece the cid is in, and where in the piece the data is\
\- checks which sector the piece is in, and where in the sector the piece is\
\- reads the data from the sector
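
The three lookup steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Boost's implementation: the dictionaries, the `read_block` helper and the sample identifiers are all hypothetical, and the unsealed sector is modelled as an in-memory byte string.

```python
# Hypothetical in-memory stand-ins for LID's lookups; not Boost's real API.

# block cid -> (piece cid, offset of the block in the piece, block size)
BLOCK_INDEX = {
    "cid-a": ("piece-1", 0, 5),
    "cid-b": ("piece-1", 5, 6),
}

# piece cid -> (sector ID, offset of the piece in the sector)
PIECE_INDEX = {"piece-1": (42, 4)}

# sector ID -> bytes of the unsealed sector
SECTORS = {42: b"....hello world!...."}


def read_block(cid: str) -> bytes:
    # Step 1: which piece holds the cid, and where in the piece is the data?
    piece_cid, block_offset, size = BLOCK_INDEX[cid]
    # Step 2: which sector holds the piece, and where in the sector is the piece?
    sector_id, piece_offset = PIECE_INDEX[piece_cid]
    # Step 3: read the data from the sector.
    start = piece_offset + block_offset
    return SECTORS[sector_id][start:start + size]
```

The block's position in the sector is the sum of the two offsets, which is why both lookups are needed before any data is read.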

<figure><img src="../../.gitbook/assets/Screenshot 2023-05-18 at 13.45.14.png" alt=""><figcaption><p>How clients retrieve their data from Boost</p></figcaption></figure>

## Use cases

The retrieval use cases that the Local Index Directory supports are:

#### Graphsync retrieval

_Request one root cid with a selector, receive many blocks_

LID is able to:\
\- look up which piece contains the root cid\
\- look up which sector contains the piece\
\- for each block, get the offset into the piece for the block

#### Bitswap retrieval

_Request one block at a time_

LID is able to:\
\- look up which piece contains the block\
\- get the size of the block (Bitswap asks for the size before getting the block data)\
\- look up which sector contains the piece\
\- get the offset into the piece for the block

#### HTTP retrieval

_Request a whole piece_

LID is able to look up which sector contains the piece.

_Request an individual block_

LID is able to:\
\- look up which piece contains the block\
\- look up which sector contains the piece\
\- get the offset into the piece for the block

_Request a file by root cid_

LID is able to:\
\- look up which piece contains the block\
\- look up which sector contains the piece\
\- for each block, get the offset into the piece for the block

Diff for: experimental-features/local-index-directory/architecture.md

+35
@@ -0,0 +1,35 @@
---
description: Local Index Directory architecture and index types
---

# Architecture

When designing the Local Index Directory we considered the needs of various Storage Providers (SPs) and the operational overhead LID would have on their systems. We built a solution for:\
\- small SPs, holding up to 1PiB, and\
\- mid- and large-size SPs, holding anywhere from 1PiB up to 100PiB of data

Depending on the underlying block size and data format, the size of the index can vary. Typically block sizes are between 16KiB and 1MiB.
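
To make the effect of block size concrete, the sketch below counts how many index entries a 32GiB piece needs at the two block sizes mentioned above. The per-entry size (`ASSUMED_ENTRY_BYTES`) is an assumption chosen purely for illustration, not a figure from Boost, and the calculation ignores any framing overhead in the deal data.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Assumed size of one index entry (multihash + offset + size); illustrative only.
ASSUMED_ENTRY_BYTES = 50


def block_count(piece_size: int, block_size: int) -> int:
    """Number of blocks needed to cover the piece, rounded up."""
    return -(-piece_size // block_size)


def estimated_index_bytes(piece_size: int, block_size: int,
                          entry_bytes: int = ASSUMED_ENTRY_BYTES) -> int:
    """Rough index size: one entry per block."""
    return block_count(piece_size, block_size) * entry_bytes


small_blocks = block_count(32 * GIB, 16 * KIB)  # 2_097_152 entries
large_blocks = block_count(32 * GIB, 1 * MIB)   # 32_768 entries
```

Under these assumptions the same piece needs 64 times more index entries with 16KiB blocks than with 1MiB blocks.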

At the moment there are two implementations of LID:\
\- a simple LevelDB implementation, for small SPs who want to keep all information in a single-process database\
\- a scalable YugabyteDB implementation, for medium and large size SPs with tens of thousands of deals

## Index types

In order to support the described retrieval use cases, LID maintains the following indexes:

#### multihash → \[]piece cid

To look up which pieces contain a block

#### piece cid → sector information {sector ID, offset, size}

To look up which sector a piece is in

#### piece cid → map\<multihash → block offset / size>

To look up where in the piece a block is and the block’s size
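
The shape of the three indexes can be modelled with plain dictionaries. The sketch below is a hypothetical in-memory model for illustration only: the class names, field names and sample values are assumptions, and it does not reflect the LevelDB or YugabyteDB schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SectorInfo:
    sector_id: int
    offset: int  # where the piece starts in the sector
    size: int    # piece size in bytes


@dataclass(frozen=True)
class BlockLocation:
    offset: int  # where the block starts in the piece
    size: int    # block size in bytes


@dataclass
class LocalIndex:
    """Hypothetical model of the three indexes; not Boost's schema."""
    pieces_by_multihash: Dict[str, List[str]] = field(default_factory=dict)
    sector_by_piece: Dict[str, SectorInfo] = field(default_factory=dict)
    blocks_by_piece: Dict[str, Dict[str, BlockLocation]] = field(default_factory=dict)

    def add_piece(self, piece_cid: str, sector: SectorInfo,
                  blocks: Dict[str, BlockLocation]) -> None:
        self.sector_by_piece[piece_cid] = sector
        self.blocks_by_piece[piece_cid] = dict(blocks)
        for multihash in blocks:
            self.pieces_by_multihash.setdefault(multihash, []).append(piece_cid)

    def locate(self, multihash: str) -> List[Tuple[str, SectorInfo, BlockLocation]]:
        """Every (piece cid, sector info, block location) that holds the block."""
        return [
            (piece, self.sector_by_piece[piece], self.blocks_by_piece[piece][multihash])
            for piece in self.pieces_by_multihash.get(multihash, [])
        ]


index = LocalIndex()
index.add_piece("piece-1", SectorInfo(7, 0, 1024),
                {"mh-a": BlockLocation(0, 100), "mh-b": BlockLocation(100, 50)})
index.add_piece("piece-2", SectorInfo(8, 2048, 1024),
                {"mh-a": BlockLocation(10, 100)})
```

The first index maps to a list because the same block can appear in more than one piece; here `mh-a` is found in both `piece-1` and `piece-2`.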

<figure><img src="../../.gitbook/assets/Screenshot 2023-05-18 at 14.01.13.png" alt=""><figcaption></figcaption></figure>

<figure><img src="../../.gitbook/assets/Screenshot 2023-05-18 at 14.01.23.png" alt=""><figcaption></figcaption></figure>

Diff for: experimental-features/local-index-directory/initialisation.md

+27
@@ -0,0 +1,27 @@
---
description: >-
  This page explains how to initialise LID and start using it to provide
  retrievals to clients
---

# Initialisation

Because the Local Index Directory is a new feature, Storage Providers should initialise it after upgrading their Boost deployments.

There are two ways a Storage Provider can do that:

1. **Migrate existing indices from the DAG store into LID**: this solution assumes that the Storage Provider has been keeping an _unsealed_ copy of every sector they prove on-chain, and has already indexed all their deal data into the DAG store.\
\
Typically, index sizes for a given sector range from 100KiB up to 1GiB, depending on the deal data and its block sizes. The DAG store keeps these indices in the repository directory of Boost under the `./dagstore/index` and `./dagstore/datastore` directories. This data should be migrated to LID with the `migrate-lid` utility.

2. **Recreate indices for deal data based on unsealed copies of sectors**: this solution assumes that the Storage Provider has _unsealed copies_ of every sector they prove on-chain. If this is not the case, the SP should first trigger an _unseal (UNS)_ job on their system for every sector that contains user data, to produce an unsealed copy.\
\
SPs can use the `boostd recover lid` utility to produce an index for all deal data within an unsealed sector and store it in LID, so that the data can be retrieved. Depending on the SP's deployment, where unsealed copies are hosted (NFS, Ceph, external disks, etc.) and the performance of the hosting system, producing an index for a 32GiB sector can take anywhere from a few seconds up to a few minutes, as the unsealed copy needs to be processed by the utility.

## Migrate existing indices from the DAG store into LID

TODO

## Recreate indices for deal data based on unsealed copies of sectors

TODO
