chore: fix typo in docs (apache#4378)
chore: fix typo in docs (apache#4378)
Signed-off-by: ZhangJian He <[email protected]>
hezhangjian authored and Anup Ghatage committed Jul 12, 2024
1 parent d115386 commit 92dfb95
Showing 109 changed files with 197 additions and 197 deletions.
@@ -22,7 +22,7 @@
* that is used to proportionally control what features are enabled for the system.
*
* <p>In other words, it is a way of altering the control in a system without restarting it.
- * It can be used during all stages of developement, its most visible use case is on production.
+ * It can be used during all stages of development, its most visible use case is on production.
* For instance, during a production release, you can enable or disable individual features,
* control the data flow through the system, thereby minimizing risk of system failures
* in real time.
2 changes: 1 addition & 1 deletion docker/README.md
@@ -147,7 +147,7 @@ Bookkeeper configuration is located in `/opt/bookkeeper/conf` in the docker cont

There are 2 ways to set Bookkeeper configuration:

- 1, Apply setted (e.g. docker -e kk=vv) environment variables into configuration files. Environment variable names is in format "BK_originalName", in which "originalName" is the key in config files.
+ 1, Apply set (e.g. docker -e kk=vv) environment variables into configuration files. Environment variable names is in format "BK_originalName", in which "originalName" is the key in config files.

2, If you are able to handle your local volumes, use `docker --volume` command to bind-mount your local configure volumes to `/opt/bookkeeper/conf`.
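The `BK_originalName` convention above amounts to a simple name mapping; a minimal shell sketch of it follows (the key and value are illustrative, and this is not the image's actual entrypoint script):

```shell
# Sketch of method 1: an env var named BK_<originalName> supplies the value
# for <originalName> in the config file. Key and value here are illustrative.
BK_zkServers="zk1:2181"            # as if run with: docker -e BK_zkServers=zk1:2181
key="zkServers"                    # the original key in the config file
conf_line="${key}=${BK_zkServers}" # line that would land in the config file
echo "$conf_line"
```

Method 2 simply bind-mounts a prepared `conf` directory over `/opt/bookkeeper/conf`, bypassing this mapping entirely.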

2 changes: 1 addition & 1 deletion site3/website/docs/admin/geo-replication.md
@@ -17,6 +17,6 @@ Let's say that you want to set up geo-replication across clusters in regions A,
The crucial difference between using cluster-specific ZooKeeper and global ZooKeeper is that bookies is that you need to point all bookies to use the global ZooKeeper setup.

- ## Region-aware placement polocy
+ ## Region-aware placement policy

## Autorecovery
4 changes: 2 additions & 2 deletions site3/website/docs/api/ledger-adv-api.md
@@ -15,7 +15,7 @@ It allows user passing in an `entryId` when adding an entry.

### Creating advanced ledgers

- Here's an exmaple:
+ Here's an example:

```java
byte[] passwd = "some-passwd".getBytes();
@@ -61,7 +61,7 @@ LedgerHandleAdv handle = bkClient.createLedgerAdv(
> If a ledger already exists when users try to create an advanced ledger with same ledger id,
> a [LedgerExistsException]({{ site.javadoc_base_url }}/org/apache/bookkeeper/client/BKException.BKLedgerExistException.html) is thrown by the bookkeeper client.
- Creating advanced ledgers can be done throught a fluent API since 4.6.
+ Creating advanced ledgers can be done through a fluent API since 4.6.

```java
BookKeeper bk = ...;
2 changes: 1 addition & 1 deletion site3/website/docs/api/ledger-api.md
@@ -667,7 +667,7 @@ ReadHandle rh = bk.newOpenLedgerOp()
If you are opening a ledger in "Recovery" mode, it will basically fence and seal the ledger -- no more entries are allowed
to be appended to it. The writer which is currently appending entries to the ledger will fail with [`LedgerFencedException`]({{ site.javadoc_base_url }}/org/apache/bookkeeper/client/api/BKException.Code#LedgerFencedException).
- In constrat, opening a ledger in "NoRecovery" mode, it will not fence and seal the ledger. "NoRecovery" mode is usually used by applications to tailing-read from a ledger.
+ In constraint, opening a ledger in "NoRecovery" mode, it will not fence and seal the ledger. "NoRecovery" mode is usually used by applications to tailing-read from a ledger.
### Read entries from ledgers
2 changes: 1 addition & 1 deletion site3/website/docs/api/overview.md
@@ -15,4 +15,4 @@ The `Ledger API` provides direct access to ledgers and thus enables you to use B

However, in most of use cases, if you want a `log stream`-like abstraction, it requires you to manage things like tracking list of ledgers,
managing rolling ledgers and data retention on your own. In such cases, you are recommended to use [DistributedLog API](distributedlog-api),
- with semantics resembling continous log streams from the standpoint of applications.
+ with semantics resembling continuous log streams from the standpoint of applications.
2 changes: 1 addition & 1 deletion site3/website/docs/development/protocol.md
@@ -21,7 +21,7 @@ A ledger's metadata contains the following:

Parameter | Name | Meaning
:---------|:-----|:-------
- Identifer | | A 64-bit integer, unique within the system
+ Identifier | | A 64-bit integer, unique within the system
Ensemble size | **E** | The number of nodes the ledger is stored on
Write quorum size | **Q<sub>w</sub>** | The number of nodes each entry is written to. In effect, the max replication for the entry.
Ack quorum size | **Q<sub>a</sub>** | The number of nodes an entry must be acknowledged on. In effect, the minimum replication for the entry.
2 changes: 1 addition & 1 deletion site3/website/docs/getting-started/concepts.md
@@ -193,7 +193,7 @@ For example, ledger 0000000001 is split into three parts, 00, 0000, and 00001, a

### Flat ledger manager

- > deprecated since 4.7.0, not recommand now.
+ > deprecated since 4.7.0, not recommend now.
The *flat ledger manager*, implemented in the [`FlatLedgerManager`]({{ site.javadoc_base_url }}/org/apache/bookkeeper/meta/FlatLedgerManager.html) class, stores all ledgers' metadata in child nodes of a single ZooKeeper path. The flat ledger manager creates [sequential nodes](https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Sequence+Nodes+--+Unique+Naming) to ensure the uniqueness of the ledger ID and prefixes all nodes with `L`. Bookie servers manage their own active ledgers in a hash map so that it's easy to find which ledgers have been deleted from ZooKeeper and then garbage collect them.

10 changes: 5 additions & 5 deletions site3/website/docs/reference/config.md
@@ -201,7 +201,7 @@ The table below lists parameters that you can set to configure bookies. All conf

| Parameter | Description | Default
| --------- | ----------- | ------- |
- | diskUsageThreshold | For each ledger dir, maximum disk space which can be used. Default is 0.95f. i.e. 95% of disk can be used at most after which nothing will be written to that partition. If all ledger dir partions are full, then bookie will turn to readonly mode if 'readOnlyModeEnabled=true' is set, else it will shutdown. Valid values should be in between 0 and 1 (exclusive).<br /> | 0.95 |
+ | diskUsageThreshold | For each ledger dir, maximum disk space which can be used. Default is 0.95f. i.e. 95% of disk can be used at most after which nothing will be written to that partition. If all ledger dir partitions are full, then bookie will turn to readonly mode if 'readOnlyModeEnabled=true' is set, else it will shutdown. Valid values should be in between 0 and 1 (exclusive).<br /> | 0.95 |
| diskUsageWarnThreshold | The disk free space low water mark threshold. Disk is considered full when usage threshold is exceeded. Disk returns back to non-full state when usage is below low water mark threshold. This prevents it from going back and forth between these states frequently when concurrent writes and compaction are happening. This also prevent bookie from switching frequently between read-only and read-writes states in the same cases. | 0.95 |
| diskUsageLwmThreshold | Set the disk free space low water mark threshold. Disk is considered full when usage threshold is exceeded. Disk returns back to non-full state when usage is below low water mark threshold. This prevents it from going back and forth between these states frequently when concurrent writes and compaction are happening. This also prevent bookie from switching frequently between read-only and read-writes states in the same cases.<br /> | 0.9 |
| diskCheckInterval | Disk check interval in milliseconds. Interval to check the ledger dirs usage. | 10000 |
@@ -218,8 +218,8 @@ The table below lists parameters that you can set to configure bookies. All conf
| fileInfoCacheInitialCapacity | The minimum total size of the internal file info cache table. Providing a large enough estimate at construction time avoids the need for expensive resizing operations later,<br />but setting this value unnecessarily high wastes memory. The default value is `1/4` of `openFileLimit` if openFileLimit is positive, otherwise it is 64.<br /> | |
| fileInfoMaxIdleTime | The max idle time allowed for an open file info existed in the file info cache. If the file info is idle for a long time, exceed the given time period. The file info will be<br />evicted and closed. If the value is zero or negative, the file info is evicted only when opened files reached `openFileLimit`.<br /> | |
| fileInfoFormatVersionToWrite | The fileinfo format version to write.<br />Available formats are 0-1:<br /> 0: Initial version<br /> 1: persisting explicitLac is introduced<br /><br />By default, it is `1`. If you'd like to disable persisting ExplicitLac, you can set this config to 0 and also journalFormatVersionToWrite should be set to < 6. If there is mismatch then the serverconfig is considered invalid.<br /> | 1 |
- | pageSize | Size of a index page in ledger cache, in bytes. A larger index page can improve performance writing page to disk, which is efficent when you have small number of ledgers and these ledgers have similar number of entries. If you have large number of ledgers and each ledger has fewer entries, smaller index page would improve memory usage.<br /> | 8192 |
- | pageLimit | How many index pages provided in ledger cache. If number of index pages reaches this limitation, bookie server starts to swap some ledgers from memory to disk. You can increment this value when you found swap became more frequent. But make sure pageLimit*pageSize should not more than JVM max memory limitation, otherwise you would got OutOfMemoryException. In general, incrementing pageLimit, using smaller index page would gain bettern performance in lager number of ledgers with fewer entries case. If pageLimit is -1, bookie server will use 1/3 of JVM memory to compute the limitation of number of index pages.<br /> | -1 |
+ | pageSize | Size of a index page in ledger cache, in bytes. A larger index page can improve performance writing page to disk, which is efficient when you have small number of ledgers and these ledgers have similar number of entries. If you have large number of ledgers and each ledger has fewer entries, smaller index page would improve memory usage.<br /> | 8192 |
+ | pageLimit | How many index pages provided in ledger cache. If number of index pages reaches this limitation, bookie server starts to swap some ledgers from memory to disk. You can increment this value when you found swap became more frequent. But make sure pageLimit*pageSize should not more than JVM max memory limitation, otherwise you would got OutOfMemoryException. In general, incrementing pageLimit, using smaller index page would gain better performance in lager number of ledgers with fewer entries case. If pageLimit is -1, bookie server will use 1/3 of JVM memory to compute the limitation of number of index pages.<br /> | -1 |
| numOfMemtableFlushThreads | When entryLogPerLedger is enabled SortedLedgerStorage flushes entries from memTable using OrderedExecutor having numOfMemtableFlushThreads number of threads.<br /> | 8 |
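As a quick illustration of the `pageLimit * pageSize` bound described in the table above (the chosen limit below is hypothetical):

```shell
# pageLimit * pageSize must stay below the JVM max memory, per the table above.
pageSize=8192        # default index page size, in bytes
pageLimit=131072     # hypothetical operator-chosen limit
indexCacheBytes=$((pageSize * pageLimit))
echo "$indexCacheBytes bytes"   # 1073741824 bytes, i.e. 1 GiB of index pages
```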


@@ -228,7 +228,7 @@ The table below lists parameters that you can set to configure bookies. All conf
| Parameter | Description | Default
| --------- | ----------- | ------- |
| dbStorage_writeCacheMaxSizeMb | Size of write cache. Memory is allocated from JVM direct memory. Write cache is used for buffer entries before flushing into the entry log. For good performance, it should be big enough to hold a substantial amount of entries in the flush interval. | 25% of the available direct memory |
- | dbStorage_readAheadCacheMaxSizeMb | Size of read cache. Memory is allocated from JVM direct memory. The read cache is pre-filled doing read-ahead whenever a cache miss happens. | 25% of the available direct memroy |
+ | dbStorage_readAheadCacheMaxSizeMb | Size of read cache. Memory is allocated from JVM direct memory. The read cache is pre-filled doing read-ahead whenever a cache miss happens. | 25% of the available direct memory |
| dbStorage_readAheadCacheBatchSize | How many entries to pre-fill in cache after a read cache miss | 100 |
| dbStorage_rocksDB_blockSize | Size of RocksDB block-cache. RocksDB is used for storing ledger indexes.<br />For best performance, this cache should be big enough to hold a significant portion of the index database which can reach ~2GB in some cases.<br /> | 268435456 |
| dbStorage_rocksDB_writeBufferSizeMB | Size of RocksDB write buffer. RocksDB is used for storing ledger indexes.<br /> | 64 |
@@ -270,7 +270,7 @@ The table below lists parameters that you can set to configure bookies. All conf
| zkTimeout | ZooKeeper client session timeout in milliseconds. Bookie server will exit if it received SESSION_EXPIRED because it was partitioned off from ZooKeeper for more than the session timeout JVM garbage collection, disk I/O will cause SESSION_EXPIRED. Increment this value could help avoiding this issue. | 10000 |
| zkRetryBackoffStartMs | The Zookeeper client backoff retry start time in millis. | 1000 |
| zkRetryBackoffMaxMs | The Zookeeper client backoff retry max time in millis. | 10000 |
- | zkRequestRateLimit | The Zookeeper request limit. It is only enabled when setting a postivie value. | |
+ | zkRequestRateLimit | The Zookeeper request limit. It is only enabled when setting a positive value. | |
| zkEnableSecurity | Set ACLs on every node written on ZooKeeper, this way only allowed users will be able to read and write BookKeeper metadata stored on ZooKeeper. In order to make ACLs work you need to setup ZooKeeper JAAS authentication all the bookies and Client need to share the same user, and this is usually done using Kerberos authentication. See ZooKeeper documentation | false |
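The `zkRetryBackoffStartMs`/`zkRetryBackoffMaxMs` pair above suggests a capped exponential retry schedule; the sketch below assumes doubling between attempts, which is an illustration rather than the client's documented policy:

```shell
# Capped exponential backoff built from the defaults above.
# Doubling between retries is an assumption made for illustration.
start_ms=1000    # zkRetryBackoffStartMs default
max_ms=10000     # zkRetryBackoffMaxMs default
backoff=$start_ms
schedule=""
for attempt in 1 2 3 4 5; do
  schedule="${schedule}${backoff} "
  backoff=$((backoff * 2))
  if [ "$backoff" -gt "$max_ms" ]; then backoff=$max_ms; fi
done
echo "$schedule"   # 1000 2000 4000 8000 10000
```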

## Statistics
@@ -54,7 +54,7 @@ if (lac > lastReadEntry) {
WriteHandle writer = bk.newCreateLedgerOp().execute().get();
```

- Constrast this with how it is with the current recovery on open mechanism.
+ Contrast this with how it is with the current recovery on open mechanism.

```
ReadHandle reader = bk.newOpenLedgerOp().withLedgerId(X).execute().get();
@@ -5,13 +5,13 @@
DistributedLog is an extension of Apache BookKeeper, which offers *reopenable* log streams as its storage primitives.
It is tightly built over bookkeeper ledgers, and provides an easier-to-use abstraction and api to use. Applications
can use *named* log streams rather than *numbered* ledgers to store their data. For example, users can use log streams
- as files to storge objects, checkpoints and other more general filesystem related use cases.
+ as files to storage objects, checkpoints and other more general filesystem related use cases.

Moving the distributedlog core library as part of bookkeeper would have following benefits:

- It provides more generic "reopenable" log abstraction. It lowers the barrier for people to use bookkeeper to store
data, and bring in more use cases into bookkeeper ecosystem.
- - Using ledgers to build continous log stream has been a pattern that been reimplemented multiple times at multiple places,
+ - Using ledgers to build continuous log stream has been a pattern that been reimplemented multiple times at multiple places,
from older projects like HDFS namenode log manager, Hedwig to the newer projects like DistributedLog and Pulsar.
- Most of the distributedlog usage is using the distributedlog library which only depends Apache BookKeeper and there is no
additional components introduced. To simplify those usages, it is better to release distributedlog library along with
2 changes: 1 addition & 1 deletion site3/website/src/pages/bps/BP-27-new-bookkeeper-cli.md
@@ -77,7 +77,7 @@ Usage: bookie-shell cluster [options] [command] [command options]
### Proposed Changes

- Introduced a new module called `bookkeeper-tools` for developing the new CLI.
- - The new CLI will use [JCommander](http://jcommander.org) for parse command line paramters: better on supporting this proposal commandline syntax.
+ - The new CLI will use [JCommander](http://jcommander.org) for parse command line parameters: better on supporting this proposal commandline syntax.
- All the actual logic of the commands will be organized under `org.apache.bookkeeper.tools.cli.commands`. Each command group has its own subpackage and each command will be a class file under that command-group subpackage.
Doing this provides better testability, since the command logic is limited in one file rather than in a gaint shell class. Proposed layout can be found [here](https://github.com/apache/bookkeeper/tree/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/tools/cli/commands).
- For each command: the logic of a command will be moved out of `BookieShell` to its own class `org.apache.bookkeeper.tools.cli.commands.<command-group>.<CommandClass>.java`. The old BookieShell will use the new Command class and delegate the actual logic.
6 changes: 3 additions & 3 deletions site3/website/src/pages/bps/BP-31-durability.md
@@ -80,7 +80,7 @@ Every delete must go through single routine/path in the code and that needs to i
### Archival bit in the metadata to assist Two phase Deletes
Main aim of this feature is to be as conservative as possible on the delete path. As explained in the stateful explicit deletes section, lack of ledgerId in the metadata means that ledger is deleted. A bug in the client code may erroneously delete the ledger. To protect from that, we want to introduce a archive/backedup bit. A separate backup/archival application can mark the bit after successfully backing up the ledger, and later on main client application will send the delete. If this feature is enabled, BK client will reject and throw an exception if it receives a delete request without archival/backed-up bit is not set. This protects the data from bugs and erroneous deletes.

- ### Stateful explicit deltes
+ ### Stateful explicit deletes
Current bookkeeper deletes synchronously deletes the metadata in the zookeeper. Bookies implicitly assume that a particular ledger is deleted if it is not present in the metadata. This process has no crosscheck if the ledger is actually deleted. Any ZK corruption or loss of the ledger path znodes will make bookies to delete data on the disk. No cross check. Even bugs in bookie code which ‘determines’ if a ledger is present on the zk or not, may lead to data deletion.

Right way to deal with this is to asynchronously delete metadata after each bookie explicitly checks that a particular ledger is deleted. This way each bookie explicitly checks the ‘delete state’ of the ledger before deleting on the disk data. One of the proposal is to move the deleted ledgers under /deleted/&lt;ledgerId&gt; other idea is to add a delete state, Open->Closed->Deleted.
@@ -96,9 +96,9 @@ If a bookie is down for long time, what would be the delete policy for the metad
There will be lots of corner case scenarios we need to deal with. For example:
A bookie-1 hosting data for ledger-1 is down for long time
Ledger-1 data has been replicated to other bookies
- Ledger-1 is deleted, and its data and metadata is clared.
+ Ledger-1 is deleted, and its data and metadata is cleared.
Now bookie-1 came back to life. Since our policy is ‘explicit state check delete’ bookie-1 can’t delete ledger-1 data as it can’t explicitly validate that the ledger-1 has been deleted.
- One possible solution: keep tomestones of deleted ledgers around for some duration. If a bookie is down for more than that duration, it needs to be decommissioned and add as a new bookie.
+ One possible solution: keep tombstones of deleted ledgers around for some duration. If a bookie is down for more than that duration, it needs to be decommissioned and add as a new bookie.
Enhance: Archival bit in the metadata to assist Two phase Deletes
Main aim of this feature is to be as conservative as possible on the delete path. As explained in the stateful explicit deletes section, lack of ledgerId in the metadata means that ledger is deleted. A bug in the client code may erroneously delete the ledger. To protect from that, we want to introduce a archive/backedup bit. A separate backup/archival application can mark the bit after successfully backing up the ledger, and later on main client application will send the delete. If this feature is enabled, BK client will reject and throw an exception if it receives a delete request without archival/backed-up bit is not set. This protects the data from bugs and erroneous deletes.
