[ML-23] Update the documents and license (#59)
Hong committed Apr 30, 2021
1 parent 6495479 commit 6d89e1e
Showing 4 changed files with 18 additions and 12 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -1,3 +1,7 @@
##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.

##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

# OAP MLlib

## Overview
@@ -60,7 +64,7 @@ spark.executor.extraClassPath ./oap-mllib-x.x.x.jar

#### OAP MLlib Specific Configuration

OAP MLlib adopted oneDAL as implementation backend. oneDAL requires enough native memory allocated for each executors. For large dataset, depending on algorithms, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires sufficient native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to more than __dataset size / executor number__ is a good starting point.
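As a rough illustration (the numbers here are hypothetical, not from the project docs): a 100 GB dataset processed by 10 executors suggests at least 100 GB / 10 = 10 GB of native memory per executor, e.g. in `spark-defaults.conf`:

```
# Hypothetical sizing: 100 GB dataset / 10 executors => ~10 GB native memory each
spark.executor.memoryOverhead 10g
```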

### Sanity Check

2 changes: 1 addition & 1 deletion docs/OAP-Installation-Guide.md
@@ -36,7 +36,7 @@ Once finished steps above, you have completed OAP dependencies installation and

The dependencies below are required by OAP. All of them are included in the OAP Conda package and will be installed automatically in your cluster when you Conda install OAP. Ensure you have activated the environment you created in the previous steps.

- [Arrow](https://github.com/Intel-bigdata/arrow)
- [Arrow](https://github.com/oap-project/arrow/tree/arrow-3.0.0-oap-1.1)
- [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
- [Memkind](https://anaconda.org/intel/memkind)
- [Vmemcache](https://anaconda.org/intel/vmemcache)
13 changes: 6 additions & 7 deletions docs/User-Guide.md
@@ -1,7 +1,3 @@
##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.

##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

# OAP MLlib

## Overview
@@ -13,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
OAP MLlib maintains the same API interfaces and aims to produce results identical to Spark MLlib. However, due to the nature of floating-point operations, there may be small deviations from the original results; we do our best to keep the error within an acceptable range.
For those algorithms that are not accelerated by OAP MLlib, the original Spark MLlib one will be used.

## Online Documentation

You can find the all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).

## Getting Started

@@ -49,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.

```
@@ -60,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```
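The same settings can also be passed on the `spark-submit` command line; a sketch, with the application jar name as a placeholder:

```
spark-submit --master yarn --deploy-mode client \
  --conf spark.driver.extraClassPath=/path/to/oap-mllib-x.x.x.jar \
  --conf spark.executor.extraClassPath=./oap-mllib-x.x.x.jar \
  your-spark-app.jar
```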

#### OAP MLlib Specific Configuration

OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires sufficient native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to more than __dataset size / executor number__ is a good starting point.

### Sanity Check

#### Setup `env.sh`
9 changes: 6 additions & 3 deletions docs/index.md
@@ -9,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
OAP MLlib maintains the same API interfaces and aims to produce results identical to Spark MLlib. However, due to the nature of floating-point operations, there may be small deviations from the original results; we do our best to keep the error within an acceptable range.
For those algorithms that are not accelerated by OAP MLlib, the original Spark MLlib one will be used.

## Online Documentation

You can find the all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).

## Getting Started

@@ -45,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.

```
@@ -56,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```

#### OAP MLlib Specific Configuration

OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires sufficient native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to more than __dataset size / executor number__ is a good starting point.

### Sanity Check

#### Setup `env.sh`
