Merge pull request #79 from HongW2019/doc-1.1.1-r
[ML-71] Backport master docs to branch-1.1-spark-3.1.1
zhixingheyi-tian authored Jun 11, 2021
2 parents 5f4a75e + fc243b9 commit 4885d32
Showing 12 changed files with 13,808 additions and 986 deletions.
423 changes: 422 additions & 1 deletion CHANGELOG.md

Large diffs are not rendered by default.

1,957 changes: 1,957 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

201 changes: 0 additions & 201 deletions LICENSE.txt

This file was deleted.

25 changes: 17 additions & 8 deletions README.md
@@ -1,3 +1,7 @@
##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.

##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

# OAP MLlib

## Overview
@@ -17,13 +21,13 @@ You can find all the OAP MLlib documents on the [project web page](https://o

### Java/Scala Users Preferred

Use a pre-built OAP MLlib JAR to get started. You can first download the OAP package from [OAP-JARs-Tarball](https://github.com/Intel-bigdata/OAP/releases/download/v1.1.0-spark-3.0.0/oap-1.1.0-bin-spark-3.0.0.tar.gz) and extract this tarball to get `oap-mllib-x.x.x-with-spark-x.x.x.jar` under `oap-1.1.0-bin-spark-3.0.0/jars`.
Use a pre-built OAP MLlib JAR to get started. You can first download the OAP package from [OAP-JARs-Tarball](https://github.com/oap-project/oap-tools/releases/download/v1.1.1-spark-3.1.1/oap-1.1.1-bin-spark-3.1.1.tar.gz) and extract this tarball to get `oap-mllib-x.x.x.jar` under `oap-1.1.1-bin-spark-3.1.1/jars`.
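
For example, a minimal sketch of the download-and-extract step (the URL is the one linked above; the `wget`/`tar` commands are just one way to fetch and unpack it):

```
# Download and unpack the pre-built OAP package, then locate the OAP MLlib JAR
# under its jars/ directory.
wget https://github.com/oap-project/oap-tools/releases/download/v1.1.1-spark-3.1.1/oap-1.1.1-bin-spark-3.1.1.tar.gz
tar -xzf oap-1.1.1-bin-spark-3.1.1.tar.gz
ls oap-1.1.1-bin-spark-3.1.1/jars/oap-mllib-*.jar
```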

Then you can refer to the following [Running](#running) section to try it out.

### Python/PySpark Users Preferred

Use a pre-built JAR to get started. If you have finished the [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find the compiled OAP MLlib JAR `oap-mllib-x.x.x-with-spark-x.x.x.jar` in `$HOME/miniconda2/envs/oapenv/oap_jars/`.
Use a pre-built JAR to get started. If you have finished the [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find the compiled OAP MLlib JAR `oap-mllib-x.x.x.jar` in `$HOME/miniconda2/envs/oapenv/oap_jars/`.
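
For example, a quick check that the JAR is in place (this assumes the default `oapenv` Conda environment name used by the OAP Installation Guide; adjust the path if yours differs):

```
# List the installed OAP MLlib JAR; the version in the file name depends on the release.
ls $HOME/miniconda2/envs/oapenv/oap_jars/oap-mllib-*.jar
```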

Then you can refer to the following [Running](#running) section to try it out.

@@ -49,13 +53,17 @@ Users usually run Spark applications on __YARN__ with __client__ mode. In that ca

```
# absolute path of the jar for uploading
spark.files /path/to/oap-mllib-x.x.x-with-spark-x.x.x.jar
spark.files /path/to/oap-mllib-x.x.x.jar
# absolute path of the jar for driver class path
spark.driver.extraClassPath /path/to/oap-mllib-x.x.x-with-spark-x.x.x.jar
spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
# relative path to spark.files, just specify jar name in current dir
spark.executor.extraClassPath ./oap-mllib-x.x.x-with-spark-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```
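
Alternatively, the same properties can be passed on the `spark-submit` command line; the sketch below assumes YARN client mode and uses placeholder paths and application arguments:

```
# Same settings as above, supplied via --conf; replace the jar path and
# the application entry point with your own.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.files=/path/to/oap-mllib-x.x.x.jar \
  --conf spark.driver.extraClassPath=/path/to/oap-mllib-x.x.x.jar \
  --conf spark.executor.extraClassPath=./oap-mllib-x.x.x.jar \
  your-application.jar [app arguments]
```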

#### OAP MLlib Specific Configuration

OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires enough native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
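
As an illustrative sizing sketch only: for a dataset of roughly 64 GB processed by 8 executors, 64 GB / 8 = 8 GB per executor, so a starting value somewhat above that could be:

```
# Illustrative starting point only; tune for your dataset and algorithm.
spark.executor.memoryOverhead 10g
```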

### Sanity Check

#### Setup `env.sh`
@@ -103,10 +111,10 @@ Intel® oneAPI Toolkits and its components can be downloaded and installed from [h

More details about oneAPI can be found [here](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html).

You can refer to [this script](dev/install-build-deps-centos.sh) to install correct dependencies.

Scala and Java dependency descriptions are already included in Maven POM file.

***Note:*** You can refer to [this script](dev/install-build-deps-centos.sh) to install the correct dependencies: DPC++/C++, oneDAL, oneTBB, and oneCCL.
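
For example, a minimal sketch of running that script (the checkout directory name `oap-mllib` is an assumption, and installing system packages may require root privileges):

```
$ cd oap-mllib
$ ./dev/install-build-deps-centos.sh
```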

### Build

#### Building oneCCL
@@ -161,12 +169,13 @@ To build, run the following commands:
$ cd mllib-dal
$ ./build.sh
```

The target can be built against different Spark versions by specifying a profile with `<spark-x.x.x>`. For example:
```
$ ./build.sh spark-3.1.1
```
If no profile parameter is given, the profile for Spark 3.0.0 will be activated by default.
The built JAR package will be placed in the `target` directory with the name `oap-mllib-x.x.x-with-spark-x.x.x.jar`.
The built JAR package will be placed in the `target` directory with the name `oap-mllib-x.x.x.jar`.
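
For example, a quick check that the build produced the expected artifact (run from the `mllib-dal` directory after `./build.sh` completes; the exact version in the file name depends on the release):

```
$ ls target/oap-mllib-*.jar
```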

## Examples
