From 64954799c113708377f431bc4a57395273f43fce Mon Sep 17 00:00:00 2001
From: Xiaochang Wu
Date: Fri, 30 Apr 2021 12:32:35 +0800
Subject: [PATCH 1/2] [ML-23] Add OAP MLlib Specific Configuration to README #58

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index fe9715885..731a78e70 100644
--- a/README.md
+++ b/README.md
@@ -45,6 +45,8 @@ Intel® oneAPI Toolkits components used by the project are already included into
 
 ### Spark Configuration
 
+#### General Configuration
+
 Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.
 
 ```
@@ -56,6 +58,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
 spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 ```
 
+#### OAP MLlib Specific Configuration
+
+OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires enough native memory to be allocated for each executors. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+
 ### Sanity Check
 
 #### Setup `env.sh`

From 6d89e1e35e9710f62b4a9aa429440aa224d4967b Mon Sep 17 00:00:00 2001
From: Hong
Date: Fri, 30 Apr 2021 13:24:06 +0800
Subject: [PATCH 2/2] [ML-23] Update the documents and license (#59)

---
 README.md                      |  6 +++++-
 docs/OAP-Installation-Guide.md |  2 +-
 docs/User-Guide.md             | 13 ++++++-------
 docs/index.md                  |  9 ++++++---
 4 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 731a78e70..f5915f5bc 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,7 @@
+##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
+
+##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
+
 # OAP MLlib
 
 ## Overview
@@ -60,7 +64,7 @@ spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 
 #### OAP MLlib Specific Configuration
 
-OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires enough native memory to be allocated for each executors. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires enough native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
 
 ### Sanity Check
 
diff --git a/docs/OAP-Installation-Guide.md b/docs/OAP-Installation-Guide.md
index c269b978e..ca1a6f558 100644
--- a/docs/OAP-Installation-Guide.md
+++ b/docs/OAP-Installation-Guide.md
@@ -36,7 +36,7 @@ Once finished steps above, you have completed OAP dependencies installation and 
 Dependencies below are required by OAP and all of them are included in OAP Conda package, they will be automatically installed in your cluster when you Conda install OAP. Ensure you have activated environment which you created in the previous steps.
 
-- [Arrow](https://github.com/Intel-bigdata/arrow)
+- [Arrow](https://github.com/oap-project/arrow/tree/arrow-3.0.0-oap-1.1)
 - [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
 - [Memkind](https://anaconda.org/intel/memkind)
 - [Vmemcache](https://anaconda.org/intel/vmemcache)
diff --git a/docs/User-Guide.md b/docs/User-Guide.md
index 34331ccce..3425d57e3 100644
--- a/docs/User-Guide.md
+++ b/docs/User-Guide.md
@@ -1,7 +1,3 @@
-##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
-
-##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
-
 # OAP MLlib
 
 ## Overview
@@ -13,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in 
 OAP MLlib tried to maintain the same API interfaces and produce same results that are identical with Spark MLlib. However due to the nature of float point operations, there may be some small deviation from the original result, we will try our best to make sure the error is within acceptable range. For those algorithms that are not accelerated by OAP MLlib, the original Spark MLlib one will be used.
 
-## Online Documentation
-
-You can find the all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).
 
 ## Getting Started
@@ -49,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into
 
 ### Spark Configuration
 
+#### General Configuration
+
 Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.
 
 ```
@@ -60,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
 spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 ```
 
+#### OAP MLlib Specific Configuration
+
+OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires enough native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+
 ### Sanity Check
 
 #### Setup `env.sh`
diff --git a/docs/index.md b/docs/index.md
index 9fb2e0396..3425d57e3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -9,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in 
 OAP MLlib tried to maintain the same API interfaces and produce same results that are identical with Spark MLlib. However due to the nature of float point operations, there may be some small deviation from the original result, we will try our best to make sure the error is within acceptable range. For those algorithms that are not accelerated by OAP MLlib, the original Spark MLlib one will be used.
 
-## Online Documentation
-
-You can find the all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).
 
 ## Getting Started
@@ -45,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into
 
 ### Spark Configuration
 
+#### General Configuration
+
 Users usually run Spark application on __YARN__ with __client__ mode. In that case, you only need to add the following configurations in `spark-defaults.conf` or in `spark-submit` command line before running.
 
 ```
@@ -56,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
 spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 ```
 
+#### OAP MLlib Specific Configuration
+
+OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires enough native memory to be allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+
 ### Sanity Check
 
 #### Setup `env.sh`
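The `spark.executor.memoryOverhead` rule of thumb introduced by this patch series (set it larger than dataset size / executor number) can be sketched as a quick calculation. This is a minimal illustration only; the 100 GB dataset size, the 20-executor cluster, and the 1.2x headroom factor are hypothetical assumptions, not values from the patches:

```python
def suggested_memory_overhead_mb(dataset_size_mb: int, num_executors: int,
                                 headroom: float = 1.2) -> int:
    """Rough starting point for spark.executor.memoryOverhead (in MB),
    following the patch's rule of thumb: larger than dataset size divided
    by executor number. The headroom multiplier is an assumption added
    here to keep the value strictly above that minimum."""
    if num_executors <= 0:
        raise ValueError("num_executors must be positive")
    per_executor_mb = dataset_size_mb / num_executors
    return int(per_executor_mb * headroom)

# Hypothetical example: a 100 GB dataset processed by 20 executors.
overhead_mb = suggested_memory_overhead_mb(100 * 1024, 20)
print(f"spark.executor.memoryOverhead {overhead_mb}m")
```

The resulting value would go into `spark-defaults.conf` (or a `--conf spark.executor.memoryOverhead=...` flag on `spark-submit`) alongside the other settings shown in the README; actual needs vary by algorithm, so treat it as a starting point to tune, not a fixed formula.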