
Commit 75fbc01

Merge pull request #1589 from oneapi-src/2023.1.1_AIKit
2023.1.1 AI Kit Release
2 parents db68fa5 + 9aa4bf1 commit 75fbc01


114 files changed: +8158 / -2014 lines


.github/workflows/github-pages.yml

+75
@@ -0,0 +1,75 @@
+name: github-samples-app
+
+on:
+  push:
+    branches:
+      - master
+
+  workflow_dispatch:
+
+# schedule:
+#   - cron: '55 13 * * *'
+
+jobs:
+  pages:
+    name: Build GitHub Pages
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+
+      - uses: actions/checkout@v3
+        name: Check out app/dev # checks out app/dev in top-level dir
+        with:
+          ref: 'refs/heads/app/dev'
+
+      - uses: actions/checkout@v3
+        name: Check out master # checks out master in subdirectory
+        with:
+          ref: 'refs/heads/master'
+          path: master
+
+      - name: Build JSON DB
+        run: |
+          python3 -m pip install -r src/requirements.txt
+          echo master
+          python3 src/db.py master
+
+      - name: Remove JSON pre-prod
+        run: |
+          rm -rf src/docs/sample_db_pre.json
+
+      - name: Build Sphinx
+        run: |
+          python3 -m sphinx -W -b html src/docs/ src/docs/_build/
+          echo $PWD
+          echo ${{ github.ref }}
+
+      - name: Add GPU-Occupancy-Calculator
+        env:
+          GPU_OCC_CALC: src/docs/_build/Tools/GPU-Occupancy-Calculator/
+        run: |
+          mkdir -p ${GPU_OCC_CALC}
+          cp -v ${{ github.workspace }}/master/Tools/GPU-Occupancy-Calculator/index.html ${GPU_OCC_CALC}/index.html
+
+      - name: Push docs
+        if: ${{ github.ref == 'refs/heads/master' }} # only if this workflow is run from the master branch, push docs
+        env:
+          GITHUB_USER: ${{ github.actor }}
+          GITHUB_TOKEN: ${{ github.token }}
+          GITHUB_REPO: ${{ github.repository }}
+        run: |
+          cd src/docs/_build/
+          touch .nojekyll
+          git init
+          git remote add origin "https://${GITHUB_USER}:${GITHUB_TOKEN}@github.com/${GITHUB_REPO}"
+          git add -A
+          git status
+          git config --global user.name "GitHub Actions"
+          git config --global user.email "[email protected]"
+          git commit -sm "$(date)"
+          git branch -M gh-pages
+          git push -u origin -f gh-pages
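
For context on the "Build Sphinx" step above: the workflow shells out to `python3 -m sphinx -W -b html src/docs/ src/docs/_build/`. When debugging the docs build locally, the same invocation can be driven from Sphinx's Python API; the sketch below is illustrative only (the `build_docs` wrapper is not part of the repo) and assumes `src/docs/` contains a `conf.py`, as the workflow step implies.

```python
# Local equivalent of the workflow's "Build Sphinx" step (illustrative sketch,
# not the repo's own tooling). Assumes Sphinx is installed and src/docs/
# contains a conf.py.
import sys
from sphinx.cmd.build import build_main

def build_docs(source: str = "src/docs/", output: str = "src/docs/_build/") -> int:
    # -W promotes warnings to errors; -b html selects the HTML builder,
    # mirroring the flags used in the workflow.
    return build_main(["-W", "-b", "html", source, output])

if __name__ == "__main__":
    sys.exit(build_docs())
```

Keeping `-W` in local runs surfaces warnings as errors before the CI job fails on them.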

AI-and-Analytics/End-to-end-Workloads/Census/README.md

+26 / -23
@@ -11,22 +11,22 @@ The `Census` sample code illustrates how to use Intel® Distribution of Modin* f
 ## Purpose
 This sample code demonstrates how to run the end-to-end census workload using the AI Toolkit without any external dependencies.
 
-Intel® Distribution of Modin* uses Ray to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides integration and compatibility with existing Pandas code. Intel® Extension for Scikit-learn* dynamically patches scikit-learn estimators to use Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver to get the solution faster.
+Intel® Distribution of Modin* uses HDK to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides integration and compatibility with existing Pandas code. Intel® Extension for Scikit-learn* dynamically patches scikit-learn estimators to use Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver to get the solution faster.
 
 ## Prerequisites
 
 | Optimized for | Description
 | :--- | :---
 | OS | 64-bit Ubuntu* 18.04 or higher
 | Hardware | Intel Atom® processors <br> Intel® Core™ processor family <br> Intel® Xeon® processor family <br> Intel® Xeon® Scalable processor family
-| Software | Intel® AI Analytics Toolkit (AI Kit) (Python version 3.7, Intel® Distribution of Modin*) <br> Intel® Extension for Scikit-learn* <br> NumPy <br> Ray
+| Software | Intel® AI Analytics Toolkit (AI Kit) (Python version 3.8 or newer, Intel® Distribution of Modin*) <br> Intel® Extension for Scikit-learn* <br> NumPy
 
 The Intel® Distribution of Modin* and Intel® Extension for Scikit-learn* libraries are available together in [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).
 
 
 ## Key Implementation Details
 
-This end-to-end workload sample code is implemented for CPU using the Python language. Once you have installed AI Kit, the Conda environment is prepared with Python version 3.7 (or newer), Intel Distribution of Modin*, Ray, Intel® Extension for Scikit-Learn, and NumPy.
+This end-to-end workload sample code is implemented for CPU using the Python language. Once you have installed AI Kit, the Conda environment is prepared with Python version 3.8 (or newer), Intel Distribution of Modin*, Intel® Extension for Scikit-Learn, and NumPy.
 
 In this sample, you will use Intel® Distribution of Modin* to ingest and process U.S. census data from 1970 to 2010 in order to build a ridge regression-based model to find the relation between education and total income earned in the US.

@@ -74,23 +74,29 @@ To learn more about the extensions and how to configure the oneAPI environment,
 
 ### On Linux*
 
-1. Install the Intel® Distribution of Modin* python environment.
+1. Install the Intel® Distribution of Modin* Python environment (only Python 3.8 - 3.10 are supported).
    ```
-   conda create -y -n intel-aikit-modin intel-aikit-modin -c intel
+   conda create -n modin-hdk python=3.x -y
    ```
 2. Activate the Conda environment.
    ```
-   conda activate intel-aikit-modin
+   conda activate modin-hdk
    ```
-3. Install Jupyter Notebook.
+3. Install modin-hdk, Intel® Extension for Scikit-learn*, and related libraries.
    ```
-   conda install jupyter nb_conda_kernels
+   conda install modin-hdk -c conda-forge -y
+   pip install scikit-learn-intelex
+   pip install matplotlib
    ```
-4. Install OpenCensus.
+4. Install Jupyter Notebook.
    ```
-   pip install opencensus
+   pip install jupyter ipykernel
    ```
-5. Change to the sample directory, and open Jupyter Notebook.
+5. Add the kernel to Jupyter Notebook.
+   ```
+   python -m ipykernel install --user --name modin-hdk
+   ```
+6. Change to the sample directory, and open Jupyter Notebook.
    ```
    jupyter notebook
    ```
@@ -127,20 +133,17 @@ To learn more about the extensions and how to configure the oneAPI environment,
 2. Open a web browser, and navigate to https://devcloud.intel.com. Select **Work with oneAPI**.
 3. From Intel® DevCloud for oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started), locate the ***Connect with Jupyter* Lab*** section (near the bottom).
 4. Click **Sign in to Connect** button. (If you are already signed in, the link should say ***Launch JupyterLab****.)
-5. Once JupyterLab opens, select **no kernel**.
-6. You might need to [clone the samples](#clone-the-samples-in-intel®-devcloud) from GitHub. If the samples are already present, skip this step.
-7. Change to the sample directory.
-8. Open `census_modin.ipynb`.
-9. Click **Run** to run the cells.
-10. Alternatively, run the entire workbook by selecting **Restart kernel and re-run whole notebook**.
-
-#### Clone the Samples in Intel® DevCloud
-If the samples are not already present in your Intel® DevCloud account, download them.
-1. From JupyterLab, select **File** > **New** > **Terminal**.
-2. In the terminal, clone the samples from GitHub.
+5. Open a terminal from the Launcher.
+6. Follow [steps 1-5](#on-linux) to create the conda environment.
+7. Clone the samples from GitHub. If the samples are already present, skip this step.
    ```
    git clone https://github.com/oneapi-src/oneAPI-samples.git
    ```
+8. Change to the sample directory.
+9. Open `census_modin.ipynb`.
+10. Select the "modin-hdk" kernel.
+11. Click **Run** to run the cells.
+12. Alternatively, run the entire workbook by selecting **Restart kernel and re-run whole notebook**.
 
 ## Example Output
 

@@ -152,4 +155,4 @@ This is an example Cell Output for `census_modin.ipynb` run in Jupyter Notebook.
 
 Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
 
-Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
+Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
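
A note on the README change above: "Intel® Extension for Scikit-learn* dynamically patches scikit-learn estimators to use oneDAL" amounts to an explicit opt-in call made before the estimators are imported. A minimal sketch with synthetic placeholder data follows (the sample itself trains Ridge regression on the real census dataset via Modin dataframes):

```python
# Sketch of the patching mechanism the README describes (synthetic data only;
# the sample itself works on the census dataset with Modin dataframes).
import numpy as np
from sklearnex import patch_sklearn

# Re-route supported estimators (Ridge among them) to the oneDAL backend.
# Call this before importing the scikit-learn estimators.
patch_sklearn()

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 8))                        # placeholder features
y = X @ rng.random(8) + 0.1 * rng.random(1000)   # placeholder target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, y_train)   # solved by oneDAL once patched
print("R^2:", model.score(X_test, y_test))
```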

AI-and-Analytics/End-to-end-Workloads/Census/census_modin.ipynb

+18 / -15
@@ -21,6 +21,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "pycharm": {
@@ -29,7 +30,7 @@
   },
   "source": [
    "In this example we will be running an end-to-end machine learning workload with US census data from 1970 to 2010.\n",
-   "It uses Intel® Distribution of Modin with Ray as backend compute engine for ETL, and uses Ridge Regression algorithm from Intel scikit-learn-extension library to train and predict the co-relation between US total income and education levels."
+   "It uses Intel® Distribution of Modin with HDK (Heterogeneous Data Kernels) as backend compute engine for ETL, and uses Ridge Regression algorithm from Intel scikit-learn-extension library to train and predict the co-relation between US total income and education levels."
   ]
  },
  {
@@ -73,14 +74,15 @@
   ]
  },
  {
+  "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
-   "Import Modin and set Ray as the compute engine. This engine uses analytical database OmniSciDB to obtain high single-node scalability for specific set of dataframe operations. "
+   "Import Modin and set HDK as the compute engine. This engine provides a set of components for federating analytic queries to an execution backend based on OmniSciDB to obtain high single-node scalability for specific set of dataframe operations. "
   ]
  },
 {
@@ -97,16 +99,7 @@
    "import modin.pandas as pd\n",
    "\n",
    "import modin.config as cfg\n",
-   "from packaging import version\n",
-   "import modin\n",
-   "\n",
-   "cfg.IsExperimental.put(\"True\")\n",
-   "cfg.Engine.put('native')\n",
-   "# Since modin 0.12.0 OmniSci engine activation process slightly changed\n",
-   "if version.parse(modin.__version__) <= version.parse('0.11.3'):\n",
-   "    cfg.Backend.put('omnisci')\n",
-   "else:\n",
-   "    cfg.StorageFormat.put('omnisci')\n"
+   "cfg.StorageFormat.put('hdk')\n"
   ]
  },
 {
@@ -288,13 +281,23 @@
    "mean MSE ± deviation: 0.032564569 ± 0.000041799\n",
    "mean COD ± deviation: 0.995367533 ± 0.000005869"
   ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "# release resources\n",
+   "%reset -f"
+  ]
  }
 ],
 "metadata": {
  "kernelspec": {
-  "display_name": "Python 3",
+  "display_name": "modin-hdk",
   "language": "python",
-  "name": "python3"
+  "name": "modin-hdk"
  },
  "language_info": {
   "codemirror_mode": {
@@ -306,7 +309,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.7.11"
+  "version": "3.9.16"
  }
 },
 "nbformat": 4,

AI-and-Analytics/End-to-end-Workloads/Census/sample.json

+5 / -3
@@ -16,10 +16,12 @@
       "steps": [
         "set -e # Terminate the script on first error",
         "source $(conda info --base)/etc/profile.d/conda.sh # Bypassing conda's disability to activate environments inside a bash script: https://github.com/conda/conda/issues/7980",
-        "conda create -y -n intel-aikit-modin intel-aikit-modin -c intel",
-        "conda activate intel-aikit-modin",
+        "conda create -n modin-hdk python=3.9 -y",
+        "conda activate modin-hdk",
+        "conda install modin-hdk -c conda-forge -y",
         "conda install -y jupyter # Installing 'jupyter' for extended abilities to execute the notebook",
-        "pip install opencensus # Installing 'runipy' for extended abilities to execute the notebook",
+        "pip install scikit-learn-intelex # Installing Intel® Extension for Scikit-learn*",
+        "pip install matplotlib",
         "jupyter nbconvert --to notebook --execute census_modin.ipynb"
       ]
     }
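
The final step in `sample.json` executes the notebook headlessly with `jupyter nbconvert --to notebook --execute census_modin.ipynb`. The same run can be scripted through nbconvert's Python API, which can be handy in a test harness; a sketch, assuming the `modin-hdk` kernel registered in the README steps is available:

```python
# Programmatic equivalent of the CLI step
# "jupyter nbconvert --to notebook --execute census_modin.ipynb"
# (sketch only; the sample's CI uses the CLI form shown above).
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

nb = nbformat.read("census_modin.ipynb", as_version=4)
executor = ExecutePreprocessor(timeout=1800, kernel_name="modin-hdk")
executor.preprocess(nb, {"metadata": {"path": "."}})
nbformat.write(nb, "census_modin.executed.ipynb")
```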

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/clean.sh

+2
@@ -1,3 +1,5 @@
+#!/bin/bash
+
 rm -R RIRS_NOISES
 rm -R tmp
 rm -R speechbrain
