We assume that you have access to an Intrepydd release with the pyddc command, along with a standard Python environment.
The recommended way to use the Intrepydd v0.2 release when implementing a data analytics workflow is the following sequence of steps:
1. Create a pure Python implementation of the workflow, using standard libraries such as NumPy, CombBLAS, and PyTorch.
2. Use a standard Python profiler, such as cProfile, to identify the performance-critical code regions of the Python implementation (see the profiling sketch after this list).
3. Select a performance-critical code region in the Python code that is a promising candidate for conversion to Intrepydd. (If that region uses a library that Intrepydd does not support, you will need to implement the required library functionality yourself, e.g., with explicit for/pfor loops.)
4. Insert calls to evaluate the Energy-Delay-Squared goal metric, in Joule-seconds^2, for the core computation in the pure Python implementation, ignoring initialization, data input, and data output (see the measurement sketch below).
5. Move the performance-critical code into a new function in a single Intrepydd file (.pydd extension) for the workflow.
6. Add type declarations for the parameters and return value of the new Intrepydd function; occasionally, additional type declarations are also needed for internal assignment statements in the function (see the example .pydd file below).
7. Compile the .pydd file with the pyddc command, which automatically generates C/C++ code from it (see the compile-and-call sketch below).
8. Execute the new Python main program, which now calls the compiled Intrepydd code, and record its new Energy-Delay-Squared goal metric (in Joule-seconds^2).
9. Try to improve the performance of the code in the .pydd file through the algorithmic improvements, parallelization, and locality optimizations suggested on the Code optimization techniques page (see the pfor sketch below).
10. Repeat steps 7-9 for the current performance-critical code region.
11. Repeat steps 2-10 for additional performance-critical code regions.
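
As a sketch of step 2, the standard-library cProfile module can rank the functions in the workflow by cumulative time. The module name `workflow` and the function `run_workflow` below are hypothetical placeholders for your own code.

```python
# Sketch for step 2: profile the pure Python workflow with cProfile.
import cProfile
import pstats

import workflow  # hypothetical module: your pure Python implementation

profiler = cProfile.Profile()
profiler.enable()
workflow.run_workflow()  # hypothetical entry point for the core computation
profiler.disable()

# Show the 10 entries with the largest cumulative time; these are the
# performance-critical regions that are candidates for conversion.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```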
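
For step 4, the Energy-Delay-Squared metric can be evaluated by recording energy and wall-clock time immediately around the core computation and computing E x T^2. Energy measurement is platform-specific, so `read_energy_joules` below is a hypothetical probe; the RAPL sysfs path it reads is only available on Linux systems with Intel RAPL support, and `core_computation` and `data` stand in for your own code.

```python
# Sketch for step 4: measure the Energy-Delay-Squared (ED^2P) metric,
# in Joule-seconds^2, around the core computation only.
import time

def read_energy_joules() -> float:
    # Hypothetical energy probe: reads the package energy counter exposed
    # by Linux RAPL, in microjoules. Platform-specific; the counter also
    # wraps around periodically, which long runs would need to handle.
    with open("/sys/class/powercap/intel-rapl:0/energy_uj") as f:
        return int(f.read()) / 1e6

# ... initialization and data input happen before this point ...
e0 = read_energy_joules()
t0 = time.perf_counter()

result = core_computation(data)  # hypothetical core computation

t1 = time.perf_counter()
e1 = read_energy_joules()

energy = e1 - e0            # Joules
delay = t1 - t0             # seconds
ed2p = energy * delay ** 2  # Joule-seconds^2
print(f"ED^2P = {ed2p:.6g} J*s^2 (E = {energy:.3f} J, T = {delay:.3f} s)")
```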
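
For steps 5 and 6, a converted kernel might look like the following. The file name kernels.pydd and the sum-of-squares kernel are illustrative, and the Array(float64, 1) / int32 / float64 annotations follow Intrepydd's type-declaration style; check the release documentation for the exact set of supported types.

```python
# kernels.pydd -- hypothetical Intrepydd file holding the converted kernel.
# Parameter and return types are declared explicitly (step 6).

def sum_of_squares(a: Array(float64, 1), n: int32) -> float64:
    s: float64 = 0.0  # internal type declaration, sometimes needed
    for i in range(n):
        s += a[i] * a[i]
    return s
```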
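
For steps 7 and 8, the .pydd file is compiled once from the shell (e.g., `pyddc kernels.pydd`) and the Python main program then calls the compiled kernel. The sketch below continues the hypothetical example above and assumes pyddc produces an importable module with the same base name as the .pydd file.

```python
# Sketch for steps 7-8. First compile the Intrepydd file from the shell:
#
#     pyddc kernels.pydd
#
# Then call the compiled kernel from the Python main program and re-measure
# ED^2P exactly as in the step-4 sketch, for comparison with the baseline.
import numpy as np

import kernels  # module built by pyddc from kernels.pydd (assumed name)

a = np.random.rand(1_000_000)
s = kernels.sum_of_squares(a, a.size)
print(s)
```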
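
For step 9, one optimization to try is loop parallelization, which Intrepydd expresses with pfor. The sketch below is a hypothetical elementwise kernel whose iterations write distinct elements of out and are therefore safe to run in parallel; the exact pfor semantics and restrictions (e.g., on reductions) should be checked against the Code optimization techniques page.

```python
# Sketch for step 9: parallelize a loop with independent iterations by
# replacing "for" with "pfor" in the .pydd file.

def elementwise_square(a: Array(float64, 1), out: Array(float64, 1), n: int32):
    pfor i in range(n):  # each iteration writes a distinct out[i]
        out[i] = a[i] * a[i]
```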