A pipeline for binary classification problems in machine learning, written entirely with NumPy.
- Pipeline: Class used to create a pipeline
  - `setStages()`: set an array of PipelineStage
  - `addStages()`: add an array of PipelineStage
  - `fit()`: takes DTR (nFeature, nSample) and LTR (nSample,) and returns a model. It iteratively calls `compute()` of each PipelineStage
- PipelineStage: Interface for a stage of a Pipeline
  - `compute()`: main method to start the stage
- VoidStage: A stage that doesn't do anything
- Model: Interface of a model
  - `transform()`: takes DTE (nFeature, nSample) and LTE (nSample,) and returns the scores. It iteratively calls `compute()` of each pre-processing PipelineStage
- CrossValidator: Class used to perform cross-validation
  - `setEstimator()`: set a Pipeline
  - `setNumFolds()`: set the number of folds
  - `fit()`: takes DTR and LTR and randomly splits them into k folds, then calls `fit()` of the Pipeline for each fold. At the end it returns the scores for DTR
- `MVG` (PipelineStage): Multivariate Gaussian
- `NaiveBayesMVG` (PipelineStage): Multivariate Gaussian (Diag)
- `TiedMVG` (PipelineStage): Multivariate Gaussian (Single Cov)
  - `setPiT()`: set the re-balancing factor
- `TiedNaiveBayesMVG` (PipelineStage): Multivariate Gaussian (Diag Single Cov)
  - `setPiT()`: set the re-balancing factor
- `LogisticRegression` (PipelineStage): Logistic Regression
  - `setLambda()`: set lambda, the regularization factor; values near 0 mean weak regularization (risk of overfitting), values near 1 mean strong regularization (risk of underfitting)
  - `setPiT()`: set the re-balancing factor
  - `setExpanded()`: set `True` or `False` to use the quadratic or the linear model, respectively
- `SVM` (PipelineStage): Support Vector Machine
  - `setK()`: set the K factor, usually 1
  - `setC()`: set the C factor; values near 0 favor a large margin, values near 1 a small margin
  - `setPiT()`: set the re-balancing factor
  - `setPolyKernel()`: use a polynomial kernel; set the c factor and d (degree)
  - `setRBFKernel()`: use an RBF kernel; set the gamma factor
  - `setNoKern()`: no kernel
- `GMM` (PipelineStage): Gaussian Mixture Model, a clustering model used here for classification
  - `setDiagonal()`: use diagonal covariance matrices for the GMM densities
  - `setTied()`: use the same covariance matrix for all components of the GMM
  - `setIterationLBG()`: set the number of LBG iterations; the final number of components is 2 raised to the number of iterations
  - `setAlpha()`: set the alpha factor of the LBG algorithm; it is the rescaling factor for the new starting points of the components at each LBG iteration
  - `setPsi()`: set the psi factor, used to avoid degenerate solutions; it constrains the covariance matrices of each GMM component
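The roles of the LBG iteration count, alpha, and psi can be illustrated with a small NumPy sketch. It follows one common formulation of the LBG split (displacing each mean along the leading direction of its covariance) and of the psi constraint (a lower bound on the covariance eigenvalues). The function names `floor_eigenvalues` and `lbg_split` are hypothetical and the project's own implementation may differ; the EM refinement that normally follows each split is omitted.

```python
import numpy as np

def floor_eigenvalues(C, psi):
    """Rebuild C so that none of its eigenvalues is smaller than psi (assumed role of psi)."""
    U, s, _ = np.linalg.svd(C)
    s = np.maximum(s, psi)
    return U @ (s[:, None] * U.T)

def lbg_split(gmm, alpha):
    """Turn every component (w, mu, C) into two components with displaced means."""
    new_gmm = []
    for w, mu, C in gmm:
        U, s, _ = np.linalg.svd(C)
        # Displacement along the leading direction, rescaled by alpha
        d = U[:, 0:1] * np.sqrt(s[0]) * alpha
        new_gmm.append((w / 2, mu + d, C))
        new_gmm.append((w / 2, mu - d, C))
    return new_gmm

# Toy data in the (nFeature, nSample) layout used by the project
rng = np.random.default_rng(0)
D = rng.normal(size=(2, 200))
mu = D.mean(axis=1, keepdims=True)
C = np.cov(D)  # rows are features

# Start from a single Gaussian and split three times
gmm = [(1.0, mu, floor_eigenvalues(C, psi=0.01))]
for _ in range(3):
    gmm = lbg_split(gmm, alpha=0.1)
    # A full implementation would refine the components with EM here

n_components = len(gmm)
total_weight = sum(w for w, _, _ in gmm)
```

Each split doubles the number of components while keeping the weights summing to one, which is why the component count after n iterations is 2 to the power n.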
All the classes here implement the Model interface; the main method is `transform()`.
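As a rough illustration of how `fit()`, `compute()`, and `transform()` relate, here is a minimal sketch. Only the method names `setStages()`, `addStages()`, `fit()`, `compute()`, and `transform()` come from this README; the signatures, the return values of `compute()`, and the two toy stages (`CenterStage`, `NearestMeanStage`) are assumptions made for the example and are not the project's actual code.

```python
import numpy as np

class CenterStage:
    """Hypothetical pre-processing stage: subtract the training mean."""
    def compute(self, D, L):
        mu = D.mean(axis=1, keepdims=True)
        def apply(X):
            return X - mu
        return apply, apply(D)

class NearestMeanStage:
    """Hypothetical final stage: positive score means closer to the class-1 mean."""
    def compute(self, D, L):
        m0 = D[:, L == 0].mean(axis=1, keepdims=True)
        m1 = D[:, L == 1].mean(axis=1, keepdims=True)
        def score(X):
            return ((X - m0) ** 2).sum(axis=0) - ((X - m1) ** 2).sum(axis=0)
        return score, score(D)

class Model:
    """Holds the fitted steps and replays them on new data."""
    def __init__(self, steps):
        self.steps = steps
    def transform(self, DTE, LTE=None):
        out = DTE
        for step in self.steps:
            out = step(out)
        return out

class Pipeline:
    def __init__(self):
        self.stages = []
    def setStages(self, stages):
        self.stages = list(stages)
    def addStages(self, stages):
        self.stages.extend(stages)
    def fit(self, DTR, LTR):
        steps, D = [], DTR
        for stage in self.stages:  # call compute() of each stage in order
            step, D = stage.compute(D, LTR)
            steps.append(step)
        return Model(steps)

# Two well-separated classes, data shaped (nFeature, nSample)
rng = np.random.default_rng(0)
DTR = np.hstack([rng.normal(-2, 1, (2, 50)), rng.normal(2, 1, (2, 50))])
LTR = np.repeat([0, 1], 50)

pipe = Pipeline()
pipe.setStages([CenterStage(), NearestMeanStage()])
model = pipe.fit(DTR, LTR)
scores = model.transform(DTR, LTR)
predictions = (scores > 0).astype(int)
```

Keeping the fitted steps inside the model means the statistics estimated on DTR are reused unchanged when scoring DTE.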
All the pre-processing stages are PipelineStage; the main method is `compute()`.
- PCA
- LDA
- ZNorm
- L2Norm
- Gaussianization
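To show what two of these stages typically compute on data shaped (nFeature, nSample), here is a short NumPy sketch of z-normalization and PCA. The function names are hypothetical and the formulas are the textbook ones (PCA here centers the data first); the project's stages may differ in detail. The statistics are estimated on the training set only and then applied to the evaluation set.

```python
import numpy as np

def znorm_fit(DTR):
    """Estimate per-feature mean and standard deviation on the training data."""
    mu = DTR.mean(axis=1, keepdims=True)
    sigma = DTR.std(axis=1, keepdims=True)
    return mu, sigma

def znorm_apply(D, mu, sigma):
    return (D - mu) / sigma

def pca_fit(DTR, m):
    """Return the training mean and the m leading principal directions."""
    mu = DTR.mean(axis=1, keepdims=True)
    DC = DTR - mu
    C = DC @ DC.T / DTR.shape[1]   # empirical covariance, (nFeature, nFeature)
    _, U = np.linalg.eigh(C)       # eigenvalues come back in ascending order
    P = U[:, ::-1][:, :m]          # keep the m directions with largest variance
    return mu, P

def pca_apply(D, mu, P):
    return P.T @ (D - mu)

rng = np.random.default_rng(0)
DTR = rng.normal(size=(4, 100)) * np.array([[1.0], [2.0], [3.0], [4.0]])
DTE = rng.normal(size=(4, 20))

mu, sigma = znorm_fit(DTR)
ZTR = znorm_apply(DTR, mu, sigma)
ZTE = znorm_apply(DTE, mu, sigma)  # reuse the training statistics

pca_mu, P = pca_fit(ZTR, m=2)
PTR = pca_apply(ZTR, pca_mu, P)
PTE = pca_apply(ZTE, pca_mu, P)
```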
Many useful tools, used throughout the project.