Commit 22f0cb7

Mathias Born committed:
Added more analysis documentation + few bugfixes in the code + entropy samplers
1 parent 20a63dc commit 22f0cb7

6 files changed: +457, -226 lines


docs/source/_docs/analysis/customization.rst

Lines changed: 213 additions & 0 deletions

@@ -72,3 +72,216 @@ only three attributes must be set:

* selection_metrics\_ : The values of the selection metric as a 1-D numpy floating point array. A
  higher selection metric indicates a better model.
* metric_name\_ : The name of the selection metric as a string. Used for interpretation.

.. _a_cust_sams:

Simulated annealing model selection (SAMS)
------------------------------------------

Simulated annealing model selection, or SAMS, was devised by
`Wolters and Bingham (2012) <https://www.tandfonline.com/doi/abs/10.1198/TECH.2011.08157>`_.
It is a model selection algorithm which, instead of relying on statistical
significance as is most commonly done, simulates many models and looks at what the
well-fitting models have in common. The algorithm works in three stages:
86+
87+
* The simulation stage: here the algorithm simulates many models of a fixed size using simulated annealing,
88+
and sorts them by their
89+
:math:`R^2`. Commonly it simulates 10000 or 20000 models, however it depends on the
90+
problem at hand.
91+
* The reduction stage: here the algorithm takes the simulated models and looks what the
92+
most common 1-factor, 2-factor, 3-factor, etc. combinations are. In other words, it looks
93+
at which submodel of size k occurs most frequently in the good fitting models.
94+
* The selection stage: here the algorithm takes the most occuring submodels of each size and
95+
compares them to determine an ordering. The ordering is based on the entropy which is explained
96+
later.
97+

As you may notice, the algorithm does not output just one model. It outputs multiple models,
ordered by how much confidence the algorithm has in each. The last two stages
use the result of the first stage to automatically determine this ordering; however, the user
may also manually inspect a raster plot of the results.

In such a raster plot, each row is a model, each column is a potential term in the model, and the color indicates the
coefficient of the term. Any term not in the model has a coefficient of zero, which is
plotted in white. By looking at largely colored columns, we can determine which
submodels occur most often.
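
Such a raster plot can be approximated with a generic matplotlib sketch (illustrative only, not a
pyoptex plotting utility; the `coefficients` array below is a synthetic stand-in for the fitted
coefficients of the simulated models, one row per model and one column per term):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng(42)
>>> # Synthetic stand-in: 50 models, 10 candidate terms, roughly 30% of the terms active
>>> coefficients = rng.normal(size=(50, 10)) * (rng.random((50, 10)) < 0.3)
>>> lim = np.abs(coefficients).max()
>>> plt.imshow(coefficients, cmap='bwr', vmin=-lim, vmax=lim, aspect='auto')
>>> plt.xlabel('Term')
>>> plt.ylabel('Model')
>>> plt.colorbar(label='Coefficient')
>>> plt.show()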
109+
.. note::
110+
In some events, multiple distinct models may perform equally well. Such a scenario is
111+
difficult to detect in the raster plot, and also by the entropy criterion. Luckily, we
112+
can also cluster the results in the raster plot making them more visible. The different
113+
terms in each model are binary encoded if the effect is present or not. On this representation,
114+
a kmeans clustering is run. Such a scenario
115+
is investigated in `Wolters and Bingham (2012) <https://www.tandfonline.com/doi/abs/10.1198/TECH.2011.08157>`_.
116+
117+
See :py:class:`SamsRegressor <pyoptex.analysis.estimators.sams.estimator.SamsRegressor>` for information
118+
on the parameters.
119+
120+
121+
122+
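
As a minimal usage sketch, assuming the scikit-learn style `fit(X, y)` interface of the estimators
in this package (`data` is a hypothetical pandas dataframe with one column per factor and a
response column `Y`; `factors`, `Y2X` and `dep` are constructed as in the example at the end of
this section):

>>> from pyoptex.analysis.estimators.sams.estimator import SamsRegressor
>>> regr = SamsRegressor(factors, Y2X, mode='weak', dependencies=dep)
>>> regr.fit(data.drop(columns='Y'), data['Y'])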

Entropy calculations
^^^^^^^^^^^^^^^^^^^^

The entropy criterion is the algorithm's most effective addition for automated model
selection. The entropy is computed as

.. math::

    e = f_{o} \log_2(f_{o} / f_{t}) + (1 - f_{o}) \log_2\left(\frac{1 - f_{o}}{1 - f_{t}}\right)

where :math:`f_{o}` is the observed frequency of the submodel in the simulation phase, and
:math:`f_{t}` is the theoretical frequency with which this submodel would occur when randomly
sampling hereditary models.
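
As a quick numeric illustration with made-up frequencies (not taken from the paper): a submodel
observed in 40% of the simulated models, which would only occur in 5% of randomly sampled
hereditary models, has entropy

>>> import numpy as np
>>> f_o, f_t = 0.4, 0.05
>>> e = f_o * np.log2(f_o / f_t) + (1 - f_o) * np.log2((1 - f_o) / (1 - f_t))
>>> round(e, 3)
0.802

A larger entropy means the submodel occurs far more often among the well-fitting models than
expected by chance, and it is therefore ranked higher in the selection stage.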

In `Wolters and Bingham (2012) <https://www.tandfonline.com/doi/abs/10.1198/TECH.2011.08157>`_,
the authors performed simulations on screening designs for different model selection algorithms.
The oracle method requires prior knowledge about the true model; each term is tested for significance.
The AICc method is based on Akaike's Information Criterion (corrected): a search through the
hereditary models is performed, from which the best model according to the AICc is selected.
This, together with the Bayesian Information Criterion (BIC), is commonly applied in practice.
The last method is the new SAMS method with entropy selection.

.. list-table:: Part of the simulation results from Wolters and Bingham (2012), in % of simulated cases
    :align: center
    :widths: 1 1 1 1 1
    :header-rows: 1

    * - Method
      - Correct
      - Underfitted
      - Overfitted
      - (Partially) Wrong
    * - Oracle
      - 62.8
      - 37.2
      - 0
      - 0
    * - AICc
      - 7.2
      - 0.7
      - 53.8
      - 38.3
    * - SAMS
      - 43.3
      - 16.2
      - 15.8
      - 24.7

The SAMS method with entropy selection significantly outperforms the other practical methods, with 43.3% of models
found to be correct. Even the oracle method, which has prior knowledge about the true
model, only found 62.8% of the models. AICc only found about 7.2% of the models, making it
not very suitable for this kind of scenario.

.. _samplers_sams:

There is one downside to the entropy criterion. Only in the specific case where the model
is a (partial) response surface model with weak heredity can :math:`f_{t}`
be computed exactly. To keep the algorithm generic, a fallback was implemented
to compute an approximation of the entropy using a model sampler. Three different
samplers are implemented:
:py:func:`sample_model_dep_onebyone <pyoptex.utils.model.sample_model_dep_onebyone>`,
:py:func:`sample_model_dep_mcmc <pyoptex.analysis.estimators.models.model.sample_model_dep_mcmc>`,
and :py:func:`sample_model_dep_random <pyoptex.utils.model.sample_model_dep_random>`.
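
The samplers follow the `func(dep, model_size, N, forced, mode)` calling convention documented for
the `sampler` argument of `entropies_approx` in `pyoptex.analysis.estimators.sams.entropy`. A small,
illustrative call (the argument values are made up; `dep` is the dependency matrix produced by
:py:func:`order_dependencies <pyoptex.utils.model.order_dependencies>`):

>>> from pyoptex.utils.model import sample_model_dep_onebyone
>>> # Draw 10000 random weak-heredity models of size 4
>>> # (assuming `None` is accepted when no terms are forced into the model)
>>> samples = sample_model_dep_onebyone(dep, 4, 10000, None, 'weak')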

For each of these samplers, we ran simulations similar to those of
`Wolters and Bingham (2012) <https://www.tandfonline.com/doi/abs/10.1198/TECH.2011.08157>`_.
We start from a PB12 (Plackett-Burman) design. Next, we generate a random hereditary model
by sampling 1 to 4 main effects, :math:`n_{main}`, and sequentially sampling :math:`4 - n_{main}`
interaction effects. Note that this is a weak heredity submodel of a partial response surface
design where each factor has linear effects and two-factor interactions.
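
To make this simulation protocol concrete, below is a rough sketch of drawing one such random
weak-heredity model with plain NumPy (illustrative only; this is not the exact code used for the
reported simulations):

>>> import numpy as np
>>> rng = np.random.default_rng()
>>> n_factors = 11  # a PB12 design has 11 two-level factors
>>> n_main = rng.integers(1, 5)  # 1 up to 4 main effects
>>> mains = rng.choice(n_factors, n_main, replace=False)
>>> interactions = []
>>> while len(interactions) < 4 - n_main:
>>>     a = rng.choice(mains)  # weak heredity: one parent is a sampled main effect
>>>     b = rng.integers(n_factors)  # the other parent may be any factor
>>>     if a != b and {a, b} not in map(set, interactions):
>>>         interactions.append((int(a), int(b)))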

The results are as follows.

.. list-table:: Simulations of different entropy approximations, in % of simulated cases
    :align: center
    :widths: 1 1 1 1 1
    :header-rows: 1

    * - Method
      - Correct
      - Underfitted
      - Overfitted
      - (Partially) Wrong
    * - Exact entropy
      - 43.7
      - 30.3
      - 10.5
      - 15.5
    * - One-by-one
      - 37.3
      - 12.6
      - 23.8
      - 26.3
    * - Markov chain Monte Carlo (MCMC)
      - 38.8
      - 12.3
      - 23.3
      - 25.6
    * - Random
      - 36.8
      - 10.1
      - 26.1
      - 27.0

The first row is the exact entropy method as used in
`Wolters and Bingham (2012) <https://www.tandfonline.com/doi/abs/10.1198/TECH.2011.08157>`_.
Note that all three samplers, even though they perform worse than the exact entropy in terms of the percentage
of correct models, still perform significantly better than AICc. When loosening the classification by
also counting models underfitted or overfitted by one term as correct, the exact entropy method
reaches 61.1% accuracy, the one-by-one sampler 59.1%, the MCMC sampler 59.3%, and the random sampler 56.5%.

By default, the one-by-one
sampler is used, as it performs almost as well as the MCMC sampler, but computes faster.

.. _warning_sams:

.. warning::
    The implementation of SAMS uses the samplers by default; however, the exact method
    may be used by specifying the `entropy_model_order` parameter of
    :py:class:`SamsRegressor <pyoptex.analysis.estimators.sams.estimator.SamsRegressor>`.
    This parameter should be used with care, as it comes with certain
    assumptions (which are asserted in many, but not all, scenarios).

    First, the heredity mode must be 'weak', otherwise the sampling method is still
    applied. Second, the model must be generated using
    :py:func:`partial_rsm_names <pyoptex.utils.model.partial_rsm_names>` followed by
    :py:func:`model2Y2X <pyoptex.utils.model.model2Y2X>`. Third, the factors must
    be ordered: first all factors which can have a quadratic effect, then
    all factors which cannot have quadratic effects but can have two-factor interactions,
    and finally all factors which can only have a main effect. Finally, the dependency
    matrix must be generated using
    :py:func:`order_dependencies <pyoptex.utils.model.order_dependencies>`.

As an example, create three factors

>>> factors = [
>>>     Factor('A'), Factor('B'), Factor('C')
>>> ]

Next, create the model orders. The order of the factor names in the dictionary
**must** be the same as those in the list of factors. They also must be
ordered `quad` - `tfi` - `lin`.

>>> entropy_model_order = {'A': 'quad', 'B': 'tfi', 'C': 'lin'}

Create the model using :py:func:`partial_rsm_names <pyoptex.utils.model.partial_rsm_names>`.
Note that the `quad` elements are first, then the `tfi`, and finally the `lin` elements.
The dictionary parameters **must** be in the same order as the factors.

>>> model = partial_rsm_names(entropy_model_order)
>>> Y2X = model2Y2X(model, factors)

Next, create the dependencies from the model

>>> dep = order_dependencies(model, factors)

Finally, we can fit SAMS using the exact entropy formula

>>> regr = SamsRegressor(
>>>     factors, Y2X,
>>>     mode='weak', dependencies=dep,
>>>     forced_model=np.array([0], np.int_),
>>>     entropy_model_order=entropy_model_order
>>> )

src/pyoptex/analysis/estimators/sams/entropy.py

Lines changed: 5 additions & 137 deletions

@@ -4,143 +4,11 @@

 import numpy as np

-from ....utils.model import sample_model_dep
+from ....utils.model import sample_model_dep_onebyone
 from ....utils.numba import numba_int2bool

-from .models.model import Model
-def sample_mcmc(dep, size, forced=None, mode=None, N=1, skip=10):
-    # Create the SAMS modeller
-    m = Model(np.zeros((0, len(dep))), np.zeros((0,)), mode=mode, forced=forced, dep=dep)
-
-    # Initialize a random model
-    model = np.zeros((size,), dtype=np.int_)
-    m.init(model)
-
-    # Intialize the samples
-    samples = np.zeros((N, size), dtype=np.int_)
-
-    # Warmup phase
-    for i in range(1000):
-        m.mutate(model)
-
-    # Main sampling loop
-    for i in range(N*skip):
-        # Mutate the model
-        m.mutate(model)
-
-        # Every skip, store the result
-        if i % skip == 0:
-            samples[int(i/skip)] = model
-
-    return samples
-
-def sample_random(dep, size, forced=None, mode=None, N=1):
-    assert mode == 'weak', 'Mode must be weak'
-
-    #########################
-    # Initialize number of dependencies
-    nb_dep = np.ma.masked_where(~dep, np.zeros_like(dep, dtype=np.int_)).harden_mask()
-
-    # At the true positions in these columns, set a 1
-    affected = ~np.any(dep, axis=1)
-    nb_dep[:, affected] = 1
-    affected = np.any(dep[:, affected], axis=1)
-
-    while np.any(affected):
-        # Alter the affected positions
-        nb_dep[:, affected] = np.min(nb_dep[affected], axis=1).compressed() + 1
-        affected = np.any(dep[:, affected], axis=1)
-
-    #########################
-
-    # Initialize the models
-    models = np.zeros((N, size), dtype=np.int_)
-    models[:, :forced.size] = forced
-
-    # Fix the forced model
-    if forced is not None and forced.size > 0:
-        # Convert submodel to binary array
-        affected = model[:forced.size]
-        submodelb = np.zeros(len(dep), dtype=np.int_)
-        submodelb[affected] = 1
-
-        # Update the model
-        nb_dep[:, affected] -= 1
-        affected = np.any(dep[:, affected], axis=1)
-        while np.any(affected):
-            # Alter the affected positions
-            nb_dep[:, affected] = np.min(nb_dep[affected], axis=1) - submodelb[affected] + 1
-            affected = np.any(dep[:, affected], axis=1)
-
-    # Sample all models
-    for model in models:
-        # Initialize i
-        i = forced.size
-        j = forced.size
-        nb_dep_ = nb_dep.copy()
-
-        # Loop until a full model
-        while i < size:
-
-            # Compute the minimal path for each term
-            min_path = np.min(nb_dep_, axis=1).filled(0)
-
-            # Sample the first
-            choices = np.ones(len(dep), dtype=np.bool_)
-            choices[min_path >= size - i] = False # Remove those with too many dependencies
-            choices[model[:i]] = False # Remove already in the model
-            choices = np.flatnonzero(choices)
-            model[i] = np.random.choice(choices)
-
-            # TODO: purely random sampling is a problem for true sampling
-
-            # Check if already hereditary
-            if min_path[model[i]] > 0:
-                # Update with dependencies
-                choices = np.copy(dep[model[i]])
-                choices[min_path >= size - i - 1] = False
-                choices[model[:i+1]] = False
-                choices = np.flatnonzero(choices)
-
-                # Check if there are any choices
-                while choices.size != 0:
-                    # Sample a new term
-                    i += 1
-                    model[i] = np.random.choice(choices)
-
-                    # Check for heredity
-                    if min_path[model[i]] <= 0:
-                        break
-
-                    # Determine new choices
-                    choices = np.copy(dep[model[i]])
-                    choices[min_path >= size - i - 1] = False
-                    choices[model[:i+1]] = False
-                    choices = np.flatnonzero(choices)
-
-            # Increase the model size
-            i += 1
-
-            # Convert submodel to binary array
-            affected = model[j:i]
-            submodelb = np.zeros(len(dep), dtype=np.int_)
-            submodelb[affected] = 1
-
-            # Update the model
-            nb_dep_[:, affected] -= 1
-            affected = np.any(dep[:, affected], axis=1)
-            while np.any(affected):
-                # Alter the affected positions
-                nb_dep_[:, affected] = np.min(nb_dep_[affected], axis=1) - submodelb[affected] + 1
-                affected = np.any(dep[:, affected], axis=1)
-
-            # Set j to i for next iteration
-            j = i
-
-    return models
-
 def entropies_approx(submodels, freqs, model_size, dep, mode,
-                     forced=None, N=10000, eps=1e-6):
+                     forced=None, N=10000, sampler=sample_model_dep_onebyone, eps=1e-6):
     """
     Compute the approximate entropy by sampling N random models
     and observing the frequency of each submodel.
@@ -177,6 +45,8 @@ def entropies_approx(submodels, freqs, model_size, dep, mode,
     N : int
         The number of random samples to draw to compute the
         theoretical frequency of a submodel.
+    sampler : func(dep, model_size, N, forced, mode)
+        The sampler to use when generating random hereditary models.
     eps : float
         A numerical stability parameter in computing the entropy.

@@ -186,9 +56,7 @@ def entropies_approx(submodels, freqs, model_size, dep, mode,
         An array of floats of the same length as the submodels.
     """
     # Generate random samples
-    # samples = sample_model_dep(dep, model_size, N, forced, mode)
-    # samples = sample_mcmc(dep, model_size, forced, mode, N, skip=10)
-    samples = sample_random(dep, model_size, forced, mode, N)
+    samples = sampler(dep, model_size, N, forced, mode)

     # Convert samples to a boolean array
     samples = numba_int2bool(samples, len(dep))