Performance Issue: Unnecessary Matrix Copy in MMCS Sampling

# Performance Issue: Unnecessary Matrix Copy in MMCS Sampling

## 📋 Summary

The MMCS (Multiphase Monte Carlo Sampling) implementation performs an expensive matrix transpose operation that creates an unnecessary full matrix copy. This impacts performance, especially for high-dimensional sampling scenarios with large sample sizes.

## 🔍 Problem Description

### Current Implementation

In `include/sampling/mmcs.hpp` (line 234) and related files, the code performs:

```cpp
MT Samples = TotalRandPoints.transpose(); //do not copy TODO!
for (int i = 0; i < total_samples; i++)
{
    Samples.col(i) = T * Samples.col(i) + T_shift;
}
S.conservativeResize(P.dimension(), total_number_of_samples_in_P0 + total_samples);
S.block(0, total_number_of_samples_in_P0, P.dimension(), total_samples) = 
    Samples.block(0, 0, P.dimension(), total_samples);
```

### The Problem

**Current Flow (Inefficient):**

```
TotalRandPoints          [EXPENSIVE COPY]        Samples              S (Final)
(n × d matrix)          transpose() creates    (d × n matrix)       (d × n matrix)
┌─────────────┐         full copy              ┌─────────────┐     ┌─────────────┐
│ Row 0: x₁   │ ──────────────────────────────>│ Col 0: x₁   │     │ Col 0: Tx₁  │
│ Row 1: x₂   │         ❌ Memory Copy          │ Col 1: x₂   │     │ Col 1: Tx₂  │
│ Row 2: x₃   │         ❌ O(n×d) operation    │ Col 2: x₃   │     │ Col 2: Tx₃  │
│    ...      │                                  │    ...      │     │    ...      │
│ Row n: xₙ   │                                  │ Col n: xₙ   │     │ Col n: Txₙ  │
└─────────────┘                                  └─────────────┘     └─────────────┘
     │                                                  │                    │
     │                                                  │                    │
     │                    Transform: T * col + shift   │                    │
     │                                                  └────────────────────┘
     │                                                          │
     └──────────────────────────────────────────────────────────┘
                    Total: 2× memory operations + transpose overhead
```

**Memory Flow:**
1. Read `TotalRandPoints` (n×d elements)
2. **Allocate new matrix** `Samples` (d×n elements) ← **WASTED MEMORY**
3. **Copy and transpose** all elements ← **EXPENSIVE OPERATION**
4. Transform each column: `T * Samples.col(i) + T_shift`
5. Copy transformed data to `S`

**Issues:**
1. **Memory Overhead**: Creates a full copy of the transposed matrix
   - For typical case: 50 dimensions × 5000 samples = 250,000 elements
   - Memory: ~2MB (double precision) wasted per iteration
   
2. **Computation Overhead**: O(n×d) transpose operation
   - Unnecessary for the actual computation needed
   
3. **Memory Bandwidth**: Doubles memory traffic
   - Read from `TotalRandPoints` → Write to `Samples` → Read from `Samples` → Write to `S`

### Matrix Dimensions

```
TotalRandPoints: (num_samples × dimension)
  - Each row represents one sample point
  - Shape: [total_samples, P.dimension()]

Samples (transposed copy): (dimension × num_samples)
  - Each column represents one sample point (transposed)
  - Shape: [P.dimension(), total_samples]

S (final result): (dimension × total_samples)
  - Each column is a transformed sample
  - Shape: [P.dimension(), total_number_of_samples_in_P0 + total_samples]
```

## 💡 Proposed Solution

### Optimized Implementation

Instead of creating a transposed copy, work directly with rows of `TotalRandPoints`:

```cpp
// Optimized: avoid transpose copy by working directly with rows
S.conservativeResize(P.dimension(), total_number_of_samples_in_P0 + total_samples);
for (int i = 0; i < total_samples; i++)
{
    // Transform each sample: T * sample + T_shift, then store as column in S
    S.col(total_number_of_samples_in_P0 + i).noalias() = 
        T * TotalRandPoints.row(i).transpose() + T_shift;
}
```

### Optimized Flow

**Optimized Flow (Efficient):**

```
TotalRandPoints                                        S (Final)
(n × d matrix)                                         (d × n matrix)
┌─────────────┐                                       ┌─────────────┐
│ Row 0: x₁   │ ──Direct: T * row(i).T + shift──────>│ Col 0: Tx₁  │
│ Row 1: x₂   │    ✅ No copy                        │ Col 1: Tx₂  │
│ Row 2: x₃   │    ✅ Single pass                     │ Col 2: Tx₃  │
│    ...      │    ✅ Efficient                       │    ...      │
│ Row n: xₙ   │                                       │ Col n: Txₙ  │
└─────────────┘                                       └─────────────┘
     │                                                       │
     └───────────────────────────────────────────────────────┘
                    Direct transformation, no intermediate copy
```

**Memory Flow:**
1. Read `TotalRandPoints.row(i)` (single row, d elements)
2. Transform: `T * row(i).transpose() + T_shift` (in-place computation)
3. Assign directly to `S.col(i)` with `noalias()` ← **NO COPY**
4. Repeat for all rows

**Key Improvement:**
- ❌ **Before**: Read → Copy → Transpose → Transform → Copy = 5 operations
- ✅ **After**: Read → Transform → Assign = 3 operations

**Benefits:**
1. ✅ **No Memory Copy**: Eliminates intermediate `Samples` matrix
2. ✅ **Direct Computation**: Transform and assign in one step
3. ✅ **Better Cache Usage**: Single pass through data
4. ✅ **Uses `noalias()`**: Prevents Eigen from creating temporaries


### References

- [Eigen Documentation - Aliasing](https://eigen.tuxfamily.org/dox/group__TopicAliasing.html)
- [Eigen Documentation - Matrix Operations](https://eigen.tuxfamily.org/dox/group__TutorialMatrixArithmetic.html)



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance Issue: Unnecessary Matrix Copy in MMCS Sampling #413

Performance Issue: Unnecessary Matrix Copy in MMCS Sampling

📋 Summary

🔍 Problem Description

Current Implementation

The Problem

Matrix Dimensions

💡 Proposed Solution

Optimized Implementation

Optimized Flow

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Performance Issue: Unnecessary Matrix Copy in MMCS Sampling #413

Description

Performance Issue: Unnecessary Matrix Copy in MMCS Sampling

📋 Summary

🔍 Problem Description

Current Implementation

The Problem

Matrix Dimensions

💡 Proposed Solution

Optimized Implementation

Optimized Flow

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions