- Profile code with `profile on; myFunc(); profile viewer` before optimizing — the Profile Viewer identifies the actual bottleneck rather than guessing.
- Replace element-wise `for` loops with vectorized matrix operations: `C = A + B` instead of `for i=1:n; C(i) = A(i) + B(i); end`.
- Use `parfor` loops from the Parallel Computing Toolbox for embarrassingly parallel iterations — drop-in replacement for `for` when iterations are independent.
- Pre-allocate output arrays with `zeros`, `ones`, or `cell(m,n)` before filling them in a loop — growing an array element-by-element forces MATLAB to reallocate and copy it on every iteration, turning an O(n) loop into O(n²) work.
- Use `single` precision instead of `double` for large arrays when 32-bit accuracy is sufficient — it halves memory usage and improves cache performance.
- Use `bsxfun` (pre-R2016b) or implicit expansion (R2016b+) to avoid creating large intermediate arrays with `repmat` in element-wise operations.
- Cache expensive computations inside functions with `persistent` variables — they survive between calls without requiring global state, though they are reset by `clear functions` or by editing the file.
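A minimal sketch of the profiling workflow from the first tip — `slowFunc` stands in for whatever function you suspect is slow:

```matlab
profile on                 % start collecting timing data
slowFunc();                % hypothetical function under investigation
profile viewer             % open the Profile Viewer to inspect hotspots
p = profile('info');       % or capture the results programmatically
```

The Profile Viewer ranks functions and lines by self-time, so you optimize what is actually slow rather than what you assume is slow.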
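The vectorization tip can be illustrated side by side; both forms compute the same result, but the vectorized version runs as a single optimized array operation:

```matlab
n = 1e6;
A = rand(n, 1);
B = rand(n, 1);

% Element-wise loop (slow):
C = zeros(n, 1);
for i = 1:n
    C(i) = A(i) + B(i);
end

% Vectorized equivalent (fast):
C2 = A + B;
```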
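A sketch of the `parfor` tip, assuming the Parallel Computing Toolbox is installed — `expensiveStep` is a hypothetical placeholder for any per-iteration workload with no cross-iteration dependencies:

```matlab
n = 8;
out = zeros(1, n);
parfor i = 1:n
    % Each iteration must be independent of the others;
    % MATLAB distributes them across a parallel pool of workers.
    out(i) = expensiveStep(i);   % hypothetical worker function
end
```

If no pool is open, `parfor` starts one automatically (or falls back to serial execution without the toolbox).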
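The pre-allocation tip, shown both ways — the grown array is reallocated and copied repeatedly, while the pre-allocated one is written in place:

```matlab
n = 1e5;

% Growing dynamically (repeated reallocation and copying):
x = [];
for i = 1:n
    x(end+1) = i^2;   %#ok<SAGROW>
end

% Pre-allocated (single allocation, in-place writes):
y = zeros(1, n);
for i = 1:n
    y(i) = i^2;
end
```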
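A quick illustration of the `single`-precision tip — a 5000×5000 `single` array takes 100 MB versus 200 MB for `double`:

```matlab
A = rand(5000, 'single');   % allocate directly as single (100 MB)
D = rand(5000);             % default double (200 MB)
S = single(D);              % or convert an existing double array
whos A D S                  % compare the Bytes column
```

Most element-wise and linear-algebra operations accept `single` inputs directly; verify that ~7 decimal digits of precision is acceptable for your algorithm before converting.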
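The broadcasting tip, sketched on a common task (centering the columns of a matrix) — all three forms produce the same result, but only the `repmat` version materializes a full-size intermediate array:

```matlab
A  = rand(1000, 3);
mu = mean(A, 1);                     % 1x3 row vector of column means

Ac1 = A - repmat(mu, 1000, 1);       % builds a 1000x3 intermediate
Ac2 = A - mu;                        % implicit expansion (R2016b+), no intermediate
Ac3 = bsxfun(@minus, A, mu);         % pre-R2016b equivalent
```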
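A minimal sketch of the `persistent` caching tip — `expensiveBuild` is a hypothetical setup function that should run only once:

```matlab
function y = cachedLookup(x)
    persistent table              % retains its value between calls
    if isempty(table)             % persistent vars start empty
        table = expensiveBuild(); % hypothetical; runs on first call only
    end
    y = table(x);
end
```

Unlike a `global`, the cache is visible only inside `cachedLookup`, so callers cannot accidentally clobber it.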