- Profile before optimizing: use `cProfile`, `line_profiler`, or `py-spy` to find actual bottlenecks.
- Prefer list comprehensions and generator expressions to explicit loops for data transformation: comprehensions are typically faster in CPython, and generator expressions avoid materializing intermediate lists.
- Once profiling confirms a numeric loop dominates runtime, move it to NumPy vectorized operations or a C extension.
- Use `functools.lru_cache` (or `functools.cache`, Python 3.9+) to memoize pure functions that are called repeatedly with the same arguments.
- Use `concurrent.futures.ThreadPoolExecutor` for I/O-bound parallelism and `ProcessPoolExecutor` for CPU-bound work, since separate processes sidestep the GIL.
- Define `__slots__` on frequently instantiated classes to drop the per-instance `__dict__` and reduce memory overhead.
- Use `collections.deque` instead of `list` when you need O(1) appends and pops from both ends; `list.pop(0)` is O(n).
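
The profiling tip above can be sketched with `cProfile` from the standard library; the function `slow_sum` is a made-up stand-in for real application code:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive loop so the profiler has something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Print the top entries sorted by cumulative time; slow_sum should dominate.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

`line_profiler` and `py-spy` are third-party tools installed separately; `py-spy` can attach to an already-running process without code changes.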
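A minimal illustration of the comprehension-vs-generator distinction, using toy data:

```python
nums = list(range(10))

# List comprehension: builds the whole result list in one pass.
squares = [n * n for n in nums]

# Generator expression: feeds sum() lazily, no intermediate list is created.
total = sum(n * n for n in nums)
```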
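A sketch of the vectorization tip, assuming NumPy is installed; the single expression below replaces a per-element Python loop with one C-level loop:

```python
import numpy as np

values = np.arange(1_000, dtype=np.float64)

# Vectorized: sqrt and multiply run element-wise in compiled code.
result = np.sqrt(values) * 2.0
```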
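The memoization tip, shown on the classic recursive Fibonacci (a pure function, so caching is safe):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this is exponential; with it, each n is computed once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

value = fib(30)
```

`fib.cache_info()` reports hits and misses, which is handy for checking that the cache is actually being used.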
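A thread-pool sketch for the executor tip; `fetch` is a hypothetical stand-in for an I/O-bound call such as a network request. For CPU-bound work, swapping in `ProcessPoolExecutor` has the same `map` interface, but the submitted function must be a picklable top-level function:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder for I/O-bound work (e.g. an HTTP request); here it just
    # returns the URL length so the example is self-contained.
    return len(url)

urls = ["https://example.com/a", "https://example.com/bb"]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order regardless of completion order.
    lengths = list(pool.map(fetch, urls))
```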
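A minimal `__slots__` example; the class name is illustrative:

```python
class Point:
    # __slots__ replaces the per-instance __dict__ with fixed attribute storage,
    # shrinking each instance and rejecting undeclared attributes.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
```

A side effect worth knowing: assigning an attribute not listed in `__slots__` (e.g. `p.z = 3`) raises `AttributeError`.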
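The `deque` tip in action, here as a bounded sliding window:

```python
from collections import deque

# maxlen makes the deque a fixed-size window: the oldest item is
# discarded automatically when a new one is appended.
window = deque(maxlen=3)
for item in range(5):
    window.append(item)  # O(1); appendleft/pop/popleft are O(1) too
```

By contrast, `list.pop(0)` shifts every remaining element, so queue-like usage of a plain list degrades to O(n) per operation.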