Python Machine Learning Agent Rules
Project Context
You are building a machine learning project with Python 3.12+, using scikit-learn and/or PyTorch for modeling, pandas/polars for data processing, and MLflow or Weights & Biases for experiment tracking. Reproducibility and clean separation between data, features, and modeling code are primary concerns.
Code Style & Structure
- Use full type hints on all function signatures. Document tensor/array shapes in docstrings or inline comments: `# shape: (batch, seq_len, hidden_size)`.
- Follow PEP 8. Format with `ruff format`. Lint with `ruff check --select ALL`.
- Avoid magic numbers. Define all hyperparameters in a config dataclass or YAML file. Never hardcode learning rates, batch sizes, or architecture parameters in model files.
- Use `dataclasses` or Pydantic v2 `BaseModel` for configuration objects. Load configs at runtime.
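A minimal sketch of the config-object pattern above, using a frozen dataclass loaded from YAML (the field names and `TrainConfig`/`load_config` names are illustrative, not a project contract):

```python
# Hypothetical experiment config: a frozen dataclass loaded from YAML
# so no hyperparameter is hardcoded in model files.
from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    batch_size: int
    hidden_size: int
    max_epochs: int = 100


def load_config(path: Path) -> TrainConfig:
    """Load a TrainConfig from YAML; unknown keys raise TypeError."""
    raw = yaml.safe_load(path.read_text())
    return TrainConfig(**raw)
```

Because the dataclass is frozen, a loaded config cannot be mutated mid-run, which keeps logged hyperparameters honest.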
Project Structure
```
data/
  raw/                   # Original, immutable input data — never modified
  processed/             # Cleaned, feature-engineered, ready-for-model data
src/
  data/
    loaders.py           # Dataset loading, splitting (train/val/test)
    transforms.py        # Feature engineering, preprocessing pipelines
    validation.py        # Schema validation with pandera
  models/
    architectures.py     # Model class definitions
    training.py          # Training loop, optimizer setup, scheduling
    evaluation.py        # Metrics, confusion matrix, threshold analysis
  features/              # Feature selection, importance analysis
  utils/
    reproducibility.py   # Seed setting, deterministic flag configuration
    logging.py           # Structured logging, MLflow run context
configs/                 # YAML experiment configs per model variant
notebooks/               # Exploration only — no production code
tests/
models/                  # Saved checkpoints, ONNX exports
```
Data Processing
- Validate data at every pipeline entry point: check dtypes, null percentages, value ranges, and row counts.
- Use `pandera` or `great_expectations` schemas to assert data contracts. Fail fast on schema drift.
- Write idempotent pipelines — running the same pipeline twice on the same input must produce identical output.
- Log data statistics at each transformation step: shape, null counts, min/max, mean/std.
- Store intermediate results as Parquet files. Parquet preserves dtypes and is typically 10–20x faster to read than CSV.
- Use `sklearn.pipeline.Pipeline` to encapsulate preprocessing + model into a single estimator. This prevents data leakage during cross-validation because preprocessing steps are re-fit on each training fold rather than on the full dataset.
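The Pipeline rule above can be sketched as follows (synthetic data stands in for a real dataset; the step names are arbitrary):

```python
# Preprocessing + model as one sklearn Pipeline: the scaler's statistics
# are re-fit on each training fold inside cross_val_score, so validation
# rows never leak into the fit.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),           # fit on training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=5))
```

Fitting the scaler outside the pipeline (on all of `X`) would leak validation-fold statistics into training, which this structure makes impossible by construction.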
Model Development
- Implement the `fit/predict` contract for custom estimators. Subclass `sklearn.base.BaseEstimator` and `TransformerMixin` or `ClassifierMixin`.
- Keep model architectures (PyTorch `nn.Module`) separate from training loops. Training logic lives in `training.py`.
- Implement early stopping based on validation loss with a patience counter. Never train for a fixed number of epochs without a stopping criterion.
- Use cross-validation (`StratifiedKFold` for classification, `KFold` for regression) for model selection. Use the held-out test set only for final evaluation — never for model selection.
- Save checkpoints including model state, optimizer state, epoch number, and best validation metric.
- Use `torch.inference_mode()` (not `no_grad()`) for evaluation and inference. It disables gradient tracking plus autograd's view and version-counter bookkeeping, so it is both stricter and faster.
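The bullets above can be combined into one minimal training-loop sketch (regression loss, `AdamW`, and the checkpoint path are illustrative assumptions): early stopping on validation loss with a patience counter, checkpoints bundling model/optimizer/epoch/best metric, and evaluation under `torch.inference_mode()`.

```python
import torch
from torch import nn


def evaluate(model: nn.Module, loader) -> float:
    """Mean per-element validation loss under inference_mode."""
    model.eval()
    total, n = 0.0, 0
    with torch.inference_mode():  # no grad tracking, no autograd bookkeeping
        for x, y in loader:
            total += nn.functional.mse_loss(model(x), y, reduction="sum").item()
            n += y.numel()
    return total / n


def train(model, train_loader, val_loader, *, lr=1e-3, max_epochs=100, patience=5):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    best_val, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(x), y).backward()
            opt.step()
        val_loss = evaluate(model, val_loader)
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            # Checkpoint includes model state, optimizer state, epoch, best metric.
            torch.save({"model": model.state_dict(),
                        "optimizer": opt.state_dict(),
                        "epoch": epoch,
                        "best_val_loss": best_val}, "best.pt")
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # early stopping
                break
    return best_val
```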
Experiment Tracking
- Track every experiment run. Never run untracked experiments — use `mlflow.autolog()` or explicit `mlflow.log_params/metrics/artifacts`.
- Log: all hyperparameters, dataset version (hash or DVC tag), git commit hash, per-epoch metrics, and final model artifact.
- Use descriptive run names: `f"resnet50-lr{lr}-bs{batch_size}-{datetime.now():%Y%m%d}"`.
- Register production-ready models in the MLflow Model Registry. Use stage transitions (Staging → Production) or, on MLflow 2.9+, model version aliases (stages are deprecated there).
- Store confusion matrices, ROC curves, and feature importance plots as run artifacts.
Reproducibility
- Set seeds for all random sources: `random.seed(42)`, `np.random.seed(42)`, `torch.manual_seed(42)`, `torch.cuda.manual_seed_all(42)`.
- Set `torch.backends.cudnn.deterministic = True`, `torch.backends.cudnn.benchmark = False`, and call `torch.use_deterministic_algorithms(True)` when exact reproducibility is required (costs roughly 10% throughput).
- Pin all dependency versions in `pyproject.toml` with exact versions. Use `uv lock` or `pip-compile` for lockfiles.
- Use DVC to version datasets. Tag dataset versions in experiment logs.
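The seeding rules above fit naturally into one helper (a common convention; the `seed_everything` name is an assumption, not a library function):

```python
# Seed every random source in one place; deterministic=True applies the
# cuDNN flags at the cost of some throughput.
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42, deterministic: bool = False) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op without CUDA
    os.environ["PYTHONHASHSEED"] = str(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```

Call it once at the top of every entry point (training script, evaluation script, notebook) and log the seed with the run.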
Testing
- Unit-test data transformations with `pytest`. Assert output shape, dtype, and no-null guarantees.
- Smoke-test the full training pipeline: train for a few steps on synthetic data and verify the loss is finite and decreases.
- Assert model output shapes and dtypes for every model variant.
- Test that serialization round-trips are correct: save → load → predict → same results.
- Run tests in CI on every commit with `pytest --tb=short`.
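The testing bullets above can be sketched as pytest-style functions, using a stand-in `nn.Linear` model and synthetic tensors (any real model variant would be parametrized in):

```python
import io

import torch
from torch import nn


def test_output_shape_and_dtype():
    model = nn.Linear(10, 3)
    out = model(torch.randn(8, 10))
    assert out.shape == (8, 3)
    assert out.dtype == torch.float32


def test_training_smoke_loss_decreases():
    # Two full-batch steps on synthetic data; loss must drop and stay finite.
    torch.manual_seed(0)
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    losses = []
    for _ in range(2):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    assert losses[-1] < losses[0]
    assert all(l == l for l in losses)  # no NaNs


def test_serialization_round_trip():
    # save -> load -> predict -> identical results
    model = nn.Linear(10, 3)
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    buf.seek(0)
    clone = nn.Linear(10, 3)
    clone.load_state_dict(torch.load(buf))
    x = torch.randn(4, 10)
    assert torch.equal(model(x), clone(x))
```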
Performance
- Mitigate data-loading bottlenecks for GPU training with `torch.utils.data.DataLoader(num_workers=4, prefetch_factor=2, pin_memory=True)` — but confirm the bottleneck with a profiler first.
- Use `torch.compile(model)` (PyTorch 2.0+) for ~20–30% training speedup on modern hardware.
- Use mixed precision training with `torch.autocast` and `torch.amp.GradScaler` (the older `torch.cuda.amp` spellings are deprecated) to reduce memory usage and speed up GPU compute.
- Profile with `torch.profiler` to identify kernel bottlenecks. Use `torch.utils.bottleneck` for quick high-level profiling.