MAX Engine

Modular MAX engine patterns for AI inference and high-performance Mojo compute graphs

Details

Language / Topic: Mojo
Category: framework
Compatible Frameworks: max-engine

Rules

- Load models with `engine.InferenceSession` and call `session.execute(model_path, inputs)` — never use raw Mojo tensor ops for inference when MAX handles the graph.
- Pass inputs as `max.engine.TensorMap` with named keys matching the model's input tensor names — mismatched names raise a runtime error.
- Use `MAX_DEVICE=cpu` or `MAX_DEVICE=gpu` environment variables to select compute device at runtime rather than hardcoding device strings.
- Compile models once with `max build` and cache the `.maxrt` artifact — recompiling on every run adds startup latency to production inference pipelines.
- Use `session.execute_async` for non-blocking inference in server contexts — it returns a future that integrates with async Mojo or Python event loops.
- Validate tensor shapes against the model's declared input shapes before calling `execute`, so shape mismatches surface early with clear error messages instead of failing deep inside the graph.
- Use `TensorSpec` to declare expected dtypes and ranks when building preprocessing pipelines — this documents the contract between preprocessing and the model.
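The named-input rule above can be sketched in plain Python. This is not the MAX `TensorMap` type itself, only an illustrative dict-based stand-in; the helper name `make_input_map` and its fail-fast key checks are assumptions for the example:

```python
# Build an input map keyed by the model's input tensor names, failing fast
# when keys don't match -- the same mismatch that would otherwise surface as
# a runtime error inside the engine.
def make_input_map(model_input_names, **tensors):
    unknown = set(tensors) - set(model_input_names)
    if unknown:
        raise KeyError(f"no such model inputs: {sorted(unknown)}")
    missing = set(model_input_names) - set(tensors)
    if missing:
        raise KeyError(f"inputs not provided: {sorted(missing)}")
    return dict(tensors)

# Usage: keys must match the names the model declares for its inputs.
inputs = make_input_map(
    ["input_ids", "attention_mask"],
    input_ids=[[101, 2023]],
    attention_mask=[[1, 1]],
)
```

Checking keys at construction time moves the failure to the call site that built the map, where the fix is obvious.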
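For the device-selection rule, a minimal sketch of reading `MAX_DEVICE` at runtime rather than hardcoding a device string. The variable name comes from the rule above; the helper name `resolve_device` and its `cpu` default are assumptions:

```python
import os

# Resolve the compute device from the MAX_DEVICE environment variable.
# Rejecting unknown values keeps typos (e.g. MAX_DEVICE=cuda) from being
# silently passed through to the engine.
def resolve_device(default="cpu"):
    device = os.environ.get("MAX_DEVICE", default).lower()
    if device not in ("cpu", "gpu"):
        raise ValueError(f"unsupported MAX_DEVICE value: {device!r}")
    return device

os.environ["MAX_DEVICE"] = "gpu"
print(resolve_device())  # prints "gpu"
```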
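The shape-validation rule can be sketched as a pre-flight check run before `execute`. The function name `validate_shapes` and the `-1`-marks-a-dynamic-dimension convention are assumptions for this example, not MAX API:

```python
# Pre-flight shape check: compare each input's shape against the model's
# declared input shape so mismatches fail here, with a clear message,
# rather than deep inside the inference call. -1 marks a dynamic dim.
def validate_shapes(declared, inputs):
    """declared: {name: shape tuple}; inputs: {name: shape tuple}."""
    missing = set(declared) - set(inputs)
    if missing:
        raise ValueError(f"missing input tensors: {sorted(missing)}")
    for name, spec in declared.items():
        shape = inputs[name]
        if len(shape) != len(spec):
            raise ValueError(
                f"{name}: rank {len(shape)} does not match declared rank {len(spec)}"
            )
        for dim, (got, want) in enumerate(zip(shape, spec)):
            if want != -1 and got != want:
                raise ValueError(f"{name}: dim {dim} is {got}, expected {want}")

# Usage: an image model with a dynamic batch dimension.
declared = {"pixel_values": (-1, 3, 224, 224)}
validate_shapes(declared, {"pixel_values": (8, 3, 224, 224)})  # passes
```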
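Finally, the `TensorSpec` rule is about documenting the contract between preprocessing and the model. An illustrative stand-in, assuming nothing about MAX's own `TensorSpec` beyond "declared dtype and rank" (the `Spec` class and `check` method here are hypothetical):

```python
from dataclasses import dataclass

# A TensorSpec-style declaration: each preprocessing stage records the dtype
# and rank it promises to produce, and candidate outputs are checked against
# it before being handed to the model.
@dataclass(frozen=True)
class Spec:
    name: str
    dtype: str
    rank: int

    def check(self, dtype, shape):
        if dtype != self.dtype:
            raise TypeError(f"{self.name}: dtype {dtype}, expected {self.dtype}")
        if len(shape) != self.rank:
            raise ValueError(f"{self.name}: rank {len(shape)}, expected {self.rank}")

# Contract for a tokenizer feeding the model's first input.
spec = Spec("input_ids", dtype="int64", rank=2)
spec.check("int64", (1, 128))  # passes
```

Because the spec lives in one place, a change to the model's inputs forces a visible change to the preprocessing contract rather than a silent drift.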