MAX Engine

Modular MAX engine patterns for AI inference and high-performance Mojo compute graphs

Details

Language / Topic: Mojo
Category: framework
Compatible Frameworks: max-engine

Rules

- Load models with `engine.InferenceSession` and call `session.execute(model_path, inputs)` — never use raw Mojo tensor ops for inference when MAX handles the graph.
- Pass inputs as `max.engine.TensorMap` with named keys matching the model's input tensor names — mismatched names raise a runtime error.
- Use `MAX_DEVICE=cpu` or `MAX_DEVICE=gpu` environment variables to select compute device at runtime rather than hardcoding device strings.
- Compile models once with `max build` and cache the `.maxrt` artifact — recompiling on every run adds startup latency to production inference pipelines.
- Use `session.execute_async` for non-blocking inference in server contexts — it returns a future that integrates with async Mojo or Python event loops.
- Validate tensor shapes against the model's declared input shapes before calling `execute`, so shape mismatches surface early with clear error messages instead of failing deep inside the graph.
- Use `TensorSpec` to declare expected dtypes and ranks when building preprocessing pipelines — this documents the contract between preprocessing and the model.
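The named-input rule above can be sketched in plain Python. This is not the MAX `TensorMap` type itself, only an illustrative dict-based stand-in; the helper name `make_input_map` and its fail-fast key checks are assumptions for the example:

```python
# Build an input map keyed by the model's input tensor names, failing fast
# when keys don't match -- the same mismatch that would otherwise surface as
# a runtime error inside the engine.
def make_input_map(model_input_names, **tensors):
    unknown = set(tensors) - set(model_input_names)
    if unknown:
        raise KeyError(f"no such model inputs: {sorted(unknown)}")
    missing = set(model_input_names) - set(tensors)
    if missing:
        raise KeyError(f"inputs not provided: {sorted(missing)}")
    return dict(tensors)

# Usage: keys must match the names the model declares for its inputs.
inputs = make_input_map(
    ["input_ids", "attention_mask"],
    input_ids=[[101, 2023]],
    attention_mask=[[1, 1]],
)
```

Checking keys at construction time moves the failure to the call site that built the map, where the fix is obvious.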
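For the device-selection rule, a minimal sketch of reading `MAX_DEVICE` at runtime rather than hardcoding a device string. The variable name comes from the rule above; the helper name `resolve_device` and its `cpu` default are assumptions:

```python
import os

# Resolve the compute device from the MAX_DEVICE environment variable.
# Rejecting unknown values keeps typos (e.g. MAX_DEVICE=cuda) from being
# silently passed through to the engine.
def resolve_device(default="cpu"):
    device = os.environ.get("MAX_DEVICE", default).lower()
    if device not in ("cpu", "gpu"):
        raise ValueError(f"unsupported MAX_DEVICE value: {device!r}")
    return device

os.environ["MAX_DEVICE"] = "gpu"
print(resolve_device())  # prints "gpu"
```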
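The shape-validation rule can be sketched as a pre-flight check run before `execute`. The function name `validate_shapes` and the `-1`-marks-a-dynamic-dimension convention are assumptions for this example, not MAX API:

```python
# Pre-flight shape check: compare each input's shape against the model's
# declared input shape so mismatches fail here, with a clear message,
# rather than deep inside the inference call. -1 marks a dynamic dim.
def validate_shapes(declared, inputs):
    """declared: {name: shape tuple}; inputs: {name: shape tuple}."""
    missing = set(declared) - set(inputs)
    if missing:
        raise ValueError(f"missing input tensors: {sorted(missing)}")
    for name, spec in declared.items():
        shape = inputs[name]
        if len(shape) != len(spec):
            raise ValueError(
                f"{name}: rank {len(shape)} does not match declared rank {len(spec)}"
            )
        for dim, (got, want) in enumerate(zip(shape, spec)):
            if want != -1 and got != want:
                raise ValueError(f"{name}: dim {dim} is {got}, expected {want}")

# Usage: an image model with a dynamic batch dimension.
declared = {"pixel_values": (-1, 3, 224, 224)}
validate_shapes(declared, {"pixel_values": (8, 3, 224, 224)})  # passes
```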
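Finally, the `TensorSpec` rule is about documenting the contract between preprocessing and the model. An illustrative stand-in, assuming nothing about MAX's own `TensorSpec` beyond "declared dtype and rank" (the `Spec` class and `check` method here are hypothetical):

```python
from dataclasses import dataclass

# A TensorSpec-style declaration: each preprocessing stage records the dtype
# and rank it promises to produce, and candidate outputs are checked against
# it before being handed to the model.
@dataclass(frozen=True)
class Spec:
    name: str
    dtype: str
    rank: int

    def check(self, dtype, shape):
        if dtype != self.dtype:
            raise TypeError(f"{self.name}: dtype {dtype}, expected {self.dtype}")
        if len(shape) != self.rank:
            raise ValueError(f"{self.name}: rank {len(shape)}, expected {self.rank}")

# Contract for a tokenizer feeding the model's first input.
spec = Spec("input_ids", dtype="int64", rank=2)
spec.check("int64", (1, 128))  # passes
```

Because the spec lives in one place, a change to the model's inputs forces a visible change to the preprocessing contract rather than a silent drift.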