gpt-oss-20B, Qwen3, Gemma 3, Mistral — converted with Apple’s official recipes, unmodified, with hashes and measured benchmarks.
Core AI is the successor to Core ML, announced at WWDC 2026. Apple publishes official export recipes for a handful of models in coreai-models.
But that’s conversion scripts only — no pre-converted models are distributed (in the Core ML era, they were).
So I ran the conversions on my own machine and published the resulting .aimodel bundles for 7 models on Hugging Face:

https://huggingface.co/mlboydaisuke
Every model card includes measured benchmark numbers. There’s a sample app, too.

https://github.com/john-rocky/coreai-samples
All measurements use Apple’s official llm-benchmark (512-token prompt / 1024 generated / greedy / warm).
Why distribute pre-converted models at all?
“Anyone can produce an .aimodel by running coreai.llm.export.” True, but there are still two reasons to host the artifacts.
1. Conversion is heavy; inference is light
The export needs a lot of RAM (the gpt-oss-20B conversion ran on a 128 GB Mac).
Running, on the other hand, only needs enough memory to mmap the artifact. With pre-converted bundles hosted, you can try a model directly on the machine you will actually run it on, without first finding a larger machine for the export.
Bonus: exporting Mistral-7B yourself means a 27 GB download (the source repo ships a duplicate consolidated.safetensors). Pulling the 4.1 GB converted bundle is dramatically faster.
2. An .aimodel is a build artifact, not a pure function of the recipe
This is the real reason.
The same export command, the same code, and the same wheels produced a roughly 2.3× slower artifact when the exporting OS changed from macOS 26 to 27β (Qwen3-0.6B: 1,121 → 484 tok/s).
The macOS 26 artifact carries the native quantized-Linear lowering (zero dequant ops in the program); the 27β re-export lowers to explicit dequant. Details in the forensics write-up in my benchmark repo.
In other words, “the recipe is public” does not guarantee reproducibility. Hosted artifacts with hashes are the reproducible ground truth.
Every published bundle is hash-identical to the one measured in apple-silicon-llm-bench. I verified the round trip by re-downloading gpt-oss-20B from Hugging Face and confirming that the SHA-256 of main.mlirb matches the value on the model card.
The Qwen3-0.6B repo also ships the “fast one”: the macos-26-export bundle is included as-is.
How to use them
Download
hf download mlboydaisuke/gpt-oss-20b-CoreAI-official
From a Swift app (via FoundationModels)
import FoundationModels
import CoreAILanguageModels
let model = try await CoreAILanguageModel(resourcesAt: modelURL) // → the macos/ folder
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "What is quantum computing?")
CLI (from a coreai-models checkout)
swift run -c release llm-runner --model <bundle>/macos --prompt "Hello"
swift run -c release llm-benchmark --model <bundle>/macos
GUI
In CoreAIChatMac, just point “Choose Models Folder…” at the download directory.
A note on the iOS bundles (Qwen3 0.6B / 4B)
iOS cannot JIT the exported IR, so AOT compilation is mandatory before the model runs on device:
xcrun coreai-build compile <ir>.aimodel \
--platform iOS --preferred-compute neural-engine --architecture h18p
# h18p = iPhone 17 Pro. Then point assets.main in metadata.json at the .aimodelc
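The metadata.json edit can be scripted. The sketch below assumes jq is installed and that assets.main is a plain string path. Only the key name comes from the step above; the rest of the file layout, the helper name set_main_asset, and whether the path should be relative to metadata.json are assumptions to check against your own bundle:

```shell
# Point assets.main in a bundle's metadata.json at a compiled .aimodelc.
# Usage: set_main_asset <metadata.json> <path-to-.aimodelc>
# (set_main_asset is an illustrative helper, not part of coreai-models.)
set_main_asset() {
  local metadata="$1" compiled="$2" tmp
  tmp="$(mktemp)" || return 1
  # jq writes the updated JSON to a temp file; the original is replaced
  # only if jq succeeded, so a parse error leaves metadata.json untouched.
  if jq --arg p "$compiled" '.assets.main = $p' "$metadata" > "$tmp"; then
    mv "$tmp" "$metadata"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

All other keys in the file are preserved; only assets.main is rewritten.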
I covered these gotchas in my Core AI vs MLX vs CoreML benchmark.
Notes on gpt-oss-20B
- OpenAI’s shipped MXFP4 quantization passes straight through (no additional quantization; the conversion takes ~3 minutes)
- On an M4 Max: 78 tok/s decode / 1,252 tok/s prefill / 2.1 s warm load / 33.9 GB peak RSS
- COREAI_CHUNK_THRESHOLD acts as a prefill-memory dial on this MoE: a 4,096-token prefill runs at 1,439 tok/s with an 18 GB dirty footprint unchunked, or 766 tok/s at 1.7 GB with chunk-128
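To put the chunking trade-off in wall-clock terms, here is a rough calculation from the two throughput figures above. It assumes throughput is constant across the whole prompt, which is a simplification, and prefill_seconds is a hypothetical helper:

```shell
# Seconds needed to prefill N tokens at a given throughput (tokens/second).
# Usage: prefill_seconds <tokens> <tok_per_s>
prefill_seconds() {
  awk -v n="$1" -v tps="$2" 'BEGIN { printf "%.2f\n", n / tps }'
}

prefill_seconds 4096 1439   # unchunked, 18 GB dirty footprint  -> 2.85
prefill_seconds 4096 766    # chunk-128, 1.7 GB dirty footprint -> 5.35
```

Under that assumption, chunking costs about 2.5 extra seconds on a 4,096-token prompt in exchange for roughly 16 GB less dirty memory.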
Related repos:
- 📊 Benchmark (methodology, raw JSONL, Swift adapters — all public): https://github.com/john-rocky/apple-silicon-llm-bench
- 🧰 Community models: https://github.com/john-rocky/coreai-model-zoo
- 📱 Sample apps: https://github.com/john-rocky/coreai-samples
🐣
I’m a freelance engineer writing about AI.
If any of this sounds familiar, feel free to reach out:
“I want to build an AI app.”
“We built an AI feature, but we’re not sure how to run it in production.”
I take on both kinds of work, at prices without a middleman markup.
Work inquiries: rockyshikoku@gmail.com