gpt-oss-20B, Qwen3, Gemma 3, Mistral — converted with Apple’s official recipes, unmodified, with hashes and measured benchmarks.

Core AI is the successor to Core ML, announced at WWDC 2026. Apple publishes official export recipes for a handful of models in the coreai-models repository.

But those are conversion scripts only. No pre-converted models are distributed (in the Core ML era, they were).

So I ran the conversions on my own machine and published the resulting .aimodel bundles for 7 models on Hugging Face:

https://huggingface.co/mlboydaisuke

Every model card includes measured benchmark numbers. There’s a sample app, too.

https://github.com/john-rocky/coreai-samples

All measurements use Apple’s official llm-benchmark (512-token prompt, 1,024 generated tokens, greedy decoding, warm runs).
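“Greedy” means the benchmark takes the highest-scoring token at every step instead of sampling, so for a given artifact the same prompt produces the same token sequence on every run (up to floating-point differences), which keeps runs comparable. A minimal sketch of that selection rule, assuming one step's logits arrive as a plain Float array; the function name greedyNextToken is hypothetical and not part of llm-benchmark:

```swift
/// Greedy decoding step: return the index of the largest logit.
/// Ties resolve to the lowest index, so the choice is deterministic.
/// Returns nil for an empty logit vector.
func greedyNextToken(logits: [Float]) -> Int? {
    guard var best = logits.indices.first else { return nil }
    for i in logits.indices where logits[i] > logits[best] {
        best = i
    }
    return best
}

// Usage: a toy 5-token vocabulary with a tie at indices 1 and 3.
let step = greedyNextToken(logits: [0.1, 2.5, -1.0, 2.5, 0.3])
print(step ?? -1)  // prints 1 (the first of the tied maxima)
```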

Why distribute pre-converted models at all?

“Anyone can produce an .aimodel by running coreai.llm.export.” True, but there are still two reasons to host the artifacts.

1. Conversion is heavy; inference is light

The export needs a lot of RAM (the gpt-oss-20B conversion ran on a 128 GB Mac).

Running, on the other hand, only needs enough memory to mmap the artifact. With pre-converted bundles hosted, you can try a model directly on the machine you actually run it on, without first finding a machine big enough to convert it.
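Memory-mapping is what keeps the inference side light: the file is mapped into the process's address space and pages are read from disk only when they are touched. Because those pages are file-backed, the OS can drop and re-read them instead of keeping a private copy in RAM. I'm not describing Core AI's loader internals here; as a generic illustration of the mechanism, Foundation's Data(contentsOf:options:) with the .alwaysMapped option asks for a memory-mapped read. The file name below is a placeholder.

```swift
import Foundation

/// Map a file into memory instead of copying it into a heap buffer.
/// .alwaysMapped is a hint to mmap the file if possible; pages are then
/// faulted in from disk lazily as bytes are accessed.
func mapFile(at url: URL) throws -> Data {
    try Data(contentsOf: url, options: .alwaysMapped)
}

// Usage: write a small placeholder file, then map it back.
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("placeholder-weights.bin")  // hypothetical file
try Data(repeating: 0xAB, count: 4096).write(to: url)
let mapped = try mapFile(at: url)
print(mapped.count, mapped[0])  // prints "4096 171"
```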

Bonus: exporting Mistral-7B yourself means a 27 GB download (the source repo ships a duplicate consolidated.safetensors). Pulling the 4.1 GB converted bundle means roughly 6.6× less data to download.

2. An .aimodel is a build artifact, not a pure function of the recipe

This is the real reason.

The same export command, the same code, and the same wheels produced a 2.3× slower artifact when the exporting OS changed from macOS 26 to 27β (Qwen3-0.6B: 1,121 → 484 tok/s).
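Turning the two throughput figures into time per token makes the gap concrete. This is plain arithmetic on the Qwen3-0.6B numbers quoted above; the helper name msPerToken is my own:

```swift
import Foundation

/// Convert throughput (tokens per second) into time per token (milliseconds).
func msPerToken(_ tokensPerSecond: Double) -> Double {
    1000.0 / tokensPerSecond
}

// Qwen3-0.6B figures from the two exports.
let macOS26 = 1_121.0  // tok/s, artifact exported on macOS 26
let macOS27 = 484.0    // tok/s, re-export on macOS 27β

let slowdown = macOS26 / macOS27
print(String(format: "%.2f ms vs %.2f ms per token, %.2fx slower",
             msPerToken(macOS26), msPerToken(macOS27), slowdown))
// prints "0.89 ms vs 2.07 ms per token, 2.32x slower"
```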

The macOS 26 artifact carries the native quantized-Linear lowering (zero dequant ops in the program); the 27β re-export lowers the same layers to explicit dequant ops. Details are in the forensics write-up in my benchmark repo.
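Why would the lowering matter this much? Conceptually, a native quantized Linear consumes the integer weights directly and applies the scale inside the matmul, while an explicit-dequant lowering first materializes a float copy of the weight matrix and then runs an ordinary matmul, which adds a full extra write and read of the weights on every call. The toy below illustrates those two shapes on the CPU with symmetric per-row int8 weights. It is a hypothetical sketch, not Core AI's kernels or its actual quantization scheme: both paths compute the same result, and only the amount of work differs.

```swift
/// Weights stored as int8 with one Float scale per output row (assumed scheme).
struct QuantizedLinear {
    let rows: Int, cols: Int
    let q: [Int8]        // rows * cols, row-major
    let scales: [Float]  // one per output row

    /// "Native" path: dequantize on the fly inside the dot product.
    func forwardFused(_ x: [Float]) -> [Float] {
        (0..<rows).map { r -> Float in
            var acc: Float = 0
            for c in 0..<cols { acc += Float(q[r * cols + c]) * x[c] }
            return acc * scales[r]  // scale applied once per row
        }
    }

    /// "Explicit dequant" path: materialize a full Float weight matrix first.
    func forwardExplicit(_ x: [Float]) -> [Float] {
        var w = [Float](repeating: 0, count: rows * cols)  // extra buffer
        for r in 0..<rows {
            for c in 0..<cols { w[r * cols + c] = Float(q[r * cols + c]) * scales[r] }
        }
        return (0..<rows).map { r -> Float in
            var acc: Float = 0
            for c in 0..<cols { acc += w[r * cols + c] * x[c] }
            return acc
        }
    }
}

let layer = QuantizedLinear(rows: 2, cols: 3,
                            q: [1, -2, 3, 4, 0, -1],
                            scales: [0.5, 0.25])
let x: [Float] = [1, 2, 3]
print(layer.forwardFused(x), layer.forwardExplicit(x))
// prints "[3.0, 0.25] [3.0, 0.25]"
```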

In other words, “the recipe is public” does not guarantee reproducibility. Hosted artifacts with hashes are the reproducible ground truth.

Every published bundle is hash-identical to the one measured in apple-silicon-llm-bench. I verified the round trip: I re-downloaded gpt-oss-20B from Hugging Face and confirmed that the SHA-256 of main.mlirb matches the value on the model card.
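To repeat that check yourself, hash the downloaded file and compare the digest with the value on the model card. A minimal sketch using CryptoKit's streaming SHA256, so a multi-gigabyte weights file is never loaded into memory at once. The function name is my own, the path in the usage comment is a placeholder (I'm not reproducing the bundle layout here), and the expected value must be copied from the model card. On the command line, shasum -a 256 on the same file gives the same digest.

```swift
import CryptoKit
import Foundation

/// Stream a file through SHA-256 in fixed-size chunks (8 MiB by default)
/// and return the digest as lowercase hex.
func sha256Hex(of url: URL, chunkSize: Int = 8 << 20) throws -> String {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    var hasher = SHA256()
    // read(upToCount:) yields nil or empty Data at end of file.
    while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}

// Usage (hypothetical path; copy `expected` from the model card):
// let url = URL(fileURLWithPath: "<bundle>/<path-to>/main.mlirb")
// let expected = "<sha256 from model card>"
// print(try sha256Hex(of: url) == expected ? "match" : "MISMATCH")
```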

The Qwen3-0.6B repo also ships the “fast one”: the macos-26-export bundle, included as-is.

How to use them

Download

hf download mlboydaisuke/gpt-oss-20b-CoreAI-official --local-dir gpt-oss-20b-CoreAI-official

From a Swift app (via FoundationModels)

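import Foundation
import FoundationModels
import CoreAILanguageModels
let modelURL = URL(fileURLWithPath: "gpt-oss-20b-CoreAI-official/macos") // hypothetical path: the bundle's macos/ folder
let model = try await CoreAILanguageModel(resourcesAt: modelURL)
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "What is quantum computing?")
print(response.content)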

CLI (from a coreai-models checkout)

swift run -c release llm-runner --model <bundle>/macos --prompt "Hello"
swift run -c release llm-benchmark --model <bundle>/macos

GUI

In CoreAIChatMac, just point “Choose Models Folder…” at the download directory.

A note on the iOS bundles (Qwen3 0.6B / 4B)

iOS cannot JIT the exported IR, so AOT compilation is mandatory before the model runs on device:

xcrun coreai-build compile <ir>.aimodel \
    --platform iOS --preferred-compute neural-engine --architecture h18p
# h18p = iPhone 17 Pro. Then point assets.main in metadata.json at the .aimodelc
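The metadata.json step can be scripted. A minimal sketch, assuming metadata.json is a JSON object whose assets entry is an object holding a main path string; I'm inferring that shape from the assets.main name, so check a real bundle before relying on it. The function name and the example file names are hypothetical.

```swift
import Foundation

/// Point assets.main in a bundle's metadata.json at a compiled .aimodelc.
/// Assumption: metadata.json is a JSON object with an "assets" object whose
/// "main" entry is a path string; all other keys are preserved.
func pointMainAsset(metadataURL: URL, to compiledPath: String) throws {
    let data = try Data(contentsOf: metadataURL)
    guard var root = try JSONSerialization.jsonObject(with: data) as? [String: Any],
          var assets = root["assets"] as? [String: Any] else {
        throw CocoaError(.fileReadCorruptFile)  // not the assumed shape
    }
    assets["main"] = compiledPath
    root["assets"] = assets
    let out = try JSONSerialization.data(
        withJSONObject: root,
        options: [.prettyPrinted, .sortedKeys, .withoutEscapingSlashes])
    try out.write(to: metadataURL, options: .atomic)
}

// Usage with a throwaway file (both file names are hypothetical):
let dir = FileManager.default.temporaryDirectory
let meta = dir.appendingPathComponent("metadata.json")
try Data(#"{"assets":{"main":"model.aimodel"},"name":"demo"}"#.utf8).write(to: meta)
try pointMainAsset(metadataURL: meta, to: "model.aimodelc")
print(try String(contentsOf: meta, encoding: .utf8))
```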

I covered these gotchas in my Core AI vs MLX vs CoreML benchmark.

Notes on gpt-oss-20B

Related repos:

🐣

I’m a freelance engineer writing about AI.

If any of this sounds familiar, feel free to reach out:

“I want to build an AI app.”
“We built an AI feature, but we’re not sure how to run it in production.”

I take on both, at rates without the middleman markup.

Work inquiries: rockyshikoku@gmail.com

X · Medium · GitHub
