Three prerequisite questions

Before selecting an AI engine, answer three questions: is the problem an ML problem, is the available data sufficient, and is the economic lift large enough to justify a production model?

Categories of work

The relevant categories at Aramas are tabular ML for structured prediction, LLMs for unstructured text, and vision models for documents. Applying LLMs to all categories is not cost-effective.

LLM selection

Per-token cost is one factor among several. Latency under load, output controllability, refusal behaviour on uncertain inputs, and observability in production are equally relevant.
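
One way to keep all five factors in view is a simple scorecard. The sketch below is illustrative only: the criterion names are taken from the paragraph above, but the 1 to 5 rating scale, the equal weighting, and the two placeholder candidates with their ratings are assumptions made for the example, not measurements of any real model.

```python
from statistics import mean
from typing import Dict

# Criteria named in the text; equal weighting is an assumption.
CRITERIA = (
    "per_token_cost",
    "latency_under_load",
    "output_controllability",
    "refusal_behaviour",
    "observability",
)


def score(ratings: Dict[str, int]) -> float:
    """Equal-weight average of 1-5 ratings; raises if a criterion is missing."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    return mean(ratings[c] for c in CRITERIA)


# Made-up ratings for two placeholder candidates (higher is better).
candidates = {
    "candidate_a": {
        "per_token_cost": 5,
        "latency_under_load": 2,
        "output_controllability": 3,
        "refusal_behaviour": 2,
        "observability": 3,
    },
    "candidate_b": {
        "per_token_cost": 3,
        "latency_under_load": 4,
        "output_controllability": 4,
        "refusal_behaviour": 4,
        "observability": 4,
    },
}

best = max(candidates, key=lambda name: score(candidates[name]))
```

With these invented ratings the cheaper candidate loses, because its advantage on price is outweighed by weaker ratings on the other four criteria.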

Batch versus interactive

Frontier providers offer batch APIs at reduced pricing. For workloads that are not user-facing — overnight summarisation, classification, signal generation — batch is appropriate.
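
The rule above can be sketched as a small routing function. This is a minimal illustration: the `Workload` fields and the 24-hour assumed batch turnaround are hypothetical, and actual completion windows depend on the provider.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BATCH = "batch"
    INTERACTIVE = "interactive"


@dataclass(frozen=True)
class Workload:
    name: str
    user_facing: bool        # is a person waiting on the response?
    max_wait_hours: float    # hypothetical: longest acceptable turnaround


# Assumed worst-case batch turnaround; check the provider's documented window.
ASSUMED_BATCH_TURNAROUND_HOURS = 24.0


def choose_mode(w: Workload) -> Mode:
    """Use batch only when nobody is waiting and the deadline allows it."""
    if w.user_facing:
        return Mode.INTERACTIVE
    if w.max_wait_hours < ASSUMED_BATCH_TURNAROUND_HOURS:
        return Mode.INTERACTIVE
    return Mode.BATCH


summaries = Workload("overnight summarisation", user_facing=False, max_wait_hours=36)
chat = Workload("support chat", user_facing=True, max_wait_hours=0.01)
```

Here `choose_mode(summaries)` selects batch and `choose_mode(chat)` selects interactive. The deadline check is an addition for the example: a job that is not user-facing but must finish sooner than the batch window still needs the interactive endpoint.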

Self-hosted models

Open-source models are appropriate for high-volume, cost-sensitive workloads where frontier reasoning is not required.
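
"High-volume" can be made concrete with a break-even estimate. The sketch below assumes a simple cost model that is not from the source: the API is billed purely per token, while self-hosting has a fixed monthly cost (hardware and operations) plus a smaller per-token cost. All figures in the example are hypothetical.

```python
from typing import Optional


def break_even_tokens_per_month(
    api_price_per_mtok: float,      # API price per million tokens
    hosted_fixed_monthly: float,    # fixed monthly cost of self-hosting
    hosted_price_per_mtok: float,   # marginal self-hosted cost per million tokens
) -> Optional[float]:
    """Monthly token volume above which self-hosting is cheaper.

    Returns None when self-hosting never breaks even, i.e. when its
    marginal cost is not lower than the API price.
    """
    saving_per_mtok = api_price_per_mtok - hosted_price_per_mtok
    if saving_per_mtok <= 0:
        return None
    return hosted_fixed_monthly / saving_per_mtok * 1_000_000


# Hypothetical figures: 5.00 per million tokens via API,
# 4000 per month fixed plus 1.00 per million tokens self-hosted.
threshold = break_even_tokens_per_month(5.0, 4000.0, 1.0)
```

Under these made-up figures the threshold is one billion tokens per month. Below that volume the fixed cost of self-hosting dominates and the API remains cheaper.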

Gateway architecture

LLM providers should not be called directly from application code. A gateway service handles routing, caching, and per-caller usage tracking, and enables provider substitution without application changes.
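
A minimal sketch of such a gateway follows, assuming an in-process class rather than a deployed service. The `Gateway` class, the `Provider` callable type, and the stub providers are all hypothetical names introduced for illustration; a real gateway would wrap actual provider clients and persist its cache and usage counters.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical provider interface: takes a prompt, returns a completion.
Provider = Callable[[str], str]


class Gateway:
    def __init__(self, providers: Dict[str, Provider], routes: Dict[str, str]):
        self._providers = providers
        self._routes = routes  # task name -> provider name
        self._cache: Dict[Tuple[str, str], str] = {}
        self.calls_by_caller: Dict[str, int] = defaultdict(int)

    def complete(self, caller: str, task: str, prompt: str) -> str:
        """Route a request, serving repeated prompts from the cache."""
        self.calls_by_caller[caller] += 1  # counts cache hits as well
        provider_name = self._routes[task]
        key = (provider_name, prompt)
        if key not in self._cache:
            self._cache[key] = self._providers[provider_name](prompt)
        return self._cache[key]

    def reroute(self, task: str, provider_name: str) -> None:
        """Swap the provider behind a task without touching callers."""
        if provider_name not in self._providers:
            raise KeyError(provider_name)
        self._routes[task] = provider_name


# Stub providers standing in for real clients; they record each upstream call.
upstream_calls: List[str] = []


def stub_frontier(prompt: str) -> str:
    upstream_calls.append("frontier")
    return "frontier:" + prompt


def stub_hosted(prompt: str) -> str:
    upstream_calls.append("hosted")
    return "hosted:" + prompt


gateway = Gateway(
    providers={"frontier": stub_frontier, "hosted": stub_hosted},
    routes={"classify": "frontier"},
)
first = gateway.complete("billing-service", "classify", "invoice text")
second = gateway.complete("billing-service", "classify", "invoice text")  # cache hit
gateway.reroute("classify", "hosted")
third = gateway.complete("reporting-job", "classify", "invoice text")
```

Application code only ever names a task, never a provider, so the `reroute` call changes which model serves `classify` without any change on the caller side. The cache key includes the provider name so that a substituted provider does not return its predecessor's cached answers.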

Default recommendation

Use a frontier API for high-value reasoning, batch APIs for non-interactive workloads, and open-source models where volume justifies the operational overhead. Training your own model is rarely warranted.