The Agent ExaminerIndependent Review Authority

About

Methodology

Alfe is one of the platforms we cover, and it is scored by the exact same rubric as every other platform. This page publishes that rubric so our scores are auditable. See also our disclosure.

Facts vs. opinion

We separate two things. Facts— pricing, hosting model, memory support, MCP support, integrations, model providers, open-source status — are drawn from each vendor’s own public materials and carry a source link and the date we accessed it. Anything we could not confirm from a primary source is labelled Unverified rather than guessed.

Scores are our editorial opinion. They are not measurements, and (for now) they are not the result of hands-on testing — see the limitations below.

The scoring rubric

Every platform is rated 0–5 on the same six dimensions. The meaning of each score is fixed in advance and applied identically to every platform, including Alfe. A dimension reflects that specific quality only — a platform that leads on one dimension will trail on others.

Developer experience
How quickly a competent developer can get an agent running and iterate — docs, SDK ergonomics, examples, setup friction.0 = effectively undocumented or unusable; 3 = workable with some friction; 5 = fast start, strong docs, low friction.
Pricing transparency & value
How transparent and predictable pricing is, and how much you get before paying — not whether it is cheap.0 = no public pricing; 3 = published but hard to predict, or thin free tier; 5 = clear, predictable, generous free/open path.
Scalability
Whether the platform is built to run many/large/long-running agents in production without you re-architecting.0 = local/demo only; 3 = scales with manual effort; 5 = designed for production scale and concurrency.
Memory & state
First-class support for agent memory, state, and persistence across runs.0 = none; 3 = basic persistence you wire yourself; 5 = rich, built-in memory (semantic recall, durable state).
Integrations
Breadth and quality of built-in connectors, tools, and channels.0 = none; 3 = a useful core set; 5 = broad, well-maintained catalogue.
MCP nativeness
How natively the platform speaks the Model Context Protocol (MCP), not just whether a workaround exists.0 = no MCP; 3 = MCP via a separate adapter/library; 5 = MCP is a first-class, built-in capability.

The overall score shown on a dossier is the simple mean of these six dimensions, to one decimal place. It is a summary of our opinion, not a verdict — read the full dossier.

No thumb on the scale

Every platform is scored by the same rubric above, including Alfe, and no platform is hard-coded to top any category or win any head-to-head. Our “best-of” rankings are built from transparent, objective criteria that are printed on each page and shown on every row; a platform appears where the data places it. Where Alfe does not lead a dimension, it does not.

Comparative statements about other platforms are either factual and sourced, or clearly labelled as our opinion. We identify products by name only and are not affiliated with or endorsed by them.

Current limitations

These dossiers are built from documentation review, not yet hands-on testing. Every platform currently carries an untested status and has no testing notes. A later round of work will run each platform directly and validate — or correct — these scores. Until then, treat the scores as an informed editorial read, and rely on the sourced facts for anything load-bearing.

When we do test, each dossier will record what we tried, the version and date, and what we observed, so a reader can reproduce it.