    BLOG

    The latest news and updates from the Inference.net team.

    Michael Ryaboy

    Jul 31, 2025

    GPU-Rich Labs Have Won: What's Left for the Rest of Us is Distillation

    Massive training runs and powerful but expensive models mean another technique is starting to dominate: distillation.

    Amar Singh

    Jul 29, 2025

    On the Economics of Hosting Open Source Models

    The open-source community is buzzing about the new Wan release, but what are the economics of the businesses hosting it right now? Or of hosting open-source models in general?

    Michael Ryaboy

    Jul 24, 2025

    Batch vs Real-Time LLM APIs: When to Use Each

    Not every LLM request needs an immediate response. Chat interfaces need real-time. But data extraction, enrichment, and background jobs can wait hours.

    Michael Ryaboy

    Jul 22, 2025

    Do You Need Model Distillation? The Complete Guide

    Model distillation is particularly valuable in scenarios where large models are impractical due to resource constraints or performance requirements.

    Michael Ryaboy

    Jul 21, 2025

    The Cheapest LLM Call Is the One You Don’t Await

    Asynchronous requests are fire-and-forget calls that finish whenever idle GPUs are free.
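
    As a rough illustration of the idea, here is a minimal fire-and-forget sketch in Python. The endpoint, payload shape, and "job_id" response field are assumptions for illustration, not Inference.net's actual API:

    import requests  # any HTTP client works; requests is used here for brevity

    # Hypothetical async endpoint: submit now, collect the result later,
    # whenever idle GPUs have finished the job. URL is illustrative only.
    SUBMIT_URL = "https://api.example.com/v1/async/completions"

    def fire_and_forget(prompt: str, api_key: str) -> str:
        """Submit a request without waiting for the completion itself."""
        resp = requests.post(
            SUBMIT_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": "open-source-model", "prompt": prompt},
            timeout=10,
        )
        resp.raise_for_status()
        # Only a job id comes back; the completion is fetched (or delivered
        # via webhook) once the job finishes on spare capacity.
        return resp.json()["job_id"]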

    Michael Ryaboy

    May 31, 2025

    Osmosis-Structure-0.6B: The Tiny Model That Fixes Structured Outputs

    We're excited to announce that Osmosis-Structure-0.6B is now available on the Inference.net platform alongside our comprehensive DeepSeek R1 family.

    Michael Ryaboy

    May 29, 2025

    How Smart Routing Saved Exa 90% on LLM Costs During Their Viral Moment

    Exa found a clever solution that saved them 90% on tokens: route users with the most followers to Claude, and everyone else to dirt-cheap open-source models.
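
    The heuristic fits in a few lines. A minimal sketch in Python, where the follower threshold and model ids are assumptions for illustration, not Exa's actual values:

    def pick_model(follower_count: int, threshold: int = 100_000) -> str:
        """Route high-visibility users to a premium model and everyone
        else to a cheap open-source one. Threshold and model ids are
        illustrative, not Exa's real configuration."""
        if follower_count >= threshold:
            return "claude-sonnet"          # premium, expensive per token
        return "llama-3.1-8b-instruct"      # open-source, a fraction of the cost

    # Example: a user with 2M followers gets routed to the premium model.
    assert pick_model(2_000_000) == "claude-sonnet"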

    Sean

    May 1, 2025

    Migrating our Website and Dashboard to TanStack Start

    We evaluated a few frontend frameworks and eventually settled on TanStack Start as the tool of choice to re-implement our dashboard and website. In particular, we wanted a flexible solution that would let us server-render static content while also powering a rich, JS-heavy client-side application.

    Sam Hogan

    Feb 19, 2025

    Introducing Inference.net

    Inference.net is a global network of compute providers delivering affordable, serverless inference for the top open source AI models. We built a distributed infrastructure that allows developers to access state-of-the-art language models with the reliability of major cloud providers—but at a fraction of the cost.

    START BUILDING TODAY

    15 minutes could save you 50% or more on compute.