inference.net
inference.net is a wholesaler of LLM inference tokens for models like Llama 3.1. We provide batch and streaming inference APIs at a 50-90% discount compared to what you would pay together.ai or Groq. We can currently generate ~100B tokens per day.
Are you a researcher? Click here.
There is less of a GPU shortage than you have been led to believe. Data centers have underutilized capacity, but it comes in a shape that most orchestration software cannot
use: a few minutes here, a few hours there.
Once those unused minutes have passed, they can never be reclaimed. Like a stock
option that is about to expire, unused compute becomes less valuable as it approaches
its expiration date. Few customers need just a few minutes of compute time, making
these fragments challenging to sell conventionally.
To solve this, we built custom scheduling and orchestration software that aggregates
these small chunks across data centers to run AI models on compute that would
otherwise go unused. Since we are the only purchaser of this compute, we are
able to buy at a steep discount from data centers and pass those savings on to
you.
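As a rough illustration only, the idea of fitting work into short idle windows can be sketched as a greedy packing problem. This is not a description of our actual scheduler: the `Fragment` and `Job` types, the data center names, and the best-fit policy below are all hypothetical, chosen just to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """An idle window on some GPU (hypothetical model of spare capacity)."""
    datacenter: str
    minutes_left: int


@dataclass
class Job:
    """A unit of inference work with an estimated runtime in minutes."""
    name: str
    minutes: int


def assign(jobs, fragments):
    """Best-fit decreasing: place the longest jobs first, each into the
    smallest remaining window that can still hold it.

    Mutates `fragments` by subtracting the minutes each job consumes.
    Returns (placed, unplaced): a job-name -> datacenter map and a list
    of job names that fit nowhere.
    """
    placed, unplaced = {}, []
    for job in sorted(jobs, key=lambda j: j.minutes, reverse=True):
        fits = [f for f in fragments if f.minutes_left >= job.minutes]
        if not fits:
            unplaced.append(job.name)
            continue
        frag = min(fits, key=lambda f: f.minutes_left)
        frag.minutes_left -= job.minutes
        placed[job.name] = frag.datacenter
    return placed, unplaced


# Hypothetical example: three idle windows, four jobs.
fragments = [Fragment("dc-a", 5), Fragment("dc-b", 12), Fragment("dc-c", 3)]
jobs = [Job("batch-1", 10), Job("batch-2", 4), Job("batch-3", 3), Job("batch-4", 6)]
placed, unplaced = assign(jobs, fragments)
```

In this toy run, three of the four jobs land in windows that would otherwise expire unused, and `batch-4` waits for a larger window. A real system would also have to account for model loading time, preemption, and data movement, none of which this sketch models.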
We believe LLM inference is a new form of commodity that will quickly
outgrow the rest of the compute market by orders of magnitude. Inference
will trade more like electricity or oil than the way other forms of
compute trade today. A market will emerge to maximize the value of this
new commodity, one in which inference producers (read: data centers)
compete to give developers the best deal.
We aim to accelerate this process.
curl -N https://api.inference.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -d '{
    "model": "llama3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "stream": true
  }'
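The same request can be made from Python using only the standard library. The sketch below assumes the endpoint streams OpenAI-style server-sent events, that is, `data: {json}` lines carrying text under `choices[0].delta.content` and ending with `data: [DONE]`. The curl example shows only the request, so that response format is an assumption, and `extract_delta` and `stream_chat` are illustrative names rather than part of any SDK.

```python
import json
import urllib.request

API_URL = "https://api.inference.net/v1/chat/completions"  # from the curl example


def extract_delta(line: bytes):
    """Return the text carried by one streamed line, or None.

    Assumes OpenAI-style server-sent events: 'data: {json}' lines with
    the text under choices[0].delta.content, ended by 'data: [DONE]'.
    """
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return None  # blank separator line or an SSE comment
    payload = text[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # assumed end-of-stream sentinel
    choices = json.loads(payload).get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


def stream_chat(api_key: str, prompt: str, model: str = "llama3"):
    """Yield text fragments as they arrive (same request body as the curl call)."""
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,  # supplying data makes this a POST
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        for line in response:  # the response object iterates line by line
            piece = extract_delta(line)
            if piece:
                yield piece
```

To print a reply as it streams, loop over `stream_chat("<YOUR_API_KEY>", "What is the meaning of life?")` and print each piece with `end=""` and `flush=True`.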
Email us: [email protected]
© 2024 inference.net