This may be a framing gap, but I work with inference costs daily and don’t really see them trending downward in practice as models advance.
When you reference declining prices and a ~10×/year drop in inference cost, what layer of cost are you using? Token list prices, provider economics, or effective cost per task for users?
From an operational perspective, those layers seem to pull in different directions, especially as newer models spend more tokens on reasoning and run longer per task. I'm interested in how you're integrating that into your view.
I was referencing list prices, which is the layer most visible to CTOs evaluating vendor contracts. But your point stands: effective cost per outcome can stay flat or even rise while the per-token sticker price falls.
That's actually the more interesting story for technical leaders: tokens are getting cheaper, but we're using far more of them per task to do more sophisticated work.
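For concreteness, here is a minimal sketch of the arithmetic behind that last point. The ~10×/year list-price decline is the only figure taken from the conversation; the prices and token counts below are hypothetical and purely illustrative.

```python
# Sketch of how per-task cost can rise even as the per-token price falls.
# All numbers are hypothetical; only the ~10x/year price drop comes from
# the discussion above.

def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Effective cost per task = tokens consumed x unit price."""
    return price_per_million_tokens * tokens_per_task / 1_000_000

# Year 0: older model, short completions, no reasoning traces.
old = cost_per_task(price_per_million_tokens=10.0, tokens_per_task=2_000)

# Year 1: list price drops ~10x, but the newer model emits long reasoning
# traces and is called several times per task by an agentic workflow.
new = cost_per_task(price_per_million_tokens=1.0, tokens_per_task=25_000)

print(f"old cost/task: ${old:.4f}")  # $0.0200
print(f"new cost/task: ${new:.4f}")  # $0.0250 -- higher despite the 10x cheaper token
```

Under these assumed numbers, a 10× drop in the token price is more than offset by a 12.5× increase in tokens consumed per task, which is the gap between list price and cost per outcome described above.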