A growing body of research attempts to put a number on energy use and AI—even as the companies behind the most popular models keep their carbon emissions a secret.
Sure, you can do that at an aggregate level, but then how do you divide it by customer? And even then, some setups will be more efficient than others, so you’d only get that setup’s usage.
And even if you do that and can narrow it down to a single user and a single prompt, you can still only roughly predict how long it will think and how long the response will be.
Dividing by customer is easy: they’re each renting specific resources. A fractional cloud instance (except the small burstable ones) is tied to specific CPUs and GPUs, and records of who rented which one, and when, are already being kept.
You might not be able to break out specific individual queries, but computing averages is completely straightforward.
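To make the averaging concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the rental records, the metered per-machine energy figures, the query counts, and the function names are all made up, and it simply splits each machine's energy across renters in proportion to the fraction of the machine they rented times the hours they held it. Real billing and metering data would look different.

```python
from collections import defaultdict

# Hypothetical rental records: (customer, machine, fraction_of_machine, hours)
rentals = [
    ("alice", "gpu-node-1", 0.5, 10.0),
    ("bob", "gpu-node-1", 0.5, 10.0),
    ("alice", "gpu-node-2", 1.0, 5.0),
]

# Hypothetical metered energy per machine over the same period, in kWh
machine_energy_kwh = {"gpu-node-1": 40.0, "gpu-node-2": 30.0}

# Hypothetical number of queries each customer served in that period
queries = {"alice": 1000, "bob": 250}


def energy_per_customer(rentals, machine_energy_kwh):
    """Split each machine's energy across renters by fraction * hours."""
    rented = defaultdict(float)  # machine -> total fraction-hours rented
    for _, machine, fraction, hours in rentals:
        rented[machine] += fraction * hours

    totals = defaultdict(float)  # customer -> kWh attributed
    for customer, machine, fraction, hours in rentals:
        weight = (fraction * hours) / rented[machine]
        totals[customer] += machine_energy_kwh[machine] * weight
    return dict(totals)


def average_wh_per_query(totals_kwh, queries):
    """Average energy per query in watt-hours; skips customers with no queries."""
    return {
        customer: 1000.0 * kwh / queries[customer]
        for customer, kwh in totals_kwh.items()
        if queries.get(customer)
    }


totals = energy_per_customer(rentals, machine_energy_kwh)
averages = average_wh_per_query(totals, queries)
print(totals)    # {'alice': 50.0, 'bob': 20.0}
print(averages)  # {'alice': 50.0, 'bob': 80.0}
```

Note that this gives a per-customer average only. It says nothing about how much any single prompt used, and it charges a machine's idle time to whoever was renting it.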