Before 1958, buying a new car at an American dealership was an even more confusing and intimidating process than it is today. Customers had no reliable way of knowing what the price of a car should be or what features were built into that price. Senator Mike Monroney of Oklahoma had a solution in mind: he proposed a federal law requiring that a sheet of paper or sticker be placed in the window of every new car, giving buyers standardized, basic information that a fast-talking car salesman could not avoid. He justified the measure by saying it would "do away with the advantage now held by a few unscrupulous dealers."
By the time that the bill came to a vote, the National Auto Dealers Association decided to endorse it. Monroney had been right—it was only a few unscrupulous dealers who wanted to systematically leverage confusion about their products. The rest of the industry came together to support a shared and useful standard that would apply to everyone in the marketplace.
The result was what we still know today as the Monroney sticker. It wasn't perfect at first, but it evolved over time as cars changed and customers became more savvy. What was once a short list of specs became a complete overview of the vehicle, including price comparisons, warranty details, gas mileage or electric equivalents, crash test ratings, and more.
The quantum computing industry today is in a similar place to the 1950s auto industry. We have a hard time evaluating progress, and an even harder time comparing progress across hardware providers. Many companies share metrics, but they’re not always the same metrics, and more importantly, they’re often not enough to fully understand the quality of a given computer.
We need a quantum Monroney sticker.
More Than Just Qubits
When people try to understand the performance of an emerging technology, there’s a tendency to fixate on a single metric. With cars, it was horsepower. Digital cameras are often reduced to megapixels. Early PC processors were touted for their gigahertz. While each of these single-metric benchmarks tells you something about the system, it’s far from the complete story. A car with five hundred horsepower may sound impressive, but if it has no brakes, a range of only half a mile, and terrible steering, then you probably wouldn't want to buy it.
To be able to understand what a commercial quantum computer can and can’t do for a customer, qubit count alone isn’t enough. Yes, a high qubit count will be important for success, but it’s far from the only thing you need. Without understanding what those qubits can do, we don’t know if we’ve got a Lamborghini or a lemon.
In other words, a system with a high qubit count has the potential to be an exciting advancement in the field, but without more information, it’s too soon to celebrate. What’s the gate fidelity? Coherence time? What’s the state preparation and measurement (SPAM) error rate? Crosstalk? How are the qubits connected? What can you actually run on this system?
As an example, let’s imagine two quantum computers: one has seven qubits, one has seven hundred. Both have an average two-qubit gate fidelity of 98%¹—i.e., every time you perform a two-qubit operation (gate), there is only a 98% chance you’ll be successful. Let’s say we want to run an algorithm that uses all of the qubits in the computer and takes N² two-qubit gates, where N is the number of qubits in the computer.
For every two-qubit gate the system performs, this gate error accumulates in the final outcome. The total can be calculated by multiplying our fidelity by itself once for each gate we run—98% × 98% is about 96%, 98% × 98% × 98% is about a 94% likelihood you get the right answer at the end, and every additional gate pushes it down further. If this total “algorithmic” fidelity drops below a certain threshold, the quantum computer becomes practically worthless. We’ll use 1/e (about 37%), a statistically meaningful cutoff, as our threshold.
So, for seven qubits, you would need forty-nine entangling gates to perform this algorithm. Our fidelity accumulates as 98% × 98% × 98% × 98% (...and so on, once for each gate), or 98%⁴⁹, which is about 37%. That means we can fully utilize all of these qubits and still get a sensible outcome above the threshold—but just barely.
For seven hundred qubits, you would need about 490,000 gates to perform this algorithm. You can probably guess that 98% is not a good enough two-qubit gate fidelity to beat the threshold across this many gates; 98% multiplied by itself 490,000 times is a number so tiny that it might as well be zero. In fact, to run this many gates you would need an effective two-qubit gate fidelity of about 99.9998%, which may only be possible with error correction techniques, and is certainly a long way from the current best two-qubit gate fidelities on the market.
In fact, anything beyond 49 gates would put either machine below our magic number of ~37%—gate fidelity limits both computers to the same effective size of seven qubits!
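To make the arithmetic concrete, here’s a small Python sketch of this made-up example (the 98% fidelity, the N² gate count, and the 1/e threshold all come from the scenario above; nothing here reflects real hardware):

```python
import math

def algorithmic_fidelity(gate_fidelity, n_qubits):
    """Accumulated fidelity for a toy algorithm that uses N^2 two-qubit gates."""
    n_gates = n_qubits ** 2
    return gate_fidelity ** n_gates

THRESHOLD = 1 / math.e  # ~37%, our "practically worthless" cutoff

# Seven qubits: 49 gates, ~37% fidelity -- just above the threshold.
print(algorithmic_fidelity(0.98, 7) > THRESHOLD)    # True

# Seven hundred qubits: 490,000 gates, effectively zero fidelity.
print(algorithmic_fidelity(0.98, 700) > THRESHOLD)  # False

def required_fidelity(n_gates):
    """Gate fidelity needed so that fidelity^n_gates lands at the 1/e threshold."""
    return math.exp(-1 / n_gates)

print(required_fidelity(490_000))  # about 0.999998
```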
The Quantum Monroney Sticker
So, what else can an informed consumer ask about? What might be good specs for the Quantum Monroney Sticker, why is each one important, and how do you interpret it? This overview hopefully serves not only as a recommendation, but also as a guide for deciphering and interrogating metrics announcements by IonQ and other quantum hardware manufacturers.
First, consider the number of physical qubits in the system. There are some basic requirements to even be able to call a qubit a qubit, but assuming these are met, it’s as easy as counting them up.
A good way to think about qubit count is as the most optimistic view possible of a system’s performance. It represents the maximum possible power of a system: if all of your other metrics are perfect, it’s nearly all you need to know, and when theorists and algorithm designers talk about qubit requirements for certain algorithms, they’re often talking about these hypothetical “perfect” ones.
But in the messy, real world of developing quantum hardware, perfection is never the case. Isolating a quantum particle from the universe is hard, and error sneaks in from everywhere. You need more metrics to know what the qubits can really do. Additional metrics provide additional information that let you build a more accurate picture of qubit and system performance.
Two-qubit gate error
Two-qubit gate error is the error introduced by a single quantum operation (gate) between two qubits. Fidelity is the opposite of error rate—an error rate of 1% is the same as a fidelity of 99%.
Each connected two-qubit pair in a quantum computer will have a two-qubit error rate, and they can sometimes vary greatly depending on which qubits you’re measuring.
This is the second-most-often reported metric, and for good reason. As we illustrated above, entangling gates are necessary for real-world quantum computation, and every time you perform one, you introduce more potential error into your computation.
Importantly, to fully understand gate error and how it impacts computation (this goes for most sources of error), knowing only the “best” two-qubit gate error is not enough. It’s necessary to understand the two-qubit gate error for all of the qubit pairs, especially the worst ones. The most informative summaries of this metric take the form of an average with error bars, so you know not only the average but also the possible range of values; the worst-performing pairs will be the ones that limit overall system capability. We warn readers not to lend too much credence to other ways of presenting this data, such as a median, which can hide outliers.
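A quick illustration of why a median can mislead, using hypothetical pair fidelities (these numbers are invented for the example, not measurements from any real system):

```python
import statistics

# Hypothetical two-qubit gate fidelities for six qubit pairs,
# including one poorly-performing outlier pair.
pair_fidelities = [0.993, 0.991, 0.992, 0.990, 0.994, 0.940]

mean = statistics.mean(pair_fidelities)
stdev = statistics.stdev(pair_fidelities)
median = statistics.median(pair_fidelities)

print(f"mean +/- stdev: {mean:.3f} +/- {stdev:.3f}")  # wide spread exposes the outlier
print(f"median:         {median:.3f}")                # the outlier is invisible here
```

The mean with error bars immediately flags that something in the system is dragging performance down; the median alone looks reassuringly close to 99%.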
Single-qubit gate error
Single-qubit error is very similar to two-qubit error, but instead of measuring the error introduced by a two-qubit gate, we’re measuring the error introduced by single-qubit gates (sometimes also called rotations).
There is a lot of subtlety in understanding the highly specific characteristics of single-qubit gate error; there are many things that can cause single-qubit gate error, and each must be accounted for in a different way. But for the purposes of understanding overall system performance, similar rules apply: it’s ideal to know the error rate for all qubits, and performance will be most impacted by the worst error rate in the bunch.
State Preparation and Measurement (SPAM) Error
At the beginning of every quantum (and classical) computation, you have to first correctly set your initial state, and at the end, you have to correctly measure the result². State preparation and measurement error measures the likelihood of a system doing this correctly.
As an analogy, imagine if pressing clear on a calculator only produced a zero most of the time, and the rest of the time, it only got you close to a zero. Or if the “readout” once you were done pressing buttons only showed the real answer most of the time. It would be much harder to trust the calculator, and without measuring this separately, you can’t know if the problem is in the calculation or the readout itself!
State preparation and measurement has the benefit of only happening once per algorithm, so its errors don’t compound in the same way that gate errors do. But they do compound in a different and more subtle way: SPAM error is per-qubit, which means it compounds across the individual readout operations of every qubit in the system.
So, as systems scale, this number becomes more critical. A 1% SPAM error on a five-qubit system provides a very high likelihood that your result will be read correctly (99%⁵ ≈ 95%), but on a 100-qubit system, it’s wildly insufficient (99%¹⁰⁰ ≈ 37%)³.
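The compounding above is just the per-qubit SPAM fidelity raised to the number of qubits. A minimal sketch:

```python
def readout_success(spam_fidelity, n_qubits):
    """Probability that state prep and readout succeed on every qubit."""
    return spam_fidelity ** n_qubits

print(readout_success(0.99, 5))    # ~0.95
print(readout_success(0.99, 100))  # ~0.37

def required_spam(target, n_qubits):
    """Per-qubit SPAM fidelity needed for an overall `target` success rate."""
    return target ** (1 / n_qubits)

print(required_spam(0.95, 100))  # ~0.9995 -- much stricter at 100 qubits
```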
Connectivity
Connectivity—sometimes also called topology—describes which qubits in a system can interact directly. It is generally described either with a picture of the layout of a system, or using characteristic descriptions like “nearest neighbor,” where qubits are lined up in rows and columns and each one is connected to the four it’s closest to; “heavy-hex lattice,” which creates connected hexagons of qubits and tiles them, connecting them at their vertices so that each qubit has two or three connection points; or “all-to-all” connectivity, which means that every qubit is connected to every other one.
Connectivity matters because if two given qubits can’t directly connect, extra “swaps” have to be inserted into the program to move the information around so that they can be virtually connected. Each of these swaps is generally multiple two-qubit gates, and adds both additional time and some amount of accumulating error just like every other kind of two-qubit gate.
Some of these restrictions can be optimized away with clever compilers and mapping, but generally speaking, the fewer qubits talking to each other, the more error a system will accumulate by needing to swap information around before it can be computed on.
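As a rough sketch of the cost, assuming each swap decomposes into three two-qubit gates (a common decomposition) and that two qubits `distance` hops apart need `distance − 1` swaps to become neighbors—real compilers route more cleverly than this:

```python
def routing_overhead(distance, gates_per_swap=3):
    """Extra two-qubit gates needed before two distant qubits can interact."""
    return max(distance - 1, 0) * gates_per_swap

# All-to-all connectivity: every pair is effectively distance 1, no overhead.
print(routing_overhead(1))  # 0

# On a nearest-neighbor grid, a pair five hops apart costs 12 extra
# two-qubit gates -- each of which accumulates error like any other.
print(routing_overhead(5))  # 12
```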
T1 Time (qubit lifetime) and T2 Time (qubit coherence time)
T1 and T2 time are effectively two different ways to look at the same question: how long do the qubits stay useful for computation? T1 is concerned with qubit lifetime, asking how long you can tell the one state from the zero state, while T2 is concerned with qubit phase coherence, a more subtle aspect of qubits that is still critical for accurate computation.
If either number is too short, systems can’t perform all of the necessary computations before the qubits lose their delicate quantum information. Knowing how many computations you can perform in a given T1 or T2 time requires the next metric, gate speed.
Trapped ions have a great benefit here, as they are “natural” qubits that don’t have to be coaxed into displaying these quantum properties. Our limits in T1 and T2 time are mostly a function of our ability to precisely trap, control, and measure our qubits, not the qubits themselves. This is untrue of superconducting systems, which coax their qubits into a short-lived quantum existence that naturally “relaxes” over time. That said, superconducting systems can execute gate operations much faster than trapped ions, which leads us to our next metric.
Gate speed
Gate speed is a metric of how quickly a quantum computer can perform a given quantum gate. In a future world with quantum computers that can run millions or billions of gates and still produce useful answers, gate speed will become increasingly important as a raw metric of time-to-solution—even microseconds add up at that scale. For now, though, it’s primarily relevant in relation to the T1 and T2 times mentioned above, because it determines how many operations you can run before the qubit loses its coherence.
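One way to relate coherence time and gate speed is a rough “gate budget”: how many gates fit inside the coherence window. The numbers below are invented purely for illustration, not specs for any real system:

```python
def max_gates_in_coherence(t2_seconds, gate_seconds):
    """Rough upper bound on gates executable before coherence runs out."""
    return t2_seconds / gate_seconds

# Hypothetical long-lived qubit with slow gates (1 s T2, 200 us gates)
# vs. a hypothetical short-lived qubit with fast gates (100 us T2, 50 ns gates).
print(max_gates_in_coherence(1.0, 200e-6))    # ~5,000 gates
print(max_gates_in_coherence(100e-6, 50e-9))  # ~2,000 gates
```

Note that neither raw T2 nor raw gate speed alone decides the winner; only the ratio tells you the usable depth of computation.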
Senator Monroney couldn’t have predicted electric cars in 1958, and we can’t entirely predict what advances in quantum hardware and software will bring either. This list is neither exhaustive nor definitive; depending on what you’re specifically trying to understand about a system, other metrics may matter more.
It’s just meant to be a generally-useful and relevant set that gives you enough of a “complete” picture to have a strong sense of a system’s overall performance. Feel free to reach out and tell us what you think we should add to this list—we intend to update it over time.
A Better Way
The biggest benefit of a Quantum Monroney Sticker that exposes all of these low-level metrics is also its greatest challenge: each metric matters. They impact performance differently, they interact differently, and a quantum computer is normally limited by its worst metric, not its best. So, it’s hard to understand or evaluate any given one without all of the others, and without quite a lot of training and understanding.
More than that, many people trying to evaluate the current state of hardware don’t actually care about these low-level metrics, or even about qubit count. They only ask for them because they’ve been taught that they’re a useful proxy for the real arbiter of quantum computing success: what a computer can do for an end user.
The best and easiest way to do this is not by carefully examining two-qubit error rate, or anything else. It’s by running actual algorithms on the hardware and reporting the results. Not only does this approach indirectly incorporate all of the low-level metrics mentioned above, plus ones not listed here—it gets more directly at the most important questions. What algorithmically or commercially relevant problems can it solve? That other systems can’t? How is it impacting a customer’s business?
Algorithmic benchmarking initiatives like the QED-C benchmarks from the Quantum Economic Development Consortium attempt to standardize this approach across many systems and hardware approaches and provide easily legible, comparable results. But even beyond that, you can always just run something. IonQ hardware is available on every major cloud, and most other hardware manufacturers offer some amount of cloud access. Just get an API key, fire up an example notebook, and see for yourself what the state of the art can do.
Quantum engineers and other close-to-the-qubits folks will always need to know about a system’s spec sheet in fine detail, and quickly spot omissions or performance-limiting characteristics. For everyone else, asking for more data and being able to understand it is valuable when evaluating claims, but not necessarily the first thing we need to reach for in evaluating quantum computing performance.
The best way to understand the performance of a quantum computer—or any computer—is to run real programs on it.
¹ For the purpose of this made-up example, let’s assume that everything else about these systems is completely perfect, the qubits are fully connected, and every two-qubit gate has a fidelity of exactly 98%. In practice, these things likely wouldn’t be true, which would drag these estimates down even further.
² In the case of mid-circuit measurement and reset operations, you might do this in the middle of the computation, too.
³ To spell this out in a little more detail: we’re taking the fidelity of each SPAM operation—the likelihood of successful readout for one qubit—and multiplying it by the likelihood of successful readout for all of the other qubits. Together, you only have about a one-in-three chance of getting the right answer, even if all of your gates are perfect.