I'm no statistician, but the part about halfway through the article that says not to use PRNGs for random assignment into bins seems wrong to me?
Sure, I can understand why for a research trial you might just want to be totally safe and use a source of true randomness, but for all practical purposes a decent PRNG used for sorting balls into buckets is totally indistinguishable from true randomness, is it not?
I was half expecting this to have been written a few decades ago when really bad PRNGs were in common usage, but the article seems to be timestamped 2025.
It reminded me of an experiment where each subject was presented with a pseudorandomised sequence of trials, except that, unknown to the researchers, the same (default) seed was used every time the experiment ran, so every subject was presented with the same "random" sequence of trials.
"If N = 300, even a 256-bit seed arbitrarily precludes all but an unknown, haphazardly selected, non-random, and infinitesimally small fraction of permissible assignments. This introduces enormous bias into the assignment process and makes total nonsense of the p-value computed by a randomization test."
The first sentence is obviously true, but I'm going to need to see some evidence for "enormous bias" and "total nonsense". Let's leave aside lousy/little/badly-seeded PRNGs. Are there any non-cryptographic examples in which a well-designed PRNG with 256 bits of well-seeded random state produces results different enough from a TRNG to be visible to a user?
The argument against PRNGs this paper makes isn't that the PRNG produces results that can be distinguished from TRNG, but that the 256-bit seed deterministically chooses a single shuffling. If you need 300 bits to truly shuffle the assignment but you only have 256 bits, then that's a lot of potential assignments that can never actually happen. With this argument it doesn't matter what the PRNG is, the fact that it's deterministic is all that matters. And this invalidates the p-value because the p-value assumes that all possible assignments are equiprobable, when in fact a lot of possible assignments have a probability of zero.
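To put rough numbers on that (a quick sketch; N = 300 and the balanced split are taken from the quoted passage):

    from math import comb, log2

    total = comb(300, 150)      # number of balanced two-group assignments
    print(log2(total))          # ≈ 295.6 bits needed to index them all
    print(log2(total) - 256)    # ≈ 39.6: a 256-bit seed can reach at most ~1 in 2^40 of them

Whether that unreachable fraction amounts to "enormous bias" in practice is the part being argued below.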
I imagine you could change the p-value test to randomly sample assignments generated via the exact same process that was used to generate the assignment used in the experiment. As you run more and more iterations of this, the calculated p-value should converge to the correct value. But then the question becomes: is the p-value calculated this way the same as the p-value you'd get if you had actually used equiprobable assignment to begin with?
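A rough sketch of what that could look like (hypothetical names; assumes a balanced two-group design and an absolute difference in group means as the test statistic):

    import random

    def balanced_assignment(rng, n):
        # Assign n subjects to two equal groups using the supplied PRNG.
        labels = [0] * (n // 2) + [1] * (n // 2)
        rng.shuffle(labels)
        return labels

    def abs_mean_diff(outcomes, labels):
        a = [y for y, g in zip(outcomes, labels) if g == 0]
        b = [y for y, g in zip(outcomes, labels) if g == 1]
        return abs(sum(b) / len(b) - sum(a) / len(a))

    def pvalue_same_process(outcomes, observed_labels, iters=10_000):
        # Monte Carlo p-value where every reassignment comes from the same kind of
        # 256-bit-seeded PRNG that produced the original assignment.
        observed = abs_mean_diff(outcomes, observed_labels)
        hits = 0
        for _ in range(iters):
            rng = random.Random(random.getrandbits(256))   # fresh 256-bit seed per draw
            labels = balanced_assignment(rng, len(outcomes))
            if abs_mean_diff(outcomes, labels) >= observed:
                hits += 1
        return (hits + 1) / (iters + 1)   # standard +1 correction for Monte Carlo p-values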
Ultimately, this all comes down to the fact that it's not hard to use true randomness for the whole thing, and true randomness produces statistically valid results. If you use true randomness for assignment, then you can't screw up the p-value test, so there's no reason at all to even consider how to safely use a PRNG here; all that does is open the door to messing up.
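Concretely, "true randomness for the whole thing" can be as little as this (a minimal sketch; the function and group names are made up, and random.SystemRandom just reads from the OS entropy pool):

    import random

    def assign_with_os_entropy(subject_ids):
        # Balanced two-group assignment driven by os.urandom; no user-managed seed exists.
        rng = random.SystemRandom()
        ids = list(subject_ids)
        rng.shuffle(ids)
        half = len(ids) // 2
        return {"control": ids[:half], "treatment": ids[half:]}

    groups = assign_with_os_entropy(range(300))

Strictly speaking, random.SystemRandom is the OS CSPRNG continuously reseeded from hardware entropy rather than raw "true" randomness, which is the same design discussed further down the thread.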
If you have 300 bits of shuffling entropy, you have a lot of potential assignments that can never happen because you won't test them before the universe runs out. No matter how you pick them.
Of course a PRNG generates the same sequence every time with the same seed, but that's true of every RNG, even a TRNG where the "seed" is your current space and time coordinates. To get more results from the distribution you have to use more seeds. You can't just run an RNG once, get some value, and then declare the RNG is biased towards the value you got. That's not a useful definition of bias.
And why does it matter in the context of randomly assigning participants in an experiment into groups? It is not plausible that any theoretical "gaps" in the pseudorandomness are related to the effect you are trying to measure, and unlikely that there is a "pattern" created in how the participants get assigned. You just do one assignment. You do not need to pick a true random configuration, just one random enough.
I assume that, as far as p-values are concerned, the issue raised could very well be measured with simulations and permutations. I really doubt, though, that the distribution of p-values from pseudorandom assignments with gaps would not converge very fast to the "real" distribution you would get from all permutations, thanks to some version of a law of large numbers. A lot of resampling/permutation techniques work by sampling a negligible fraction of all possible permutations, and the distribution of the statistics extracted converges pretty fast. As long as the way the gaps are formed is independent of the effects measured, it sounds implausible that the p-values one gets are problematic because of them.
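That's cheap to check by simulation. An illustrative sketch: estimate the same randomization p-value once with a deterministic 256-bit-seeded generator and once with the OS entropy source, and see whether the two estimates differ by more than Monte Carlo noise (roughly 1/sqrt(iters)):

    import random
    from statistics import mean

    def estimate_pvalue(outcomes, observed_labels, rng, iters=20_000):
        # Monte Carlo randomization p-value for an absolute difference in group means.
        def stat(lab):
            treated = [y for y, g in zip(outcomes, lab) if g == 1]
            control = [y for y, g in zip(outcomes, lab) if g == 0]
            return abs(mean(treated) - mean(control))

        observed = stat(observed_labels)
        labels = list(observed_labels)
        hits = 0
        for _ in range(iters):
            rng.shuffle(labels)
            hits += stat(labels) >= observed
        return (hits + 1) / (iters + 1)

    # p_prng = estimate_pvalue(outcomes, labels, random.Random(random.getrandbits(256)))
    # p_true = estimate_pvalue(outcomes, labels, random.SystemRandom())
    # Agreement to within a few multiples of 1/sqrt(iters) is what the convergence argument predicts.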
We can create a balanced partitioning of the 300 turkeys with a 300-bit random number having an equal number of 1s and 0s.
Now suppose I randomly pick a 300-bit number, still with equal numbers of 0s and 1s, but this time the first 20 bits are always 0 and the last 20 bits are always 1. In this scenario, only the middle 260 bits (turkeys) are randomly assigned, and the remaining 40 are deterministic.
We can quibble over what constitutes an "enormous" bias, but the scenario above feels like an inadequate experiment design to me.
As it happens, log2(260 choose 130) ~= 256.
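(Quick check of those numbers in Python:)

    from math import comb, log2

    print(log2(comb(300, 150)))   # ≈ 295.6: bits needed for a fully random balanced split
    print(log2(comb(260, 130)))   # ≈ 255.7: roughly the 256 bits of seed actually available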
> Are there any non-cryptographic examples in which a well-designed PRNG with 256 bits of well-seeded random state produces results different enough from a TRNG to be visible to a user?
One example that comes to mind is shuffling a deck of playing cards. You need approximately 225 bits of entropy to ensure that every possible 52-card ordering can be represented. Suppose you wanted to simulate a game of blackjack with more than one deck, or some other card game with 58 or more cards: 256 bits is not enough there.
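The arithmetic, for anyone who wants to check it (one deck, a 58-card deck, and two decks shuffled together):

    from math import factorial, log2

    print(log2(factorial(52)))                    # ≈ 225.6 bits for one standard deck
    print(log2(factorial(58)))                    # ≈ 260.3 bits: already past 256
    # Two 52-card decks shuffled together: 104 cards with each distinct card appearing twice.
    print(log2(factorial(104)) - 52 * log2(factorial(2)))   # ≈ 499 bits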
It's an interesting observation, and that's a nice example you provided, but does it actually matter? Just because certain sequences can't occur doesn't necessarily mean the bias has any practical impact. It's bias in the theoretical sense but not, I would argue, in the practical sense that is actually relevant. At least that's how it seems to me at first glance, but I would be interested to learn more if anyone thinks otherwise.
For example. Suppose I have 2^128 unique playing cards. I randomly select 2^64 of them and place them in a deck. Someone proceeds to draw 2^8 cards from that deck, replacing and reshuffling between each draw. Does it really matter that those draws weren't technically independent with respect to the larger set? In a sense they are independent so long as you view what happened as a single instance of a procedure that has multiple phases as opposed to multiple independent instances. And in practice with a state space so much larger than the sample set the theoretical aspect simply doesn't matter one way or the other.
We can take this even farther. Don't replace and reshuffle after each card is drawn. Since we are only drawing 2^8 of 2^64 total cards this lack of independence won't actually matter in practice. You would need to replicate the experiment a truly absurd number of times in order to notice the issue.
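For a rough sense of "a truly absurd number of times" (standard birthday-style bound; 2^8 and 2^64 are the numbers from the scenario above):

    from math import log2

    draws = 2**8   # cards actually drawn
    deck = 2**64   # cards in the deck
    # Sampling without replacement differs from sampling with replacement by at most
    # the probability of any repeated card under with-replacement draws:
    collision_bound = draws * (draws - 1) / (2 * deck)
    print(log2(collision_bound))   # ≈ -49: about one chance in 2^49 of any detectable difference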
By the definition of a cryptographically secure PRNG, no. They, with overwhelming probability, produce results indistinguishable from truly random numbers no matter what procedure you use to tell them apart.
MCMC can be difficult for this reason. There are concepts like "k-dimensional equidistribution" etc. where in some ways the requirements on a PRNG are far, far higher than for a cryptographically sound PRNG, but also in many ways less so, because you don't care about adversarial issues and would prefer speed.
If you can't generate all possible assignments, you care about second and third order properties etc. of the sequence.
> There are concepts like "k-dimensional equidistribution" etc. where in some ways the requirements on a PRNG are far, far higher than for a cryptographically sound PRNG
Huh? If you can chew through however many gigabytes of the supposed CSPRNG’s output, do some statistics, and with a non-negligible probability tell if the bytes in fact came from the CSPRNG in question or an actual iid random source, then you’ve got a distinguisher and the CSPRNG is broken.
It all comes down to actual specific statistical tests, and how hard they are to break in specific applications.
No CSPRNG is absolutely perfect; no CSPRNG has ever absolutely passed every statistical test thrown at it.
MCMC stresses very different statistical properties than the typical CSPRNG tests do.
Every PRNG is absolutely broken if you want to be absolute about it. MCMC and crypto applications push on different aspects where statistical issues will cause application level failures.
> no CSPRNG has ever absolutely passed every statistical test thrown at it
As far as I know (admittedly not the highest standard), there is no published statistical test that you could run on, for example, a single AES-256-CTR bitstream set up with a random key and IV, running on a single computer, that would be able to tell you with a meaningful probability that you were looking at a pseudorandom rather than truly random input before the computer in question broke down. (I’m assuming related-key attacks are out of scope if we’re talking about an RNG for simulation purposes.)
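For reference, producing such a bitstream takes only a few lines with the cryptography package (a sketch, not a vetted RNG; the key and initial counter block are just pulled from the OS here):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)             # 256-bit key
    counter_block = os.urandom(16)   # 128-bit initial counter ("IV" for CTR mode)

    enc = Cipher(algorithms.AES(key), modes.CTR(counter_block)).encryptor()
    stream = enc.update(b"\x00" * 4096)   # encrypting zeros yields 4 KiB of keystream

    # 'stream' is the pseudorandom input a statistical test would have to tell apart
    # from os.urandom(4096); as far as I know, no published test can do that.
    print(stream[:16].hex())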
The only known explanation of what's going on in quantum mechanics is a multiversal one^[1]. Using radioactive decay of an atom as an example: there are an uncountably infinite number of universes that are initially "fungible" (identical in every way), and over time the universes gradually differentiate themselves with the atom going from a non-decayed to decayed state, at different times in each universe. But you will be in all of those universes. So if you thought the atom would decay in, let's say 5 seconds, there would be some universes where you were right and some where you were wrong. That makes it impossible to ever make reliable specific predictions about when the atom will decay. So, in practice that just looks like perfect randomness.
^[1]: There are other interpretations, of course. And those other interpretations are equally explanatory. But they do not claim to be explanations of what is actually happening to unobserved quantum particles. There is also Bohmian mechanics, but I don't know how many people take it seriously.
What's bullshit about it? This is how TRNGs in security enclaves work. They collect entropy from the environment, and use that to continuously reseed a PRNG, which generates bits.
If you're talking "true" in the philosophical sense, that doesn't exist -- the whole concept of randomness relies on an oracle.
What PRNGs lack compared to TRNGs is security (i.e. preventing someone from being able to use past values to predict future values). It's not that they somehow produce statistically invalid results (e.g. they generate 3s more often than 2s or something). Unless they're very poorly constructed.
I don't think hardware random number generators are bullshit, but it's easy to overstate their importance. Outside of cryptography, there aren't a whole lot of cases that truly require that much care in how random numbers are generated. For the kind of examples the article opens with (web page A/B testing, clinical trials, etc.) you'll never have sample sizes large enough to justify worrying about the difference between a half-decent PRNG and a "true" random number generator.
> You can't just run an RNG once, get some value, and then declare the RNG is biased towards the value you got. That's not a useful definition of bias.
It doesn't matter how many universes it would take to generate all of them; some assignments are still less likely than others.
> MCMC and crypto applications push on different aspects where statistical issues will cause application level failures.
See e.g. this paper: https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf
(It's not the end-all-be-all, but it's a good survey of why this stuff matters and why it's different.)