$\begingroup$

Some existing answers talk about statistical inference and some about interpretation of probability, and none clearly makes the distinction. The main purpose of this answer is to make this distinction.

The word "frequentism" (and "frequentist") can refer to TWO DIFFERENT THINGS:

One is the question of how "probability" should be defined or interpreted. There are multiple interpretations, the "frequentist interpretation" being one of them. Frequentists would be the people adhering to this interpretation. The other is statistical inference about model parameters based on observed data. There are Bayesian and frequentist approaches to statistical inference, and frequentists would be the people preferring to use the frequentist approach.

Now comes a speculation: I think there are almost no frequentists of the first kind (P-frequentists), but there are lots of frequentists of the second kind (S-frequentists).

Frequentist interpretation of probability

The question of what probability is has been a subject of intense debate for more than 100 years. It belongs to philosophy. I refer anybody not familiar with this debate to the Interpretations of Probability article in the Stanford Encyclopedia of Philosophy, which contains a section on frequentist interpretation(s). Another very readable account that I happen to know of is this paper: Appleby, 2004, Probability is single-case or nothing, which is written in the context of the foundations of quantum mechanics but contains sections focusing on what probability is.

Appleby writes:

Frequentism is the position that a probability statement is equivalent to a frequency statement about some suitably chosen ensemble. For instance, according to von Mises [21, 22] the statement “the probability of this coin coming up heads is 0.5” is equivalent to the statement “in an infinite sequence of tosses this coin will come up heads with limiting relative frequency 0.5”.

This might seem reasonable, but there are so many philosophical problems with this definition that one hardly knows where to start. What is the probability that it will rain tomorrow? A meaningless question, because how would we obtain an infinite sequence of trials? What is the probability of the coin in my pocket coming up heads? The relative frequency of heads in an infinite sequence of tosses, you say? But the coin will wear out and the Sun will burn out long before the infinite sequence can be finished. So we should be talking about a hypothetical infinite sequence. This brings one to the discussion of reference classes, and so on. In philosophy one does not get away so easily. And by the way, why should the limit exist at all?

Furthermore, what if my coin were to come up heads 50% of the time during the first billion years but then started coming up heads only 25% of the time (a thought experiment from Appleby)? Since a limiting relative frequency is determined entirely by the tail of the sequence, this means that $P(\mathrm{Heads})=1/4$ by definition. But we will always be observing $\mathrm{Frequency}(\mathrm{Heads})\approx 1/2$ during the next billion years. Do you think such a situation is not really possible? Sure, but why? Because $P(\mathrm{Heads})$ cannot suddenly change? But this sentence is meaningless for a P-frequentist.
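To make the arithmetic of this thought experiment concrete, here is a minimal Python sketch. It is my own illustration, not something from Appleby's paper: the function name `relative_frequency`, the constant `SWITCH_AT`, and the simplification that the coin yields heads in exactly the stated proportions are all assumptions made for the example.

```python
from fractions import Fraction


def relative_frequency(n, switch_at,
                       share_before=Fraction(1, 2),
                       share_after=Fraction(1, 4)):
    """Relative frequency of heads after n tosses of an idealized coin.

    Assumption for illustration: the coin yields heads in exactly the
    proportion share_before during the first switch_at tosses and in
    exactly the proportion share_after afterwards.
    """
    tosses_before = min(n, switch_at)
    tosses_after = max(0, n - switch_at)
    heads = share_before * tosses_before + share_after * tosses_after
    return heads / n


SWITCH_AT = 10**9  # hypothetical toss count at which the coin changes

for n in (10**6, 10**9, 10**10, 10**12, 10**15):
    freq = relative_frequency(n, SWITCH_AT)
    print(f"n = {n:.0e}: relative frequency of heads = {float(freq):.6f}")
```

Every observer before the switch sees a frequency of exactly 1/2, yet the values drift towards 1/4 as `n` grows, because the first `SWITCH_AT` tosses contribute a vanishing share of the total.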

I want to keep this answer short, so I will stop here; see above for the references. I think it is really difficult to be a die-hard P-frequentist.

(Update: In the comments below, @mpiktas insists that it is because the frequentist definition is mathematically meaningless. My opinion expressed above is rather that the frequentist definition is philosophically problematic.)

Frequentist approach to statistics

Consider a probabilistic model $P(X\mid\theta)$ that has some parameters $\theta$ and allows one to compute the probability of observing data $X$. You ran an experiment and observed some data $X$. What can you say about $\theta$?

S-frequentism is the position that $\theta$ is not a random variable; its true value in the real world is what it is. We can try to estimate it as some $\hat \theta$, but we cannot meaningfully talk about the probability of $\theta$ being in some interval (e.g. being positive). The only thing we can do is come up with a procedure for constructing an interval around our estimate such that this procedure succeeds in encompassing the true $\theta$ with a particular long-run success frequency (a particular probability).
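The "long-run success frequency" of such a procedure can be checked by simulation. Below is a minimal Python sketch under assumptions of my own choosing: the data are normal with a known standard deviation `sigma`, the procedure is the textbook 95% z-interval for the mean, and the names `z_interval` and `coverage` as well as all numeric settings are hypothetical.

```python
import math
import random


def z_interval(sample, sigma, z=1.96):
    """95% interval for a normal mean when sigma is assumed known."""
    mean = sum(sample) / len(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return mean - half_width, mean + half_width


def coverage(theta, sigma=1.0, n=20, repetitions=2000, seed=0):
    """Fraction of repeated experiments whose interval contains theta."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repetitions):
        # One simulated experiment: theta is fixed, only the data vary.
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma)
        if lower <= theta <= upper:
            hits += 1
    return hits / repetitions


print(coverage(theta=0.3))  # should land close to 0.95
```

Note what is random here: $\theta$ stays fixed across all repetitions and only the interval changes from one experiment to the next. The 95% describes the procedure, not any single computed interval.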

Most of the statistics used in natural sciences today is based on this approach, so there certainly are lots of S-frequentists around today.

(Update: if you are looking for an example of a philosopher of statistics, as opposed to a practitioner of statistics, defending the S-frequentist point of view, then read Deborah Mayo's writings; +1 to @NRH's answer.)

UPDATE: On the relationship between P-frequentism and S-frequentism

@fcop and others ask about the relationship between P-frequentism and S-frequentism. Does one of these positions imply the other? There is no doubt that historically S-frequentism was developed based on a P-frequentist stance; but do they logically imply one another?

Before approaching this question I should say the following. When I wrote above that there are almost no P-frequentists, I did not mean that almost everybody is P-subjective-bayesian-a-la-de-finetti or P-propensitist-a-la-popper. In fact, I believe that most statisticians (or data scientists, or machine learners) are P-nothing-at-all, or P-shut-up-and-calculate (to borrow Mermin's famous phrase). Most people tend to ignore foundational problems. And that is fine. We do not have a good definition of free will, or intelligence, or time, or love. But this should not stop us from working on neuroscience, or on AI, or on physics, or from falling in love.

Personally, I am not a P-frequentist, but neither do I have any coherent view on the foundations of probability.

In contrast, almost everybody who has done some practical statistical analysis is either an S-frequentist or an S-Bayesian (or perhaps a mixture). Personally, I have published papers containing $p$-values and I have never (so far) published papers containing priors and posteriors over model parameters, so this makes me an S-frequentist, at least in practice.

It is therefore clearly possible to be an S-frequentist without being a P-frequentist, despite what @fcop says in his answer.

Okay. Fine. But still: Can a P-Bayesian be an S-frequentist? And can a P-frequentist be an S-Bayesian?

For a convinced P-Bayesian it is probably atypical to be an S-frequentist, but in principle entirely possible. E.g. a P-Bayesian can decide that they do not have any prior information about $\theta$ and hence adopt an S-frequentist analysis. Why not? Every S-frequentist claim can certainly be read under a P-Bayesian interpretation of probability.

For a convinced P-frequentist to be an S-Bayesian is probably problematic. But then it is very problematic to be a convinced P-frequentist...