Chapter 15 Sample size

When designing a study to estimate disease prevalence or diagnostic test accuracy, we will want to evaluate the sample size required to achieve a certain precision in these estimates. In the following sections, we first present very basic methods that can be used to estimate the required sample size. We then present a method specific to BLCM, which may provide a more accurate sample size estimate.

15.1 Basic method

Remember, the main parameters we will want to estimate are a disease prevalence and one (or many) sensitivity and specificity, and these are all proportions. More precisely:

  • The prevalence is \(\frac{\text{Number of diseased}}{\text{Number of individuals}}\)

  • The sensitivity is \(\frac{\text{Number of diseased that are test+}}{\text{Number of diseased}}\)

  • The specificity is \(\frac{\text{Number of healthy that are test-}}{\text{Number of healthy}}\)

For any of these fractions, increasing the denominator will narrow the 95% credibility interval (i.e., give greater precision). To determine the denominator (i.e., the sample size) that would yield a given precision for the 95%CI, we could use the following, very simple, Frequentist formula.

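\(n=p(1-p)\left(\frac{1.96}{E}\right)^{2}\)

Where n is the required denominator, p is the expected proportion, and E is the half-width of the 95%CI we would like to achieve (i.e., the margin of error on each side of the estimate, so that the full width of the interval is 2E).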

Alternatively, we could instead compute the half-width of the 95%CI as a function of a given denominator.

\(E=1.96\sqrt{\frac{p(1-p)}{n}}\)
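As a quick check of this second formula, the following minimal R sketch wraps it in a helper function. The function name `ci_half_width` is ours, chosen for illustration only, and the normal approximation it relies on is only reasonable when n is not too small and p is not too close to 0 or 1.

```r
# Hypothetical helper: margin of error (distance from the estimate to each
# limit of the approximate 95%CI) for a proportion p estimated on n individuals.
ci_half_width <- function(p, n) {
  1.96 * sqrt(p * (1 - p) / n)
}

# Expected proportion of 30% measured on 50, 100, and 200 individuals
E <- ci_half_width(p = 0.30, n = c(50, 100, 200))
round(E, 3)
## [1] 0.127 0.090 0.064
```

Going from 50 to 200 individuals (four times as many) only halves E, which is worth keeping in mind when choosing a target precision.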

Therefore, in order to estimate the required sample size, we would first need to provide an educated guess regarding:
- The true prevalence of disease we will observe;
- The sensitivity and specificity of the test(s) under investigation.

As an example, if we were to plan a study:
- On a population where we think the prevalence will be around 30%;
- Using a test that we think will have a sensitivity of 85% and specificity of 95%;
- Where we would like the 95%CI to extend 10 percentage points on each side of the estimate (i.e., E = 0.10, for a total width of 20 percentage points).

We could use the following scripts to get a “ballpark” estimate of the required sample size.

#Sample size for the prevalence
p <- 0.30
E <- 0.10

n <- p*(1-p)*(1.96/E)^2
n
## [1] 80.6736

For the prevalence, we would need 81 individuals to obtain a 95%CI of ±10 percentage points if the prevalence is around 30%.

Note that, to obtain a 95%CI of ±5 percentage points:

#Sample size for the prevalence
p <- 0.30
E <- 0.05

n <- p*(1-p)*(1.96/E)^2
n
## [1] 322.6944

We would now need 323 individuals.
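This fourfold increase is not specific to the example. Because E is squared in the formula, halving E always multiplies the required denominator by four, whatever the value of p. Writing \(n_{E}\) for the denominator required for a given E (our notation):

\(\frac{n_{E/2}}{n_{E}}=\frac{p(1-p)\left(\frac{1.96}{E/2}\right)^{2}}{p(1-p)\left(\frac{1.96}{E}\right)^{2}}=\frac{E^{2}}{(E/2)^{2}}=4\)

Here, 322.6944 is indeed four times 80.6736.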

Regarding the sensitivity, remember that the denominator will be the truly diseased individuals, rather than all sampled individuals.

#Sample size for the sensitivity
p <- 0.85
E <- 0.10

n <- p*(1-p)*(1.96/E)^2
n
## [1] 48.9804

We would need 49 diseased individuals to obtain a 95%CI of ±10 percentage points if the sensitivity is around 85%.

Regarding the specificity, again the denominator will be the truly healthy individuals, rather than all sampled individuals.

#Sample size for the specificity
p <- 0.95
E <- 0.10

n <- p*(1-p)*(1.96/E)^2
n
## [1] 18.2476

We would need 19 healthy individuals (18.2, rounded up) to obtain a 95%CI of ±10 percentage points if the specificity is around 95%.

Now, we need to combine these different estimates. For instance, we saw that measuring 81 individuals was enough to get the precision we needed for the prevalence of disease. But, with 81 individuals and a prevalence of 30%, we would, theoretically, have 24 diseased (30% of 81) and 57 healthy (70% of 81) individuals. And we just computed that we needed 49 diseased individuals to get a precise estimate of the test’s sensitivity.

In this case, we can see that the sensitivity is the most limiting parameter. With an expected prevalence of 30%, we would need to recruit a minimum of 164 individuals (49/0.30 ≈ 163.3) to expect about 49 diseased and 115 healthy individuals. With that latter sample, we would possibly achieve the precision needed for the test’s sensitivity, and we would exceed what we needed for the test’s specificity and for disease prevalence. Of course, this will only work if our educated guesses were not too “off the mark”.
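The step of converting the required numbers of diseased and healthy individuals into a total number of individuals to recruit can be sketched in R as follows. The function names `n_proportion` and `n_total` are ours, and the sketch assumes that the realized numbers of diseased and healthy individuals will match their expected values, which will rarely be exactly true in practice.

```r
# Hypothetical helper: denominator needed for a proportion p so that the
# approximate 95%CI extends E on each side of the estimate (rounded up).
n_proportion <- function(p, E) {
  ceiling(p * (1 - p) * (1.96 / E)^2)
}

# Hypothetical helper: individuals to recruit so that the expected numbers of
# diseased and healthy individuals reach the denominators required for the
# sensitivity and the specificity, respectively.
n_total <- function(prev, se, sp, E) {
  needed <- c(
    prevalence  = n_proportion(prev, E),                      # all individuals
    sensitivity = ceiling(n_proportion(se, E) / prev),        # diseased only
    specificity = ceiling(n_proportion(sp, E) / (1 - prev))   # healthy only
  )
  list(by_parameter = needed, total = max(needed))
}

res <- n_total(prev = 0.30, se = 0.85, sp = 0.95, E = 0.10)
res$by_parameter
##  prevalence sensitivity specificity 
##          81         164          28 
res$total
## [1] 164
```

With the guesses used in this section, the sensitivity drives the total: 49 diseased individuals at an expected prevalence of 30% correspond to roughly 164 recruited individuals.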

This basic method, however, does not account for the loss of power that occurs when imperfect tests are compared with one another (as opposed to comparing a novel test to a gold-standard test). It should, therefore, be considered an optimistic scenario.

15.2 Method for the Hui-Walter model

Georgiadis et al. (2005) proposed a method to calculate sample size to estimate sensitivity and specificity with a desired precision, when using the Hui and Walter (1980) model (two conditionally independent tests for screening individuals from two populations). In their article, Georgiadis et al. (2005) provided a hyperlink to an Excel spreadsheet template. We included this spreadsheet in the course material.

Briefly, we need to provide our best guess on:
- Disease prevalence in population 1 (\(\pi1\));
- Disease prevalence in population 2 (\(\pi2\));
- Sensitivity of the first (\(Se1\)) and second test (\(Se2\));
- Specificity of the first (\(Sp1\)) and second test (\(Sp2\)).

We then need to provide the desired 95%CI width for each of these parameters.

Below is a screen capture of the spreadsheet. In this case, we proposed to study two populations where:
- Disease prevalence (\(\pi1\) and \(\pi2\)) would be 0.10 and 0.70;
- Sensitivity of the two tests (\(Se1\) and \(Se2\)) would be 0.85 and 0.95;
- Specificity of the two tests (\(Sp1\) and \(Sp2\)) would both be 0.95;
- Desired width of the 95% CI for disease prevalence (\(\pi1\) and \(\pi2\)) would be 0.15;
- Desired width of the 95% CI for sensitivity (\(Se1\) and \(Se2\)) would be 0.20;
- Desired width of the 95% CI for specificity (\(Sp1\) and \(Sp2\)) would be 0.10.

Excel spreadsheet used to estimate sample size for BLCM comparing two independent tests in two populations

Additional information is provided in the spreadsheet, but we can already see that we would need to sample 134 individuals from the low (10%) prevalence population (N1) and 402 from the high (70%) prevalence population (N2) to obtain the desired 95%CI widths.
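To get a feel for the data such a design would generate, the R sketch below computes the expected cross-classified test results in each population under the Hui-Walter assumptions (two conditionally independent tests whose accuracy is the same in both populations). It does not reproduce the Georgiadis et al. (2005) calculation. The function name `hw_cell_probs` is ours, and the parameter values and sample sizes are those of the example above.

```r
# Hypothetical helper: probabilities of the four test-result combinations
# (pp = T1+ T2+, pn = T1+ T2-, np = T1- T2+, nn = T1- T2-) in a population with
# prevalence prev, assuming conditional independence given disease status.
hw_cell_probs <- function(prev, se, sp) {
  c(
    pp = prev * se[1] * se[2]             + (1 - prev) * (1 - sp[1]) * (1 - sp[2]),
    pn = prev * se[1] * (1 - se[2])       + (1 - prev) * (1 - sp[1]) * sp[2],
    np = prev * (1 - se[1]) * se[2]       + (1 - prev) * sp[1] * (1 - sp[2]),
    nn = prev * (1 - se[1]) * (1 - se[2]) + (1 - prev) * sp[1] * sp[2]
  )
}

se <- c(0.85, 0.95)  # guessed sensitivities of tests 1 and 2
sp <- c(0.95, 0.95)  # guessed specificities of tests 1 and 2

probs_pop1 <- hw_cell_probs(prev = 0.10, se = se, sp = sp)
probs_pop2 <- hw_cell_probs(prev = 0.70, se = se, sp = sp)

# Expected counts for the sample sizes suggested by the spreadsheet
round(134 * probs_pop1, 1)
round(402 * probs_pop2, 1)
```

Note that only about 13 of the 134 individuals from the low prevalence population are expected to be truly diseased (10% of 134).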

Again, this will work if our educated guesses were not too “off the mark”. One additional note: when using the Georgiadis et al. (2005) method, we are not considering any informative priors that we may want to use. Typically, if we were to use informative priors on some or all of the unknown parameters, the width of the 95%CI would possibly be modified. For instance, if we provided a relatively narrow prior for a given parameter, and if this prior agreed closely with the observed data, then the 95%CI could be more precise than anticipated (i.e., narrower). On the other hand, a narrow prior that is not consistent with the observed data would possibly yield a wider 95%CI.
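The first of these two situations can be illustrated with a deliberately simplified R sketch. It considers a single proportion estimated against a perfect reference (not a latent class model), so that the posterior distribution has a closed Beta form. The data (42 test-positive results among 49 diseased individuals) and the Beta(85, 15) prior are hypothetical values chosen to agree with a sensitivity around 85%.

```r
# Hypothetical data: 42 test-positive results among 49 truly diseased individuals
y <- 42
n <- 49

# Width of the 95% credible interval obtained when a Beta(a, b) prior is
# updated with the data above (conjugate Beta-binomial model)
ci_width <- function(a, b) {
  diff(qbeta(c(0.025, 0.975), a + y, b + n - y))
}

w_vague       <- ci_width(a = 1,  b = 1)   # vague Beta(1, 1) prior
w_informative <- ci_width(a = 85, b = 15)  # informative prior centred on 0.85

c(vague = w_vague, informative = w_informative)
```

In this simple setting, the informative prior carries roughly the weight of 100 additional observations, hence the narrower interval. The behaviour in a BLCM, and in particular the effect of a prior that conflicts with the data, is not captured by this sketch.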

Nevertheless, this spreadsheet is quite handy for obtaining rough sample size estimates. It is also quite useful for exploring, for instance, how larger differences in disease prevalence between populations can improve the study power. One final note: if we are planning a study with more than two populations, or more than two tests, this spreadsheet could be used to obtain a “worst-case scenario” sample size estimate.