Article

Estimating hidden population size using Respondent-Driven Sampling data

Journal

Electronic Journal of Statistics
Volume 8, Pages 1491-1521

Publisher

Institute of Mathematical Statistics
DOI: 10.1214/14-EJS923

Keywords

hard-to-reach population sampling; network sampling; social networks; successive sampling; model-based survey sampling

Funding

  1. NICHD [1R21HD063000, 5R21HD075714-02]
  2. ONR [N00014-08-1-1015]
  3. NSF [MMS-0851555, MMS-1357619, SES-1230081]
  4. National Agricultural Statistics Service
  5. Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) [R24-HD041022]
  6. Eunice Kennedy Shriver National Institute of Child Health and Human Development research infrastructure grant [R24 HD042828]
  7. Eunice Kennedy Shriver National Institute of Child Health and Human Development [R24HD042828, R24HD041022, P01HD031921, R21HD075714] Funding Source: NIH RePORTER
  8. Eunice Kennedy Shriver National Institute of Child Health and Human Development [R21HD063000] Funding Source: NIH RePORTER
  9. Division of Social and Economic Sciences [1357619] Funding Source: National Science Foundation


Respondent-Driven Sampling (RDS) is an approach to sampling design and inference in hard-to-reach human populations. It is often used in situations where the target population is rare and/or stigmatized in the larger population, so that it is prohibitively expensive to contact them through the available frames. Common examples include injecting drug users, men who have sex with men, and female sex workers. Most analysis of RDS data has focused on estimating aggregate characteristics, such as disease prevalence. However, RDS is often conducted in settings where the population size is unknown and of great independent interest. This paper presents an approach to estimating the size of a target population based on data collected through RDS. The proposed approach uses a successive sampling approximation to RDS to leverage information in the ordered sequence of observed personal network sizes. The inference uses the Bayesian framework, allowing for the incorporation of prior knowledge. A flexible class of priors for the population size is used that aids elicitation. An extensive simulation study provides insight into the performance of the method for estimating population size under a broad range of conditions. A further study shows the approach also improves estimation of aggregate characteristics. Finally, the method demonstrates sensible results when used to estimate the size of known networked populations from the National Longitudinal Study of Adolescent Health, and when used to estimate the size of a hard-to-reach population at high risk for HIV.
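
To illustrate why the ordered sequence of network sizes can carry information about population size, the following is a minimal sketch of successive sampling: units are drawn one at a time, without replacement, with probability proportional to their size. It is not the paper's estimator, and the Bayesian inference step is not shown. The function name `successive_sample`, the population of 1000 units, and the uniform degree distribution are all assumptions made for illustration only.

```python
import random


def successive_sample(sizes, n, rng):
    """Draw n distinct unit indices, each draw made with probability
    proportional to the unit's size among the units not yet sampled."""
    remaining = list(range(len(sizes)))
    weights = [sizes[i] for i in remaining]
    order = []
    for _ in range(n):
        # Pick a position in the remaining list, weighted by size.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(pick))
        weights.pop(pick)
    return order


# Hypothetical population: 1000 people with personal network sizes 1..30.
rng = random.Random(0)
population = [rng.randint(1, 30) for _ in range(1000)]

# Sample half the population and compare early versus late draws.
order = successive_sample(population, 500, rng)
degrees = [population[i] for i in order]
early_mean = sum(degrees[:250]) / 250
late_mean = sum(degrees[250:]) / 250
print("mean size, first 250 draws:", early_mean)
print("mean size, last 250 draws:", late_mean)
```

Under this scheme, large units tend to be drawn early and are then depleted, so the observed sizes tend to decline over the course of the sample. How quickly they decline depends on how large the sample is relative to the population, which is the kind of signal a model of the ordered sizes could use.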
