ICCS 2016 sampling design using R

To analyse national student data collected for the International Civic and Citizenship Education Study (ICCS 2016), you need to account for its complex survey sampling design, a “two-stage sampling procedure whereby a random sample of schools is selected at first stage, and one or two intact target grade classes (…) are sampled at the second stage”, as well as for the multiple imputation of civic knowledge proficiency test scores. Details can be found in the ICCS 2016 User Guide for the International Database and the ICCS 2016 Technical Report. This post shows how to prepare the data for analysis using the survey and mitools R packages.

You will need to register at the IEA Data Repository to get the data files. This example uses datasets in SPSS format, which are imported with the read.spss() function from the foreign package. There are four relevant national datasets for Croatia; they are read and merged into a single object called 'data'.

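# Placeholder: set this to the directory that holds the .sav files
path <- "path/to/data/dir"
# List all SPSS files in that directory
filenames <- list.files(path, pattern = "\\.sav$", full.names = TRUE)
# Read each file into a data frame, keeping numeric codes instead of value labels
data.list <- lapply(filenames, foreign::read.spss, use.value.labels = FALSE,
    to.data.frame = TRUE, use.missings = TRUE)
# Merge the data frames on their common columns, keeping all rows
data <- Reduce(function(...) merge(..., all = TRUE), data.list)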
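
Before going further, it can help to confirm that the merged data frame contains the columns used in the rest of this post. The following is a minimal sketch, not part of any package: check_iccs_columns() is a hypothetical helper, and the toy data frame only stands in for the real merged data.

```r
# Hypothetical helper (an assumption, not from any package): report which
# columns needed later are absent and how many replicate-weight columns exist.
check_iccs_columns <- function(d) {
  needed <- c(paste0("PV", 1:5, "CIV"), "TOTWGTS")
  list(
    missing = setdiff(needed, names(d)),
    n_repweights = length(grep("^SRWGT", names(d)))
  )
}

# Toy data frame with made-up values; PV5CIV is left out on purpose
toy <- data.frame(PV1CIV = 500, PV2CIV = 510, PV3CIV = 505, PV4CIV = 498,
                  TOTWGTS = 1.2, SRWGT1 = 1.1, SRWGT2 = 1.3)
res <- check_iccs_columns(toy)
print(res)
```

On the real data you would call check_iccs_columns(data) and expect an empty 'missing' element.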

To analyse test scores, note that students did not take the complete test, so population estimates are derived using the plausible values methodology, which is based on multiple imputation. The dataset provides five sets of plausible values (columns PV1CIV to PV5CIV), and estimation should always use all of them, as recommended in the technical documentation. To handle multiple imputation in a complex survey design context, the first step is to create five versions of the same dataset, each with a new column holding a different set of plausible values. In the second step, the survey design object ‘des’ is created with the svrepdesign() function from the survey package, using the provided replicate weights and the list of datasets wrapped by the imputationList() function from the mitools package.

# Create five copies of the data, each with one set of plausible values in 'pvciv'
data.imp <- lapply(1:5, function(x, d = data) {
    d$pvciv <- d[[paste0("PV", x, "CIV")]]
    return(d)
})

library("mitools")
library("survey")
des <- svrepdesign(repweights = "^SRWGT", type = "JK1", scale = 1, combined.weights = TRUE, 
    weights = ~TOTWGTS, data = imputationList(data.imp))
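
To give an intuition for what replicate weights are used for, here is a minimal base R sketch of a replicate-based variance for a weighted mean. All numbers are made up, and the sketch centers the squared deviations on the full-sample estimate with a scale of 1; this is an assumption for illustration, and the survey package may center differently depending on its settings.

```r
# Toy scores and weights; all numbers are made up for illustration
y  <- c(480, 520, 550, 610)
w  <- c(1, 1, 1, 1)                 # full-sample weights
rw <- cbind(c(2, 0, 1, 1),          # replicate weight set 1
            c(1, 1, 0, 2))          # replicate weight set 2

# Full-sample estimate and one estimate per replicate weight set
theta   <- weighted.mean(y, w)
theta_r <- apply(rw, 2, function(r) weighted.mean(y, r))

# Sum of squared deviations from the full-sample estimate, times scale = 1
jk_var <- 1 * sum((theta_r - theta)^2)
jk_se  <- sqrt(jk_var)
print(c(estimate = theta, se = jk_se))
```

In the real design object, svrepdesign() does the equivalent bookkeeping across all replicate weight columns matched by "^SRWGT".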

To combine the plausible values of test scores, wrap functions from the survey package in with() and pass the result to MIcombine() from the mitools package.

MIcombine(with(des, svymean(~pvciv)))

## Multiple imputation results:
##       with(des, svymean(~pvciv))
##       MIcombine.default(with(des, svymean(~pvciv)))
##        results       se
## pvciv 531.2111 2.470435
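
MIcombine() pools the per-imputation results using Rubin's rules. The following base R sketch shows that arithmetic by hand, as an illustration only; the five estimates and variances are made up and are not ICCS results.

```r
# Made-up point estimates and sampling variances from five imputed datasets
est <- c(530, 532, 531, 529, 533)
v   <- c(6.0, 6.2, 5.9, 6.1, 5.8)
m   <- length(est)

q_bar   <- mean(est)                  # combined point estimate
u_bar   <- mean(v)                    # within-imputation variance
b       <- var(est)                   # between-imputation variance
total_v <- u_bar + (1 + 1 / m) * b    # total variance
se      <- sqrt(total_v)
print(c(estimate = q_bar, se = se))
```

The between-imputation term is why using a single set of plausible values would understate the standard error.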

Other variables should be analysed in the same way. For example, to get proper proportions and standard errors like those shown on page 86 of the report Becoming Citizens in a Changing World, you could do something like this:

des <- update(des,
    is3g16b = car::Recode(IS3G16B, "1:2='Yes';3='No'"),
    is3g16c = car::Recode(IS3G16C, "1:2='Yes';3='No'"),
    is3g16e = car::Recode(IS3G16E, "1:2='Yes';3='No'"))

x <- MIcombine(with(des, svymean(~is3g16b + is3g16c + is3g16e, na.rm = TRUE)))
cbind(percentage = round(coef(x) * 100, 1), SE = round(SE(x) * 100, 2))[c(2, 4, 6), ]

##            percentage   SE
## is3g16bYes       91.0 0.55
## is3g16cYes       20.2 1.02
## is3g16eYes       57.6 1.12
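
To make the recoding step concrete, here is a small base R sketch that applies the same mapping as above (codes 1 and 2 become 'Yes', code 3 becomes 'No') to a made-up vector of responses. It is an illustrative alternative to car::Recode(), not the code used to produce the results above.

```r
# Made-up item responses, including one missing value
x <- c(1, 2, 3, NA, 2)

# Codes 1 and 2 become "Yes", code 3 becomes "No", missing values stay NA
yes_no <- factor(ifelse(x %in% 1:2, "Yes", ifelse(x == 3, "No", NA)),
                 levels = c("No", "Yes"))

# Unweighted proportions among non-missing responses
props <- prop.table(table(yes_no))
print(props)
```

With levels ordered as No, Yes, the 'Yes' proportion is the second entry for each item, which is why rows 2, 4 and 6 are selected in the table above.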

Please do read the linked technical documentation! If you have any feedback, get in touch.