Dataset back-checks: Adding random back-checks to a household survey


Click here to download the dataset back-checks sample form. This .zip file contains three .xlsx files with form definitions plus two .xml files with dataset definitions.

This sample extends the dataset basics sample, so you may want to review that sample first. One of the two datasets used here and two of the three survey forms are shared with that sample (so this sample only includes one extra dataset and one extra survey form).

You will upload this sample to your server in five parts: three forms and the two datasets that link them. First, go to the Your forms and datasets section of the Design tab and upload each of the three .xlsx files, one at a time, using + followed by Upload form definition. Then click + again and choose Add server dataset to add a new dataset, select the New dataset from definition tab, and upload the first .xml file; repeat for the second .xml file. (If you've already uploaded the one dataset for the dataset basics sample, you don't need to re-upload it.)

This sample demonstrates how datasets can be used to link multiple survey forms into a single workflow to perform back-checks. Expanding upon the prior sample, the new two-part workflow is as follows:

A: Household listing -> household sample -> household surveys

B: Household surveys -> back-check sample -> back-check surveys

Accordingly, the sample includes three survey forms and two datasets to link them: a household listing survey, a household sample dataset, a household survey, a back-check sample dataset, and a back-check survey.

To understand the first part of the workflow (labeled A above), please see the prior sample. Here, we will focus on the second part (labeled B).

Look first at the household survey form, Sample-Dataset-Basics-HHSurvey.xlsx. In addition to the fields discussed in the dataset basics sample, it includes three calculate fields at the very end:

type	name	calculation
calculate	randomdraw_bc	once(random())
calculate	bc_selected	if(${randomdraw_bc} <= 0.15, 1, 0)
calculate	survey_date	once(today())

These fields use a random draw to select 15% of surveys for back-checking (i.e., for auditing via independent re-visits), then record the date on which each survey was conducted. (Building the random selection into the survey itself automates the entire process, but you could also organize a workflow in which you select surveys for back-checking yourself, outside the form; in that case, you'd manually update a server dataset by uploading your own back-check selections.)
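
To get a feel for what this selection rule does, the following Python sketch mimics it outside SurveyCTO. It is an illustration only: the function name select_for_backcheck and the seeded generator are assumptions made for the example, and Python's random.random() merely stands in for the form's random() function.

```python
import random

BACKCHECK_SHARE = 0.15  # same threshold as in the bc_selected calculation


def select_for_backcheck(rng):
    """Mimic the two calculate fields: one random draw, then a 0/1 flag."""
    randomdraw_bc = rng.random()  # uniform draw in [0, 1)
    bc_selected = 1 if randomdraw_bc <= BACKCHECK_SHARE else 0
    return randomdraw_bc, bc_selected


# Seeded only so that the sketch is repeatable; the form itself is not seeded.
rng = random.Random(12345)
flags = [select_for_backcheck(rng)[1] for _ in range(10_000)]
share_selected = sum(flags) / len(flags)
print(f"Share selected for back-check: {share_selected:.3f}")
```

Because each survey gets its own independent draw, the share selected in any one batch of surveys will be close to 15% rather than exactly 15%.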

This household survey form publishes data directly into the back-check sample dataset included with this sample. Specifically, the hhid field publishes into the dataset's hhid_key field, and the address, survey date, and head-of-household fields publish to fields by the same names. In particular, the age recorded for the head of household publishes to the dataset because we want to use the back-check survey to verify that this age was captured correctly. Also, the publishing is set so that only submissions for which bc_selected is equal to 1 are published into the dataset because we want this dataset to hold just our randomly-chosen back-check sample (not all surveyed households).
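
The publishing rule described above can be pictured as a filter plus a field mapping. The Python sketch below is not how SurveyCTO implements publishing; the function publish_to_bc_sample and the three example submissions are invented for illustration, and only the field names come from this sample.

```python
def publish_to_bc_sample(submission):
    """Return the dataset row for one submission, or None if it is not selected."""
    if submission["bc_selected"] != 1:
        return None  # households not selected for back-checks are never published
    return {
        "hhid_key": submission["hhid"],  # hhid publishes into hhid_key
        "address": submission["address"],
        "hoh_name": submission["hoh_name"],
        "hoh_age": submission["hoh_age"],
        "survey_date": submission["survey_date"],
    }


# Invented example submissions from the household survey.
submissions = [
    {"hhid": "101", "address": "12 Hill Road", "hoh_name": "A. Example",
     "hoh_age": 44, "survey_date": "2024-01-15", "bc_selected": 1},
    {"hhid": "102", "address": "3 Mill Lane", "hoh_name": "B. Example",
     "hoh_age": 37, "survey_date": "2024-01-15", "bc_selected": 0},
    {"hhid": "103", "address": "7 Lake Street", "hoh_name": "C. Example",
     "hoh_age": 61, "survey_date": "2024-01-16", "bc_selected": 1},
]

published = (publish_to_bc_sample(s) for s in submissions)
bc_sample = [row for row in published if row is not None]
print([row["hhid_key"] for row in bc_sample])
```

Only households 101 and 103 end up in the back-check sample here, because household 102 has bc_selected equal to 0.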

This dataset is then attached – as pre-loaded data – to the third form included with this sample, the back-check survey form.

This third form begins with a field that asks the enumerator to select one of the households randomly selected for back-checks.

This field dynamically loads the list of options from pre-loaded data (in this case, data pre-loaded from the attached back-check sample dataset). Following are the relevant rows from the survey and choices worksheets of the form definition.

type	name	label	appearance
select_one household	hhid	Which household are you surveying?	search('bc_sample')

list_name	value	label
household	hhid_key	address, hoh_name

The search() appearance indicates that the choice options should include all unique values from the bc_sample dataset. The value and label columns of the choices sheet indicate the dataset columns to use for the choice values and labels (hhid_key for the values and both address and hoh_name for the labels).
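
As a rough mental model of how the choice list is assembled, consider the Python sketch below. The helper build_choices, the example rows, and the ", " separator used to join the two label columns are all assumptions made for illustration; the actual rendering is handled by SurveyCTO.

```python
def build_choices(rows, value_col, label_cols):
    """Build (value, label) pairs, keeping one choice per unique value."""
    seen = set()
    choices = []
    for row in rows:
        value = row[value_col]
        if value in seen:
            continue  # keep only the first row for each unique value
        seen.add(value)
        label = ", ".join(str(row[col]) for col in label_cols)
        choices.append((value, label))
    return choices


# Invented rows standing in for the bc_sample dataset.
bc_sample = [
    {"hhid_key": "101", "address": "12 Hill Road", "hoh_name": "A. Example"},
    {"hhid_key": "103", "address": "7 Lake Street", "hoh_name": "C. Example"},
    {"hhid_key": "101", "address": "12 Hill Road", "hoh_name": "A. Example"},
]

choices = build_choices(bc_sample, "hhid_key", ["address", "hoh_name"])
for value, label in choices:
    print(value, "|", label)
```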

Next, once a specific back-check household has been chosen, the pulldata() function is used to read the address, hoh_name, hoh_age, and survey_date fields from the pre-loaded data (i.e., from the dataset).

type	name	calculation
calculate	address	pulldata('bc_sample', 'address', 'hhid_key', ${hhid})
calculate	hoh_name	pulldata('bc_sample', 'hoh_name', 'hhid_key', ${hhid})
calculate	original_age	pulldata('bc_sample', 'hoh_age', 'hhid_key', ${hhid})
calculate	survey_date	pulldata('bc_sample', 'survey_date', 'hhid_key', ${hhid})
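
Each of these calculations is a keyed lookup: find the dataset row whose hhid_key matches the selected household, then return one column from that row. The Python sketch below imitates that behavior. The function pulldata_sketch is a hypothetical stand-in (it takes the rows directly rather than a dataset name), the example rows are invented, and returning an empty string when no row matches is an assumption.

```python
def pulldata_sketch(rows, column_to_pull, key_column, key_value):
    """Return column_to_pull from the first row where key_column equals key_value."""
    for row in rows:
        if str(row[key_column]) == str(key_value):
            return row[column_to_pull]
    return ""  # assumption: no matching row yields an empty value


# Invented rows standing in for the bc_sample dataset.
bc_sample = [
    {"hhid_key": "101", "address": "12 Hill Road", "hoh_name": "A. Example",
     "hoh_age": "44", "survey_date": "2024-01-15"},
    {"hhid_key": "103", "address": "7 Lake Street", "hoh_name": "C. Example",
     "hoh_age": "61", "survey_date": "2024-01-16"},
]

hhid = "103"  # the household chosen in the select_one field
address = pulldata_sketch(bc_sample, "address", "hhid_key", hhid)
hoh_name = pulldata_sketch(bc_sample, "hoh_name", "hhid_key", hhid)
original_age = pulldata_sketch(bc_sample, "hoh_age", "hhid_key", hhid)
survey_date = pulldata_sketch(bc_sample, "survey_date", "hhid_key", hhid)
print(address, hoh_name, original_age, survey_date)
```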

Once they are part of the back-check survey form, these fields can be referenced in later questions, for example to ask the enumerator to confirm that they are at the correct household.

Fields can also be used to internally check whether responses recorded during the back-check match responses recorded during the original survey. In this sample, the age_different field is used to determine whether the household head's age recorded during back-check is different from what was recorded during the original survey.

type	name	calculation
calculate	age_different	if(${hoh_age_bc} != ${original_age}, 1, 0)

If there is a discrepancy between the two answers, this field will be recorded with a value of 1; otherwise, it will be recorded with a value of 0. This field is not displayed to the back-checker. Rather, it is discreetly recorded in the survey form so that you can easily check discrepancy rates after you have exported the data. (You could also imagine other designs, however. For example, you could automatically ask the back-checker to double-check an entry whenever it doesn't match with what was recorded in the original survey.)
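
Once the back-check data has been exported, the discrepancy rate is simply the mean of the age_different field. The Python sketch below recomputes the flag and the rate on a few invented records; the record layout and the helper names are assumptions made for illustration, and ages are stored as integers to keep the comparison simple.

```python
def age_different(hoh_age_bc, original_age):
    """Mirror the form's calculation: 1 if the two ages differ, otherwise 0."""
    return 1 if hoh_age_bc != original_age else 0


def discrepancy_rate(records):
    """Share of back-checked households whose recorded ages disagree."""
    flags = [age_different(r["hoh_age_bc"], r["original_age"]) for r in records]
    return sum(flags) / len(flags)


# Invented back-check records: one of the four ages does not match.
records = [
    {"hhid": "101", "hoh_age_bc": 44, "original_age": 44},
    {"hhid": "103", "hoh_age_bc": 60, "original_age": 61},
    {"hhid": "107", "hoh_age_bc": 29, "original_age": 29},
    {"hhid": "112", "hoh_age_bc": 52, "original_age": 52},
]

rate = discrepancy_rate(records)
print(f"Age discrepancy rate: {rate:.0%}")
```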

This sample leaves off at this point. However, you could add another server dataset that merges key information from the listing, household, and back-check surveys. In that dataset, you could have one row, identified by the unique household ID, with information from all three surveys (including whether it was sampled, whether it was back-checked, and what the error rate was in cases where back-checks were done). There are a great many possible designs and configurations.
