Collecting high-quality data


Collecting high-quality data can be hard – much harder, at least, than collecting poor-quality data. It involves thought and effort at several key stages of the data-collection process: survey design, training, and supervision and monitoring during data collection. Luckily, SurveyCTO has been designed to make your life systematically easier, so that you can collect higher-quality data with less effort. What follows is a series of quality-assurance suggestions, along with links to more details about particular SurveyCTO features that can help.

See the webinar on how real SurveyCTO users are implementing SurveyCTO tools to collect high-quality data.

Survey design

Whether you're collecting data with a digital device or paper and pencil, many of the same rules apply: you want to think hard about which questions you ask, how you ask them, and in what order; you want to exhaustively pre-test and pilot every question; and you want to make sure that everything you do gets translated properly (and re-tested) in all of the necessary languages or dialects. With digital data collection, however, you have some additional opportunities to improve data quality through good survey design.

Identifying enumerators

When working with a team of enumerators or other data-collectors who fill out your forms, it's important to know which enumerator filled out which form. The enumerator field type is designed specifically to help ensure accurate identification of your enumerators. See Managing enumerators to learn more.

Skipping irrelevant questions

The first opportunity may seem obvious: avoid asking questions that are not relevant. By enabling your form to automatically skip irrelevant questions, you minimize the possibility of skip-pattern errors. And, while it is tempting to let your enumerators sort out which questions should be asked when, it is often better to build the skip logic into the form itself, so that it is applied predictably and without error. See the help on relevance for more on implementing automatic skip patterns.

Constraining responses

The next opportunity is data validation. Using constraints in your survey form will prevent enumerators from entering data that is obviously incorrect, invalid, or inconsistent. While you do not want to restrict answers to the extent that enumerators are totally unable to enter unusual (but sometimes correct) values, you should disallow answers that are clearly impossible or those that contradict earlier responses. After all, the cheapest and safest possible point at which to correct a mistake is during the interview itself. See the help on constraints for more on constraining enumerator responses.

If you do not want to fully disallow some kinds of responses but still want to warn the enumerator that the responses are suspicious, you can also implement a warning or confirmation (otherwise known as a "soft" constraint). Simply add a note or yes-no confirmation that is only relevant (i.e., that only appears to the enumerator) if an entry looks potentially incorrect, invalid, or inconsistent. That way, your concern as a survey designer is raised under certain suspicious circumstances, but the enumerator is ultimately able to either go back and correct something or continue without making any changes.

Designing for workflow

Finally, SurveyCTO gives you additional options for controlling the workflow surrounding how survey forms are filled out: how your enumerators navigate the survey, when and how they save or review their work, when and how they finalize data and send it in. It is helpful to keep these options in mind as you design your survey to fit the circumstances your enumerators are likely to face as they administer your survey in the field. See the help on workflows for more on designing your form for optimal workflow.

Survey testing

The quality of the survey instrument itself – its design, and the degree to which it is error-free – can have a profound impact on the quality of the data you collect. So before you start collecting any data at all, it's vitally important that you thoroughly test your survey form. See our help topic on testing to learn more.

Supervision and monitoring

Supervision and monitoring of survey operations happen at several levels. First, there is field-level supervision and monitoring by team leaders and field supervisors. Then, presuming that your field teams are periodically sending data to the SurveyCTO server, you can also monitor data as it comes in; this gives you extremely valuable opportunities to manage the overall data-collection process in ways that help to assure a high level of data quality.

Field-level supervision

One commonly used tool for field-level supervision is the "back-check." Supervisors – or specially hired back-checkers – return to a random proportion of the households or facilities surveyed, confirm that they were visited by enumerators, and re-ask certain questions to compare with the original responses. It can be cumbersome to set up the workflow for back-checks: you have to obtain all original surveys, select a certain percentage to be back-checked, compile and upload the original data from the selected sample to a back-checker's device, and then send the back-checker out to conduct the back-check. Using datasets, you can automate the workflow for your back-check surveys and make the overall turn-around time much faster with less work. See the back-checking sample for a working example.
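
To make the "select a certain percentage" step concrete, here is a minimal Python sketch of drawing a random back-check sample from a list of submission IDs. It is purely illustrative and is not how SurveyCTO datasets implement the workflow; the function name, its parameters, and the example IDs are all hypothetical.

```python
import math
import random


def select_for_back_check(submission_keys, proportion, seed=None):
    """Return a random subset of submission keys to back-check.

    `proportion` is the share of submissions to revisit (0 < proportion <= 1).
    All names here are hypothetical, for illustration only.
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be greater than 0 and at most 1")
    keys = list(submission_keys)
    if not keys:
        return []
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # Round up so that small batches still yield at least one back-check.
    sample_size = max(1, math.ceil(len(keys) * proportion))
    return rng.sample(keys, sample_size)


# Example: back-check 25% of 20 made-up submission IDs.
example_keys = [f"submission-{i}" for i in range(20)]
to_revisit = select_for_back_check(example_keys, 0.25, seed=1)
```

Fixing the seed is a design choice that lets a supervisor reproduce the same draw later; leave it as None for a fresh random sample each time.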

Office-level monitoring

The surest single way to improve the quality of your entire data-collection effort is to review your data as it comes in. Use the "Monitor form data" action in the Form submissions and dataset data section of the Monitor tab to jump into SurveyCTO's built-in Data Explorer. There, you can configure and save a monitoring workbook, review aggregate data as well as individual submissions, and catch potential data-quality issues right away.

To review some or all incoming data in a systematic way, before it is released for publishing or export, you can enable the review and correction workflow for any or all of your forms. Doing so builds in explicit review of some or all incoming data, before it is passed on to any outside systems.

For reviewing your data outside SurveyCTO, you can export all of your data or export different subsets of your data for review by different people or different teams. If you're collecting GPS locations, you can even export to Google Earth in order to review the data in a more visual way. If reviewing raw data in spreadsheet format is too difficult, you can also use Microsoft Word's mail merge feature to create easier-to-review versions of incoming data.

Meanwhile, you need to track your back-office data review, processing, and management processes. Because many people track these processes in spreadsheets, SurveyCTO makes it easy to merge (subsets of) incoming data with Microsoft Excel workbooks or Google Sheets.

SurveyCTO can also help you to catch potential problems that may not be obvious in your more manual reviews. If you configure automated quality checks, SurveyCTO can help automatically detect potential problems and provide specific warnings about quality concerns. See the help topic on quality checks for more details.

Finally, you can automatically run other processes (e.g., a Stata .do file to process and review the new data) whenever you export new data using SurveyCTO Desktop.

Random audits

To help assure quality in your data-collection efforts, you can always have supervisors or other quality-assurance personnel randomly (a) accompany your surveyors and (b) re-visit surveyed individuals to perform back-checks. As an alternative or complement to these manual quality assurance (QA) methods, SurveyCTO offers two random auditing options to allow you to monitor for quality survey administration.

The first auditing option is a random "text audit." For any random proportion of administered surveys (from 1% to 100%), SurveyCTO can save meta-data about survey administration. This includes details on how much time the surveyor spent on each question in the survey form and the sequence in which he or she proceeded through the survey. For each audited survey, a .csv file is saved that contains this information. Once the survey data is exported, these .csv files can be opened and reviewed in Excel. For example:

Field name	Total duration (seconds)	First appeared (seconds into survey)
intronote	3	0
consent	5	3
consented[1]/name	3	8
consented[1]/age	3	10
consented[1]/confirmnote	1	13
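
Text audit files can also be reviewed programmatically rather than in Excel. The Python sketch below reads the example above (rewritten as comma-separated text) and lists the fields on which the surveyor spent less than a chosen number of seconds. It assumes the exported file uses exactly the column headers shown above; check a real export before relying on them. The function name and threshold are hypothetical.

```python
import csv
import io

# The example table above, as comma-separated text (assumed layout).
SAMPLE_TEXT_AUDIT = """Field name,Total duration (seconds),First appeared (seconds into survey)
intronote,3,0
consent,5,3
consented[1]/name,3,8
consented[1]/age,3,10
consented[1]/confirmnote,1,13
"""


def fields_faster_than(audit_csv_text, min_seconds):
    """Return the names of fields whose total duration is below min_seconds."""
    reader = csv.DictReader(io.StringIO(audit_csv_text))
    return [
        row["Field name"]
        for row in reader
        if float(row["Total duration (seconds)"]) < min_seconds
    ]


quick_fields = fields_faster_than(SAMPLE_TEXT_AUDIT, 2)
```

For a file on disk, the same function could be fed the file's contents, for example via `open(path, newline="").read()`.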

Text audits can be configured to also include details about exactly which multiple-choice options were presented to the user in which order, which can be particularly helpful when some choice lists are dynamic, randomized, or translated into multiple languages. In such cases, text audit files contain two additional columns:

Field name	Total duration (seconds)	First appeared (seconds into survey)	Choice values	Choice labels
consent	5	3	1]|[0	Yes]|[No

(When choice details are recorded, choice values and labels are listed with "]|[" as the separator, so as not to be confused with commas, spaces, or other characters likely to be included in choice labels.)
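
Because the separator is a fixed three-character string, the two choice columns can be split and paired with ordinary string operations. A minimal Python sketch, using the consent example above; the function name is hypothetical.

```python
CHOICE_SEPARATOR = "]|["  # separator described in the text above


def split_choices(values_cell, labels_cell):
    """Pair each choice value with its label, in the order presented."""
    values = values_cell.split(CHOICE_SEPARATOR)
    labels = labels_cell.split(CHOICE_SEPARATOR)
    if len(values) != len(labels):
        raise ValueError("choice values and labels have different lengths")
    return list(zip(values, labels))


presented = split_choices("1]|[0", "Yes]|[No")
```

The order of the resulting pairs is the order in which the options were shown, which is the detail that matters when choice lists are randomized.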

If you monitor incoming data with the Data Explorer, you can easily download text-audit data for any submission you view – or even click the hourglass button to see the timing information overlaid on the questions and responses (and, when available, information about the exact choices presented to the user for multiple-choice questions).

Platform limitations

Only SurveyCTO Collect supports random audio audits. They are not supported on the web, as web browsers do not support invisible audio recording. See the help on platform limitations for more.

The second auditing option is a random "audio audit." For any random proportion of surveys, SurveyCTO can audio-record some or all of the survey administration. When the data is exported, audio audits will be included in separate audio files; QA personnel in the office can then review these audio recordings.

Random audio audits can be configured to begin at the beginning of a survey, at the beginning of a particular question, a fixed number of seconds into a survey, or a random number of seconds into a survey. They can be configured to record for a fixed duration (in seconds), or to stop at the end of a particular question.

Each second of audio recording adds between 700 and 1,000 bytes to the survey data, so most configurations audit only a random proportion of a random subset of administered surveys. Since surveyors are unaware of when they are being recorded, they cannot behave systematically differently when being audited.
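
To see what the per-second figure means for a given configuration, the arithmetic can be sketched in Python. The 700 and 1,000 bytes-per-second bounds come from the paragraph above; the function name and the example durations are illustrative assumptions.

```python
BYTES_PER_SECOND_LOW = 700    # lower bound quoted above
BYTES_PER_SECOND_HIGH = 1000  # upper bound quoted above


def audio_audit_size_range(duration_seconds):
    """Return (low, high) estimated bytes added by one audio recording."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return (
        duration_seconds * BYTES_PER_SECOND_LOW,
        duration_seconds * BYTES_PER_SECOND_HIGH,
    )


two_minute_clip = audio_audit_size_range(120)        # a 2-minute recording
half_hour_interview = audio_audit_size_range(30 * 60)  # a full 30-minute survey
```

By this estimate, a two-minute clip adds roughly 84,000 to 120,000 bytes per audited submission, while recording a full 30-minute interview would add roughly 1.26 to 1.8 million bytes.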

If you monitor incoming data with the Data Explorer, you can easily download and play audio audits for any submission you view.

To add auditing to any survey, add text audit and/or audio audit fields to the survey's form definition, name the fields, then specify configuration parameters in the appearance column. See the help topics for those field types (text audit and audio audit) for more details.

When you export data that includes audit files, those files will be exported with your data, and the URL or local path for each file will be included in the exported data.

Text audit files will be named "TA_UUID.csv", where UUID is the unique ID for the corresponding survey form (also exported in the KEY column).

Audio audit files will be named differently depending on their configuration. For recordings that begin at a particular question, the filename will be "AA_UUID_FIELDNAME.mp4", where UUID is the unique ID and FIELDNAME is the name of the field at which the recording began. For other recordings, the filename will be "AA_UUID_AFTER_#S.mp4" where UUID is the unique ID and # is the number of seconds into the survey before the audio recording began (0 for recordings that started at the beginning of the survey).
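
When post-processing exports, it can help to recover the unique ID and recording start point from a filename. The Python sketch below follows the naming patterns described above. It assumes the UUID portion contains no underscores, and the example UUID is made up. A field literally named like "AFTER_5S" would be misread as a timed recording, so treat this as a starting point only.

```python
import re

# Patterns follow the naming conventions described above.
# Assumption: the UUID portion itself contains no underscores.
TEXT_AUDIT = re.compile(r"^TA_(?P<uuid>[^_]+)\.csv$")
AUDIO_AFTER = re.compile(r"^AA_(?P<uuid>[^_]+)_AFTER_(?P<seconds>\d+)S\.mp4$")
AUDIO_FIELD = re.compile(r"^AA_(?P<uuid>[^_]+)_(?P<field>.+)\.mp4$")


def describe_audit_file(filename):
    """Return a dict describing an audit file, or None if unrecognized."""
    match = TEXT_AUDIT.match(filename)
    if match:
        return {"kind": "text", "uuid": match["uuid"]}
    # Check the timed pattern first: it is the more specific of the two.
    match = AUDIO_AFTER.match(filename)
    if match:
        return {
            "kind": "audio",
            "uuid": match["uuid"],
            "starts_after_seconds": int(match["seconds"]),
        }
    match = AUDIO_FIELD.match(filename)
    if match:
        return {
            "kind": "audio",
            "uuid": match["uuid"],
            "starts_at_field": match["field"],
        }
    return None


EXAMPLE_UUID = "0a1b2c3d-0000-4000-8000-000000000001"  # made-up ID
info = describe_audit_file(f"AA_{EXAMPLE_UUID}_AFTER_0S.mp4")
```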

See the auditing sample form for a basic example.

Speed limits

Another quality-control tool available in SurveyCTO is "speed limits": you can use the minimum_seconds column in your survey form definition to specify a minimum number of seconds that enumerators should spend on any given field.

If you do specify a minimum_seconds value for a field, then the first time that field is shown for a given survey, SurveyCTO will (invisibly) keep track of how much time the enumerator spends before moving on to another question. If the enumerator spends less than the specified minimum time the first time he or she encounters the field, nothing will happen by default; you have a few choices for how you deal with these "speed limit violations":

Quietly keep track of the number of violations. You can do this by adding a new field to your survey form with speed violations count as its field type. This will be an invisible field that keeps a tally of how many times the enumerator violates the speed limit (attempts to move to the next field before minimum_seconds have passed) when filling out the form. You might track the number of violations and then set up quality checks to warn about individual cases with too many violations or about enumerators or teams with significantly different levels of violations on average.
Quietly keep track of the list of violations. You can do this by adding a new field to your survey form with speed violations list as its field type. This will be an invisible field that keeps a list of all fields for which the enumerator violated the speed limit when filling out the form. Often, if you keep track of the count of violations, you also want to keep track of the list – so that you can better follow up with enumerators or even change the speed limits when appropriate.
Trigger audio audits upon a certain number of violations (only in the mobile app). Here, rather than audio recording randomly (as described above), you would begin audio recording in response to a certain number of speed limit violations – so that your team can hear what was going on when the enumerator was moving so quickly through the survey. Just add a new field to your survey form with speed violations audit as its field type, then put "v=x; d=y" into that field's appearance column to specify the number of violations required to trigger the audit (the x) and the length of the audit in seconds (the y). For example, to invisibly record a two-minute audio clip beginning immediately after the fifth violation, put "v=5; d=120" in the appearance column. When you export your data, all audit recordings will be included in the media subdirectory, and the exported form data will include the path and filename within the audit field itself (for each case where an audit was triggered).
Enforce the speed limit (only in the mobile app). Another option is to prevent the enumerator from moving forward until after the minimum time has elapsed. You can do this by enabling the Enforce minimum times for fields option within Collect's Admin Settings (from the main Collect menu, open the three-dot menu, then choose Admin Settings).

While you can combine the first three options above (to count, list, and audit violations), using the last option to enforce speed limits will effectively preclude you from using the other options: if speed limits are enforced, then you can't have any violations to count, list, or audit. Thus, in most cases it may be more effective to allow violations – but then to monitor them carefully. (And if you monitor incoming data with SurveyCTO's built-in Data Explorer, the submission-details view will use your speed-limit count and list data to visually flag speed-limit violations.)
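
As one way to monitor violations outside SurveyCTO, the Python sketch below averages a speed-violations count per enumerator from exported rows, so that enumerators with unusually high averages stand out. The column names "enumerator" and "speed_violations" are hypothetical: they depend on the field names you choose in your own form, and the sample rows are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_violations_by_enumerator(
    rows, enumerator_col="enumerator", count_col="speed_violations"
):
    """Average the per-submission violation count for each enumerator.

    `rows` is an iterable of dicts, e.g. from csv.DictReader over an export.
    Column names are assumptions; adjust them to match your form.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[enumerator_col]].append(int(row[count_col]))
    return {name: mean(counts) for name, counts in grouped.items()}


# Invented sample rows, with values as strings as they would be in a CSV.
sample_rows = [
    {"enumerator": "A", "speed_violations": "0"},
    {"enumerator": "A", "speed_violations": "2"},
    {"enumerator": "B", "speed_violations": "7"},
]
averages = mean_violations_by_enumerator(sample_rows)
```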

In combination with random audio audits and other quality checks, speed limits can be an additional tool to help you assure a high level of data quality.

Sensor meta-data

In addition to text and audio audits, speed limits, and the standard meta-data collected while a form is being filled out (like the overall duration of time spent in the form), you can collect additional meta-data from Android device sensors. Used in combination with automated quality checks or other back-end analysis, this meta-data might help you to enhance your monitoring and quality-control efforts. See the help topic on sensor meta-data for more details.
