Introduction to advanced dataset usage
SurveyCTO datasets help you to organize and manage your data. Broadly speaking, you can use datasets to:
- Provide pre-loaded data as an input into one or more survey forms.
- Publish data from one or more survey forms into a dataset.
- Manage your enumerator list for enumerator identification or your cases list for case management.
The first two functions can be combined: data submitted from one or more survey forms (which requires an Internet connection to reach the server) can be used as pre-loaded data in one or more other survey forms. Optionally, collected data can also be made available as pre-loaded data immediately, without any server interaction and therefore without an Internet connection, for more advanced offline data collection workflows. Offline dataset publishing is a new feature that enables workflows like these. As you can imagine, a wide range of applications exists for this.
Datasets are similar to spreadsheets. They are organized around rows (also known as "records") on the one hand, and columns (also known as "fields") on the other. SurveyCTO datasets (also known as "server datasets") are constructed and maintained on the server. They can be used to pre-load data into survey forms, and form submission data can be published into them. However, to be publishable, form fields can't be end-to-end encrypted (learn more about encrypted forms). Server datasets can also be monitored for data quality, or published to the cloud so that incoming data streams out to, for example, an outside visualization or dashboard.
For a small working example, see the "dataset basics" sample form.
Types of server dataset
SurveyCTO currently supports three types of server dataset:
- Datasets for data
- Cases datasets
- Enumerator datasets
All datasets are similar in that they are collections of rows and columns, can be attached to forms, and can receive data published from forms. Enumerator and cases datasets differ only in that each has a specific column structure, has some special settings, and integrates into the broader platform in particular ways. For example, the enumerator field type loads enumerator lists from enumerator datasets, and the Manage Cases interface loads case lists from cases datasets. See Managing enumerators and Case management for details.
Server dataset settings
After you create a dataset, you can customize it from the dataset's Settings option on the Design tab. For general-purpose server datasets for data, you can choose to set a Unique ID field (this is pre-configured for the specialized dataset types); a Unique ID field is required before the Allow offline updates option can be enabled. If you do set a Unique ID field, you will be required to select it as the Form field to identify unique records when you set up forms to publish data into the dataset (learn more in Publishing form data into server datasets). Further, SurveyCTO enforces that all values in a Unique ID field are unique and non-blank, to help avoid data publishing problems.
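Because SurveyCTO enforces unique, non-blank values in a Unique ID field, it can help to check a file locally before uploading it. The sketch below is a hypothetical pre-upload check using only Python's standard `csv` module; the function name `find_unique_id_problems` and the `caseid` column are illustrative assumptions, not part of SurveyCTO, and the check never contacts the server.

```python
import csv
import io


def find_unique_id_problems(csv_text: str, id_column: str) -> dict:
    """Report blank and duplicate values in a would-be Unique ID column.

    Hypothetical local pre-upload check; it does not contact SurveyCTO.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"missing Unique ID column: {id_column!r}")

    seen = set()
    duplicates = set()
    blank_lines = []
    for row in reader:
        # Short rows yield None for missing cells, so treat None as blank.
        value = (row[id_column] or "").strip()
        if not value:
            # reader.line_num is the line the reader has reached (line 1 is the header).
            blank_lines.append(reader.line_num)
        elif value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return {"blank_lines": blank_lines, "duplicates": sorted(duplicates)}


if __name__ == "__main__":
    sample = "caseid,name\nA1,Ana\n,Ben\nA1,Cy\n"
    print(find_unique_id_problems(sample, "caseid"))
    # {'blank_lines': [3], 'duplicates': ['A1']}
```

Running a check like this before upload surfaces the same kinds of problems the server would refuse, but with line numbers you can fix in your spreadsheet.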
Cases datasets and enumerator datasets have additional settings, which you can read about in those topics.
Manual use of server datasets
The Your forms and datasets section of the Design tab has all of the options you need to manually manage server datasets for the purposes of attaching pre-loaded data to your survey forms. See Pre-loading data into a form for a full discussion. You can upload new or revised data; download, rename, or purge existing data; and attach forms to datasets so that their data will be available as pre-loaded data.
When you purge a server dataset's existing data, all rows of data will be deleted – but the existing columns in that dataset will remain (albeit currently empty). If you want to completely eliminate old columns that are no longer desired, you will need to delete the dataset entirely, then re-create it.
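The difference between purging and deleting can be pictured with a toy in-memory model. This is an illustrative assumption about the behavior described above, not SurveyCTO's implementation: the hypothetical `ToyDataset.purge()` empties the rows but keeps the column list, so in this model the only way to drop an unwanted column is to discard the dataset and create a new one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical in-memory stand-in for a server dataset (illustration only)."""

    columns: list[str] = field(default_factory=list)
    rows: list[dict[str, str]] = field(default_factory=list)

    def add_rows(self, new_rows: list[dict[str, str]]) -> None:
        for row in new_rows:
            # A column name the dataset has not seen before becomes a new field.
            for name in row:
                if name not in self.columns:
                    self.columns.append(name)
            self.rows.append(dict(row))

    def purge(self) -> None:
        # Purge: every row is deleted, but the columns stay (now empty).
        self.rows.clear()


if __name__ == "__main__":
    ds = ToyDataset()
    ds.add_rows([{"id": "1", "village": "North"}, {"id": "2", "old_field": "x"}])
    ds.purge()
    print(ds.columns, ds.rows)  # ['id', 'village', 'old_field'] []

    # To get rid of "old_field" in this model, start over with a new dataset.
    ds = ToyDataset()
```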
When you upload data for a dataset, you always upload a .csv or an .xlsx file. Please note the following:
- The first row of your .csv or .xlsx file should include short, unique names for each column. These column names should not themselves include commas or quotes. Any uploaded column names that do not correspond to fields already in the dataset will be added to the dataset as new fields.
- If your data contains non-English characters or other special characters, you will need to save your .csv file in Unicode/UTF-8 format. If you upload an .xlsx file instead of a .csv, the character encoding is converted automatically. If you cannot directly save or export either an .xlsx file or a .csv file in Unicode/UTF-8 format, you can use SurveyCTO Desktop to re-encode it: choose Re-encode .csv from the Offline form tools menu, select your file and the encoding under which its text appears correctly in the preview window, and then click Convert to save the re-encoded .csv file.
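If you prepare upload files with scripts rather than by hand, the column-name rules and the UTF-8 requirement can be checked in a few lines of standard-library Python, as an alternative to SurveyCTO Desktop's Re-encode .csv tool. This is a sketch under assumptions: the helper names are hypothetical, and you must already know the file's original encoding (`cp1252` in the example below is just an illustration).

```python
import csv
import io


def header_problems(header: list[str]) -> list[str]:
    """Check a header row against the column-name rules (hypothetical helper)."""
    problems = []
    seen = set()
    for name in header:
        if not name.strip():
            problems.append("blank column name")
        if any(ch in name for ch in ',"\''):
            problems.append(f"comma or quote in column name: {name!r}")
        if name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)
    return problems


def prepare_csv_for_upload(raw: bytes, source_encoding: str) -> bytes:
    """Decode CSV bytes from a known encoding, validate the header, return UTF-8."""
    text = raw.decode(source_encoding)
    # csv.reader handles quoted header cells, e.g. "a,b" becomes a,b.
    header = next(csv.reader(io.StringIO(text)), [])
    problems = header_problems(header)
    if problems:
        raise ValueError("; ".join(problems))
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy = "id,città\n1,Torino\n".encode("cp1252")
    utf8_bytes = prepare_csv_for_upload(legacy, "cp1252")
    print(utf8_bytes.decode("utf-8"))
```

Guessing the source encoding wrongly produces garbled text rather than an error in many cases, which is why the SurveyCTO Desktop tool shows a preview before converting.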
When you upload new data for an existing dataset, you can choose whether to:
- Append the new data to the dataset's existing data.
- Merge with existing data.
- Replace all existing data.
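The three upload modes can be pictured with a small toy model. Append and replace follow directly from their names. For merge, the sketch *assumes* that rows are matched on a key column (such as a Unique ID field), that matching rows are updated with the uploaded values, and that unmatched rows are added; that matching rule is an illustrative assumption, not a statement of SurveyCTO's exact merge logic, and `apply_upload` is a hypothetical name.

```python
from enum import Enum

Row = dict[str, str]


class UploadMode(Enum):
    APPEND = "append"
    MERGE = "merge"
    REPLACE = "replace"


def apply_upload(existing: list[Row], uploaded: list[Row],
                 mode: UploadMode, key: str = "id") -> list[Row]:
    """Return dataset rows after an upload (toy model, not SurveyCTO's code)."""
    if mode is UploadMode.REPLACE:
        # All existing rows are discarded in favor of the uploaded ones.
        return [dict(r) for r in uploaded]
    if mode is UploadMode.APPEND:
        # Uploaded rows are added after the existing ones.
        return [dict(r) for r in existing] + [dict(r) for r in uploaded]
    # MERGE (assumed semantics): update rows whose key matches, add the rest.
    merged = [dict(r) for r in existing]
    position = {r[key]: i for i, r in enumerate(merged)}
    for row in uploaded:
        if row[key] in position:
            merged[position[row[key]]].update(row)
        else:
            position[row[key]] = len(merged)
            merged.append(dict(row))
    return merged


if __name__ == "__main__":
    old = [{"id": "1", "v": "a"}, {"id": "2", "v": "b"}]
    new = [{"id": "2", "v": "B"}, {"id": "3", "v": "c"}]
    for mode in UploadMode:
        print(mode.value, apply_upload(old, new, mode))
```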
Aside from attaching server datasets to survey forms, you can also download or export these datasets for your own back-office or analytical purposes. To download a dataset's current data, you have three options:
- On the server console, either use the Download option on the Export tab, or click Download and then Download data on the Design tab. Either way, you will receive a .csv file with all of the dataset's current data.
- Use SurveyCTO Desktop to export a dataset's current data.
- Programmatically export the contents of a server dataset in a .csv file using the SurveyCTO API.
Finally, you can click Attach to manage the list of forms to which each of your server datasets is attached. An attached dataset's data is available as pre-loaded data that can be pulled into calculated form fields, used to dynamically populate multiple-choice option lists, or even pre-loaded as default values for user-editable survey fields.
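Conceptually, pulling a value into a calculated field is a keyed lookup: find the dataset row whose ID matches a value in the form, then read another column from it, while a dynamic option list draws one choice per row. The Python sketch below mimics both ideas against a dataset exported as .csv. It is an analogy only; the functions `lookup` and `choice_list` and the `hhid`/`village`/`head` columns are hypothetical, and this is not how SurveyCTO evaluates form expressions.

```python
import csv
import io


def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse exported dataset .csv text into one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def lookup(rows: list[dict[str, str]], column: str, key_column: str,
           key_value: str, default: str = "") -> str:
    """Return `column` from the first row whose `key_column` equals `key_value`."""
    for row in rows:
        if row.get(key_column) == key_value:
            return row.get(column, default)
    return default


def choice_list(rows: list[dict[str, str]], value_column: str,
                label_column: str) -> list[tuple[str, str]]:
    """Build (value, label) pairs, like a dynamically populated option list."""
    return [(row[value_column], row[label_column]) for row in rows]


if __name__ == "__main__":
    rows = load_rows("hhid,village,head\nH1,North,Ana\nH2,South,Ben\n")
    print(lookup(rows, "village", "hhid", "H2"))  # South
    print(choice_list(rows, "hhid", "head"))  # [('H1', 'Ana'), ('H2', 'Ben')]
```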
That covers the basics of using manually managed datasets to attach pre-loaded data to survey forms. See the following topics in this section for discussions of more advanced dataset techniques.