SurveyCTO can automatically generate Stata .do files that import, merge, and partially process your exported data. You can use these auto-generated .do files as they are, or you can use them as a starting-point for your own back-end processing code.
To download a Stata .do file template for one of your forms, go to the Design tab of your server console, scroll down to that form in the Your forms and datasets section, and then select Download and Stata .do template. Because you can export your data in different ways, you'll need to say a few things about the format of your exported data: whether it's in long or wide format, and whether it includes group names in the column headers (see the help topic on data export formats for more). You'll also need to choose which language to use for labeling in Stata, if your form has multiple languages. Finally, you'll need to download and unzip the .do file(s).
For most forms, SurveyCTO will output two files: import_FORMID.do and FORMTITLE_corrections.csv (where FORMID and FORMTITLE are replaced by the form's ID and title, respectively). If your form has repeat groups and you selected the "long" export format, then additional .do files will also be output and automatically called from the main import_FORMID.do file.
To try the .do file out, first export your data into .csv format and save it to the same directory as the .do file template. Be sure that the export filenames are the SurveyCTO defaults, so that the .do file will be able to find them okay; if you export from the server console and use your browser to save, it might add a "(2)" or "(3)" on the end, so be sure to catch and remove those suffixes. Once you have the .do file(s) and .csv file(s) in the same place, try running the main import_FORMID.do file.
If you get an error about not being able to find the .csv file, the problem could be the "working directory" that Stata is using. If you get such an error, your best bet is probably to add a "cd" command to the top of the .do file, to change the working directory to the one in which you have saved your .do and .csv files (e.g., "cd ~\Files" on OSX or "cd C:\Files" on Windows).
Another possible source of errors concerns variable names: since Stata only allows variable names to be up to 32 characters long, you can run into trouble. If you have very long group and/or field names, the first 32 characters could fail to uniquely identify a variable. If you run into this problem and you're using SurveyCTO Desktop to export your data, you can choose not to export group names in your .csv column headers (see Data export options), then re-generate your Stata templates and re-export your data; that will shorten many field names because they will no longer include enclosing group names.
Even if you do not plan to customize them much, you should familiarize yourself with the Stata code contained in these templates. Most broadly, each template does the following:
- Imports, labels, and formats all incoming data.
- If using long format, organizes data for repeat groups (if any) into separate .dta files (linkable via the key and parent_key variables).
- Merges with any previously-imported data, dropping any duplicates (by default, previously-imported data is respected and not overwritten, but see this help topic about overriding that behavior if you allow un-approving data in your review and correction workflow).
- Applies data corrections, if any.
- Saves the revised Stata dataset.
In more detail, each template:
- Initializes Stata ("clear all", "set mem", etc.). Depending on your memory requirements and version of Stata, you may need to revise this code.
- Initializes filenames and locations in local macros. If you later want to change your .csv or Stata directories, you can update these macro definitions.
- Lists any names of repeat groups in local macros, both as they would appear in .csv filenames and as they would appear in imported .csv file headers.
- Lists any names of text, note, date, and date-time fields in local macros. The fields are listed with the names as they will come into Stata, from the exported .csv headers. SurveyCTO tries to make these lists as accurate as possible, based on the form definitions. However, it is possible that you might need to tweak them.
- Imports the primary incoming .csv file.
- Drops any note fields, since they do not contain data.
- Converts any date and date-time variables from text format into Stata's internal date/time format. That way, they sort and filter properly. Please note that the default code uses the clock() and date() functions to parse incoming dates, and we automatically assume MDY or DMY date ordering based on the regional settings of the computer outputting the template. If your computer's regional settings are different from the computer that exported the .do template, you may need to search for "date" and "clock" in the .do file and adjust the MDY or DMY to match your computer's regional settings. (When all else fails, use a text editor to open the raw .csv file exported by SurveyCTO Desktop and see what format the dates are in. Then make sure the date() and clock() calls specify the correct format.)
- Converts all text fields to text format. By default, this includes "calculate" fields – but you can destring them later if you want, or remove them from the text_fields macro to not convert them at all.
- Labels variables and select_one values. Note that Stata can only label numeric values, so the template will only label select_one values when all possible values are numeric. Note also that variable labels are truncated at 79 characters, but the template also adds a "note" to each field with the full text of the label (as found in the form definition).
- Merges with any previously-imported data, dropping any duplicates. Because the Stata process is designed to run repeatedly – each time importing .csv files that likely contain both old and new data – the import process defaults to never overwriting existing data with incoming .csv data; that way, you can always update or extend existing data in the Stata file without fear of it being overwritten, and the import process will only add new data to the existing data file. However, if you use a review and correction workflow and allow un-approving data, then the default behavior will mean that you potentially miss changes in data that happen after submissions have been approved. To re-import data that was previously imported – and catch potential corrections made after the initial approval – change the overwrite_old_data local macro at the top of the template from 0 to 1. See this help topic for more on advanced correction workflows.
- Saves the updated data file.
- If using long format, runs additional .do files to process any secondary .csv files for repeat-group data, essentially applying all of the above logic to each repeat group (saving each as its own .dta file).
- Outputs the codebook and all variable notes to the Stata console.
Applies corrections, if any. If you wish, you can enter data corrections into the FORMTITLE_corrections.csv file (where FORMTITLE is the title of the form).
SurveyCTO gives you an empty corrections template along with the Stata template, and you can then edit this template and add a row for each correction you would like to make to the data. If you want to use Microsoft Excel to maintain the list of corrections, you should maintain a .xls or .xlsx file in which all cells are set as Text format; otherwise, Excel will do funny things like assume that door number "4/16" is April 16 and encode the value as a very long number instead of "4/16". If you maintain your corrections in .xls or .xlsx format, you can either "Save as" .csv format for the Stata template, or you can update the Stata template to import your .xls or .xlsx file directly.
If there are any corrections in the corrections .csv file, they will be applied in sequence, row by row. Each row should indicate the key of the row to correct, the name of the field to correct (as it appears in the header of .csv exports), the corrected value, and any notes you might wish to maintain in the corrections file (optional).
If there is any error applying a correction, an error message will be output to the Stata console, including the row number of the offending correction. For example, if you entered a correction with a value of "John" for a numeric field, or if you had a typo in a field name, there would be an error applying the correction.
Note that the Stata code for applying corrections is tricky. In essence, each correction is output as Stata code, into a temporary .do file, and the template executes the temporary .do file in order to apply the corrections. Thus, you have auto-generated Stata code outputting and running its own auto-generated Stata code.
- Saves the corrected data file.
The auto-generated Stata templates are mostly meant to get you started on your back-end data-processing. Feel free to revise, extend, and delete whichever parts you don't need or want.
If you use SurveyCTO Desktop to download and export your data, see also the help topic on using Desktop with Stata.