
In addition to text audits, audio audits, speed limits, and the standard meta-data collected while a form is being filled out (such as the total time spent in the form), you can collect additional meta-data from Android device sensors. Used in combination with automated quality checks or other back-end analyses, this meta-data can help you enhance your monitoring and quality-control efforts.

Click here to watch a webinar about sensor meta-data.
Platform limitations
Only SurveyCTO Collect on Android supports sensor meta-data. It's not supported on iOS or the web. More about platform limitations...

SurveyCTO's sensor meta-data support was developed as part of a machine-learning roadmap, as a means to enrich the safe, non-personally-identifiable data available to machine-learning algorithms. Thus, if your team is experimenting with machine-learning technologies, you might be able to use some of this sensor meta-data as additional "factors." Even if not, there are more straightforward and immediate ways that you might use this data to enhance your quality-control efforts. You could, for example, use automated quality checks to flag submissions that are outliers in terms of certain sensor statistics.

Experimental feature
Sensor meta-data is still considered experimental. This means that research and development is ongoing, both accuracy and compatibility are modest, and the SurveyCTO team is still learning, with the help of users and partners, where and how this meta-data can be most valuable. Please contact us via the Support Center to let us know your experience, or to give us your input on these features.

There are two different types of sensor meta-data available to you:

  1. Sensor statistics. These are individual statistics that summarize sensor data, each providing one statistic per form submission. They are captured by adding sensor_statistic fields to your form (e.g., sensor_statistic mean_light_level).
  2. Sensor streams. These are streams of sensor data, each providing one .csv file per form submission. They are captured by adding sensor_stream fields to your form (e.g., sensor_stream light_level).
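In a spreadsheet form definition, such fields might look like the following sketch. The field names here are illustrative, and the pct_sound_level_between range is just an example (the min=/max= appearance syntax is covered under "Sensor statistics" below):

```
type                                      name          appearance
sensor_statistic mean_light_level         light_mean
sensor_statistic pct_sound_level_between  pct_speech    min=40;max=90
sensor_stream light_level                 light_stream
```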

Sensor statistics can be easily viewed and utilized in automated quality checks, but sensor streams are more difficult to work with because of the volume of data collected. For every submission, a sensor stream will record a stream of observations (potentially thousands), and store those as an additional .csv file attached to the submission. For this reason, sensor streams are mostly useful to those doing advanced analysis using powerful statistical tools.

However, captured sensor data will depend on your devices, and on the settings in which they are used. So your best bet, when getting started, is to begin by using one or more sensor_stream fields, to see what raw sensor data is produced under which circumstances. Then you can configure appropriate sensor_statistic fields, based on your understanding of the underlying sensor data.
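To illustrate that workflow, here is a minimal Python sketch (with made-up numbers) of how you might inspect a downloaded sensor_stream .csv attachment to see what range of values your devices actually produce, before settling on bounds for a sensor_statistic field. The column names match the attachment format described under "Sensor streams" below.

```python
import csv
import io

# Hypothetical contents of a sensor_stream .csv attachment; a real file
# has one row per observation period, with these same columns.
sample = """second,count,mean,min,max,sd,fieldname
1,25,210.0,180.0,240.0,12.1,consent
2,25,415.5,300.0,520.0,40.2,consent
3,25,90.2,85.0,95.0,2.8,hh_roster
"""

def observed_range(stream_csv, column="mean"):
    """Return the (min, max) of one column across a stream, to help
    choose min=/max= bounds for a later sensor_statistic field."""
    rows = list(csv.DictReader(io.StringIO(stream_csv)))
    values = [float(r[column]) for r in rows]
    return min(values), max(values)

low, high = observed_range(sample)
print(low, high)  # range of per-period means observed in this stream
```

In practice you would run this over attachments from several devices and settings, since the ranges can vary considerably between them.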

Types of sensor data

  1. Light level: light level measured in lux, as reported by the device's ambient light sensor.
  2. Movement: linear acceleration measured in m/s², as reported by the device's linear acceleration sensor and gyroscope sensor.
  3. Sound level: sound level measured in dB, as calculated from the device microphone.
  4. Sound pitch: sound pitch measured in Hz, as calculated from the device microphone (estimated using the YIN algorithm and a 25 ms window).
  5. Conversation: 1 when conversation seems to be happening vs. 0 when not, as estimated from the sound level, pitch, and/or raw data from the device microphone. This is the most-experimental – but also, potentially, the most-useful – type of sensor meta-data.

Sensor statistics

In order to make sensor meta-data available in a safe and convenient format for monitoring and quality control, SurveyCTO offers a series of non-personally-identifiable summary statistics for each type of sensor. While the underlying streams of sensor data are higher-frequency, all summary statistics are based on a lower-frequency – but more uniform – stream of means. Specifically, SurveyCTO calculates a mean value for each one-second period, and then calculates the overall summary statistic based on those means. For a three-minute interview, for example, the statistic will be calculated based on 180 means, one for each second of collected sensor data.
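This two-step calculation can be sketched as follows. The readings are made-up numbers for illustration, and this is an approximation of the described behavior rather than SurveyCTO's actual implementation:

```python
from statistics import mean

# Raw high-frequency sensor readings as (seconds_into_form, value) pairs.
# These numbers are made up for illustration.
readings = [
    (0.1, 100.0), (0.5, 110.0), (0.9, 120.0),  # second 0
    (1.2, 300.0), (1.8, 320.0),                # second 1
    (2.4, 200.0),                              # second 2
]

def per_second_means(readings):
    """Collapse raw readings into one mean per one-second period."""
    buckets = {}
    for t, value in readings:
        buckets.setdefault(int(t), []).append(value)
    return [mean(vals) for _, vals in sorted(buckets.items())]

def summary_statistic(readings, stat=mean):
    """Compute an overall statistic from the per-second means,
    mirroring the two-step calculation described above."""
    return stat(per_second_means(readings))

print(per_second_means(readings))            # [110.0, 310.0, 200.0]
print(summary_statistic(readings, min))      # 110.0
print(summary_statistic(readings, max))      # 310.0
```

Note that the overall min and max are taken over the per-second means, not over the raw readings, so brief spikes within a single second are smoothed out.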

SurveyCTO includes support for these basic sensor statistics:

  1. sensor_statistic mean_light_level - mean light level measured in lux, as reported by the device's ambient light sensor.
  2. sensor_statistic min_light_level - minimum light level measured in lux, as reported by the device's ambient light sensor.
  3. sensor_statistic max_light_level - maximum light level measured in lux, as reported by the device's ambient light sensor.
  4. sensor_statistic sd_light_level - standard deviation of light level measured in lux, as reported by the device's ambient light sensor.
  5. sensor_statistic mean_movement - mean linear acceleration measured in m/s², as reported by the device's linear acceleration sensor and gyroscope sensor.
  6. sensor_statistic min_movement - minimum linear acceleration measured in m/s², as reported by the device's linear acceleration sensor and gyroscope sensor.
  7. sensor_statistic max_movement - maximum linear acceleration measured in m/s², as reported by the device's linear acceleration sensor and gyroscope sensor.
  8. sensor_statistic sd_movement - standard deviation of linear acceleration measured in m/s², as reported by the device's linear acceleration sensor and gyroscope sensor.
  9. sensor_statistic mean_sound_level - mean sound level measured in dB, as calculated from the device microphone.
  10. sensor_statistic min_sound_level - minimum sound level measured in dB, as calculated from the device microphone.
  11. sensor_statistic max_sound_level - maximum sound level measured in dB, as calculated from the device microphone.
  12. sensor_statistic sd_sound_level - standard deviation of sound level measured in dB, as calculated from the device microphone.
  13. sensor_statistic mean_sound_pitch - mean sound pitch measured in Hz, as calculated from the device microphone.
  14. sensor_statistic min_sound_pitch - minimum sound pitch measured in Hz, as calculated from the device microphone.
  15. sensor_statistic max_sound_pitch - maximum sound pitch measured in Hz, as calculated from the device microphone.
  16. sensor_statistic sd_sound_pitch - standard deviation of sound pitch measured in Hz, as calculated from the device microphone.

And also these inference statistics:

  1. sensor_statistic pct_light_level_between - percentage of time the light level was between x and y, where the appearance column includes "min=x;max=y" to specify the x and y (see this Wikipedia topic for more about lux light levels, but exact ranges will depend on your device and setting; use a sensor_stream field to see what ranges of values are captured for you, before deciding on a specific range to use).
  2. sensor_statistic pct_sound_level_between - percentage of time the sound level was between x and y, where the appearance column includes "min=x;max=y" to specify the x and y (see this webpage for more about decibel sound levels, but exact ranges will depend on your device and setting; use a sensor_stream field to see what ranges of values are captured for you, before deciding on a specific range to use).
  3. sensor_statistic pct_sound_pitch_between - percentage of time the estimated pitch was between x and y, where the appearance column includes "min=x;max=y" to specify the x and y (exact ranges will depend on your device and setting; use a sensor_stream field to see what ranges of values are captured for you, before deciding on a specific range to use).
  4. sensor_statistic pct_movement_between - percentage of time linear acceleration was between x and y, where the appearance column includes "min=x;max=y" to specify the x and y (exact ranges will depend on your device and setting; use a sensor_stream field to see what ranges of values are captured for you, before deciding on a specific range to use).
  5. sensor_statistic pct_quiet - percentage of time quiet. This is the same as using pct_sound_level_between with "max=25" in the appearance.
  6. sensor_statistic pct_still - percentage of time the device was still. This is the same as using pct_movement_between with "max=0.15" in the appearance.
  7. sensor_statistic pct_moving - percentage of time the device was moving. This is the same as using pct_movement_between with "min=1.5" in the appearance.
  8. sensor_statistic pct_conversation - percentage of time conversation seemed to be happening. This statistic is particularly experimental, and its voice activity detection algorithm may or may not work well for you. Before you rely on it too heavily, test it out, and consider recording audio audits so that you can listen for yourself, to verify submissions flagged for, e.g., too little apparent conversation.
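As a sketch of how these inference statistics work, the following Python snippet parses a "min=x;max=y" appearance string and computes the percentage of one-second means that fall within the bounds. The sound levels are made up, and this illustrates the documented behavior rather than SurveyCTO's actual code:

```python
def parse_bounds(appearance):
    """Parse a 'min=x;max=y' appearance string; either bound may be
    omitted, as in the built-in pct_quiet ('max=25') shortcut."""
    bounds = {"min": float("-inf"), "max": float("inf")}
    for part in appearance.split(";"):
        if "=" in part:
            key, value = part.split("=")
            if key.strip() in bounds:
                bounds[key.strip()] = float(value)
    return bounds["min"], bounds["max"]

def pct_between(per_second_means, appearance):
    """Percentage of one-second periods whose mean falls in [min, max]."""
    lo, hi = parse_bounds(appearance)
    inside = sum(1 for v in per_second_means if lo <= v <= hi)
    return 100.0 * inside / len(per_second_means)

# Illustrative sound levels (dB), one mean per second of the interview.
levels = [20.0, 22.0, 48.0, 55.0]
print(pct_between(levels, "min=40;max=60"))  # 50.0
print(pct_between(levels, "max=25"))         # pct_quiet equivalent: 50.0
```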

Depending on your context, raw sensor statistics like mean_light_level, pct_still, and pct_conversation may not be extremely meaningful in terms of their exact levels – but they still might be useful. For example, you could record some of these statistics and then set up automated quality checks to flag outliers. You could then use a review and correction workflow to examine those flagged submissions more closely. In the end, there is no single best monitoring or quality-control regime for all projects, but SurveyCTO offers a variety of tools that can be useful in different settings.

Sensor streams

To capture a sensor stream, add one or more of the following field types to your form:

  1. sensor_stream light_level - light level measured in lux, as reported by the device's ambient light sensor.
  2. sensor_stream movement - linear acceleration measured in m/s², as reported by the device's linear acceleration sensor and gyroscope sensor.
  3. sensor_stream sound_level - sound level measured in dB, as calculated from the device microphone.
  4. sensor_stream sound_pitch - sound pitch measured in Hz, as calculated from the device microphone (estimated using the YIN algorithm and a 25 ms window).
  5. sensor_stream conversation - 1 when conversation seems to be happening vs. 0 when not, as estimated from the sound level, pitch, and/or raw data from the device microphone. This is the most-experimental – but also, potentially, the most-useful – type of sensor meta-data.

By default, all sensor streams capture data with a period (observation length) of 1 second, so you will get roughly one observation recorded every second while your form is being filled out. To set a different period length, put "period=#" in the field's appearance column, replacing "#" with the period length in seconds. If you want to record observations as fast as they come in from the underlying sensors, you can specify "period=0" (in which case you will get, e.g., roughly 25 observations/second from a sound-related sensor, but still a maximum of 1 observation/second for conversation streams).

Regardless of the period length you choose, sensor_stream data will be saved into an attached .csv file, with one file per submission. Each .csv file will then have a single row per observation, recorded in this format:

  second      Seconds into the form
  count       Number of observations in the period
  mean        Mean sensor value
  min         Minimum sensor value
  max         Maximum sensor value
  sd          Standard deviation
  fieldname   Current field name

If you override the period length to be 0, then each row's mean will be the single sensor observation's value, the count will always be 1, the sd will always be 0, and the min and max will be equal to the mean. Otherwise, the count, mean, min, max, and sd will summarize the sensor data that came in during the observation period. The fieldname column always contains the name of the field on-screen at the time of observation, if any (when it is blank, the user was somewhere outside the form itself, like on the opening screen or in the navigation menu).
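Because each attachment is a plain .csv in the format above, it is straightforward to post-process with statistical tools. The Python sketch below (with a made-up sample attachment, using the default 1-second period) counts observation periods per field, which you could use, for example, to see where time was spent in the form:

```python
import csv
import io
from collections import Counter

# Hypothetical sensor_stream attachment with the default 1-second period;
# a blank fieldname means the user was outside the form itself.
sample = """second,count,mean,min,max,sd,fieldname
1,24,62.0,58.0,66.0,2.1,consent
2,26,64.5,60.0,70.0,2.9,consent
3,25,40.0,38.0,43.0,1.5,hh_roster
4,25,41.2,39.0,44.0,1.4,
"""

def seconds_per_field(stream_csv):
    """Count how many observation periods were spent on each field."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(stream_csv)):
        counts[row["fieldname"] or "(outside form)"] += 1
    return dict(counts)

print(seconds_per_field(sample))
# {'consent': 2, 'hh_roster': 1, '(outside form)': 1}
```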

For more information, check out our support article Using sensor meta-data.