Professor Uichin Lee’s research team from the School of Computing at KAIST has conducted a study exploring data contributors’ perceived benefits and risks of participating in open dataset collection projects that involve mobile and wearable devices for emotional intelligence research.
This research was published in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (Issue 2, July 2022), one of the most renowned venues in ubiquitous computing. The work was invited for presentation at the ACM UbiComp conference, held September 11–15, 2022, in Atlanta, USA, and Cambridge, UK.
Mobile and wearable sensors are increasingly used to continuously and passively collect data from our daily lives. The gathered data are often used to investigate how users’ life patterns relate to mental and physical health, a practice known as digital phenotyping. The benefits of collecting large amounts of data are numerous: the relationship between people’s routine actions and user states of interest can be better understood with the help of the collected data. For instance, datasets gathered from many sensors can be used to build machine learning models that detect and interpret a person’s mental state and to design health intervention services (e.g., stress detection and just-in-time interventions).
Such open datasets collected with mobile and wearable devices are essential for innovating digital health and emotional intelligence technologies. Open datasets collected in the wild help to build and test high-performance machine learning models that can be personalized to diverse user contexts. Notably, machine learning models and their applications often follow a “the more, the better” principle, which necessitates a large volume of data from many users. So far, however, only a handful of studies have released mobile datasets, since collecting a large-scale, longitudinal dataset in an in-the-wild setting is challenging.
One major issue hindering the publication of open datasets is privacy risk. The privacy of participants, and possibly of technology end users, could be endangered because acquired sensor readings can reveal participants’ identities. Even models built from the collected dataset may raise ethical issues; for instance, emotion detection algorithms may be inaccurate or biased, leading to psychological and physical harm.
Because these perceptions ultimately shape participants’ attitudes and behaviors toward the entire data collection and its public release, identifying participants’ perceived benefits of data contribution and perceived risks (including privacy concerns) is imperative for mitigating potential privacy concerns and facilitating open dataset practice, as well as for boosting user participation and improving cooperation throughout data-gathering campaigns.
Source: Digital biomarkers for Alzheimer’s disease: the mobile/wearable devices opportunity (NPJ Digit Med, Kourtis et al., 2019)
To understand participants’ general attitudes and privacy concerns around creating a mobile sensor dataset for affective computing research (e.g., mood detection), the team carried out an in-the-wild study. Over four weeks, 100 college students took part in a project to collect an open dataset covering a wide range of sensor data from wearable and mobile devices.
Prior to the study, participants were asked to install a sensing app developed by the research team on their smartphones for mobile data collection, and they were given Fitbit Inspire HR and Polar H10 devices. Because the sensing platform runs on Android 7.0 (Nougat) or higher, participants whose phones ran an older version were screened out in the initial stage. Participants could check each sensor being collected on the platform’s configuration screen and were asked to respond to ESM (experience sampling method) questionnaires during data collection. After installation, participants were asked to go about their daily lives as usual while wearing the Polar chest strap and the Fitbit for a month. Because participants reported that wearing the chest strap 24/7 for such a long period was uncomfortable, the team shortened its wearing period to one week.
<Data collection apparatus and sensing app platform>
<Types of data collected in the study>
After the month-long data collection, participants were invited to an interview to share their experiences during the data collection and their privacy concerns regarding the multimodal sensor data collection. For the interviews, the study selected the 15 participants who had asked for selective disclosure (i.e., partially excluding personally sensitive data); the remaining 85 of the 100 participants agreed to a full release of the gathered data. For a comparative analysis between these two groups in terms of data sensitivity and perceived privacy concerns, the team later recruited 11 additional interviewees from among those who consented to full disclosure of their collected data.
<Participants’ Perceived pre-post sensitivity of each sensor data (1: Highly Negative ∼ 7: Highly Positive, and N/A or don’t know)>
Although the majority of participants were generally unconcerned about privacy problems and showed low levels of privacy sensitivity toward each sensor’s data collection, the pre- and post-study surveys on data sensitivity show that participants were particularly sensitive to certain data types on smartphones. They exhibited sensor-specific “intuitive worries” about privacy due to the possibility of disclosing personal traits and social behaviors that can be derived from call/message data, app usage data, and mobility traces. Their concerns were exacerbated when each of these data types was combined with seemingly innocuous wearable sensor data. For example, one participant expressed strong concern that his routine mobility traces (e.g., regular visits to his girlfriend’s house), collected together with his heart rate data, could reveal his intimate relationship. Such responses indicate that although some data types may look benign and unlikely to threaten one’s privacy on their own, they can become a serious threat when combined with other sensor data that is highly and directly related to one’s personal life.
As mentioned, one reason behind such moderate or low levels of privacy concern lies in the financial compensation the participants received during the study. During the interviews, participants tended to weigh the potential benefits over their privacy concerns, mentioning that their data was nothing special or was less valuable than the instant prize: the money. The team posits that such tendencies in human behavior can be attributed to the privacy-utility calculus, a well-known privacy behavior theory which claims that people tend to weigh the worth of their privacy against the reward given to them.
The lead author, Hyunsoo Lee, said, “Our work is the first to conduct a large-scale mobile data collection study in-the-wild that aims to explore diverse shaping factors of participants’ attitudes and risks in open dataset collection.” She added, “There should be further research on exploring users’ concerns in everyday contexts and designing novel tools for mitigating privacy risks.”
Lee, Hyunsoo, Soowon Kang, and Uichin Lee. “Understanding Privacy Risks and Perceived Benefits in Open Dataset Collection for Mobile Affective Computing.” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6.2 (2022): 1–26.