Understanding User Preferences of Voice Assistant Answer Structures for Personal Health Data Queries

Bradley Rey , Yumiko Sakamoto, Jaisie Sin, Pourang Irani

Published in Conversational User Interfaces, 2024

Abstract

Voice assistants (VAs) are becoming ubiquitous within daily life, residing in homes, personal smart-devices, vehicles, and many other technologies. Designed for seamless natural language interaction, VAs empower users to ask questions and execute tasks without relying on graphical or tactile interfaces. A promising avenue for VAs is to allow people to ask personal health data questions. However, this functionality is currently not widely available, and answer preferences for such questions have not been studied. We implemented a pseudo-VA that handles personal health data questions, answering in three unique styles: minimal, keyword, and full sentence. In two online user studies, 82 unique participants interacted with our VA, asking varying personal health data questions and ranking the answer structures given. Our results show a strong preference for full sentence responses throughout. We find that even though full sentence answers have the longest mean response time, they are still found to provide high quality and optimal behaviour, while also being comprehensible and efficient. Furthermore, participants reported that for personal health question answering, VAs should provide technical and efficient interactions rather than being social.

In Summary

Our results stand in contrast to previous work exploring answer structures for general voice assistant use (e.g., weather, calendar, smart home commands/queries). Our results suggest that full sentence answers offer less ambiguity and, despite their longer response time, were perceived as equally efficient. Along with other findings, such as a desire for voice assistants to be efficient and technical rather than social entities (e.g., as a fitness coach), we provide design implications in line with these results that offer insight into future voice assistant systems handling personal health data queries.

Methodology

We conducted two online user studies. We built a browser-based pseudo-voice assistant, which we embedded within a Qualtrics survey, using JavaScript and the Web Speech API. During interaction with our pseudo-voice assistant, the participant's question would be recognized and then processed by checking for keywords (and varying synonyms) specific to each question. Only when our pseudo-voice assistant recognized all required keywords was the appropriate answer vocalized. Upon a successful interaction, participants would answer questions about the quality and experience of the interaction/response.
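The keyword-matching step described above can be sketched as follows. This is a minimal illustration only; the study's actual implementation, keyword lists, and answer wording are not public, so every name, synonym list, and answer string below is hypothetical.

```javascript
// Hypothetical question definitions: each supported question is described by
// groups of required keywords, where each group lists accepted synonyms.
// An utterance matches a question only when every group is satisfied by at
// least one of its synonyms.
const questions = [
  {
    id: "avgDailySteps",
    keywordGroups: [
      ["average", "mean"],          // the metric aggregation
      ["step", "steps"],            // the health metric
      ["week", "seven days"],       // the time range
    ],
    // One answer per structure condition (minimal, keyword, full sentence).
    answers: {
      minimal: "10,320 steps.",
      keyword: "Average daily steps, last week: 10,320.",
      fullSentence:
        "Your average daily step count over the last week is 10,320 steps.",
    },
  },
];

// Returns the matching question definition, or null if no question has all
// of its required keyword groups present in the transcript.
function matchQuestion(transcript, questions) {
  const text = transcript.toLowerCase();
  return (
    questions.find((q) =>
      q.keywordGroups.every((group) =>
        group.some((kw) => text.includes(kw))
      )
    ) ?? null
  );
}

const match = matchQuestion(
  "What is my average daily step count in the last week?",
  questions
);
```

In the browser, the transcript would come from a Web Speech API `SpeechRecognition` result, and the selected answer string would then be vocalized with `speechSynthesis.speak()`; only a fully matched question triggers vocalization, mirroring the behaviour described above.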

Our pseudo-voice assistant can be demoed here using Google Chrome.

Key Findings

  • Full Sentence answers were preferred for personal health data queries.
    • We suggest this is due to reduced ambiguity in some answer types. For example, a general question like “What is on my calendar tomorrow?” can be answered minimally, “Lunch with Tyler, 1PM. Games with Danica, 7PM” without confusion, as the content implies calendar events. In contrast, a Minimal response to “What is my average daily step count in the last week?”, e.g., “10,320 steps” can be unclear, as it may not confirm the specific time range or metric. This kind of ambiguity, already noted as a barrier in clinical use of personal health data, also affects everyday users.
  • Despite having the longest response times, Full Sentence answers were perceived as equally efficient as Minimal and Keyword answers, suggesting there may be room to augment Full Sentence answer content without sacrificing perceived efficiency. We contemplate several reasons for this observation:
    • Contextual information provided within a Full Sentence may contribute to a more comprehensive understanding of the answer, thereby reducing the need for follow-up questions or clarification.
    • Context in which the answer is given may influence its perceived efficiency. Our study was conducted in non-distracting environments, and thus these results may change in real-world settings.
  • While human likeness was preferred, participants were not inclined to have their voice assistant act as a fitness coach; rather, they strongly preferred a more technical and objective relationship with the voice assistant for personal health data queries.
  • In our study, despite using an adapted version of the UEQ+ survey to measure VA user experience, we observed conflicting semantic differentials for Efficiency and Comprehensibility.
    • The lack of correlation found suggests that the semantic differentials may not be effectively capturing the same intended user experience factor, and thus should be further explored to refine the survey.

In More Detail

Please review our full paper (linked above) for an abstract, study details, methodologies, and complete results.