Abstract

This paper investigates how interaction modalities, specifically text-based versus speech-based interfaces, influence the experience and outcomes of academic co-writing with generative AI (GenAI). Drawing on distributed and situated cognition and human-AI creativity studies, I argue that modality is a structuring force in how ideas emerge, evolve, and stabilize in academic writing. Text-based interaction supports branching exploration, comparative revision, and temporal layering, enabling asynchronous control and persistent visual memory. In contrast, speech-based interaction facilitates associative thinking and spontaneous ideation, offering immediacy and flow. However, it can limit memory offloading, structural manipulation, and cross-comparison. I synthesize findings from related work in HCI, including multimodal interaction and mixed-initiative systems, to propose design implications for GenAI tools. These include supporting multi-threaded interfaces, prompt orchestration, hybrid modality integration, and friction-aware scaffolding. The paper reframes modality as a central design dimension in AI-supported knowledge work, contributing to ongoing conversations in HCI and creativity research.

Keywords

Interaction modality; Generative artificial intelligence; Writing; Creativity-support tools; Interaction design; Human-computer interaction

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Conference Track

Track 4 - Human-Centered AI

Dec 2nd, 9:00 AM to Dec 5th, 5:00 PM

How Interaction Modalities Influence Academic Co-Writing with Generative Artificial Intelligence

