Abstract

Feature steering enables users to strengthen or weaken concepts (features) that an LLM has already learned in real time. It has emerged as an alternative to prompting, allowing continuous rather than discrete control over model behavior. However, given the technical knowledge required to extract and then steer on features, this process is mostly limited to technical researchers. We seek to build a graphical user interface that makes steering accessible to the layperson. Existing interfaces prioritize flexibility over intuition, overwhelming users with the sheer number of options for what to prompt or what features to steer on. We conducted a user-centered design process guided by two phases of user research to develop our GUI. Through user testing, we found that the most valuable use case for feature steering for the layperson is creating model personas for specific use cases. We developed our interface with this use case in mind, and engaged in formative and summative testing to evaluate design decisions related to representing features, building steering intuition, and reducing the blank slate problem. Our interface, built on Goodfire's Ember API for LLaMA 3.1 8b, simplifies steering controls and provides clear response comparisons to help users build intuition about how steering affects outputs. This work demonstrates how interface design can make powerful but complex interpretability tools more accessible, allowing everyday users to meaningfully shape model behavior through steering. Code is available at https://github.com/acyhuang/steering-interface.

Keywords

Feature steering; Human-centered AI; Interface design; Large language models

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Conference Track

Track 4 - Human-Centered AI

Dec 2nd, 9:00 AM to Dec 5th, 5:00 PM

Designing Intuitive Interfaces for Feature Steering of LLMs

