Abstract
Large Language Models are increasingly integrated into UX design. However, their effectiveness in meeting visual accessibility requirements is under-explored. This research evaluates the ability of ChatGPT and Microsoft Copilot to generate visually accessible interfaces, using a Research through Design methodology. First, an accessibility scoring system was created from the Apple, WCAG 2.2, and Microsoft accessibility guidelines. Second, design experiments were conducted using ChatGPT and Copilot, and the outputs were evaluated using the new scoring system. Findings indicate that ChatGPT and Copilot can respond effectively to well-structured prompts, but they demonstrate low competence in executing visually accessible interfaces. This research makes two contributions to the field. First, it assesses the state-of-the-art capabilities of AI-generated design for visual accessibility, proposing a balanced positioning of AI as an assistive tool rather than an autonomous designer. Second, it provides a new ‘cross-standard’ scoring system and method for evaluating the visual accessibility of AI-generated outputs.
Keywords
visual accessibility, AI-generated, accessibility compliance, cross-standard scoring system
DOI
https://doi.org/10.21606/drs.2026.1568
Citation
Rao, M., and Watts, J. (2026) Evaluating visual accessibility of AI-generated interfaces, in Simeone, L., Gray, C. M., Verhoeven, A., de Götzen, A., Bakırlıoğlu, Y., Zohar, H., Stead, M., and Buwert, P. (eds.), DRS2026: Edinburgh, 8–12 June, Edinburgh, United Kingdom. https://doi.org/10.21606/drs.2026.1568
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License