Abstract

This study proposes a Human-AI interaction retrieval framework that links natural language descriptions to 3D structural forms, using 3D fold forms as the example because of their rich geometric and mathematical properties. We constructed a database covering a variety of 3D fold forms, annotated with style tags and body/garment position tags to assist training. The framework adopts a contrastive learning strategy: text input is processed by the pre-trained CLIP (Contrastive Language-Image Pre-training) text encoder, and a geometric encoder that we present extracts vertex information and point cloud data from the 3D fold forms to obtain geometric feature embeddings. The text and geometric features are then mapped into a joint embedding space, and the cross-modal alignment is trained with an InfoNCE loss. After training, the framework uses FAISS to build a similarity index over the geometric vectors, allowing users to query in real time, in descriptive language, for the 3D fold forms closest in semantic distance. Experiments show that the framework achieves satisfactory retrieval accuracy and can retrieve geometrically matching fold forms using only semantic descriptions. This study highlights the potential of AI to connect design semantics with geometric structures and provides an intelligent tool for AI-assisted design.
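The alignment and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2D vectors stand in for the outputs of the CLIP text encoder and the geometric encoder, the symmetric form of InfoNCE and the temperature of 0.07 are assumptions, and a brute-force cosine search is swapped in for the FAISS index. All function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(text_emb, geom_emb):
    """Pairwise cosine similarities: rows are texts, columns are 3D forms."""
    t = [l2_normalize(v) for v in text_emb]
    g = [l2_normalize(v) for v in geom_emb]
    return [[sum(a * b for a, b in zip(ti, gj)) for gj in g] for ti in t]

def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def info_nce(text_emb, geom_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (text, geometry) pairs.

    The symmetric form and the default temperature are assumptions.
    """
    sim = cosine_matrix(text_emb, geom_emb)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # geometry-to-text direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))

def retrieve(query_emb, geom_emb, k=1):
    """Return indices of the k most similar forms (brute-force stand-in for FAISS)."""
    scores = cosine_matrix([query_emb], geom_emb)[0]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Toy 2D embeddings standing in for encoder outputs (hypothetical values).
text_batch = [[1.0, 0.1], [0.1, 1.0]]
aligned = [[0.9, 0.0], [0.0, 0.8]]   # form i matches text i
swapped = [[0.0, 0.8], [0.9, 0.0]]   # same forms, pairs mismatched
loss_aligned = info_nce(text_batch, aligned)
loss_swapped = info_nce(text_batch, swapped)
best = retrieve([1.0, 0.2], aligned, k=1)
```

In this sketch the loss is low when each text embedding sits closest to its own form and high when the pairs are mismatched, which is the signal that pulls the two modalities into a joint space. At query time only the text embedding is needed, since the geometric vectors are indexed in advance.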

Keywords

Artificial intelligence; Multimodal; Human-AI interaction; Retrieval system

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Conference Track

Track 4 - Human-Centered AI

Dec 2nd, 9:00 AM to Dec 5th, 5:00 PM

CLIP the Form: A Human-AI Interaction Framework for Retrieving 3D Structural Forms from Textual Prompts

