ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

¹HKUST(GZ), ²HKUST, ³ByteDance
*Indicates Equal Contribution
Indicates Corresponding Author

Abstract

With the rapid advancement of generative models, general-purpose generation has gained increasing attention as a promising approach to unifying diverse tasks across modalities within a single system. Despite this progress, existing open-source frameworks remain fragile and struggle to support complex real-world applications because they lack structured workflow planning and execution-level feedback. To address these limitations, we present ComfyMind, a collaborative AI system built on the ComfyUI platform and designed to enable robust and scalable general-purpose generation. ComfyMind introduces two core innovations: a Semantic Workflow Interface (SWI), which abstracts low-level node graphs into callable functional modules described in natural language, enabling high-level composition and reducing structural errors; and a Search Tree Planning mechanism with localized feedback execution, which models generation as a hierarchical decision process and allows adaptive correction at each stage. Together, these components improve the stability and flexibility of complex generative workflows. We evaluate ComfyMind on three public benchmarks, ComfyBench, GenEval, and Reason-Edit, which span generation, editing, and reasoning tasks. Results show that ComfyMind consistently outperforms existing open-source baselines and achieves performance comparable to GPT-Image-1. ComfyMind paves a promising path for the development of open-source general-purpose generative AI systems.

Text-to-Image Generation

Reasoning Generation

Image Editing

Video Generation

Pipeline Overview

Overview of the ComfyMind pipeline. Given a user instruction, the system first parses the task and delegates it to the Planning Agent. The agent incrementally explores a semantic search tree, where each node proposes a candidate workflow and receives local feedback based on execution results.
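The loop described above can be illustrated with a toy sketch. This is not the ComfyMind implementation: the `Workflow` record, the `plan` function, the string "artifacts", and the three toy workflows are all hypothetical stand-ins, chosen only to show how workflows described in natural language can be composed by a tree search that checks each step locally and prunes only the failing branch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Workflow:
    """Hypothetical stand-in for a workflow exposed as a callable module."""
    name: str
    description: str            # natural-language summary a planner could read
    run: Callable[[str], str]   # maps the current artifact to a new artifact


def plan(start: str,
         workflows: List[Workflow],
         is_goal: Callable[[str], bool],
         feedback: Callable[[str, str], bool],
         max_depth: int = 3) -> Optional[List[str]]:
    """Depth-first search over workflow sequences with per-step feedback.

    Returns the list of workflow names that reaches the goal, or None.
    """
    def expand(artifact: str, path: List[str], depth: int) -> Optional[List[str]]:
        if is_goal(artifact):
            return path
        if depth == max_depth:
            return None
        for wf in workflows:
            result = wf.run(artifact)
            # Local feedback: a failed step prunes this branch only,
            # and the search continues with the sibling candidates.
            if not feedback(artifact, result):
                continue
            found = expand(result, path + [wf.name], depth + 1)
            if found is not None:
                return found
        return None

    return expand(start, [], 0)


# Toy workflows operating on strings instead of images (assumption).
def _text_to_image(a: str) -> str:
    return "image" if a == "prompt" else a


def _upscale(a: str) -> str:
    return a + "+hires" if a == "image" else a


def _broken(a: str) -> str:
    return ""  # simulates a workflow whose execution fails


WORKFLOWS = [
    Workflow("broken", "Always fails; used to exercise pruning.", _broken),
    Workflow("upscale", "Increase the resolution of an image.", _upscale),
    Workflow("text_to_image", "Generate an image from a text prompt.", _text_to_image),
]


def made_progress(before: str, after: str) -> bool:
    """Toy feedback check: the step produced something new and non-empty."""
    return bool(after) and after != before


def reached_goal(artifact: str) -> bool:
    return artifact == "image+hires"
```

Calling `plan("prompt", WORKFLOWS, reached_goal, made_progress)` skips the failing and no-op candidates at each level and returns the sequence `["text_to_image", "upscale"]`. In a real system the feedback check would inspect actual execution results rather than compare strings.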

BibTeX

@misc{guo2025comfymindgeneralpurposegenerationtreebased,
      title={ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback}, 
      author={Litao Guo and Xinli Xu and Luozhou Wang and Jiantao Lin and Jinsong Zhou and Zixin Zhang and Bolan Su and Ying-Cong Chen},
      year={2025},
      eprint={2505.17908},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.17908}, 
}