DisciplineGen-1M: A Large-Scale Dataset for Multidisciplinary Visual Generation and Editing

Zhaokai Wang¹*‡, Mingxin Liu¹*, Zirun Zhu¹*, Ziqian Fan²*, Yiguo He¹*, Mohan Zhang³, Leyao Gu¹, Xiangyu Zhao¹, Ning Liao¹, Shaofeng Zhang⁴, Xuanhe Zhou¹, Zhihang Zhong¹, Junchi Yan¹, Xue Yang¹†

¹ Shanghai Jiao Tong University, ² South China University of Technology, ³ Xiamen University, ⁴ University of Science and Technology of China

* Equal Contribution, ‡ Project Lead, † Corresponding Author

Abstract

Recent image generation and editing models can produce visually appealing natural images, yet they remain unreliable when the target image is a knowledge-intensive diagram whose correctness depends on disciplinary concepts, symbolic structure, and precise spatial relations. We introduce DisciplineGen-1M, a million-scale multidisciplinary dataset that supports text-to-image generation and image editing. It contains 1.2M samples spanning mathematics, physics, chemistry, biology, geography, computer science, economics, history, music, and sports. To construct the dataset, we design a scalable framework that combines vector-graphics rendering, OCR-based editing, curated programmatic synthesis, and large-scale text-to-image filtering. These pipelines produce captions, editing instructions, structured annotations, and paired images with controllable semantic differences. Building on DisciplineGen-1M, we further introduce a discipline-informed reasoning-generation model for both text-to-image generation and image editing. Experiments on discipline-related benchmarks, GenExam and GRADE, show substantial improvements over open-source baselines, while evaluations on general reasoning-informed benchmarks, WISE and RISE, further indicate broader transfer. The results suggest that large-scale structured academic visual data is a key ingredient for moving image generation from aesthetic plausibility toward verifiable knowledge-grounded visual creation. We will publicly release our dataset, model, and source code of the data curation pipeline to ensure reproducibility and benefit future research.

Construction Framework

Overview of the DisciplineGen-1M construction framework. We combine four complementary methods to produce T2I and editing data: structured rendering with SVG/TikZ, OCR-based editing, large-scale T2I filtering, and specialized programmatic synthesis.
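
As a rough illustration of how structured rendering can yield paired images with controllable semantic differences, the sketch below renders a small diagram from a structured specification to SVG, changes a single attribute, and emits the matching editing instruction. It is a minimal sketch under stated assumptions, not the released pipeline: the names LabeledCircle, render_svg, and make_edit_pair, the choice of circles as diagram elements, and the instruction template are all hypothetical.

```python
from dataclasses import dataclass, replace
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class LabeledCircle:
    """Hypothetical structured annotation for one diagram element."""
    cx: int
    cy: int
    r: int
    label: str
    fill: str = "none"


def render_svg(shapes, width=200, height=200):
    """Render a list of LabeledCircle specs to an SVG string."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for s in shapes:
        parts.append(
            f'<circle cx="{s.cx}" cy="{s.cy}" r="{s.r}" fill="{s.fill}" stroke="black"/>'
        )
        parts.append(
            f'<text x="{s.cx}" y="{s.cy}" text-anchor="middle">{escape(s.label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def make_edit_pair(shapes, index, new_fill):
    """Change the fill of one element and return a paired editing sample."""
    target = shapes[index]
    edited = list(shapes)
    edited[index] = replace(target, fill=new_fill)  # only one attribute differs
    instruction = (
        f"Change the fill of the circle labeled '{target.label}' to {new_fill}."
    )
    return {
        "source_svg": render_svg(shapes),
        "target_svg": render_svg(edited),
        "instruction": instruction,
    }


# Assumed toy input: two labeled circles, as in a simple set diagram.
shapes = [LabeledCircle(60, 100, 40, "A"), LabeledCircle(140, 100, 40, "B")]
pair = make_edit_pair(shapes, index=1, new_fill="lightblue")
print(pair["instruction"])
```

Because the source and target are rendered from the same specification, the difference between the two images is known exactly, so the instruction and any structured annotation stay consistent with the pixels by construction.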

Dataset Examples

Representative examples from DisciplineGen-1M. The multidisciplinary data span ten subjects and their fine-grained subdomains.

Dataset Statistics

Statistics of DisciplineGen-1M. The dataset contains long and information-dense prompts, diverse subject coverage, multiple image categories, and varied resolutions and aspect ratios.

Qualitative Results

Qualitative examples of images generated by our model and baselines on GenExam.

Citation

@article{DisciplineGen,
  title={DisciplineGen-1M: A Large-Scale Dataset for Multidisciplinary Visual Generation and Editing},
  author={Wang, Zhaokai and Liu, Mingxin and Zhu, Zirun and Fan, Ziqian and He, Yiguo and Zhang, Mohan and Gu, Leyao and Zhao, Xiangyu and Liao, Ning and Zhang, Shaofeng and Zhou, Xuanhe and Zhong, Zhihang and Yan, Junchi and Yang, Xue},
  journal={arXiv preprint arXiv:2607.02290},
  year={2026}
}