Towards Making Flowchart Images Machine Interpretable

Shreya Shukla, Prajwal Gatti, Yogesh Kumar, Vikash Yadav, Anand Mishra

Indian Institute of Technology Jodhpur

ICDAR 2023

[Paper] [Code] [Data]


Our work aims to convert flowcharts in images to executable computer programs.

     

Abstract

Computer programming textbooks and software documentation often contain flowcharts to illustrate the flow of an algorithm or procedure. Modern OCR engines often tag these flowcharts as graphics and ignore them in further processing. In this paper, we work towards making flowchart images machine-interpretable by converting them to executable Python code. To this end, inspired by the recent success of natural-language-to-code generation, we present a novel transformer-based framework, namely FloCo-T5. Our model is well-suited for this task, as it can effectively learn the semantics, structure, and patterns of programming languages, which it leverages to generate syntactically correct code. We also use a task-specific pre-training objective to pre-train FloCo-T5 on a large number of logic-preserving augmented code samples. Further, to perform a rigorous study of this problem, we introduce the FloCo dataset, which contains 11,884 flowchart images and their corresponding Python code. Our experiments show promising results, and FloCo-T5 clearly outperforms related competitive baselines on code generation metrics. We will make our dataset and implementation publicly available.
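To make the phrase "logic-preserving augmented code samples" concrete, here is a minimal sketch of one transformation that changes the surface form of a program without changing its behaviour: consistent identifier renaming. This is an illustration only; the class RenameIdentifiers, the function augment, and the renaming map are hypothetical names, and the augmentations actually used for FloCo-T5 are described in the paper and may differ. The sketch assumes Python 3.9 or later for ast.unparse.

```python
import ast


class RenameIdentifiers(ast.NodeTransformer):
    """Rename identifiers with a fixed mapping; control flow is left untouched."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of a variable.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # continue into parameters and body
        return node


def augment(source, mapping):
    """Return a renamed variant of `source` with the same logic (hypothetical helper)."""
    tree = RenameIdentifiers(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


original = """
def total(numbers):
    result = 0
    for n in numbers:
        result += n
    return result
"""

augmented = augment(
    original,
    {"total": "compute_sum", "numbers": "values", "result": "acc", "n": "item"},
)
print(augmented)
```

Both versions compute the same sum for any input list, so the augmented sample can stand in for the original during pre-training while exposing the model to different identifier names.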

     

Highlights

  • Studied the flowchart-to-code task, which aims to convert flowcharts in images to executable computer programs.
  • Introduced the FloCo dataset, containing 11.8K flowchart images and their corresponding Python code.
  • Proposed a novel transformer-based method, FloCo-T5, to address this challenge, and explored various pre-training, data-augmentation, and flowchart-encoding methods.
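As a rough illustration of what a flowchart encoding can look like, the sketch below flattens a flowchart (nodes with a shape and text, plus directed edges) into a single string that a text-to-text model could take as input. The Node and Edge classes, the function encode_flowchart, and the output format are all assumptions made for this example; they are not the encodings evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    shape: str  # e.g. "oval", "rectangle", "diamond"
    text: str   # text recognised inside the shape


@dataclass
class Edge:
    source: str
    target: str
    label: str = ""  # e.g. "Yes" / "No" on decision branches


def encode_flowchart(nodes, edges):
    """Serialise a flowchart into one line of text (hypothetical format)."""
    node_part = " ; ".join(f"{n.node_id} [{n.shape}] {n.text}" for n in nodes)
    edge_part = " ; ".join(
        f"{e.source} -> {e.target}" + (f" ({e.label})" if e.label else "")
        for e in edges
    )
    return f"nodes: {node_part} | edges: {edge_part}"


nodes = [
    Node("n1", "oval", "Start"),
    Node("n2", "diamond", "x > 0?"),
    Node("n3", "rectangle", "print(x)"),
]
edges = [Edge("n1", "n2"), Edge("n2", "n3", "Yes")]

encoded = encode_flowchart(nodes, edges)
print(encoded)
```

A flat string like this keeps both the content of each shape and the connectivity between shapes, which is the information a code generator needs to recover branches and loops.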
     

FloCo Dataset

We introduce a new large-scale dataset called "FloCo" for converting flowchart images to Python code. It contains 11,884 paired flowchart-code samples. The dataset can be downloaded here: Google Drive link. Please refer to the paper for more details on dataset statistics and construction.

Bibtex

Please cite our work as follows:

@inproceedings{shukla2023floco,
  author    = "Shukla, Shreya and 
              Gatti, Prajwal and 
              Kumar, Yogesh and
              Yadav, Vikash and
              Mishra, Anand",
  title     = "Towards Making Flowchart Images Machine Interpretable",
  booktitle = "ICDAR",
  year      = "2023",
}

Acknowledgements

This work was partly supported by MeitY, Govt. of India (Project number: S/MeitY/AM/20210114). Yogesh Kumar is supported by a UGC fellowship.