2025
- When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
Abhirama Subramanyam Penamakuri*, Navlika Singh*, Piyush Arora*, Anand Mishra. (*: equal contribution)
EMNLP 2025. (NEW)
[Paper]
[Code]
- Aligning Moments in Time using Video Queries
Yogesh Kumar*, Uday Agarwal*, Manish Gupta, Anand Mishra. (*: equal contribution)
ICCV 2025. (NEW)
[Paper]
[Code]
- SynSlideGen: AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval
Suyash Maniyar, Vishvesh Trivedi, Ajoy Mondal, Anand Mishra, C.V. Jawahar.
ICDAR 2025 (Oral). (NEW)
[Paper]
[Project Page]
- Audiopedia: Audio QA with Knowledge,
Abhirama Subramanyam Penamakuri*, Kiran Chhatre*, Akshat Jain. (*: equal contribution)
ICASSP 2025. (NEW)
[Paper]
[Project Page]
[Data]
- PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures,
Shreya Shukla*, Nakul Sharma*, Manish Gupta, Anand Mishra. (*: equal contribution)
AAAI 2025. (NEW)
[Paper]
[Project Page]
2024
- Chapter-Based Video Moment Retrieval using Natural Language Queries,
Uday Agarwal*, Yogesh Kumar*, Abu Shahid*, Prajwal Gatti, Manish Gupta, Anand Mishra. (*: equal contribution)
ICVGIP 2024.
[Paper]
[Code]
[Data]
- Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant,
Abhirama Subramanyam Penamakuri, Anand Mishra.
EMNLP 2024.
[Paper]
[Project Page]
- Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation,
Shreyas Vaidya*, Arvind Kumar Sharma*, Prajwal Gatti, Anand Mishra. (*: equal contribution)
ICPR 2024.
[Paper]
[Project Page]
[Code]
- Sketch-guided Image Inpainting with Partial Discrete Diffusion Process,
Nakul Sharma, Aditay Tripathi, Anirban Chakraborty, Anand Mishra.
CVPR Workshop 2024.
[Paper]
[Code]
- QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos,
Yogesh Kumar, Saswat Mallick, Anand Mishra, Sowmya Rasipuram, Anutosh Maitra, Roshni Ramnani.
AAAI 2024.
[Paper]
[Code]
- Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions,
Prajwal Gatti, Kshitij Parikh, Dhriti Paul, Manish Gupta, Anand Mishra.
AAAI 2024.
[Paper]
[Code]
2023
- Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering,
Abhirama Subramanyam Penamakuri, Anand Mishra, Manish Gupta, Mithun Das Gupta.
IJCAI 2023.
[Paper]
[Project Page]
[Code]
- Towards Making Flowchart Images Machine Interpretable,
Shreya Shukla, Prajwal Gatti, Yogesh Kumar, Vikash Yadav, Anand Mishra.
ICDAR 2023.
[Paper]
[Project Page]
[Code]
- Few-Shot Referring Relationships in Videos,
Yogesh Kumar, Anand Mishra.
CVPR 2023.
[Paper]
[Project Page]
[Code]
2022
- Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification,
Nakul Sharma, Abhirama Subramanyam Penamakuri, Anand Mishra.
ICVGIP 2022.
[Paper]
[Project Page]
[Code]
- VISTOT: Vision-Augmented Table-to-Text Generation,
Prajwal Gatti, Anand Mishra, Manish Gupta, Mithun Das Gupta.
EMNLP 2022.
[Paper]
[Project Page]
[Code]
- COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti, Abhirama Subramanyam Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani.
AACL-IJCNLP 2022.
[Paper]
[Project Page]
[Code]
2021
- Few-shot Visual Relationship Co-localization,
Revant Teotia*, Vaibhav Mishra*, Mayank Maheshwari*, Anand Mishra. (*: equal contribution)
ICCV 2021.
[Paper]
[Project Page]
[Code]
- Look, Attend and Ask: Learning to Ask Questions by Reading Text in Images,
Soumya Jahagirdar, Shankar Gangisetty, Anand Mishra.
ICDAR 2021.
[Paper]
2020
2019
- From Strings to Things: Knowledge-enabled VQA model that can read and reason,
Ajeet Kumar Singh, Anand Mishra, Shashank Shekhar, Anirban Chakraborty.
ICCV 2019 (Oral).
[Paper]
[BibTeX]
[Project Page]