The Vision, Language, and Learning Group (VL2G) at the Indian Institute of Technology Jodhpur is a group of researchers and students led by Anand Mishra. The group addresses fundamental vision and language tasks and their applications to socially relevant problems. Its current focus is on document intelligence, massively multilingual visual text understanding, and fine-grained video understanding, with applications in domains such as education and assistive technologies.