OVQA is the first multimodal dataset specifically designed for visual question answering (VQA), visual question elicitation (VQE), and multimodal research in the low-resource Odia language. It consists of 27,809 English-Odia parallel question-answer pairs covering 6,149 unique images from the Visual Genome dataset, with each question semantically matched to its corresponding visual content. Baseline experiments on the VQA and VQE tasks demonstrate the dataset's potential. OVQA is a valuable resource for advancing multimodal research in Odia and can be extended to other low-resource languages.
Statistics of the OVQA Dataset

| Item | Count |
|------|-------|
| Number of Images | 6,149 |
| Number of Questions | 27,809 |
| Number of Answers | 27,809 |
| Number of Wh-Questions | 26,939 |
| Number of Counting Questions | 70 |
| Other Question Types | 800 |
The OVQA dataset is available at Lindat: http://hdl.handle.net/11234/1-5820.
Additionally, the OdiaVQA dataset, prepared in instruction-set format for multimodal LLM training, is available on Hugging Face: https://huggingface.co/datasets/odiagenmllm/odia_vqa_en_odi_set.
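The Hugging Face copy can be loaded directly with the `datasets` library. The snippet below is a minimal sketch under that assumption; the split and column names of the instruction set are not specified in this README, so the code only inspects whatever schema the dataset exposes rather than assuming particular field names.

```python
# Minimal sketch: load the OdiaVQA instruction-set data from Hugging Face
# and inspect its structure. Requires the `datasets` package.
from datasets import load_dataset

dataset = load_dataset("odiagenmllm/odia_vqa_en_odi_set")

# Show the available splits, their sizes, and feature (column) names,
# then print one example record from each split as a quick sanity check.
for split_name, split in dataset.items():
    print(split_name, len(split), list(split.features))
    print(split[0])
```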
The work on this project was supported by the grant CZ.02.01.01/00/23_020/0008518 of the Ministry of Education of the Czech Republic.
@inproceedings{parida2025ovqa,
  title  = {{OVQA: A Dataset for Visual Question Answering and Multimodal Research in Odia Language}},
  author = {Parida, Shantipriya and Sahoo, Shashikanta and Sekhar, Sambit and Sahoo, Kalyanamalini and Kotwal, Ketan and Khosla, Sonal and Dash, Satya Ranjan and Bose, Aneesh and Kohli, Guneet Singh and Lenka, Smruti Smita and Bojar, Ondřej},
  year   = {2025},
  note   = {Accepted at the IndoNLP Workshop at COLING 2025}
}