Shehan Munasinghe

I am a second-year Master’s student at MBZUAI, UAE, currently pursuing the MSc in Computer Vision, advised by Dr Salman Khan and Professor Fahad Khan at the Intelligent Visual Analytics Lab (IVAL).

I completed my undergrad at the Department of Electronic and Telecommunication Engineering, University of Moratuwa, Sri Lanka. There I worked on dynamic object detection, tracking and trajectory prediction for autonomous driving, advised by Dr Peshala Jayasekara and Dr Ranga Rodrigo.

My research interests include vision-language modeling, video understanding and large multimodal models.

Email / CV / LinkedIn / Google Scholar / Twitter / GitHub

Publications

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric Xing, Fahad S. Khan, Salman Khan
CVPR, 2025
Paper / Code / Project Page
Presents a video large multimodal model capable of pixel-level visual grounding, featuring an end-to-end alignment mechanism. Also introduces a grounded video conversation dataset curated using a semi-automatic annotation pipeline.

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Mubarak Shah, Fahad S. Khan
arXiv preprint, 2023
Paper / Code / Project Page
Extends recent advances in image-based LLMs to video understanding, incorporating audio transcripts to support video context understanding, and introduces a baseline framework and benchmark for conversation-based spatial grounding.

Class-Aware Attention for Multimodal Trajectory Prediction
Bimsara Pathiraja, Shehan Munasinghe, Malshan Ranawella, Maleesha De Silva, Ranga Rodrigo, Peshala Jayasekara
arXiv, 2022
Paper
Presents a novel model architecture for multimodal trajectory prediction in autonomous driving that takes the physical properties of the target and surrounding vehicles into account through a weighted attention module. Achieved the best results on the nuScenes benchmark among models that use rasterized maps to encode environment information.

A Novel Transfer Learning-Based Approach for Screening Pre-existing Heart Diseases Using Synchronized ECG Signals and Heart Sounds
Ramith Hettiarachchi, Udith Haputhanthri, Kithmini Herath, Hasindu Kariyawasam, Shehan Munasinghe, Kithmin Wickramasinghe, Duminda Samarasinghe, A. De Silva, Chamira Edussooriya
IEEE ISCAS, 2021
Paper
Introduces a novel dual convolutional neural network approach that uses transfer learning to address the limited amount of publicly available simultaneous PCG and ECG data.

Other Projects

Multi-Sensor Based Dynamic Object Detection, Tracking & Trajectory Prediction for Self-Driving

August 2021 - August 2021

Developed machine learning models for detection, tracking and trajectory prediction of dynamic agents in autonomous driving, and integrated them with the Robot Operating System (ROS).

Education

MBZUAI (Mohamed Bin Zayed University of Artificial Intelligence), Abu Dhabi, UAE

MSc - Computer Vision
August 2023 - Present

University of Moratuwa, Sri Lanka

BSc(Hons) Eng. - Electronic and Telecommunication Engineering
August 2017 - June 2022

GPA: 3.96/4.2 (First Class)
Dean’s List: Semesters 1, 2, 3, 4, 6, 7, 8

Experience

MBZUAI (Mohamed Bin Zayed University of Artificial Intelligence), Abu Dhabi, UAE

Research Assistant
July 2022 - March 2023

Conducted research on explainability and visualization methods for multimodal and vision transformer models.

Creative Software, Sri Lanka

Intern (Machine Learning)
October 2020 - March 2021

Implemented semantic segmentation and object detection solutions for corrosion and industrial object identification. Pioneered synthetic data generation with the Unity 3D game engine to improve model training.

Design from Jon Barron and Leo Keselman
