Shehan Munasinghe

I am a second-year Master’s student at MBZUAI, UAE, pursuing an MSc in Computer Vision. I am advised by Dr Salman Khan and Professor Fahad Khan at the Intelligent Visual Analytics Lab (IVAL).

I completed my undergraduate degree at the Department of Electronic and Telecommunication Engineering, University of Moratuwa, Sri Lanka. There I worked on dynamic object detection, tracking and trajectory prediction for autonomous driving, advised by Dr Peshala Jayasekara and Dr Ranga Rodrigo.

My research interests include vision-language modeling, video understanding and large multimodal models.

Email  /  CV  /  LinkedIn  /  Google Scholar  /  Twitter  /  GitHub

Publications

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos


Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric Xing, Fahad S. Khan, Salman Khan
arXiv preprint, 2024
Paper / Code / Project Page

Presents a video large multimodal model capable of pixel-level visual grounding, featuring an end-to-end alignment mechanism. Also introduces a grounded video conversation dataset curated using a semi-automatic annotation pipeline.

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models


Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Mubarak Shah, Fahad S. Khan
arXiv preprint, 2023
Paper / Code / Project Page

Extends recent advances in image-based LLMs to video understanding by incorporating audio transcripts to enrich video context, and introduces a baseline framework and benchmark for conversation-based spatial grounding.

Class-Aware Attention for Multimodal Trajectory Prediction


Bimsara Pathiraja, Shehan Munasinghe, Malshan Ranawella, Maleesha De Silva, Ranga Rodrigo, Peshala Jayasekara
arXiv, 2022
Paper

Presents a novel model architecture for multimodal trajectory prediction in autonomous driving that accounts for the physical properties of the target and surrounding vehicles through a weighted attention module. Among models that use rasterized maps to encode environment information, it achieved the best results on the nuScenes benchmark.

A Novel Transfer Learning-Based Approach for Screening Pre-existing Heart Diseases Using Synchronized ECG Signals and Heart Sounds


Ramith Hettiarachchi, Udith Haputhanthri, Kithmini Herath, Hasindu Kariyawasam, Shehan Munasinghe, Kithmin Wickramasinghe, Duminda Samarasinghe, A. De Silva, Chamira Edussooriya
IEEE ISCAS, 2021
Paper

Introduces a novel dual-convolutional neural network approach that uses transfer learning to address the limited amount of publicly available simultaneous PCG and ECG data.





Other Projects

Multi-Sensor Based Dynamic Object Detection, Tracking & Trajectory Prediction for Self-Driving


August 2021 - August 2021

Developed machine learning models for detection, tracking and trajectory prediction of dynamic agents in autonomous driving, and integrated them with the Robot Operating System (ROS).


Education

MBZUAI (Mohamed Bin Zayed University of Artificial Intelligence), Abu Dhabi, UAE


MSc - Computer Vision
August 2023 - Present

University of Moratuwa, Sri Lanka


BSc(Hons) Eng. - Electronic and Telecommunication Engineering
August 2017 - June 2022

GPA: 3.96/4.2 (First Class)
Dean’s List: Semesters 1, 2, 3, 4, 6, 7, 8


Experience

MBZUAI (Mohamed Bin Zayed University of Artificial Intelligence), Abu Dhabi, UAE


Research Assistant
July 2022 - March 2023

Conducted research on explainability and visualization methods for multimodal and vision transformer models.

Creative Software, Sri Lanka


Intern (Machine Learning)
October 2020 - March 2021

Implemented semantic segmentation and object detection solutions for corrosion and industrial object identification. Pioneered the creation of synthetic training data using the Unity 3D game engine.






Design from Jon Barron and Leo Keselman