|
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric Xing, Fahad S. Khan, Salman Khan
arXiv preprint, 2024
Paper / Code / Project Page
Presents a video large multimodal model capable of pixel-level visual grounding, featuring an end-to-end alignment mechanism. Also introduces a grounded video conversation dataset curated using a semi-automatic annotation pipeline.
|
|
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Mubarak Shah, Fahad S. Khan
arXiv preprint, 2023
Paper / Code / Project Page
Extends recent advances in image-based LLMs to video understanding by incorporating audio transcripts to support video context understanding, and introduces a baseline framework and benchmark for conversation-based spatial grounding.
|
|
Class-Aware Attention for Multimodal Trajectory Prediction
Bimsara Pathiraja, Shehan Munasinghe, Malshan Ranawella, Maleesha De Silva, Ranga Rodrigo, Peshala Jayasekara
arXiv, 2022
Paper
Presents a novel model architecture for multimodal trajectory prediction in autonomous driving that accounts for the physical properties of the target and surrounding vehicles through a weighted attention module. Achieved the best results on the nuScenes benchmark among models that use rasterized maps to encode environment information.
|
|
A Novel Transfer Learning-Based Approach for Screening Pre-existing Heart Diseases Using Synchronized ECG Signals and Heart Sounds
Ramith Hettiarachchi, Udith Haputhanthri, Kithmini Herath, Hasindu Kariyawasam, Shehan Munasinghe, Kithmin Wickramasinghe, Duminda Samarasinghe, A. De Silva, and Chamira Edussooriya
IEEE ISCAS, 2021
Paper
Introduces a novel Dual-Convolutional Neural Network-based approach that uses transfer learning to address the limited amount of publicly available simultaneous PCG and ECG data.
|
|
Multi-Sensor Based Dynamic Object Detection, Tracking & Trajectory Prediction for Self-Driving
August 2021 - August 2021
Developed machine learning models for detection, tracking, and trajectory prediction of dynamic agents in autonomous driving, and integrated them with the Robot Operating System (ROS).
|
|
MBZUAI (Mohamed Bin Zayed University of Artificial Intelligence), Abu Dhabi, UAE
MSc - Computer Vision
August 2023 - Present
|
|
University of Moratuwa, Sri Lanka
BSc(Hons) Eng. - Electronic and Telecommunication Engineering
August 2017 - June 2022
GPA: 3.96/4.2 (First Class)
Dean’s List: Semesters 1, 2, 3, 4, 6, 7, 8
|
|
MBZUAI (Mohamed Bin Zayed University of Artificial Intelligence), Abu Dhabi, UAE
Research Assistant
July 2022 - March 2023
Conducted research on explainability and visualization methods for multimodal and vision transformer models.
|
|
Creative Software, Sri Lanka
Intern (Machine Learning)
October 2020 - March 2021
Implemented semantic segmentation and object detection solutions for identifying corrosion and industrial objects. Pioneered the creation of synthetic training data using the Unity 3D game engine to improve model training.
|
|