AudioToolAgent: An Agentic Framework for Audio-Language Models
Pairs an LLM controller with automatic speech recognition (ASR) and audio reasoning tools through tool adapters, letting the agent iteratively call them and cross-verify their outputs.
I explore how machines listen and understand audio. I focus on building novel audio understanding models and datasets.
Currently a PhD candidate at the University of Maastricht, where I am supervised by Elia Formisano and Michel Dumontier.
I am also a Co-Founder of Encode Europe, where we aim to build a future in which AI benefits society through advocacy, education, and policy research.
Our reasoning-enhanced audio-language model was accepted for a poster session at NeurIPS 2025 in San Diego!
Our comprehensive review of audio-language datasets was accepted to the IEEE Access journal!
Audio Captioning Evaluation on Semantics of Sound, our evaluation metric for semantic audio captioning, was accepted to EUSIPCO 2023 in Helsinki.
Pairs an LLM controller with automatic speech recognition (ASR) and audio reasoning tools through tool adapters, letting the agent iteratively call them and cross-verify their outputs.
Introduces AudSemThinker, a reasoning-enriched audio-language model that outperforms state-of-the-art methods by structuring reasoning around auditory semantics, supported by a novel dataset AudSem.
Combines curriculum learning with statistical data balancing to improve audio QA accuracy by 11.7% on DCASE 2025, addressing dataset imbalances through difficulty-based training and category filtering.
Surveys 69 audio-language datasets, analyzing their characteristics, biases, and challenges for training next-generation models.
Introduces ACES, a novel metric for automated audio captioning that evaluates captions based on how humans derive semantic information from sounds, moving beyond traditional text-based metrics.
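The AudioToolAgent entry above describes a controller that iteratively calls tools and cross-verifies their outputs. A minimal toy sketch of such a loop is shown below; all function and parameter names here are hypothetical illustrations, not the framework's actual API, and the agreement check is a deliberately simple stand-in for real cross-verification.

```python
from typing import Callable, Optional

# Stand-in "tools": in the real framework these would be ASR and audio
# reasoning models wrapped behind tool adapters (names are hypothetical).
def asr_tool_a(audio_path: str) -> str:
    return "the dog barks twice"

def asr_tool_b(audio_path: str) -> str:
    return "the dog barks twice"

def cross_verify(outputs: list[str]) -> bool:
    # Toy agreement check: accept only if all tools return the same answer.
    return len(set(outputs)) == 1

def controller(audio_path: str,
               tools: list[Callable[[str], str]],
               max_rounds: int = 3) -> Optional[str]:
    """Iteratively call every tool and return an answer once they agree."""
    for _ in range(max_rounds):
        outputs = [tool(audio_path) for tool in tools]
        if cross_verify(outputs):
            return outputs[0]
    return None  # no consensus within the round budget

print(controller("clip.wav", [asr_tool_a, asr_tool_b]))
```

In practice the controller would be an LLM choosing which tool to invoke next and reconciling disagreements, rather than a fixed loop over all tools.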