Audio Machine Learning · Audio Datasets · Multimodal AI

Hi, I’m Gijs Wijngaard.

I explore how machines listen and understand audio. I focus on building novel audio understanding models and datasets.

I am currently a PhD candidate at Maastricht University, where I am supervised by Elia Formisano and Michel Dumontier.

I am also a Co-Founder of Encode Europe, where we aim to build a future in which AI benefits society through advocacy, education, and policy research.

Portrait of Gijs Wijngaard

Updates

Latest news and publications.

  • Sep 2025

    AudSemThinker at NeurIPS

Our reasoning-enhanced audio-language model was accepted for a poster session at NeurIPS 2025 in San Diego!

  • Jan 2025

    Survey Accepted to IEEE Access

Our comprehensive review of audio-language datasets was accepted to the IEEE Access journal!

  • Jan 2023

    ACES at EUSIPCO

Audio Captioning Evaluation on Semantics of Sound, our evaluation metric for semantic audio captioning, was accepted to EUSIPCO 2023 in Helsinki.

Selected Papers

Peer-reviewed highlights across audio understanding and datasets.

NeurIPS 2025 · 2025

AudSemThinker: Enhancing Audio-Language Models through Reasoning over Semantics of Sound

Introduces AudSemThinker, a reasoning-enriched audio-language model that outperforms state-of-the-art methods by structuring its reasoning around auditory semantics, supported by AudSem, a novel dataset.

Preprint · 2025

Data-Balanced Curriculum Learning for Audio Question Answering

Combines curriculum learning with statistical data balancing to improve audio QA accuracy by 11.7% on DCASE 2025, addressing dataset imbalances through difficulty-based training and category filtering.

IEEE Access · 2025

Audio-Language Datasets of Scenes and Events: A Survey

A survey of 69 audio-language datasets, analyzing their characteristics, biases, and challenges for training next-generation models.

EUSIPCO 2023

ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds

Introduces ACES, a novel metric for automated audio captioning that evaluates captions based on how humans derive semantic information from sounds, moving beyond traditional text-based metrics.