FUTGA-MIR: Enhancing Fine-grained and Temporally-aware Music Understanding with Music Information Retrieval


Example Audio with Captions

Each audio file is accompanied by a model-generated Global Description, displayed beneath its audio player. The model also divides each audio file into distinct sections and provides a detailed, fine-grained caption for each one. Hover over a segment in the bar to view its caption.
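For readers who want to work with this kind of output programmatically, the layout described above (one global description plus a list of timed, captioned sections) can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Segment`, `TrackCaptions`, and `caption_at` are hypothetical, the timestamps and caption strings are placeholders rather than actual model output, and the page does not specify the format the model really emits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One section of a track (hypothetical schema, times in seconds)."""
    start: float
    end: float
    caption: str


@dataclass
class TrackCaptions:
    """A global description plus fine-grained captions per section."""
    global_description: str
    segments: List[Segment] = field(default_factory=list)

    def caption_at(self, t: float) -> Optional[str]:
        """Return the caption of the section covering time t, if any.

        Sections are treated as half-open intervals [start, end), so a
        boundary time belongs to the section that starts there.
        """
        for seg in self.segments:
            if seg.start <= t < seg.end:
                return seg.caption
        return None


# Placeholder values for illustration only, not real model output.
track = TrackCaptions(
    global_description="<global description>",
    segments=[
        Segment(0.0, 12.5, "<caption for section 1>"),
        Segment(12.5, 30.0, "<caption for section 2>"),
    ],
)
```

Looking up `track.caption_at(t)` for a playback time `t` mirrors what hovering over a segment in the bar does on this page: it selects the section that covers that moment and returns its caption.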

Example 1

Example 2

Example 3

Example 4

Example 5