This page lists the scientific publications using aubio.

Paul Brossier's papers

  1. P. M. Brossier, J. P. Bello and M. D. Plumbley. Real-time temporal segmentation of note objects in music signals, in Proceedings of the International Computer Music Conference (ICMC 2004), Miami, Florida, USA, November 1-6, 2004.

Abstract: Segmenting note objects in a real time context is useful for live performances, audio broadcasting, or object-based coding. This temporal segmentation relies upon the correct detection of onsets and offsets of musical notes, an area of much research over recent years. However, the low-latency requirements of real-time systems impose new, tight constraints on this process. In this paper, we present a system for the segmentation of note objects with very short delays, using recent developments in onset detection, specially modified to work in a real-time context. A portable and open C implementation is presented.
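The abstract does not name the detection function or the peak picker, so the following is only a generic illustration of causal onset detection, not the paper's method. This minimal Python sketch computes a spectral-flux detection function from magnitude spectra assumed to be computed elsewhere, then flags peaks that exceed a moving median of past values only, so each decision costs one frame of look-ahead. The function names, the spectral-flux choice and the `history`, `alpha` and `delta` parameters are illustrative assumptions, not part of aubio's API.

```python
import statistics


def spectral_flux(frames):
    """Detection function: sum of positive magnitude increases between
    consecutive magnitude spectra (the first frame is given 0)."""
    odf = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        odf.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return odf


def causal_onsets(odf, history=5, alpha=1.0, delta=0.1):
    """Flag frame i as an onset when it is a local peak (one frame of
    look-ahead) and exceeds alpha * median(previous `history` values) + delta."""
    onsets = []
    for i in range(1, len(odf) - 1):
        past = odf[max(0, i - history):i]  # only frames before i are used
        threshold = alpha * statistics.median(past) + delta
        if odf[i] > threshold and odf[i] >= odf[i - 1] and odf[i] > odf[i + 1]:
            onsets.append(i)
    return onsets
```

For example, `causal_onsets([0, 0, 1.0, 0.2, 0.1, 0.1, 0.9, 0.3, 0.1])` flags frames 2 and 6. Because the threshold only looks backwards, latency is bounded by the hop size plus one frame.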

  2. P. M. Brossier, J. P. Bello and M. D. Plumbley. Fast labelling of notes in music signals, in Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, October 10-14, 2004.

Abstract: We present a new system for the estimation of note attributes from a live monophonic music source, within a short time delay and without any previous knowledge of the signal. The labelling is based on the temporal segmentation and the successive estimation of the fundamental frequency of the current note object. The setup, implemented around a small C library, is directed at the robust note segmentation of a variety of audio signals. A system for evaluation of performances is also presented. The further extension to polyphonic signals is considered, as well as design concerns such as portability and integration in other software environments.
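A minimal sketch of the labelling step, assuming an onset list and a per-frame fundamental frequency track are already available (0 marks unvoiced frames). Each segment between consecutive onsets is given the median of its voiced f0 values, converted to the nearest MIDI note with the standard equal-temperament formula (A4 = 440 Hz = MIDI 69). The helper names and the use of the median are assumptions for illustration, not the paper's exact procedure.

```python
import math
import statistics


def hz_to_midi(f0_hz):
    """Equal-temperament conversion with A4 = 440 Hz mapped to MIDI note 69."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def label_notes(onset_frames, f0_track, hop_seconds):
    """Return (start_time, midi_note) for each segment between onsets.

    f0_track holds one f0 estimate (Hz) per frame, 0 marking unvoiced frames.
    """
    notes = []
    bounds = list(onset_frames) + [len(f0_track)]
    for start, end in zip(bounds[:-1], bounds[1:]):
        voiced = [f for f in f0_track[start:end] if f > 0]
        if not voiced:
            continue  # no pitch found in this segment: leave it unlabelled
        midi = round(hz_to_midi(statistics.median(voiced)))
        notes.append((start * hop_seconds, midi))
    return notes
```

Taking the median over the whole segment makes the label robust to a few octave errors or unvoiced frames at the note boundaries.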

Co-authored papers

  1. A. Hazan, P. Brossier, P. Holonowicz, P. Herrera, and H. Purwins. Expectation Along The Beat: A Use Case For Music Expectation Models, in Proceedings of International Computer Music Conference 2007 (ICMC 2007), Copenhagen, Denmark, pp. 228-236, 2007.

Abstract: We present a system to produce expectations based on the observation of rhythmic music signals at a constant tempo. The algorithms we use are causal, in order to fit cognitive constraints more closely and to allow a future real-time implementation. In a first step, an acoustic front-end based on the aubio library extracts onsets and beats from the incoming signal. The extracted onsets are then encoded symbolically using an unsupervised scheme: each hit is assigned a timbre cluster based on its timbre features, while its inter-onset interval relative to the previous hit is computed as a proportion of the extracted tempo period and assigned an inter-onset interval cluster. In a later step, the representation of each hit is sent to an expectation module, which learns the statistics of the symbolic sequence. Hence, at each musical hit, the system produces both what and when expectations regarding the next musical hit. To evaluate our system, we consider a weighted average F-measure that takes into account the uncertainty associated with the unsupervised encoding of the musical sequence. We then present a preliminary experiment involving generated musical material and propose a roadmap in the context of this novel application field.
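The inter-onset interval encoding described above can be illustrated with a short sketch. Intervals are divided by the beat period, then grouped with a simple causal "leader" clustering: a value joins the nearest cluster if it lies within a tolerance, otherwise it starts a new cluster. The clustering rule, the function names and the `tolerance` value are assumptions standing in for the paper's unsupervised scheme, whose details are not given here.

```python
def ioi_ratios(onset_times, beat_period):
    """Inter-onset intervals expressed as fractions of the beat period."""
    return [(b - a) / beat_period for a, b in zip(onset_times, onset_times[1:])]


def leader_cluster(values, tolerance=0.1):
    """Causal 'leader' clustering: each value joins the nearest cluster whose
    centroid is within `tolerance`, otherwise it opens a new cluster.
    Centroids are updated as running means of their members."""
    centroids, counts, labels = [], [], []
    for v in values:
        if centroids:
            k = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            if abs(v - centroids[k]) <= tolerance:
                counts[k] += 1
                centroids[k] += (v - centroids[k]) / counts[k]
                labels.append(k)
                continue
        centroids.append(v)
        counts.append(1)
        labels.append(len(centroids) - 1)
    return labels, centroids
```

With onsets at 0.0, 0.26, 0.5, 1.0 and 1.24 s and a 0.5 s beat, the slightly uneven half-beat intervals share one cluster while the full-beat interval gets its own, which is the kind of tempo-relative symbol the abstract describes.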

  2. A. Hazan, P. Brossier, R. Marxer, and H. Purwins. What/when causal expectation modelling in monophonic pitched and percussive audio, in Music, Brain and Cognition. Part 2: Models of Sound and Cognition, part of the Neural Information Processing Systems Conference (NIPS), Vancouver, Canada, 2007.

Abstract: A causal system for representing a musical stream and generating further expected events is presented. Starting from an auditory front-end which extracts low-level (e.g. spectral shape, MFCC, pitch) and mid-level features such as onsets and beats, an unsupervised clustering process builds and maintains a set of symbols aimed at representing musical stream events using both timbre and time descriptions. The time events are represented using inter-onset intervals relative to the beats. These symbols are then processed by an expectation module based on Predictive Partial Match, a multiscale technique based on N-grams. To characterise the system capacity to generate an expectation that matches its transcription, we use a weighted average F-measure, that takes into account the uncertainty associated with the unsupervised encoding of the musical sequence. The potential of the system is demonstrated in the case of processing audio streams which contain drum loops or monophonic singing voice. In preliminary experiments, we show that the induced representation is useful for generating expectation patterns in a causal way. During exposure, we observe a globally decreasing prediction entropy combined with structure-specific variations.
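To make "what" expectations over a symbol stream concrete, here is a simplified variable-order n-gram predictor that backs off to shorter contexts when a longer one has not yet been observed. It is not an implementation of Prediction by Partial Matching (it has no escape probabilities or exclusion); the class name, `max_order` and the `entropy` helper are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class BackoffPredictor:
    """Hypothetical variable-order n-gram predictor (a simplification of PPM):
    counts which symbol followed each context of length 0..max_order and
    predicts from the longest context already observed."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        self.counts = defaultdict(Counter)
        self.history = []

    def update(self, symbol):
        # Record `symbol` as the continuation of every available context length.
        for n in range(min(self.max_order, len(self.history)) + 1):
            context = tuple(self.history[len(self.history) - n:])
            self.counts[context][symbol] += 1
        self.history.append(symbol)

    def predict(self):
        # Back off from the longest context to the empty one.
        for n in range(min(self.max_order, len(self.history)), -1, -1):
            context = tuple(self.history[len(self.history) - n:])
            if context in self.counts:
                total = sum(self.counts[context].values())
                return {s: c / total for s, c in self.counts[context].items()}
        return {}


def entropy(distribution):
    """Shannon entropy in bits of a {symbol: probability} mapping."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)
```

Feeding the predictor one symbol per detected hit and tracking `entropy(p.predict())` over time is one way to observe the decreasing prediction entropy mentioned in the abstract: after a repeated kick, hat, snare, hat pattern, the prediction following "snare, hat" becomes certain.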

  3. M. E. P. Davies, P. Brossier and M. D. Plumbley. Beat Tracking Towards Automatic Musical Accompaniment, in Proceedings of the Audio Engineering Society (AES) 118th Convention, Barcelona, Spain, May 28-31, 2005.

Abstract: In this paper we address the issue of real-time rhythmic analysis, primarily towards predicting the locations of musical beats such that they are consistent with a live audio input. This will be a key component required for a system capable of automatic accompaniment with a live musician. We implement our approach as part of a real-time audio library. Because this causal system has no access to "future" audio information, performance is reduced in comparison to our previous non-causal system, although it remains acceptable for our intended purpose.
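As a generic illustration of causal beat prediction (not the method of the paper), the sketch below estimates the beat period as the lag that maximises a normalised autocorrelation of past onset detection function values, then extrapolates future beat positions from the last known beat. The function names, the lag range and the normalisation by the number of overlapping terms are assumptions.

```python
def estimate_period(odf, min_lag, max_lag):
    """Return the lag (in frames) that maximises the autocorrelation of the
    onset detection function, normalised by the number of overlapping terms.
    Ties keep the shortest lag."""
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        terms = len(odf) - lag
        if terms <= 0:
            break  # not enough past frames for this lag
        score = sum(odf[i] * odf[i - lag] for i in range(lag, len(odf))) / terms
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def predict_beats(last_beat_frame, period, count):
    """Extrapolate the next `count` beat positions (in frames)."""
    return [last_beat_frame + k * period for k in range(1, count + 1)]
```

Normalising by the number of terms stops short lags from winning simply because they have more overlapping frames. Restricting the lag range to plausible tempi is what keeps the estimate from locking onto multiples of the true period.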

Other Contributions

  1. R. B. Dannenberg. An Intelligent Multi-Track Audio Editor, in Proceedings of the 2007 International Computer Music Conference (ICMC 2007), Volume II, pp. 89-94, San Francisco, USA, 2007.

Abstract: Audio editing software allows multi-track recordings to be manipulated by moving notes, correcting pitch, and making other fine adjustments, but this is a tedious process. An "intelligent audio editor" uses a machine-readable score as a specification for the desired performance and automatically makes adjustments to note pitch, timing, and dynamic level.

  2. W. You and R. B. Dannenberg. Polyphonic Music Note Onset Detection Using Semi-Supervised Learning, in Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007), Vienna, Austria, September 23-27, 2007.

Abstract: Automatic note onset detection is particularly difficult in orchestral music (and polyphonic music in general). Machine learning offers one promising approach, but it is limited by the availability of labeled training data. Score-to-audio alignment, however, offers an economical way to locate onsets in recorded audio, and score data is freely available for many orchestral works in the form of standard MIDI files. Thus, large amounts of training data can be generated quickly, but it is limited by the accuracy of the alignment, which in turn is ultimately related to the problem of onset detection. Semi-supervised or bootstrapping techniques can be used to iteratively refine both onset detection functions and the data used to train the functions. We show that this approach can be used to improve and adapt a general purpose onset detection algorithm for use with orchestral music.

  3. C. Yang, E. Chew, and A. Volk. A dynamic programming approach to adaptive tatum assignment for rhythm transcription, in Seventh IEEE International Symposium on Multimedia, December 12-14, 2005.

Abstract: We present a method for segmenting music with different grid levels in order to properly quantize note values in the transcription of music. This method can be used in automatic music transcription systems and music information retrieval systems to reduce a performance of a music piece to the printed or digital score. The system takes only the onset data of performed music from either MIDI or audio, and determines the best maximal grid level onto which to fit the note onsets. This maximal grid level, or tatum, is allowed to vary from section to section in a piece. We obtain the optimal segmentation of the piece using dynamic programming. We present results from an audio based performance of Milhaud's Botafogo, as well as several MIDI performances of the Rondo-Allegro from Beethoven's Pathétique. The results show a reduction of error compared to quantization based only on one global metric level, and promise to create rhythm transcriptions that are parsimonious and readable.
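A toy version of the dynamic-programming segmentation, under assumed costs that do not reproduce the paper's metric: each segment is anchored at its first onset and assigned the candidate tatum minimising quantisation error plus a small charge per grid tick (so coarser grids are preferred whenever they fit), and each segment pays a fixed penalty so the piece is not split needlessly. The function names, `tick_weight` and `seg_penalty` are hypothetical.

```python
def segment_cost(onsets, i, j, tatums, tick_weight):
    """Best (cost, tatum) for onsets[i:j], with the grid anchored at onsets[i]."""
    start = onsets[i]
    best = (float("inf"), None)
    for t in tatums:
        # Quantisation error: distance of each onset to the nearest grid point.
        err = sum(abs((x - start) - round((x - start) / t) * t) for x in onsets[i:j])
        ticks = (onsets[j - 1] - start) / t  # finer grids span more ticks
        cost = err + tick_weight * ticks
        if cost < best[0]:
            best = (cost, t)
    return best


def tatum_segmentation(onsets, tatums, tick_weight=0.01, seg_penalty=0.05):
    """Return [(first_index, end_index_exclusive, tatum), ...] minimising the
    total cost over all ways of cutting the onset list into segments."""
    n = len(onsets)
    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost of onsets[:j]
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost, t = segment_cost(onsets, i, j, tatums, tick_weight)
            total = best[i] + cost + seg_penalty
            if total < best[j]:
                best[j], back[j] = total, (i, t)
    # Walk the back-pointers to recover the segments in order.
    segments, j = [], n
    while j > 0:
        i, t = back[j]
        segments.append((i, j, t))
        j = i
    return list(reversed(segments))
```

On a passage of straight eighth notes followed by triplets, the optimum uses a 0.5 tatum for the first section and a 1/3 tatum for the second, rather than one fine 1/6 grid for the whole passage.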

Last modified on May 20, 2008, 2:55:48 PM