Last week I attended ICASSP 2011, which was held in Prague, and presented a poster there. As always, it was a conference full of people, well-organized, in a nice place and with many parallel sessions. This year's edition had a strong presence of European researchers (Europe, the Middle East and Africa had 1236 submissions, compared to 914 from Asia and Australia or 701 from the USA), and especially of French people (227 submissions). Curiously, Spain was the 9th country in number of submissions (98; the USA, China and France were the first three). This is quite remarkable if we consider the percentage this represents relative to the population of each country (are we producing too many PhDs in Spain?).
As in 2010, the conference had 3 sessions entirely devoted to music signal processing (1 oral + 2 poster sessions). In fact, music signal processing was explicitly acknowledged to be one of the rising topics within the IEEE Signal Processing community. Apart from music processing, source separation and sparsity-related work were quite well represented. Across the whole conference I spotted some interesting papers related to music, for example:
- Yun Wang & Zijian Ou, "Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio".
- Jinyu Han & Wei Chen, "Improving melody extraction using probabilistic latent component analysis".
- Thierry Bertin-Mahieux, Graham Grindlay, Ron J. Weiss & Daniel P. W. Ellis, "Evaluating music sequence models through missing data".
- Other posters in session AASP-P2, which I couldn't check because I had to stay with mine.
- Hung-Yi Lo, Ju-Chiang Wang, Hsin-Min Wang & Shou-De Lin, "Cost-sensitive stacking for audio tag annotation and retrieval".
- Naoki Yasuraoka, Hirokazu Kameoka, Takuya Yoshioka & Hiroshi G. Okuno, "I-divergence-based dereverberation method with auxiliary function approach".
A new thing this year seemed to be the "Trends in XXX" sessions: half-hour sessions in which three experts in a certain field discussed the trends in that field. As always, these kinds of sessions are very interesting for people outside the field, but I doubt they brought anything new to people working within it.
Last but not least, there was a very stimulating special session on "Innovative Representations of Audio", with all papers worth reading (I really enjoyed it!). In particular, I would highlight:
- Nima Mesgarani & Shihab Shamma, "Speech processing with a cortical representation of audio".
- Richard Lyon, Jay Ponte & Gal Chechik, "Sparse coding of auditory features for machine hearing in interference".
- Paris Smaragdis, "Approximate nearest-subspace representations for sound mixtures".
I'm on the program committee of MIRUM, the first international ACM workshop on music information retrieval with user-centered and multimodal strategies, which will be held as a full-day event during ACM Multimedia 2011, taking place from November 28 to December 1 in Scottsdale (Arizona, USA). The workshop aims to gather experts from the Music Information Retrieval community and neighboring fields at a premier multimedia venue, to initiate a cross-disciplinary dialogue on open challenges in the field of Music Information Retrieval with user-centered and/or multimodal strategies. Here is the call for papers (more information at the MIRUM website):
<<Music is an outstanding example of a content type with many different representations. The symbolic notation by the composer (e.g. in a score or a lead sheet) will only reach full manifestation when performed and presented to listeners in the form of music audio. Next to the symbolic and aural modalities, multiple other modalities hold useful information that contributes to the way in which the music is conveyed and experienced, such as visual, textual and social information. The existence of complementary representations and information sources in multiple modalities makes music content multimedia by definition.
The consumption of music is strongly guided by affective and subjective responses: aspects that are personal and context-dependent, occur at different conceptual specificity levels, and for which no universal, uncompromised ground truth exists. For music retrieval systems to yield satisfying results, insight into the information needs and demands of the actual users of the systems thus becomes very important.
To allow comprehensive and flexible exploitation of the multifaceted aspects of music, both the availability of complementary music-related information in multiple modalities and the role of the human user should be considered. At the same time, challenges such as the identification and optimal combination of useful information from different modalities, and algorithmic approaches to user-dependent subjective assessments of music retrieval results, are still largely unsolved. These challenges are certainly not unique to music content, but are current and prevalent throughout the broad multimedia community.
The MIRUM workshop, held in conjunction with ACM Multimedia 2011, November 28 – December 1, 2011 in Scottsdale, AZ, provides a platform at a premier multimedia venue for discussing open challenges and presenting state-of-the-art work on music information retrieval applying user-centered and/or multimodal strategies. The workshop explicitly aims to initiate a cross-disciplinary idea exchange between experts in music and multimedia information retrieval (and related fields) on topics including, but not limited to: