We study and characterize the rank-frequency distribution of MFCC code-words, considering speech, music, and environmental sound sources. We show that, regardless of the sound source, MFCC code-words follow a shifted power-law distribution. This implies that a few code-words occur very frequently while many occur rarely. We also observe that the inner structure of the most frequent code-words has characteristic patterns. For instance, in the case of music signals, nearby MFCC coefficients tend to have similar quantization values. Finally, we study the rank-frequency distributions of individual music recordings and show that they present the same type of heavy-tailed distribution found in the large-scale databases. This fact is exploited in two supervised semantic inference tasks: genre and instrument classification. In particular, using just 50 (properly selected) frames, we obtain classification results similar to those obtained by considering all frames in the recordings. Beyond this particular example, we believe that the power-law distribution of MFCC frames could have important implications for future audio-based applications.
M. Haro, J. Serrà, Á. Corral, and P. Herrera. Power-law distribution in encoded MFCC frames of speech, music, and environmental sound signals. Proc. of the Int. World Wide Web Conf., Workshop on Advances in Music Information Research (AdMIRe), pp. 895-902. Lyon, France. April 2012.
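
The rank-frequency analysis described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each MFCC coefficient is quantized to a single bit by comparing it with a per-coefficient threshold (here the median), frames have 13 coefficients, and the input is synthetic placeholder data. The function names `encode_frames` and `rank_frequency` are hypothetical. This is not the authors' implementation, and random data is not expected to reproduce the reported power-law behavior.

```python
from collections import Counter

import numpy as np


def encode_frames(mfcc, thresholds):
    """Map each MFCC frame to a binary code-word (one bit per coefficient).

    Assumption: one-bit quantization against a per-coefficient threshold.
    The abstract does not specify the actual encoding scheme.
    """
    bits = (np.asarray(mfcc) > np.asarray(thresholds)).astype(int)
    return [tuple(row) for row in bits.tolist()]


def rank_frequency(code_words):
    """Return (ranks, freqs): code-word counts sorted from most to least frequent."""
    counts = Counter(code_words)
    freqs = np.array(sorted(counts.values(), reverse=True))
    ranks = np.arange(1, len(freqs) + 1)
    return ranks, freqs


if __name__ == "__main__":
    # Synthetic stand-in for real MFCC frames (rows: frames, columns: coefficients).
    rng = np.random.default_rng(0)
    mfcc = rng.normal(size=(5000, 13))

    # Hypothetical choice: threshold each coefficient at its median.
    thresholds = np.median(mfcc, axis=0)

    words = encode_frames(mfcc, thresholds)
    ranks, freqs = rank_frequency(words)

    # Print the head of the rank-frequency list.
    for r, f in zip(ranks[:5], freqs[:5]):
        print(r, f)
```

Plotting `freqs` against `ranks` on log-log axes for real recordings would be the natural way to inspect the heavy-tailed shape the abstract refers to.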