ARTIST: A CONNECTIONIST MODEL OF MUSICAL ACCULTURATION


Frederic G. Piat
Image Video and Multimedia Systems Laboratory,
Dept. of Electrical and Computer Engineering,
National Technical University of Athens,
Zographou 15773, Greece
(+301) 772-2521
piat@image.ntua.gr
http://www.image.ece.ntua.gr/~piat/




Background

It is often not until we are faced with an unfamiliar musical style that we fully realize the importance of the musical mental schemata gradually acquired through our past listening experience. These cognitive structures automatically intervene as music is heard, and they are necessary to build integrated and organized perceptions from acoustic sensations: without them, as happens when listening to a piece in a musical style foreign to our experience, a flow of notes seems like a flow of words in a foreign language, incoherent and unintelligible. The impression is that all pieces or phrases sound more or less the same, and musical styles such as Indian ragas, Chinese guqin or Balinese gamelan are often described as monotonous by Western listeners new to these kinds of music. This happens to experienced, musically trained listeners as well as to listeners without any musical experience other than listening itself. Thus it is clear that the mental schemata required to interpret a certain kind of music can be acquired through gradual acculturation (Francès, 1988), which is the result of passive listening in the sense that it does not require any conscious effort or attention directed towards learning. This is not to say that formal training has no influence, but only that it is not necessary and that exposure to the music is sufficient.
Becoming familiar with a particular musical style usually implies two things: (1) the memorization of particular melodies, and (2) an intuitive sense of the prototypicality of musical sequences relative to that style (i.e., the sense of tonality in the context of Western music). These underlie two kinds of expectancies, melodic and stylistic respectively. Melodic (also called 'veridical') expectancies rely on the listener's familiarity with a particular melody and refer to knowledge of which notes will come next after hearing part of it. Stylistic expectancies rely on the listener's familiarity with a particular musical style, and refer to a sense of the notes that should or will probably follow a passage in order for the piece to fit well in that style. These expectancies can be probed in different ways, for instance with Dowling's (1973) recognition task of familiar melodies interleaved with distractor notes, and Krumhansl and Shepard's (1979) probe-tone technique, respectively.



Aims

Several connectionist models of tonality have been proposed, but they are rarely realistic: they often build in a priori knowledge from the musical domain (e.g., octave equivalence) or are constructed without any learning (Bharucha, 1987; extended by Laden, 1995). This paper presents an Artificial Neural Network (ANN), based on a simplified version of Grossberg's (1982) Adaptive Resonance Theory (ART), to model the tonal acculturation process. The model does not presuppose any musical knowledge except the categorical perception of pitch for its input, which is a research problem in itself (Sano and Jenkins, 1989) and beyond the scope of this paper. The model develops gradually through unsupervised learning. That is, it needs no information other than that present in the music to generate the schemata, just as humans need no teacher. Gjerdingen (1990) used a similar model for the categorization of musical patterns, but did not aim to test the cognitive reality of these musical categories. Page (1999) also successfully applied ART2 networks to the perception of musical sequences. The goal of the present paper is to show that this simple and realistic model is cognitively pertinent, by comparing its behaviour directly with that of humans on the same tasks. As mentioned in the previous section, these tasks have been chosen because they are robust, having stood the test of time, and because they reflect broad and fundamental aspects of music cognition.



Main contribution

The Model ARTIST

The ART2 self-organizing ANN (Carpenter and Grossberg, 1987) was developed for the classification of analogue input patterns and is well suited to music processing. It is somewhat more complex than needed here, however, and a few simplifications were made to build the present model, ARTIST (Adaptive Resonance Theory to Internalize the Structure of Tonality). ARTIST is made up of two layers (or fields) of neurons, the input field (F1) and the categories field (F2), connected by synaptic weights that play the role of both Bottom-Up and Top-Down connections. Learning occurs through the modification of the weights, which progressively tune the 'category units' in F2 to be most responsive to a certain input pattern (the 'prototype' for that category). The weights store the long-term memory of the model.
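The category formation described above can be sketched in a few lines. This is only an illustrative toy, not ARTIST's actual equations: the cosine similarity measure, the 'vigilance' threshold and the learning rate are generic assumptions standing in for the model's real Bottom-Up/Top-Down dynamics.

```python
import math

def _cos(u, v):
    # Cosine similarity between an input vector and a stored prototype
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-9)

class SimpleART:
    """Toy ART-style categorizer: each F2 category unit is one prototype."""

    def __init__(self, vigilance=0.8, lr=0.1):
        self.vigilance = vigilance  # similarity required for 'resonance'
        self.lr = lr                # how fast a prototype tracks its inputs
        self.prototypes = []        # long-term memory: one vector per F2 unit

    def learn(self, x):
        if self.prototypes:
            sims = [_cos(p, x) for p in self.prototypes]     # Bottom-Up activation
            j = max(range(len(sims)), key=sims.__getitem__)  # F2 winner
            if sims[j] >= self.vigilance:
                # Resonance: tune the winning category unit towards the input
                self.prototypes[j] = [p + self.lr * (xi - p)
                                      for p, xi in zip(self.prototypes[j], x)]
                return j
        # Mismatch (or empty network): recruit a new category unit
        self.prototypes.append(list(x))
        return len(self.prototypes) - 1
```

Fed a stream of, say, pitch activation vectors, such a network recruits a new category when an input resembles no stored prototype and otherwise refines the closest prototype, which is how category units can emerge without supervision.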



Melodic Expectancies

When we are very familiar with a melody, we can usually still recognize it after various transformations such as transposition, rhythmic or tonal variations, etc. This is not the case when distractor (random) notes are added in between the melody notes: even the most familiar tunes become unrecognizable as long as the distractors 'fit in' (that is, provided no primary acoustic cue such as frequency range, timbre or loudness segregates them; Bregman, 1990). However, when given a few possibilities regarding the identity of the melody, listeners can identify it positively (Dowling, 1973). This means that Top-Down knowledge can be used to test hypotheses and categorize stimuli. For melodies, this knowledge takes the form of a pitch-time window within which the next note should occur, and it enables auditory attention to be directed (Dowling, Lung & Herrbold, 1987; Dowling, 1990). As the number of possibilities offered to the subject increases, the ability to name that tune decreases: when Top-Down knowledge becomes less focused, categorization gets more difficult. With its built-in mechanism of Top-Down activation propagation, ARTIST can be subjected to the same task.
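This hypothesis-testing process can be sketched as follows. The sketch is hypothetical, not Dowling's or ARTIST's actual procedure: the strict even/odd interleaving of melody and distractor notes and the one-semitone pitch window are assumptions made purely for illustration.

```python
# Each candidate melody supplies an expected pitch (here a MIDI note number)
# at each melody time slot; a note counts as a 'hit' when it falls inside the
# hypothesis's pitch window.

def score_hypothesis(stimulus, expected, window=1):
    # Assumed interleaving: melody notes at even positions, distractors at odd.
    melody_slots = stimulus[0::2]
    hits = sum(1 for heard, exp in zip(melody_slots, expected)
               if abs(heard - exp) <= window)  # note falls in the pitch window
    return hits / len(expected)

def name_that_tune(stimulus, candidates):
    # Pick the candidate melody whose Top-Down expectations fit the input best.
    return max(candidates,
               key=lambda name: score_hypothesis(stimulus, candidates[name]))
```

With few candidates, the correct hypothesis stands out; as candidates multiply and their expectancy windows overlap, scores crowd together and identification gets harder, mirroring the human result.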



Stylistic Expectancies

The most general and concise characterization of tonality (and therefore of most Western music) probably comes from the work of Krumhansl (1990). With the probe-tone technique, she empirically quantified the relative importance of pitches within the context of any major or minor key, expressed as what are known as the 'tonal hierarchies'. These findings are closely related to just about every aspect of tonality and of pitch use: frequency of occurrence, accumulated durations, aesthetic judgements of all sorts (e.g., of pitch occurrences, chord changes or harmonizations), chord substitutions, resolutions, and so on. Many studies support the cognitive reality of the tonal hierarchies (Jarvinen, 1995; Cuddy, 1993; Repp, 1996; Sloboda, 1985; Janata and Reisberg, 1988). All of this suggests that subjecting ARTIST to the probe-tone technique is a good way to test whether it has extracted a notion of tonality (or its usage rules) from the music it was exposed to, or at least elements that enable a reconstruction of what tonality is.
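The comparison behind such probe-tone simulations is a plain Pearson correlation between a model tone profile (one rating per chromatic probe, e.g. a summed F2 activation) and a human profile. A minimal sketch follows; the human values are the Krumhansl & Kessler (1982) C major ratings as commonly cited, while any model profile would simply be another 12-element vector.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length profiles
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Krumhansl & Kessler (1982) C major probe-tone ratings, C, C#, D, ..., B
KK_C_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
              2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
```

A model profile correlating highly with `KK_C_MAJOR` after a C major context would indicate that the tonal hierarchy has been internalized.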



Implications

From the two simulations above we can see that ARTIST can easily be subjected, in a natural way, to the same musical tasks as humans, and that it approximates human behaviour very closely on these tasks. When probed with the standard techniques, it shows both melodic and stylistic expectancies, the two main aspects of musical acculturation. ARTIST learns unsupervised, and its knowledge is acquired only from exposure to music, so it is a realistic model of how musical mental schemata can be formed. The implication is that all that is needed to accomplish this complex musical processing and to develop mental schemata is a memory system capable of storing information according to similarity and abstracting prototypes from similar inputs, while constantly interpreting the inputs through the filter of Top-Down (already acquired) knowledge. The joint action of the mental schemata yields musical processing that is sensitive to tonality. This property emerges from the internal organisation of the neural network; it is distributed over its whole architecture. Thus it can be said that the structure of tonality has been internalized. Testing ARTIST with other musical styles could further establish it as a general model of music perception.
In the simulation of the probe-tone task, ARTIST's response has to be recorded before any lateral inhibition occurs in F2. Otherwise, the sum of all activations in F2 would simply be that of the winner, all others being null, and much of the information regarding ARTIST's reaction would be lost. This takes one step further Gjerdingen's (1990) argument for using ANNs, namely that cognitive musical phenomena are probably too complex to be represented through the tidiness of a set of rules. In the present example, the simulation of complex human behaviours is achieved through the 'chaos' of the activation of hundreds of prototypes, each activated to a degree reflecting its resemblance to the input. This explains why the correlations between human and ARTIST tone profiles are negative, and resolves the apparent contradiction with Katz's (1999) theory of the aesthetic ideal as 'unity in diversity': the global activation in F2 is a measure of diversity, not of unity. Lateral inhibition is the key element of the theory, but it is deliberately not used here, to preserve all aspects of the complexity of the abstract representation of a stimulus.
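The difference between the two readouts can be stated in one line: with lateral inhibition the field's summed response collapses to the winner alone, while without it every prototype contributes its graded activation. A trivial sketch, with activation values invented purely for illustration:

```python
def f2_response(activations, lateral_inhibition):
    """Summed F2 response under the two readout regimes."""
    if lateral_inhibition:
        # Winner-take-all: only the most active prototype survives,
        # so the sum over the field equals the winner's activation.
        return max(activations)
    # Before inhibition: every prototype contributes its graded activation.
    return sum(activations)
```

Reading the field before inhibition is what preserves the graded, distributed character of the representation discussed above.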
The major limitation of ARTIST in its current state is that it cannot account for transpositional invariance. Whether the perception of invariance under transposition can be acquired at all through learning is not obvious, as the question of how humans come to possess this ability is itself still open.



References

Bharucha, J.J. (1987). Music cognition and perceptual facilitation: A connectionist framework. Music Perception, 5(1), 1-30.
Bregman, A.S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Carpenter, G.A., & Grossberg, S. (1987). ART2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26, 4919-4930.
Cuddy, L.L. (1993). Melody comprehension and tonal structure. In T.J. Tighe & W.J. Dowling (Eds.), Psychology and music: The understanding of melody and rhythm. Hillsdale, NJ: Erlbaum.
Dowling, W.J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W.J. (1990). Expectancy and attention in melody perception. Psychomusicology, 9(2), 148-160.
Dowling, W.J., Lung, K.M.T., & Herrbold, S. (1987). Aiming attention in pitch and time in the perception of interleaved melodies. Perception and Psychophysics, 41, 642-656.
Francès, R. (1988). La perception de la musique (W.J. Dowling, Trans.). Hillsdale, NJ: Erlbaum. (Original work published 1954, Librairie philosophique J. Vrin, Paris.)
Gjerdingen, R.O. (1990). Categorization of musical patterns by self-organizing neuronlike networks. Music Perception, 7, 339-370.
Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition and motor control. Boston: D. Reidel/Kluwer.
Janata, P., & Reisberg, D. (1988). Response-time measures as a means of exploring tonal hierarchies. Music Perception, 6(2), 161-172.
Jarvinen, T. (1995). Tonal hierarchies in jazz improvisation. Music Perception, 12(4), 415-437.
Katz, B.F. (1999). An ear for melody. In N. Griffith & P. Todd (Eds.), Musical networks: Parallel distributed perception and performance (pp. 199-224). Cambridge, MA: MIT Press.
Krumhansl, C.L. (1990). The cognitive foundations of musical pitch. Oxford Psychology Series, No. 17. New York: Oxford University Press.
Krumhansl, C.L., & Kessler, E.J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334-368.
Krumhansl, C.L., & Shepard, R.N. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5, 579-594.
Laden, B. (1995). Modeling cognition of tonal music. Psychomusicology, 14, 154-172.
Page, M.P.A. (1999). Modelling the perception of musical sequences with self-organizing neural networks. In N. Griffith & P. Todd (Eds.), Musical networks: Parallel distributed perception and performance (pp. 175-198). Cambridge, MA: MIT Press.
Repp, B.H. (1996). The art of inaccuracy: Why pianists' errors are difficult to hear. Music Perception, 14(2), 161-184.
Sano, H., & Jenkins, B.K. (1989). A neural network model for pitch perception. Computer Music Journal, 13(3).
Shepard, R.N. (1964). Circularity in judgments of relative pitch. The Journal of the Acoustical Society of America, 36, 2346-2353.
Sloboda, J.A. (1985). The musical mind. Oxford Psychology Series, No. 5.


Figures


From top to bottom:
Figure 1. Summed ranks of the two label nodes for 'Twinkle twinkle' as a function of stimulus played and hypothesis tested.
Figure 2. Comparison of ARTIST and Krumhansl & Kessler (1982) C major tone profiles (correlation = .95).
Figure 3. Comparison of ARTIST and Krumhansl & Kessler (1982) C major tone profiles (correlation = .91).
Figure 4. Comparison of ARTIST and Krumhansl & Kessler (1982) inter-key distances between C major and all minor keys (correlation = .972).
