Review: The Clockwork Muse, by Colin Martindale 1990


The Clockwork Muse: The predictability of artistic change

Colin Martindale, Harper Collins, 1990

You know how I always gripe that nobody does literary theory anymore? This is real artistic theory. Martindale studied thousands of poems, paintings, musical compositions, and a few pieces of fiction, using tests with human subjects and with computers. He came up with interesting questions, and tried to form hypotheses, conduct experiments to test them, and evaluate them using sound statistical methods.

I say “tried” because, unfortunately, he didn’t understand the principle of conservation of evidence, and didn’t understand statistics very well. But he raised interesting questions, answered some of them, and showed how to answer more of them. His work is remarkable for almost successfully taking a scientific approach to art.

The extent to which literary theorists ignored him is also remarkable. But Martindale was a professor of psychology, and published most of his results in psychology or computer science journals. I don’t know whether this was by choice, or because literary journals wouldn’t take them. He published quite a few in Poetics. I don’t think Poetics is a mainstream literary journal, since its guidelines request papers in sociology, psychology, media and communication studies, and economics.

The Good

Martindale did a lot of experiments, mostly in support of his central thesis (see under “The Ugly Details”):

– Artists are always trying to make their work more strange or surprising.

– They can make their work more surprising either by using more “primordial content” (basically randomness), or by creating a new style.

– New styles therefore appear at a regular rate over time, when the content presented in the previous style has become as random as it can be.

– This accounts for almost all stylistic change, throughout all of history, across all art forms.

If his analyses had been correct, he would have an overwhelming amount of evidence in favor of this (somewhat repugnant) thesis. As it is, it’s hard to say how much evidence is left when you throw out all the bad statistics, optimistic curve-gazing, and post-hoc rationalization, but I think it’s significantly more than zero.

The irony is that other aesthetic theorists had no way of knowing how bad Martindale’s use of statistics was. They knew even less about statistics. They ignored him correctly, but unjustifiably. Or perhaps this incident justifies their ignoring scientific incursions into literature, and explains the hostility between C. P. Snow’s “two cultures” (the sciences and the humanities): Anyone from a scientific discipline can rush into a humanity and terrorize its inhabitants, brandishing graphs and chanting p-values. If our hapless “humanitarians” admit that science works, they’ll be helpless against him, because they won’t be able to tell whether his science is good or bad. (Let us suppose, in the name of democracy, that the same holds for incursions from the humanities into the sciences.)

Chapter 7, “Cross-National, Cross-Genre, and Cross-Media Synchrony”, section 2 on cross-media styles: This experiment showed that the terms “baroque”, “romantic”, and “neoclassical” mean something other than just “what people did during an arbitrarily-bounded time period”. Martindale said this is now an unpopular belief.

Martindale doesn’t get into any of this, but I’ll explain why I think post-modernists are suspicious of the idea that “baroque” by itself means something other than an arbitrary, socially-agreed-on time period. It’s important. Well, if you care about philosophy or art theory.

A lexeme is a word or set of words whose semantic meaning is not clearly composed of the semantic meanings of its parts. “Run” can be a lexeme, but when it’s in “run up a bill”, that whole phrase is the lexeme, because “running up a bill” doesn’t involve anybody running, or any movement up, and you can’t “run up a credit” or “run up a reputation”.

Post-modernists believe that the meaning of any lexeme doesn’t ultimately reside in properties of the thing or event the lexeme refers to, but in the position of the lexeme in a giant graph describing the relationships between all the lexemes of the language. Call that belief S (for “Structuralism”). For example, we might say that the meaning of the term “love” was that two people who were in the relationship “love” had mutual intentions towards each other with positive emotional valences (wishing each other good health, respect, satisfying work, wealth, etc.), while “hate” referred to a relationship between people who mutually held intentions with negative valence towards each other (wishing each other harm, humiliation, and financial ruin).

A post-modernist additionally says meaning is indeterminate. That means that if we met an alien species which used the terms “mikto” and “klaanbart” to refer to relationships between people who held mutual intentions of the same valence, we would have no way of ever knowing which meant “love” and which meant “hate”, because we couldn’t feel the valences of their emotions, and might misinterpret their facial emotions and all other indicators in a systematically wrong way. To be more precise, the post-modernist would say that we can’t be wrong in this fashion, because “love”, “hate”, “mikto”, and “klaanbart” have no meaning other than enabling you to predict that if Jerry “loves” Sally he is more likely to give her chocolates than scrapings from the bottom of his shoe, and if Freemulo miktos Gromblat, ze is more likely to frondle zim than to blammo zim. (This sort of argument comes from Quine.) The argument fails in this case if we believe that pain is a universal evolved perception of negative valence to prevent organisms from harming themselves. We then expect to find either “mikto” or “klaanbart” associated much more often with actions that cause harm, and we can call that one “hate”.

If you try to enumerate the set of relationships baroque music is in, the instantiations of “baroque music” are all instances of music, and not instances of painting, literature, or architecture. If the true “meaning” of “baroque music” were found at such a high level of abstraction that it also applied to instances of music, painting, and literature, that would imply a degree of coherence and orderliness to reality that is at odds with post-modern semiotics. So post-modernists are likely to treat “baroque music” as a lexeme, and say that “baroque music” “means”, mainly, the set of relationships between the people using the term, the music, the instruments used, the musicians, the composers, and so on, and probably has little to do with “baroque architecture”.

For a more logical explanation:

The belief S was posited by Saussure as an alternative to the belief that the meanings of words are “grounded” in reality, which I’ll denote by G. Philosophers see S and G as mutually exclusive, and as covering all possible cases: S ⇔ not(G). (There’s no reason to believe either of these things, however. In fact it’s generally impossible to try to list the (verbal) relationships between words without running into relationships that imply facts about the entities that are grounded in reality. We might, for instance, find that baroque music was usually commissioned by the Church or by extremely wealthy patrons, and so was played in churches or very large private residences, which had large dimensions and so had long reverberation times, and this led to the use of low-pitched instruments and slow tempos. Trying to list the “structure” of relationships that define “baroque music” has led to a quantifiable, measurable property of the music itself, which grounds its meaning in reality.)

Let D (for “Decomposability”) denote the belief that words are usually lexemes, and so “baroque” in “baroque music” probably has the same meaning as “baroque” in “baroque architecture”, even though there are no instances of art that are both baroque music and baroque architecture. There’s no logical necessity to D => G or G => D. The term “baroque music” could be a lexeme whether or not its meaning is grounded in reality, and even if “baroque music” is defined structurally, it could be that “baroque” has its own structural definition. But philosophers appear to assume that D ⇔ G, probably because “folk metaphysics” assumes both D and G. It does at least seem that G weakly implies D, because given G, you could follow the folk-linguistics model of coming up with words to describe real things, and then putting them together to describe combinations of things and relationships between them.

So, given the false assumptions S ⇔ not(G) and D => G, the post-modern commitment to S implies not(G), which implies not(D), which suggests that “baroque” doesn’t mean anything on its own.
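As a sanity check on that chain of reasoning, here is a minimal sketch that brute-forces the truth table for the three propositions. The function name `entails` and the encoding of S, G, and D as booleans are assumptions made for illustration; they are not taken from Martindale or Saussure.

```python
from itertools import product

def entails(premises, conclusion):
    """Return True if every truth assignment to (S, G, D) that satisfies
    all premises also satisfies the conclusion."""
    for s, g, d in product([False, True], repeat=3):
        if all(p(s, g, d) for p in premises) and not conclusion(s, g, d):
            return False
    return True

# Illustrative encoding of the statements in the text.
s_iff_not_g = lambda s, g, d: s == (not g)   # S <=> not(G)
d_implies_g = lambda s, g, d: (not d) or g   # D => G
commit_to_s = lambda s, g, d: s              # the commitment to S
not_d = lambda s, g, d: not d                # conclusion: not(D)

# With both assumptions, not(D) follows from S.
with_both = entails([s_iff_not_g, d_implies_g, commit_to_s], not_d)
# Without D => G, it does not.
without_d_implies_g = entails([s_iff_not_g, commit_to_s], not_d)
```

In this encoding the conclusion holds only when D => G is among the premises, which is why the argument depends on that assumption being true.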

Martindale showed people who didn’t know much about art pictures of paintings, sculpture, and architecture, and played them recordings of music. When he asked them to put them together into groups, in any way they chose, they put the baroque music with the baroque painting, the baroque sculpture, and the baroque architecture, and so on with classical and romantic, more than you’d expect by chance.

The rub is that I don’t fully trust that Martindale knew how to work out what you’d expect by chance, because he said subjects created an average of 9 groups (p. 253), then used math assuming they had created 3 groups (p. 254). But the error, if there is any, is in the direction of making his results stronger than his analysis indicates. The musical data chosen is peculiar, excluding Beethoven, Brahms, and Wagner from the romantic, but their inclusion would only have made his results stronger.

Chapter 8, “Art and Society”, the only chapter in which he adjusts for multiple hypothesis testing, presents some good data indicating that prosperity for the working class correlates with collective thought, cultural references, and a de-emphasis on nature; conservatism correlates with concrete words and references to culture, while liberalism correlates with thought, emotion, and action. The work is interesting, but cast into doubt by the inconsistency between the British and American data.

In Chapter 9, “The Artist and the Work of Art”, discussing the common theme of a hero’s descent into an underworld, he pioneers the use of word frequency counts to disclose the theme of a story.

We can use coherence of trends [in word usage] to decipher what a narrative is about: that is, if a narrative is about overcoming evil, the trend in evaluative connotation should be stronger than the trend in primordial content. If a narrative is about alteration in consciousness, the reverse should be the case. The Tibetan Book of the Dead, for example, shows a clear trend in primordial content but no trend at all in the use of good versus bad words: it must be about alteration in consciousness rather than good versus evil. This conclusion conforms with what a Tibetan Buddhist would probably tell you. The descent into Hell in book I of Homer’s Odyssey is more about good and evil than about alteration in consciousness, though it seems to be about both. In this case, the trend indicates that Hell is a better place than earth, and is consistent with pagan conceptions of the afterlife. … Moby Dick [has trends in primordial content, but not in good/bad word frequencies, so it] doesn’t have much to do with ethics but does seem to symbolize alteration in consciousness…. Lewis Carroll’s Alice in Wonderland is an exception: it has no trends at all in either evaluation or primordial content. The story is about something else. (p. 329)

We can use some simple equations to delineate the plots of such narratives…. They can help unlock the hidden or symbolic meaning of a narrative. Narratives have more than one meaning. We do not need to leave it to the whimsy of the reader to decide which interpretation is most important. We can examine the coherence or orderliness of trends in the usage of different types of words to make an objective decision. Book VI of the Aeneid and Coleridge’s ‘The Rime of the Ancient Mariner’ are both about alteration of consciousness and about confronting and overcoming evil. (p. 339)

I think he overstates the strength of conclusions based on word counts, but I admire his vision. He also looks at Dante’s Divine Comedy and other major works. I haven’t seen anyone use word frequency analysis to investigate the themes of different books, or of the different parts of different books. I want to start testing this idea myself.
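A minimal sketch of what such a test might look like: split a text into equal segments, measure how often words from some category appear in each, and fit a straight line to the rates. The word list below is a hypothetical stand-in, not the Regressive Imagery Dictionary, and the function names are invented for illustration. Martindale's own procedure may differ.

```python
import re

def segment_rates(text, vocabulary, n_segments):
    """Fraction of words in each consecutive segment that belong to vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < n_segments:
        raise ValueError("text has fewer words than segments")
    size = len(words) // n_segments
    rates = []
    for i in range(n_segments):
        # The last segment absorbs any leftover words.
        end = (i + 1) * size if i < n_segments - 1 else len(words)
        chunk = words[i * size:end]
        rates.append(sum(w in vocabulary for w in chunk) / len(chunk))
    return rates

def linear_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

# Hypothetical category list, for illustration only.
DREAMLIKE = {"dream", "sea", "shadow"}

sample = "stone wall road " * 4 + "dream sea shadow " * 4
rates = segment_rates(sample, DREAMLIKE, n_segments=2)
trend = linear_slope(rates)
```

Comparing the slope for one category against the slope for another (say, good versus bad words) is the kind of comparison the quoted passage describes; whether a given slope is meaningful would still need a proper significance test.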

The Bad

He presents his theory as being about the evolution of music, but didn’t understand what evolution is. When he says “evolution”, he means its opposite: genetic drift in the absence of selection pressure. He says this is essential: “Evolution” only occurs when art proceeds without any interference from society. He calls selective pressure from society “non-evolutionary pressure” (p. 169). He assumes that whatever aesthetics is, it is not anything that real people like or want; their preferences can only contaminate aesthetic evolution. That’s not just saying artistic quality or popularity isn’t objective; it’s saying, from an “evolutionary” standpoint, that it’s bad. (Again, though, this is a popular position among aesthetic theorists.) He seems to have forgotten that he theorized that arousal potential (AP; see “Ugly details” below) was important only by arguing that it increases hedonic value = aesthetic fitness.

He ought to spend more time explaining what “primordial content” (PC) is, since he spends the entire book measuring it. It comes from psychoanalysis and indicates regression into… something. The subconscious? The collective? The pre-human? His attempted explication (p. 49), equating primordial thought with noticing similarities, and conceptual thought with making distinctions, is just a repetition of a common prejudice against “analytic” science that we have inherited from the Middle Ages and the romantic poets, of scientific thought as only dividing and never synthesizing. It has no real bearing on whatever it is that his construct measures.

So the theoretical underpinnings of his research are shaky. Fortunately he has lots of data. His interpretation of it, though, is usually statistically flawed. On p. 166-167 he describes computing a correlation using 150 samples, and says it results in “a marginally significant correlation of .14 with time…. If we [group datapoints together into 15 groups and use their means], the correlation is much higher–.66–and clearly significant.” That shouldn’t happen, and if it does, you shouldn’t use it. Allowing the experimenter to choose a cluster size that gives him “significance” is cheating. There are problems with many of his claims of significance, particularly the ones that claim periodic oscillations are significant [5]. He tells us that his theory works with Hamlet, Cymbeline, and The Great Railway Bazaar (p. 318), but not how many books it didn’t work with. This concealment of his selection process reduces much of his quantitative data to anecdotes.
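To see what grouping does to a correlation, here is a minimal simulation under assumed parameters: 150 points with a weak linear trend plus Gaussian noise, averaged into 15 consecutive groups. The slope, noise level, and function names are illustrative assumptions, not Martindale's data or method.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def group_means(values, n_groups):
    """Means of consecutive, equal-sized groups of values."""
    size = len(values) // n_groups
    return [sum(values[i * size:(i + 1) * size]) / size for i in range(n_groups)]

def average_correlations(trials=200, n=150, n_groups=15, slope=0.5, seed=0):
    """Average raw and grouped correlations over many simulated datasets."""
    rng = random.Random(seed)
    raw_total = grouped_total = 0.0
    for _ in range(trials):
        xs = [i / n for i in range(n)]
        # Assumed model: weak trend plus unit-variance noise.
        ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
        raw_total += pearson_r(xs, ys)
        grouped_total += pearson_r(group_means(xs, n_groups),
                                   group_means(ys, n_groups))
    return raw_total / trials, grouped_total / trials

raw_r, grouped_r = average_correlations()
```

In this sketch the grouped correlation tends to come out well above the raw one, because averaging smooths away noise while leaving only 15 points to test. The higher r is therefore not additional evidence, and the group size is a free choice the experimenter can tune.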

[5] My guess is that what he means when he says an eyeballed oscillation is significant is that he tried a lot of different polynomials, and eventually found one simple polynomial to fit the main curve, and one higher-degree recurrence relation that fits the oscillation after the first one is subtracted from the data, such that the fit to the oscillation explained {enough of the variance left over after the fit to the main curve} to achieve significance on a t-test. However, this doesn’t account for the freedom he had in choosing the type of equation and the parameters for it to fit his data.

The most-common problem is that he would do some experiment rating people or works of art on, say, 20 different dimensions, most of which he didn’t specify in this book, and nearly all of which, when revealed, are synonyms for either “primordial content” or “arousal potential” (AP). Then he does data fishing to find the two of the ten million or so possible small subsets of those 20 which have the highest correlation with PC and AP, and if one of those ten million choices correlates better than would happen one time in 20 by chance, he calls it significant.
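The cost of testing many candidates at the one-in-20 level can be sketched with a short calculation. It assumes the tests are independent, which overlapping subsets of the same 20 dimensions would not be, so treat the numbers as a rough guide. The function names are invented for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent
    tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the overall false-positive rate
    at or below alpha (the Bonferroni correction)."""
    return alpha / n_tests

# With only 20 independent tests at alpha = 0.05, a spurious
# "significant" result is already more likely than not.
rate_20 = familywise_error_rate(0.05, 20)
per_test_20 = bonferroni_threshold(0.05, 20)
```

With millions of candidate subsets, finding one that clears the 0.05 bar is close to guaranteed unless the threshold is corrected.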

If you look on page 188, you’ll find an experiment with Italian paintings in 20-year periods from 1330-1729. He had subjects rate painters along 24 dimensions, and then do factor analysis. Then he informs us that two of the resulting 5 dimensions corresponded to primordial content and arousal potential. This is better than cherry-picking the subsets that work best for him, but it’s still picking 2 out of 5. (We’d really like to know what the other factors were, and their relative importance, because that would suggest other influences on artistic change, but he doesn’t tell us what they were.) When he tells us which dimensions correlated with arousal potential (active, complex, tense, disorderly) and which correlated with primordial content (not photographic, not representative of reality, otherworldly, and unnatural), it becomes clear that most of the first set were designed to measure arousal potential, and the second set are all synonyms for primordial content. So the experiment didn’t validate his two dimensions; it just asked people to rate paintings along them, then (surprise) pulled his planted measurements out of the factor analysis.

He’s guilty of cherry-picking data. On p.178 you’ll find a chart of primordial content in pop music lyrics. He states that “there was a significant increase in primordial content from 1952-53 to 1958-59.” But if you start at 1953 instead of 1952, it becomes a decrease; even more so if you end at 1960 or 1961.

He had no conception of degrees of freedom. The section on cross-national synchrony in Chapter 7 is outrageous: He fit equations to explain how trends in one art in one country are influenced by trends in other arts in that country and other countries. But studying the equations on page 242, we realize that each of his fits takes 17 parameters! And in most cases he constructs these to fit fewer than 17 datapoints! I don’t know why they don’t fit exactly, or how he found his supposedly optimal solutions.
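A minimal sketch of why a fit with as many parameters as datapoints says nothing about the data. It uses a degree-16 polynomial (17 coefficients) as a stand-in model, which is an assumption for illustration and not the form of Martindale's equations, and fits it to 17 points of pure noise.

```python
import random

def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1
    that passes through every point (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def max_residual_on_noise(n_points=17, seed=0):
    """Largest fitting error when an n_points-parameter polynomial
    is fitted to n_points values of random noise."""
    rng = random.Random(seed)
    xs = list(range(n_points))
    ys = [rng.gauss(0.0, 1.0) for _ in xs]  # no signal at all
    return max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))

worst_error = max_residual_on_noise()
```

The polynomial reproduces the noise exactly, so a perfect fit of this kind carries no information. A fit only becomes evidence when there are many more datapoints than free parameters.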

His quest for periodicity used tests that would find periodicity in random walks. Every time he plots a bunch of points and says that the oscillations around a curve are statistically significant, count the number of times that a segment goes through one point before re-crossing the central curve, and the number of times it goes through 2 or more points. If those numbers are roughly equal, it indicates that the oscillation around the central curve is a random walk, and is not statistically significant. (You can prove this using the binomial theorem.) Out of figures 7.5, 9.1, 9.2, 9.4, 9.20, and 9.21, only figures 9.1 and 9.21 pass this simple test. He’s generally guilty of optimistic eyeballing of data. He analyzes Dante’s Inferno and finds that “the main trend takes the shape of an M with an extra up-flourish at the end” (p. 323). Looking at figure 9.18, it’s hard to imagine how any realistic data could look less like his description of it.
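The counting test described above can be sketched as follows. It takes the residuals (each data point minus the central curve's value at that point) and counts how many excursions to one side of the curve contain a single point versus two or more. The function names are invented for illustration, and a residual of exactly zero is arbitrarily counted as below the curve.

```python
def run_lengths_about_curve(residuals):
    """Lengths of consecutive runs of points on the same side of the curve."""
    runs = []
    current_sign = 0
    length = 0
    for r in residuals:
        sign = 1 if r > 0 else -1  # zero is treated as below the curve
        if sign == current_sign:
            length += 1
        else:
            if length:
                runs.append(length)
            current_sign = sign
            length = 1
    if length:
        runs.append(length)
    return runs

def single_vs_multi(residuals):
    """Return (runs of exactly one point, runs of two or more points)."""
    runs = run_lengths_about_curve(residuals)
    singles = sum(1 for n in runs if n == 1)
    return singles, len(runs) - singles

example = [1.0, -1.0, -2.0, 3.0, 4.0, 5.0, -1.0]
singles, multis = single_vs_multi(example)
```

If the two counts are about equal, each point is landing above or below the curve like a coin flip, and the apparent oscillation needs no further explanation.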

The book is full of post-hoc rationalization. (That is, he never predicted a test’s outcome; he found the outcome, then justified it, often with some accommodating exceptions.) For example, his study of American painters (p. 193-198) finds a single dip-rise in primordial content from 1800 to 1920, and so instead of admitting that he didn’t find dips and rises for the different styles during that time, he designates that entire 120-year period as “American style”. By never stating up front what he expects to find, he always interprets his results either as having proven his hypothesis (when they are consistent with it) or as having proven something peculiar about the data (when they are not).

Sometimes he claims to have proven both at the same time. On p. 191, he reports finding results for his Italian paintings experiment that match the time periods for the styles late gothic, renaissance-mannerist, baroque, and rococo. But what’s “renaissance-mannerist”? It’s a mashing together of two periods because the data doesn’t come out as it should if they’re two separate periods. “If one accepts the idea that primordial content rises once a style is in effect, the present results support the idea that mannerism is the final stage of renaissance style rather than a separate style” (p. 193). Okay, but you can either assume A (mannerism is the final stage of renaissance style) and use it to prove B (that primordial content dips then rises within a style), or you can assume B and use it to prove A. You can’t assume both A and B in order to prove B and A simultaneously!

The Ugly Details

Primordial Content

Martindale also thought he’d found the principal component of art, starting from theory rather than from data or observation. This principal component was “primordial content” (PC, p. 57-59), which seems not to mean content that’s primordial = primal (e.g., sex, hunger, pleasure, terror), but content that’s dream-like, hallucinatory, unreal, nonsensical, chaotic, incoherent [1]. Martindale doesn’t get much more specific than that. He justifies this by saying that Nietzsche’s Apollo / Dionysus, Jung’s eros / logos, McKellar’s A / R (?), Berlyne’s autistic / directed, Werner’s dedifferentiated / differentiated (?), and Wundt’s associationistic / intellectual dichotomies, all mean the same thing. “Thought or consciousness varies along one main axis, as is obvious to anyone who studies the topic.” (p. 57)

Not quite. Those are all dichotomies with logic on one side, but they have one of two very different things on the other side: either sensuality, or associationism / dream-logic [2]. I don’t think those things (Dionysian abandon, and drug-induced hallucinations) have anything in common. The former is very agentive; the second is entirely passive. The former leads to Lord Byron, Wagner, the Moulin Rouge, and heavy metal; the latter (I would say, based partly on my own limited experience), to Celtic knotwork, Bach, Salvador Dali, Carlos Castaneda, and electronic / trance music. It became obvious as I read on that Martindale was measuring dream-like content, not sensuousness.

Also, because those other dichotomies oppose logic to something, they’re about processes of thought, while Martindale’s “primordial content” is static. It’s something you can see in a picture, like dark shadows or bat wings, or words you can count in a poem, like “rock”, “flame”, or “kiss”. And he doesn’t oppose primordial content to logic; he opposes it to… less primordial content. That’s not actually a dichotomy; it’s just a category.

But that’s okay. It doesn’t really matter how he came up with the category if he can state clearly what’s in it, and gets strong results from it. He does that [3].

[1] My guess is he was thinking of Freud’s “primary process thought”, and used “primal” in its obsolete sense of “primary”, even though Freud’s “primary process” is neither primal nor primary.

[2] If there is a historic linking of these two kinds of dichotomies, it’s probably through the yin-yang. Women were historically stereotyped as being (a) sensual and (b) illogical. So if your main dichotomy is male / female, and “female” = sensual and illogical, then of course Apollo / Dionysus and directed / autistic mean the same thing.

[3] He built something called the Regressive Imagery Dictionary that’s a big list of PC words, among other things.

I mislead by calling PC the principal component of art. If you had a principal component, you’d explain variation in art in terms of variation of that component. Martindale’s explanation isn’t that simple. It’s complicated and not very compelling. (Don’t worry. Things get better once he starts experimenting.)

Arousal Potential

“Arousal” is a very general, very vague concept from psychology that’s used to measure the strength of an animal’s response to stimuli. It can mean the number of steps an animal takes per minute, how much time it spends awake, its blood pressure, sexual arousal, or pretty much anything else an experimenter can measure that seems more active than passive.

Like Willie van Peer, Martindale begins by describing the Wundt curve (p. 42):

This curve shows that people get the most enjoyment (“hedonic value”) out of things that produce one particular amount of “arousal”. Play music too quietly, and it’s not very arousing. Play it too loud, and it’s painful. Same thing for other senses.

Also like van Peer, Martindale forgets the shape of the curve immediately after presenting it. He assumes for the rest of the book that artists always seek to increase arousal, although looking at the Wundt curve would suggest instead that they always seek to keep it at its optimal value. He uses the term “arousal potential” (AP), because he’s talking about a property of works of art, not a measured response to them.


He doesn’t forget about the curve entirely. He dismisses it by talking about habituation (p. 45). Habituation is a very general behavior, found in humans, mice, snails, and even planaria. It means that an organism responds strongly to (is aroused by) a stimulus the first time, but its response grows weaker with time. So a given type of art should arouse the same person less and less the more they’re exposed to it. This, of course, is why, after reading science fiction books for a few years, people will get tired of them and switch to romance or mystery novels, and why old people can’t stand to listen to the music or re-watch the movies that were popular when they were young, but continually seek out the newest and latest. So this is why artistic styles must change: They produce less arousal over time, and people grow tired of them. The main problem is thus always to produce more arousal, to get back to optimal AP.

Except, wait, humans don’t act that way. Habituation is routinely used in theories of art, but it doesn’t match human behavior at all. Humans do exactly the opposite: They imprint on what they read or listened to as a teenager and generally seek out more of the same for the rest of their lives.

Also, if music entered the classical style around 1750 because people had become habituated to baroque, why don’t we just switch back to baroque now? The idea that we, in the 21st century, know fugues better than Bach did, is ridiculous. The habituation explanation for changing artistic styles requires Lamarckian inheritance of habituation. Martindale takes up this objection, which has been made before, and rejects it with an argument on page 49 that is, frankly, too nonsensical to summarize.

Pure Aesthetics: Content Doesn’t Matter

Martindale began by assuming that artistic change is internally driven by the quest for increasing AP. The only way to increase AP, he believes, is either to increase the primordial content (PC) of art, or to change to a new style. This is so obvious to Martindale that he doesn’t explain why. I think I’ve figured out why: Martindale adhered to a “pure aesthetics” theory of art.

It is not what Gibbon said—it is not meaning—that makes The Decline and Fall of the Roman Empire a work of literature. It is how he said what he had to say that makes it literature… In other words, the meaning of a text is not really relevant to literature. (p. 15)

He never considers the possibility that the content of a poem, or a story, or a picture, can be artistically significant. He says the point of all art is its style (p. 71). If someone likes a work of art, any part of that liking that can be explained in terms of, say, their personal experience or morals, must not be aesthetics (p. 169). (Indeed, being likable or not likable is generally not thought by theorists to be properly part of aesthetics–rather odd, considering aesthetics is defined as the study of what people like.)

I would like to be able to say that Art, to him, is whatever is left over after you understand it. The aesthetic value of a piece would then be literally the soul of its appeal, in that it’s a hypothesized essence that can contain only whatever you don’t yet understand. That would mean he was chasing a ghost.

That’s a horrible thing to say, but I can’t be even that generous, because he says what he considers to be the soul of art: Surprise. When he talks about French poetry, it becomes apparent what he thinks art is. Like Apollinaire, he prefers poetry that makes no sense to poetry that does, because poetry that makes no sense is surprising, while poetry that makes sense, isn’t (p. 82-86). It seems that “art” is, to him, approximately synonymous with “shock”. (Unfortunately, I think this may also be a common view now in aesthetic theory.)

For the most part, this doesn’t matter, since he’s working with data rather than armchair philosophizing. His poor understanding of how art operates only becomes a burden when coupled with his weakness for rationalizing away results. (In the section on short stories, he explains away some unexpected results using a very crude model of what a story is; e.g., p. 172, 175, 313.)

But it’s his unspoken justification for assuming that there’s a very simple dynamic underlying all of art, so that taste, artistic merit, and external factors can be ignored. He doesn’t feel the need to justify his expectation that artistic appeal can be measured by a single number (AP), since he already believes, from his own taste in art, that it is composed of only one factor (surprise), which means about the same thing as “arousal potential”.

Artistic Change is Scalloped

PC, Martindale says, goes down and then up within an artistic style. The more PC a work of art has, the more AP it has. But PC is hard to generate. The artist has to regress (perhaps by becoming alcoholic, acting like a spoiled brat, and/or moving to the Village). So artists generate just as much PC as they need to out-do the artist before them. (A better explanation would be that artists generate just enough additional PC to compensate for the diminution of AP below its optimal level due to habituation, but Martindale has long since forgotten that AP has an optimal level.)

When artists invent a new style, they can slack off on the regression and not generate so much PC, because the new style, and incremental changes to it, provide enough AP to exceed the AP of the previous style. (Similarly, a better explanation would be that they must include less PC, to avoid producing art with too much AP.)

Once the new style has completely replaced the old and has been completely developed, PC must increase to keep increasing AP. Eventually an artist’s work degenerates, or rather “progresses”, to complete incoherence, or his liver gives out, and he can only increase AP by switching to another new style.

So you expect a plot of PC over time to go up and down, and each local minimum of the graph should be the midpoint of one artistic style. And this is what we see, sort of, in this plot on page 231 of PC in European music from 1500 to 1900:

Here we see the main problem with Martindale’s work: It involves a lot of staring at graphs and wishful thinking. Yes, there are curves going up and down. But how could there not be? Are these curves any different than we’d see if we plotted a random number from a normal distribution for each point?

If a point goes on a random walk, at each step it has a .5 chance of changing direction. So if you cut a random-walk’s graph into pieces at every local maximum or minimum, half of the pieces should have 2 points, ¼ should have 3 points, ⅛ should have 4 points, and so on. If the walk isn’t random, but instead you plot points from a normal distribution, then there should be fewer long runs; reversion to the mean should be more common. Pieces with 2 and 3 points should be more common, and pieces of 4 and 5 should be less common. I’m too lazy and stupid to figure it out, so I wrote a program to brute-force it. Let’s check:

            Pieces   2 pts   3 pts   4 pts   5 pts
Italy:        15      11       3       1       0
France:       10       4       3       2       1
Britain:      12       5       6       1       0
Germany:      13       5       7       0       0

Total:        50      25      19       4       1
RWalk:        50      25      12.5     6.2     3
Normal:       50      31      14       4       1

“RWalk” are the numbers we’d see in a random walk. “Normal” are the most-likely numbers we’d see if the plots were from a random number generator with a normal distribution. I’m not impressed.
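
The two baseline rows can be reproduced by brute force. What follows is a minimal sketch, not the original program: the function names (piece_lengths, random_walk, normal_noise, per_50), the series length, and the seed are all assumptions made for illustration. It cuts a long simulated series at every turning point, counts the points in each piece (a turning point is shared by the two pieces it joins), and rescales the frequencies to 50 pieces.

```python
import random
from collections import Counter


def piece_lengths(series):
    """Cut a series at every local maximum or minimum and return
    the number of points in each monotone piece."""
    lengths = []
    points = 2                       # every piece has at least two points
    rising = series[1] > series[0]
    for prev, cur in zip(series[1:], series[2:]):
        if (cur > prev) == rising:   # same direction: the piece grows
            points += 1
        else:                        # turning point: close the piece
            lengths.append(points)
            points = 2
            rising = not rising
    lengths.append(points)           # last piece (cut off by the end of the data)
    return lengths


def random_walk(n, rng):
    """n positions of a walk that steps +1 or -1 with equal probability."""
    pos, out = 0, []
    for _ in range(n):
        pos += rng.choice((-1, 1))
        out.append(pos)
    return out


def normal_noise(n, rng):
    """n independent draws from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def per_50(series, pieces=50, sizes=(2, 3, 4, 5)):
    """Frequency of each piece size, rescaled to a total of `pieces` pieces."""
    counts = Counter(piece_lengths(series))
    total = sum(counts.values())
    return {k: round(pieces * counts[k] / total, 1) for k in sizes}


if __name__ == "__main__":
    rng = random.Random(0)           # arbitrary seed, chosen for repeatability
    n = 200_000                      # assumed length; longer means less noise
    print("RWalk: ", per_50(random_walk(n, rng)))
    print("Normal:", per_50(normal_noise(n, rng)))
```

With a series this long, the printed values should land close to the “RWalk” and “Normal” rows above. The real graphs have only a dozen or so pieces per country, so the observed counts will scatter around these baselines much more than the simulation does.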

And, yes, we see that the labels for the periods B1, B2, etc., seem to come at the beginning of a decline in PC. But the declines didn’t come where those labels were; Martindale put the labels where he saw the declines. I know this because they’re in a different position for each graph (France, Britain, Germany). The standard division is as follows: Baroque 1600-1750; Classical 1750-1800; Romantic 1800-1900 [4].

Wikipedia divides Baroque music into Early, High, and Late. Martindale has only Early and Late Baroque. Hmm. On the German graph, which is the most important one for this period of music [6], the labels B1 and B2 appear after points 4 and 8, which would locate them at the years 1570 and 1650. Interpolating between his points puts the start of his Early Baroque around 1555, and the end of his Late Baroque around 1695. His entire “Baroque” is shifted 50 years too early. It would be more accurate to call the dip labelled “C” on his graph (1700-1760) “Late Baroque” instead of “Classical”. And if you check the other graphs, they’re even worse: he has the Baroque in France as 1520 to 1680!

[4] Wikipedia approaches it differently; it gives overlapping periods of 1580-1760, 1730-1820, 1780-1910, and 1890-1975. Averaging each pair of overlapping endpoints gives boundaries of 1745, 1800, and 1900, essentially the same as the standard division.

His graph begins the “Early Romantic” in 1760, 40 years too soon, and ends the “Late Romantic” in 1880. Wikipedia lists a single Romantic era. Throughout the book, Martindale divides recognized eras into as many styles as his graphs seem to say they have, rather than stating up front how many different styles he expects to find. So, again, what would the data have had to look like for Martindale to say it didn’t confirm his theory? Pretty strange, I think.


Suppose Martindale’s thesis about artistic change were correct. What would that mean?

Well, it would at least mean that all of the essays and manifestos by all artists of all time were meaningless twaddle. Artists creating new styles are sometimes quite vocal about why they’re doing it, like the Pre-Raphaelite Brotherhood painters, realist novelists, existentialist playwrights, and modernist poets. When they’re not, critics will often jump into the gap and explicate their work for them. All of those explanations are incompatible with Martindale’s. He says that a new style is good if, and only if, it is strange. No amount of theory matters. The theories all offer only false justifications for new strange things. At best, they’re rationalizations artists must make to themselves to produce something new and strange.

It also leaves no role for quality, content, or even skill. I’d like “arousal potential” to include these, but Martindale has been explicit throughout that it does not–it only includes depth of regression (primordial content), and degree of surprise. He maintains this even when it’s patently absurd, as on page 313, when he says, “A writer must … either increase depth of regression or change styles in order to increase incongruity, complexity, and the other devices that constitute arousal potential … in an individual work of literature.” In other words, action, plot, suspense, surprising events, engaging characters, and even steamy sex are all incapable of increasing arousal potential, and so have little or no bearing on the artistic fitness of a book. Logically, I would conclude from this that the best thing I could do to my stories to make them more popular would be to use bad grammar, or no grammar at all, to increase their incongruity and “complexity”.

Taken as an absolute, his thesis is simply wrong: there is more to art than incongruity. But if even a quarter of his tests held up under appropriate statistical techniques, it would indicate that the judgements of posterity, on who were great artists and what was great art, have very little to do with skill, quality, or anything other than novelty. It would mean that we don’t know how to art. I’ll have more to say about this after I review Pitirim Sorokin’s Social & Cultural Dynamics.

Even if Martindale’s thesis is entirely wrong, it’s still valuable as an insight into the horrible implications of Ezra Pound’s “make it new!” Martindale’s book drives home, page after page, graph after relentless graph, a totalistic vision of art as lust for novelty. That Martindale can be so conversant with these many types of art, and value them only for their incongruity, proves that humans can theorize themselves into a numbness to art. Or, worse, that there are people who have no other aesthetics. (This would explain Axe Cop and a lot of Random fics.) That this vision of art is so compatible with 20th century ideas about art is a warning sign about the latter.


I like Martindale’s approach very much. He gathered a lot of data, framed a lot of hypotheses, and did a lot of tests, in many different art forms, covering the past 700 years. He just screwed up almost all of his analyses. They are plagued by a failure to account for multiple hypothesis testing, a crippling failure to account for degrees of freedom, confusion of statistical significance with practical significance, and post-hoc rationalization. So most of his conclusions are at best suggestive, and at worst bogus.

But his experiments could have been analyzed correctly. He showed us many creative ways to experiment quantitatively on art. He just didn’t get the logic and math right.

And he did several important experiments correctly, providing strong evidence for some interesting, contentious, and broadly-applicable theories about art. But if you haven’t got a strong background in math, you’ll never be able to tell which of his experiments are the pearls among the rubbish.