The Principal Dimensions of English


I want to talk about the principal dimensions of theories of art, but to do that, I must explain what I mean by “principal dimensions”. Besides, you should learn this stuff anyway.

How to Recommend Stories if you’re a computer

Suppose you want to decide, after reading the first thousand words of a story on your kindle, whether to recommend that Bob read it. Also suppose you’re a computer, so you must summarize the story in numbers. You can list some numbers for each story: its number of words; if it’s a series, whether it’s complete (1) or incomplete (0); whether it is listed in the romance genre, the action/adventure genre, etc.; how good the style is on a scale from 0 to 10; and how often the words “kiss”, “snuggle”, “sesquipedalian”, and “bloody” appear.

You end up with tens of thousands of numbers about the story. What do you do with them?

First, you make up terminology. Let N be the number of numbers (say, 23724). Call each of the things being counted a “dimension”, and the whole set of N numbers a “point” in “N-dimensional space”. (Don’t worry that you can’t picture N-dimensional space. Nobody can.)

Next, you need to get similar sets of N numbers for a bunch of other stories. Then, you need to know which of those stories Bob likes. Then, you recommend Bob read the story if its values for those numbers are similar to those for the stories Bob likes.

But you might only know 100 stories that Bob likes. Hardly enough to determine exactly how he feels about the word “sesquipedalian”.

What you could do is look at the other stories, and group counted things together that seem to go together most of the time. So, you might notice stories that use the word “snuggle” a lot also use “kiss” more than the others, but use “wart” and “angst” less often. So you make a single new, fake dimension, which for the benefit of any humans reading this I will call Kissiness, like this:

Kissiness = 30*Romance – 10*Dark + 3*count(kiss) + 3*count(cuddle) + count(close) + count(smooth) … – count(angst) – 2*count(wart)

You make some other fake dimensions that group together words that seem more or less likely to be found together:

Sexiness = 30*Sex + 10*Mature – 300*Everyone + count(grope) + count(hard) + count(soft) + 3*count(erect) + …

Violence = 10*Gore + 5*Mature + 3*count(bloody) + 5*count(battle) + count(hard) – count(soft) – count(fuzzy)

Second_Person = 10*count(you) + 5*count(your) – 10*count(I)

Superheroics = 5*count(power) + 2*count(mighty) + 3*count(evil) + count(cape) + 3*count(costume) + 2*count(mask)

Obscenity = count(@$!) + 5*count(@#$@#$) – count(fudge) – count(hay)

Bigwordiness = count(assiduous) + count(voracious) + count(punctilious) + …

(I’ve listed several words each, but more realistically there would be hundreds making up each fake dimension.)
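A minimal sketch of what one fake dimension amounts to in code: a weighted sum over a story's raw numbers. The weights are copied from the Kissiness line above, with the elided "…" terms simply left out. The feature names such as "genre:romance" and the sample story are made up for illustration.

```python
# Weights copied from the Kissiness example; the elided terms are omitted.
# The "genre:" / "count:" key names are hypothetical labels, not from any real dataset.
KISSINESS_WEIGHTS = {
    "genre:romance": 30,
    "genre:dark": -10,
    "count:kiss": 3,
    "count:cuddle": 3,
    "count:close": 1,
    "count:smooth": 1,
    "count:angst": -1,
    "count:wart": -2,
}


def fake_dimension(story, weights):
    """Weighted sum of a story's raw numbers; features the story lacks count as 0."""
    return sum(weight * story.get(name, 0) for name, weight in weights.items())


# A made-up story: listed as romance, 4 x "kiss", 2 x "cuddle", 1 x "wart".
story = {"genre:romance": 1, "count:kiss": 4, "count:cuddle": 2, "count:wart": 1}
kissiness = fake_dimension(story, KISSINESS_WEIGHTS)
print(kissiness)  # 30 + 12 + 6 - 2 = 46
```

Every other fake dimension (Sexiness, Violence, and so on) would be the same function with a different weight table.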

Then you can build categories using just the fake dimensions.  In fact, I think this is very similar to what your brain does for you.

Principal Component Analysis (PCA)

There’s a signal processing technique called Principal Component Analysis (PCA) which is one of the Deep Insights into Everything that philosophy students should study instead of Plato’s forms. It does all this for you automatically, optimally, for any category of things described by points in an N-dimensional space. It looks at a whole bunch of such things, then figures out the one single best fake dimension that gives the data the widest spread [0]. Then it removes that dimension from the data, and does the same thing again, figuring out the second-best summary dimension. Do that 10 times, and you get 10 summary dimensions. Compute the values along those 10 dimensions for all the points in your N-dimensional space, throw away the original points, and you’ll still have most of the information that was in the original N dimensions. [1]
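The find-the-widest-direction, remove-it, repeat loop can be sketched in plain Python. This is one of several ways to compute PCA, not necessarily the one any particular library uses: it assumes power iteration on the covariance matrix, followed by "deflation" (subtracting out the direction just found). The function names and the toy 2-dimensional data are hypothetical.

```python
import math
import random


def center(points):
    """Subtract each dimension's mean so spread is measured around zero."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    return [[p[d] - means[d] for d in range(dims)] for p in points]


def covariance(centered):
    """Sample covariance matrix of already-centered points."""
    n, dims = len(centered), len(centered[0])
    return [[sum(p[i] * p[j] for p in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def mat_vec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def widest_direction(cov, iterations=500, seed=0):
    """Power iteration: returns (unit direction of largest variance, that variance)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in cov]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance left to find
            break
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, variance


def pca(points, k):
    """Return the first k summary dimensions, their variances, and the projected points."""
    centered = center(points)
    cov = covariance(centered)
    dims = len(cov)
    components, variances = [], []
    for _ in range(k):
        v, variance = widest_direction(cov)
        components.append(v)
        variances.append(variance)
        # Deflation: remove the spread along v so the next pass finds the next-best direction.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(dims)]
               for i in range(dims)]
    projected = [[sum(a * b for a, b in zip(p, c)) for c in components]
                 for p in centered]
    return components, variances, projected


# Toy data lying almost exactly on the line y = 2x.
points = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.0), (5, 10.1)]
components, variances, projected = pca(points, k=2)
print(components[0])                 # roughly proportional to (1, 2), up to sign
print(variances[0] / sum(variances)) # share of the spread captured by the first dimension
```

On this toy data nearly all the variance lands on the first summary dimension, which is the sense in which you can throw away the rest and keep most of the information.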

Then, predict that the probability that Bob will like a story is the probability that he likes other stories near it in 10-space.
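One simple reading of "near it" is a k-nearest-neighbours vote: find the k rated stories closest to the new one and report the fraction Bob liked. The sketch below assumes that reading, uses 2 dimensions instead of 10 to stay readable, and all coordinates and ratings are invented.

```python
import math


def like_probability(story, rated_stories, k=4):
    """Fraction of the k nearest rated stories that Bob liked."""
    nearest = sorted(rated_stories, key=lambda item: math.dist(story, item[0]))[:k]
    return sum(liked for _, liked in nearest) / len(nearest)


# (point in the reduced space, did Bob like it?) -- made-up data.
rated = [
    ((0.9, 0.1), True),
    ((1.0, 0.2), True),
    ((0.8, 0.0), True),
    ((0.7, 0.3), False),
    ((-1.0, 0.5), False),
    ((-0.9, 0.4), False),
]
probability = like_probability((0.85, 0.1), rated)
print(probability)  # 3 of the 4 nearest stories were liked
```

The choice of k and of plain Euclidean distance are both assumptions; a weighted vote or a different metric would fit the same sentence equally well.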

This is the technique that won over all other approaches in the $1,000,000 Netflix contest, which was probably the biggest experiment in predicting ratings ever. The key innovation in the contest was using a fast way to approximate PCA [2], a way which, incidentally, can be done by neurons.

Plus, once you’ve got the things you’re dealing with down to 10 dimensions or so, you can use logic or computation on them. You can have complex rules like, “If each line of the story has a similar repeated pattern of stresses, it’s probably poetry” if your analysis has discovered dimensions corresponding to “trochee” and “spondee” [3]. (Which it might, if your training stories had a lot of poetry in them.)

A lot of what cultures do, and what your brain does, is basically PCA followed by categorization and then thinking with those categories. All this crazy high-dimensional stuff happens, and people try to come up with concepts to simplify and explain it. The intermediate-level concepts produced, like “pretty”, “harmonious”, and “cruel”, are not real things. They are fake summary dimensions, each a sum (or function) of lots of real dimensions, that capture a lot of the differences between real things.

Then people build more concepts out of that smaller number of intermediate concepts. Because there are fewer of them, they can use more-powerful ways to combine them, like logical rules or lambda functions, to say whether something is “just”, “virtuous”, “beautiful”, or “sublime”. [4]

Finding Data Points in the Real World

If you don’t have the N-dimensional data for all your objects, don’t worry. You don’t need it. If you can take any 2 objects and say how similar or different they are, or even just whether they’re similar or different, you can jump straight to the lower-dimensional space that PCA would produce. Call it M-dimensional space, M << N. Here’s how:

Compare a bunch of object pairs. Say, for instance, difference(kind, compassionate) = 1, difference(kind, hurtful) = 6. Then use the differences between them as distances in a low-dimensional space. Start each of the objects at a random point in M-dimensional space (a popular choice is to distribute them on the (M-1)-dimensional surface of a sphere around the origin), then repeatedly push pairs apart if they’re too close, and pull them nearer each other if they’re too far (keeping them on the surface of that sphere if you’re doing it that way), until most pairs are about as far apart in your M-dimensional space as the distance says they should be.

(How do you choose M? You make it up. Everything left over gets mashed together in the Mth dimension, so if you want 10 meaningful dimensions, set M = 11.)
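The push-and-pull procedure can be sketched directly. This version starts from random points in a box rather than on a sphere, uses M = 2 so the result is easy to check, and nudges each pair a fixed fraction of its distance error per step; all three are simplifying assumptions. The first two differences come from the example above, while difference(compassionate, hurtful) = 6 is invented to complete the triangle.

```python
import math
import random


def embed(items, differences, dims=2, steps=2000, rate=0.05, seed=0):
    """Place items in dims-dimensional space so pair distances approach the given differences."""
    rng = random.Random(seed)
    pos = {item: [rng.uniform(-1, 1) for _ in range(dims)] for item in items}
    for _ in range(steps):
        for (a, b), target in differences.items():
            delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
            dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
            # error > 0: the pair is too far apart, so pull together;
            # error < 0: the pair is too close, so push apart.
            error = dist - target
            for i in range(dims):
                shift = rate * error * delta[i] / dist
                pos[a][i] -= shift
                pos[b][i] += shift
    return pos


differences = {
    ("kind", "compassionate"): 1,
    ("kind", "hurtful"): 6,
    ("compassionate", "hurtful"): 6,  # assumed value, not from the text
}
pos = embed(["kind", "compassionate", "hurtful"], differences)
for (a, b), target in differences.items():
    print(a, b, target, round(math.dist(pos[a], pos[b]), 2))
```

With only three words the distances can be matched almost exactly. With thousands of words and a small M they cannot, and the leftover mismatch is what gets mashed into the last dimension.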

The principal dimensions of English

We can do this with the English language, using lists of synonyms and antonyms, and see what our M summary dimensions are. In fact, somebody already did. Given some reasonable assumptions and one particular thesaurus, the 10 most-important dimensions of the English language are, roughly:

1. Good/increase vs. bad/diminish

2. Easy vs. hard

3. Start/open/free vs. finish/close/constrain

4. Peripheral vs. central

5. Essential vs. extra

6. Pull vs. push (sort of)

7. Above vs. below

8. Old vs. young

9. Normal vs. strange

10. Specific vs. general

I call these the principal dimensions. If we were doing PCA, we’d call them the principal components. Same thing. [5]

By contrast, if you do the same thing for French with a French thesaurus, these are the first 3 dimensions:

1. Good/increase vs. bad/diminish

2. Easy vs. hard

3. Start/open vs. finish/close

Whoops! Did I say by contrast? They’re the same. Because the dimensions that fall out of this analysis aren’t accidents of language. Languages develop to express how humans think. And that’s how humans think, at least Western Europeans. [6, 7, 8]

…but you said this had something to do with art

Here’s how all this is relevant to art: I want to claim I’ve discovered the first principal dimension of theories of art. I’m going to show (hopefully) that the position of different cultures on this dimension predicts something important about what type of art they value. But you need to understand what I mean by their position on this dimension, and what I mean by a type of art.

A type of art is like a mental disease. You diagnose it by noting that it contains, say, any 5 out of a list of 12 symptoms. The art type, or disease type, is a category. Its “symptoms” are measurements on summary (principal) dimensions. The actual data for a culture are going to be things like the degree to which power and wealth are centralized, the level of external threats, the heterogeneity of social roles, and the education level. [9] The principal dimension I’m going to talk about is not a real thing-in-the-world, though it is real. It’s determined by a statistical correlation between actual things in the world.

[0] Technically, the largest variance.

[1] There are many ways of doing PCA, and many related dimension-reduction techniques like “non-linear PCA” and factor analysis. Backpropagation neural networks are doing non-orthogonal PCA, though this wasn’t realized for many years after their invention.

[2] Except that they didn’t technically do PCA because they didn’t have the N-dimensional points. They assumed that each movie was described by an N-dimensional point, and that each user had an N-dimensional preference vector saying how much he liked high values on each dimension, and that their ratings were the dot products of these two vectors. Then they used singular value decomposition (SVD) to construct low-dimensional approximations to both kinds of vectors. So they ended up with the low-dimensional points without ever knowing the “real” original N-dimensional points. If anyone understands how to do this with PCA, please tell me.
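As a rough illustration of the dot-product model in this footnote: the sketch below learns a short vector per user and per movie so that their dot products match the known ratings, without ever seeing any "real" N-dimensional points. It swaps in plain stochastic gradient descent on the observed ratings rather than a textbook SVD, which is an assumption on my part and not a description of any contest entry. The ratings, the learning rate, and the choice of 2 dimensions are all made up.

```python
import random


def factorize(ratings, n_users, n_movies, k=2, rate=0.02, epochs=3000, seed=0):
    """Learn k-dimensional user and movie vectors whose dot products fit the known ratings."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_users)]
    movies = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_movies)]
    for _ in range(epochs):
        for (u, m), rating in ratings.items():
            predicted = sum(a * b for a, b in zip(users[u], movies[m]))
            error = rating - predicted
            for f in range(k):
                pu, qm = users[u][f], movies[m][f]
                # Nudge both vectors to shrink this one rating's error.
                users[u][f] += rate * error * qm
                movies[m][f] += rate * error * pu
    return users, movies


def predict(users, movies, u, m):
    return sum(a * b for a, b in zip(users[u], movies[m]))


# Made-up ratings: users 0-1 like movies 0-1, users 2-3 like movies 2-3.
# Four (user, movie) pairs are deliberately left unrated.
ratings = {
    (0, 0): 5, (0, 2): 1, (0, 3): 1,
    (1, 0): 5, (1, 1): 5, (1, 3): 1,
    (2, 0): 1, (2, 1): 1, (2, 2): 5,
    (3, 1): 1, (3, 2): 5, (3, 3): 5,
}
users, movies = factorize(ratings, n_users=4, n_movies=4)
worst = max(abs(r - predict(users, movies, u, m)) for (u, m), r in ratings.items())
print(worst)                         # largest error on the ratings we did see
print(predict(users, movies, 0, 1))  # a guess for a rating we never saw
```

The unrated pair's prediction is only a guess; with this little data more than one low-dimensional fit can agree with every known rating.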

[3] If all you want to do is recommend stories to Bob, it turns out it isn’t helpful for a computer program to construct the final genre categories.  It’s already got the point in N-space for a story; saying which genre that point lies within just throws away information. Just do your PCA and predict whether Bob likes that point in N-space. (Reference: The Netflix contest winners and losers.) But if you want to use logic to reason about genres (say, what themes are common in which genres), then you’ll have to categorize them.

[4] Many of the supposed proofs that meaning cannot be compositional (compositional: a term can be defined without reference to the entire dictionary) stem from the fact that philosophers don’t understand that first-order logic is strictly weaker than a Turing machine (lambda functions). “Logic” is a weak form of reason compared to computation.

[5] The fact that you can reconstruct these dimensions, and will get the same answer every time even with significant changes in the data, refutes the cornerstone of post-modern philosophy, which is that scientific theories, social structures, and especially language, are underdetermined by the world. That is, they claim that any one of an approximately infinite number of other ways of doing things, or categorizing things, or thinking about things, would work equally well, and the real world underlying the things we say can never be known. But in fact, casual experimentation proves that language is astronomically overdetermined. (The number of constraints we get from how linguistic terms relate to each other and to sense data is much larger than the number of degrees of freedom in the system.)

[6] Contrary to what Ferdinand de Saussure said, and post-modern philosophers after him assented to, thought came first, language second. We can excuse him for making this mistake, because he was writing before Darwin’s theories were well-known, except oops no he wasn’t.

[7] Some “synonyms” are words that are opposite on one dimension, and the same on all the others, allowing people to invert a particular dimension. Examples: challenge / obstacle, abundant / gratuitous (differ on good/bad), tug / yank (on easy/hard), funny / peculiar (on normal/strange).

[8] If you feed the algorithm radically different data, you’ll come up with different dimensions after the first few dimensions, as I suppose they did for French in that study.

So what happens if two people had different life experiences, and their brains came up with different principal dimensions?

It turns out we have a word for this, an old word that predates the math needed to understand it this way: we say they have different paradigms. They classify things in the world using a different set of dimensions. When they think about things, they come up with different answers. When they talk to each other, they each think the other is stupid. This is why political debates rarely change anyone’s mind; the people on opposite sides literally cannot understand each other. Their brains automatically compress their opponents’ statements into dimensions in which the distinctions they’re making are lost.

This is, I think, the correct interpretation of Thomas Kuhn’s observation that scientists using different paradigms can’t seem to communicate with each other.  It doesn’t mean that the choice of paradigm is arbitrary. Different paradigms are better at making distinctions in different data sets. Someone who’s grown up with one data set can’t easily switch to a different one; she would have to re-learn everything. But, given agreement on what the data to explain is, paradigms can be compared objectively.

[9] Yes, it turns out I’m a literary Marxist. Sorry.

The annihilation of art


Here’s some music for you to listen to while reading this blog:

A little while ago someone I know made a few insightful observations about poetry. I’m not anti-poetry, and I don’t think he is either, but he makes a good point: Poetry has for decades been caught in a vicious cycle of self-isolation. An elite chooses experimental, inaccessible poems and fills the journals and anthologies with them. Readers drift away from poetry, deriding it as pretentious. The elite learns to associate inaccessibility with quality, and criticism with amateurism, and produces more and more inaccessible works, which it is capable only of praising, never of criticizing. Their tastes drift farther away from the mainstream, casting more and more readers out. Poetry that does not meet their criterion for obscurantism is not published; poetry that does, is not read.

I’m going to paraphrase my friend here:

You mention that to many people you know, poetry is “too difficult, too vague, or too subjective.” I would argue that in many cases, this seems a very accurate description.… And likewise, poetry is often allowed to succeed where other forms of art would not. Many poems are so highly impressionistic that listeners and readers are left struggling to find meaning in the words….

With music or prose or artwork, we can point to something exact and have our opinions judged fairly. I dislike the singing; the characters are bland; the colors are mismatched and give me a headache. All valid criticisms. But when you approach poetry, criticism from the uneducated is treated as such….

For poetry to escape the taint of elitist disdain, it needs to rid itself of the shell that is formed around it. Is this a condemnation of all poetry or even most poets? No, not at all. But the popular conceptualization that poetry is a pastime for a small group of intellectuals, as unfair as it might seem, is grounded in a subjective grain of truth. For the people looking in from outside, poetry is often not some beautiful song waiting to be digested, but a pretentious chunk of purple imagery that revels in its own depth and inaccessibility. Which is, I think we can both agree, a sad state of affairs that harms those on either side of the window.

… We’ve all heard people say they dislike rap, or country, or dubstep; the most common response amongst those respective genres is to attempt to convert the doubter with “good” examples from that genre…. In my personal existence, poetry was never handled the same way.

This isn’t isolated to poetry. Orchestral music has taken exactly the same march into isolation and cultural irrelevance since about 1920. Jazz followed later, starting maybe around 1960. Literature started down that path with Ulysses, and Joyce kept going down it for the rest of his career.

The visual arts, meanwhile, went in a similar but weirdly opposite direction, taking the quickest and easiest route to driving away the common folk. By the 1930s, the goal in architecture, sculpture, and painting was to make everything as simple, boring, and ugly as possible. This 1938 building in Brooklyn wasn’t a slapdash cost-saving construction project; it was a celebrated design by a famous modernist architect. Notice how its color perfectly matches the mixture of dead grass and mud on the ground in front of it.

[Image: Brooklyn project, William Lescaze, 1938]

And it wasn’t long ago that if you walked into a modern art museum, all you’d find would be a hundred variations on this:


and this:

That YouTube video at the top? That’s a composition by Brian Ferneyhough. My renter is a composer. He’s trying to earn enough money to go back to grad school in music composition, and Ferneyhough, he says, is considered by many composers to be the greatest living composer. That piece isn’t modern at all—he composed it in 1966. 48 years ago. It represents the pinnacle of the past eighty years of orchestral composition.

To get accepted to grad school, my renter has to write something like it. He has seven folios full of his attempts.

I asked him if it bothered him that he’s spending his whole life struggling to make a kind of music that, if he succeeds, no one outside of academia will want to hear. It will never be played on the radio; it will never appear in a physical music store; it will probably never be played in a concert hall outside of western Europe. He says that this is only to be expected; few people have the intelligence to understand the greatest works in any art form.

An art form that is completely detached from culture. Isn’t that an oxymoron? Is it art, or is it a cult?

I asked him if it was an arbitrary social convention, or else if it was the next logical stage in music—if you rewound the clock and played the 20th century over again, slightly differently, would it inevitably lead to that kind of music, like geometry inevitably led to topology? He said he believes so; that Ferneyhough is not just different than Beethoven, but superior to him.

During the 1930s, the entire European artistic landscape seemed determined to drive people away from art. I think this made nationalism and fascism possible. People outside the elite sensed that culture had deliberately rejected and ejected them, and so they united to destroy it.

It’s seldom a good sign to find yourself in agreement with Hitler. But if Ferneyhough is great, I don’t want to be that great.

The march to self-isolation always starts with great works by a great artist: Picasso, Stravinsky, T.S. Eliot, Miles Davis, Joyce. People imitate them, and try to take it further. Then it goes too far, and no one can admit it’s gone too far because by that time everybody in the elite power structure of that art has gone on record praising it.

Is this a uniquely 20th-century event? Has it happened before in history that the leaders of an entire art form deliberately isolated it from the masses? As far as I know, it hasn’t.

I think this couldn’t happen before the 19th century because art was funded by patrons, and the artists had to please the patrons. The patrons didn’t have careers in art, so they didn’t have to always find something new and weird to try to stay ahead of the crowd. There were professors of art and of music, but their opinions didn’t matter much.

It bothers me a lot. Orchestral music was, to me, humanity’s greatest achievement, and now we have annihilated it, and many other art forms, and no one understands why.

How and why did a single generation of artists destroy half of the West’s artistic heritage? Is modernism really the single cause behind 20th-century elitism in music, poetry, sculpture, art, and literature? Why didn’t it succeed in literature? How can we make sure it never does? I really wish I had answers.

And I really want to know whether the stuff is actually good, and I’m just too dumb to see it. But I don’t see any way of ever knowing that, even in principle. If the only way people ever come to appreciate Ferneyhough’s music is to be told they can never understand music unless they appreciate it, and to listen to it over and over trying to appreciate it, how can they know whether they appreciate it because it’s good, or because they’ve gotten used to it?