<H1 ALIGN=left>The Sound Archive of the Library of Babel</H1>

Taken from the paper:

Is Music Real?, Feb 2001


The Sound Archive of the Library of Babel is based on a thought experiment first suggested by the psychologist Theodor Fechner, imaginatively elaborated by Jorge Luis Borges in his collection of stories 'Labyrinths' in 1962 and further developed by the philosopher Daniel Dennett in 'Darwin's Dangerous Idea', 1995.

In these sources the idea is applied to text and language, and subsequently, in Dennett, to biology (itself based on Dawkins 1982).

I am applying the same idea to music - or rather to sonic information.  It could also provide the background for a unique comparison of the three major 'creative' media - those based on linguistic, sonic and visual information.



In the Borges version, the hapless and hopeless narrator of the story 'The Library of Babel' has found himself in an apparently endless library - a library which has housed the whole population of his world for as long as he can remember.  A fundamental argument amongst the population is whether the library is finite, infinite or 'cyclic' (the narrator's own opinion).  In Dennett's similar version, the numbers are easier to deal with, so I'll use those - the books in the library are each

"500 pages long, and each page consists of 40 lines of 50 spaces, so there are two thousand character-spaces per page.  Each space is either blank, or has a character printed on it, chosen from a set of 100 (the upper- and lowercase letters of English and other European languages, plus the blank and punctuation marks)...  Five hundred pages times 2000 characters per page gives 1,000,000 character-spaces per book, so there are 100^1,000,000 books in the Library of Babel."

It is worth spending a moment considering this number.  There are estimated to be between 10^40 and 10^80 particles in the observable universe - against the number of books in the Library of Babel this is insignificant.  We will see a few more of the ramifications of this extraordinary number (the philosopher Willard Quine terms it 'hyper-astronomical') shortly.  It is important to bear in mind that, while the number is physically impossible, it is possible for us to consider because of the clear rules involved.
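
As a rough check on the scale, the book count can be worked out in logarithms rather than written out in full.  This is only a sketch: the variable names are my own, and the particle figure is simply the upper estimate quoted above.

```python
import math

# Dennett's figures for one book
PAGES = 500
LINES_PER_PAGE = 40
SPACES_PER_LINE = 50
ALPHABET_SIZE = 100

chars_per_book = PAGES * LINES_PER_PAGE * SPACES_PER_LINE

# 100^1,000,000 is far too long to print, so work with its base-10 logarithm:
# log10(100^n) = n * log10(100) = 2n
log10_books = chars_per_book * math.log10(ALPHABET_SIZE)

# Upper estimate of particles in the observable universe, as quoted in the text
log10_particles = 80

print(f"character-spaces per book: {chars_per_book:,}")
print(f"books in the Library: about 10^{log10_books:,.0f}")
print(f"books per particle: about 10^{log10_books - log10_particles:,.0f}")
```

Written in base 10, the number of books has two million digits, while the particle estimate has eighty-one.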



Let us attempt the same statistics for the library's Sound Archive, using the standard CD as our basic unit.  Most single works will fit on it, and any that are longer simply come in sets:

44100 samples per second at 16 bits per sample, for a 60 minute CD
44100 * 60 * 60 samples in 60 minutes:
158,760,000 samples
or 2,540,160,000 bits.  Each bit can be either on or off.

So there would be 2^2,540,160,000 possible Mono CDs.

If we insist on stereo, there would be 2^5,080,320,000 of them.
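
The arithmetic for the CD can be reproduced in a few lines.  This is a minimal sketch using the sample rate, bit depth and duration given above; the function name `cd_bits` is just an illustrative choice.

```python
SAMPLE_RATE = 44_100        # samples per second
BITS_PER_SAMPLE = 16
DURATION_SECONDS = 60 * 60  # a 60 minute CD

def cd_bits(channels):
    """Total number of bits on a CD with the given number of channels."""
    samples = SAMPLE_RATE * DURATION_SECONDS
    return samples * BITS_PER_SAMPLE * channels

mono_samples = SAMPLE_RATE * DURATION_SECONDS
mono_bits = cd_bits(1)
stereo_bits = cd_bits(2)

print(f"samples per channel: {mono_samples:,}")  # 158,760,000
print(f"mono CDs:   2^{mono_bits:,}")            # 2^2,540,160,000
print(f"stereo CDs: 2^{stereo_bits:,}")          # 2^5,080,320,000
```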

So, if we take an hour long CD as our standard unit, there are *many more* CDs in the library's Sound Archive than there are books in the book collection!  This is because of the number of possible 'spaces' made available by technological procedures.  We could use standard musical notation but it would probably make the situation worse - you'd have to calculate the possible varieties of instrument, the ranges of notes, the 'average' distribution of notes, and together these would certainly make a vast number (although don't worry, I haven't calculated it, yet).  In any case, all of these are *already* available in audio versions within our Sound Archive!
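
To compare the two collections directly, both counts can be put in the same base.  The sketch below expresses the number of books as a power of two and asks how much mono CD audio holds the same number of bits as one book; it uses only the figures already given, and the variable names are my own.

```python
import math

# Books: 100^1,000,000 = 2^(1,000,000 * log2(100))
book_bits = 1_000_000 * math.log2(100)

# CDs: one bit per binary choice, mono, 60 minutes
bits_per_second = 44_100 * 16
cd_bits_mono = bits_per_second * 60 * 60

# How many seconds of mono audio carry as many bits as one whole book?
seconds_per_book = book_bits / bits_per_second

print(f"books:    about 2^{book_bits:,.0f}")
print(f"mono CDs: 2^{cd_bits_mono:,}")
print(f"one book is equivalent to about {seconds_per_book:.1f} seconds of mono audio")
```

On these figures a whole 500-page book carries about as many bits as nine and a half seconds of mono CD audio, which is why the hour-long CD produces so many more possibilities.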

In the book section of the library there is (according to Borges):

"all that it is given to express, in all languages.  Everything: the minutely detailed history of the future, the archangels' autobiographies, the faithful catalogues of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book in all languages..."

In the Sound Archive, therefore, we have every possible humanly perceptible sound.  Included in this would be a digital recording of this presentation, someone coughing, the opening bars of Beethoven's Fifth Symphony, and the opening bars of that symphony but where every member of the orchestra, extraordinarily, plays the same wrong note at the same time.  The collection includes all sounds as well as all 'music'.  (Incidentally, the Sound Archive also includes all readings of every book in the book section of the library - in fact all possible versions of every reading of every book.  How can this be?  Some CDs will be used more than once: many, for instance those involving just certain sorts of near silence, will be used constantly, while others, those consisting of very unusual words, will be used less often.)

So, theoretically, as the early inmates of Borges' world thought, it should be possible to find everything about anything (or any musical composition, or indeed any sound) in the Sound Archive.  The problem is finding it.  For, as Dennett points out:

"The Library of Babel is not infinite, so the chance of finding anything interesting in it is not literally infinitesimal...  Unfortunately, all the standard metaphors - "astronomically large," "a needle in a haystack," "a drop in the ocean" - fall comically short.  No *actual* astronomical quantity is even visible against the backdrop of these huge but finite numbers.  If a readable volume in the Library were as easy to find as a particular drop in the ocean, we'd be in business!  If you were dropped at random into the Library, your chance of ever encountering a volume with so much as a grammatical sentence in it would be so vanishingly small that we might do well to capitalize the term - "Vanishingly" small - and give it a mate, "Vastly", short for "Very-much-more-than-astronomically"."

Things get worse.  In Borges' library, none of the books are in any sort of order, while Dennett alphabetises his.  In theory, each CD could be ordered by a number representing the value of the 'bits' defined on it.  Under these circumstances, the 'first' 158,760,000 CDs would be completely silent (representing silences lasting from 1/44100th of a second to one hour, one for each possible number of samples), although we'd have to be careful not to confuse one of these with the several billion containing just one or two scarcely audible 'clicks'.

We might then imagine that we *could* find series (more like galaxies) of similar items.  Therefore, we might expect to find every possible interpretation of Beethoven's Fifth in this segment, between, say, CDs 2^x and 2^y.  Indeed, we might even find, amongst all the millions upon millions of these CDs (this sector alone contains more information than the currently observable universe!), a higher proportion of Beethoven's Fifths, including the recording of the original first performance.  But amongst this vast number, where would or could we draw the line concerning which were interpretations and which 'errors' - in the sense of having too much noise or too many errors to be considered a 'proper' or acceptable recording?  Dennett uses Moby Dick as an example:

"Even a volume with 1000 [typographical errors] - 2 per page on average - would be unmistakably recognizable as Moby Dick, and there are Vastly many of these volumes.  It wouldn't matter which of these volumes you found, if you could only find one of them.  They would almost all be just about equally wonderful reading and all tell the same story, except for the truly negligible - almost indiscriminable - differences."

Even if we had located the vast 'Beethoven's Fifth' galaxy in terms of sequential numbering, we could just as easily have an entire and reasonable version of Beethoven's Fifth which, according to the numbering system, could be vast aeons away from exactly the same version with, say, a cough just before the first note.  In other words, our usual way of thinking about the classification and location of material and of ideas, which we take to be obvious and intuitive, has no relevance here (that is, without some poor librarian listening to all of them!).
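
The gap between numerical position and audible similarity can be seen in a toy model.  This is only a sketch: the six-sample 'recordings', the sample values and the function name `archive_index` are all hypothetical, and the index is simply the samples read as one large unsigned number with the first sample most significant.

```python
def archive_index(samples, bits_per_sample=16):
    """Read a recording as one big unsigned integer, first sample first."""
    index = 0
    for sample in samples:
        index = (index << bits_per_sample) | sample
    return index

# A toy 'performance' of six 16-bit samples, ending in silence
original = [1000, 2000, 3000, 4000, 0, 0]

# The same performance with the very last sample nudged by one step
last_changed = original[:-1] + [original[-1] + 1]

# The same performance preceded by a one-sample 'cough'; everything else
# is shifted one sample later and the final silent sample is dropped
with_cough = [12345] + original[:-1]

gap_last = abs(archive_index(last_changed) - archive_index(original))
gap_cough = abs(archive_index(with_cough) - archive_index(original))

print(f"shelf distance after changing the last sample: {gap_last:,}")
print(f"shelf distance after adding a cough at the start: {gap_cough:,}")
```

In this model the two altered versions sound almost identical to the original, yet the first sits on the neighbouring shelf position while the second is more than 2^80 positions away, and that is with only six samples rather than 158,760,000.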

What is on the overwhelming majority of CDs in our archive?  It's a simple answer - noise, and noise of almost perfectly uniform 'structure'.  In both Borges' and Dennett's versions the chances of finding anything - a single comprehensible sentence in any language - are remote in the extreme.  Similarly, finding anything that resembled a comprehensible sound would be as, if not more, remote.  The chance of finding a single piano tone playing any note, framed by silence, is so ridiculously small as to be hopeless in human terms, and yet in reality we consider it to be obvious, intuitive, even mundane.


There is one other implication to be drawn, involving the 'readership' of this information.  In the negligibly few cases where we are the (conscious) originators of it, we are structurally bound up in it, which is why we may feel that it is impossible to fully appreciate the size of the complete archive.  The possibilities inherent in our genome are of a similar magnitude, and yet the 'information' contained within it is filtered by the 'readers' that are contained within our cells.  There is a complex relationship between the two, but many billions of possibilities do not exist automatically because they are not even considered.



Cf. Douglas Hofstadter and Daniel Dennett, 'The Mind's I', 1981
Roger Penrose, 'The Emperor's New Mind', 1989
David Deutsch, 'The Fabric of Reality', 1997, p113
John Keats, 'Ode to a Nightingale', 1820
Interview in Omni magazine (June 1986), quoted in Martin Gardner, 'Penrose Tiles to Trapdoor Ciphers', 1989
Daniel Dennett, 'Consciousness Explained', 1991, p94-95
Jorge Luis Borges, 'Labyrinths', 1962
Daniel Dennett, 'Darwin's Dangerous Idea', 1995
Richard Dawkins, 'The Extended Phenotype', 1982
Richard Dawkins, 'The Blind Watchmaker', 1986, p162