The Interpretation of Topics: Looking for Freud’s Unconscious and the Role of the Reader through Topic Modeling
by Rebecca Chenoweth
As I have acquainted myself with the tools of the digital humanities, many of my experiments through practicum assignments have led me to the uncanny feeling of having experienced the same thing before, either in my theoretical studies or in my reading of fiction. Looking back over my quarter of experimenting with this discipline, I might ham-handedly characterize the field, or at least its relevance in my own work, as a method by which to de- and then re-familiarize myself with literature and literary studies. I have had two major Unheimlich moments over the course of my exploration of the digital humanities; and given my dependence on the language and concepts of psychoanalysis to describe these experiences, I had better start with the Freudian one.
In my search for a text that seemed large enough to warrant study via tools often used on larger corpora, and one that had also aged enough to allow easy digital access and eliminate the threat of copyright infringement, I ultimately turned to The Interpretation of Dreams. A hefty text that often gets boiled down to a few basic elements (often the most scandalous bits), Freud’s introductory work on psychoanalysis seemed like an ideal test case for many of the tools we would be using during the quarter. If DH gives us the means to study literature that is either very familiar to us or vast in scope, what would it do with a text that could fit into both of these categories?
I began my study of the text with something that I thought would be easy enough to use and interpret: a word cloud of the text. Scientific in its use of word frequency, and yet abstract or perhaps subjective in its choices of color and position, this tool seemed to have a clear correlation with dream-work and psychoanalytic interpretation. My results, however, turned out to be a bit boring and messy, in the way that most accounts of one’s own dreams surely are to the average listener:
I created this word cloud before I learned about the magic of stop-lists, so unsurprisingly (though it did shock me in a sense), “dream” and “dreams” take up the majority of the cloud. It is arguably more authentic, however, to run an unedited version of Freud’s text through this and any other program; if we begin to pre-interpret by filtering words out in advance, one could argue that ultimately we will not be any closer to finding the unconscious of Freud’s text than when we began.
The results of this word cloud treatment did help me to visually analyze one hypothesis: parental relationships are not the prime focus of Freud’s work. Both “father” and “sexual” are among the smaller words displayed here. We have to keep in mind, however, that these are the most frequent words in Freud’s work, not all of the words that appear there; so even the relatively small appearance of these words does denote that they come up often in the text. Not the most satisfying experiment, but somewhat fruitful nonetheless.
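For readers curious about what sits underneath a word cloud, the counting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentence is invented (it is not a quotation from Freud), the stop-list is a tiny hypothetical one, and the function name `word_frequencies` is my own rather than part of any word cloud tool.

```python
import re
from collections import Counter

# Hypothetical stop-list for illustration only; real stop-lists are far longer.
STOP_WORDS = {"the", "of", "a", "and", "in", "is", "to", "that", "it"}

def word_frequencies(text, stop_words=frozenset()):
    """Count lowercase word tokens, skipping any token found in stop_words."""
    # Keep hyphenated forms such as "dream-life" together as one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Invented sample sentence, not a quotation from the text.
sample = "The dream is the fulfilment of a wish, and the dreams of a child show it."

raw = word_frequencies(sample)                    # no stop-list applied
filtered = word_frequencies(sample, STOP_WORDS)   # stop-list applied

print(raw.most_common(3))
print(filtered.most_common(3))
```

Note that “dream” and “dreams” are counted as separate words here, just as they appear separately in the cloud; merging them would be one more act of pre-interpretation.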
One tool that I did not expect to resonate with Freudian concepts, or perhaps with any concept I had encountered before, was topic modeling. Quite a few digital humanists have tackled the formidable challenge of explaining what might be an unexplainable tool, most notably and entertainingly Matthew Jockers in “The LDA Buffet is Now Open.” The idea of words being sorted into groups of ten based on how closely they are associated with each other within the original text makes basic sense. The difficulty starts to arise when we see that the word banks change as we change the number of banks that the computer can create; and things become odder still when we see that the word banks change even when we re-enter the same text and ask it to create the same number of word banks (topics). Lisa M. Rhody’s piece on “Topic Modeling and Figurative Language” also does a nice job of unpacking the history, triumphs and perils of topic modeling literary texts; and she cites another piece on LDA that seems to come from a more scientific standpoint (being written by computer scientists), but has a title that best reflects my sense of this program and its products: “Reading Tea Leaves: How Humans Interpret Topic Models.” (I prefer to think of topic modeling results as ink blots, personally, but as I hope to prove, much of what we’re doing is learning about ourselves as readers rather than the topics or tea dregs, so my figurative language is as valid as anyone else’s.)
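The shifting of word banks from one run to the next is less mysterious once we see that fitting an LDA model typically involves random sampling. The sketch below is a toy collapsed Gibbs sampler, one common way of fitting LDA, written in plain Python. It is an assumption-laden illustration, not the code of any tool mentioned here: the four tiny "documents" are invented, and the function name `toy_lda` and its parameters (`alpha`, `beta`, `seed`, `top_n`) are hypothetical choices of mine.

```python
import random
from collections import Counter

def toy_lda(docs, num_topics, seed, iterations=200, alpha=0.1, beta=0.01, top_n=3):
    """Toy collapsed Gibbs sampler for LDA; returns the top words of each topic."""
    rng = random.Random(seed)  # the seed is the only source of run-to-run variation
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # total words per topic
    assignments = []

    # Start by assigning every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Repeatedly resample each word's topic given all the other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Report the most frequent words assigned to each topic, ignoring zero counts.
    return [
        [w for w, c in topic_word[k].most_common() if c > 0][:top_n]
        for k in range(num_topics)
    ]

# Invented miniature corpus, not drawn from The Interpretation of Dreams.
docs = [
    "dream wish dream sleep wish".split(),
    "patient analysis patient symptom".split(),
    "dream sleep night dream".split(),
    "symptom analysis patient cure".split(),
]

for seed in (1, 2):
    print(seed, toy_lda(docs, num_topics=2, seed=seed))
```

Running the same text with the same number of topics but a different seed may reorder the topics or shuffle words between them, while fixing the seed reproduces a run exactly. The "choice" the program appears to make is, in part, a roll of the dice.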
What are we to make of a program that sorts our data for us, and yet makes no attempt at telling us why it has chosen to create these categories, or what we should do with this information? What, in the end, do these results tell us: do they help us to identify underlying trends within a text, or within ourselves as readers? In the program’s original social science, non-fiction context, these problems may not have been so troubling or immediate; but now that we humanists have gotten our hands on this tool, it is time to investigate the process and products of this tool. Thankfully, I did not tackle these questions alone; as my fellow UCSB “Intro to DH” students and I shared the results of our topic modeling of chosen texts, we all impulsively interpreted each other’s results. How could we make sense of these categories, or could we? After seeing my classmates’ reactions to the results, I felt more prepared to reflect on what the tool tells us about reading and ourselves, even if I’m not always certain what some topics say about the text at hand. If you’d like to play along, here is the first batch of results that came back as I ran The Interpretation of Dreams through David Mimno’s In-Browser Topic Modeling tool:
- dreams theory shall dream between material fact again further form
- his man dream upon attention whose childhood fact could indeed
- impression impressions same indifferent even two dreams gives different some
- his had him friend dream little about too myself just
- were his dreams seems such though same very since know
- dreams psychic certain dream-life whole psychological activities problem problems series
- dream may did yet see such content too before matter
- her she had into out woman herself some then think
- then being merely without person find dreamed established some against
- dream interpretation very subject even however see given idea upon
- waking state things place already images some three mind thoughts
- dream patient still between another myself patients becomes while then
- dream interpretation another here further made man meaning course found
There is much that can be said about these results (and has been said, in the context of my class discussion). What I’d like to examine here, instead, is the process of creating and analyzing these results, in broader strokes. Topic modeling seems, to me, to highlight two ways of thinking about what it means to read, or to interpret (to continue with my Freudian theme). As the readers of these topics, we may see ourselves as being in the position of a psychoanalyst, or to be more accurate, the popular concept of a psychoanalyst: someone who is perceived to be all-knowing, to have the (right or wrong) answers (see important disclaimer below*). If we seat ourselves in the analyst’s chair (rather than the analysand’s famous couch), we might see ourselves as either interpreting the free association products of the text, its author, or its historical period; or, more often than not, we may take the program itself as the object of study. We may be trying to figure out why the topic modeling program “chose” to group these word sets together, to see what we can learn about the program itself by studying these results (symptoms, manifest dream content, and so on). In our collective experiment, I had the feeling that if I looked at these word sets long or hard enough, I would learn more about the program that created them. This impulse to solve the topic modeling puzzle, to figure out why the program “says” what it does, puts us as reader-analysts in the interesting position of interpreting the interpreter—an exotic-sounding paradox, until you realize that this is one of the ways that people have looked at the practice of reading.
We also might see ourselves as patients in another stereotypical vision of psychological encounters. The practice of interpreting these topic models, depending on our approach, makes us akin to readers of a Rorschach test. What do we make of these blots on the page (screen), grouped perhaps at random? In this case, we are more in the position of the patient, trying to learn about ourselves via free association or acts of interpretation; and what seems like analysis of an external object is actually a test of our own mindset or assumptions. Seeing these arguably arbitrary categories may also lead us to question what exactly we consider to be valid topics. Some groups of words fall into topics that have clear counterparts within the humanities: there are classes on gender, politics, mental processes, and so forth. But when categories like this get thrown in with others that seem like nonsense, what are we to make of the process by which we categorize things? Do these categories illustrate topics and fields not yet discovered, or are they essentially white noise?
The ultimate question here for me is what, in the end, we are studying. Are we studying the primary text (here The Interpretation of Dreams)? Are we studying the program through which it has been run? Are we studying ourselves and our own reaction to the program’s results, our impulse to make sense of something that seems to have no clear meaning? These are also, of course, the same questions that we have to approach as we study literature. Are we approaching a text with the idea of learning about the author or (to widen the field) the historical setting in which it was written? Are we approaching the text to learn about ourselves, to see what is relevant in it to our own lives on a personal, social, or global level? Or are we studying the text simply for the sake of hearing what it has to say, in a New Critical sense, trying to get “back to the text” through this impersonal tool?
To me, there is no one-to-one correlation between the topic modeling tool and our position as readers, which may be the most productive answer that I can give to conclude this exploration. When the field of digital humanities presents us with (or perhaps is presented with) a tool like this, it may be that studying our approach to the tool and our reaction to it, rather than focusing on the data it creates, is the most productive approach we can muster.
*I must emphasize that this idea of the analyst as an individual who “makes sense of” the “nonsense” that his patients bring to a session of free association is a dated one, operating on the assumption that the analyst is at once an omniscient and impartial judge of the patient’s mind.