How can digital humanities help us consider the structure of the Shakespearean tragedy? This is the essential question at the core of this midterm project. The data analysis portion of this project uses five tragedies by the playwright, all in relatively modern English: Othello, Hamlet, King Lear, Julius Caesar, and The Tragedy of Titus Andronicus. For comparison, some graphs also include the play Votes for Women by Elizabeth Robins, which differs from the tragedies in both genre and topic. In scientific terms, we could say that it serves as a control: a baseline that helps us determine whether the trends we see in these tragedies are really specific to Shakespearean tragedy or are just a result of common language use.

All plays are sourced from Project Gutenberg; you can access each individual script by clicking on its name. Data for this project was cleaned in two steps: 1) removal of the legal information that Project Gutenberg includes in each text file (the legal restrictions still apply, but that text could not be part of the data); 2) removal of all speaker names. To explain the latter, let’s take a look at how plays are formatted (taken from King Lear as linked above):

ACT I. Scene I.
[King Lear's Palace.]

Enter Kent, Gloucester, and Edmund. [Kent and Gloucester converse. Edmund stands back.]

  Kent. I thought the King had more affected the Duke of Albany than Cornwall.
  Glou. It did always seem so to us; but now, in the division of the ...

To analyze the text without regard to who is speaking (speaker names are often abbreviated, which would make the analysis even stranger), data cleaning changes the beginning of this scene to the following:

ACT I. Scene I.
[King Lear's Palace.]

Enter Kent, Gloucester, and Edmund. [Kent and Gloucester converse. Edmund stands back.]

  I thought the King had more affected the Duke of Albany than Cornwall.
  It did always seem so to us; but now, in the division of the ...

Admittedly, this makes the play harder to read or perform; however, it also ensures that the data will not be skewed by the structure of the play. Other kinds of name cleaning are sometimes necessary in text analysis, but here only the speaker names are removed (not, for instance, names in stage directions). It therefore remains possible to analyze character movements, which can easily be isolated through specific words like “Enter” or “Exeunt.” (Name cleaning was done via OpenRefine, a free tool for cleaning data.)
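For readers who would rather script this step than use OpenRefine, here is a minimal Python sketch of the same idea. It is not the process used for this project. The function name `strip_speaker_labels` is hypothetical, and the pattern assumes that a speaker label is an indented, capitalized name of one or two words (possibly abbreviated) followed by a period and a space, as in the King Lear excerpt above.

```python
import re

# Assumption: speaker labels look like "  Kent. " or "  Glou. " (indented,
# capitalized, one or two words, ending in a period and a space).
SPEAKER_LABEL = re.compile(r"^(\s+)[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)?\. ")

def strip_speaker_labels(text: str) -> str:
    """Remove the leading speaker label from each line, keeping the indentation."""
    cleaned = []
    for line in text.splitlines():
        # Unindented lines (act headings, stage directions) never match,
        # because the pattern requires leading whitespace.
        cleaned.append(SPEAKER_LABEL.sub(r"\1", line, count=1))
    return "\n".join(cleaned)

sample = (
    "ACT I. Scene I.\n"
    "[King Lear's Palace.]\n"
    "\n"
    "  Kent. I thought the King had more affected the Duke of Albany than Cornwall.\n"
    "  Glou. It did always seem so to us;"
)
print(strip_speaker_labels(sample))
```

One caveat: an indented continuation line that happens to begin with a capitalized word and a period would also be trimmed, so the output should be spot-checked by hand.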

Several graphs are presented below. The first two use all of the data I originally planned to use, just to get a sense of how the full set would look; the last three examine trends within specific texts. I generally found the latter much more helpful than lumping all of the texts together.

Analysis of the text is provided through Voyant Tools (linked).

If you are searching for the header image, it is a stock image and can be found at this link. I make no claims of ownership.


Let’s look into the data Voyant used. For example, the embedded interactive graphic above shows one commonly used word (in this case, “Lord”; try “Enter” to consider character movements, as I discussed in my introduction) and the words that surround it. Please note that this link directs to a word analysis of only the Shakespearean plays; further analysis will also involve Votes for Women.
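To make the idea of "surrounding words" concrete, here is a rough Python sketch that counts the words appearing within a small window around a keyword. It is only an illustration under my own assumptions; Voyant's tokenization, stopword handling, and window size may well differ. The function name `collocates` and the `window` parameter are hypothetical.

```python
import re
from collections import Counter

def collocates(text: str, keyword: str, window: int = 2) -> Counter:
    """Count words found within `window` tokens on either side of `keyword`."""
    # Assumption: a token is a run of lowercase letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    nearby = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - window)
            # Words just before and just after the keyword, excluding the keyword itself.
            context = tokens[start:i] + tokens[i + 1:i + 1 + window]
            nearby.update(context)
    return nearby

example = "My lord the king. Enter my lord Hamlet."
print(collocates(example, "lord", window=1).most_common())
```

Swapping in "enter" as the keyword gives a quick view of which names tend to appear next to entrances.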

In Shakespearean tragedies, the focus on death and blood seems to differ from the focus on killing (see above). Interestingly, the action indicator (“kills” as opposed to “kill”) is not necessarily correlated with the discussion of killing; that is, killing does not always occur before the audience, and when it does not, it seems more likely to be discussed outright. And while killing comes up more in the tragedies than in Votes for Women, the specific term “kill” is actually not all that common in the tragedies (nor are more specific killing words, such as “stab” or “stabs” in the case of Julius Caesar). References to the dead are consistent, but they seem to appear less often when killing is spoken of instead. This points to a trend in how Shakespearean plays refer to death and the dead depending on context.

Julius Caesar, whose focus is the killing of Caesar and Brutus’s response to it, is more gore-focused (“blood”) and less interested in the spiritual (“heaven” or “hell”). Compared with the tragedies, Votes for Women is clearly less interested in gore based on its word use, which suggests that its use of the word “hell” might differ from the uses in the tragedies. Finally, even if you have not read the tragedies listed, the relatively low frequencies of specific death terminology suggest that these Shakespearean tragedies focus largely not on the death itself but on the ramifications and social strata surrounding the deaths around which they are built.
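Because the plays differ in length, comparisons like the ones above only make sense with relative frequencies rather than raw counts. The following Python sketch shows one simple way to compute them. It is an illustration under my own assumptions, not a reproduction of Voyant's calculation: the function name `relative_frequencies`, the tokenization rule, and the default scale of occurrences per 10,000 words are all hypothetical choices.

```python
import re
from collections import Counter

def relative_frequencies(text: str, terms: list[str], per: int = 10_000) -> dict[str, float]:
    """Return how often each term occurs per `per` words of `text`."""
    # Assumption: a token is a run of lowercase letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {term: 0.0 for term in terms}
    counts = Counter(tokens)
    # Exact-match counting: "kill" and "kills" are counted separately.
    return {term: counts[term.lower()] * per / total for term in terms}

line = "Blood will have blood, they say."
print(relative_frequencies(line, ["blood", "kill"], per=6))
```

Running the same function over each cleaned play with terms such as "kill", "kills", "blood", "heaven", and "hell" would produce numbers that can be compared across texts of different lengths.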

For a more specific look at how this might work, let’s examine how Voyant can structure the analysis of an individual text, in this case Hamlet. One key difference to clarify: the trends shown here are not text-to-text (between different plays) but chronological within a single text, so you can see where in that text a word is used more often. To examine how this brief period of death works, let’s zero in on a particular portion, as in the first graph above. This graph makes it clearer that about three quarters of the way into Hamlet, the play reaches its tragic climax: the most deaths occur, and the words for killing actions also appear most often. The graph demonstrates the structure of Shakespeare’s tragedy, in which a long setup and plot lead to a tragic event, and the remaining quarter of the text examines the ramifications of that event. This is an oversimplification, but we can see a similar trend in Othello’s graph (the second one above), where the major deaths occur near the end of the play (in this case, essentially at the end); the majority of the play is devoted to the setup for the tragedy and the establishment of social strata (consider the prevalence of titles and formal words).
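The within-text trend described here can be approximated by cutting a play into equal-sized segments and counting a word in each one. The Python sketch below is a simplified stand-in for that kind of view, not Voyant's actual implementation; the function name `segment_counts` and the default of ten segments are assumptions made for illustration.

```python
import re

def segment_counts(text: str, term: str, segments: int = 10) -> list[int]:
    """Count occurrences of `term` in each of `segments` equal slices of the text."""
    # Assumption: a token is a run of lowercase letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    term = term.lower()
    counts = [0] * segments
    total = len(tokens)
    for position, token in enumerate(tokens):
        if token == term:
            # Map the token's position to the index of the segment it falls in.
            counts[position * segments // total] += 1
    return counts

toy_text = "a a a kill a a a a kill kill"
print(segment_counts(toy_text, "kill", segments=5))
```

For a tragedy with a late climax, one would expect the counts for words like "kills" or "dead" to cluster in the later segments rather than spread evenly across the list.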

By comparison, Votes for Women cannot be analyzed in this way (see graph above). The words “kill” and “kills” are present, just as in the other works, but they do not form the same single climactic pattern as in the two tragedies I just described*. For a shorter discussion of what this might mean, please see “Significance.”

*Please note that this data does not include Julius Caesar, which follows a different pattern because of its plotline, in which Caesar dies but is not the main focus of the play.


Looking at the Shakespearean tragedy head-on, one basic understanding is that it centers on the death of an important character, and this generally holds true. However, analyzing the word usage of several tragedies makes it clearer that the majority of the tragedy is setup for this death: the social systems, relationships, and circumstances surrounding the death of the main character(s) are often the true center of the Shakespearean tragedy, as Voyant helped to demonstrate. This contrasts with at least our control example of a play that is not a Shakespearean tragedy. Where the climax of the Shakespearean tragedy often comes late in the play and involves the death of primary characters, other plays are less amenable to this kind of analysis, and their structure cannot be measured as well by considering the deaths within the play.