Analysis
For each section, click on the image to see the movie name in the caption.
OK, so let's start from square one... the expectations. I expect the action movies to have fewer words and shorter sentences because it takes fewer words to blow something up than to describe how you feel when you see your man prioritize work over a romantic date...
As we can see, THAT stereotype does hold up. In fact, chick flicks contain about 50% more words than action movies, on average. Movie lengths vary for both, and are comparable on average. That being said, look at the difference in word rate between Rambo and Predator, and between Casablanca and Breakfast at Tiffany's. Let's look further...
Background/Methods: R was used here to create word clouds made up of the most frequently said words in each film. Main character names were excluded from the word clouds because they are likely to be repeated many times in a film, but this doesn't tell us much. We played around with removing stopwords or not. Ultimately, we left them in because it is also interesting to see whether the different movie types use stopwords more or less frequently.
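To make the counting step concrete, here is a minimal sketch in Python (not the R code used for the actual word clouds). The function name `word_frequencies`, the `excluded_names` parameter and the sample text are all made up for illustration. It simply counts every word, stopwords included, and skips the character names.

```python
import re
from collections import Counter

def word_frequencies(script_text, excluded_names=()):
    """Count how often each word occurs, skipping character names."""
    excluded = {name.lower() for name in excluded_names}
    # Lowercase the text and treat runs of letters/apostrophes as words
    words = re.findall(r"[a-z']+", script_text.lower())
    # Stopwords are deliberately kept; only the names are dropped
    return Counter(w for w in words if w not in excluded)

# Hypothetical sample line, not taken from any of the scripts
sample = "Get down! Rambo, get over here. You got this, Rambo."
freqs = word_frequencies(sample, excluded_names=["Rambo"])
top_words = freqs.most_common(3)
```

The resulting counts are the kind of input a word cloud needs: the bigger the count, the bigger the word.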
Analysis and discussion: Here, some of the same words come up in both chick flicks and action movies. In particular, the verb "get" comes up all the time. "Get" happens to be one of the most frequently used verbs in English and is often used to show a change in situation. Think of "get married/divorced", "get down!", "get over here!", "get a call" etc. Across all the movies, "get/got" seems to be more prevalent in the action movies.
The other dominant word in all films is "you". Of course, in any film, the point is usually some kind of interaction between characters.
Interestingly, Machete is dominated by "will", at the expense of "get/got". This is probably because Machete WILL skin you alive, he doesn't GOT to do it. Machete don't got to do anything.
Coming back to chick flicks, there's a lot of "just", a word used either to clarify or to say what happened a minute ago (for example, "I just mean that..." or "I just saw him"). The former could be seen as a way of saying things more gently and/or apologetically, perhaps trying to bridge a gap between the two people who are supposed to be together by the end of the movie. Notice that this word is not prevalent in action movies.
Still, nothing here is the "AHA!" moment we were after. Let's continue...
Background/Methods: Movies convey dominant emotions just as we do in our interactions with others. Here's a look at how the following emotions play into chick flicks and action movies: trust, disgust, sadness, joy, fear, anticipation, anger, surprise.
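To accomplish this, we used the get_nrc_sentiment() function from the excellent syuzhet package, and then plotted the bars using ggplot. For the sentence-level sentiment scores in this whole analysis, I'm using the Bing method of sentiment analysis (the emotion categories themselves come from the NRC lexicon behind get_nrc_sentiment()).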
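The idea behind a lexicon-based emotion tally can be sketched in a few lines of Python. This is not syuzhet's implementation. `TOY_LEXICON` is a tiny invented stand-in for a real emotion lexicon, and `emotion_counts` is a hypothetical helper name.

```python
from collections import Counter

# Invented stand-in for an emotion lexicon; the real word lists are far larger
TOY_LEXICON = {
    "friend": ["trust", "joy"],
    "kill": ["fear", "anger"],
    "wedding": ["joy", "anticipation"],
    "monster": ["fear", "disgust"],
}

def emotion_counts(words, lexicon=TOY_LEXICON):
    """Tally how many times each emotion is triggered by the given words."""
    counts = Counter()
    for word in words:
        # Words missing from the lexicon contribute nothing
        counts.update(lexicon.get(word.lower(), []))
    return counts

counts = emotion_counts(["My", "friend", "will", "kill", "the", "monster"])
```

One bar per emotion, with these tallies as heights, gives the kind of chart discussed in this section.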
Analysis: Overall, all movies display dominant feelings of trust (although perhaps this is a vocabulary calibration issue?). For all chick flicks, the three dominant feelings are trust, joy and anticipation while disgust and anger are the least prevalent. The above is not true of any of the action movies as they focus much more on anger, fear and anticipation, except for The Last Dragon which is behaving... strangely.
Discussion: So "trust" looks to be the dominant feeling in both chick flicks and action movies, although we're not sure that it's well balanced with say, "disgust" as a feeling. Assuming the result is reliable, one could imagine that this is due to the necessary building of relationships in both scenarios. In other words, this data could be telling us that brotherhood and sisterhood are equally important!
With trust, joy and anticipation being the dominant emotions in chick flicks, one could imagine that this is supposed to be the winning emotional combination in the quest for everlasting love and companionship!
On the other hand, our action movie heroes and villains face a "healthy" dose of fear, anger and anticipation. This combination makes sense if we imagine the plot of any action movie, where the good guy wards off constant threats (scary) and attacks (anger) until all the bad guys are dead (we are in anticipation of this moment).
Background/Methods: I saw this 'squiggle' on Jim Vallandingham's blog, who in turn took it from Stefanie Posavec. The concept is simple: use the number of characters per sentence as the line length, then turn right and draw the next line, until you have this crazy knot of squiggles! As soon as I saw it, I thought it would be a great tool for this analysis, if I could add a sentiment element to it. Or at least hella fun. I got so excited that I didn't even see that code to accomplish this in R had already been developed by TRinker. Oh well, it was fun to construct another algorithm to accomplish the same task (and mine colors each sentence by its sentiment, scored with get_sentiment() from syuzhet, so green = positive, grey = neutral, red = negative). For what it's worth, it's in my github repo. The black dot is the start of the script, the blue dot is the end.
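For the curious, the "draw a line, turn right, repeat" mechanic can be sketched as below. This is a Python illustration under my own assumptions, not the code from the repo or TRinker's version. `sentence_squiggle` and `valence_color` are hypothetical names, and the path starts heading east with y growing upward.

```python
def sentence_squiggle(sentences):
    """Return the (x, y) corner points: one segment per sentence, turning right each time."""
    # Headings in clockwise order (east, south, west, north), y grows upward
    headings = [(1, 0), (0, -1), (-1, 0), (0, 1)]
    x, y = 0, 0
    points = [(x, y)]
    for i, sentence in enumerate(sentences):
        dx, dy = headings[i % 4]
        length = len(sentence)  # number of characters sets the segment length
        x, y = x + dx * length, y + dy * length
        points.append((x, y))
    return points

def valence_color(valence):
    """Map a sentence's sentiment score to the color scheme described above."""
    if valence > 0:
        return "green"
    if valence < 0:
        return "red"
    return "grey"

path = sentence_squiggle(["Hi.", "Get down!", "No."])
```

Each consecutive pair of points is one sentence, so drawing the pairs as line segments colored with `valence_color` reproduces the look of the drawings. Long sentences shoot far out, while runs of short sentences curl into tight knots.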
Analysis: At a glance, it appears that there's more density in chick flicks and more sparsity in action movies. More importantly, however, it seems like the chick flicks have clusters of green, which means there are a lot of short positive sentences that cluster together. If there is any clustering evident in action movies, it would be red, so a negative valence. For action movies there's more grey overall, suggesting there might be more neutral sentences, whereas it appears that most sentences in chick flicks are charged one way or the other. The big exception is "The Last Dragon" which ... ok, I'll say it, is acting like a chick flick again... hrm.
It should be said that there are lots of VERY long lines in the scripts above. These are songs, or monologues and the like, which were not broken up by the function get_sentences() of syuzhet. We could change the delimiting to force the sentences to be smaller, but we're not gigantically bothered.
Discussion: It seems like the overall trend in chick flicks is that they tend to be more positive than negative. These drawings also tell us to expect more overall emotion in chick flicks than in action movies. Intuitively this makes sense, as chick flicks tend to be about relationships, which are emotional in nature. As we saw, action movies have more negative sentences (indeed, more negative things tend to happen in these types of movies) but they also have neutral sentences conveying no emotion at all. Perhaps these sentences are "process-oriented", as in "Pass me that rope", "Get on the elevator", "Hide behind that tree" etc...
In conclusion, sentences are more emotional overall and more positive in chick flicks than in action movies, but we can prolly do better...
Background/Methods: So a German TV producer I met on a plane got me thinking about sentiments in movies, and how quickly they can make characters feel good then bad again. That led me to mess around with this kind of thing, where up is "good", and down is "bad". For example, for the movie Shrek, I drew out the perceived sentimental valence for each character:
This is what started this whole analysis: what would happen if we plotted the sentiment of each sentence in the movie? Again we used the syuzhet package, this time the get_percentage_values() function, to figure out the average sentiment per minute of movie.
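The binning idea, as I understand it, is to chop the sentence-by-sentence scores into equal chunks and average each chunk. Here is a rough Python sketch of that idea. `binned_means` is a hypothetical helper, and the way it splits chunks is my assumption, not necessarily what get_percentage_values() does internally.

```python
def binned_means(values, bins):
    """Split values into `bins` roughly equal chunks and average each chunk."""
    n = len(values)
    if bins < 1 or bins > n:
        raise ValueError("bins must be between 1 and len(values)")
    means = []
    for b in range(bins):
        # Integer arithmetic keeps chunk sizes within one element of each other
        start = b * n // bins
        end = (b + 1) * n // bins
        chunk = values[start:end]
        means.append(sum(chunk) / len(chunk))
    return means

# Made-up sentence valences, squeezed into 4 "minutes"
valences = [1, 1, -1, 0, 1, -1, -1, -1]
per_bin = binned_means(valences, bins=4)
```

Setting the number of bins to the movie's running time in minutes gives one averaged value per minute, whatever the number of sentences in the script.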
Analysis: In both chick flicks and action movies, the sentiments generally appear erratic. In chick flicks, the emotions vary from very positive to a bit negative, whereas action movies are kinda the opposite.
In action movies, the emotions seem to vary a lot more. The two that stick out most are The Last Dragon and Predator where feelings run very low indeed (-1 for Predator where frankly one could understand how being hunted could get you down) compared to only -0.2 in Breakfast at Tiffany's. In addition, some movies like Die Hard are incredibly erratic and the emotions go up and down very quickly! Die Hard and Notting Hill have slight "U" shapes, which would mean that stuff was good, got bad, but ended good.... which isn't necessarily correct?
Discussion: Stereotypically, we would imagine that there's more emotion in chick flicks, and therefore more ups and downs... However, we see that whether you're a chick flick or an action movie, your emotional charge will vary throughout the film (makes for a more interesting movie). If anything, it seems more important for some fast-paced action movies like Die Hard to swing between positive and negative very quickly indeed (perhaps contributing to the "on the edge of your seat" effect).
One thing's for sure, chick flicks tend to favour higher levels of positive emotion than negative emotion, and overall, there are no more ups and downs in chick flicks than in action movies.
Background/Methods: So since the lines were too sharp and pointy, we used syuzhet::get_transformed_values() to soften up the lines a bit more and provide something that's more intuitive, like "first it was good, then it was bad". This kind of plot has been described by people smarter than me. The function takes the raw sentiment values and converts them into a standardized set of filtered and reverse-transformed values; we specified the movie length as the number of bins.
Analysis: We get something that makes a bit more sense!
Three chick flicks start badly and two start well, whereas all action movies start well. There's something to be said about the number of ups and downs:
Ups and downs              Chick flicks   Action movies
Has five ups and downs     3              1
Has four ups and downs     1              1
Has three ups and downs    1              3
Also interesting that Predator and The Last Dragon manage to "stay afloat" so to speak. In other words, they go from good to slightly less good but not bad. No chick flick does that.
Discussion: There is no discernible pattern differentiating chick flicks and action movies. They all behave in their own way and have their own set of good and bad moments. Perhaps each individual story dictates the frequency and severity of ups and downs rather than whether it's a chick flick or an action movie.
Of all the movies, two sets of movies behave similarly: Bridget Jones and Machete, then Rambo and Notting Hill! For these, the movie sentiment goes very evenly from good to bad, to good. Then, in the case of Machete and Bridget Jones, there's another bad segment. The only difference there is that about 5 minutes before the end of Bridget Jones, things get good again (she gets her man Darcy, FINALLY). In Machete, things just end badly...
Anyway, since this doesn't differentiate chick flicks from action movies, this is not the analysis we are looking for.
Background/Methods: It occurred to me that one of the weaknesses of sentiment analysis is that all phrases "stand on their own". Sure, averages and trends can be identified, but I think that in life (and maybe in movies), what came before must surely affect the emotional sentiment that comes next. Take the following conversation:
Person 1: I love you (+1)
Person 2: aw... I love you too! (+1)
Person 1: Actually, I hate you, you're destroying my life. (-1)
Person 2: ... what? (+0)
Person 1: Just kidding! You're awesome. (+1)
Person 2: Lol... you so crazy! (-1)
This kind of discussion would alternate between positive and negative in traditional emotional analysis. If anything, there's somewhat of a negative trend. However, in life, this is a goodish conversation (although person 1 is a bit playful and inconsistent...). So how can we capture that and keep track of the sentiment history?
Introducing EmoMo, the emotional momentum chart! Instead of simply plotting the emotional valence of each new sentence starting from zero, we add each sentence's valence to the running total of all the sentences before it. That way we can sort of come up with the emotional momentum. The following shows the impact of the EmoMo chart considering the example conversation from above (fancy highlighter use costs extra):
So the basic mechanic is: go up by the valence amount, then go right, then go up, go down or stay level depending on the next sentence. Another variant is to multiply the valence by the number of letters in the sentence, but it was a bit exaggerated and didn't add anything to these charts, so we didn't do that. BTW, the fastest way to do this is to simply plot the dplyr::summarize(cumsum(thingie)), but we wanted to plot showing the color of each line (green is positive, red is negative), so we made our own function. Anyway, without further ado, here are the EmoMos for each movie:
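In numbers, the running total for the example conversation works out as in the Python sketch below. This is an illustration of the idea only, not the plotting function used for the charts, and the variable names are made up.

```python
from itertools import accumulate

# Valences of the six lines in the example conversation above
valences = [1, 1, -1, 0, 1, -1]

# Emotional momentum: each sentence's valence added to everything before it
momentum = list(accumulate(valences))

# The lowest point the conversation ever reaches
lowest = min(momentum)
```

Taken one at a time, the scores flip between +1 and -1. The running total goes 1, 2, 1, 1, 2, 1 and never dips below +1, which matches the feeling that this is a goodish conversation overall.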
Analysis: The trends here are pretty clear: in chick flicks, positive emotions increase over the course of the movie with only a few negative bits in between. In action movies, the trend is the opposite in all cases except "The Last Dragon" which, for some reason, is behaving like a chick flick again. Other than this film, all action movies tend to start with positive words and emotions and progress to a darker, more sinister place as the movie goes on...
Discussion: Because chick flicks tend to end well and not have that many negative emotions in them, it makes sense that the trend would be upwards with just a few little hiccups on the way. In action movies, the problem-solving, the suspense/shock value and all the obstacles put in the way of the hero must surely make the film increasingly negative as one event compounds another. For instance, in Die Hard poor John McClane has to deal with the increasingly elaborate problems created by Hans Gruber's hostage-taking. Luckily for him and other action movie heroes, the movie does have a little positive moment at the end, even if it never comes back to the level of positivity at the beginning of the movie. Hell, we're just happy the hero got out alive! Not so for the heroes in Predator, however... this film goes unrelentingly downhill from beginning to end, and Dutch, the main character, is the only man standing after barely surviving the predator. Oh, spoiler alert btw.