Sentiment analysis on my girlfriend’s text messages

When I told my friends that I wanted to give my girlfriend an infographic of us (centered around a sentiment analysis of our texts) as a gift for our first anniversary, most of them told me that was a terrible idea. Yeah… well… CHALLENGE ACCEPTED!! Without further ado, this is what love looks like:


What… um…. what are we looking at?

This is a plot of the aggregate sentiment per day per person of our Skype and WhatsApp messages. We don’t use SMS, and email was:

a) annoying to extract [does anyone know an easy way to get Gmail into R?], and

b) an email consists of lots of words and arrives only a few times per day… as opposed to many short messages per day, so I didn’t think it was right to mix emails with text messages.

The sentiment was evaluated by comparing each message against lists of positive and negative words. In other words: when I say that “we are getting sweeter”, I can actually prove it (within a reasonable margin of error)*.
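A minimal sketch of the word-list idea, in Python rather than the R used for the actual analysis: a message's score is the number of its words found in the positive list minus the number found in the negative list. The tiny word sets below are placeholders, not the Hu and Liu lexicon the post used, and the tokenizer (lowercase, keep letters and apostrophes) is an assumption.

```python
import re

# Placeholder word lists for illustration only; a real run would load a
# full lexicon of positive and negative words instead of these few.
POSITIVE_WORDS = {"sweet", "angels", "love", "happy", "miss"}
NEGATIVE_WORDS = {"forget", "forgot", "angry", "sad", "annoying"}


def score_message(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (# positive words) - (# negative words) for one message."""
    # Assumed tokenizer: lowercase, then keep runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in positive for w in words) - sum(w in negative for w in words)


print(score_message("I'm thinking of you, sleep with the angels sweet one"))  # 2
print(score_message("Did you forget the milk?!"))  # -1
```

Counting matches this way means a long gushing message scores higher than a short one, which is one reason to cap very positive scores before aggregating.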

How come she didn’t break up with me on the spot?

OK ok ok…. I didn’t just give her this. I made it part of a whole infographic about our year together, with wordclouds and other stuff that made it lighter. I also made it personal by calling out specific individual days from that plot and displaying the texts from those days to add context to each point. That’s when you realize each point is actually a day from our lives, some good, some bad… it’s like a memory thing. That’s especially true because the few days I called out were special days on which we texted especially nice things. It was sweet… I guess you had to be there. Also, the traditional first-anniversary gift is paper, so it’s perfectly apropos. Also2, I spent a TON of time on this. Chicks dig it when you spend effort on them. Also3, she’s dating a geek; she expects this kind of thing from me and loves it.

How did I do it?

I typically post my analyses to GitHub, but I won’t this time for obvious reasons. Here’s the general flow of how I did it:

  1. Find someone to love. Write lots of text messages to each other for a year.
  2. Get the logs. I used WhatsApp (email yourself the whole log; the option is in the settings) and Skype (at the time I did the analysis, you could get up to 6 months of history; just copy-paste it into a text file).
  3. Clean the logs. This part is super annoying. Every time one of us pasted in something from somewhere else (like a link, or text copy-pasted from another conversation), it broke the one-message-per-line structure. There might be better ways to clean it, but for me it was a bit of regex, a LOT of manual cleaning, and iterating. There are also a lot of encoding problems if your logs are in more than one language, and lastly, not all emoji translate to text nicely. If they didn’t, I just deleted them… which sucks (this is kind of a big deal, since there’s a lot of sentiment in emojis 🙁 :'(. Somebody should come up w/ an emoji sentiment valence table for WhatsApp). What you want in the end is a text file with 3 columns (timestamp, name, clean text) separated by a unique delimiter, for example “|”. Keep munging till you have that.
  4. Read the WhatsApp log into R.
    1. Realize that the logs from the current year don’t have the year in the timestamp, so add it manually.
  5. Read in the Skype logs and combine them w/ the WhatsApp log.
  6. Realize that the timestamp formats differ between the logs. Pull your hair out remembering how to deal w/ dates in R, and munge and munge till they are the same. (No, I’m not going to learn to use lubridate. I’m not a quitter.)
  7. Done! Now start analyzing!
  8. Sentiment analysis: compare each individual text message against the sentiment lexicon from Hu and Liu.
    1. PROTIP – for easy mode, use the score_sentiment function from Jeffrey Breen.
    2. Cap positive scores greater than 4 at 4… that’s good enough. I guess you could cap negative sentiment as well, but I didn’t need to.
    3. To create the chart above, aggregate the sentiment scores PER PERSON PER DAY. Now you have two sets of dots, one for you and one for your lover. Plot those bad boys and add the smoothing!
    4. Now that you have the sentiment analysis of each text message, other fun things you can do (not shown here, but you’ll get the picture):
      1. When and how do we communicate? At what time of day are we sweetest and least sweet?
      2. Are we sweeter on Skype or WhatsApp?
      3. Sentiment-sensitive wordclouds, etc.
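Steps 3 to 6 (cleaning, adding the missing year, and unifying the timestamps) can be sketched like this. It is Python rather than the R I actually used, and the two log-line formats are assumptions invented for illustration: real WhatsApp and Skype exports differ by app version and locale, so the regexes would need adjusting. Lines that don't look like the start of a message (pasted links, multi-line copy-pastes) are glued onto the previous message.

```python
import re
from datetime import datetime
from functools import partial

# ASSUMED line formats for illustration only; adjust to your own exports.
WHATSAPP_LINE = re.compile(r"^(\d{2}/\d{2}) (\d{2}:\d{2}) - ([^:]+): (.*)$")
SKYPE_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] ([^:]+): (.*)$")


def whatsapp_time(day_month, clock, year):
    # The assumed WhatsApp format has no year, so it is supplied by hand.
    return datetime.strptime(f"{day_month}/{year} {clock}", "%d/%m/%Y %H:%M")


def skype_time(stamp):
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def parse_log(lines, pattern, to_timestamp):
    """Turn raw log lines into [timestamp, name, text] records.

    A non-empty line that doesn't match the pattern (e.g. a pasted link)
    is appended to the previous message's text.
    """
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = pattern.match(line)
        if match:
            *stamp_parts, name, text = match.groups()
            records.append([to_timestamp(*stamp_parts), name.strip(), text])
        elif records and line.strip():
            records[-1][2] += " " + line.strip()
    return records


def to_pipe_delimited(records):
    """Sort by time and emit 'timestamp|name|text' lines in one shared format."""
    return [
        # "|" is removed from the text so it stays a unique delimiter.
        f"{ts:%Y-%m-%d %H:%M}|{name}|{text.replace('|', ' ')}"
        for ts, name, text in sorted(records, key=lambda r: r[0])
    ]


# Hypothetical sample lines; the names are made up.
whatsapp = parse_log(
    ["31/12 21:41 - Alex: sleep with the angels", "http://example.com"],
    WHATSAPP_LINE,
    partial(whatsapp_time, year=2015),
)
skype = parse_log(["[2015-06-01 10:00:00] Sam: hi|there"], SKYPE_LINE, skype_time)
for row in to_pipe_delimited(whatsapp + skype):
    print(row)
```

A log that spans New Year would need the year assigned per message rather than once per file; the single `year` argument here mirrors the "add it manually" step.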
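The capping and aggregation in step 8 (plus the "when" and "which app" questions) boil down to capping each message's score at 4 and grouping. Here is a minimal Python sketch, assuming every message already has a word-list score; the sample names and scores are invented. I don't say above whether the daily aggregate is a sum or an average, so summing per person per day is an assumption of this sketch; for hour-of-day and platform it uses a mean, so that a chattier hour or app doesn't look sweeter just because of volume.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import NamedTuple


class ScoredMessage(NamedTuple):
    timestamp: datetime
    name: str
    platform: str  # e.g. "whatsapp" or "skype"
    score: int     # raw word-list score for one message


CAP = 4  # positive scores above 4 are capped at 4; negatives are left alone


def aggregate(messages, group_of, reducer=sum):
    """Reduce capped scores per (sender, group_of(message)) pair."""
    groups = defaultdict(list)
    for msg in messages:
        groups[(msg.name, group_of(msg))].append(min(msg.score, CAP))
    return {key: reducer(scores) for key, scores in groups.items()}


# Hypothetical sample data; names and scores are made up.
messages = [
    ScoredMessage(datetime(2015, 3, 1, 23, 5), "Alex", "whatsapp", 2),
    ScoredMessage(datetime(2015, 3, 1, 23, 6), "Alex", "whatsapp", 5),
    ScoredMessage(datetime(2015, 3, 2, 18, 0), "Sam", "skype", -1),
]

per_person_per_day = aggregate(messages, lambda m: m.timestamp.date())
by_hour = aggregate(messages, lambda m: m.timestamp.hour, reducer=mean)
by_platform = aggregate(messages, lambda m: m.platform, reducer=mean)
print(per_person_per_day)
```

Each (person, day) entry of `per_person_per_day` is one dot in the chart; plotting one series per person and adding a smoother gives the trend lines.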

*OK fine, but what does it mean?

OK fine, it don’t mean a goddam thing, but it is interesting to analyze anyway! The rise in sweetness halfway through the year is due to the fact that we were apart, and were forced to be sweet by text and calls more than in person. Interestingly, the texts stayed sweet after that, and the amount of communication has really increased since then.

With regard to the future… OF COURSE in a normal relationship, text messages start off like this:

“I’m thinking of you, sleep with the angels sweet one”

and end up like this:

“Did you forget the milk?!”

That’s just what happens in relationships: we’ve all got stuff to do, and when you share your life with someone, you become part of a team. From time to time, the team needs milk, and sometimes that milk is forgotten for extremely valid and completely unavoidable reasons. So eventually we will get less sweet via text message, and what will that mean? Probably nothing at all. Anyway, what do I care? At least we’re getting sweeter now. 🙂 I’ll worry about tomorrow tomorrow.

Joking aside, if nothing else, doing analyses like this forces people like me to TRY EXTRA HARD to be sweet, even when it’s not necessary. And intention matters when text-messaging, since text messages NEVER come with any context and misunderstandings are common.

So keep up the sweet text-messages, geeks of the world! Don’t want the trendline to go negative, do we?

Edited by Laure Belotti


UPDATE: For more “Love data”, check out other people who analyzed their partners’ text messages (HERE and HERE and HERE), two people who hacked online dating for their own purposes (HERE and HERE), and of course, the mother lode of Love-data: Enjoy!


  8 comments for “Sentiment analysis on my girlfriend’s text messages”

  1. J B
    2016/02/04 at 17:17

    This is a very interesting bit of personal research. I really enjoyed reading the report and following your conclusions regarding the results.

    In the meantime, Novak et al. (2015) have constructed an emoji sentiment lexicon comprising 751 unicode emoji. The authors had annotators label Tweets that contained emojis for sentiment and thereby derived a sentiment value for a given emoji. Find the list as a website here:

    Maybe you can try to scrape the list and apply it to your data.

    The paper is: Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296

    • Amit
      2016/02/04 at 18:28

      mindblown! Thanks for that link, it definitely looks scrapeable and useful! I don’t remember my old original corpus anymore, but I feel like it wasn’t giving the unicode links… more characters that looked like an encryption problem… but I DEFINITELY will look for something to do with these! Thanks again.

      • 2017/02/06 at 16:24

        Hi, nice topic, analysis and an engaging narrative to go with it. Have you been able to incorporate the sentiment value of emojis? Thanks to @ma_salmon, who shared this link.

  2. CJH
    2017/02/01 at 19:00

    Amit! You are awesome! I just randomly came across this. I have been compiling the text data from a group thread of 22 of my best friends since we were kids, which has been going strong for about 2 years now! So helpful! Might have to hit you up if I get stuck. Best, Conan

    • Amit
      2017/02/01 at 19:03

      Ya man, feel free to hit me up! reach out on twitter @vizmonkey or join our rusers slack channel by going to If nothing else, I can think of about 3 fun questions to check for!

  3. CJH
    2017/02/03 at 17:34

    Thanks man! I will do that for sure. What are the 3 questions!?!
    Best, Conan

    • Amit
      2017/02/04 at 03:43

      Well, obviously keyword counts per person (after removing stopwords). Then I’d be interested in a network chart of who answers whom most often, or some sort of shifting topic participation… hrm… I’d have to think about how to implement that. You should do who are the loudest and the quietest (as a time series, so that we can see it as a function of time); maybe a streamgraph would be good for this? But do aggregate monthly, otherwise it’ll look like garbage. Sentiment analysis on each person is a good one. And then, based on the word frequencies, I think it would be good to conclude shite about each person… like “the funniest”, “the academic”, etc… people will get a kick outta that. Arrange all the info in a flexdashboard so it looks polished and ur done! Hrm… OR you could do monthly data uploads and then develop metrics about how the conversation has gone this month… but that might be too nerdy. Have fun bro!
