Chord progressions of 5 000 songs!

Update: the full analysis and everything you need are on my GitHub.

The Hooktheory database contains analyses of over 5000 songs*. These analyses are uploaded by users, and they allow the songs to be analyzed in bulk as well as individually. One of these ‘all songs’ analyses lets users gather chord progressions across ALL songs (see the analysis file for how I did it, using the Hooktheory API and R). This let me create a Sankey visualization of all chord progressions in the Hooktheory database.

Check it out!


(If you prefer the interactive version, where you can play with the data, click here!)

Explaining the figure a little: what interests us here is the type of chord used, regardless of the song’s key. So 1->5->6 in the figure above includes songs in C major with the progression C->G->Am as well as songs in A major with A->E->F#m (this assumes the songs share the same Roman numerals when analyzed relative to their major key; in reality, the API blends songs into rough categories regardless of mode, so it’s impossible to know for sure what we’re dealing with).
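To make the key-independence idea concrete, here is a minimal sketch in Python (my analysis used R; Python is only for illustration) that turns scale degrees into chord names in a given major key. It assumes plain diatonic triads (I, ii, iii, IV, V, vi, vii°), which the API does not actually confirm (see the limitation about “6” below). The function names are hypothetical helpers, not part of any Hooktheory tool.

```python
# Hypothetical helpers: spell Roman-numeral degrees as chord names in a major key.
# Assumption: diatonic triad qualities; the API itself does not say whether "6" is vi or VI.
LETTERS = "CDEFGAB"
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]          # semitones above the tonic for degrees 1..7
QUALITY = ["", "m", "m", "", "", "m", "dim"]  # I ii iii IV V vi vii°


def pitch_class(name):
    """Pitch class (0-11) of a note name such as 'A', 'F#' or 'Bb'."""
    pc = NATURAL_PC[name[0]]
    for acc in name[1:]:
        pc += 1 if acc == "#" else -1  # '#' raises, 'b' lowers
    return pc % 12


def spell_degree(tonic, degree):
    """Spell scale degree 1-7 of the major key on `tonic`, e.g. ('A', 6) -> 'F#'."""
    letter = LETTERS[(LETTERS.index(tonic[0]) + degree - 1) % 7]  # one letter per degree
    target = (pitch_class(tonic) + MAJOR_STEPS[degree - 1]) % 12
    diff = (target - NATURAL_PC[letter]) % 12
    if diff > 6:
        diff -= 12  # prefer flats over many sharps
    accidental = "#" * diff if diff > 0 else "b" * (-diff)
    return letter + accidental


def realize(tonic, numerals):
    """Turn a list of degrees (1-7) into chord names in the given major key."""
    return [spell_degree(tonic, n) + QUALITY[n - 1] for n in numerals]
```

With this sketch, `realize("C", [1, 5, 6])` returns `["C", "G", "Am"]` and `realize("A", [1, 5, 6])` returns `["A", "E", "F#m"]`: both land on the same 1->5->6 path in the figure.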

Chord progressions start on the left and continue to the right. For example, the progression 4->1->5->6 is one of the most popular ones… it appears in 327 songs! Check ’em out!


In the API, chord probabilities are given as percentages, so the relative importance of each chord is known at each step (the normalization technique isn’t known). The API offered 29 chords at the start of all progressions. For each subsequent transition the number of chord options increases (as expected), but for graphical purposes I kept only the original 29 chords at every transition (I expect these 29 to be the most common anyway, so it’s not a big deal). Also, the thickness of each line I’m plotting is itself a probability (the chance of the next chord given the current one), and the probability of being on the current chord in the first place differs from chord to chord, so the “total thickness” of each transition isn’t the same. Very lazily, I normalized all probabilities across each transition so that each transition “mega bar” is roughly the same height. I’m sure there’s a better way to do it; the community is invited to improve it!
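The lazy per-transition normalization can be sketched like this in Python. The `(step, source, target, percent)` tuple layout is a hypothetical representation of the API percentages, not the format the API actually returns, and this is an illustration rather than the exact R code in my analysis.

```python
from collections import defaultdict


def normalize_by_step(links):
    """Rescale link weights so that every transition step sums to 1.

    `links` is a list of (step, source_chord, target_chord, percent) tuples,
    a hypothetical layout for the percentages returned by the API.
    """
    totals = defaultdict(float)
    for step, _source, _target, percent in links:
        totals[step] += percent  # total "thickness" of this transition
    return [
        (step, source, target, percent / totals[step])
        for step, source, target, percent in links
    ]
```

Note what this does and doesn’t do: every “mega bar” ends up the same height, but the flow leaving a chord in one column still doesn’t have to match the flow arriving at it from the previous column, which is exactly why I’m unsure the method is valid.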

My analysis is here; collaboration and/or remixing with attribution is welcome! (And if you improve the normalization method, please let me know and I’ll update this post.)


  • There are several limitations to this assessment, since the Hooktheory API wasn’t really intended for this type of analysis. For example, it doesn’t say whether “6” is “vi” (minor) or “VI” (major), which is kind of a big deal.
  • As mentioned, I selected only 29 chords to track… I might be missing a lot of progressions.
  • I have no idea if the normalization I applied is valid. I stopped trying when the output I got was semi-reasonable.
  • Blending everything together like this probably obscures some interesting patterns.
  • I only did chord progressions that were 4 steps long… I could have gone farther, but didn’t want to slam the API too much. As you can imagine, the number of queries increases drastically with each ‘step’: the Start -> first-chord step was 1 query that yielded 29 chords, the 1->2 transition took 29 queries (one per first chord), the 2->3 transition took one query per two-chord prefix (29^2 = 841 queries), the 3->4 transition took 29^3 = 24,389 queries, and each further step would multiply that by 29 again.
  • The songs have been uploaded by users from around the world, but represent mostly Western music. It would be awesome to do this with music from other parts of the world.
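To show why the query count explodes, here is a minimal Python sketch of the crawl described above: a breadth-first expansion that makes one query per prefix and keeps the same chords at every step. `next_chords` is a hypothetical placeholder for a single API call; it is not a real Hooktheory client function.

```python
def expand_prefixes(next_chords, steps):
    """Breadth-first expansion of chord progressions, one query per prefix.

    `next_chords(prefix)` is a hypothetical stand-in for one API call that
    returns the chords allowed to follow `prefix` (a tuple of chords).
    Returns the list of full-length progressions and the number of calls made.
    """
    frontier = [()]  # start from the empty progression
    calls = 0
    for _ in range(steps):
        grown = []
        for prefix in frontier:
            calls += 1  # one query per prefix at this step
            for chord in next_chords(prefix):
                grown.append(prefix + (chord,))
        frontier = grown
    return frontier, calls


def queries_per_step(branching, steps):
    """Closed form: step k (counting from 0) needs branching**k queries."""
    return [branching ** k for k in range(steps)]
```

With 29 chords kept per step, `queries_per_step(29, 4)` returns `[1, 29, 841, 24389]`, i.e. 25,260 queries for 4-chord progressions; a fifth step alone would add 29^4 = 707,281 more.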

Possible Legend (thanks to HertzDevil):

The numbers are written as they appear in the Trends search string; here is the notation in EBNF metasyntax:

(* Roman numerals *)
numeral = "1" | "2" | "3" | "4" | "5" | "6" | "7";
(* Borrowed modes, from Dorian to Locrian *)
mode = "D" | "Y" | "L" | "M" | "b" | "C";
(* Figured bass for triadic and seventh chords *)
inversion = "6" | "64" | "7" | "65" | "43" | "42";
(* Functions available for applied chords *)
function = "4" | "5" | "7";
(* Basic chords or borrowed chords in the relative Major key *)
simple-chord = [mode], numeral, [inversion];
(* Applied chords *)
applied-chord = function, [inversion], "/", numeral;
(* Chord progressions for both the Trends page and the API *)
chord = simple-chord | applied-chord;
trends-progression = chord, {".", chord};
api-progression = chord, {",", chord};
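The grammar is regular, so it can be checked mechanically. Below is a minimal Python sketch that validates and decomposes chord tokens with regular expressions written from the EBNF above; the function names are hypothetical, and whether Hooktheory’s own parser agrees on every edge case is an assumption. Read with standard figured-bass conventions, “564” is chord 5 in second inversion (V6/4), “56” is V in first inversion, “b7” is degree 7 borrowed from the “b” mode (Aeolian, if the symbols follow the Dorian-to-Locrian order in the comment), and “57/4” is an applied V7 of IV.

```python
import re

# Built from the EBNF above. fullmatch() backtracks, so "564" parses as
# numeral 5 + inversion 64 rather than failing on a leftover "4".
INVERSION = r"(?P<inversion>64|65|43|42|6|7)?"
SIMPLE = re.compile(r"(?P<mode>[DYLMbC])?(?P<numeral>[1-7])" + INVERSION)
APPLIED = re.compile(r"(?P<function>[457])" + INVERSION + r"/(?P<target>[1-7])")


def parse_chord(token):
    """Return a dict describing one chord token, or raise ValueError."""
    m = APPLIED.fullmatch(token)
    if m:
        return {"kind": "applied", **m.groupdict()}
    m = SIMPLE.fullmatch(token)
    if m:
        return {"kind": "simple", **m.groupdict()}
    raise ValueError(f"not a valid chord: {token!r}")


def parse_progression(text, sep):
    """Split a progression string: sep is '.' for Trends strings, ',' for the API."""
    if sep not in (".", ","):
        raise ValueError("separator must be '.' or ','")
    return [parse_chord(tok) for tok in text.split(sep)]
```

For example, `parse_chord("564")` returns `{"kind": "simple", "mode": None, "numeral": "5", "inversion": "64"}`, and `parse_progression("4.1.5.6", ".")` yields four simple chords.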

Parting thoughts:

  • Even though there is a great variety of chords and chord progressions, progressions involving 1, 4, 5, and 6 are favoured, probably because they ‘sound good’ to our brains. Nowhere is this better illustrated than in The Axis of Awesome’s song “4 Chords” (a.k.a. “Four Chord Song”). I definitely expected chord 1 to be used frequently, but I was expecting more variability.
  • Music is pretty to look at!
  • If you’re a musician, try weird progressions! I know that what sounds good sounds good, but jeez… how will humanity ever learn to be creative if everyone keeps doing the same thing over and over?


(thanks to Laure Belotti for editorial prowess)


EDIT: I’ve been getting great feedback on this post. Please check out the great conversations here and here. Giving credit where it’s due: it turns out The Axis of Awesome weren’t the first to talk about chord-progression overuse; check out this dude. More credit where it’s due: it turns out I wasn’t the first one to come up with this idea (great minds indeed…). And finally, I’m sure you nerds all checked out Hooktheory, but take a look at these other resources too!


*EDIT2: Originally I was under the impression that the Hooktheory database contained over 25000 songs… but a Hooktheory admin clarified that in fact there are just over 5000.

  29 comments for “Chord progressions of 5 000 songs!”

  1. Max Galka
    2015/04/15 at 15:19

    Great analysis! Would really like to hear how these progressions sound.

  2. Chris
    2015/04/15 at 18:01

    “…how will humanity ever learn to be creative if everyone keeps doing the same thing over and over?”

    Actually, it’s the other way around: with the proliferation of the chromatic/even tempered scale, music has become homogenized, and now the microtonal scales present in indigenous folk music sound ‘wrong’.

    There are few places where folk music still thrives as such, with China and Turkey being examples.

    Look out for a show called ‘The History Of Music’ by Howard Goodall (he wrote the theme music to Red Dwarf, btw!), as he documents this in great detail.

    • Amit
      2015/04/16 at 18:53

      Found that show. It’s on YouTube. Thanks!

  3. 2015/04/16 at 09:52

    Awesome. Fascinating. You’ve got me captivated.

    I am both a Composer/Singer, and a Doctor of Chiropractic, and I must speak with you!

    The human nervous system does not operate like we’ve been taught. It’s not about electro-magnetic impulses traveling up and down nerve tissue. Scientists are showing that the nervous system operates through ACOUSTIC WAVES pulsating up and down nerves.

    There are specific acoustic wave patterns of how a person’s spine and nervous system both distort (causing sickness, symptoms, discomfort, pain, disease, and death) and create coherency (causing health, vitality, great function, abundance, etc).

    We all know that music has soothing and healing effects upon us. What I am FASCINATED with and believe you have started to discover, are the exact mechanics, the precise musi-mathematical formulas to help us heal, grow, and evolve. As individuals, and as a culture.

    I believe we could measure the oscillatory frequencies of the patterns you are discovering, and be somehow able to USE this to heal. Not haphazardly. But deliberately. That we could measure the “tone frequency” of a cancerous tumor, and say, for example, “Your prescription is to listen to the Bach Mass in Bb Minor followed by John Denver’s Rocky Mountain High on Day 1…..etc”, and the VIBRATIONAL FREQUENCIES emitted by the music would help that person’s body heal.

    This sounds far fetched, I know. But I also know it’s true. My number is 714-914-3243. I would LOVE to chat with you about this. You’ve possibly developed and created something here with staggering implications.

    • Amit
      2015/04/16 at 18:51

      Hey Scott, glad you liked it!

      It’s not far fetched at all. Sanskrit is all about vibrations, so we’ve known this for some thousands of years now! Google “sanskrit vibrations”… for example,

      I’m just not sure how THIS visualization helps clarify the relationship between vibrations and health. You are welcome to see my analysis and use it as you see fit, the link is provided above.

      Good luck!

  4. Jake
    2015/04/17 at 15:23

    Hooray! I’ve been looking for something like this for years! Thanks!

  5. 2015/04/17 at 15:33

    I would like to see the breakdown by music genre. It would be interesting to see whether, individually, genres like alternative and jazz converge within their own category.

  6. Devin
    2015/04/17 at 18:15


    Is the data being used in the chart available for download? I generate music as a hobby and would be interested to see the results of the analysis in action.


    • Amit
      2015/04/17 at 23:10

      Yeah, the data I used is on my GitHub… check the link. Or go to and get whatever you need yourself.

  7. Del
    2015/04/17 at 19:21

    This is blowing my mind.

  8. martin cohen
    2015/04/17 at 20:41

    Makes me think of a fake book I have with about 1000 songs. It has an index called “Find a song”. All you have to know is, starting with the first note, if a note is above or below the following note, for the first few notes. This is enough to identify any of the songs in the book.

  9. noko
    2015/04/17 at 21:56

    What’s the chord progression in Pachelbel’s Canon in D?

    • Blh
      2015/04/18 at 11:27

      I V vi iii IV I IV V
      Over and over and over again…..

  10. Jay
    2015/04/18 at 14:26

    Can you please explain what the 56, 564, b7, etc. codes mean? I understood the 1, 2, 3, 4, 5 but was a little lost for those. Sorry if it’s a noob question. Looks great, though!

    • Amit
      2015/04/18 at 15:33

      In my edit I posted a link to two discussions… you’ll find all answers there.

  11. david lincoln brooks
    2015/04/19 at 15:19

    Waitaminnit: you say we have no way of knowing whether “6” means the vi- or a VI major? I’m sorry, but that is more than a slight flaw; it is a huge lacuna, and draws the entire chart into question.

    I am a graduate of two music colleges and have no idea what all that “564” stuff means. However ambitious this chart might be, it fails to inform as well as it could or should.

    • Amit
      2015/04/19 at 18:52

      The analysis is provided, and I’m sure there are gaping holes everywhere! Please do improve it if you can, and I’ll post the improved version.

  12. William K. Knowles
    2015/04/20 at 20:48

    Honestly, I’ve been thinking about something like this but different. Perhaps we can talk or text about that sometime.
    It is a lot easier to think of an idea than to make it a reality.
    Yes, there are bugs and a lot more could be done, but what you have done is wonderful and appreciated. Good job!

    • Amit
      2015/04/21 at 14:03

      sure! What’s your idea?

  13. Akshay
    2015/08/31 at 18:58

    Hi.. the analysis was great actually.. however I was trying to do something on my own but could not find the dataset on looks like their API is broken right now. Can you please help me with the data? It’ll be really great. 🙂
