I listen to the Roots

I love music and data science

Last updated on Apr 30, 2022 · 3 min read · the roots, data science, statistics, data visualizations

As I write this post, I am listening to Lazy Afternoon by The Roots and reminiscing about my college years. Those were the days! On a different note, I would like to analyze how The Roots' music changed from one of their first albums (Do You Want More?!!!??!, 1995) to one of their most recent albums (Undun, 2011).

Did Jimmy Fallon change one of my favorite hip-hop groups?


They joined Jimmy Fallon’s show in 2009.

Let’s load the appropriate libraries

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(spotifyr)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

access_token <- get_spotify_access_token()
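By default, `get_spotify_access_token()` reads the client ID and secret from the `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` environment variables. Here is a minimal sketch of a check you could run before requesting a token. The helper name `spotify_credentials_set` is my own and is not part of spotifyr, and the values in the commented-out lines are placeholders.

```r
# Hypothetical helper (not part of spotifyr): TRUE only if both environment
# variables that get_spotify_access_token() reads by default are non-empty.
spotify_credentials_set <- function() {
  vars <- c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
  all(nzchar(Sys.getenv(vars)))
}

# Placeholder values; replace with the credentials of your own Spotify app.
# Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id")
# Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret")

if (!spotify_credentials_set()) {
  message("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET before requesting a token.")
}
```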

Step 1: Extract the Data

the_roots_df <- get_artist_audio_features('the roots')

Step 2a: Manipulate the data to get relevant data

mod1_the_roots_df <- the_roots_df %>% 
  filter(album_name %in% c('Do You Want More?!!!??!','Undun'))

Step 2b: Manipulate the data to get relevant data

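One issue with Spotify is that multiple versions of the same album can be uploaded, so filtering by album name alone can return more than one copy of each track. To avoid this, I looked up the album ID of the most recent version of each album and filtered on those IDs instead. I also rescale valence from Spotify's 0-to-1 range to -1 to 1 (m1_valence), so that 0 marks the midpoint of the scale.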

mod2_the_roots_df <- the_roots_df %>% 
  filter(album_id %in% c('14dfGE6B5TLYdrelQ7AOsa','3N0wHnD5Rd8jnTUvNqOXGz')) %>% 
  mutate(m1_valence = 2*valence - 1)
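One way the most recent album ID per album could be looked up is sketched below. This is not necessarily how the IDs above were found. It runs on a toy data frame with made-up IDs and dates, and it assumes the data has an `album_release_date` column stored as ISO-formatted strings (which sort chronologically).

```r
library(dplyr)

# Toy stand-in for the_roots_df: one row per track; IDs and dates are made up.
toy_df <- data.frame(
  album_name = c("Undun", "Undun", "Undun",
                 "Do You Want More?!!!??!", "Do You Want More?!!!??!"),
  album_id = c("id_undun_old", "id_undun_old", "id_undun_new",
               "id_dywm_old", "id_dywm_new"),
  album_release_date = c("2011-01-01", "2011-01-01", "2012-01-01",
                         "1995-01-01", "1996-01-01")
)

# Keep one row per album version, then take the latest version of each album.
latest_ids <- toy_df %>%
  distinct(album_name, album_id, album_release_date) %>%
  group_by(album_name) %>%
  arrange(desc(album_release_date), .by_group = TRUE) %>%
  slice(1) %>%
  ungroup()

latest_ids
```

The resulting `album_id` values can then be passed to `filter(album_id %in% ...)` as in Step 2b.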

Step 3a: Visualize the data

We want to see if there is a difference between the albums in terms of:

Valence vs. energy
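The plot that follows splits the valence-energy plane into four labeled quadrants (Sad, Chill, Anger, Happy), with the split at a rescaled valence of 0 and an energy of 0.5. As a rough sketch, the same split can be written as a function. The helper name `mood_quadrant` is my own, and putting boundary values (exactly 0 or 0.5) in the upper or right quadrant is an assumption.

```r
# Hypothetical helper: label a track by the quadrant it would fall in.
# valence and energy are Spotify audio features on a 0-to-1 scale.
mood_quadrant <- function(valence, energy) {
  m1_valence <- 2 * valence - 1  # same rescaling as in Step 2b: [0, 1] -> [-1, 1]
  ifelse(energy >= 0.5,
         ifelse(m1_valence >= 0, "Happy", "Anger"),
         ifelse(m1_valence >= 0, "Chill", "Sad"))
}

# Four made-up tracks, one per quadrant.
mood_quadrant(valence = c(0.9, 0.1, 0.8, 0.2),
              energy  = c(0.8, 0.9, 0.3, 0.2))
# "Happy" "Anger" "Chill" "Sad"
```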

# Dummy data so ggplot has something to build the (otherwise empty) panel from
df <- data.frame(x = c(0,1), y = c(0,1))
p1 <- df %>% 
  ggplot(aes(x = x, y = y)) +
  geom_blank() +
  # Quadrant boundaries: rescaled valence = 0 and energy = 0.5
  geom_vline(xintercept = 0,size = 1) +
  geom_hline(yintercept = 0.5,size = 1) +
  # Valence runs from -1 to 1 after rescaling; energy stays on 0 to 1
  scale_x_continuous(limits = c(-1,1), 
                     expand = c(0, 0),
                     labels = label_number(accuracy = 0.1)) +
  scale_y_continuous(limits = c(0,1), expand = c(0, 0))  +
  theme_bw() +
  theme(axis.title = element_text(size = 18, face = 'bold')) +
  # Shade the negative-valence half red and the positive-valence half blue
  geom_rect(aes(xmin=-1, xmax=0, ymin=0, ymax=1),alpha = 0.15, fill = 'red') +
  geom_rect(aes(xmin=0, xmax=1, ymin=0, ymax=1),alpha = 0.15, fill = 'blue') +
  labs(x = 'Valence',y='Energy') +
  theme(axis.text = element_text(size = 12, face = 'bold'),
        plot.margin = margin(0.3, 0.5, 0.1, 0.5, "cm")
  ) +
  # Label each quadrant with the mood it represents
  annotate("text", x=-0.5, y=0.25, label= "Sad",size = 15.5, color = 'white') +
  annotate("text", x=0.55, y=0.25, label= "Chill",size = 15.5, color = 'white') +
  annotate("text", x=-0.5, y=0.75, label= "Anger",size = 15.5, color = 'white') +
  annotate("text", x=0.55, y=0.75, label= "Happy",size = 15.5, color = 'white')

p1

p1 +
  geom_point(data = mod2_the_roots_df,
             aes(x = m1_valence, y = energy, color = album_name)) +
  scale_color_manual(values = c('#999999','#E69F00'))
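To put numbers next to the scatter plot, one option is a per-album summary of the same two features. This is only a sketch on a toy data frame: the album names and values are made up, and with the real data `toy_tracks` would be replaced by `mod2_the_roots_df`.

```r
library(dplyr)

# Toy stand-in for mod2_the_roots_df; the numbers are made up for illustration.
toy_tracks <- data.frame(
  album_name = c("Album A", "Album A", "Album B", "Album B"),
  valence    = c(0.8, 0.6, 0.3, 0.1),
  energy     = c(0.7, 0.5, 0.6, 0.4)
)

# Average rescaled valence and energy per album.
album_summary <- toy_tracks %>%
  mutate(m1_valence = 2 * valence - 1) %>%
  group_by(album_name) %>%
  summarise(
    n_tracks     = n(),
    mean_valence = mean(m1_valence),
    mean_energy  = mean(energy),
    .groups      = "drop"
  )

album_summary
```

A mean hides the spread within each album, so it is best read alongside the plot rather than instead of it.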


Immanuel Williams PhD

Lecturer of Statistics
