
## 2 0.6098073 
## 3 0.44251 
## 4 0.03555164 
## 5 0.02601014 
## 6 0.02841965 
## 7 0.02566342 
## 8 0.02302715 
## 9 0.02371691 
## 10 0.0224799

Based on the silhouette values and on plots of the clustering in the first two principal components, we think five clusters is a good choice. The cluster plot also shows that five clusters make sense.

##    1    2    3    4    5 
## 1626   25    2    1    1
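
The selection of the number of clusters described above can be sketched as follows. This is a minimal illustration on simulated data, not the original analysis: the data matrix `x`, the use of hierarchical clustering with complete linkage, and the range of candidate values are all assumptions, since the report does not show that step. It computes the average silhouette width for each candidate number of clusters and plots the clustering in the first two principal components.

```r
library(cluster)  # provides silhouette()

set.seed(42)
# Simulated stand-in for the song features (assumption): two separated groups.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))
d <- dist(scale(x))
hc <- hclust(d, method = "complete")  # clustering method is an assumption

# Average silhouette width for each candidate number of clusters k.
ks <- 2:10
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})
names(avg_sil) <- ks
best_k <- ks[which.max(avg_sil)]

# Project onto the first two principal components for a visual check.
pcs <- prcomp(x, scale. = TRUE)$x[, 1:2]
plot(pcs, col = cutree(hc, k = best_k), pch = 19,
     xlab = "PC1", ylab = "PC2")
```

A higher average silhouette width indicates better separated clusters, so the numeric criterion and the principal component plot can be read side by side when deciding on the number of clusters.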

mean(subset(fclust, label == 1)$loudness)

## [1] -8.195274

mean(subset(fclust, label != 1)$loudness)

## [1] -19.96941

mean(subset(fclust, label == 1)$nbeat)

## [1] 487.4502

mean(subset(fclust, label != 1)$nbeat)

## [1] 429.0345

mean(subset(fclust, label == 1)$tempo)

## [1] 126.8395

mean(subset(fclust, label != 1)$tempo)

## [1] 108.29
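
The six calls above can also be collected into a single table of group means. The sketch below uses a simulated data frame in place of `fclust` (the values are made up for illustration); only the column names `label`, `loudness`, `nbeat` and `tempo` come from the calls above, and the helper column `group` is a hypothetical name.

```r
set.seed(1)
# Simulated stand-in for fclust; these are not the real song data.
fclust <- data.frame(
  label    = c(rep(1, 20), rep(2, 5), rep(3, 2)),
  loudness = rnorm(27, mean = -10, sd = 3),
  nbeat    = rnorm(27, mean = 450, sd = 50),
  tempo    = rnorm(27, mean = 120, sd = 15)
)

# Cluster 1 versus all other clusters, matching label == 1 and label != 1.
fclust$group <- ifelse(fclust$label == 1, "cluster 1", "other clusters")

# Mean of each feature per group, in one table.
group_means <- aggregate(cbind(loudness, nbeat, tempo) ~ group,
                         data = fclust, FUN = mean)
print(group_means)
```

Each cell of `group_means` equals the corresponding `mean(subset(...))` call, so the comparison can be extended to more features by adding columns inside `cbind()`.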

We used random forest and lasso for feature selection, keeping the features that best explain user similarity between two songs. We then clustered the songs again using these features. The songs separate clearly into two groups: mainstream music (cluster 1) and niche music (the remaining clusters). Comparing some key features between the two groups, we found that mainstream songs are on average louder (-8.2 versus -20.0) and faster (tempo 126.8 versus 108.3). This suggests that people nowadays tend to listen to louder music with a faster tempo.
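
The feature selection step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the original code: the data are simulated, the response `similarity` and the feature names `f4` and `f5` are hypothetical, and the use of the `glmnet` and `randomForest` packages is an assumption, as the report does not say which implementations were used.

```r
library(glmnet)        # lasso (assumed implementation)
library(randomForest)  # random forest (assumed implementation)

set.seed(1)
n <- 200
# Simulated features; only loudness and tempo drive the response here.
X <- matrix(rnorm(n * 5), ncol = 5,
            dimnames = list(NULL, c("loudness", "tempo", "nbeat", "f4", "f5")))
similarity <- 2 * X[, "loudness"] - 1.5 * X[, "tempo"] + rnorm(n, sd = 0.5)

# Lasso: choose the penalty by cross-validation and keep the features
# whose coefficients are non-zero.
cvfit <- cv.glmnet(X, similarity, alpha = 1)
beta <- as.matrix(coef(cvfit, s = "lambda.1se"))[-1, 1]  # drop the intercept
lasso_selected <- names(beta)[beta != 0]

# Random forest: rank features by permutation importance (%IncMSE).
rf <- randomForest(x = X, y = similarity, importance = TRUE)
imp <- importance(rf, type = 1)[, 1]
rf_ranking <- names(sort(imp, decreasing = TRUE))
```

Features that appear in `lasso_selected` and also rank near the top of `rf_ranking` would be natural candidates to carry into the second round of clustering.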


April 27, 2016