## 2 0.6098073 
## 3 0.44251 
## 4 0.03555164 
## 5 0.02601014 
## 6 0.02841965 
## 7 0.02566342 
## 8 0.02302715 
## 9 0.02371691 
## 10 0.0224799

Based on the silhouette scores above and on plots of the clusterings in the first two principal components, we think five clusters is a good choice. The two cluster plots also show that a five-cluster solution makes sense.
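
The silhouette comparison for k = 2 to 10 can be sketched roughly as follows. This is a minimal illustration, not the code used in this report: the data are synthetic, and the use of hierarchical clustering with average linkage is an assumption, since the clustering method is not stated here. The names `x`, `hc`, `avg_sil`, and `best_k` are hypothetical.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Synthetic stand-in for the song features (NOT the report's data):
# two Gaussian groups in four dimensions.
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 4),
           matrix(rnorm(200, mean = 5), ncol = 4))

d  <- dist(scale(x))                 # Euclidean distances on standardized features
hc <- hclust(d, method = "average")  # linkage choice is an assumption

# Average silhouette width for each candidate number of clusters
avg_sil <- sapply(2:10, function(k) {
  labels <- cutree(hc, k = k)
  mean(silhouette(labels, d)[, "sil_width"])
})
names(avg_sil) <- 2:10

# Candidate k with the largest average silhouette width
best_k <- as.integer(names(avg_sil)[which.max(avg_sil)])

# First two principal components, for a visual check of the clusters
pcs <- prcomp(x, scale. = TRUE)$x[, 1:2]
plot(pcs, col = cutree(hc, k = best_k), pch = 19)
```

A higher average silhouette width indicates tighter, better-separated clusters, so the scores are usually read together with the principal component plot rather than on their own.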

##    1    2    3    4    5 
## 1626   25    2    1    1
mean(subset(fclust, label == 1)$loudness)
## [1] -8.195274
mean(subset(fclust, label != 1)$loudness)
## [1] -19.96941
mean(subset(fclust, label == 1)$nbeat)
## [1] 487.4502
mean(subset(fclust, label != 1)$nbeat)
## [1] 429.0345
mean(subset(fclust, label == 1)$tempo)
## [1] 126.8395
mean(subset(fclust, label != 1)$tempo)
## [1] 108.29
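
The pairwise `mean(subset(...))` calls above can also be collapsed into a single grouped summary. The sketch below assumes a data frame shaped like `fclust`, with a `label` column and numeric `loudness`, `nbeat`, and `tempo` columns; the values are synthetic placeholders, not the report's data, and the `group` column is a hypothetical helper.

```r
set.seed(42)
# Hypothetical stand-in for fclust (NOT the report's data)
fclust <- data.frame(
  label    = sample(1:5, 200, replace = TRUE,
                    prob = c(0.80, 0.10, 0.05, 0.03, 0.02)),
  loudness = rnorm(200, mean = -10, sd = 4),
  nbeat    = rnorm(200, mean = 450, sd = 80),
  tempo    = rnorm(200, mean = 120, sd = 20)
)

# Split songs into the dominant cluster (label 1) and everything else
fclust$group <- ifelse(fclust$label == 1, "label == 1", "label != 1")

# Mean of each feature per group, one row per group
group_means <- aggregate(cbind(loudness, nbeat, tempo) ~ group,
                         data = fclust, FUN = mean)
print(group_means)
```

This gives the same per-group means as the individual calls, with the two groups side by side for easier comparison.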

We use random forest and lasso for feature selection, keeping the features that most strongly affect user similarity between two songs. We then cluster the songs again using only these features. The songs separate clearly into two groups, mainstream and niche music, and we compare some key features between them. The dominant cluster (label 1) is louder on average (mean loudness -8.20 versus -19.97), has more beats (mean nbeat 487.5 versus 429.0), and has a faster tempo (mean 126.8 versus 108.3) than the remaining songs. This suggests that people today tend to listen to louder music with a faster tempo.