An Analysis of Global and Regional Mainstreaminess for Personalized Music Recommender Systems

Markus Schedl and Christine Bauer

Department of Computational Perception, Johannes Kepler University Linz, Altenberger Straße 69, A-4040 Linz, Austria

E-mail: markus.schedl@jku.at; christine.bauer@jku.at

Received 30 January 2018; Accepted 01 February 2018;
Publication 20 April 2018

Abstract

The music mainstreaminess of a listener reflects how strongly a person’s listening preferences correspond to those of the larger population. Considering that the music mainstream may be defined from different perspectives, we show country-specific differences and study how taking music mainstreaminess into account influences the quality of music recommendations.

In this paper, we first propose 11 novel mainstreaminess measures characterizing music listeners, considering both a global and a country-specific basis for mainstreaminess. To this end, we model preference profiles (as vectors over artists) for users, for countries, and globally, incorporating artist frequency, listener frequency, and a newly proposed TF-IDF-inspired weighting function, which we call artist frequency–inverse listener frequency (AF-ILF). The resulting preference profile for each user u is then related to the respective country-specific and global preference profile using fraction-based approaches, symmetrized Kullback-Leibler divergence, and Kendall’s τ rank correlation, in order to quantify u’s mainstreaminess. Second, we detail country-specific peculiarities concerning what defines the countries’ mainstream and discuss the proposed mainstreaminess definitions. Third, we show that incorporating the proposed global and country-specific mainstreaminess measures into the music recommendation process can notably improve the accuracy of rating prediction.

Keywords

music mainstreaminess
music recommender systems
artist frequency-inverse listener frequency
popularity
country-specific differences

1 Introduction

In the era of digitalization, music has become easier to access than ever: a tremendous number of musical recordings are readily available to consume on online platforms such as YouTube, Spotify, or iTunes. This opportunity to access a large number of musical works, though, results in information overload (8), which requires new tools to assist users in choosing from the huge amount of musical content (39). Music recommender systems (MRS) have, thus, become a significant research topic over the past few years (11; 43; 6) and current online music platforms typically use some sort of MRS.

In general, the idea behind recommender systems is to assist users in searching, sorting, and filtering the vast amount of information available (29). MRS are specifically built to help users navigate the myriad of available musical recordings, either by suggesting music that fits the respective user’s interests or by automatically generating consecutive recommendations that form a personalized playlist (43). The challenge is “to propose the right music, to the right user, at the right moment” (24).

Various automatic approaches to music recommendation have been proposed (45). As summarized in the review by Schedl et al. (45), most MRS rely mainly on some sort of content-based filtering (5) or collaborative filtering (26). Content-based MRS may, for instance, consider acoustic similarity information on the song level (49), or use the song’s music genre or the performing artist of the music item to quantify similarities (27). MRS employing collaborative filtering require no exogenous information about either users or music items. Instead, a user is suggested music listened to by users with similar preferences or listening patterns (34).

Popularity-based recommendation approaches, another variant, resemble a primitive form of collaborative filtering in which items are recommended to users based on how popular those items are overall among other users. Such approaches assume that the target user is more likely to like a very popular item than one of the far less popular items (11; 44). Popularity-based recommendation approaches are particularly applicable in hit-driven domains such as the music industry. Accordingly, popularity-based MRS approaches are widely adopted to complement other approaches in cold-start situations, when only limited information about new users and/or items is available in the system (13; 50).

One approach for considering popularity in the music domain is to describe music listeners “in terms of the degree to which they prefer music items that are currently popular or rather ignore such trends” (38). Harnessing music mainstreaminess in combination with collaborative filtering techniques tends to deliver better results with respect to music recommendation accuracy and rating prediction error than pure collaborative filtering approaches alone (16; 44; 48; 41).

However, existing work on quantifying a user’s music mainstreaminess is limited in that it views the music mainstream from a global perspective, although the mainstream has regional peculiarities (7). For instance, music consumption behavior is affected by culturally influenced music preferences, market regulations, local radio airplay, etc. (e.g., (47; 20; 10; 35)). In other words, regional aspects shape users’ music preferences and music consumption behavior. Accordingly, we can assume country-specific differences concerning which artists are popular.

With respect to the music recommendation research domain, the definition of specific measures that capture a user’s mainstreaminess (i) on both a global and a country-specific level and (ii) in ways that can easily be operationalized in music recommendation is a new target of research (e.g., (41; 7)). Addressing this, the main contributions of this paper are three-fold: (i) the definition of several novel measures for user mainstreaminess, considering both a global and a regional, country-specific basis, (ii) the illustration of country-specific peculiarities of these mainstreaminess definitions, and (iii) an analysis of the performance of the proposed mainstreaminess measures for personalized music recommendation.

The remainder of the paper is organized as follows. In Section 2, we provide a brief overview of existing work on mainstreaminess and popularity in music recommendation, and introduce the dataset on which we conduct our experiments. We then detail the proposed mainstreaminess measures in Section 3 and provide examples that show their value in distilling the regional mainstream in addition to the global one. In Section 4, we discuss, for a few prototypical countries, how their regional mainstream relates to the global mainstream. Section 5 shows how to exploit the proposed mainstreaminess measures in collaborative filtering recommendation and highlights the added value of doing so. Finally, we round off the paper in Section 6 with a conclusion and directions for future research.

2 Conceptual Foundations and Related Work

2.1 Music Popularity and Mainstreaminess

In the context of recommender systems, popularity-based approaches are widely adopted in numerous domains, including music (13; 23; 50), news (51), and product recommendation in electronic commerce in general (1). Popularity is typically construed as a general consensus of a group’s attitude toward entities (23).

While various ways exist to define and measure popularity (for instance, in terms of sales figures or media coverage), in the field of MRS, music popularity is frequently characterized by the total playcount of a music item, i.e., the total number of listening events the music item receives across all listeners (cf. (11)). When popularity is measured by playcounts, the long tail concept described in (2) is specifically applicable to the (online) music industry (12): on online music platforms, playcounts concentrate on the most popular music items (the head), followed by a long tail of less popular items (11; 9).

A more general concept related to popularity concentration is the mainstream. Although the literature in the fields of popular music studies and popular music cultures frequently refers to the mainstream, the term itself remains rather poorly defined (cf., e.g., (4)). The Oxford Dictionaries define mainstream as “The ideas, attitudes, or activities that are shared by most people and regarded as normal or conventional”. Due to the strong connection between the concepts, the terms mainstream and long tail are often used interchangeably. The mainstream is also frequently referred to by other terms (e.g., hits (11) or the head (15)), and the overall concept is also called, for instance, the hit-driven paradigm (11) or the long-tail concept (11; 2).

In MRS research, the user feature music mainstreaminess (16; 44) essentially describes whether and how strongly a user’s music listening preferences correspond to those of the overall population. While other listening-centric features, such as serendipity (52) or novelty (14), are frequently exploited when modeling a user’s music consumption behavior and providing music recommendations, music mainstreaminess is a rather new target of research (16; 44; 48). The mainstreaminess feature is used to compare a user’s ranking of music items with the overall ranking of artists, albums, or tracks (48).

2.2 Related Work on the Quantification of Music Mainstreaminess

Formal definitions to measure a user’s level of music mainstreaminess are scarce in the literature (e.g., (44; 48; 41)). Most existing approaches quantify music mainstreaminess as fractions of the target user’s playcounts among the playcounts of the overall population. A limitation of this approach is that it disproportionately privileges the absolute top hits (41), which is problematic given the long-tail distribution of music item popularity on online music platforms: demand is highly concentrated on the most popular items, followed by a long tail of less popular items. Privileging the top hits leads to low performance of fraction-based user models of mainstreaminess in collaborative filtering approaches (41).

To overcome this limitation, Schedl and Bauer (41) proposed measurement approaches based on rank-order correlation and Kullback-Leibler (KL) divergence. However, like the existing fraction-based approaches, their work views the music mainstream from a global perspective and does not take regional peculiarities of the music mainstream into account.

2.3 Cultural and Regional Aspects Influencing Music Mainstreaminess

As human preferences and behavior are rooted and embodied in culture (22), music preferences and music consumption behavior are also affected by cultural aspects (17; 20; 47). For instance, music perception varies across cultures (25; 30; 46; 47), and music preferences are shaped by cultural aspects (3). In European countries, for example, pop music preferences diverge rather than converge (10).

Still, not only cultural aspects but also regional (e.g., country-specific) mechanisms affect music consumption; particularly important are national market structures (including distribution channels, legislation, subsidies, and local radio airplay) that vary across countries (33; 35; 19). In other words, regional aspects shape users’ music preferences and music consumption behavior. While we are aware that culture does not equate to nation (21; 28), we emphasize that both cultural aspects and national market structures contribute to users’ music consumption preferences and behavior. Accordingly, we can assume country-specific differences concerning the popularity of artists. Against this background, we focus on country-specific differences in the paper at hand.

Closest to our work is the study presented in (48), which analyzes the recommendation performance of mainstreaminess (spelled “mainstreamness” there) and a user’s country, among other features. Our work differs significantly from (48) in several regards: First, we use an open dataset to allow for replication. Second, (48) propose only one global mainstreaminess measure that compares a user’s preferences to the overall dataset (global population), while we define mainstreaminess in various ways (based on fractional, divergence, and rank correlation functions) and at various levels (global and country-specific). Third, we propose a novel weighting approach based on inverse listener frequency that highlights artists that are popular in a specific country, and thus contribute to its mainstream, but are not necessarily popular on a global level.

2.4 Data Preparation

For our experiments, we deploy the LFM-1b dataset (39), which covers 1,088,161,692 listening events of 120,322 unique users, who listened to 32,291,134 unique tracks by 3,190,371 unique artists. The core component of the dataset is the cleaned user-artist-playcount matrix (UAM) containing the number of listening events of 120,175 users to 585,095 unique artists. The distribution of listening events of the Last.fm data corresponds to a typical long-tail distribution (11). As 65,132 user profiles do not contain any country information, we exclude those from our experiments since they do not contribute to defining a country’s mainstreaminess.

3 Formalizing Mainstreaminess

When describing how well a user’s listening preferences reflect those of an overall population (e.g., globally or within a country), what is considered mainstream depends on the selected population, as our analysis will also show. Consequently, we propose several quantitative measures for user mainstreaminess, both on a global and on a country-specific level, depending on the population against which the target user is compared. Our approach is inspired by TF-IDF (term frequency–inverse document frequency) weighting and the well-established monotonicity assumptions underlying it in text processing and information retrieval (37). Accordingly, our proposed mainstreaminess measures rely on the concepts of artist frequency (AF), listener frequency (LF), and artist frequency–inverse listener frequency (AF-ILF).

We define AF_a,U₁ as the number of listening events (playcounts) of tracks by artist a, summed over a set of users U₁. Note that U₁ may be a single user u, all users in a country c, or the entirety of users in the collection (i.e., the global population g). Accordingly, we define LF_a,U₂ as the number of listeners of artist a within a user population U₂. Finally, we define AF ⋅ ILF_a,U₁,U₂ as in Equation 1, setting AF ⋅ ILF_a,U₁,U₂ = 0 if LF_a,U₂ = 0 (which avoids a division by zero).

$$\mathit{AF} \cdot \mathit{ILF}_{a, U_1, U_2} = \log\left(1 + \mathit{AF}_{a, U_1}\right) \cdot \log\left(1 + \frac{|U_2|}{\mathit{LF}_{a, U_2}}\right) \qquad (1)$$

Note that U₁ and U₂ may represent a single user, all users in the same country, or all users in the dataset (cf. Subsection 2.4). Therefore, this definition allows us to easily formalize both the global and the regional definitions of mainstreaminess, by varying U₁ and U₂. The ILF weighting term can be integrated when computing the preference profile for a user or for a country, e.g., AF ⋅ ILF_a,u,c, where U₁ contains only the user u and U₂ all users in country c (to which u belongs), or AF ⋅ ILF_a,c,g, where U₁ is composed of all users in country c (to which u belongs) and U₂ of all users in the dataset. Using ILF is motivated by the fact that, when determined by AF_a,c or LF_a,c, the top artists in each country c are often identical or very similar to the global top artists (cf. Tables 1, 2, 3, and 4). In order to uncover the respective country-specific mainstream, we therefore use ILF_a,g to penalize globally popular artists.
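To make Equation 1 concrete, the following Python sketch computes AF, LF, and AF-ILF on a tiny, entirely hypothetical dataset (the users, country codes, and playcounts are invented for illustration). The natural logarithm is an assumption, since the base of the logarithm is not specified here. Passing a country’s users as U₁ and all users as U₂ yields the country-level profile AF ⋅ ILF_a,c,g; passing a single user as U₁ and that user’s compatriots as U₂ yields AF ⋅ ILF_a,u,c.

```python
import math
from collections import defaultdict

# Hypothetical toy data: user -> {artist: playcount}, and user -> country.
PLAYCOUNTS = {
    "u1": {"Stam1na": 30, "Metallica": 30},
    "u2": {"Stam1na": 10, "Metallica": 10, "Muse": 5},
    "u3": {"Metallica": 30, "Muse": 20},
    "u4": {"Metallica": 15, "Coldplay": 8},
}
COUNTRY = {"u1": "FI", "u2": "FI", "u3": "US", "u4": "US"}


def artist_frequency(users, playcounts):
    """AF_{a,U1}: listening events per artist, summed over the user set U1."""
    af = defaultdict(int)
    for user in users:
        for artist, count in playcounts[user].items():
            af[artist] += count
    return dict(af)


def listener_frequency(users, playcounts):
    """LF_{a,U2}: number of users in U2 with at least one listening event for artist a."""
    lf = defaultdict(int)
    for user in users:
        for artist, count in playcounts[user].items():
            if count > 0:
                lf[artist] += 1
    return dict(lf)


def af_ilf(u1, u2, playcounts):
    """Equation (1) with natural logs (assumed); the score is 0 if LF_{a,U2} = 0."""
    af = artist_frequency(u1, playcounts)
    lf = listener_frequency(u2, playcounts)
    n2 = len(u2)
    scores = {}
    for artist, freq in af.items():
        listeners = lf.get(artist, 0)
        if listeners == 0:
            scores[artist] = 0.0
        else:
            scores[artist] = math.log1p(freq) * math.log1p(n2 / listeners)
    return scores


finland = [u for u, c in COUNTRY.items() if c == "FI"]
everyone = list(PLAYCOUNTS)
af_fi = artist_frequency(finland, PLAYCOUNTS)
scores = af_ilf(finland, everyone, PLAYCOUNTS)  # country-level AF, global ILF
```

In this toy example, "Stam1na" and "Metallica" have the same Finnish AF (40), but "Stam1na" receives the higher AF-ILF score because only half of all users listen to it, whereas every user listens to "Metallica". This mirrors how ILF penalizes globally popular artists in Tables 2, 3, and 4.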

Table 1 Global top artists in the LFM-1b dataset, according to artist frequency (AF) and listener frequency (LF), considering the 53,258 users with country information

Artist	AF	Artist	LF
The Beatles	2,985,509	Radiohead	24,829
Radiohead	2,579,453	Nirvana	24,249
Pink Floyd	2,351,436	Coldplay	23,714
Metallica	1,970,569	Daft Punk	23,661
Muse	1,896,941	Red Hot Chili Peppers	22,609
Arctic Monkeys	1,803,975	Muse	22,429
Daft Punk	1,787,739	Queen	21,778
Coldplay	1,755,333	The Beatles	21,738
Linkin Park	1,691,122	Pink Floyd	21,129
Red Hot Chili Peppers	1,627,851	David Bowie	20,602

Table 2 Top artists for Finland (1,407 users), according to artist frequency (AF), listener frequency (LF), and artist frequency–inverse listener frequency (AF-ILF)


Artist	AF	Artist	LF	Artist	AF-ILF
Stam1na	105,633	Metallica	703	St. Hood	70.526
In Flames	97,645	Nightwish	695	The Sun Sawed in 1/2	67.490
CMX	90,032	Muse	693	tiko-μ	66.546
Kotiteollisuus	82,309	Daft Punk	675	Worth the Pain	66.058
Turmion Kätilöt	78,722	Queen	671	Cutdown	65.247
Amorphis	78,159	System of a Down	663	Katariina Hänninen	64.955
Nightwish	75,742	Coldplay	634	Game Music Finland	64.835
Mokoma	73,453	Nirvana	614	Daisuke Ishiwatari	63.565
Muse	69,507	Pendulum	613	Altis	63.235
Metallica	69,499	Iron Maiden	609	Redrum-187	62.428

Table 3 Top artists for Italy (972 users), according to artist frequency (AF), listener frequency (LF), and artist frequency–inverse listener frequency (AF-ILF)

Artist	AF	Artist	LF	Artist	AF-ILF
Radiohead	68,160	Radiohead	556	CaneSecco	68.451
The Beatles	65,498	Pink Floyd	539	DSA Commando	66.049
Pink Floyd	60,558	The Beatles	505	Veronica Marchi	65.864
Fabrizio De André	53,928	David Bowie	500	Train To Roots	65.459
Muse	48,168	Muse	500	Alessandro Raina	64.228
Depeche Mode	42,586	Nirvana	497	Machete Empire	63.915
Afterhours	42,473	Coldplay	475	Danti	62.958
Verdena	42,338	The Cure	466	Dargen D’Amico	62.453
Sigur Ros	41,748	Depeche Mode	459		62.228
Arctic Monkeys	39,755	Daft Punk	457	Aquefrigide	61.663

Table 4 Top artists for Turkey (479 users), according to artist frequency (AF), listener frequency (LF), and artist frequency–inverse listener frequency (AF-ILF)

Artist	AF	Artist	LF	Artist	AF-ILF
Pink Floyd	68,887	Pink Floyd	292	Cüneyt Ergün	64.473
Metallica	42,784	Radiohead	289	Floyd Red Crow Westerman	61.955
Daft Punk	42,020	Metallica	268	Fırat Tanış	58.666
Iron Maiden	34,174	Coldplay	261	Acil Servis	58.439
Radiohead	31,390	Nirvana	251	Taste (Rory Gallager)	58.366
Massive Attack	30,669	Massive Attack	249	Mezarkabul	57.799
The Beatles	27,951	The Beatles	240	Rachmaninoff Sergey	57.733
Opeth	25,744	Red Hot Chili Peppers	240	Mabel Matiz	57.619
Depeche Mode	25,075	Queen	238	Grup Yorum	56.855
Dream Theater	24,286	Led Zeppelin	236	Yüzyüzeyken Konuşuruz	56.748

Tables 2, 3, and 4 illustrate the effect of this weighting. They show the top artists for Finland, Italy, and Turkey in terms of AF_a,c, LF_a,c, and AF ⋅ ILF_a,c,g, i.e., AF computed on the country level and ILF on the global level. As can be seen, the AF and, even more so, the LF measures are not well suited to distill the essential mainstream of a country, except perhaps for countries such as Finland, whose music taste is very specific and far from the global taste (40). In contrast, AF-ILF is capable of identifying those artists that are popular in a specific country, but not worldwide.

Based on the above definitions, we compute preference profiles globally (PP_g), for a country (PP_c), and for a user (PP_u). Given the LFM-1b dataset (39), these profiles are 585,095-dimensional vectors containing the AF, LF, or AF-ILF scores over all artists in the dataset. Figure 1 provides an example by visualizing the preference profiles for Finland, a country that deviates particularly strongly from the global music mainstream. Note that artist IDs (x-axis) are sorted by global popularity according to the respective measure (AF, LF, or AF-ILF). As can be seen, while the AF- and LF-based preference profiles follow a similar trend, the AF-ILF weighting considerably increases the importance of artists that are less popular globally but more popular within the country (also see Tables 2, 3, and 4).

Figure 1 Artist frequency (AF), listener frequency (LF), and artist frequency–inverse listener frequency (AF-ILF) for Finland. Artist IDs (x-axis) are sorted by global AF, LF, or AF-ILF values, respectively.

Table 5 Proposed music mainstreaminess measures on the user level. F stands for the fraction-based approach, D refers to the symmetrized Kullback-Leibler divergence approach, and C is used as abbreviation for the approaches based on rank-order correlation according to Kendall’s τ. A is a list of all artists; $\widehat{AF}$ denotes the sum-to-unity normalized AF value; $\mathrm{ranks}(PP_u^W)$ represents the real-valued preference profile converted to ranks, i.e., the vector containing all normalized item frequencies of user u, with respect to the frequency weighting approach W (AF or LF); in case of AF ⋅ ILF, $\mathrm{ranks}(PP_u^W)$ is extended to $\mathrm{ranks}(PP_{u,c}^{AF \cdot ILF})$, i.e., AF computed for user u and ILF on country c, or $\mathrm{ranks}(PP_{c,g}^{AF \cdot ILF})$, i.e., AF computed on country c and ILF globally. Note that we invert the values of some measures (F and D) to ensure that higher values always indicate closeness to the mainstream.


Exploiting the profiles, we propose three categories of mainstreaminess measures on the user level: fraction-based (F), symmetrized Kullback-Leibler divergence (D), and rank-order correlation according to Kendall’s τ (C). Fraction-based measures are motivated by their easy interpretability, as they express the share of overlap between a user’s preference profile and the global or a country’s preference profile. Kullback-Leibler divergence is a well-established method to compare distributions (discrete preference profiles in our case). Employing rank-order correlation is motivated by the fact that converting feature values to ranks has already proven successful for music similarity tasks (32).

We provide formulas for the specific measures in Table 5, where $\hat{X}$ denotes the sum-to-unity normalized vector X and $\mathrm{ranks}(PP_u^W)$ represents the real-valued preference profile converted to ranks, i.e., the vector containing all normalized item frequencies of user u, with respect to the frequency weighting approach W (AF or LF). When using AF ⋅ ILF, $\mathrm{ranks}(PP_u^W)$ is extended to $\mathrm{ranks}(PP_{u,c}^{AF \cdot ILF})$, i.e., AF computed for user u and ILF on country c, or to $\mathrm{ranks}(PP_{c,g}^{AF \cdot ILF})$, i.e., AF computed on country c and ILF globally. Note that we invert the results of the fraction-based formulations and the symmetrized KL divergences so that higher values consistently indicate closeness to the mainstream and lower values indicate distance from it.
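The divergence-based (D) and correlation-based (C) measures can be sketched in Python as follows. Because the exact formulas of Table 5 are not reproduced in this text, the sketch rests on several assumptions: profiles are aligned lists over the same artists, a small epsilon is added to every entry so that the KL divergence stays finite for artists a user never played, the divergence is inverted as 1/(1 + D) so that higher values mean closer to the mainstream, and Kendall’s τ is computed in its tau-b form, which handles ties. The function names and example profiles are hypothetical.

```python
import math


def normalize(profile):
    """Sum-to-unity normalization (the hat operator)."""
    total = sum(profile)
    return [value / total for value in profile]


def symmetrized_kl(p, q, eps=1e-12):
    """KL(p || q) + KL(q || p) on smoothed, normalized profiles."""
    # eps smoothing keeps the divergence finite for zero counts (assumption).
    p_hat = normalize([value + eps for value in p])
    q_hat = normalize([value + eps for value in q])
    # Sum of both directions: sum_i (p_i - q_i) * log(p_i / q_i).
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p_hat, q_hat))


def kendall_tau_b(x, y):
    """Kendall's tau-b with tie correction; O(n^2), for illustration only."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied in both are ignored
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


def mainstreaminess_d(user_profile, reference_profile):
    """Inverted divergence 1 / (1 + D) (assumed form); higher = closer to the mainstream."""
    return 1.0 / (1.0 + symmetrized_kl(user_profile, reference_profile))


def mainstreaminess_c(user_profile, reference_profile):
    """Kendall's tau between a user profile and a global or country profile."""
    return kendall_tau_b(user_profile, reference_profile)


# Hypothetical AF profiles over the same five artists.
country_profile = [100, 80, 60, 40, 20]
mainstream_user = [50, 40, 30, 20, 10]
niche_user = [1, 2, 5, 20, 40]
```

Since τ depends only on the ordering of the values, applying it to raw AF values gives the same result as applying it to their ranks. For real 585,095-dimensional profiles, the quadratic loop would be far too slow, and a library routine such as `scipy.stats.kendalltau` would be the practical choice.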

4 Analysis of Global Versus Country-Specific Mainstream

In order to identify archetypal countries for mainstreaminess distributions, we investigate these distributions for the 47 countries in the dataset (cf. Subsection 2.4) that contain at least 100 listeners. Figure 2 illustrates four different examples, showing the country-specific listener frequency for the global top 50,000 artists, for the countries United States (US), Finland (FI), Brazil (BR), and Japan (JP). In all four plots, artists are sorted with respect to their global popularity in decreasing order along the x-axis. The black curve indicates the global trend, adjusted to the listener frequency in the respective country. Looking at the United States, we see that—except for some jitter—the distribution of listener frequencies among artists quite closely follows the global distribution (black curve). For Brazil, and even more for Finland, in contrast, a second trend curve becomes visible, indicating that in addition to the global trend (evidenced by a substantial amount of items along the black curve), certain artists within the countries are much more popular than expected from a global perspective. In Finland and Brazil, these country-specific popular artists follow approximately the same pattern as the global trend curve. In contrast, Japan does not reveal a clear secondary trend curve; there are rather many individual outliers that do not seem to follow a particular pattern.

To quantitatively identify and analyze the country-specific outliers that deviate from the global trend, we then run a sliding window of 5 artists over the top 1,000 AF, LF, and AF-ILF values of artists, sorted in the same way as in Figure 2 (i.e., in decreasing order of global popularity), again for the 47 countries with at least 100 listeners. We compute the mean AF, LF, and AF-ILF value within each window and relate it to the corresponding value of the first artist in the window. If the relative difference exceeds a certain threshold, we consider the corresponding artist an outlier. For the experiments presented in the following, we set this threshold to 100%, meaning that an outlier’s value must be at least twice as large as the mean value in its window (positive outlier) or at most 50% of the mean value in its window (negative outlier).
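The sliding-window outlier test can be sketched in Python as follows. This is a minimal sketch under one reading of the procedure: following the captions of Tables 6 to 8, the window is taken to be the 5 artists succeeding the target artist, and ranks are 1-based global ranks. The function name and the example values are hypothetical. Note that the two thresholds are symmetric in ratio terms: a positive outlier is at least twice the window mean, a negative one at most half of it.

```python
def find_outliers(values, window=5, pos_threshold=1.0, neg_fraction=0.5):
    """Flag artists whose value deviates strongly from the mean of the succeeding window.

    values: country-specific AF, LF, or AF-ILF values, sorted by decreasing
    global popularity. Returns (global_rank, kind, relative_difference) tuples.
    """
    outliers = []
    # Only artists followed by a full window can be evaluated.
    for i in range(len(values) - window):
        neighbours = values[i + 1:i + 1 + window]
        mean = sum(neighbours) / window
        if mean == 0:
            continue
        difference = values[i] / mean - 1.0  # 1.0 corresponds to +100%
        if difference >= pos_threshold:
            outliers.append((i + 1, "positive", difference))
        elif values[i] <= neg_fraction * mean:
            outliers.append((i + 1, "negative", difference))
    return outliers


# Hypothetical country-level values with one spike and one dip.
values = [80, 50, 48, 46, 44, 42, 200, 40, 38, 36, 34, 10, 32, 30, 28, 26, 24, 22]
result = find_outliers(values)
```

Here the artist at global rank 7 is flagged as a positive outlier and the artist at rank 12 as a negative one; multiplying the relative difference by 100 gives percentages in the format of Tables 6 to 8.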

In doing so, we identify country-specific outliers that do not correspond to the global trend, meaning that the identified artists are considerably more (positive outliers) or considerably less (negative outliers) popular in the respective country than the global trend would suggest. Table 6 shows examples of positive AF outliers for Finland. Among the most salient outliers, we find the Finnish metal band “Amorphis”, but also metal bands from neighboring countries, such as “Soilwork” from Sweden.

Figure 2 Country-specific listener frequency (LF) for global top 50,000 artists, for the United States (US), Finland (FI), Brazil (BR), and Japan (JP). In all four plots, artists are sorted with respect to their global popularity in decreasing order. The black curve indicates the global trend, adjusted to the LF in the respective country.

Table 6 Results of outlier analysis for artist frequency (AF) values in Finland. The first 20 positive outliers are shown together with their global rank and the difference between their AF values and the mean AF values in a window of size 5, succeeding the artist

Artist	Rank	Difference
In Flames	25	+162.74%
Katatonia	73	+112.78%
Amon Amarth	90	+102.17%
Pendulum	99	+124.77%
Children of Bodom	122	+120.17%
Sonata Arctica	134	+146.35%
Bullet for My Valentine	138	+105.89%
HIM	154	+103.20%
Lamb of God	169	+136.27%
Sabaton	195	+168.01%
Amorphis	203	+229.48%
Infected Mushroom	220	+101.34%
Kamelot	248	+110.62%
Gojira	255	+128.40%
Dimmu Borgir	275	+140.08%
Soilwork	288	+220.73%
Burzum	305	+105.12%
Finntroll	314	+165.20%
Fear Factory	328	+122.30%
Biffy Clyro	365	+140.82%

Table 7 shows the top country-specific positive outliers for Germany. The artist with the highest AF difference to the expected AF values in its neighborhood (window) is “Die Ärzte”, a German punk rock band. Other German bands also rank high (e.g., “Rammstein” and “In Extremo”), as does the Danish band “Volbeat”.

To also illustrate negative outliers, Table 8 shows the first (i.e., globally highest-ranked) positive and negative AF outliers for the United States. Among the negative outliers, we find mostly hard rock and metal bands, which corroborates previous findings that these genres are underrepresented in the United States compared to the global mean (42).

Table 7 Results of outlier analysis for artist frequency (AF) values in Germany. The first 20 positive outliers are shown together with their global rank and the difference between their AF values and the mean AF values in a window of size 5, succeeding the artist

Artist	Rank	Difference
Rammstein	13	+115.87%
Rise Against	59	+128.29%
Mumford & Sons	85	+100.64%
Amon Amarth	90	+122.67%
Enter Shikari	179	+128.08%
Grateful Dead	261	+266.76%
Volbeat	287	+138.91%
3 Doors Down	298	+112.16%
Finntroll	314	+105.71%
Machine Head	325	+115.04%
The Gaslight Anthem	352	+102.57%
Biffy Clyro	365	+142.99%
Flogging Molly	395	+102.68%
Die Ärzte	437	+310.54%
Simple Plan	462	+158.99%
Heaven Shall Burn	505	+173.12%
La Dispute	541	+132.26%
Emilie Autumn	543	+116.91%
In Extremo	563	+194.80%
Combichrist	565	+121.34%

Table 8 Results of outlier analysis for artist frequency (AF) values in the United States. The first 20 positive and negative outliers are shown together with their global rank and the difference between their AF values and the mean AF values in a window of size 5, succeeding the artist

Artist	Rank	Difference
Radiohead	1	+101.42%
Rammstein	13	-60.13%
Nine Inch Nails	20	+101.68%
Nightwish	23	-54.26%
In Flames	25	-54.56%
AC/DC	36	-53.89%
Korn	39	-53.46%
Marilyn Manson	52	-56.09%
The White Stripes	70	+112.77%
Katatonia	73	-60.63%
Within Temptation	74	-63.20%
30 Seconds to Mars	81	-56.39%
Guns N’ Roses	82	-63.45%
Amon Amarth	90	-55.56%
Anathema	97	-54.23%
Avenged Sevenfold	101	-64.63%
Modest Mouse	105	+142.16%
Bring Me the Horizon	106	-54.01%
Limp Bizkit	116	-73.35%
Blur	129	-54.05%

5 Music Recommendation Tailored to User Mainstreaminess

To evaluate the proposed mainstreaminess measures (cf. Section 3) with respect to their ability to improve performance in music recommendation, we conduct rating prediction experiments, which is a common approach to recommender systems evaluation. For this evaluation, we use again the LFM-1b dataset of user-generated listening events from Last.fm (39), as discussed in Subsection 2.4.

5.1 Experimental Setup

While we are aware that a truly user-centric evaluation would be beneficial for this kind of research, conducting a user study on tens of thousands of users (or even only a representative subset of the users) is beyond the scope of this paper. We therefore stick to the common approach of quantifying the performance of a recommender system by conducting a rating prediction task. To this end, we normalize and scale the playcount values in the UAM to the range [0, 1000] for each user individually, assuming that higher numbers of playcounts indicate higher user preference for an artist.
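The per-user scaling of playcounts to the range [0, 1000] could look as follows. The exact normalization scheme is not specified here, so this sketch assumes division by each user’s maximum artist playcount, which preserves the ratios between a user’s playcounts and maps the user’s favorite artist to 1000; the function name and example data are hypothetical.

```python
def scale_user_playcounts(playcounts, upper=1000.0):
    """Scale one user's artist playcounts to [0, upper] by dividing by that user's maximum (assumed scheme)."""
    peak = max(playcounts.values(), default=0)
    if peak == 0:
        return {artist: 0.0 for artist in playcounts}
    return {artist: upper * count / peak for artist, count in playcounts.items()}


# Hypothetical playcounts of a single user.
ratings = scale_user_playcounts({"Amorphis": 200, "Nightwish": 50, "Muse": 10})
```

Because each user is scaled individually, heavy and light listeners end up on the same rating scale, which is what makes RMSE and MAE values comparable across users.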

We apply the common singular value decomposition (SVD) method according to (36) to factorize the UAM and, in turn, perform rating prediction. In 5-fold cross-validation experiments, we use root mean square error (RMSE) and mean absolute error (MAE) as performance measures.

To obtain a baseline, we first run the rating prediction experiment on the global group of 65,132 users and report the resulting error measures in the first row of Table 9. To study the influence of both the different mainstreaminess definitions and the mainstreaminess levels on recommendation performance, we then create user subsets for each combination of mainstreaminess measure and country with at least 1,000 users.¹ To this end, we split the users in each country into three (almost) equally sized subsets according to their mainstreaminess value: low corresponds to users in the lower 3-quantile (tertile) w.r.t. the respective mainstreaminess definition, and mid and high to the middle and upper tertile, respectively; all refers to the group of all users in the considered country. Conducting the same experiment on all users in each country (user set all) further allows for a comparison of a pure mainstreaminess filtering approach versus a combination of mainstreaminess filtering and demographic (country) filtering.
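The tertile split can be sketched in Python as follows. How the cut points are placed and how ties are broken is not specified here; this sketch assumes integer cut points at n // 3 and 2n // 3 and keeps tied users in their original order (Python’s sort is stable). The user IDs and scores are hypothetical.

```python
def split_into_tertiles(mainstreaminess):
    """Split users into low/mid/high sets of (almost) equal size by mainstreaminess value."""
    ranked = sorted(mainstreaminess, key=mainstreaminess.get)  # ascending, stable for ties
    n = len(ranked)
    first_cut, second_cut = n // 3, 2 * n // 3
    return {
        "low": ranked[:first_cut],
        "mid": ranked[first_cut:second_cut],
        "high": ranked[second_cut:],
        "all": list(mainstreaminess),
    }


# Hypothetical mainstreaminess scores of six users in one country.
scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.3, "e": 0.7, "f": 0.2}
user_sets = split_into_tertiles(scores)
```

The split is computed separately for every combination of mainstreaminess measure and country, so a user may fall into different tertiles under different measures.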

Table 9 Weighted root mean square error (RMSE) and weighted mean absolute error (MAE) for various mainstreaminess definitions and levels, i.e. user sets. Rating values are scaled to [0, 1000]. Experiments are conducted on the country level (except for first row using the complete UAM with random item selection in each fold, irrespective of country) and error measures are averaged (arithmetic mean) over all countries with more than 1,000 users and weighted by number of users in the respective country. In the individual experiments, all refers to the group of all users in each considered country, low only to the users in the lower 3-quantile (tertile) w.r.t. the respective mainstreaminess definition, mid and high defined analogously

Mainstreaminess	User Set	w.RMSE	w.MAE
Baseline (global UAM)		29.105	25.202
F_g:AF,u:AF	all	26.377	24.050
	high	3.714	1.308
	mid	12.574	9.887
	low	14.186	11.625
F_{g:AF,u:AF⋅ILF}	all	21.137	18.617
	high	3.681	1.299
	mid	11.035	8.191
	low	14.426	11.868
F_{g:AF⋅ILF,u:AF⋅ILF}	all	19.140	16.769
	high	11.777	9.121
	mid	13.396	10.833
	low	8.708	5.806
F_c:AF,u:AF	all	14.465	11.958
	high	3.723	1.309
	mid	8.681	6.112
	low	12.706	9.952
F_{c:AF⋅ILF,u:AF⋅ILF}	all	17.615	15.301
	high	9.237	6.648
	mid	3.686	1.305
	low	10.122	7.610
D_g:AF,u:AF	all	24.026	21.705
	high	10.561	8.024
	mid	9.854	7.299
	low	5.365	2.909
D_c:AF,u:AF	all	28.021	25.746
	high	5.365	2.912
	mid	13.510	10.840
	low	25.923	22.621
D_{c:AF⋅ILF,u:AF⋅ILF}	all	14.628	11.624
	high	3.656	1.281
	mid	7.035	4.515
	low	8.589	5.670
C_g:AF,u:AF	all	15.906	13.525
	high	3.680	1.291
	mid	7.443	4.472
	low	19.183	16.373
C_c:AF,u:AF	all	14.349	12.032
	high	3.687	1.290
	mid	4.270	1.833
	low	3.692	1.308
C_{c:AF⋅ILF,u:AF⋅ILF}	all	30.827	28.535
	high	7.680	5.187
	mid	4.825	2.340
	low	10.785	8.1084

5.2 Results and Discussion

Table 9 shows the error measures (RMSE and MAE) for the different definitions and levels of mainstreaminess, averaged over all considered countries (cf. Subsection 2.4) and weighted by the number of users in the respective country. In the following discussion, we concentrate on RMSE since it is more common and penalizes large deviations between predicted and true ratings disproportionately more than small ones.
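Written out, with $T_c$ denoting the held-out (user, artist) pairs of country $c$, $r_{ua}$ and $\hat{r}_{ua}$ the true and predicted ratings, and $n_c$ the number of users in country $c$ (symbols introduced here for illustration), the per-country measures and the weighting read as below. The last two lines use made-up residual vectors $e$ to show why RMSE penalizes one large error more than MAE does.

```latex
\begin{aligned}
\mathrm{RMSE}_c &= \sqrt{\frac{1}{|T_c|}\sum_{(u,a)\in T_c}\left(r_{ua}-\hat{r}_{ua}\right)^2},
\qquad
\mathrm{MAE}_c = \frac{1}{|T_c|}\sum_{(u,a)\in T_c}\left|r_{ua}-\hat{r}_{ua}\right| \\
\text{w.RMSE} &= \frac{\sum_{c} n_c\,\mathrm{RMSE}_c}{\sum_{c} n_c}
\qquad \text{(w.MAE analogously)} \\
e = (10, 10, 10, 10) &:\quad \mathrm{MAE} = 10, \quad \mathrm{RMSE} = \sqrt{400/4} = 10 \\
e = (0, 0, 0, 40) &:\quad \mathrm{MAE} = 10, \quad \mathrm{RMSE} = \sqrt{1600/4} = 20
\end{aligned}
```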

As a general finding, our results show that tailoring the recommendations to a user's mainstreaminess level (low, mid, high) leads to substantial error reductions, irrespective of the applied mainstreaminess measure: every low, mid, and high user set yields a lower weighted RMSE than the global baseline (29.105). More specifically, C_c:AF,u:AF stands out among the measures in four regards: First, it leads to the lowest overall RMSE of 14.349 (all). Second, its errors are also low on each of the three user sets: it achieves the lowest RMSE on the low set (3.692), and on the high set it trails the best measure, D_{c:AF⋅ILF,u:AF⋅ILF} (3.656), by only 0.031; only on the mid set does another measure, F_{c:AF⋅ILF,u:AF⋅ILF} (3.686), perform clearly better than C_c:AF,u:AF (4.270). Third, C_c:AF,u:AF performs in a balanced way across the three user sets (weighted RMSE of 3.692, 4.270, and 3.687 for low, mid, and high, respectively), whereas each of the other mainstreaminess measures performs far worse on at least one set than on the other(s), e.g., C_g:AF,u:AF with 19.183, 7.443, and 3.680, respectively, for low, mid, and high. Fourth, C_c:AF,u:AF also performs well on the low mainstreaminess user set (low), a user segment that is typically difficult to satisfy.
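The balance argument can be checked directly against Table 9. The sketch below copies the weighted RMSE values of the high, mid, and low user sets and ranks the measures by the gap between their worst and best set; this gap is a simple balance indicator chosen here for illustration, not a measure used in the paper, and the names `TABLE9_RMSE`, `spread`, and `best_per_set` are hypothetical. C_c:AF,u:AF has by far the smallest gap (4.270 − 3.687 = 0.583), and the per-set minima show which measure is best on each set.

```python
# Weighted RMSE per user set, copied from Table 9: (high, mid, low).
TABLE9_RMSE = {
    "F_g:AF,u:AF": (3.714, 12.574, 14.186),
    "F_{g:AF,u:AF⋅ILF}": (3.681, 11.035, 14.426),
    "F_{g:AF⋅ILF,u:AF⋅ILF}": (11.777, 13.396, 8.708),
    "F_c:AF,u:AF": (3.723, 8.681, 12.706),
    "F_{c:AF⋅ILF,u:AF⋅ILF}": (9.237, 3.686, 10.122),
    "D_g:AF,u:AF": (10.561, 9.854, 5.365),
    "D_c:AF,u:AF": (5.365, 13.510, 25.923),
    "D_{c:AF⋅ILF,u:AF⋅ILF}": (3.656, 7.035, 8.589),
    "C_g:AF,u:AF": (3.680, 7.443, 19.183),
    "C_c:AF,u:AF": (3.687, 4.270, 3.692),
    "C_{c:AF⋅ILF,u:AF⋅ILF}": (7.680, 4.825, 10.785),
}
SETS = ("high", "mid", "low")

def spread(errors):
    """Gap between the worst and the best user set: a simple balance indicator."""
    return max(errors) - min(errors)

def best_per_set(table):
    """Measure with the lowest weighted RMSE for each user set."""
    return {s: min(table, key=lambda m: table[m][k]) for k, s in enumerate(SETS)}

# Most balanced measures first, with their worst-case user-set RMSE.
ranked = sorted(TABLE9_RMSE, key=lambda m: spread(TABLE9_RMSE[m]))
for measure in ranked[:3]:
    errors = TABLE9_RMSE[measure]
    print(f"{measure:24s} spread={spread(errors):6.3f} worst={max(errors):6.3f}")
print(best_per_set(TABLE9_RMSE))
```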

The fraction-based approaches F_g:AF,u:AF, F_c:AF,u:AF, and F_{g:AF,u:AF⋅ILF} have in common that they perform far better in the high mainstreaminess segment than in the mid and the low one. This could indicate that these measures still privilege globally popular items too much and, thus, produce more errors in the mid and low segments.

Interestingly, the approaches based on symmetrized Kullback-Leibler divergence (D) perform worse when tailored towards a user's country (D_c:AF,u:AF) than when applied on a global level (D_g:AF,u:AF), both overall (28.021 vs. 24.026) and on the mid and low user sets. Combining the country-specific tailoring with AF-ILF weighting (D_{c:AF⋅ILF,u:AF⋅ILF}) yields better results (14.628 overall) than either of these two variants.

While our results do not suggest a general superiority of mainstreaminess measures that incorporate AF-ILF, first results of our deeper analysis on the country level indicate that these measures tend to perform particularly well for countries far from the global mainstream, such as Finland (RMSE of D_{c:AF⋅ILF,u:AF⋅ILF} for all=5.985, high=1.346, mid=1.365, low=1.418), but worse for high mainstream countries, such as the USA (RMSE of D_{c:AF⋅ILF,u:AF⋅ILF} for all=57.489, high=4.071, mid=4.077, low=55.968). Because the error measures in Table 9 are weighted by the number of users per country, a small, low mainstream country such as Finland contributes far less to them than the large, high mainstream United States. As part of our ongoing large-scale analysis of country-specific aspects, we will next investigate which factors drive the performance differences between countries for a given mainstreaminess measure.
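The effect of the user-count weighting can be illustrated with the two country-level RMSE values reported above for D_{c:AF⋅ILF,u:AF⋅ILF} on user set all. The section does not report the countries' user counts, so the counts below are hypothetical (Finland assumed to have one tenth as many users as the USA), and the function name `weighted_mean` is likewise an illustrative choice.

```python
def weighted_mean(values, weights):
    """Arithmetic mean of per-country values, weighted by per-country user counts."""
    total = sum(weights[c] for c in values)
    return sum(values[c] * weights[c] for c in values) / total

# RMSE of D_{c:AF⋅ILF,u:AF⋅ILF} on user set "all", as reported in the text.
rmse_all = {"FI": 5.985, "US": 57.489}
# HYPOTHETICAL user counts for illustration only; not taken from LFM-1b.
n_users = {"FI": 1_000, "US": 10_000}

print(round(weighted_mean(rmse_all, n_users), 3))        # weighted: about 52.81
print(round(sum(rmse_all.values()) / len(rmse_all), 3))  # unweighted: about 31.74
```

Under these assumed counts, the weighted average lies close to the US value, which illustrates why the favorable Finnish results barely show in Table 9.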

A direct comparison of the RMSE achieved by our approach with the RMSE reported in (48), the work closest to ours, is unfortunately impossible since Vigliensoni and Fujinaga quantized playcounts into a 5-point Likert rating scale [1, 5]. Still, as a rough estimation, our results suggest that our best approach, C_c:AF,u:AF, sets a new benchmark for combining demographic (country) filtering and mainstreaminess filtering, with an RMSE of 14.3 on a [0, 1000] scale. The best RMSE reported in (48) when considering mainstreamness and country information is approximately 0.9 on the much narrower [1, 5] scale (cf. approach u.c.m. in Figure 2 of (48)).
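The rough comparison can be made explicit by dividing each RMSE by the width of its rating scale. This range normalization is an assumption made here for illustration; it ignores that the two scales were derived differently (scaled playcounts versus quantized Likert ratings), so it indicates orders of magnitude only:

```latex
\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{r_{\max} - r_{\min}},
\qquad
\frac{14.349}{1000 - 0} \approx 0.014,
\qquad
\frac{0.9}{5 - 1} = 0.225
```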

6 Conclusions and Outlook

The music mainstreaminess of a listener reflects how strongly a person's listening preferences correspond to those of the larger population. We argue that music mainstream can be defined from different perspectives. In this paper, we took into account that what is considered mainstream differs between regions, due to cultural characteristics and different market structures across countries.

The main contributions of this paper are threefold: First, we proposed 11 novel measures to quantify the music mainstreaminess of a user, a country, and an entire population. These are based on fraction-based (F), divergence-based (D), and rank correlation-based (C) functions.
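For readers who want to experiment, the sketch below shows generic versions of the divergence (D) and rank correlation (C) building blocks, applied to two artist-frequency vectors over the same artists (e.g., a user profile and a country profile). It is an illustration under assumptions, not the paper's definitions: the smoothing constant, the unweighted sum KL(p‖q) + KL(q‖p), and the τ-a variant without tie correction are our choices, the function names are hypothetical, and the step that turns these values into mainstreaminess scores is omitted. Lower divergence and higher τ indicate closer agreement with the reference profile.

```python
import numpy as np

def symmetrized_kl(p_counts, q_counts, eps=1e-9):
    """KL(p||q) + KL(q||p) between two artist-frequency profiles.

    Counts are smoothed by `eps` (an assumption, to avoid log(0)) and
    normalized to probability distributions over the same artist index.
    """
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical playcounts of one user and one country for the same four artists.
user = [120, 40, 0, 5]
country = [9000, 5000, 3000, 100]
print(symmetrized_kl(user, country), kendall_tau_a(user, country))
```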

Second, we illustrated country-specific peculiarities of music preferences and country-specific mainstream using the LFM-1b dataset (39). We identified three archetypes of countries: (i) countries whose mainstream corresponds to the global trend (e.g., the United States), (ii) countries with a distinct country-specific mainstream in addition to the global mainstream (e.g., Finland), and (iii) countries that roughly follow the global mainstream trend without a clear secondary trend curve, but show various country-specific outliers over the whole global artist popularity range (e.g., Brazil and Japan).

Third, we studied the performance of the proposed mainstreaminess measures for personalized music recommendation. Considering that music mainstream may be defined from a global but also a country-specific perspective, we particularly studied how combining a user's mainstreaminess with demographic (country) filtering influences the quality of music recommendations. Based on the LFM-1b dataset (39), we investigated the performance of the proposed measures in a rating prediction task, employing probabilistic matrix factorization. To quantify performance, we computed country-averaged, weighted RMSE and MAE figures for all mainstreaminess definitions and various mainstreaminess levels, and compared these with a global baseline. Overall, our results suggest that incorporating any kind of mainstreaminess information improves over the baseline. Our best approach combines demographic filtering (based on a user profile's country) and mainstreaminess filtering based on Kendall's τ (variant C_c:AF,u:AF) and outperforms applying these filtering approaches separately. While our results do not hint at a general superiority of mainstreaminess measures that incorporate AF-ILF, first country-level results indicate that such measures perform much better than others for countries whose preference profiles are far from the global taste (e.g., Finland).

As part of future work, we will take an in-depth look at the differences between countries, i.e., analyze which mainstreaminess functions perform particularly well or poorly in which countries. Additionally, we plan to analyze how well our results generalize to other datasets providing demographic user information, e.g., the Million Musical Tweets Dataset (18), a playlist dataset crawled from Spotify users (31), or, on a larger scale, Spotify's official Million Playlist Dataset,² released as part of the ACM Recommender Systems Challenge 2018 on automatic playlist continuation. We further plan user studies that use qualitative methods to investigate whether incorporating mainstreaminess information improves users' perceived satisfaction with recommendations.

Acknowledgements

This research is supported by the Austrian Science Fund (FWF): V579.

References

[1] Ahn, H. J. (2006). Utilizing popularity characteristics for product recommendation. International Journal of Electronic Commerce, 11(2), 59–80.

[2] Anderson, C. (2006). The long tail: Why the future of business is selling more for less. Hyperion.

[3] Baek, Y. M. (2015). Relationship between cultural distance and cross-cultural music video consumption on YouTube. Social Science Computer Review, 33(6), 730–748.

[4] Baker, S., Bennett, A., and Taylor, J. (Eds.). (2013). Redefining mainstream popular music. Routledge.

[5] Basu, C., Hirsh, H., and Cohen, W. (1998). Recommendation as classification: Using social and content-based information in recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, 714–720. American Association for Artificial Intelligence.

[6] Bauer, C., Kholodylo, M., and Strauss, C. (2017). Music recommender systems: Challenges and opportunities for non-superstar artists. In A. Pucihar, M. Kljajić Borstnar, C. Kittl, P. Ravesteijn, R. Clarke, and R. Bons (Eds.), Proceedings of the 30th Bled eConference, 21–32.

[7] Bauer, C., and Schedl, M. (2018). On the importance of considering country-specific aspects on the online-market: An example of music recommendation considering country-specific mainstream. In Proceedings of the 51st Hawaii International Conference on System Sciences (HICSS), 3647–3656.

[8] Bawden, D., and Robinson, L. (2009). The dark side of information: overload, anxiety and other paradoxes and pathologies. Journal of Information Science, 35(2), 180–191.

[9] Brynjolfsson, E., Hu, Y., and Simester, D. (2011). Goodbye pareto principle, hello long tail: The effect of search costs on the concentration of product sales. Management Science, 57(8), 1373–1386.

[10] Budzinski, O., and Pannicke, J. (2017). Do preferences for pop music converge across countries? Empirical evidence from the Eurovision Song Contest. Creative Industries Journal, 1–20.

[11] Celma, O. (2010). Music recommendation. In Music recommendation and discovery, 43–85. Springer, Berlin, Heidelberg.

[12] Celma, Ò., and Cano, P. (2008). From hits to niches: or how popular artists can bias music recommendation and discovery. In Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, 5.

[13] Cheng, Z., and Shen, J. (2014). Just-for-me: An adaptive personalization system for location-aware social music recommendation. In Proceedings of the International Conference on Multimedia Retrieval, 185.

[14] Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., and MacKinnon, I. (2008). Novelty and diversity in information retrieval evaluation. In Proceedings of SIGIR, 659–666.

[15] Cremonesi, P., Garzotto, F., Pagano, R., and Quadrana, M. (2014). Recommending without short head. In Proceedings of the 23rd International Conference on World Wide Web, 245–246.

[16] Farrahi, K., Schedl, M., Vall, A., Hauger, D., and Tkalčič, M. (2014). Impact of listening behavior on music recommendation. In Proceedings of the 15th International Society for Music Information Retrieval Conference, 483–488.

[17] Ferwerda, B. (2016). Improving the User Experience of Music Recommender Systems Through Personality and Cultural Information. PhD. Johannes Kepler University Linz, Linz, Austria.

[18] Hauger, D., Schedl, M., Košir, A., and Tkalčič, M. (2013). The million musical tweets dataset: What can we learn from microblogs. In Proceedings of ISMIR, 189–194.

[19] Hracs, B. J., Seman, M., and Virani, T. E. (2016). The production and consumption of music in the digital age. Abingdon: Routledge, 58.

[20] Hu, X., Lee, J. H., Choi, K., and Downie, J. S. (2014). A cross-cultural study of mood in K-pop songs. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), 217–238.

[21] Jones, M. L. (2007). Hofstede-culturally questionable. In Proceedings of the Oxford Business & Economics Conference (OBEC).

[22] Kitayama, S., and Park, H. (2007). Cultural shaping of self, emotion, and well-being: How does it work? Social and Personality Psychology Compass, 1(1), 202–222.

[24] Kumar, R., Verma, B. K., and Rastogi, S. S. (2014). Social popularity based SVD++ recommender system. International Journal of Computer Applications, 87(14).

[25] Laplante, A. (2014). Improving music recommender systems: What can we learn from research on music tags? In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), 451–456.

[26] Lee, J. H., and Hu, X. (2014). Cross-cultural similarities and differences in music mood perception. In iConference 2014 Proceedings.

[27] Linden, G., Smith, B., and York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76–80.

[28] McFee, B., Barrington, L., and Lanckriet, G. (2012). Learning content similarity for music recommendation. IEEE Transactions on Audio, Speech, and Language Processing, 20(8), 2207–2218.

[29] McSweeney, B. (2002). Hofstede's model of national cultural differences and their consequences: A triumph of faith, a failure of analysis. Human Relations, 55(1), 89–118.

[30] Montaner, M., López, B., and De La Rosa, J. L. (2003). A taxonomy of recommender agents on the internet. Artificial Intelligence Review, 19(4), 285–330.

[31] Morrison, S. J., and Demorest, S. M. (2009). Cultural constraints on music perception and cognition. Progress in Brain Research, 178, 67–77.

[32] Pichl, M., Zangerle, E., and Specht, G. (2015). Towards a context-aware music recommendation approach: What is hidden in the playlist name? In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), 1360–1365.

[34] Pohle, T., Knees, P., Schedl, M., and Widmer, G. (2006). Automatically adapting the structure of audio similarity spaces. In Proc. 1st Workshop on Learning the Semantics of Audio Signals (LSAS), 66–75.

[35] Power, D., and Hallencreutz, D. (2007). Competitiveness, local production systems and global commodity chains in the music industry: Entering the US market. Regional Studies, 41(3), 377–389.

[36] Ricci, F., Rokach, L., and Shapira, B. (Eds.). (2015). Recommender Systems Handbook. New York: Springer Science+Business Media, 1003 pp. ISBN 978-1-4899-7636-9.

[37] Rutten, P. (1991). Local popular music on the national and international markets. Cultural Studies, 5(3), 294–305.

[38] Salakhutdinov, R., and Mnih, A. (2007). Probabilistic Matrix Factorization. In Proceedings of the 20th International Conference on Neural Information Processing Systems, 1257–1264.

[39] Salton, G., Wong, A., and Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.

[40] Schedl, M. (2013). Ameliorating music recommendation: Integrating music content, music context, and user context for improved music retrieval and recommendation. In Proceedings of the International Conference on Advances in Mobile Computing & Multimedia.

[41] Schedl, M. (2016). The LFM-1b dataset for music retrieval and recommendation. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 103–110.

[42] Schedl, M. (2017). Investigating country-specific music preferences and music recommendation algorithms with the LFM-1b dataset. International Journal of Multimedia Information Retrieval, 6(1), 71–84.

[43] Schedl, M., and Bauer, C. (2017). Distance- and Rank-based Music Mainstreaminess Measurement. In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, 364–367.

[44] Schedl, M., and Ferwerda, B. (2017). Large-scale Analysis of Group-specific Music Genre Taste From Collaborative Tags. In The 19th IEEE International Symposium on Multimedia (ISM2017), Taichung.

[45] Schedl, M., Gómez, E., and Urbano, J. (2014). Music information retrieval: Recent developments and applications. Foundations and Trends in Information Retrieval, 8(2–3), 127–261.

[46] Schedl, M., and Hauger, D. (2015). Tailoring music recommendations to users by considering diversity, mainstreaminess, and novelty. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 947–950.

[47] Schedl, M., Knees, P., McFee, B., Bogdanov, D., and Kaminskas, M. (2015). Music recommender systems. In Recommender Systems Handbook, 453–492.

[48] Singhi, A., and Brown, D. G. (2014). On Cultural, Textual and Experiential Aspects of Music Mood. In ISMIR, 3–8.

[49] Stevens, C. J. (2012). Music perception and cognition: A review of recent cross-cultural research. Topics in Cognitive Science, 4(4), 653–667.

[50] Vigliensoni, G., and Fujinaga, I. (2016). Automatic Music Recommendation Systems: Do Demographic, Profiling, and Contextual Features Improve Their Performance? In ISMIR, 94–100.

[52] Xiao, L., Lu, L., Seide, F., and Zhou, J. (2009). Learning a music similarity measure on automatic annotations with application to playlist generation. In Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on, 1885–1888.

[53] Yan, Y., Liu, T., and Wang, Z. (2015). A Music Recommendation Algorithm Based on Hybrid Collaborative Filtering Technique. In Chinese National Conference on Social Media Processing, 233–240.

[54] Yang, J. (2016). Effects of popularity-based news recommendations (“most-viewed”) on users’ exposure to online news. Media Psychology, 19(2), 243–271.

[55] Zhang, Y. C., Séaghdha, D. Ó., Quercia, D., and Jambor, T. (2012). Auralist: introducing serendipity into music recommendation. In Proceedings of the fifth ACM international conference on Web search and data mining, 13–22.

Biographies


Markus Schedl is an Associate Professor at the Department of Computational Perception, Johannes Kepler University Linz, Austria.

He graduated in Computer Science from the Vienna University of Technology and earned his Ph.D. in Computer Science from the Johannes Kepler University Linz. Markus further studied International Business Administration at the Vienna University of Economics and Business Administration as well as at the Handelshögskolan of the University of Gothenburg, which led to a Master’s degree. His main research interests include web and social media mining, information retrieval, multimedia, and music information research.

Markus (co-)authored more than 150 refereed conference papers and journal articles (among others, published in ACM Multimedia, ICMR, SIGIR, ECIR, IEEE Visualization; Journal of Machine Learning Research, ACM Transactions on Information Systems, Springer Information Retrieval, IEEE Multimedia). Furthermore, he is an associate editor of the Springer International Journal of Multimedia Information Retrieval, serves on various program committees, and has reviewed submissions to several top-tier conferences and journals (among others, ACM Multimedia, ECIR, IJCAI, ICASSP, IEEE Visualization; IEEE Intelligent Systems, IEEE Transactions on Multimedia, Elsevier Data & Knowledge Engineering, Elsevier Pattern Recognition Letters, ACM Transactions on Intelligent Systems and Technology, Elsevier Information Sciences).


Christine Bauer is a Senior Postdoc Researcher at the Department of Computational Perception, Johannes Kepler University Linz, Austria, and a Lecturer at the University of Vienna, Austria. Her work spans the fields of Information Systems, Informatics, and Business Administration.

She holds a Doctoral degree in Social and Economic Sciences and a Master's degree in International Business Administration, both from the University of Vienna, Austria. Furthermore, she holds a Master's degree in Business Informatics from the Vienna University of Technology (TU Wien), Austria. She pursued further studies at the University of Wales Swansea, United Kingdom, the Konservatorium der Stadt Wien, Austria, and the Vienna University of Economics and Business (WU Wien), Austria.

Christine has (co-)authored more than 65 papers in refereed journals and conference proceedings, four of which received best paper awards, with four additional best paper award nominations. Her articles have been published in, amongst others, IEEE Transactions on Industrial Informatics, Information and Software Technology, and the Journal of Systems and Software.

Furthermore, she serves on various program committees and has reviewed submissions to several top-tier conferences and journals, amongst others, CHI, ICIS, RecSys, ACM Transactions on Intelligent Systems and Technology, European Journal of Information Systems, Computers in Human Behavior, IEEE Transactions on Human-Machine Systems, Electronic Markets, and Business & Information Systems Engineering.

¹The restriction to countries with at least 1,000 users was made to allow for a meaningful analysis, as performed in (40).

²https://recsys-challenge.spotify.com/details