Transcript: BERT-Based Neural Collaborative Filtering and Fixed-Length Contiguous Tokens Explanation

Abstract

We propose a novel, accurate, and explainable recommender model (BENEFICT) that addresses two drawbacks that most review-based recommender systems face. The first is their use of traditional word embeddings, which can hurt prediction performance because such embeddings cannot model the dynamic character of word semantics. The second is their black-box nature, which obscures the explanation behind every prediction. Our model uniquely integrates three key elements: BERT, a multilayer perceptron, and the maximum subarray problem, to derive contextualized review features, model user-item interactions, and generate explanations, respectively. Our experiments show that BENEFICT consistently outperforms other state-of-the-art models by an average improvement of nearly 7%. Based on human judges' assessments, the explanations BENEFICT produces capture the essence of a customer's preference and can help future customers make purchasing decisions. To the best of our knowledge, our model is one of the first recommender models to utilize BERT for neural collaborative filtering.

1 Introduction

In recommender systems research, collaborative filtering (CF) is the dominant state-of-the-art recommendation approach, and it primarily focuses on learning accurate representations of users (user preferences) and items (item characteristics) (Chen et al., 2018; Tay et al., 2018). The earliest recommender models learned these representations from the user-given numeric ratings that each item received (Mnih and Salakhutdinov, 2008; Koren et al., 2009). However, ratings, which are values on a single discrete scale, oversimplify user preferences and item characteristics (Musto et al., 2017). The large number of users and items on a typical online platform also results in a highly sparse rating matrix, making it hard to learn accurate representations (Zheng et al., 2017).

To alleviate these issues, review texts have instead been utilized to model such representations for subsequent recommendation and rating prediction, and this approach has attracted growing research attention (Catherine and Cohen, 2017; Zheng et al., 2017). The main advantage of reviews as a source of features is that they capture the multi-faceted substance of user opinions. Because users can explain the reasons underlying their ratings, reviews contain a large amount of rich, valuable latent information that cannot be obtained from ratings alone (Chen et al., 2018; Wang et al., 2019). Recently, models that incorporate user reviews have yielded state-of-the-art performance (Zheng et al., 2017; Chen et al., 2018). These approaches learn user and item representations by using traditional word embeddings (e.g., word2vec, GloVe) to map each word in a review to its corresponding vector. The review is transformed into an embedded matrix before being fed to a convolutional neural network (CNN) (Chen et al., 2018). CNNs have been shown to model reviews effectively and have demonstrated outstanding results in numerous natural language processing tasks (Wang et al., 2018a).

Nevertheless, most review-based recommender models suffer from two drawbacks. The first is the use of traditional or mainstream word embeddings to learn review features. Their static nature is a hindrance: every occurrence of a word is assigned the same embedding regardless of context, so such embeddings cannot capture the dynamic nature of each word's semantics. For review-based recommenders, this can be an issue in modeling users and items, which could, in turn, affect recommendation performance (Pilehvar and Camacho-Collados, 2019). Moreover, once a CNN is fed the matrix of word embeddings, the word frequency information of contextual features, said to be crucial for modeling reviews, is lost (Wang et al., 2018a).

The second drawback is the inherent black-box nature of deep learning-based models, which obscures the explanation behind every prediction (Ribeiro et al., 2016; Wang et al., 2018b). The complex architecture of hidden layers hides the models' internal decision-making processes (Peake and Wang, 2018). Providing explanations could help persuade users to make decisions and to develop trust in a recommender system (Zhang et al., 2014; Ribeiro et al., 2016; Costa et al., 2018; Peake and Wang, 2018). However, this leads to a dilemma: a trade-off between accuracy and explainability. Usually, the most accurate models are inherently complicated, non-transparent, and unexplainable (Zhang and Chen, 2018), while straightforward, explainable methods sacrifice accuracy. Formulating models that are both explainable and accurate is a challenging yet critical research agenda for the machine learning community, one needed to ensure that we derive benefits from machine learning fairly and responsibly (Peake and Wang, 2018).

In this paper, we propose a unique model: BERT-Based Neural Collaborative Filtering and Fixed-Length Contiguous Tokens Explanation (BENEFICT). Our model learns user and item representations simultaneously using two parallel networks. To address the first drawback, we incorporate BERT as a key component in each parallel network. BERT allows us to extract more meaningful, contextualized features that adapt to arbitrary contexts; such features cannot be derived from mainstream word embeddings (Pilehvar and Camacho-Collados, 2019; Akbik et al., 2019). BERT also retains the word frequency information, which makes a CNN an unnecessary component of our model. Once user and item representations are learned, they are concatenated in a shared hidden space before finally being fed to an optimal stack of multilayer perceptron (MLP) layers that serves as BENEFICT's interaction function.
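As a concrete illustration, the interaction function described above (concatenate the two representations, then pass them through an MLP stack that ends in a single predicted rating) can be sketched in plain Python. The layer sizes, weights, and dimensions below are made-up toy values, not BENEFICT's actual configuration:

```python
import random

def relu(x):
    return max(0.0, x)

def dense(vec, weights, biases):
    """One fully connected layer: out_j = sum_i W[j][i] * vec[i] + b[j]."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]

def mlp_interaction(user_vec, item_vec, layers):
    """Concatenate user and item representations, then run the result
    through a stack of MLP layers; the last layer outputs one rating."""
    h = user_vec + item_vec              # list concatenation: shared hidden space
    for i, (W, b) in enumerate(layers):
        h = dense(h, W, b)
        if i < len(layers) - 1:          # ReLU on hidden layers only
            h = [relu(x) for x in h]
    return h[0]                          # scalar predicted rating

random.seed(0)
d = 4  # toy representation size (hypothetical)
user_vec = [random.uniform(-1, 1) for _ in range(d)]
item_vec = [random.uniform(-1, 1) for _ in range(d)]
# Hypothetical tower: input 2d=8 -> hidden 4 -> output 1
layers = [
    ([[random.uniform(-1, 1) for _ in range(2 * d)] for _ in range(4)],
     [0.0] * 4),
    ([[random.uniform(-1, 1) for _ in range(4)]], [0.0]),
]
print(mlp_interaction(user_vec, item_vec, layers))
```

In practice such a stack would be a trained neural network layer (e.g., in a deep learning framework); the sketch only shows the data flow from concatenated embeddings to a scalar rating.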

To address the second drawback, we introduce a novel component in our model that integrates BERT's self-attention with an implementation of the fixed-length maximum subarray problem (MSP), a classic computer science problem. BERT applies self-attention in each encoder layer, which produces self-attention weights for each token; these are passed to the successive encoder layers through feed-forward networks. We argue that these self-attention weights can serve as the basis for explaining rating predictions. Based on this premise, MSP selects the segment, or subarray, of consecutive tokens that has the maximum possible sum of self-attention weights.
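For a fixed segment length, the maximum subarray can be found with a single sliding-window pass over the per-token attention weights. The following sketch illustrates the idea; the weight values and window length are made-up examples, not values from the paper:

```python
def max_fixed_length_subarray(weights, k):
    """Return (start, end) of the length-k contiguous span with the
    maximum sum of weights, using an O(n) sliding window."""
    if k > len(weights):
        raise ValueError("window longer than sequence")
    window = sum(weights[:k])          # sum of the first window
    best_sum, best_start = window, 0
    for i in range(1, len(weights) - k + 1):
        window += weights[i + k - 1] - weights[i - 1]  # slide right by one
        if window > best_sum:
            best_sum, best_start = window, i
    return best_start, best_start + k  # token indices [start, end)

# Toy self-attention weights for an 8-token review (hypothetical values)
attn = [0.02, 0.05, 0.30, 0.25, 0.20, 0.04, 0.10, 0.04]
print(max_fixed_length_subarray(attn, 3))  # → (2, 5): tokens 2..4
```

The selected token span can then be decoded back to text and presented as the explanation for the rating prediction.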

1.1 Contributions

Our work aims to fill this research gap by implementing a solution that is both accurate and explainable. We propose a novel model that uniquely integrates three vital elements, i.e., BERT, MLP, and MSP, to derive review features, model user-item interactions, and produce possible explanations. To the best of our knowledge, BENEFICT is one of the first review-based recommender models to utilize BERT for neural CF. It is also, to the best of our knowledge, one of the first models to repurpose a portion of the Neural Collaborative Filtering (NCF) framework (He et al., 2017) as the user-item interaction function for review-based, explicit CF. Moreover, our experiments demonstrate that our model achieves better rating prediction results than other state-of-the-art recommender models.

2 Related Work and Concepts

Designing a CF model involves two crucial steps: learning user and item representations, and modeling user-item interactions based on those representations (He et al., 2018). Before the advancements brought by neural networks, matrix factorization (MF) was the dominant model; it represents users and items as vectors of latent factors (called embeddings) and models user-item interactions with the inner product operation. This operation can lead to poor performance because it is sub-optimal for learning rich yet complicated patterns from real-world data (He et al., 2018). To address this, neural networks (NN) have been integrated into recommender architectures. One of the initial works that laid the foundation for employing NNs for CF is NCF (He et al., 2017). Their framework, originally implemented for rating-based, implicit CF, learns non-linear interactions between users and items by employing MLP layers as the interaction function, granting it a high degree of non-linearity and flexibility to learn meaningful interactions. Two common designs have emerged for leveraging MLP layers: placing an MLP above either the concatenated user-item embeddings (He et al., 2017; Bai et al., 2017) or the element-wise product of user and item embeddings (Zhang et al., 2017; Wang et al., 2017).
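A minimal sketch of the three interaction inputs just described, using made-up 3-dimensional embeddings: MF scores directly with the inner product, while the two NCF-style MLP designs differ only in what is fed to the first layer:

```python
user = [0.5, -1.0, 2.0]   # hypothetical user embedding
item = [1.0, 0.5, 0.25]   # hypothetical item embedding

# Matrix factorization: the prediction is the inner product itself
mf_score = sum(u * i for u, i in zip(user, item))  # 0.5 - 0.5 + 0.5 = 0.5

# NCF-style design 1: MLP input is the concatenation (length 2d)
mlp_input_concat = user + item

# NCF-style design 2: MLP input is the element-wise product (length d)
mlp_input_product = [u * i for u, i in zip(user, item)]

print(mf_score, mlp_input_concat, mlp_input_product)
```

The inner product fixes the interaction to a linear form, whereas either MLP input lets subsequent layers learn a non-linear scoring function over the same embeddings.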

As far as rating prediction is concerned, two notable recommender models have yielded significant state-of-the-art prediction performance. DeepCoNN is the first deep model that jointly represents users and items from reviews (Zheng et al., 2017). It consists of two parallel, CNN-powered networks: one learns user behavior by examining all reviews the user has written, and the other models item properties by exploring all reviews the item has received. A shared layer connects the two networks, and factorization machines capture user-item interactions. The second model is NARRE, which shares certain similarities with DeepCoNN. NARRE is also composed of two parallel networks for user and item modeling, with respective CNNs to process reviews (Chen et al., 2018). Rather than concatenating reviews into one long sequence the way DeepCoNN does, NARRE introduces an attention mechanism that learns review-level usefulness in the form of attention weights. These weights are integrated into the user and item representations to enhance embedding quality and, consequently, prediction accuracy. Both DeepCoNN and NARRE employ traditional word embeddings.

Other relevant studies, such as EFM (Zhang et al., 2014), sCVR (Ren et al., 2017), and TriRank (He et al., 2015), have claimed to provide explanations for recommendations. These models first extract aspects and opinions by performing phrase-level sentiment analysis on reviews; afterward, they generate feature-level explanations according to product features that correspond to user interests (Chen et al., 2018). However, these models have some limitations: manual preprocessing is required for sentiment analysis and feature extraction, and the explanations are simple extractions of words or phrases from the review text (Zhang et al., 2014; Ren et al., 2017). This also has the unintended effect of distorting the reviews' original meaning (Ribeiro et al., 2016; Chen et al., 2018). Another limitation is that textual similarity is based solely on lexical similarity, which means semantic meaning is ignored (Zheng et al., 2017; Chen et al., 2018).



