SANVis: Visual Analytics for Understanding Self-Attention Networks

VIS VAST Short 2019

Cheonbok Park
Korea University
Inyoup Na
Korea University
Yongjang Jo
Korea University
Sungbok Shin
University of Maryland
Jaehyo Yoo
Korea University
Bum Chul Kwon
IBM Research
Jian Zhao
University of Waterloo
Hyungjong Noh
NCSOFT
Yeonsoo Lee
NCSOFT
Jaegul Choo
Korea University

Demo video


Attention networks, a deep neural network architecture inspired by the human attention mechanism, have achieved significant success in image captioning, machine translation, and many other applications. Recently, they have further evolved into an advanced approach called multi-head self-attention networks, which can encode a set of input vectors, e.g., word vectors in a sentence, into another set of vectors. Such encoding aims to simultaneously capture diverse syntactic and semantic features within the set, each feature corresponding to a particular attention head and all heads together forming multi-head attention. Meanwhile, the increased model complexity prevents users from easily understanding and manipulating the inner workings of these models. To tackle these challenges, we present a visual analytics system called SANVis, which helps users understand the behaviors and characteristics of multi-head self-attention networks. Using a state-of-the-art self-attention model, the Transformer, we demonstrate usage scenarios of SANVis in machine translation tasks.
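To make the encoding described above concrete, the following is a minimal numpy sketch of multi-head self-attention. It is not the authors' implementation: the projection matrices are random placeholders standing in for a trained Transformer's learned weights, and the function names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, num_heads, rng):
    """Encode a set of input vectors into another set of the same shape.

    X: (seq_len, d_model) array, e.g., word vectors of one sentence.
    Weights are random here, standing in for learned projections.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    outputs = []
    for _ in range(num_heads):
        # Each head has its own query/key/value projections, so each head
        # can attend to a different syntactic or semantic pattern.
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Scaled dot-product attention: every position attends to all positions.
        A = softmax(Q @ K.T / np.sqrt(d_head))  # (seq_len, seq_len) weights
        outputs.append(A @ V)
    # Concatenating the per-head outputs restores the model dimension.
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))  # 5 "words", d_model = 8
Y = multi_head_self_attention(X, num_heads=2, rng=rng)
print(Y.shape)  # (5, 8)
```

The per-head attention matrix A is exactly the kind of pattern SANVis visualizes: one (seq_len, seq_len) weight map per head, per layer.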


VIS VAST Short, 2019.
Cheonbok Park, Inyoup Na, Yongjang Jo, Sungbok Shin, Jaehyo Yoo, Bum Chul Kwon, Jian Zhao, Hyungjong Noh, Yeonsoo Lee, and Jaegul Choo. "SANVis: Visual Analytics for Understanding Self-Attention Networks"

Overview of SANVis

Overview of SANVis: (A) The network view displays multiple attention patterns for each layer according to three types of visualization options: (A-1) the attention piling option, (A-2) the Sankey diagram option, and (A-3) the small multiples option. (A-4) The bar chart shows the average attention weights of all heads (each colored with its corresponding hue) for each layer. (B) The SANVis view helps the user analyze what each attention head has learned by showing representative words and by providing statistical information on part-of-speech tags and positions.

Made by

Korea University, University of Maryland, IBM Research, University of Waterloo, and NCSoft Co., Ltd.

Cite

    @article{park2019sanvis,
      author  = {Cheonbok Park and Inyoup Na and Yongjang Jo and Sungbok Shin and Jaehyo Yoo and Bum Chul Kwon and Jian Zhao and Hyungjong Noh and Yeonsoo Lee and Jaegul Choo},
      title   = {SANVis: Visual Analytics for Understanding Self-Attention Networks},
      journal = {IEEE Conference on Visual Analytics Science and Technology (VAST) Short},
      year    = {2019}
    }