Towards Natural Language Interfaces for Data Visualization: A Survey

Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, and Jianmin Wang

arXiv:2109.03506v1 [cs.HC] 8 Sep 2021

Abstract—Utilizing Visualization-oriented Natural Language Interfaces (V-NLI) as a complementary input modality to direct manipulation for visual analytics can provide an engaging user experience. It enables users to focus on their tasks rather than worrying about how to operate visualization tools. In the past two decades, leveraging advanced natural language processing technologies, numerous V-NLI systems have been developed in both academic research and commercial software, especially in recent years. In this article, we conduct a comprehensive review of the existing V-NLIs. In order to classify each paper, we develop categorical dimensions based on a classic information visualization pipeline with the extension of a V-NLI layer. The following seven stages are used: query understanding, data transformation, visual mapping, view transformation, human interaction, context management, and presentation. Finally, we also shed light on several promising directions for future work in the community.

Index Terms—Data Visualization, Natural Language Interfaces, Survey.

1 INTRODUCTION

The use of interactive visualization is becoming increasingly popular in data analytics [17]. As a common part of analytics suites, Windows, Icons, Menus, and Pointer (WIMP) interfaces have been widely employed to facilitate interactive visual analysis in current practice. However, this interaction paradigm presents a steep learning curve in visualization tools, since it requires users to translate their analysis intents into tool-specific operations [127], as shown in the upper part of Figure 1. Over the years, the rapid development of Natural Language Processing (NLP) technology has provided a great opportunity to explore a natural language based interaction paradigm for data visualization [18], [277]. With the help of advanced NLP toolkits [1], [3], [21], [83], [156], a surge of Visualization-oriented Natural Language Interfaces (V-NLI) has recently emerged as a complementary input modality to traditional WIMP interaction, supporting the generation of visualizations from the user's NL queries.

The emergence of V-NLI can greatly enhance the usability of visualization tools in terms of: (a) Convenience and novice-friendliness. Natural language is a skill mastered by the general public. By leveraging natural language to interact with computers, V-NLI hides the tool-specific manipulations shown in Figure 1 from users, facilitating the flow of analysis for novices. (b) Intuitiveness and effectiveness. It is a consensus that visual analysis is most effective when users can focus on their data rather than on manipulating the interface of analysis tools [85]. With the help of V-NLI, users can express their analytic tasks in their own terms. (c) Humanistic care. A sizeable amount of the information we access nowadays is conveyed by visual means. V-NLI can be an innovative means of non-visual access, which promotes the inclusion of blind and low vision (BLV) people.

• All the authors are from Tsinghua University, Beijing, China. E-mail: {slx20, luoyy18, yangxc18, hxm19, zxs21, tzw20}@mails.tsinghua.edu.cn, {shenenya, jimwang}@tsinghua.edu.cn.
Manuscript received XX XX, 2021; revised XX XX, 2021.
Fig. 1. The traditional interaction paradigm requires users to translate their analysis intents into tool-specific operations [127]. With the help of V-NLI, users can express their analysis intents in their own terms.

The timeline of V-NLI is shown in Figure 2. Back in 2001, Cox et al. [41] presented an initial prototype of an NLI for visualization, which could only accept well-structured queries. Almost a decade later, Articulate [241] introduced a two-step process to create visualizations from NL queries. It first extracts the user's analytic task and data attributes and then automatically determines the appropriate visualizations based on that information. Although these infancy-stage research studies were a promising start, natural language was not yet a prevalent modality of interaction, and the V-NLI systems were restricted to simple prototypes. However, since Apple integrated Siri [221] into the iPhone, NLIs began to attract more attention. Around 2013, the advent of word embeddings [162] promoted the advances of neural networks for NLP, thus rekindling commercial interest in V-NLI. IBM first published its NL-based cognitive service, Watson Analytics [4], in 2014. Microsoft Power BI's Q&A [5] and Tableau's Ask Data [2] were announced in 2018 and 2019, respectively, offering various features such as autocompletion and inference of underspecified utterances. DataTone [64] first introduced ambiguity widgets to manage ambiguities in queries, while Eviza [207] explored analytic conversations. After a few years of technology accumulation, the past five years have seen an outbreak of V-NLI (see the number of yearly published papers in Figure 2).

Fig. 2. Timeline of V-NLI. We briefly divide the timeline into an infancy stage, a development stage, and an outbreak stage. The number of yearly published papers is attached. The timeline consists of four parts: academic research, commercial software, NLP milestones, and datasets. Yearly paper counts: 2001 (1), 2002 (1), 2005 (1), 2010 (1), 2011 (1), 2012 (2), 2013 (1), 2014 (1), 2015 (1), 2016 (3), 2017 (4), 2018 (7), 2019 (9), 2020 (13), 2021 (9).

With the development of hardware devices, synergistic multimodal visualization interfaces have gained notable interest. Orko [234] was the first system to combine touch and speech input on tablet devices, and Data@Hand [278] focused on smartphones. InChorus [229] incorporated a pen as a third modality for a consistent interaction experience. From 2018, pretrained language models obtained new state-of-the-art results on various NLP tasks, which provided great opportunities to improve the intelligence of V-NLI [51], [181]. Quda [62] and NLV [231] contributed datasets of NL queries for visual data analytics, and nvBench produced the first V-NLI benchmark [150]. ADVISor [142] and ncNet [149] were deep learning-based solutions for V-NLI.
Beyond data exploration, FlowSense [280] augmented a dataflow-based visualization system with V-NLI. The NL4DV [174] toolkit can be easily integrated into existing visualization systems to provide V-NLI services.

Literature on V-NLI research is growing rapidly, covering aspects of Visualization (VIS), Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Data Mining and Management (DMM). As a result, there is an increasing need to better organize the research landscape, categorize current work, identify knowledge gaps, and help people who are new to this growing area understand the challenges and subtleties in the community. For this purpose, there have been several prior efforts to summarize the advances in this area. For example, Srinivasan and Stasko (short paper in EuroVis 2017 [233]) conducted a simple examination of five existing V-NLI systems by comparing and contrasting them based on the tasks they allow users to perform. They (journal paper in CGA 2020 [235]) further highlighted key challenges for evaluating V-NLI and discussed the benefits and considerations of three popular task-framing strategies when applying them. Although these two surveys provide valuable guidance for follow-up research, with the outbreak of V-NLI in recent years, there are considerable new works to be covered and details to be discussed. To the best of our knowledge, this paper is the first step towards comprehensively reviewing V-NLI in a systematic manner.

The remainder of this paper is structured as follows: First, we explain the scope and methodology of the survey in Section 2. What follows is the classification overview of existing works in Section 3. Then, the comprehensive survey is presented in Sections 4-10, corresponding to the seven stages extracted from the information visualization pipeline [30]. Finally, we shed light on several promising directions for future work in Section 11. The conclusion of the survey is in Section 12.

2 SURVEY LANDSCAPE

2.1 Scope of the Survey

In order to narrow the scope of the survey within a controllable range, we focused on visualization-oriented natural language interfaces, which accept natural language queries as input and automatically generate appropriate visualizations. Users can input natural language in various ways, including typing with a keyboard (e.g., DataTone [64] in Figure 7), speaking into a microphone (e.g., InChorus [229] in Figure 9), selecting text from articles (e.g., Metoyer et al. [161] in Figure 13), and providing an existing textual description (e.g., Vis-Annotator [122] in Figure 12).

TABLE 1
Relevant venues
Visualization (VIS): IEEE VIS (InfoVis, VAST, SciVis), EuroVis, PacificVis, TVCG, CGF, CGA
Human-Computer Interaction (HCI): CHI, UIST, IUI
Natural Language Processing (NLP): ACL, EMNLP, NAACL, COLING
Data Mining and Management (DMM): KDD, SIGMOD, ICDE, VLDB

In addition, there are several fields that are closely related to V-NLI, as listed in Table 2. In order to make the survey more comprehensive, when introducing V-NLI in the following sections, we include additional crucial discussions of these related fields, with explanations of their importance and relationship to V-NLI. For example, Visualization Recommendation (VisRec) [189], [295] acts as the back-end engine of V-NLI to recommend visualizations. Natural Language Interface for Database (NLIDB) [6] and Conversational Interface for Software Visualization (CISV) [20] share similar principles with V-NLI.
Natural Language description Generation for Visualization (NLG4Vis) [143], [178] and Visual Question Answering (VQA) [111], [157] complement visual data analysis with natural language as output. Annotation [192], [215] and Narrative Storytelling [244] present fascinating visualizations by combining textual and visual elements.

2.2 Survey Methodology

To comprehensively survey V-NLI, we performed an exhaustive review of relevant journals and conferences over the past twenty years (2000-2021), broadly covering the VIS, HCI, NLP, and DMM communities. The relevant venues are shown in Table 1. We began our survey by searching the keywords ("natural language" AND "visualization") in Google Scholar, resulting in 1433 papers (VIS: 452, HCI: 489, NLP: 289, and DMM: 203). Additionally, we also searched for representative related works that appeared earlier or in the cited references of related papers. During the review process, we first examined the titles of papers from these publications to identify candidate papers. Next, the abstracts of the candidate papers were reviewed to further determine whether they are related to V-NLI. When the title and abstract did not provide clear information, we went through the full text to make a final decision. Finally, we collected 55 papers about V-NLI that accept natural language as input and output visualizations. Details of the V-NLI systems are listed in Table 3, including the NLP toolkit or technology applied, the chart types supported, the visualization recommendation algorithm adopted, and various characteristics of V-NLI, which will be discussed in Sections 4-10. On this basis, we proceeded to a comprehensive analysis of the 55 papers to systematically understand the main research trends. We also collected 283 papers of related works described in Section 2.1. Table 2 lists the representative papers of each topic for reference. We will selectively discuss them alongside V-NLI characteristics in the subsequent sections.

TABLE 2
Representative papers of works related to V-NLI.
Visualization Recommendation (VisRec) — Surveys: [189], [295] — Representative papers: [33], [38], [43], [47], [53], [72], [86], [88], [107], [110], [115], [129], [137], [141], [148], [151], [153], [167], [169], [170], [187], [188], [205], [237], [250], [256], [264], [265], [266], [268], [283], [294]
Natural Language Interface for DataBase (NLIDB) — Surveys: [6], [296] — Representative papers: [13], [14], [15], [19], [22], [56], [74], [75], [76], [82], [89], [94], [96], [118], [134], [135], [136], [140], [173], [176], [190], [195], [204], [212], [222], [223], [255], [273], [275], [281], [291], [293]
Conversational Interface for Software Visualization (CISV) — Representative papers: [20], [59], [79], [133], [171], [172], [203], [218]
Natural Language description Generation for Visualization (NLG4Vis) — Representative papers: [40], [43], [45], [46], [60], [70], [112], [125], [143], [165], [166], [178], [186], [225], [262], [274], [276]
Visual Question Answering (VQA) — Survey: [157] — Representative papers: [29], [31], [98], [99], [111], [142], [154], [224], [263]
Annotation and Narrative Storytelling — Surveys: [202], [244] — Representative papers: [11], [26], [33], [34], [35], [36], [37], [42], [63], [65], [66], [84], [91], [102], [113], [115], [119], [122], [123], [145], [161], [164], [185], [192], [215], [243], [257], [260], [261], [270], [282]

TABLE 3
Summary of representative works in V-NLI. The first five columns are the name, publication, NLP toolkit or technology applied, visualization types supported, and visualization recommendation algorithm. The remaining columns are characteristics of V-NLI corresponding to the categorical dimensions described in Section 3. Details of each column will be discussed in Sections 4-10.
Characteristic columns (Sections 4-10): Semantic and Syntax Analysis; Task Analysis; Data Attributes Inference; Default for Underspecified Utterances; Transformation for Data Insights; Spatial Substrate Mapping; Graphical Elements Mapping; Graphical Properties Mapping; View Transformation; Ambiguity Widget; Autocompletion; Multimodal; WIMP; Context Management; Visual Presentation; Annotation; Narrative Storytelling; NL Description Generation; Visual Question Answering.

Name | Publication | NLP Toolkit or Technology | Visualization Type | Recommendation Algorithm
Cox et al. [41] | IJST'01 | Sisl | B/Ta | InfoStill
Kato et al. [106] | COLING'02 | Logical form | B/L/P/A | Template
RIA [290] | InfoVis'05 | - | I | Optimization
Articulate [241] | LNCS'10 | Stanford Parser | S/R/B/L/P/Bo | Decision tree
Contextifier [91] | CHI'13 | NLTK | L | -
NewsViews [65] | CHI'14 | OpenCalais | M | -
DataTone [64] | UIST'15 | NLTK/Stanford Parser | S/B/L | Template
Articulate2 [120], [121] | SIGDIAL'16 | ClearNLP/Stanford Parser/OpenNLP | B/L/H | Decision tree
Eviza [207] | UIST'16 | ANTLR | M/L/B/S | Template
TimeLineCurator [63] | TVCG'16 | TERNIP | Tl | Template
Analyza [52] | IUI'17 | Stanford Parser | B/L | Template
TSI [26] | TVCG'17 | Template | L/A/H | -
Ava [133] | CIDR'17 | Controlled Natural Language | S/L/B | Template
DeepEye [147] | SIGMOD'18 | OpenNLP | P/L/B/S | Template
Evizeon [85] | TVCG'18 | CoreNLP | M/L/B/S | Template
Iris [59] | CHI'18 | Domain-specific language | S/T | Template
Metoyer et al. [161] | IUI'18 | CoreNLP | I | Template
Orko [234] | TVCG'18 | CoreNLP/NLTK | N | -
ShapeSearch [219], [220] | VLDB'18 | Stanford Parser | L | ShapeQuery
Valletto [104] | CHI'18 | SpaCy | B/S/M | Template
ArkLang [210] | IUI'19 | Cocke-Kasami-Younger | B/G/L/M/P/S/Tr | Template
Ask Data [2], [245] | VAST'19 | Proprietary | B/G/L/M/P/S/T | Template
Data2Vis [53] | CGA'19 | Seq2Seq | B/A/L/S/T/St | Seq2Seq
Hearst et al. [80] | VIS'19 | WordNet | - | -
Voder [227] | TVCG'19 | Stanford Parser | St/Bo/B/S/D | Template
AUDiaL [168] | LNCS'20 | CoreNLP | B/L | Knowledge base
Bacci et al. [10] | LNCS'20 | Wit.ai | P/L/B/S | Template
DataBreeze [230] | TVCG'20 | CoreNLP/NLTK | S/Uc | Template
FlowSense [280] | TVCG'20 | CoreNLP/SEMPRE | L/S/B/M/N/Ta | Template
InChorus [229] | CHI'20 | CoreNLP/NLTK | L/S/B | Template
NL4DV Quda [62] | arXiv'20 | CoreNLP | St/B/L/A/P/S/Bo/H | Template
Sneak Pique [208] | UIST'20 | ANTLR | - | -
Story Analyzer [164] | JCIS'20 | CoreNLP | Wc/C/T/Fg/B/M/I | Template
Text-to-Viz [42] | TVCG'20 | Stanford Parser | I | Template
Vis-Annotator [122] | CHI'20 | SpaCy/Mask-RCNN | B/L/P | -
Sentifiers [209] | VIS'20 | ANTLR | - | -
NL4DV [174] | TVCG'21 | CoreNLP | St/B/L/A/P/S/Bo/H | Template
Data@Hand [278] | CHI'21 | Compromise/Chrono | B/L/S/Ra/Bo | Template
Retrieve-Then-Adapt [185] | TVCG'21 | Template | I | Template
ADVISor [142] | PacificVis'21 | End-to-end network | B/L/S | Template
Seq2Vis [150] | SIGMOD'21 | Neural translation model | B/L/S/P | Seq2Seq
ncNet [149] | VIS'21 | Seq2Seq | B/L/S/P | Template
Snowy [232] | UIST'21 | CoreNLP | - | -
* Abbreviations for Visualization Type: Bar (B), Table (Ta), Infographic (I), Scatter (S), Radar (R), Line (L), Pie (P), Boxplot (Bo), Icon (Ic), Map (M), Heatmap (H), Timeline (Tl), Area (A), Network (N), Tree (Tr), Strip (St), Donut (D), Gantt (G), Word cloud (Wc), Force graph (Fg), Range (Ra), Unit column chart (Uc), and Graphics (Gr).

Fig. 3. Extension of the classic information visualization pipeline proposed by Card et al. [30] with a V-NLI layer. It depicts how V-NLI works in each stage of constructing visualizations. It consists of the following stages: (A) Query Understanding, (B) Data Transformation, (C) Visual Mapping, (D) View Transformation, (E) Human Interaction, (F) Context Management, and (G) Presentation.

3 CLASSIFICATION OVERVIEW

We briefly recall the main elements of the information visualization pipeline presented by Card et al. [30] (see Figure 3), which describes how raw data transits into visualizations and how the user interacts with them. It consists of: (1) Leveraging data transformations to transform raw data into data tables. (2) Transforming data tables into visual structures via visual mapping. (3) Applying view transformations to transform visual structures into views that are visible to the user. (4) Users interact with the visualization interface, feeding their analytic tasks back into the pipeline.
We extend this pipeline with an additional V-NLI layer (colored green in Figure 3). On this basis, we move forward to develop categorical dimensions by focusing on how V-NLI facilitates visualization generation. Referring to the classification method introduced by McNabb et al. [159], to facilitate the categorization process, the following stages in the pipeline are used:

• Query Understanding (A): Since we add the V-NLI layer to the pipeline, query understanding is a foundational stage. Semantic and syntax analysis is generally performed first to discover the hierarchical structure of the NL queries so that the system can parse relevant information in terms of data attributes and analytic tasks. Due to the vague nature of natural language, dealing with underspecified utterances is another essential task in this stage. Details can be found in Section 4.

• Data Transformation (B): In the original pipeline, this stage mainly transforms raw data into data tables through various operations (e.g., aggregation and pivot). Since the majority of raw data to analyze is already in a tabular format, we rename the result transformed data. Data transformation is responsible for generating alternative data subsets or derivations for visualization. All operations on the data plane belong to this stage. Details can be found in Section 5.

• Visual Mapping (C): This stage focuses on how to map the information extracted from the NL queries to visual structures. The three elements of visual mapping for information visualization are the spatial substrate, graphical elements, and graphical properties [30]. The spatial substrate is the space in which visualizations are created, and it is important to consider the layout configuration to apply. Graphical elements are the visual elements appearing in the spatial substrate (e.g., points, lines, surfaces, and volumes). Graphical properties can be applied to the graphical elements to make them more noticeable (e.g., size, orientation, color, texture, and shape). Details can be found in Section 6.

• View Transformation (D): View transformation (rendering) transforms visual structures into views by specifying graphical properties that turn these structures into pixels. Common forms include navigation, animation, and visual distortion (e.g., fisheye lenses). However, in the context of our survey, this stage is rarely involved in V-NLI. Details can be found in Section 7.

• Human Interaction (E): Human interactions with the visualization interface feed back into the pipeline. A user can connect with a visualization manually by modifying or transforming a view state, or by reviewing the use, effectiveness, and their knowledge of the visualization. Dimara et al. [54] defined interaction for visualization as: "The interplay between a person and a data interface involving a data-related intent, at least one action from the person and an interface reaction that is perceived as such." Describing interaction requires all of the mandatory interaction components: interplay, person, and data interface. Details can be found in Section 8.

• Context Management (F): Interpreting an utterance contextually is an essential requirement for visualization system intelligence, which is particularly evident in V-NLI. This stage involves each step in the pipeline and concentrates on facilitating a conversation with the system based on the current visualization state and previous utterances. Details can be found in Section 9.

• Presentation (G): We name this stage Presentation. The classic pipeline focuses on how to generate visualizations but ignores presenting them to the user. With natural language integrated into the pipeline, the vast majority of V-NLI systems accept natural language as input and directly display the generated visualizations. Furthermore, complementing visualizations with natural language can provide additional surprises to the user. Details can be found in Section 10.
4 QUERY UNDERSTANDING

Query understanding is the foundation of all subsequent stages. In this section, we discuss how to perform semantic and syntax analysis of the input natural language queries, infer the user's analytic task and the data attributes to be analyzed, and set defaults for underspecified utterances.

4.1 Semantic and Syntax Analysis

Semantic and syntax analysis is a powerful means of discovering hierarchical structures and understanding meaning in human language. The semantic parser can conduct a series of NLP sub-tasks on the query string of a V-NLI to extract valuable details that can be used to detect relevant phrases. The processing steps include tokenization, identifying part-of-speech (POS) tags, recognizing named entities, removing stop words, performing stemming, creating a dependency tree, generating N-grams, etc. For example, FlowSense [280] is a natural language interface designed for a dataflow visualization system [279]. It applies a semantic parser with special-utterance tagging (table column names, node labels, node types, and dataset names) and placeholders. Figure 4 displays a parse tree for the derivation of the user's query. The five major components of a query pattern and their related parts are each highlighted with a unique color, and the expanded sub-diagram is illustrated at the bottom. FlowSense is powered by the Stanford SEMPRE framework [180], [288] and the CoreNLP toolkit [156].

Fig. 4. Semantic parsing in FlowSense [280]; the derivation of the query is shown as a parse tree.

The development of advanced NLP toolkits [1], [3], [7], [21], [83], [156], [184] allows developers to quickly integrate NLP services into their systems. As semantic and syntax analysis is a basic task for V-NLI, almost all systems support semantic parsing by directly using existing NLP toolkits, such as CoreNLP [156], NLTK [21], OpenNLP [1], SpaCy [83], Stanza [184], Flair [7], and GoogleNLP [3], as listed in Table 3 (Column NLP Toolkit or Technology). We also sort out commonly used characteristics of existing NLP toolkits in Table 4 for reference.

TABLE 4
Comparison of commonly used NLP toolkits. Toolkits (with implementation language): CoreNLP [156] (Java), NLTK [21] (Python), OpenNLP [1] (Java), SpaCy [83] (Python), Stanza [184] (Python), Flair [7] (Python), and GoogleNLP [3] (cloud API). Features compared: tokenization, sentence segmentation, part-of-speech tagging, parsing, lemmatization, named entity recognition, coreference resolution, entity linking, chunking, sentiment, text classification, and training custom models.
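To make the preprocessing pass above concrete, the following is a minimal sketch of these steps using spaCy, one of the toolkits in Table 4. The example query, the choice of spaCy, and the model name are illustrative assumptions rather than the setup of any surveyed system.

```python
import spacy

# Assumes the small English model is installed (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")
doc = nlp("show the average budget by genre for movies released after 2000")

for token in doc:                      # tokenization, POS tags, and the dependency tree
    print(token.text, token.pos_, token.dep_, token.head.text)

print([chunk.text for chunk in doc.noun_chunks])       # candidate attribute phrases
print([(ent.text, ent.label_) for ent in doc.ents])    # named entities (e.g., the year)
```

The POS tags, noun chunks, and entities produced by such a pass are the raw material that the task- and attribute-inference steps in Sections 4.2 and 4.3 operate on.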
4.2 Task Analysis

4.2.1 Task modeling

There is a growing body of literature recognizing that the user's analytic task is vital for automatic visualization creation [24], [109], [114], [193], [196]. As for task modeling, there have been many prior efforts to provide definitions of analytic tasks, laying the foundation for follow-up research on visualization design. For instance, Amar et al. [9] proposed ten low-level analytic tasks that capture the user's activities while employing visualization tools for data exploration. The ten tasks were later extensively applied in numerous visualization systems [114], [141], [167], [196], [214], [227], [248], [257], as listed in Table 5.

TABLE 5
Ten low-level analytic tasks proposed by Amar et al. [9]. They have been widely applied for automatic visualization creation.
Characterize Distribution — Given a set of data cases and a quantitative attribute of interest, characterize the distribution of that attribute's values. — [9], [62], [142], [174], [196], [198], [228], [257], [280]
Cluster — Given a dataset, find clusters of similar attribute values. — [9], [62], [196], [214], [228], [280]
Compute Derived Value — Given a dataset, compute an aggregate numeric representation of the data. — [9], [62], [88], [142], [196], [228], [257], [280]
Correlate — Given a dataset and two attributes, determine useful relationships between the values of those attributes. — [9], [62], [88], [142], [174], [196], [198], [214], [228], [257], [280]
Determine Range — Given a dataset and a data attribute, find the span of values within the set. — [9], [62], [196], [228], [280]
Filter — Find data cases satisfying the given concrete conditions on attribute values. — [9], [62], [174], [196], [228], [257], [280]
Find Anomalies — Identify any anomalies within a given set of data cases with respect to a given relationship or expectation, e.g., statistical outliers. — [9], [62], [87], [196], [198], [214], [228], [257], [280]
Find Extremum — Find data cases possessing an extreme value of an attribute over its range. — [9], [62], [87], [114], [142], [196], [228], [257], [280]
Retrieve Value — Given a set of specific cases, find attributes of those cases. — [9], [62], [87], [114], [174], [196], [228], [257], [280]
Sort — Given a dataset, rank them according to some ordinal metric. — [9], [62], [196], [228], [257], [280]

Saket et al. [196] evaluated the effectiveness of five commonly used chart types across the ten low-level tasks [9] through a crowdsourced experiment and derived recommendations on which visualizations to choose for different tasks. Kim et al. [114] measured subject performance across task types derived from the ten tasks [9] and additionally added a compare values task. The analytic tasks are further grouped into two categories: value tasks that just retrieve or compare individual values, and summary tasks that require identification or comparison of data aggregations. NL4DV [174] additionally includes a Trend task. AutoBrief [108] further enhances visualization systems by introducing domain-level tasks, while Brehmer et al. [24] contributed a multi-level typology of visualization tasks. Focusing on scatterplots, Sarikaya et al. [198] collected tasks from a variety of sources in the data visualization literature to formulate the seeds of a scatterplot-specific analytic task list. Recently, Shen et al. [213] summarized 18 classical analytic tasks through a survey covering both academia and industry. They further proposed a task-oriented recommendation engine based on detailed modeling of the analytic tasks.

4.2.2 Task inference

Although considerable previous work focuses on task modeling, before the emergence of natural language interfaces, few visualization systems had attempted to infer the user's analytic task. HARVEST [72] monitors the user's click interactions for implicit signals of user intent. Steichen et al. [236] and Gingerich et al. [69] used users' eye-gazing patterns while interacting with a given visualization to predict the user's analytic task. Battle et al. [16] investigated the relationship between latency, task complexity, and user performance. However, these behavior-based systems are limited to a small number of pre-defined tasks and do not yet generalize to automatic visualization systems.

Rather than inferring the analytic task from the user's behavior, systems supporting NL interaction analyze the user's intent by understanding the NL utterances, since these may hint at the user's analysis goals. Most systems infer analytic tasks by comparing the query tokens to a predefined list of task keywords [64], [85], [174], [207], [241], [280]. For example, NL4DV [174] identifies five low-level analytic tasks: Correlation, Distribution, Derived Value, Trend, and Filter. A task keyword list is integrated internally (e.g., the Correlation task includes 'correlate', 'relationship', etc., and the Distribution task includes 'range', 'spread', etc.).
NL4DV also leverages POS tags and query parsing results to model relationships between query phrases and populate task details. For conversational analytics systems, Nicky [116] interprets task information in conversations based on a domain-specific ontology. Evizeon [85] and Orko [234] support follow-up queries through conversational centering [73], a model commonly used in linguistic conversational structure, but only for the filter task. ADE [39] supports the analytic process via task recommendation invoked by inferences about user interactions in mixed-initiative environments. The recommendation is also accompanied by a natural language explanation. Instead of simply matching keywords, Fu et al. [62] constructed a dataset of NL queries for visual data analytics and trained a multi-label task classification model based on an advanced pre-trained model, BERT [51].

Although most V-NLI systems support task analysis, as shown in Table 3 (Column Task Analysis), the task types integrated into these systems are limited (e.g., a subset of the ten low-level tasks [9]). More tasks with hierarchical modeling should be considered to better cover the user's intent. Besides, rule-based approaches still account for the majority; various advanced learning models provide a practical opportunity to infer analytic tasks in an unbiased and rigorous manner.
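As a toy illustration of the keyword-matching strategy described above, the sketch below maps a query to NL4DV-style low-level tasks; the keyword lists are a made-up subset for illustration, not NL4DV's actual internal lists.

```python
TASK_KEYWORDS = {
    "correlation":   ["correlate", "relationship", "relate"],
    "distribution":  ["range", "spread", "distribution"],
    "derived_value": ["average", "sum", "total", "count"],
    "trend":         ["trend", "over time"],
    "filter":        ["only", "where", "between"],
}

def infer_tasks(query):
    """Return every task whose keywords occur in the query."""
    q = query.lower()
    hits = [task for task, words in TASK_KEYWORDS.items()
            if any(w in q for w in words)]
    return hits or ["distribution"]        # fall back to a default task

print(infer_tasks("show the relationship between budget and rating"))
# ['correlation', 'filter']   ("between" also triggers the filter task)
print(infer_tasks("average gross by genre"))
# ['derived_value']
```

The first example also shows why such systems layer POS tags and parse structure on top of plain keyword lookup: surface matches alone can over-trigger tasks.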
4.3 Data Attributes Inference

A dataset contains numerous data attributes; however, the user may only be interested in a few of them in a given query. The system should be able to extract data attributes that are mentioned both explicitly (e.g., direct references to attribute names) and implicitly (e.g., references to an attribute's values or aliases). As an example, NL4DV [174] maintains an alias map and allows developers to configure aliases (e.g., GDP for gross domestic product, investment for production budget). It iterates through the N-grams generated in Section 4.1, checking both syntactic and semantic similarity between the N-grams and a lexicon composed of data attributes, aliases, and values. For syntactic similarity, NL4DV adopts a cosine similarity function; for semantic similarity, it computes the Wu-Palmer similarity score based on WordNet [163]. If the similarity score reaches a threshold, the corresponding data attributes are further analyzed and presented in the visualization. Most V-NLI systems have taken a similar approach. Besides, Analyza [52] utilizes additional heuristics to derive information from data attributes and expands the lexicon with a proprietary knowledge graph. Recently, Liu et al. [142] proposed a deep learning-based pipeline, ADVISor. It uses BERT [51] to generate embeddings of both the NL query and the table headers, which are then used by a deep neural network model to decide the data attributes (columns), filter conditions (rows), and aggregation type. Closer to ADVISor, Luo et al. [149] presented an end-to-end solution using a transformer-based [251] sequence-to-sequence (seq2seq) model [12]; it supports more complex data transformation types such as relational join, GROUP BY, and ORDER BY. In the flow of visual analytical conversations, data attributes can also be extracted through co-reference resolution, which will be discussed in Section 9.2.
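A minimal sketch of this attribute-matching idea, assuming a hypothetical movie dataset: query tokens are compared against a lexicon of attribute names and aliases using a string-similarity ratio (standing in for syntactic similarity) and the WordNet Wu-Palmer score (semantic similarity). The lexicon, aliases, and threshold are invented for illustration, and NLTK's WordNet corpus is assumed to be available.

```python
from difflib import SequenceMatcher
from nltk.corpus import wordnet as wn

# Hypothetical lexicon mapping surface phrases (attribute names and aliases)
# to the underlying data attributes.
LEXICON = {"budget": "budget", "investment": "budget", "gross": "worldwide_gross"}

def semantic_similarity(a, b):
    sa, sb = wn.synsets(a), wn.synsets(b)
    return max((x.wup_similarity(y) or 0) for x in sa for y in sb) if sa and sb else 0

def infer_attributes(query, threshold=0.85):
    hits = set()
    for token in query.lower().split():          # unigrams only, for brevity
        for phrase, attribute in LEXICON.items():
            syntactic = SequenceMatcher(None, token, phrase).ratio()
            if max(syntactic, semantic_similarity(token, phrase)) >= threshold:
                hits.add(attribute)
    return hits

print(infer_attributes("compare investment and gross for each movie"))
# {'budget', 'worldwide_gross'}
```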
4.4 Default for Underspecified Utterances

Numerous systems support the filter task [64], [85], [174], [207], [280], aiming at selecting both data attributes and data ranges. An input query is underspecified if it lacks enough information for the system to infer the data attributes and perform filtering. Presenting defaults in the interface by inferring from the underspecified utterances can be an effective option to address this issue. However, the area has received relatively little attention, as shown in Table 3 (Column Default for Underspecified Utterances). Within this research, vague modifiers like "tall" and "cheap" are a prevalent kind of underspecification in human language. Hearst et al. [80] made the first step toward design guidelines for dealing with vague modifiers based on existing cognitive linguistics research and a crowdsourcing study. Tableau's Ask Data [2], [245] internally leverages a lightweight intermediate language, ArkLang [210], to infer underspecified details. It emphasizes how the linguistic context of the previous utterance affects the interpretation of a new utterance. Sentifiers [209] determines which data attributes and filter ranges to associate with vague predicates using word co-occurrence and sentiment polarities. As shown in Figure 5, when analyzing an earthquakes dataset, the user inputs "where is it unsafe" and Sentifiers automatically associates "unsafe" with the magnitude attribute. A top-N filter of magnitude 6 and higher is applied, and similarly negative sentiment polarities are marked in red on the map. Although useful, combinations of vague modifiers and some more complex interpretations are still not supported by current V-NLIs. When encountering ambiguity, in addition to formulating a sensible default, human interaction (e.g., ambiguity widgets) is another suitable method to finally determine the data attributes, which will be discussed in Section 8.1.

Fig. 5. Vague modifier interpretation in Sentifiers [209]. The system associates the vague modifier "unsafe" with the magnitude attribute.

5 DATA TRANSFORMATION

After extracting data attributes, various data transformation operations can be applied to transform raw data into focused data. In this section, we introduce how V-NLI systems transform raw data for data insights, with additional discussion of a closely related topic, NLI for database.

5.1 Transformation for Data Insights

A consensus is that the purpose of visualization is insight, not pictures. So in visualization systems, the raw data should first be transformed (e.g., by aggregation, binning, and grouping) to explore data insights and derive a data structure that can be used for visualization. A "boring" dataset may become interesting after data transformations. Fortunately, the commonly used visualization specification languages (e.g., Vega [200], Vega-Lite [199], and VizQL [237]) all support data transformation primitives, which makes it convenient for developers to perform data transformations.

Identifying the user's data transformation intent is a key characteristic of V-NLI. To describe the related works, we categorize data into four types widely adopted in the visualization community [167]: temporal, quantitative, nominal, and ordinal. For temporal data, systems can support binning based on temporal units [63], [147], [164], [168], [174], [207], [280]. For instance, Setlur et al. [207] developed a temporal analytics module based on the temporal entities and expressions defined in TimeML [93]. To parse temporal expressions, the module incorporates the following temporal token types: temporal units (e.g., years, months, days, hours and minutes, and their abbreviations), temporal prepositions (e.g., 'in', 'during', 'near', and 'around'), and temporal connectives (e.g., 'before' and 'after'). For quantitative data, various aggregation functions (e.g., count, average, and sum) can be performed, and binning and grouping operations are also commonly used [10], [39], [42], [52], [64], [85], [121], [147], [158], [179], [207], [210], [227], [280], [290]. For example, DeepEye [147] includes various aggregation functions in its visualization language. ArkLang, an intermediate language for resolving NL utterances to formal queries, integrates a finite set of aggregation operators. Eviza [207] enables aggregation using regular spatial bins. Different from continuous data, nominal and ordinal data are usually used as grouping metrics.

If the data transformation information can be explicitly extracted from the user's NL queries, the back-end engine can directly perform the calculations and visualize the results for users. If not, despite advances in technologies for data management and analysis [17], [68], it remains time-consuming to inspect a dataset and construct a visualization that allows meaningful analysis to begin. So a practical system should be in a position to automatically recommend data insights for users. To deal with this problem, various systems have been designed to generate data facts in the context of visual data exploration, such as DataSite [43], Voder [227], Wrangler [100], SeeDB [249], QuickInsights [55], Foresight [47], and SpotLight [77]. They can serve as back-end engines for insight-driven recommendation.
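As an illustration of the transformation primitives mentioned above, the following sketch expresses a simple aggregation in Vega-Lite through its Python binding Altair; the toy sales table, and the query it stands in for, are made up.

```python
import altair as alt
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "month":  ["Jan", "Feb", "Jan", "Feb", "Mar"],
    "sales":  [120, 90, 200, 150, 170],
})

# "show total sales by region"  ->  aggregate transform + bar mark
chart = (
    alt.Chart(df)
    .transform_aggregate(total_sales="sum(sales)", groupby=["region"])
    .mark_bar()
    .encode(x="region:N", y="total_sales:Q")
)
print(chart.to_json())   # the resulting Vega-Lite spec, including its transform block
```

Once the query-understanding stage has produced the aggregation and grouping intent, emitting such a spec is largely mechanical, which is one reason template-based generation dominates in Table 3.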
5.2 NLI for Database

Natural language interface for database (NLIDB), or Text-to-SQL, is the task of automatically translating a user's natural language query into an executable database query such as SQL [6]. NLIDB is strongly related to data transformation in V-NLI, and V-NLI can augment NLIDB with effective visualizations of query results. Besides, not all queries about a dataset need a visualization as a response. For instance, one might ask "What was the GDP of the USA last year?" or "Which country won the most gold medals in the Tokyo Olympics?". In such cases, NLIDB can directly compute and present values in response to the user's questions instead of displaying visualizations that the user must interpret to find the answer.

Generally, there are three kinds of methods for NLIDB: symbolic, statistical, and connectionist (neural network) approaches [176]. The symbolic approach utilizes explicit linguistic rules and knowledge to process the user's query and dominated the area in the early years. DISCERN [223] is a representative example; it integrates a model that extracts information from scripts, a lexicon, and memory. The statistical approach exploits statistical learning methods such as hidden Markov models [140] and probabilistic context-free grammars [89]. From the view of the connectionist approach, the Text-to-SQL task is a special kind of machine translation (MT) problem, but it is hard to handle directly with a standard end-to-end neural machine translation (NMT) model because of the absence of detailed specification in the user's query [75]. Another problem is out-of-domain (OOD) mentions in the user's query caused by the user's unawareness of the ontology set. Aiming at these two problems, IRNet [75] proposes a tree-shaped intermediate representation synthesized from the user's query, which is later fed into an inference model to generate SQL statements deterministically using a domain knowledge base. Furthermore, to capture the special format of SQL statements, SQLNet [273] adopts a universal sketch as a template and predicts values for each slot, while [255] employs a two-stage pipeline that first predicts the semantic structure and then generates the SQL statement from structure-enhanced query text. Recently, TaPas [56], [82] extends BERT's architecture to encode tables as input and trains from weak supervision. The community has also produced benchmarks such as WikiSQL [293] and Spider [281], which can be utilized for further V-NLI research. For more details, please refer to the surveys [6], [296] and related papers in Table 2.
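To give a flavor of the sketch-based slot-filling idea mentioned above (as in SQLNet-style approaches), here is a deliberately tiny, rule-based stand-in: the SQL skeleton is fixed and a few heuristics fill its slots. Real systems learn these slot predictions from data; the table, columns, and query below are fabricated for illustration.

```python
import re

SKETCH = "SELECT {agg}({col}) FROM {table} WHERE {cond_col} {op} {val}"
AGG_WORDS = {"average": "AVG", "mean": "AVG", "total": "SUM", "count": "COUNT"}

def toy_text_to_sql(query, table, columns):
    tokens = query.lower().split()
    agg = next((AGG_WORDS[t] for t in tokens if t in AGG_WORDS), "SUM")
    col = next((c for c in columns if c.lower() in tokens), columns[0])
    # crude filter detection: "<column> over/above <number>"
    m = re.search(r"(\w+) (?:over|above) (\d+)", query.lower())
    cond_col, op, val = (m.group(1), ">", m.group(2)) if m else (col, "IS NOT", "NULL")
    return SKETCH.format(agg=agg, col=col, table=table,
                         cond_col=cond_col, op=op, val=val)

print(toy_text_to_sql("average price of listings with rating above 4",
                      table="listings", columns=["price", "rating"]))
# SELECT AVG(price) FROM listings WHERE rating > 4
```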
6 VISUAL MAPPING

In this section, we introduce how V-NLI performs visual mapping in three aspects: spatial substrate, graphical elements, and graphical properties.

6.1 Spatial Substrate

The spatial substrate is the space in which visualizations are created. It is important to determine the layout configuration to apply in the spatial substrate, such as which axes to use. Some V-NLI systems support explicitly specifying layout information, e.g., by inputting "show GDP series at y axis and year at x axis grouped by Country Code". If the mapping information is not clear in the query, the search space will be huge. However, some combinations of data attributes and visual encodings may not generate a valid visualization. For instance, the encoding type "Y-axis" is not appropriate for categorical attributes. Fortunately, there are many design rules, either from traditional wisdom or from users, that help prune meaningless visualizations. These design rules are typically given by experts. Voyager [264], [265], [266], DIVE [88], Show Me [153], Polaris [237], Profiler [101], DeepEye [148], and Draco [167] all contribute to the enrichment of design rules with more data types. Besides, many systems [167], [250], [266] also allow users to specify the data attributes they are interested in and assign them to certain axes, which is a more direct way to perform visual mapping. In Falx [253], users can specify visualizations with examples of visual mappings, and the system automatically infers the visualization specification and transforms the data to match the design. With the development of machine learning in recent years, advanced models can be applied for more effective visual mapping. VizML [86] identifies five key design choices made while creating visualizations, including mark type, X or Y column encoding, and whether an X or Y column is the single column represented along that axis. For a new dataset, 841 extracted dataset-level features are fed into neural network models to predict the design choices. Luo et al. [149] proposed an end-to-end approach that applies a transformer-based seq2seq model [12] to directly map NL queries and data to chart templates. Beyond single visualizations, Evizeon [85] uses a grid-based layout algorithm to position new charts, while Voder [227] allows the user to choose a slide or dashboard layout.

6.2 Graphical Elements

The graphical element, usually called the mark or chart type, is an important part of a visualization. Choosing an appropriate mark can convey information more efficiently. As with layout, some V-NLI systems allow users to specify marks directly, e.g., by inputting "create a scatterplot of mpg and cylinders", which is a simple way to map the mark information of visualizations. However, in most cases, the mark information is not available in the query. So after parsing the NL queries, most systems integrate predefined rules or leverage Visualization Recommendation (VisRec) technologies to deal with the extracted elements. Voyager [264], [265], [266] applies visualization design best practices drawn from prior research [152], [153], [194], [205] and recommends mark types based on the data types of the x and y channels. Srinivasan et al. [227] developed the heuristics and mappings between data attribute combinations, analytic tasks, and chart types in Table 6 through iterative informal feedback from fellow researchers and students.

TABLE 6
Mappings between data attribute combinations (N: Numeric, C: Categorical), analytic tasks, and chart types in Voder [227]. The attribute combinations covered are N, C, N×N, C×N, C×C, N×N×N, C×N×N, and C×C×N; depending on the task (e.g., Find Extremum, Characterize Distribution, Find Anomalies, and Correlation, possibly combined with Derived Value or Filter), the recommended charts include strip plots, box plots, histograms, bar charts, donut charts, stacked bar charts, and scatterplots augmented with color or size encodings.
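The sketch below shows what such a heuristic mapping can look like in code, in the spirit of Table 6: a rule table keyed by the combination of attribute types and the detected task. The entries are an illustrative subset chosen for this example, not Voder's full mapping.

```python
RULES = {
    (("N",), "find_extremum"): "strip plot",
    (("N",), "characterize_distribution"): "histogram",
    (("C",), "find_extremum"): "bar chart",
    (("C", "N"), "characterize_distribution"): "bar chart",
    (("N", "N"), "correlation"): "scatterplot",
    (("C", "N", "N"), "correlation"): "scatterplot + color",
}

def recommend_chart(attribute_types, task):
    """attribute_types: e.g. ("C", "N") for one categorical and one numeric attribute."""
    key = (tuple(sorted(attribute_types)), task)
    return RULES.get(key, "table")         # fall back to a plain table view

print(recommend_chart(("N", "N"), "correlation"))                 # scatterplot
print(recommend_chart(("N", "C"), "characterize_distribution"))   # bar chart
```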
TaskVis [213] integrates 18 classical low-level analysis tasks with their appropriate chart types, derived from a survey covering both academia and industry. Instead of considering common visualization types, Wang et al. [256] developed rules for automatically selecting a line graph or scatter plot to visualize trends in time series, while Sarikaya et al. [198] looked deeper into scatterplots. To better benefit from design guidance provided by empirical studies, Moritz et al. [167] proposed a formal framework that models visualization design knowledge as a collection of answer set programming constraints.

6.3 Graphical Properties

Jacques Bertin first identified seven "retinal" graphical properties (visual encoding properties) in visualization: position, size, value, texture, color, orientation, and shape. Other types were later added by the community, such as "gestalt" properties (e.g., connectivity and grouping) and animation [152], [246]. In the visual mapping of V-NLI, the color, size, and shape channels are most commonly applied to graphical elements to make them more noticeable. The rule-based approach is still dominant here, and human perceptual effectiveness metrics are usually considered. For example, the cardinality of variables in the color/size/shape channel should not be too high (e.g., 100); otherwise, the chart would be too cluttered to distinguish. The aforementioned works also developed various design rules for graphical properties, especially for the color, size, and shape channels in legends [88], [101], [148], [153], [167], [213], [237], [265], [266]. Besides, Text-to-Viz [42] integrates a color module that aims to generate a set of color palettes for a specific infographic. Similarly, InfoColorizer [282] recommends color palettes for infographics in an interactive and dynamic manner. Wu et al. [269] proposed a learning-based method to automate layout parameter configurations, including orientation, bar bandwidth, max label length, etc. Liu et al. [145] explored data-driven mark orientation for trend estimation in scatterplots. CAST [66] and Data Animator [243] enable interactive creation of chart animations.

7 VIEW TRANSFORMATION

After visual mapping, the generated visualization specifications can be rendered through a library (e.g., D3 [23]). View transformations are also supported here. The three commonly used types of view transformation are location probes, which reveal additional information through locations; viewpoint controls, which scale or translate a view; and distortions, which modify the visual structure [30]. Surprisingly, this stage is rarely involved in the context of our survey. To our knowledge, there have been no works that focus on natural language interaction for view transformation. It is usually realized through multimodal interaction, as shown in Table 3 (Column View Transformation). For example, Orko [234] facilitates both natural language and direct manipulation input to explore network visualizations. Figure 6 shows the user interface of Orko in an example session exploring a network of European soccer players.

Fig. 6. User interface of Orko [234], used here to explore a network of European soccer players.
When the user says "Show connection between Ronaldo and Bale", the system zooms to display the details of the relevant nodes. It also allows users to perform view transformations (e.g., zooming and panning) with a finger or pen. Similarly, Valletto [104], InChorus [229], Data@Hand [278], and DataBreeze [230] all support view transformation through multimodal interaction. Besides, the software visualization community also presents several closely related works. Bieliauskas et al. [20] proposed an interaction approach for software visualizations based on a conversational interface; like Orko [234], it can automatically display best-fitting views according to meta-information from natural language sentences. Seipel et al. [203] explored natural language interaction for software visualization in Augmented Reality (AR) with the Microsoft HoloLens device, which can provide various view transformation interactions for users through gesture, gaze, and speech. In general, view transformations combined with V-NLI currently focus mainly on viewpoint navigation. Future research can focus more on other aspects, such as animation [243], data-GIFs [217], and visual distortions [177].

8 HUMAN INTERACTION

This section discusses how V-NLI allows the user to provide feedback on a visualization through human interaction. The main purpose is to help users better express their intents. The commonly applied methods include ambiguity widgets, autocompletion, and multimodal interaction.

8.1 Ambiguity Widgets

Due to the vague nature of natural language, the system may fail to recognize the user's intent and extract data attributes [160]. Considerable research has been devoted to addressing the ambiguity and underspecification of NL queries. Mainstream approaches can be divided into two categories. One is to generate appropriate defaults by inferring from underspecified natural language utterances, which has been discussed in Section 4.4. The other is to return the decision to users through ambiguity widgets. An ambiguity widget is an interactive widget that lets users provide input with a mouse. DataTone [64] first integrated ambiguity widgets and presents a mixed-initiative approach to manage ambiguity in the user's query. As shown in Figure 7, for the query in (b), DataTone detects data ambiguities about medal, hockey, and skating. Three corresponding ambiguity widgets are presented for users to correct the system's choices in (c). Additional design decision widgets, such as color and aggregation, are also available in (e). Eviza [207], Evizeon [85], NL4DV [174], and AUDiaL [168] all borrow the idea of DataTone and expand the ambiguity widgets to richer forms, such as maps and calendars (see Table 3, Column Ambiguity Widget). DIY [173] enables users to interactively assess the response from an NLIDB system for correctness. However, most systems still leverage heuristics for the algorithmic resolution of ambiguity, lacking a general probabilistic framework.

Fig. 7. Ambiguity widgets in DataTone [64]. Users can correct imprecise system decisions caused by ambiguity.
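A toy sketch of DataTone-style ambiguity detection follows: if a query token matches more than one column or cell value, the alternatives are surfaced as options for an ambiguity widget instead of being resolved silently. The dataset and query are made up.

```python
DATA = {
    "medal_type":  ["gold", "silver", "bronze"],
    "medal_count": [10, 20, 30],
    "sport":       ["hockey", "figure skating", "speed skating"],
}

def detect_ambiguities(query):
    widgets = {}
    for token in query.lower().split():
        matches = [col for col in DATA if token in col] + \
                  [f"{col} = {v}" for col, vals in DATA.items()
                   for v in vals if isinstance(v, str) and token in v]
        if len(matches) > 1:
            widgets[token] = matches       # let the user pick, e.g. via a dropdown
    return widgets

print(detect_ambiguities("show medal counts for skating"))
# {'medal': ['medal_type', 'medal_count'],
#  'skating': ['sport = figure skating', 'sport = speed skating']}
```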
8.2 Autocompletion and Command Suggestion

Users may be unaware of what operations the system can perform and whether a specific language structure is preferred by the system. Although advanced text understanding models give users the freedom to express their intents, system discoverability that helps users formulate analytical questions is still an indispensable part of V-NLI. Discoverability entails awareness and understanding. Awareness means making users aware of what operations the system can perform; understanding means educating users about how to phrase queries so that they can be interpreted correctly by the system [226]. Generally, current V-NLI systems pay relatively little attention to discoverability. The major attempts include autocompletion and command suggestion. This characteristic offers interpretable hints or suggestions matching the visualizations and datasets, which is considered a fruitful interaction paradigm for information sense-making. When using the V-NLI of Tableau [2], [245] or Power BI [5], autocomplete content is presented as the user types, especially when there is a parsing error. The suggestions are mostly reminders of commonly used query elements, such as data attributes and aggregation functions. Once the user has formulated a valid query, suggestions do not appear intrusively. Similarly, Bacci et al. [10] and Yu et al. [280] both adopted template-based approaches to integrate autocompletion into their V-NLIs. In addition to text prompts, data previews can be more useful across all autocompletion variants. Deeper into this area, three crowdsourcing studies conducted by Setlur et al. [208] indicated that users preferred widgets for previewing numerical, geospatial, and temporal data, and textual autocompletion for hierarchical and categorical data. On this basis, they built a design probe, Sneak Pique. As shown in Figure 8, when analyzing a dataset of coronavirus cases, the user is prompted with widgets that provide appropriate previews of the underlying data in various forms. The system also supports toggling from a widget to a corresponding text autocompletion dropdown to drill down into hierarchical and categorical data. Most recently, Srinivasan et al. [232] proposed Snowy, a prototype system that generates utterance recommendations for conversational visual analysis by suggesting data features while implicitly making users aware of which inputs an NLI supports. However, current utterance realization approaches are all template-based, which works effectively only for a small set of tasks and can be challenging to scale to large systems. Among all these systems, another important consideration that needs further research is which commands to show, and when and how the suggestions should be presented.

Fig. 8. Autocompletion in Sneak Pique [208]. The user is prompted with autocompletion widgets that provide appropriate previews of the underlying data in various forms.
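A minimal sketch of the template/prefix-based suggestion behavior described above: the system completes the token currently being typed with matching data attributes and aggregation keywords. The attribute and keyword lists are illustrative.

```python
ATTRIBUTES = ["budget", "box_office", "genre", "release_year", "rating"]
KEYWORDS = ["average", "sum", "count", "maximum", "minimum", "by", "over time"]

def suggest(partial_query, limit=5):
    words = partial_query.split()
    prefix = words[-1].lower() if words else ""
    pool = ATTRIBUTES + KEYWORDS
    return [c for c in pool if c.startswith(prefix) and c != prefix][:limit]

print(suggest("show average b"))   # ['budget', 'box_office', 'by']
print(suggest("show average"))     # []  (the last token is already a complete keyword)
```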
Typing and speech are both commonly used modalities to input natural language, however, speech has unique challenges from a system design aspect, such as triggering of speech input, lack of assistive features like autocompletion, and transcription errors [128], which deserve more attention in the future. Besides, little research has incorporated human gesture recognition and tracking technology [95] to facilitate visualization creation, especially with the large display. Van Dam [247] envisioned post-WIMP user interfaces as “one containing at least one interaction technique not dependent on classical 2D widgets such as menus and icons.” With the advancements in hardware and software technologies that can be used to support novel interaction modalities, researchers are empowered to take a step closer to the post-WIMP user interfaces, enabling users to focus more on their tasks [128]. A qualitative user study conducted by Saktheeswaran et al. [197] also found that participants strongly prefer multimodal input over unimodal input. In recent years, much effort has begun to examine how multiple input forms (e.g., mouse, pen, touch, keyboard, and speech) can be combined to provide more natural and engaging user experience. For instance, Orko [234] is a prototype visualization system that combines both natural language interface and direct manipulation to assist visual exploration and analysis of graph data. Valletto [104] allows users to specify visualizations through a speech-based conversational interface, multitouch gestures, and a conventional GUI interface. InChorus [229] is designed with the goal of maintaining interaction consistency across different visualizations, and it supports pen, touch, and speech input on tablet devices. Its interface components are shown in Figure 9, including typical WIMP components (A, B, C), speech command display area (D), and content supporting pen and touch (E, F, G). DataBreeze [230] couplings pen, touch, and speech-based multimodal interaction 9 CONTEXT MANAGEMENT This section will discuss context management in V-NLI. Given the conversational nature of NLIs, users may frequently pose queries depending on their prior queries. By iterating upon their questions, the user can dive deep into their interested aspects of a chart and refine existing visualizations. 9.1 Analytical Conversations Conversional NLIs have been widely applied in many fields to prompt users to open-ended requests, such as recommender systems [103] and intelligent assistant [221]. With conversational agent, users can also reinforce their understanding of how the system parses their queries. In the visualization community, Eviza [207] and Evizeon [85] are two representative visualization systems that provide V-NLI for visual analysis cycles. Figure 10 shows a backand-forth information exchange between Evizeon [85] and the user when analyzing the measles dataset. The system supports various forms of NL interactions with a dashboard applying pragmatic principles. For example, when the user first types “measles in the UK”, all the charts presented are filtered to cases of measles in the United Kingdom (Figure 10(a)). Then, the user types “show me the orange spike”, and Evizeon will add details to the spike in IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, XX 2021 11 TABLE 7 Examples of co-reference types Type Example Resolution PronounNoun Show me the most popular movie in New York and its rating. 
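As a concrete illustration of the template- and prefix-based autocompletion described in Section 8.2, the short Python sketch below completes a partially typed query with data attributes and aggregation keywords. The vocabulary and matching strategy are assumptions made for illustration, not the behavior of Tableau, Power BI, or Sneak Pique.

```python
# A minimal sketch of template-based autocompletion: complete the last token
# of a partially typed query with data attributes or aggregation functions.
AGGREGATIONS = ["average", "sum", "count", "minimum", "maximum"]


def suggest(prefix, columns, limit=5):
    """Suggest completions for the last token of a partially typed query."""
    tokens = prefix.rstrip().split()
    last = tokens[-1].lower() if tokens else ""
    vocabulary = AGGREGATIONS + columns
    completions = [word for word in vocabulary if word.lower().startswith(last)]
    return completions[:limit]


if __name__ == "__main__":
    columns = ["price", "profit", "production budget", "genre"]
    print(suggest("show the average pr", columns))  # price, profit, production budget
    print(suggest("show the av", columns))          # average
```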
8.3 Multimodal Interaction
Van Dam [247] envisioned post-WIMP user interfaces as "one containing at least one interaction technique not dependent on classical 2D widgets such as menus and icons." With advancements in the hardware and software technologies that can support novel interaction modalities, researchers are empowered to take a step closer to post-WIMP user interfaces, enabling users to focus more on their tasks [128]. A qualitative user study conducted by Saktheeswaran et al. [197] also found that participants strongly prefer multimodal input over unimodal input. In recent years, much effort has gone into examining how multiple input forms (e.g., mouse, pen, touch, keyboard, and speech) can be combined to provide a more natural and engaging user experience. For instance, Orko [234] is a prototype visualization system that combines a natural language interface with direct manipulation to assist visual exploration and analysis of graph data. Valletto [104] allows users to specify visualizations through a speech-based conversational interface, multitouch gestures, and a conventional GUI. InChorus [229] is designed with the goal of maintaining interaction consistency across different visualizations, and it supports pen, touch, and speech input on tablet devices. Its interface components are shown in Figure 9, including typical WIMP components (A, B, C), a speech command display area (D), and content supporting pen and touch (E, F, G).
Fig. 9. Multimodal interaction interface in InChorus [229], which allows pen, touch, and speech input on tablet devices.
DataBreeze [230] couples pen, touch, and speech-based multimodal interaction with flexible unit visualizations, enabling a novel data exploration experience. RIA [290] is an intelligent multimodal conversation system that aids users in exploring large and complex datasets, powered by an optimization-based approach to visual context management. Data@Hand [278] leverages the synergy of speech and touch input modalities for personal data exploration on the mobile phone. Srinivasan et al. [226] proposed leveraging multimodal input to enhance discoverability and the human-computer interaction experience. It is clear that the natural language interface is an essential part of post-WIMP interaction with visualization systems. Typing and speech are both commonly used modalities for natural language input; however, speech has unique challenges from a system design perspective, such as triggering speech input, the lack of assistive features like autocompletion, and transcription errors [128], which deserve more attention in the future. Besides, little research has incorporated human gesture recognition and tracking technology [95] to facilitate visualization creation, especially with large displays.
9 CONTEXT MANAGEMENT
This section discusses context management in V-NLI. Given the conversational nature of NLIs, users may frequently pose queries that depend on their prior queries. By iterating on their questions, users can dive deep into the aspects of a chart they are interested in and refine existing visualizations.
9.1 Analytical Conversations
Conversational NLIs have been widely applied in many fields to prompt users with open-ended requests, such as recommender systems [103] and intelligent assistants [221]. With a conversational agent, users can also reinforce their understanding of how the system parses their queries. In the visualization community, Eviza [207] and Evizeon [85] are two representative visualization systems that provide V-NLI for visual analysis cycles. Figure 10 shows a back-and-forth information exchange between Evizeon [85] and the user when analyzing the measles dataset.
Fig. 10. Conversational V-NLI in Evizeon [85]. The user has a back-and-forth information exchange with the system.
The system supports various forms of NL interaction with a dashboard by applying pragmatic principles. For example, when the user first types "measles in the UK", all the charts presented are filtered to cases of measles in the United Kingdom (Figure 10(a)). Then, the user types "show me the orange spike", and Evizeon adds details to the spike in the orange line as it understands that this is a reference to the visual properties of the line chart (Figure 10(b)). Similarly, Articulate2 [121] is intended to transform user queries into visualizations automatically using a full-fledged conversational interface. Analyza [52] combines two-way conversation with a structured interface to enable effective data exploration. Besides, Fast et al. [59] proposed a new conversational agent, Iris, which can generate visualizations from previous commands and combine them, even commands that are not visualization related. Ava [133] uses controlled NL queries to program data science workflows. Empirical studies conducted by Hearst et al. [79] showed that when asking questions in such conversational interfaces, participants tended to prefer additional content beyond the exact answers, which has significant implications for designing interactive conversational interfaces in the future.
9.2 Co-reference Resolution
Co-reference resolution (CR) is the task of finding all linguistic expressions (called mentions) in a given text that refer to the same real-world entity [238]. CR is an important sub-task in context management, as users often use pronouns to refer to certain visual elements. A common scenario is to identify the entity to which a pronoun in the text refers, while the task also involves recognizing co-reference relations between multiple noun phrases [183]. Table 7 shows examples of co-reference types.
TABLE 7: Examples of co-reference types
Pronoun-Noun — Example: "Show me the most popular movie in New York and its rating." Resolution: its → movie's
Noun-Noun — Example: "How many Wal-Mart stores in Seattle and how's the annual profits for the malls?" Resolution: malls → Wal-Mart
Pronoun-Pronoun — Example: "The Manchurian tigers have been heavily hunted, which caused a dramatic drop of their existing number and they may finally get extinct." Resolution: they, their → Manchurian tigers
Pronoun-Pronoun — Example: "We are all Canadians while Lily was born in the U.S. and she immigrated to Canada two years ago." Resolution: We → Canadians, she → Lily
Before the booming development of deep-learning-based language models, human-designed rules, knowledge, and features dominated CR tasks [286]. Nowadays, most state-of-the-art CR models are neural networks employed in an end-to-end fashion, where the pipeline includes encoding context, generating representations for all potential mentions, and predicting co-reference relations [130], [97]. Benefiting from the powerful understanding and prediction abilities of language models like BERT [51] and GPT-3 [25], the reliance on annotated span mentions is becoming weaker [117], and the CR task can even be treated simply as a token prediction task [118], for which several existing models are available. Additionally, the use of structured knowledge bases [285], [144] and higher-order information [131] has been proven beneficial to the overall performance of the CR task.
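The following toy Python sketch illustrates the bookkeeping behind co-reference handling in a conversational V-NLI: previously mentioned entities are tracked so that pronouns in follow-up queries can be rewritten. Real systems rely on neural CR models as described above; this heuristic stand-in, with hypothetical class and rule names, is only meant to make the idea tangible.

```python
# A small heuristic sketch of resolving pronouns in follow-up V-NLI queries
# to entities mentioned earlier in the conversation. Rule-based and greatly
# simplified; production systems use neural co-reference models instead.
PRONOUNS = {"it", "its", "they", "them", "their", "that", "those"}


class ConversationContext:
    def __init__(self):
        self.recent_entities = []          # most recently mentioned entity last

    def register(self, entities):
        """Record entities mentioned in the latest query or selection."""
        for entity in entities:
            if entity in self.recent_entities:
                self.recent_entities.remove(entity)
            self.recent_entities.append(entity)

    def resolve(self, query):
        """Replace pronouns with the most recently mentioned entity."""
        if not self.recent_entities:
            return query
        antecedent = self.recent_entities[-1]
        words = [antecedent if w.lower().strip("?.,") in PRONOUNS else w
                 for w in query.split()]
        return " ".join(words)


if __name__ == "__main__":
    ctx = ConversationContext()
    ctx.register(["Titanic"])                    # from "show the rating of Titanic"
    print(ctx.resolve("when was it released?"))  # -> when was Titanic released?
```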
Co-reference resolution is also an essential task in the multimodal sense, as users may pose queries that are follow-ups to their direct manipulations on the interface. When the user faces a graphical representation, each element in the representation can become a referent. To make multimodal interaction smoother, the system should have a declarative representation of all these potential referents and find the best match. Articulate2 [120] addresses this issue by leveraging Kinect to detect deictic gestures on the virtual touch screen in front of a large display. If referring expressions are detected and a gesture has been detected by Kinect, information about any objects pointed to by the user is stored, and the system can then find the best match among the properties of each relevant entity. The properties of visualizations and objects that are tracked include statistics and trends in the data, the title, the mark, and any more prominent objects within the visualization (e.g., hotspots, street names, bus stops, etc.). In multimodal systems [104], [197], [226], [229], [230], [234], [278], users can focus on certain visual elements in the visualization by selecting them with a mouse, pen, or finger. Follow-up queries will automatically link to the related visual elements or data.
9.3 Conversational Transitions
Grosz et al. [73] explored how the context of a conversation adjusts over time to maintain coherence through transitional states (retain, shift, continue, and reset). On this basis, Tory and Setlur [245] first proposed a conversational transitions model (see Figure 11) that describes how to transition visualization states during an analytical conversation. The model emerged from their analysis during the design of Tableau's natural language interface feature, Ask Data.
Fig. 11. Conversational transitions model [245] that describes how to transition a visualization state during an analytical conversation.
After interpreting a generated visualization, a user may formulate a new NL query to continue the analytical conversation. The user's transitional goal describes how the user wishes to transform the existing visualization to answer a new question. The model contains the following transitional goals: elaborate, adjust/pivot, start new, retry, and undo, which in turn drive user actions. The visualization states (select attributes, transform, filter, and encode) are finally updated and new visualizations are presented to the user. An important insight from the analysis of Tory and Setlur was that applying transitions to filters alone (as in Evizeon [85] and Orko [234]) is insufficient; the user's intent around transitions may apply to any aspect of a visualization state. A more intelligent system should infer the user's transitional goals based on their interactions and then respond around each visualization state accordingly.
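A minimal Python sketch of the transitions idea follows: a transitional goal updates, rather than replaces, the current visualization state, and a history list supports retry and undo. The state fields and goal names follow the model described above [245]; the update logic itself is a simplifying assumption made for illustration.

```python
# A sketch of applying transitional goals to a visualization state during an
# analytical conversation. The goals follow the conversational transitions
# model; the concrete update rules are simplified assumptions.
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class VisState:
    attributes: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)
    encoding: dict = field(default_factory=dict)


def transition(state, goal, payload, history):
    history.append(deepcopy(state))                  # provenance enables retry/undo
    if goal == "elaborate":                          # refine the current question
        state.filters.update(payload.get("filters", {}))
        state.attributes += payload.get("attributes", [])
    elif goal == "adjust/pivot":                     # change part of the mapping
        state.encoding.update(payload.get("encoding", {}))
    elif goal == "start new":                        # reset the conversation
        state = VisState(attributes=payload.get("attributes", []),
                         filters=payload.get("filters", {}),
                         encoding=payload.get("encoding", {}))
    elif goal in ("retry", "undo") and len(history) > 1:
        state = history[-2]                          # fall back to a previous state
    return state


if __name__ == "__main__":
    history = []
    state = VisState(attributes=["cases"], encoding={"mark": "line"})
    state = transition(state, "elaborate", {"filters": {"country": "UK"}}, history)
    state = transition(state, "adjust/pivot", {"encoding": {"mark": "bar"}}, history)
    print(state)
```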
10 PRESENTATION
In most cases, V-NLI accepts natural language as input and outputs well-designed visualizations. As shown in Table 3 (Column Visualization Type), the output is not limited to traditional charts (e.g., scatter plots, bar charts, and line charts) but also involves richer forms (e.g., maps, trees, networks, and infographics). Beyond simply presenting generated visualizations, an emerging theme in the visualization community is to complement visualizations with natural language. The previous sections have discussed natural language as an input modality in detail; this section introduces using natural language as an output modality.
10.1 Annotation
Data are just a collection of numbers until we turn them into a story. Annotation plays an important role in explaining and emphasizing key points in the dataset. Systems should generate valuable descriptions and map the text to a visualization appropriately. Annotation tools were first applied to complement news articles. Kandogan [102] introduced the concept of just-in-time descriptive analytics that helps users easily understand the structure of data. Given a piece of news, Contextifier [91] first computes clusters, outliers, and trends in line graphs and then automatically produces annotated stock visualizations based on this information. Although it offers a promising example of how human-produced news visualizations can be complemented in a specific context, it is only suitable for stock sequence data. NewsViews [65] later extends Contextifier to support more types of data, such as time series and georeferenced data. In addition, several works incorporate the visual elements of the chart into annotations, synchronizing them with textual descriptions. As shown in Figure 12, Vis-Annotator [122] leverages a Mask R-CNN model to identify visual elements in the target visualization, along with their visual properties. Textual descriptions of the chart are synchronously interpreted to generate visual search requests. Based on the identified information, each descriptive sentence is displayed beside the described focal areas as an annotation.
Fig. 12. Visualizations with annotations automatically generated by Vis-Annotator [122].
Similarly, Click2Annotate [34] and Touch2Annotate [35] are both semi-automatic annotation generators. For interactivity, Calliope [215] and ChartAccent [192] support creating stories via interactive annotation generation and placement. Most recently, ADVISor [142] can generate visualizations with annotations to answer the user's NL questions on tabular data. However, the scalability of current annotation work is weak, as template-based methods account for the majority. Smarter NLP models can be leveraged to improve system usability.
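To illustrate the kind of rule-based annotation generation discussed above (e.g., Contextifier-style detection of salient points), the toy Python sketch below finds the peak and sharp spikes in a time series and emits annotation records that a renderer could place on a line chart. The salience rules are assumptions made for illustration, not the method of any cited system.

```python
# A toy sketch of automatic annotation: detect salient points in a time
# series (the global maximum and sharp spikes) and emit annotation records.
def annotate_series(dates, values):
    annotations = []
    peak_index = max(range(len(values)), key=lambda i: values[i])
    annotations.append({"date": dates[peak_index],
                        "value": values[peak_index],
                        "text": f"Peak of {values[peak_index]} on {dates[peak_index]}"})
    for i in range(1, len(values) - 1):
        rise = values[i] - values[i - 1]
        # Heuristic: a local maximum whose rise exceeds half the global maximum.
        if i != peak_index and rise > 0 and values[i] > values[i + 1] and rise > 0.5 * max(values):
            annotations.append({"date": dates[i], "value": values[i],
                                "text": f"Sharp spike on {dates[i]}"})
    return annotations


if __name__ == "__main__":
    dates = ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05"]
    cases = [5, 40, 22, 90, 30]
    for annotation in annotate_series(dates, cases):
        print(annotation)
```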
10.2 Narrative Storytelling
Narrative storytelling gives data a voice and helps communicate insights more effectively. A data story is a narrative around the data that depicts actors and their interactions, along with times, locations, and other entities [202]. An important problem in narrative storytelling is how to arrange the various visual and textual elements in a visualization. To address this issue, templates are the most commonly applied method. For instance, Text-to-Viz [42] is designed to generate infographics from NL statements with proportion-related statistics. It builds 20 layout blueprints that describe the overall look of the resulting infographics based on prior layout taxonomies [11], [202]. At the synthesis step, for each layout blueprint, Text-to-Viz enumerates all segments extracted by the text analyzer and then generates all valid infographics. DataShot [258], TSIs [26], Chen et al. [37], and Retrieve-Then-Adapt [185] all use template-based approaches to incorporate natural language descriptions when automatically generating infographics to tell stories. With the advances of deep learning technology, some works leverage generative adversarial networks (GANs) [71] to synthesize layouts [138], [292]. These learning-based methods do not rely on handcrafted features; however, they often fail to achieve performance comparable to rule-based methods due to limited training data. Although the aforementioned systems are promising and work relatively well, each is restricted to one specific type of information.
Storytelling has likewise been extensively used to visualize narrative text produced by online writers and journalism media. Metoyer et al. [161] proposed a novel approach to generate data-rich stories. It first extracts narrative components (who, what, when, where) from the text and then generates narrative visualizations by integrating the supporting data evidence with the text. It also allows readers to ask their own questions. As illustrated in Figure 13, given the NBA game report sentence highlighted on the left side, the system generates an initial dashboard containing various visualizations based on the detection of the mentioned player (Kevin Durant), the specific quarter (4th quarter), the time series (3 minutes left), and the important event (score 14 points).
Fig. 13. Coupling text from an NBA game report to visualizations for storytelling [161].
Similarly, Story Analyzer [164] extracts subjects, actions, and objects from narrative text to produce interrelated and user-responsive dashboards. RIA [290] dynamically derives a set of layout constraints to tell spatial stories, such as ensuring visual balance and avoiding object occlusion.
10.3 Natural Language Description Generation
Comparatively speaking, semantic and syntactic analysis technologies may be more appropriate for natural language understanding and generation problems. In the past two decades, there has been a body of research on automatically generating descriptions or captions for charts. The majority of early works are rule-based [40], [45], [58], [165], [166], while modern pipelines generally include a chart parsing module and a subsequent caption generation module. The chart parsing module [8], [182], [201] deconstructs the original charts and extracts useful elements such as text, axes, and lines using techniques like text localization, optical character recognition, and boundary tracing. Deshpande et al. [50] proposed a novel method for chart parsing that adopts a question answering approach to query key elements in charts. The organized elements are subsequently fed into the caption generation module [143], [178] to output captions. A common shortcoming of the models above is the demand for manually designed procedures or features. For example, Al-Zaidy et al. [8] relied on pre-defined templates to generate sentences, and Chart-to-Text [178] needs additional annotated tabular data to describe elements in a given chart. Benefiting from the development of deep learning, several end-to-end models have been proposed recently [70], [178], [186], [225], [262]. FigJAM [186] employs ResNet-50 [78] to encode the chart as a whole image and OCR to encode the text, generating slot values that, together with the image feature vector, initialize an LSTM network to generate captions. Spreafico et al. [225] exploited an encoder-decoder LSTM architecture that takes time-series data as input and generates corresponding captions. Gong et al. [70] and Obeid et al. [178] applied transformer-based models [251] to generate chart summaries. Most recently, Kim et al. [112] explored how readers gather takeaways when considering charts and captions together. The results suggest that the takeaways differ when the caption mentions visual features of differing prominence levels, which provides valuable guidance for future research.
10.4 Visual Question Answering
Visual Question Answering (VQA) is a semantic understanding task that aims to answer questions based on a given visualization, possibly along with informative text. In this subsection, we only focus on VQA with charts or infographics, rather than images in the computer vision community. Generally, visualizations and questions are encoded separately and then fused together to generate answers [157]. For visualizations, Kim et al. [111] leveraged a semantic parsing model [180], [288] to develop an automatic chart question answering pipeline with visual explanations describing how the answer was produced. Figure 14 shows that the pipeline answers all three questions correctly (marked in green) and gives correct explanations of how it obtained the answers.
Fig. 14. The automatic VQA pipeline [111] answers all three questions correctly (marked in green) and gives correct explanations.
For infographics, the OCR module additionally needs to identify and tokenize the text, which serves as facts for answer generation [31], [98], [99], [157], [224]. Later models employ more sophisticated fusion modules, among which the attention mechanism has achieved great success. LEAF-Net [31] comprises chart parsing and question and answer encoding according to chart elements, followed by an attention network. STL-CQA [224] applies a structural transformer-based learning approach that emphasizes exploiting the structural properties of charts. Recently, with the development of multimodal Transformers [139], [239], Mathew et al. [157] proposed a pipeline in which the question and infographic are mapped into the same vector space and simply added as the input to stacks of Transformer layers. Besides, they also delivered a summary of VQA datasets for reference. In this community, VQA for infographics accounts for the majority of work, and most works are rule-based. However, people refer to the visual features of visualizations using various words, and rule-based approaches may fail to detect synonyms for these features, so a more generalizable model is needed.
11 DISCUSSION AND OPPORTUNITY
By conducting a comprehensive survey under the guidance of an information visualization pipeline, we found that, as a rapidly growing field, V-NLI faces many thorny challenges and open research opportunities. In this section, we organize these from a macro perspective into five aspects, namely knowledge, model, interaction, presentation, and dataset, with some additional discussion of applications. Our target is to cover the critical gaps and emerging topics that we believe deserve more attention.
11.1 Knowledge
A major limitation of most existing V-NLI systems is the absence of domain knowledge. We have conducted an evaluation of four state-of-the-art open-source V-NLIs, both academic [174], [280] and commercial [2], [5]. We found that none of the systems can recognize that nitrogen dioxide and NO2 have the same meaning, nor can they recognize the relationship between age and birth year. Therefore, support for domain knowledge at the bottom layer is crucial, regardless of whether the upper layer adopts a rule-based or learning-based method. CogNet [252], a knowledge base dedicated to integrating linguistic knowledge (FrameNet), world knowledge (YAGO, Freebase, DBpedia, and Wikidata), and commonsense knowledge (ConceptNet), will be useful for broadening the repertoire of supported utterances. Researchers can wholly or partly integrate relevant universal or vertical domain knowledge bases into their systems, adding depth and meaning to the features provided.
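As a small illustration of how semantic knowledge could close gaps like the NO2 / nitrogen dioxide example above, the sketch below matches a query phrase to dataset columns by embedding similarity instead of string overlap. It assumes the sentence-transformers package and the publicly available all-MiniLM-L6-v2 checkpoint; any embedding model or knowledge base lookup could play the same role.

```python
# A sketch of moving attribute matching beyond the letter level: embed the
# query phrase and the column names, then pick the most similar column.
# Assumes the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer, util


def match_attribute(query_phrase, columns):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    query_vec = model.encode(query_phrase, convert_to_tensor=True)
    column_vecs = model.encode(columns, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, column_vecs)[0]
    best = int(scores.argmax())
    return columns[best], float(scores[best])


if __name__ == "__main__":
    columns = ["nitrogen dioxide", "ozone", "particulate matter", "station name"]
    # A purely lexical matcher would miss that "NO2" refers to "nitrogen dioxide".
    print(match_attribute("NO2 level", columns))
```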
11.2 Model
11.2.1 Application of more advanced NLP models
The performance of V-NLI depends to a great extent on the underlying NLP models. As shown in Table 3 (Column NLP Toolkit or Technology and Column Recommendation Algorithm), most existing V-NLI systems just apply hand-crafted grammar rules or typical NLP toolkits for convenience. Recently, several state-of-the-art NLP models have reached close-to-human performance on various tasks (e.g., text classification, paraphrase generation, and question answering), such as ELMo [181], BERT [51], GPT-3 [25], and CPM-2 [289]. Several works have applied these advances to data visualization [254], [267]; however, few have applied them in V-NLI. During our survey, we also found that existing works provide limited support for free-form NL input. To construct a more robust system, a promising direction is to apply more advanced NLP models to learn universal grammar patterns from a large corpus of conversations in visual analysis, improving the performance of various stages like task inference and data attribute inference. In the process, we believe that existing high-quality text datasets will help train and evaluate robust NLP models for V-NLI. Besides, the community should focus more on end-to-end approaches, since they can directly map natural language and a dataset to visualizations and support more complex data transformations [53], [149], [294]. To this end, advanced neural machine translation models [44] can be applied with various optimization schemes. NLP models that handle queries in multiple languages also deserve attention.
11.2.2 Deep interpretation of dataset semantics
Semantic information plays an important role in V-NLI. State-of-the-art systems have considered leveraging semantic parsing toolkits like SEMPRE [288] to parse NL queries [111], [280]. However, considering only the semantics of the NL query is limited; the semantics of the dataset should also be taken into consideration. Besides, existing technologies for data attribute matching are confined to the letter-matching level and do not go deep into the semantic-matching level, as described in Section 4.3. For example, when the analyzed dataset is about movies, a practical system should be able to recognize that the Name attribute in the dataset refers to the movie name and automatically associate it with other attributes appearing in the query. A promising way to augment semantic interpretation ability is to connect with recent semantic data type detection models for visualization, such as Sherlock [92], Sato [284], ColNet [32], Meimei [242], TURL [49], DCoM [155], EXACTA [272], and Doduo [240]. Incorporating such connections will help better infer attribute types upon initialization and may also reduce the need for developers to manually configure attribute aliases. Besides, the aforementioned models are limited to a fixed set of semantic types; an additional essential task is to extend existing semantic data type detection models to support more semantic types. Additionally, with a deep contextual interpretation of dataset semantics, supporting queries over multi-table schemas would be an interesting research direction.
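The following toy sketch shows what inferring semantic attribute types upon dataset initialization might look like with simple rules; learned detectors such as Sherlock [92] or Sato [284] replace such heuristics with models trained over column values. The rules below are assumptions for illustration only.

```python
# A toy, rule-based sketch of semantic attribute type inference over the
# values of a column. Real systems use learned detectors instead.
import re


def infer_semantic_type(values):
    sample = [str(v).strip() for v in values if str(v).strip()]
    if sample and all(re.fullmatch(r"\d{4}([-/]\d{1,2}([-/]\d{1,2})?)?", v) for v in sample):
        return "temporal"
    try:
        [float(v) for v in sample]
        return "quantitative"
    except ValueError:
        pass
    if len(set(sample)) <= max(1, len(sample) // 2):
        return "nominal"            # few distinct values -> likely categorical
    return "text"


if __name__ == "__main__":
    print(infer_semantic_type(["2019", "2020", "2021"]))                 # temporal
    print(infer_semantic_type(["12.5", "7", "30.1"]))                    # quantitative
    print(infer_semantic_type(["Drama", "Action", "Drama", "Action"]))   # nominal
```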
11.3 Interaction
11.3.1 Richer prompts to improve system discoverability
With open-ended textboxes, the user may not be aware of what input is valid and which chart types are supported by the system. As described in Section 8.2, discoverability has received relatively little attention in current V-NLI compared to other characteristics. Existing practice has shown that real-time prompts can effectively help users better understand system features and correct input errors in time when typing queries [208], [226], [280] (see Table 3 (Column Autocompletion)). The most common method is template-based text autocompletion [280]; however, this method is relatively monotonous in practice and does not work for spoken commands. Several works offer the user prompt widgets with data previews to speed up input [208], but the supported prompt widget types are limited. We believe that richer interactive widgets with prompts and synergistic multimodal interaction can greatly improve the usability of the system. Showing the provenance of prompt behavior can also enhance the interpretability of visualization results. Concerning this issue, Setlur et al. [208] conducted crowdsourcing studies on the efficacy of autocompletion suggestions; the insights drawn from the studies are of great value for inspiring the future design of V-NLI. In addition, although the community has conducted comprehensive research on multimodal interaction [128], the discoverability of speech-based V-NLI remains an open area. The tone of voice can also provide insights about the user's sentiment in a voice message [146].
11.3.2 Take advantage of the user's interaction history
The intent behind an NL query can be satisfied by various types of charts. The space of possible visualization results is so broad that existing systems can hardly account for varying user preferences. Although several conversational V-NLI systems have been proposed [85], [121], [207] to analyze NL queries in context, few systems take the user's interaction history into account. Recently, Zehrung et al. [283] conducted a crowdsourced study analyzing trust in human versus algorithmically generated visualization recommendations. Based on the results, they suggested that recommendation systems should be customized according to the specific user's information search strategy. Personalized information derived from historical user interactions and context can provide a richer model for satisfying the user's analytic tasks, and many innovative models from recommender systems [287] can be referenced to improve the user experience. Besides, Lee et al. [129] recently deconstructed categorization in visualization recommendation, and several works [81], [211] study the associations and expectations about verbalization and visualization reported by users. Integrating this information to further model the user's interaction history is another interesting research topic.
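A small sketch of history-aware personalization follows: candidate utterance suggestions are scored by how similar, frequent, and recent the user's past queries are. The scoring function is an assumption made for illustration, not a method from the cited studies.

```python
# A sketch of ranking command suggestions using the user's interaction
# history: frequent, recent, and lexically similar past queries boost a
# candidate's score.
import math
import time


class SuggestionRanker:
    def __init__(self, half_life_s=3600.0):
        self.half_life_s = half_life_s
        self.history = []                       # list of (utterance, timestamp)

    def log(self, utterance):
        self.history.append((utterance.lower(), time.time()))

    def score(self, candidate):
        now, cand_words = time.time(), set(candidate.lower().split())
        total = 0.0
        for utterance, ts in self.history:
            overlap = len(cand_words & set(utterance.split())) / max(len(cand_words), 1)
            decay = math.exp(-(now - ts) * math.log(2) / self.half_life_s)
            total += overlap * decay            # similar + recent + frequent wins
        return total

    def rank(self, candidates):
        return sorted(candidates, key=self.score, reverse=True)


if __name__ == "__main__":
    ranker = SuggestionRanker()
    ranker.log("show average price by genre")
    ranker.log("filter genre to drama")
    print(ranker.rank(["show count of movies by year",
                       "show average rating by genre"]))
```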
11.4 Presentation
Existing V-NLI systems mostly support only 2D visualization. Nowadays, Immersive Analytics (IA) is a quickly evolving field that leverages immersive technologies (e.g., immersive environments or virtual reality) for data analysis [57], [61], [206]. In the visualization community, several works have augmented static visualizations with virtual content [36], [124]. Seipel et al. [203] explored NLI for software visualization with AR devices. Reinders et al. [191] studied blind and low vision (BLV) people's preferences when exploring interactive 3D printed models (I3Ms). However, there have been no systems that support a natural language interface for data visualization in an immersive way. A related work for reference is that of Lee et al. [126], who proposed the concept of data visceralization and introduced a conceptual data visceralization pipeline in relation to the information visualization pipeline [30], as shown in Figure 15.
Fig. 15. Conceptual data visceralization pipeline [126].
It would be interesting to enrich this pipeline with the integration of V-NLI. What's more, the emergence of various toolkits for immersive analytics [27], [28], [175] provides a great opportunity to expand V-NLI to immersive scenes. Besides, as described in Section 7, view transformations are rarely involved in existing systems. With the development of immersive technologies, more view transformations can be explored and integrated into V-NLI to provide an immersive interactive experience.
11.5 Dataset
There is a widely recognized consensus that large-scale data collection is an effective way to facilitate community development (e.g., ImageNet [48] for image processing and GEM [67] for natural language generation). Although there are various datasets for general NLP tasks, they can hardly be directly applied to provide training samples for V-NLI models. In the visualization community, several works have begun to collect large datasets for V-NLI, as summarized in Table 8.
TABLE 8: Summary of existing V-NLI datasets
Kim et al. [111] — Publication: CHI'20; NL queries: 629; Data tables: 52; Benchmark: ×; Other contributions: VQA with explanations; Website: https://github.com/dhkim16/VisQA-release
Quda [62] — Publication: arXiv'20; NL queries: 14,035; Data tables: 36; Benchmark: ×; Other contributions: Three Quda applications; Website: https://freenli.github.io/quda/
NLV [231] — Publication: CHI'21; NL queries: 893; Data tables: 3; Benchmark: ✓; Other contributions: Characterization of utterances; Website: https://nlvcorpus.github.io/
nvBench [150] — Publication: SIGMOD'21; NL queries: 25,750; Data tables: 780; Benchmark: ✓; Other contributions: NL2SQL-to-NL2VIS and SEQ2VIS; Website: https://github.com/TsinghuaDatabaseGroup/nvBench
To assist the deployment of learning-based techniques for parsing human language, Fu et al. [62] proposed Quda, containing diverse user queries annotated with analytic tasks. Srinivasan et al. [231] conducted an online study to collect NL utterances and characterized them based on their phrasing type. VizNet [87] is a step forward in addressing the need for large-scale data and visualization repositories. Kim et al. [111] collected questions people posed about various bar charts and line charts, along with their answers and explanations. Fully considering the user's characteristics, Kassel et al. [105] introduced a linguistically motivated two-dimensional answer space that varies in the level of both support and information to match the human language in visual analysis. The major limitation of these datasets is that the query types they contain are not rich enough. Besides, a benchmark or ground truth for a given dataset is, unfortunately, usually unavailable. Although Srinivasan et al. [231] made an initial attempt to present a collection of NL utterances with their mapped visualizations, and Luo et al. [150] synthesized NL-to-Vis benchmarks by piggybacking on NL-to-SQL benchmarks to produce the nvBench benchmark, the supported visualization types and tasks are still limited. Therefore, collecting large-scale datasets and creating new benchmarks that support more tasks, domains, and interactions will be an indispensable research direction in the future.
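To make the benchmark discussion concrete, the sketch below shows how an NL2VIS benchmark in the spirit of Table 8 might be consumed for evaluation: each record pairs an NL query with a ground-truth chart specification, and a system is scored by exact match. The JSON layout and the predict stub are hypothetical, not the format of any listed corpus.

```python
# A sketch of evaluating a V-NLI model against an NL-to-VIS benchmark whose
# records pair a natural language query with a ground-truth chart spec.
# The file layout and the predict() stub are hypothetical.
import json


def evaluate(benchmark_path, predict):
    with open(benchmark_path, encoding="utf-8") as f:
        records = json.load(f)                  # [{"nl_query": ..., "vis_spec": ...}, ...]
    hits = sum(predict(r["nl_query"], r.get("table")) == r["vis_spec"] for r in records)
    return hits / len(records)


if __name__ == "__main__":
    def dummy_predict(query, table=None):
        # Stand-in for a real V-NLI model.
        return {"mark": "bar", "x": "genre", "y": "count"}

    # evaluate("benchmark.json", dummy_predict)  # requires a benchmark file on disk
    print(dummy_predict("show the number of movies per genre"))
```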
11.6 Application
Apart from facilitating data analysis, much attention has been paid to benefiting real-world applications by integrating visual analysis and natural language interfaces. Huang et al. [90] proposed a query engine to convert, store, and retrieve uncertain spatial mobile trajectories via intuitive NL input. Leo John et al. [132] proposed a novel V-NLI to promote medical imaging and biomedical research. With an audio travel podcast as input, Crosscast [271] identifies geographic locations and descriptive keywords within the podcast transcript through NLP and text mining techniques; the information is later used to select relevant photos from online repositories and synchronize their display to align with the audio narration. MeetingVis [216] leverages ASR and NLP techniques to produce effective meeting summaries in team-based workplaces. PathViewer [259] leverages ideas from flow diagrams and NLP to visualize the sequences of intermediate steps that students take. Since V-NLI can easily be integrated as a module into a visualization system, more applications can be explored in the future.
12 CONCLUSION
The past two decades have witnessed the rapid development of visualization-oriented natural language interfaces, which act as a complementary input modality for visual analytics. However, the community lacks a comprehensive survey of related works to guide follow-up research. We fill this gap, and the resulting survey gives a comprehensive overview of what characteristics are currently concerned and supported by V-NLI. We also propose several promising directions for future work. To our knowledge, this paper is the first step towards reviewing V-NLI in a novel and systematic manner. We hope that this paper can better guide future research and encourage the community to think further about NLI for data visualization.
REFERENCES
[1] Apache OpenNLP. [Online]. Available: http://opennlp.apache.org/.
[2] Ask data. [Online]. Available: https://www.tableau.com/products/new-features/ask-data.
[3] Google NLP. [Online]. Available: https://cloud.google.com/natural-language/.
[4] IBM Watson Analytics. [Online]. Available: http://www.ibm.com/analytics/watson-analytics.
[5] Microsoft Power BI. [Online]. Available: https://docs.microsoft.com/en-us/power-bi/create-reports/power-bi-tutorial-q-and-a.
[6] K. Affolter, K. Stockinger, and A. Bernstein. A comparative survey of recent natural language interfaces for databases. VLDB J., 28(5), 2019.
[7] A. Akbik, D. Blythe, and R. Vollgraf. Contextual String Embeddings for Sequence Labeling. In Proc. COLING'19. ACM, 2018.
[8] R. A. Al-Zaidy and C. L. Giles. Automatic Extraction of Data from Bar Charts. In Proc. ICKC'15. ACM, 2015.
[9] R. Amar, J. Eagan, and J. Stasko. Low-level components of analytic activity in information visualization. In Proc. INFOVIS'05. IEEE, 2005.
[10] F. Bacci, F. M. Cau, and L. D. Spano. Inspecting Data Using Natural Language Queries. Lect. Notes Comput. Sci., 12254:771–782, 2020.
[11] B. Bach, Z. Wang, M. Farinella, and et al. Design Patterns for Data Comics. In Proc. CHI'18. ACM, 2018.
[12] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In Proc. ICLR'15, 2015.
[13] C. Baik, H. V. Jagadish, and Y. Li. Bridging the semantic gap with SQL query logs in natural language interfaces to databases. In Proc. ICDE'19. IEEE, 2019.
[14] F. Basik, B. Hättasch, A. Ilkhechi, and et al. DBPal: A learned NL-interface for databases. In Proc. SIGMOD'18. ACM, 2018.
[15] H. Bast and E. Haussmann. More Accurate Question Answering on Freebase. In Proc. CIKM'15. ACM, 2015.
[16] L. Battle, R. J. Crouser, A. Nakeshimana, and et al. The Role of Latency and Task Complexity in Predicting Visual Search Behavior. IEEE Trans. Vis. Comput.
Graph., 26(1):1246–1255, 2020. [17] L. Battle and C. Scheidegger. A Structured Review of Data Management Technology for Interactive Visualization and Analysis. IEEE Trans. Vis. Comput. Graph., 27(2):1128–1138, 2021. [18] Y. Belinkov and J. Glass. Analysis Methods in Neural Language Processing: A Survey. Trans. Assoc. Comput. Linguist., 7:49–72, 2019. [19] S. Bergamaschi, F. Guerra, M. Interlandi, and et al. Combining user and database perspective for solving keyword queries over relational databases. Inf. Syst., 55:1–19, 2016. [20] S. Bieliauskas and A. Schreiber. A Conversational User Interface for Software Visualization. In Proc. VISSOFT’17. IEEE, 2017. [21] S. Bird. NLTK: the natural language toolkit. In Proc. COLING-ACL’06. ACL, 2006. [22] L. Blunschi, C. Jossen, D. Kossmann, and et al. SODA: Generating SQL for business users. Proc. VLDB Endow., 5(10):932–943, 2012. [23] M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-Driven Documents. IEEE Trans. Vis. Comput. Graph., 17(12):2301–2309, 2011. [24] M. Brehmer and T. Munzner. A Multi-Level Typology of Abstract Visualization Tasks. IEEE Trans. Vis. Comput. Graph., 19(12), 2013. [25] T. Brown, B. Mann, N. Ryder, and et al. Language Models are Few-Shot Learners. In Proc. NeurIPS’20. MIT Press, 2020. [26] C. Bryan, K. L. Ma, and J. Woodring. Temporal Summary Images: An Approach to Narrative Visualization via Interactive Annotation Generation and Placement. IEEE Trans. Vis. Comput. Graph., 2017. [27] W. Bu¨schel, A. Lehmann, and R. Dachselt. MIRIA: A Mixed Reality Toolkit for the In-Situ Visualization and Analysis of Spatio-Temporal Interaction Data. In Proc. CHI’21. ACM, 2021. [28] P. W. S. Butcher, N. W. John, and P. D. Ritsos. VRIA: A Web-Based Framework for Creating Immersive Analytics Experiences. IEEE Trans. Vis. Comput. Graph., 27(7):3213–3225, 2020. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, XX 2021 16 [29] Z. Bylinskii, S. Alsheikh, S. Madan, and et al. Understanding Infographics through Textual and Visual Tag Prediction. arXiv, 2017. [30] S. Card, J. Mackinlay, and B. Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, 1999. [31] R. Chaudhry, S. Shekhar, U. Gupta, and et al. LEAF-QA: Locate, encode attend for figure question answering. In Proc. WACV’20, 2020. [32] J. Chen, E. Jime´nez-Ruiz, I. Horrocks, and C. Sutton. ColNet: Embedding the Semantics of Web Tables for Column Type Prediction. In Proc. AAAI’19. AAAI, 2019. [33] S. Chen, J. Li, G. Andrienko, and et al. Supporting Story Synthesis: Bridging the Gap between Visual Analytics and Storytelling. IEEE Trans. Vis. Comput. Graph., 26(7):2499–2516, 2020. [34] Y. Chen, S. Barlowe, and J. Yang. Click2Annotate: Automated Insight Externalization with rich semantics. In Proc. VAST’10. IEEE, 2010. [35] Y. Chen, J. Yang, S. Barlowe, and D. H. Jeong. Touch2Annotate: Generating better annotations with less human effort on multi-touch interfaces. In Proc. CHI’10. ACM, 2010. [36] Z. Chen, W. Tong, Q. Wang, and et al. Augmenting Static Visualizations with PapARVis Designer. In Proc. CHI’20. ACM, 2020. [37] Z. Chen, Y. Wang, Q. Wang, and et al. Towards automated infographic design: Deep learning-based auto-extraction of extensible timeline. IEEE Trans. Vis. Comput. Graph., 26(1):917–926, 2020. [38] J. Choo, C. Lee, H. Kim, and et al. VisIRR: Visual analytics for information retrieval and recommendation with large-scale document data. In Proc. VAST’14. IEEE, 2014. [39] K. Cook, N. Cramer, D. Israel, and et al. 
Mixed-initiative visual analytics using task-driven recommendations. In Proc. VAST’15. IEEE, 2015. [40] M. Corio and G. Lapalme. Generation of texts for information graphics. In Proc. EWNLG’99. ACL, 1999. [41] K. Cox, R. E. Grinter, S. L. Hibino, and et al. A multi-modal natural language interface to an information visualization environment. Int. J. Speech Technol., 4(3):297–314, 2001. [42] W. Cui, X. Zhang, Y. Wang, and et al. Text-to-Viz: Automatic Generation of Infographics from Proportion-Related Natural Language Statements. IEEE Trans. Vis. Comput. Graph., 26(1):906–916, 2020. [43] Z. Cui, S. K. Badam, M. A. Yalc¸in, and N. Elmqvist. DataSite: Proactive visual data exploration with computation of insight-based recommendations. Inf. Vis., 18(2):251–267, 2019. [44] R. Dabre, C. Chu, and A. Kunchukuttan. A Survey of Multilingual Neural Machine Translation. ACM Comput. Surv., 53(5), 2020. [45] S. Demir, S. Carberry, and K. F. McCoy. Generating textual summaries of bar charts. In Proc. INLG’08. ACL, 2008. [46] S. Demir, S. Carberry, and K. F. McCoy. Summarizing Information Graphics Textually. Comput. Linguist., 38(3):527–574, 2012. [47] C¸ . Demiralp, P. J. Haas, S. Parthasarathy, and T. Pedapati. Foresight: Recommending visual insights. VLDB Endow., 10(12):1937–1940, 2017. [48] J. Deng, W. Dong, R. Socher, and et al. ImageNet: A large-scale hierarchical image database. In Proc. CVPR’09. IEEE, 2009. [49] X. Deng, H. Sun, A. Lees, and et al. Turl: Table understanding through representation learning. Proc. VLDB Endow., 14(3):307–319, 2020. [50] A. P. Deshpande and C. N. Mahender. Summarization of Graph Using Question Answer Approach. In Adv. Intell. Syst. Comput., pages 205–216. Springer, 2020. [51] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proc. NAACL’19. ACL, 2019. [52] K. Dhamdhere, K. S. McCurley, R. Nahmias, and et al. Analyza: Exploring data with conversation. In Proc. IUI’17. ACM, 2017. [53] V. Dibia and C. Demiralp. Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks. IEEE Comput. Graph. Appl., 39(5):33–46, 2019. [54] E. Dimara and C. Perin. What is Interaction for Data Visualization? IEEE Trans. Vis. Comput. Graph., 26(1):119–129, 2020. [55] R. Ding, S. Han, Y. Xu, and et al. Quickinsights: Quick and automatic discovery of insights from multi-dimensional data. In Proc. SIGMOD’19. ACM, 2019. [56] J. Eisenschlos, S. Krichene, and T. Mu¨ller. Understanding tables with intermediate pre-training. In Find. Assoc. Comput. Linguist. EMNLP 2020. ACL, 2020. [57] B. Ens, B. Bach, M. Cordeil, and et al. Grand Challenges in Immersive Analytics. In Proc. CHI’21. ACM, 2021. [58] M. Fasciano and G. Lapalme. PostGraphe: a system for the generation of statistical graphics and text Important factors in the generation process. In Proc. INLG’96. ACL, 1996. [59] E. Fast, B. Chen, J. Mendelsohn, and et al. Iris: A conversational agent for complex tasks. In Proc. CHI’18. ACM, 2018. [60] L. Ferres, G. Lindgaard, and L. Sumegi. Evaluating a tool for improving accessibility to charts and graphs. In Proc. ASSETS’10. ACM, 2010. [61] A. Fonnet and Y. Prie. Survey of Immersive Analytics. IEEE Trans. Vis. Comput. Graph., 27(3):2101–2122, 2019. [62] S. Fu, K. Xiong, X. Ge, and et al. Quda: Natural Language Queries for Visual Data Analytics. arXiv. [63] J. Fulda, M. Brehmel, and T. Munzner. 
TimeLineCurator: Interactive Authoring of Visual Timelines from Unstructured Text. IEEE Trans. Vis. Comput. Graph., 22(1):300–309, 2016. [64] T. Gao, M. Dontcheva, E. Adar, and et al. Datatone: Managing ambiguity in natural language interfaces for data visualization. In Proc. UIST’15. ACM, 2015. [65] T. Gao, J. Hullman, E. Adar, and et al. NewsViews: An automated pipeline for creating custom geovisualizations for news. In Proc. CHI’14. ACM, 2014. [66] T. Ge, B. Lee, and Y. Wang. CAST: Authoring Data-Driven Chart Animations. In Proc. CHI’21. ACM, 2021. [67] S. Gehrmann, T. Adewumi, K. Aggarwal, and et al. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics. arXiv, 2021. [68] A. Ghosh, M. Nashaat, J. Miller, and et al. A comprehensive review of tools for exploratory analysis of tabular industrial datasets. Vis. Informatics, 2(4):235–253, 2018. [69] M. Gingerich and C. Conati. Constructing models of user and task characteristics from eye gaze data for user-adaptive information highlighting. In Proc. AAAI’15. AAAI, 2015. [70] L. GONG, J. Crego, and J. Senellart. Enhanced Transformer Model for Data-to-Text Generation. In Proc. WNGT’19. ACL, 2019. [71] I. Goodfellow, J. Pouget-Abadie, M. Mirza, and et al. Generative Adversarial Nets. In Proc. NIPS’14. MIT Press, 2014. [72] D. Gotz and Z. Wen. Behavior-driven visualization recommendation. In Proc. IUI’09. ACM, 2009. [73] B. J. Grosz and C. L. Sidner. Attention, intentions, and the structure of discourse. Comput. Linguist., 12(3):175–204, 1986. [74] J. Gu, Z. Lu, H. Li, and V. O. Li. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. In Proc. ACL’16. ACL, 2016. [75] J. Guo, Z. Zhan, Y. Gao, and et al. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation. In Proc. ACL’19. ACL, 2019. [76] I. Gur, S. Yavuz, Y. Su, and X. Yan. DialSQL: Dialogue Based Structured Query Generation. In Proc. ACL’18. ACL, 2018. [77] C. Harris, R. A. Rossi, S. Malik, and et al. Insight-centric Visualization Recommendation. arXiv, 2021. [78] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In IEEE, editor, Proc. CVPR’16. IEEE, 2016. [79] M. Hearst and M. Tory. Would You Like A Chart With That? Incorporating Visualizations into Conversational Interfaces. In Proc. VIS’19. IEEE, 2019. [80] M. Hearst, M. Tory, and V. Setlur. Toward Interface Defaults for Vague Modifiers in Natural Language Interfaces for Visual Analysis. In Proc. VIS’19. IEEE, 2019. [81] R. Henkin and C. Turkay. Words of Estimative Correlation: Studying Verbalizations of Scatterplots. IEEE Trans. Vis. Comput. Graph., 2020. [82] J. Herzig, P. K. Nowak, T. Mu¨ller, and et al. TaPas: Weakly Supervised Table Parsing via Pre-training. In Proc. ACL’20. ACL, 2020. [83] M. Honnibal and I. Montani. spacy 2: Natural language understanding with bloom embeddings. Convolutional Neural Networks Increm. Parsing, 2017. [84] A. K. Hopkins, M. Correll, and A. Satyanarayan. VisuaLint: Sketchy In Situ Annotations of Chart Construction Errors. Comput. Graph. Forum, 39(3):219–228, 2020. [85] E. Hoque, V. Setlur, M. Tory, and I. Dykeman. Applying Pragmatics Principles for Interaction with Visual Analytics. IEEE Trans. Vis. Comput. Graph., 24(1):309–318, 2018. [86] K. Hu, M. A. Bakker, S. Li, and et al. VizML: A machine learning approach to visualization recommendation. In Proc. CHI’19. ACM. [87] K. Hu, S. S. Gaikwad, M. Hulsebos, and et al. VizNet: Towards a large-scale visualization learning and benchmarking repository. In Proc. 
CHI’19. ACM, 2019. [88] K. Hu, D. Orghian, and C. Hidalgo. DIVE: A mixed-initiative system supporting integrated data exploration workflows. In Proc. HILDA’2018. ACM, 2018. [89] B. Huang, G. Zhang, and P. C.-Y. Sheu. A Natural Language Database Interface Based on a Probabilistic Context Free Grammar. In Proc. WSCS’08. IEEE, 2008. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, XX 2021 17 [90] Z. Huang, Y. Zhao, W. Chen, and et al. A Natural-language-based Visual Query Approach of Uncertain Human Trajectories. IEEE Trans. Vis. Comput. Graph., 26(1):1–11, 2019. [91] J. Hullman, N. Diakopoulos, and E. Adar. Contextifier: Automatic generation of annotated stock visualizations. In Proc. CHI’13. ACM. [92] M. Hulsebos, A. Satyanarayan, K. Hu, and et al. Sherlock: A deep learning approach to semantic data type detection. In Proc. KDD’19. ACM, 2019. [93] R. Ingria, R. Sauri, J. Pustejovsky, and et al. TimeML: Robust Specification of Event and Temporal Expressions in Text. New Dir. Quest. answering, 3:28–34, 2003. [94] S. Iyer, I. Konstas, A. Cheung, and et al. Learning a Neural Semantic Parser from User Feedback. In Proc. ACL’17. ACL, 2017. [95] S. Jiang, P. Kang, X. Song, and et al. Emerging Wearable Interfaces and Algorithms for Hand Gesture Recognition: A Survey. IEEE Rev. Biomed. Eng., pages 1–11, 2021. [96] M. Joshi, D. Chen, Y. Liu, and et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans. Trans. Assoc. Comput. Linguist., 8:64–77, 2020. [97] M. Joshi, O. Levy, L. Zettlemoyer, and D. Weld. BERT for Coreference Resolution: Baselines and Analysis. In Proc. EMNLP’19. ACL, 2019. [98] K. Kafle, B. Price, S. Cohen, and C. Kanan. DVQA: Understanding Data Visualizations via Question Answering. In Proc. CVPR’18. IEEE, 2018. [99] S. E. Kahou, V. Michalski, A. Atkinson, and et al. FigureQA: An Annotated Figure Dataset for Visual Reasoning. In Proc. ICLR’18, 2018. [100] S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer. Wrangler: Interactive visual specification of data transformation scripts. In Proc. CHI’11. ACM, 2011. [101] S. Kandel, R. Parikh, A. Paepcke, and et al. Profiler: Integrated statistical analysis and visualization for data quality assessment. In Proc. AVI’12. ACM, 2012. [102] E. Kandogan. Just-in-time annotation of clusters, outliers, and trends in point-based data visualizations. In Proc. VAST’12. IEEE, 2012. [103] J. Kang, K. Condiff, S. Chang, and et al. Understanding How People Use Natural Language to Ask for Recommendations. In Proc. RecSys’17. ACM, 2017. [104] J. F. Kassel and M. Rohs. Valletto: A multimodal interface for ubiquitous visual analytics. In Proc. CHI EA’18. ACM, 2018. [105] J. F. Kassel and M. Rohs. Talk to me intelligibly: Investigating an answer space to match the user’s language in visual analysis. In Proc. DIS’19. ACM, 2019. [106] T. Kato, M. Matsushita, and E. Maeda. Answering it with charts: dialogue in natural language and charts. In Proc. COLING’02. ACM, 2002. [107] P. Kaur, M. Owonibi, and B. Koenig-Ries. Towards visualization recommendation-a semi-automated domain-specific learning approach. CEUR Workshop Proc., 1366:30–35, 2015. [108] S. Kerpedjiev, G. Carenini, S. F. Roth, and J. D. Moore. AutoBrief: a multimedia presentation system for assisting data analysis. Comput. Stand. Interfaces, 18(6-7):583–593, 1997. [109] N. Kerracher and J. Kennedy. Constructing and Evaluating Visualisation Task Classifications: Process and Considerations. Comput. Graph. Forum, 36(3):47–59, 2017. [110] A. Key, B. Howe, D. 
Perry, and C. Aragon. VizDeck: Self-organizing dashboards for visual analytics. In Proc. SIGMOD’12. ACM, 2012. [111] D. H. Kim, E. Hoque, and M. Agrawala. Answering Questions about Charts and Generating Visual Explanations. In Proc. CHI’20. ACM. [112] D. H. Kim, V. Setlur, and M. Agrawala. Towards Understanding How Readers Integrate Charts and Captions: A Case Study with Line Charts. In Proc. CHI’21. ACM, 2021. [113] H. Kim, J. Oh, Y. Han, and et al. Thumbnails for Data Stories: A Survey of Current Practices. In Proc. VIS’19. IEEE, 2019. [114] Y. Kim and J. Heer. Assessing Effects of Task and Data Distribution on the Effectiveness of Visual Encodings. Comput. Graph. Forum, 37(3):157–167, 2018. [115] Y. Kim and J. Heer. Gemini: A Grammar and Recommender System for Animated Transitions in Statistical Graphics. IEEE Trans. Vis. Comput. Graph., 27(2):485–494, 2021. [116] R. Kincaid and G. Pollock. Nicky: Toward a Virtual Assistant for Test and Measurement Instrument Recommendations. In Proc. ICSC’17. IEEE, 2017. [117] Y. Kirstain, O. Ram, and O. Levy. Coreference Resolution without Span Representations. arXiv, 2021. [118] V. Kocijan, A.-M. Cretu, O.-M. Camburu, and et al. A Surprisingly Robust Trick for the Winograd Schema Challenge. Proc. 57th Annu. Meet. Assoc. Comput. Linguist. ACL’19, pages 4837–4842, 2019. [119] N. Kong and M. Agrawala. Graphical Overlays: Using Layered Elements to Aid Chart Reading. IEEE Trans. Vis. Comput. Graph., 18(12), 2012. [120] A. Kumar, J. Aurisano, B. Di Eugenio, and et al. Multimodal Coreference Resolution for Exploratory Data Visualization Dialogue: Context-Based Annotation and Gesture Identification. In Proc. SEMDIAL’17. ISCA. [121] A. Kumar, J. Aurisano, B. Di Eugenio, and et al. Towards a dialogue system that supports rich visualizations of data. In Proc. SIGDIAL’16. ACL, 2016. [122] C. Lai, Z. Lin, R. Jiang, and et al. Automatic Annotation Synchronizing with Textual Description for Visualization. In Proc. CHI’20. ACM, 2020. [123] S. Lalle, D. Toker, and C. Conati. Gaze-Driven Adaptive Interventions for Magazine-Style Narrative Visualizations. IEEE Trans. Vis. Comput. Graph., 27(6):2941–2952, 2019. [124] R. Langner, M. Satkowski, W. Bu¨schel, and R. Dachselt. MARVIS: Combining Mobile Devices and Augmented Reality for Visual Data Analysis. In Proc. CHI’21. ACM, 2021. [125] R. Lebret, D. Grangier, and M. Auli. Neural Text Generation from Structured Data with Application to the Biography Domain. In Proc. EMNLP’16. ACL, 2016. [126] B. Lee, D. Brown, B. Lee, and et al. Data Visceralization: Enabling Deeper Understanding of Data Using Virtual Reality. IEEE Trans. Vis. Comput. Graph., 27(2):1095–1105, 2021. [127] B. Lee, P. Isenberg, N. H. Riche, and S. Carpendale. Beyond Mouse and Keyboard: Expanding Design Considerations for Information Visualization Interactions. IEEE Trans. Vis. Comput. Graph., 2012. [128] B. Lee, A. Srinivasan, P. Isenberg, and J. Stasko. Post-wimp interaction for information visualization. Found. Trends Human-Computer Interact., 14(1):1–95, 2021. [129] D. J.-L. Lee, V. Setlur, M. Tory, and et al. Deconstructing Categorization in Visualization Recommendation: A Taxonomy and Comparative Study. IEEE Trans. Vis. Comput. Graph., 2626:1–14, 2021. [130] K. Lee, L. He, M. Lewis, and L. Zettlemoyer. End-to-end Neural Coreference Resolution. In Proc. EMNLP’17. ACL, 2017. [131] K. Lee, L. He, and L. Zettlemoyer. Higher-Order Coreference Resolution with Coarse-to-Fine Inference. In Proc. NAACL’18. ACL, 2018. [132] R. J. Leo John, J. M. Patel, A. L. 
Alexander, and et al. A Natural Language Interface for Dissemination of Reproducible Biomedical Data Science. Lect. Notes Comput. Sci., 11073:197–205, 2018. [133] R. J. Leo John, N. Potti, and J. M. Patel. Ava: From data to insights through conversation. In Proc. CIDR’17, 2017. [134] F. Li and H. V. Jagadish. NaLIR: An interactive natural language interface for querying relational databases. In Proc. SIGMOD’14. ACM. [135] F. Li and H. V. Jagadish. Constructing an interactive natural language interface for relational databases. Proc. VLDB Endow., 8(1):73–84, 2014. [136] F. Li and H. V. Jagadish. Understanding Natural Language Queries over Relational Databases. ACM SIGMOD Rec., 45(1):6–13, 2016. [137] H. Li, Y. Wang, S. Zhang, and et al. KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation. IEEE Trans. Vis. Comput. Graph., pages 1–11, 2021. [138] J. Li, J. Yang, A. Hertzmann, and et al. LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators. arXiv, 2019. [139] L. H. Li, M. Yatskar, D. Yin, and et al. VisualBERT: A Simple and Performant Baseline for Vision and Language. arXiv, 2019. [140] P. Li, L. Liu, J. Xu, and et al. Application of Hidden Markov Model in SQL Injection Detection. In Proc. COMPSAC’17. IEEE, 2017. [141] H. Lin, D. Moritz, and J. Heer. Dziban: Balancing Agency & Automation in Visualization Design via Anchored Recommendations. In Proc. CHI’20. ACM, 2020. [142] C. Liu, Y. Han, R. Jiang, and X. Yuan. ADVISor: Automatic Visualization Answer for Natural-Language Question on Tabular Data. In Proc. PacificVis’21. IEEE, 2021. [143] C. Liu, L. Xie, Y. Han, and et al. AutoCaption: An Approach to Generate Natural Language Description from Visualization Automatically. In Proc. PacificVis’20. IEEE, 2020. [144] Q. Liu, H. Jiang, Z.-H. Ling, and et al. Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge. arXiv, 2016. [145] T. Liu, X. Li, C. Bao, and et al. Data-Driven Mark Orientation for Trend Estimation in Scatterplots. In Proc. CHI’21. ACM, 2021. [146] G. Lo´pez, L. Quesada, and L. A. Guerrero. Alexa vs. Siri vs. Cortana vs. Google Assistant: A Comparison of Speech-Based Natural User Interfaces. In Proc. AHFE’17. Springer, 2017. [147] Y. Luo, X. Qin, N. Tang, and et al. DeepEye: Creating good data visualizations by keyword search. In Proc. SIGMOD’18. ACM, 2018. [148] Y. Luo, X. Qin, N. Tang, and G. Li. Deepeye: towards automatic data visualization. In Proc. ICDE’18. IEEE, 2018. [149] Y. Luo, N. Tang, G. Li, and et al. Natural Language to Visualization by Neural Machine Translation. In Proc. VIS’21. IEEE, 2021. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, XX 2021 18 [150] Y. Luo, N. Tang, G. Li, and et al. Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks. In Proc. SIGMOD’21. ACM, 2021. [151] Y. Ma, A. K. H. Tung, W. Wang, and et al. ScatterNet: A Deep Subjective Similarity Model for Visual Analysis of Scatterplots. IEEE Trans. Vis. Comput. Graph., 26(3):1562–1576, 2020. [152] J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Trans. Graph., 5(2):110–141, 1986. [153] J. Mackinlay, P. Hanrahan, and C. Stolte. Show Me: Automatic Presentation for Visual Analysis. IEEE Trans. Vis. Comput. Graph., 13(6):1137–1144, 2007. [154] S. Madan, Z. Bylinskii, M. Tancik, and et al. Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics. arXiv, 2018. [155] S. Maji, S. S. 
Rout, and S. Choudhary. DCoM: A Deep Column Mapper for Semantic Data Type Detection. arXiv, 2021. [156] C. Manning, M. Surdeanu, J. Bauer, and et al. The Stanford CoreNLP Natural Language Processing Toolkit. In Proc. ACL’14. ACL, 2014. [157] M. Mathew, V. Bagal, R. P. Tito, and et al. InfographicVQA. arXiv, 2021. [158] M. Matsushita, E. Maeda, and T. Kato. An interactive visualization method of numerical data based on natural language requirements. Int. J. Hum. Comput. Stud., 60(4):469–488, 2004. [159] L. McNabb and R. S. Laramee. Survey of Surveys (SoS) - Mapping The Landscape of Survey Papers in Information Visualization. Comput. Graph. Forum, 36(3):589–617, 2017. [160] R. Metoyer, B. Lee, N. Henry Riche, and M. Czerwinski. Understanding the verbal language and structure of end-user descriptions of data visualizations. In Proc. CHI’12. ACM, 2012. [161] R. Metoyer, Q. Zhi, B. Janczuk, and W. Scheirer. Coupling story to visualization: Using textual analysis as a bridge between data and interpretation. In Proc. IUI’18. ACM, 2018. [162] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In Proc. ICLR’13, 2013. [163] G. A. Miller. WordNet: A Lexical Database for English. Commun. ACM, 38(11):39–41, 1995. [164] M. Mitri. Story Analysis Using Natural Language Processing and Interactive Dashboards. J. Comput. Inf. Syst., pages 1–11, 2020. [165] V. O. Mittal, J. D. Moore, G. Carenini, and S. Roth. Describing Complex Charts in Natural Language: A Caption Generation System. Comput. Linguist., 24(3):431–467, 1998. [166] P. Moraes, G. Sina, K. McCoy, and S. Carberry. Generating Summaries of Line Graphs. In Proc. INLG’14. ACL, 2014. [167] D. Moritz, C. Wang, G. L. Nelson, and et al. Formalizing Visualization Design Knowledge as Constraints: Actionable and Extensible Models in Draco. IEEE Trans. Vis. Comput. Graph., 25(1):438–448, 2019. [168] T. Murillo-Morales and K. Miesenberger. AUDiaL: A Natural Language Interface to Make Statistical Charts Accessible to Blind Persons. In Lect. Notes Comput. Sci., pages 373–384. Springer, 2020. [169] B. Mutlu, E. Veas, and C. Trattner. VizRec: Recommending personalized visualizations. ACM Trans. Interact. Intell. Syst., 6(4):1–39, 2016. [170] B. Mutlu, E. Veas, C. Trattner, and V. Sabol. Towards a Recommender Engine for Personalized Visualizations. In Lect. Notes Comput. Sci., volume 9146, pages 169–182. Springer, 2015. [171] M. Nafari and C. Weaver. Augmenting Visualization with Natural Language Translation of Interaction: A Usability Study. Comput. Graph. Forum, 32(3):391–400, 2013. [172] M. Nafari and C. Weaver. Query2Question: Translating visualization interaction into natural language. IEEE Trans. Vis. Comput. Graph., 21(6):756–769, 2015. [173] A. Narechania, A. Fourney, B. Lee, and G. Ramos. DIY: Assessing the Correctness of Natural Language to SQL Systems. In Proc. IUI’21. ACM, 2021. [174] A. Narechania, A. Srinivasan, and J. Stasko. NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries. IEEE Trans. Vis. Comput. Graph., 27(2), 2021. [175] M. Nebeling, M. Speicher, X. Wang, and et al. MRAT: The Mixed Reality Analytics Toolkit. In Proc. CHI’20. ACM, 2020. [176] N. Nihalani, S. Silakari, and M. Motwani. Natural language Interface for Database: A Brief review. Int. J. Comput. Sci., 8(2):600–608, 2011. [177] L. G. Nonato and M. Aupetit. Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment. 
[178] J. Obeid and E. Hoque. Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model. In Proc. INLG’20. ACL, 2020.
[179] M. Oppermann, R. Kincaid, and T. Munzner. VizCommender: Computing Text-Based Similarity in Visualization Repositories for Content-Based Recommendations. IEEE Trans. Vis. Comput. Graph., 2021.
[180] P. Pasupat and P. Liang. Compositional Semantic Parsing on Semi-Structured Tables. In Proc. ACL’15. ACL, 2015.
[181] M. Peters, M. Neumann, M. Iyyer, and et al. Deep Contextualized Word Representations. In Proc. NAACL’18. ACL, 2018.
[182] J. Poco and J. Heer. Reverse-Engineering Visualizations: Recovering Visual Encodings from Chart Images. Comput. Graph. Forum, 2017.
[183] S. Pradhan, A. Moschitti, N. Xue, and et al. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. In Proc. EMNLP’12. ACL, 2012.
[184] P. Qi, Y. Zhang, Y. Zhang, and et al. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In Proc. ACL’20. ACL.
[185] C. Qian, S. Sun, W. Cui, and et al. Retrieve-Then-Adapt: Example-based Automatic Generation for Proportion-related Infographics. IEEE Trans. Vis. Comput. Graph., 27(2):443–452, 2021.
[186] X. Qian, E. Koh, F. Du, and et al. Generating Accurate Caption Units for Figure Captioning. In Proc. WWW’21. ACM, 2021.
[187] X. Qian, R. A. Rossi, F. Du, and et al. Learning to Recommend Visualizations from Data. In Proc. KDD’21. ACM, 2021.
[188] X. Qian, R. A. Rossi, F. Du, and et al. Personalized Visualization Recommendation. arXiv, 2021.
[189] X. Qin, Y. Luo, N. Tang, and G. Li. Making data visualization more efficient and effective: a survey. VLDB J., 29(1):93–117, 2020.
[190] A. Quamar, C. Lei, D. Miller, and et al. An Ontology-Based Conversation System for Knowledge Bases. In Proc. SIGMOD’20. ACM, 2020.
[191] S. Reinders, M. Butler, and K. Marriott. “Hey Model!” - Natural User Interactions and Agency in Accessible Interactive 3D Models. In Proc. CHI’20. ACM, 2020.
[192] D. Ren, M. Brehmer, B. Lee, and et al. ChartAccent: Annotation for data-driven storytelling. In Proc. PacificVis’17. IEEE, 2017.
[193] A. Rind, W. Aigner, M. Wagner, and et al. Task Cube: A three-dimensional conceptual space of user tasks in visualization design and evaluation. Inf. Vis., 15(4):288–300, 2016.
[194] S. F. Roth, J. Kolojejchick, J. Mattis, and J. Goldstein. Interactive graphic design using automatic presentation knowledge. In Proc. CHI’94. ACM.
[195] D. Saha, A. Floratou, K. Sankaranarayanan, and et al. ATHENA: An ontology-driven system for natural language querying over relational data stores. Proc. VLDB Endow., 9(12):1209–1220, 2016.
[196] B. Saket, A. Endert, and C. Demiralp. Task-Based Effectiveness of Basic Visualizations. IEEE Trans. Vis. Comput. Graph., 25(7), 2019.
[197] A. Saktheeswaran, A. Srinivasan, and J. Stasko. Touch? Speech? or Touch and Speech? Investigating Multimodal Interaction for Visual Network Exploration and Analysis. IEEE Trans. Vis. Comput. Graph., 26(6):2168–2179, 2020.
[198] A. Sarikaya and M. Gleicher. Scatterplots: Tasks, Data, and Designs. IEEE Trans. Vis. Comput. Graph., 24(1):402–412, 2018.
[199] A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer. Vega-Lite: A Grammar of Interactive Graphics. IEEE Trans. Vis. Comput. Graph., 23(1):341–350, 2017.
[200] A. Satyanarayan, R. Russell, J. Hoffswell, and J. Heer. Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization. IEEE Trans. Vis. Comput. Graph., 22(1):659–668, 2016.
[201] M. Savva, N. Kong, A. Chhajta, and et al. ReVision: Automated classification, analysis and redesign of chart images. In Proc. UIST’11. ACM, 2011.
[202] E. Segel and J. Heer. Narrative visualization: Telling stories with data. IEEE Trans. Vis. Comput. Graph., 16(6):1139–1148, 2010.
[203] P. Seipel, A. Stock, S. Santhanam, and et al. Speak to your Software Visualization—Exploring Component-Based Software Architectures in Augmented Reality with a Conversational Interface. In Proc. VISSOFT’19. IEEE, 2019.
[204] J. Sen, F. Özcan, A. Quamar, and et al. Natural Language Querying of Complex Business Intelligence Queries. In Proc. SIGMOD’19. ACM, 2019.
[205] J. Seo and B. Shneiderman. A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Inf. Vis., 2005.
[206] M. Sereno, X. Wang, L. Besancon, and et al. Collaborative Work in Augmented Reality: A Survey. IEEE Trans. Vis. Comput. Graph., 2626:1–20, 2020.
[207] V. Setlur, S. E. Battersby, M. Tory, and et al. Eviza: A natural language interface for visual analysis. In Proc. UIST’16. ACM, 2016.
[208] V. Setlur, E. Hoque, D. H. Kim, and A. X. Chang. Sneak pique: Exploring autocompletion as a data discovery scaffold for supporting visual analysis. In Proc. UIST’20. ACM, 2020.
[209] V. Setlur and A. Kumar. Sentifiers: Interpreting Vague Intent Modifiers in Visual Analysis using Word Co-occurrence and Sentiment Analysis. In Proc. VIS’20. IEEE, 2020.
[210] V. Setlur, M. Tory, and A. Djalali. Inferencing underspecified natural language utterances in visual analysis. In Proc. IUI’19. ACM, 2019.
[211] R. Sevastjanova, F. Beck, B. Ell, and et al. Going beyond Visualization: Verbalization as Complementary Medium to Explain Machine Learning Models. In Proc. VISxAI’18. IEEE, 2018.
[212] S. Shekarpour, E. Marx, A.-C. Ngonga Ngomo, and S. Auer. SINA: Semantic interpretation of user queries for question answering on interlinked data. J. Web Semant., 30:39–51, 2015.
[213] L. Shen, E. Shen, Z. Tai, and et al. TaskVis: Task-oriented Visualization Recommendation. In Proc. EuroVis’21. Eurographics, 2021.
[214] D. Shi, Y. Shi, X. Xu, and et al. Task-Oriented Optimal Sequencing of Visualization Charts. In Proc. VDS’19. IEEE, 2019.
[215] D. Shi, X. Xu, F. Sun, and et al. Calliope: Automatic Visual Data Story Generation from a Spreadsheet. IEEE Trans. Vis. Comput. Graph., 27(2):453–463, 2021.
[216] Y. Shi, C. Bryan, S. Bhamidipati, and et al. MeetingVis: Visual Narratives to Assist in Recalling Meeting Context and Content. IEEE Trans. Vis. Comput. Graph., 24(6):1918–1929, 2018.
[217] X. Shu, A. Wu, J. Tang, and et al. What Makes a Data-GIF Understandable? IEEE Trans. Vis. Comput. Graph., 27(2):1492–1502, 2021.
[218] N. Siddiqui and E. Hoque. ConVisQA: A Natural Language Interface for Visually Exploring Online Conversations. In Proc. IV’20. IEEE, 2020.
[219] T. Siddiqui, P. Luh, Z. Wang, and et al. ShapeSearch: Flexible pattern-based querying of trend line visualizations. Proc. VLDB Endow., 11(12):1962–1965, 2018.
[220] T. Siddiqui, P. Luh, Z. Wang, and et al. From Sketching to Natural Language: Expressive Visual Querying for Accelerating Insight. SIGMOD Rec., 50(1):51–58, 2021.
[221] S. Sigtia, E. Marchi, S. Kajarekar, and et al. Multi-Task Learning for Speaker Verification and Voice Trigger Detection. In Proc. ICASSP’20. IEEE, 2020.
[222] A. Simitsis, G. Koutrika, and Y. Ioannidis. Précis: from unstructured keywords as queries to structured databases as answers. VLDB J., 17(1):117–149, 2007.
[223] S. Sinclair, R. Miikkulainen, and S. Sinclair. Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory, volume 73. MIT Press, 1997.
[224] H. Singh and S. Shekhar. STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering. In Proc. EMNLP’20. ACL, 2020.
[225] A. Spreafico and G. Carenini. Neural Data-Driven Captioning of Time-Series Line Charts. In Proc. AVI’20. ACM, 2020.
[226] A. Srinivasan, M. Dontcheva, E. Adar, and S. Walker. Discovering natural language commands in multimodal interfaces. In Proc. IUI’19. ACM, 2019.
[227] A. Srinivasan, S. M. Drucker, A. Endert, and J. Stasko. Augmenting visualizations with interactive data facts to facilitate interpretation and communication. IEEE Trans. Vis. Comput. Graph., 25(1):672–681, 2019.
[228] A. Srinivasan, S. M. Drucker, A. Endert, and J. Stasko. Augmenting Visualizations with Interactive Data Facts to Facilitate Interpretation and Communication. IEEE Trans. Vis. Comput. Graph., 25(1), 2019.
[229] A. Srinivasan, B. Lee, N. Henry Riche, and et al. InChorus: Designing Consistent Multimodal Interactions for Data Visualization on Tablet Devices. In Proc. CHI’20. ACM, 2020.
[230] A. Srinivasan, B. Lee, and J. T. Stasko. Interweaving Multimodal Interaction with Flexible Unit Visualizations for Data Exploration. IEEE Trans. Vis. Comput. Graph., 14(8):1–15, 2020.
[231] A. Srinivasan, N. Nyapathy, B. Lee, and et al. Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations. In Proc. CHI’21, 2021.
[232] A. Srinivasan and V. Setlur. Snowy: Recommending Utterances for Conversational Visual Analysis. In Proc. UIST’21. ACM, 2021.
[233] A. Srinivasan and J. Stasko. Natural Language Interfaces for Data Analysis with Visualization: Considering What Has and Could Be Asked. In Proc. EuroVis’17. EG, 2017.
[234] A. Srinivasan and J. Stasko. Orko: Facilitating Multimodal Interaction for Visual Exploration and Analysis of Networks. IEEE Trans. Vis. Comput. Graph., 24(1):511–521, 2018.
[235] A. Srinivasan and J. Stasko. How to ask what to say?: Strategies for evaluating natural language interfaces for data visualization. IEEE Comput. Graph. Appl., 40(4):96–103, 2020.
[236] B. Steichen, G. Carenini, and C. Conati. User-adaptive information visualization - Using eye gaze data to infer visualization tasks and user cognitive abilities. In Proc. IUI’13. ACM, 2013.
[237] C. Stolte, D. Tang, and P. Hanrahan. Polaris: a system for query, analysis, and visualization of multidimensional relational databases. IEEE Trans. Vis. Comput. Graph., 8(1):52–65, 2002.
[238] N. Stylianou and I. Vlahavas. A neural Entity Coreference Resolution review. Expert Syst. Appl., 168(114466):1–20, 2021.
[239] W. Su, X. Zhu, Y. Cao, and et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations. arXiv, 2019.
[240] Y. Suhara, J. Li, Y. Li, and et al. Annotating Columns with Pre-trained Language Models. arXiv, 2021.
[241] Y. Sun, J. Leigh, A. Johnson, and S. Lee. Articulate: A Semi-automated Model for Translating Natural Language Queries into Meaningful Visualizations. In Lect. Notes Comput. Sci. Springer, 2010.
[242] K. Takeoka, M. Oyamada, S. Nakadai, and T. Okadome. Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables. In Proc. AAAI’19. AAAI, 2019.
[243] J. R. Thompson, Z. Liu, and J. Stasko. Data Animator: Authoring Expressive Animated Data Graphics. In Proc. CHI’21. ACM, 2021.
[244] C. Tong, R. Roberts, R. Borgo, and et al. Storytelling and visualization: An extended survey. Inf., 9(3):1–42, 2018.
[245] M. Tory and V. Setlur. Do What I Mean, Not What I Say! Design Considerations for Supporting Intent and Context in Analytical Conversation. In Proc. VAST’19. IEEE, 2019.
[246] B. Tversky, J. B. Morrison, and M. Betrancourt. Animation: Can it facilitate? Int. J. Hum. Comput. Stud., 57(4):247–262, 2002.
[247] A. Van Dam. Post-WIMP User Interfaces. Commun. ACM, 40(2):63–67, 1997.
[248] S. Van Den Elzen and J. J. Van Wijk. Small multiples, large singles: A new approach for visual data exploration. Comput. Graph. Forum, 32(3 PART2):191–200, 2013.
[249] M. Vartak, S. Madden, A. Parameswaran, and N. Polyzotis. SEEDB: Automatically generating query visualizations. Proc. VLDB Endow., 7(13):1581–1584, 2014.
[250] M. Vartak, S. Rahman, S. Madden, and et al. SEEDB: Efficient data-driven visualization recommendations to support visual analytics. Proc. VLDB Endow., 8(13):2182–2193, 2015.
[251] A. Vaswani, N. Shazeer, N. Parmar, and et al. Attention Is All You Need. In Proc. NIPS’17. NIPS, 2017.
[252] C. Wang, Y. Chen, Z. Xue, and et al. CogNet: Bridging Linguistic Knowledge, World Knowledge and Commonsense Knowledge. In Proc. AAAI’21. AAAI, 2021.
[253] C. Wang, Y. Feng, R. Bodik, and et al. Falx: Synthesis-Powered Visualization Authoring. In Proc. CHI’21. ACM, 2021.
[254] Q. Wang, Z. Chen, Y. Wang, and H. Qu. A Survey on ML4VIS: Applying Machine Learning Advances to Data Visualization. arXiv, 2021.
[255] W. Wang, Y. Tian, H. Wang, and W.-S. Ku. A Natural Language Interface for Database: Achieving Transfer-learnability Using Adversarial Method for Question Understanding. In Proc. ICDE’20. IEEE, 2020.
[256] Y. Wang, F. Han, L. Zhu, and et al. Line Graph or Scatter Plot? Automatic Selection of Methods for Visualizing Trends in Time Series. IEEE Trans. Vis. Comput. Graph., 24(2):1141–1154, 2018.
[257] Y. Wang, Z. Sun, H. Zhang, and et al. DataShot: Automatic Generation of Fact Sheets from Tabular Data. IEEE Trans. Vis. Comput. Graph., 26(1):895–905, 2020.
[258] Y. Wang, Z. Sun, H. Zhang, and et al. DataShot: Automatic Generation of Fact Sheets from Tabular Data. IEEE Trans. Vis. Comput. Graph., 26(1):895–905, 2020.
[259] Y. Wang, W. M. White, and E. Andersen. PathViewer: Visualizing Pathways through Student Data. In Proc. CHI’17. ACM, 2017.
[260] Y. Wang, H. Zhang, H. Huang, and et al. InfoNice: Easy Creation of Information Graphics. In Proc. CHI’18. ACM, 2018.
[261] Z. Wang, L. Sundin, D. Murray-Rust, and B. Bach. Cheat Sheets for Data Visualization Techniques. In Proc. CHI’20. ACM, 2020.
[262] S. Wiseman, S. Shieber, and A. Rush. Challenges in Data-to-Document Generation. In Proc. EMNLP’17. ACL, 2017.
[263] G. Wohlgenannt, D. Mouromtsev, D. Pavlov, and et al. A Comparative Evaluation of Visual and Natural Language Question Answering over Linked Data. In Proc. K’19. SCITEPRESS, 2019.
[264] K. Wongsuphasawat, D. Moritz, A. Anand, and et al. Towards a general-purpose query language for visualization recommendation. In Proc. HILDA’16. ACM, 2016.
[265] K. Wongsuphasawat, D. Moritz, A. Anand, and et al. Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations. IEEE Trans. Vis. Comput. Graph., 22(1):649–658, 2016.
[266] K. Wongsuphasawat, Z. Qu, D. Moritz, and et al. Voyager 2: Augmenting visual analysis with partial view specifications. In Proc. CHI’17. ACM.
[267] A. Wu, Y. Wang, X. Shu, and et al. AI4VIS: Survey on Artificial Intelligence Approaches for Data Visualization. IEEE Trans. Vis. Comput. Graph., pages 1–20, 2021.
[268] A. Wu, Y. Wang, M. Zhou, and et al. MultiVision: Designing Analytical Dashboards with Deep Learning Based Recommendation. IEEE Trans. Vis. Comput. Graph., pages 1–11, 2021.
[269] A. Wu, L. Xie, B. Lee, and et al. Learning to Automate Chart Layout Configurations Using Crowdsourced Paired Comparison. In Proc. CHI’21. ACM, 2021.
[270] H. Xia. Crosspower: Bridging graphics and linguistics. In Proc. UIST’20. ACM, 2020.
[271] H. Xia, J. Jacobs, and M. Agrawala. Crosscast: Adding Visuals to Audio Travel Podcasts. In Proc. UIST’20. ACM, 2020.
[272] Y. Xian, H. Zhao, T. Y. Lee, and et al. EXACTA: Explainable Column Annotation. In Proc. KDD’21. ACM, 2021.
[273] X. Xu, C. Liu, and D. Song. SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning. arXiv, 2017.
[274] S. Yagcioglu, A. Erdem, E. Erdem, and N. Ikizler-Cinbis. RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes. In Proc. EMNLP’18. ACL, 2018.
[275] N. Yaghmazadeh, Y. Wang, I. Dillig, and T. Dillig. SQLizer: query synthesis from natural language. Proc. ACM Program. Lang., 2017.
[276] Z. Yang, P. Blunsom, C. Dyer, and W. Ling. Reference-Aware Language Models. In Proc. EMNLP’17. ACL, 2017.
[277] T. Young, D. Hazarika, S. Poria, and E. Cambria. Recent Trends in Deep Learning Based Natural Language Processing. IEEE Comput. Intell. Mag., 13(3):55–75, 2018.
[278] Y.-H. Kim, B. Lee, A. Srinivasan, and E. K. Choe. Data@Hand: Fostering Visual Exploration of Personal Data on Smartphones Leveraging Speech and Touch Interaction. In Proc. CHI’21. ACM, 2021.
[279] B. Yu and C. T. Silva. VisFlow - Web-based Visualization Framework for Tabular Data with a Subset Flow Model. IEEE Trans. Vis. Comput. Graph., 23(1):251–260, 2017.
[280] B. Yu and C. T. Silva. FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System. IEEE Trans. Vis. Comput. Graph., 26(1):1–11, 2020.
[281] T. Yu, R. Zhang, K. Yang, and et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proc. EMNLP’18. ACL, 2018.
[282] L. Yuan, Z. Zhou, J. Zhao, and et al. InfoColorizer: Interactive Recommendation of Color Palettes for Infographics. IEEE Trans. Vis. Comput. Graph., 2626:1–15, 2021.
[283] R. Zehrung, A. Singhal, M. Correll, and L. Battle. Vis Ex Machina: An Analysis of Trust in Human versus Algorithmically Generated Visualization Recommendations. In Proc. CHI’21. ACM, 2021.
[284] D. Zhang, Y. Suhara, J. Li, and et al. Sato: Contextual semantic type detection in tables. Proc. VLDB Endow., 13(11):1835–1848, 2020.
[285] H. Zhang, Y. Song, Y. Song, and D. Yu. Knowledge-aware Pronoun Coreference Resolution. In Proc. ACL’19. ACL, 2019.
[286] H. Zhang, X. Zhao, and Y. Song. A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution. arXiv, 2020.
[287] S. Zhang, L. Yao, A. Sun, and Y. Tay. Deep Learning Based Recommender System. ACM Comput. Surv., 52(1):1–38, 2019.
[288] Y. Zhang, P. Pasupat, and P. Liang. Macro Grammars and Holistic Triggering for Efficient Semantic Parsing. In Proc. EMNLP’17. ACL.
[289] Z. Zhang, Y. Gu, X. Han, and et al. CPM-2: Large-scale Cost-effective Pre-trained Language Models. arXiv, 2021.
[290] Z. Wen, M. Zhou, and V. Aggarwal. An optimization-based approach to dynamic visual context management. In Proc. InfoVis’05. IEEE, 2005.
[291] W. Zheng, H. Cheng, L. Zou, and et al. Natural Language Question/Answering. In Proc. CIKM’17. ACM, 2017.
[292] X. Zheng, X. Qiao, Y. Cao, and R. W. H. Lau. Content-aware generative modeling of graphic design layouts. ACM Trans. Graph., 2019.
[293] V. Zhong, C. Xiong, and R. Socher. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arXiv, 2017.
[294] M. Zhou, Q. Li, X. He, and et al. Table2Charts: Recommending Charts by Learning Shared Table Representations. In Proc. KDD’21. ACM.
[295] S. Zhu, G. Sun, Q. Jiang, and et al. A survey on automatic infographics and visualization recommendations. Vis. Informatics, 4(3):24–40, 2020.
[296] F. Özcan, A. Quamar, J. Sen, and et al. State of the Art and Open Challenges in Natural Language Interfaces to Data. In Proc. SIGMOD’20. ACM, 2020.
