From Data Stories to Dialogues: A Randomised Controlled Trial of Generative AI Agents and Data Storytelling in Enhancing Data Visualisation Comprehension

Lixiang Yan Monash UniversityMelbourneAustralia [email protected] , Roberto Martinez-Maldonado 0000-0002-8375-1816 Monash UniversityMelbourneAustralia [email protected] , Yueqiao Jin 0009-0003-7309-4984 Monash UniversityMelbourneAustralia [email protected] , Vanessa Echeverria 0000-0002-2022-9588 Monash UniversityMelbourneAustralia Escuela Superior Politécnica del LitoralGuayaquilEcuador [email protected] , Mikaela Milesi 0009-0002-0910-9822 Monash UniversityMelbourneAustralia [email protected] , Jie Fan 0009-0000-8585-2760 Monash UniversityMelbourneAustralia [email protected] , Linxuan Zhao 0000-0001-5564-0185 Monash UniversityMelbourneAustralia [email protected] , Riordan Alfredo 0000-0001-5440-6143 Monash UniversityMelbourneAustralia [email protected] , Xinyu Li Monash UniversityMelbourneAustralia [email protected] and Dragan Gašević Monash UniversityMelbourneAustralia [email protected]

Abstract.

Generative AI (GenAI) agents offer a potentially scalable approach to support comprehending complex data visualisations, a skill many individuals struggle with. While data storytelling has proven effective, there is little evidence regarding the comparative effectiveness of GenAI agents. To address this gap, we conducted a randomised controlled study with 141 participants to compare the effectiveness and efficiency of data dialogues facilitated by both passive (which simply answer participants’ questions about visualisations) and proactive (infused with scaffolding questions to guide participants through visualisations) GenAI agents against data storytelling in enhancing their comprehension of data visualisations. Comprehension was measured before, during, and after the intervention. Results suggest that passive GenAI agents improve comprehension similarly to data storytelling both during and after intervention. Notably, proactive GenAI agents significantly enhance comprehension after intervention compared to both passive GenAI agents and standalone data storytelling, regardless of participants’ visualisation literacy, indicating sustained improvements and learning.

Generative Artificial Intelligence, Data Storytelling, Data Visualisation, Data Comprehension, Visualisation Literacy, AI Agent

^†^†ccs:Human-centered computing Empirical studies in visualization^†^†ccs:Human-centered computing User studies

1.Introduction

The significance of data in contemporary society is indisputable. As sectors like healthcare(Sun et al.,2023),manufacturing(Park et al.,2023a),education(Yang et al.,2023),and finance(Schroeder et al.,2020)increasingly embrace data-driven decision-making and advanced analytics, the ability to effectively interpret data visualisations becomes paramount. Data visualisation is a potent tool for exploring, analysing, and communicating complex datasets(Figueiras,2014;Lowe and Matthee,2020),broadly categorised intoexploratoryandexplanatorytypes. Exploratory visualisations commonly serve experts, such as business analysts conducting detailed market analysis(Daradkeh,2021;Bravo and Maier,2020;Martinez-Maldonado et al.,2020).Conversely, explanatory visualisations cater to broader audiences, including journalists elucidating climate change impacts and educators leveraging learning analytics dashboards to enhance student outcomes(Aurambout et al.,2013;Echeverria et al.,2017;Watson and Setlur,2015).These visualisations aim to succinctly present key insights rather than encourage extensive data exploration(Kosara and MacKinlay,2013;Rodrigues et al.,2019;Echeverria et al.,2018).Despite their importance, empirical evidence consistently shows that the general public often struggles with interpreting visual data due to low levels ofvisualisation literacy(Maltese et al.,2015;Börner et al.,2016;Donohoe and Costello,2020).This literacy gap underscores the urgent need for more accessible and comprehensible visualisation methods to democratise access to data insights.

Data storytelling has emerged as a promising approach to bridging this visualisation literacy gap and enhancing the communicative power of data visualisations. By integrating narrative elements with visual data representations, data storytelling aims to convey insights more effectively and engagingly(Schulz et al.,2013;Dykes,2015).Advocates argue that this approach can make complex data more accessible by emphasising specific takeaway messages, adding explanatory text, and highlighting key data points or findings, thereby reducing cognitive load and aiding comprehension(Segel and Heer,2010;Knaflic,2015;Gershon and Page,2001;Krum,2013;Daradkeh,2021).Emerging empirical evidence supports these claims, demonstrating that data storytelling indeed enhances the efficiency and effectiveness of comprehension tasks compared to conventional visualisations, particularly for individuals with low visualisation literacy(Shao et al.,2024;Pozdniakov et al.,2023).Yet, while data storytelling in professional sectors is believed to enhance engagement, empathy, and memory retention, research provides mixed support for these claims(Boy et al.,2015,2017;Morais et al.,2021;Liem et al.,2020;Zdanovic et al.,2022a).

A balanced assessment of data storytelling must also take into account several potential drawbacks and challenges. A potential issue is that narrative elements can sometimes oversimplify data, potentially leading to misinterpretation or omission of critical details(Fan et al.,2022;Ren et al.,2023).Conversely, overloading visualisations with too many narrative elements, especially text, can overwhelm users and result in cognitive overload, diminishing comprehension(Zdanovic et al.,2022b;Feigenbaum and Alamalhodaei,2020;Ryan,2018;Milesi and Martinez-Maldonado,2024).The effectiveness of data storytelling can also vary depending on the audience’s prior knowledge and visualisation literacy, as well as the specific context in which the data is presented(Lee et al.,2015;Echeverria et al.,2018;Ma et al.,2012;Shao et al.,2024).Moreover, effective data storytelling is a resource and knowledge-intensive process requiring a high level of data visualisation expertise and a deep understanding of the data context(Dykes,2015;Zanan and Aziz,2022).While automated and human-AI collaboration tools for data storytelling are emerging, their adaptability and scalability remain questionable, as further research and development are needed to enhance the level of automation(Fernandez-Nieto et al.,2024;Li et al.,2024b).

An alternative promising approach to improving user comprehension of data visualisations involves integrating conversational features that offer on-demand, real-time, and personalised support(Srinivasan et al.,2018;Wu et al.,2021;Chen et al.,2021;Scheers and De Laet,2021).Recent advancements in multimodal generative AI (GenAI) technologies, such as GPT-4o, make this approach increasingly viable. These technologies can generate explanations from both textual and visual inputs(Achiam et al.,2023;Ooi et al.,2023).Leveraging these capabilities, adaptable chatbots can enhance user understanding through interactive dialogues without overloading visualisations with excessive text, annotations, or labels, thereby balancing information richness and usability(Noroozi et al.,2019;Therón,2020;Segel and Heer,2010;Knaflic,2015).The advancement of retrieval-augmented generation (RAG) methodologies further improves the accuracy and contextual relevance of GenAI-generated explanations(Shuster et al.,2021;Siriwardhana et al.,2023).Additionally, autonomous AI agents based on GenAI technologies can facilitate communication between users and data visualisations by proactively guiding users through scaffolding techniques, rather than just passively responding(Wu et al.,2023;Park et al.,2023b;Oertel et al.,2020).These advancements set the stage for developing adaptive GenAI agents that assist users in comprehending the insights conveyed through data visualisations. However, despite the emergence of frameworks and system designs for such agents(Yan et al.,2024d;Ma et al.,2023),there is limited empirical evidence on the comparative effectiveness (e.g., comprehension accuracy) and efficiency (e.g., time for accurate comprehension) of GenAI agents versus data storytelling in helping people extract insights from data visualisations. Furthermore, the relationship between these different methods and individuals’ visualisation literacy remains unexplored, limiting the potential to provide personalised and targeted support.

The main contribution of this paper lies in its empirical investigation of different augmentation methods—data storytelling, passive GenAI agent, and proactive GenAI agent—and their impact on improving efficiency (measured by average success time) and effectiveness (measured by correct scores) in extracting insights from data visualisations. This comparative study also spans three phases: pre-intervention, intervention, and post-intervention, allowing for a comprehensive analysis of performance changes over time. By comparing these interventions, this paper provides valuable insights into which methods are most effective for sustaining comprehension improvements and the relative benefits and challenges of adopting each method for facilitating better data visualisation comprehension. The findings offer actionable insights for visualisation designers and analytics providers on the effectiveness of integrating data storytelling and GenAI agents in enhancing users’ ability to interpret data visualisations, effectively addressing a notable gap in the existing literature. Such insights can inform the development of practical solutions and guide stakeholders in selecting the most optimal aid, whether data storytelling or GenAI agents, tailored to different purposes and settings, thereby enhancing users’ comprehension and interpretation of data visualisations.

2.Background and Related Work

2.1.Data Visualisations and Data Stories

Over recent decades, data storytelling has emerged within data visualisation to bridge the gap between raw data and meaningful insights. Conventional visualisations, such as bar charts, line graphs, and scatter plots, present structured data visually, enabling independent data exploration and interpretation(Tufte,2001;Roberts et al.,2018;Hicks,2009).However, these methods may not fully convey deeper narratives, especially to those with limited data analysis expertise(Bravo and Maier,2020;Martinez-Maldonado et al.,2020;Börner et al.,2016;Maltese et al.,2015).Data storytelling converts raw numbers into cohesive narratives, making data understandable and actionable. This approach uses storytelling to embed facts, insights, emotions, and intentions into engaging narratives, thus giving complex data representations more meaning and actionability(Blythe,2017;Fog et al.,2005;Dimond et al.,2013).Data storytelling’s growing adoption across industries like sports analytics, education, psychology, and social work underscores its effectiveness in communicating data-driven insights(Fu and Stasko,2022;Zhi et al.,2019;Martinez-Maldonado et al.,2020;Wang et al.,2019;Lan et al.,2022;Wilkerson et al.,2021;Shan et al.,2022).Despite its significance, the formal conceptualisation of data storytelling has gained academic attention only recently.

Several frameworks and principles have been established to guide effective data storytelling.Segel and Heer (2010)integrated narrative elements with graphics to convey specific messages.Kosara and MacKinlay (2013)clarified that conventional visualisation aids data analysis, while data storytelling presents derived insights. Despite many identified patterns, the literature converges on five foundational principles for creating ’data stories’(Echeverria et al.,2017;Zdanovic et al.,2022a;Ryan,2016;Knaflic,2015):(i) Identifying an explicit goalinvolves understanding the audience’s tech savviness and subject matter expertise to ensure relevance and accessibility(Bach et al.,2018;Segel and Heer,2010;Lee et al.,2015;Dykes,2015).(ii) Removing redundant elementsdeclutters by eliminating unnecessary labels and markers to focus on critical information(Knaflic,2015;Feigenbaum and Alamalhodaei,2020;Ryan,2018).(iii) Utilising storytelling elements judiciouslymeans using annotations and cues sparingly for clarity(Kong et al.,2019;Kalyuga,2009).(iv) Capturing interestinvolves minimising irrelevant visuals while emphasising crucial components, using colours or bold elements to highlight key data points(Knaflic,2015;Feigenbaum and Alamalhodaei,2020).(v) Calling for actionrequires a coherent visualisation that guides users towards the intended action through explicit titles and clear messages(Bach et al.,2018;Ojo and Heravi,2018).These principles ensure data stories are visually appealing and effective in delivering actionable insights.

Many authors suggest data stories convey key insights more effectively than conventional visualisations.Segel and Heer (2010)noted that combining narratives with graphics offers a rich medium for presenting analysis results.Gershon and Page (2001)argued that storytelling enhances intuitive and effective communication.Daradkeh (2021)added that data storytelling can improve decision-making and business outcomes.Knaflic (2015)andKrum (2013)highlighted that such visualisations leverage human perceptual skills for recognising visual patterns, based on pre-attentive processing(Hicks,2009).Researchers likeRyan (2016),Zhang (2018),Zhang and Lugmayr (2019),andZhang et al.(2022)emphasised data storytelling’s potential for efficient communication by focusing on essential data, reducing complexity, and shortening comprehension time. Yet, empirical evidence supporting the benefits of data stories is limited and only recently emerging. For example,Pozdniakov et al.(2023)conducted an eye-tracking study with 23 higher education teachers, showing that data stories can attract attention and facilitate exploration, especially for those with low visualisation literacy. Additionally,Shao et al.(2024)conducted a comparative analysis with 103 participants, finding that data stories enhanced effectiveness and efficiency in information retrieval and insights comprehension, regardless of individuals’ visualisation literacy. However, differences between data stories and conventional visualisations in fostering empathy(Boy et al.,2015,2017;Morais et al.,2021;Liem et al.,2020;Zdanovic et al.,2022a)or supporting memorisation(Zdanovic et al.,2022a)have not been significant.

While the effectiveness of data stories in aiding user comprehension of data visualisations is increasingly recognised, concerns about the adaptiveness and scalability of this approach remain(Fernandez-Nieto et al.,2024;Li et al.,2024b).Specifically,Li et al.(2024b)conducted a systematic review of existing human-AI collaboration tools for creating data stories and concluded that further research and development are necessary to enhance the level of agency and automation in these tools. They also recommended integrating and leveraging the power of emerging large-scale AI systems, such as GenAI technologies.

2.2.Generative AI and Data Comprehension

The latest developments in GenAI have introduced new avenues for improving users’ comprehension of data visualisations(Chen et al.,2024;Yan et al.,2024d;Li et al.,2024b;Ye et al.,2024;Gupta et al.,2024;Yan et al.,2024b).Large language models (LLMs) like GPT and Llama are now capable of generating text-based content in response to user prompts, making them invaluable for summarising complex datasets and visualisations, providing explanatory narratives, and crafting data stories with minimal human oversight(Ma et al.,2023;Yan et al.,2024d;Chung et al.,2022;Dang et al.,2023).In parallel, image-generating diffusion models such as Midjourney and DALL-E have proven effective in creating visual representations from textual descriptions, aiding the intuitive understanding of data through visual means(Dibia,2023;Wu et al.,2021;Liu et al.,2022).The advent of multimodal models like GPT-4o further enhances GenAI’s ability to interpret and synthesise both textual and visual information, opening up new possibilities for more comprehensive explanations of data visualisations(Achiam et al.,2023;Chen et al.,2024;Yan et al.,2024d;Ye et al.,2024).These advancements in foundational models point towards a future where conveying insights to users through data visualisations could become more interactive and diverse, thereby enriching the methods through which data-driven narratives can be communicated(Ye et al.,2024;Wu et al.,2021;Li et al.,2024b).

The capability of multimodal GenAI to comprehend both natural and visual languages at a high-quality, zero-shot level is crucial for advancing users’ comprehension of data visualisations through interactive dialogues. Although conversational GenAI has been increasingly utilised in various sectors, including customer service, user engagement, and learning assistance(Okonkwo and Ade-Ibijola,2021;Muller et al.,2024;Ma et al.,2023;Kim et al.,2024;Hou et al.,2024),their application in supporting users’ comprehension of data visualisations has been relatively limited(Li et al.,2024b).This limitation is primarily due to the complexities involved in achieving high-quality, zero-shot understanding of visual language(Lee et al.,2023).Additionally, generating mere descriptions of visualisations could fall short of delivering meaningful data insightsZdanovic et al.(2022a); Ryan (2016); Knaflic (2015).To truly benefit users, narratives and explanations of data visualisations must be contextually anchored in the specific data scenario. Such contextual grounding is essential for aiding decision-making processes and prompting deeper insights(Milesi and Martinez-Maldonado,2024;Borges et al.,2022;Burns et al.,2020).Thus, developing comprehension tools for data visualisation necessitates using multimodal GenAI models that can effectively interpret both natural and visual languages.

The advancement of supporting infrastructures could further enhance the ability of GenAI to convey precise and contextually relevant insights for data visualisations. GenAI models have faced criticism for producing well-articulated but inaccurate content, known as hallucination(Ji et al.,2023;Leiser et al.,2024),which can mislead users. One effective method to mitigate this issue is RAG, which confines content generation to contextually relevant material(Shuster et al.,2021).For instance, relevant information can be converted into vector embeddings, which are representations that map objects like words or entities into a continuous space, capturing semantic or relational meanings(Mikolov et al.,2013).During interactions with GenAI models, this information can be dynamically retrieved through semantic search(Li et al.,2024a),typically by calculating the cosine similarity between stored embeddings and those generated from user queries. This approach has proven effective in reducing hallucinations and enhancing the domain-specific accuracy of generated outputs(Siriwardhana et al.,2023).Therefore, integrating GenAI with RAG could offer on-demand, real-time support to help users understand data visualisations by providing contextualised explanations to address any potential confusion(Yan et al.,2024d).

Furthermore, the development of GenAI agents—autonomous, adaptive entities that operate independently to achieve predefined goals without constant user input—holds promise for enhancing communication between users and data visualisations. These agents can offer a more engaging and structured experience by proactively guiding users through scaffolding techniques(Wu et al.,2023;Park et al.,2023b;Oertel et al.,2020).Scaffolding, an established educational strategy, aligns well with data storytelling’s purpose(Segel and Heer,2010;Zdanovic et al.,2022a;Ryan,2016;Knaflic,2015).Techniques often involve breaking down complex information into smaller parts, asking guiding questions, and providing feedback to help users make sense of data. For instance, similar to how an educator might ask,” What trends do you notice in this graph?” or” Why do you think this data point is an outlier?”, proactive GenAI agents could prompt users to focus on specific parts of data visualisations and offer step-by-step guidance(Gibbons,2002).This strategy effectively supports individuals in achieving a deeper understanding and mastery of a subject by breaking down complex information and offering guidance as they build their knowledge and skills(Gibbons,2002;Van de Pol et al.,2010;Kim et al.,2018).Consequently, this approach may enhance user comprehension more effectively than passive GenAI agents that merely respond to queries. Although this method seems promising and innovative system frameworks are being developed(Yan et al.,2024d;Ma et al.,2023),empirical evidence remains limited on whether such agents improve the effectiveness and efficiency of users’ comprehension of data visualisations compared to data stories.

2.3.Visualisation literacy

Effectively communicating data visualisations can be challenging, even when the visualisation and its accompanying tools, such as data stories and GenAI agents, are well-designed(Donohoe and Costello,2020).Literacy in data visualisation is essential for understanding how these different methods can improve user comprehension.Visualisation literacyrefers to the ability to accurately interpret data visualisations, efficiently extract insights, and draw informed conclusions(Boy et al.,2014;Firat et al.,2022).Research on the interaction between visualisation literacy and data storytelling is both limited and mixed. For example,Pozdniakov et al.(2023)found that teachers with lower visualisation literacy benefit the most from data storytelling enhancements, as it reduces their cognitive load. In contrast,Shao et al.(2024)found that data stories help users of all data literacy levels equally well in extracting insights. These mixed findings necessitate further research on the effects of visualisation literacy on different methods of enhancing users’ comprehension of data visualisations.

2.4.Contribution to HCI and Research Questions

This study aims to fill the gaps in the literature identified above by empirically evaluating the impact ofthree intervention conditions,– data stories, passive GenAI agents (those that can reactively respond to questions), and proactive GenAI agents (those that can proactively guide users to make sense of data visualisations by asking questions or providing suggestions)– on users’ effectiveness and efficiency in extracting insights from data visualisations. Specifically, we sought to quantitatively assess how each condition enhances accuracy and reduces the time required for insights extraction from data visualisation acrossthree intervention phases:pre-intervention, during the intervention, and post-intervention (i.e., after the intervention is removed). Including a post-intervention phase is crucial for understanding whether these supportive approaches function merely as temporary tools or also facilitate the learning of key comprehension skills in data visualisation during the process(Segel and Heer,2010;Asamoah,2022;Bahtaji,2020).This motivates the first research question:

•

RQ1:To what extent do participants’effectivenessandefficiencyin comprehending data visualisationschangefrom pre- to post-intervention phases across the three intervention conditions?

We also aimed to compare the effectiveness and efficiency of the three interventions to identify the most effective method for sustaining comprehension improvements. This between-subject comparison is critical for understanding the relative efficacy of novel methods, such as GenAI agents, compared to state-of-the-art approaches like data storytelling for enhancing data visualisation comprehension(Li et al.,2024b).This leads to the second research question:

•

RQ2:To what extent do participants’effectivenessandefficiencyin comprehending data visualisationsdifferbetween the three intervention conditions in pre-intervention, the intervention, and post-intervention phases?

Additionally, considering the important role that visualisation literacy plays in data visualisation comprehension(Firat et al.,2022;Pozdniakov et al.,2023;Shao et al.,2024),we aimed to examine the effects of visualisation literacy on the three intervention conditions through the third research question:

•

RQ3:To what extent dovisualisation literacyimpact participants’ effectiveness and efficiency in comprehending data visualisations in each of the three intervention conditions?

Finally, in addition to the three prior quantitative investigations, we also aimed to gain a deeper understanding of potential differences through qualitative explorations. This provides in-depth insights to support researchers and practitioners in choosing between different supportive methods of data visualisation by specifying their unique benefits. This led to our final research question:

•

RQ4:What are participants’experiencesandperceptionsof the intervention’s effectiveness in helping them understand data visualisations for each condition?

3.Method

This section presents the essential components of the current study, including 1) the datasets and materials, 2) the design of the interventions (data storytelling, passive, and proactive GenAI agents), 3) the experiment design and procedure, 4) the participants, and 5) the data analysis performed for each research question.

3.1.Dataset and Materials

The current study utilises a multimodal dataset to examine teamwork in healthcare simulations. The simulated learning environment (see Figure1), featuring four patient beds equipped with medical devices like oxygen masks and vital signs monitors, used advanced patient manikins that could simulate different heart rates and pulses, controlled by teaching staff via a tablet. The main goal was to assess and enhance students’ teamwork, communication, and prioritisation skills in handling clinical emergencies.

Refer to caption — Figure 1.Picture of the healthcare simulation activity with key components labelled and identities masked.

The dataset included three primary types of physical and physiological data: positioning, audio, and heart rate. Students’ x-y positioning data were collected using the Pozyx creator toolkit¹¹1https://www.pozyx.io/creator,audio data were captured using wireless headsets with unidirectional microphones for multi-channel audio recording, and heart rate data were recorded using Empatica E4 wristbands.

This dataset was selected because its visualisations have already been validated and utilised by healthcare educators and students for reflective purposes [Anonymised]. As depicted in Figure2,three different visualisations with increasing complexity were used. The first visualisation is a bar chart illustrating four prioritisation strategies demonstrated by students during the simulation. This type of visualisation is typically effective for comparing different categories(Saket et al.,2018).For example, the bar chart allows for a straightforward interpretation of the proportion of time each team spent on various prioritisation behaviours.

The second visualisation is a social network, or sociogram, mapping the communication behaviours among students and other actors in the simulation. Social network analysis is commonly used in collaborative learning and computer-supported cooperative work studies(De Laat et al.,2007;Dado and Bodemer,2017).This visualisation can help understand interaction dynamics within the team, identifying who communicated most frequently and the direction of these communications. It can also highlight interactions with the patient, the doctor, and the relative, providing insights into the roles and engagement levels of each participant.

The third visualisation is a ward map that displays students’ physical positions, verbal communication duration, and peak heart rate locations. Inspired by advanced techniques in sports analytics, such as CourtVision(Goldsberry,2012),this visualisation combines heatmaps(Gu,2022)with specific healthcare-relevant information. The heatmap component illustrates the frequency and distribution of verbal communications, with saturated colours indicating areas of high verbal activity. Additionally, the map shows the spatial distribution of students around the simulation space, providing a comprehensive view of their physical and behavioural engagement. Furthermore, the map includes the location and value of each student’s peak heart rate, indicating areas of highest physiological arousal. These visualisations collectively offer insights into prioritisation, teamwork, and communication strategies, which are challenging to comprehend without advanced visualisation literacy [Anonymised].

3.2.Intervention Design: Generative AI Agents

To investigate the impact of passive and proactive GenAI agents on participants’ ability to comprehend data visualisations, we utilised an open-sourced prototype of [Anonymised]²²2Hidden for double-blinded review.This prototype integrates multimodal GenAI (e.g., OpenAI GPT-4o³³3https://openai.com/index/hello-gpt-4o/), RAG frameworks (e.g., LangChain), and agentic system designs (e.g., Autogen(Wu et al.,2023)) to ensure the accuracy and contextual relevance of information provided by the GenAI agents. Multimodal GenAI processes both graphical and textual data, tailoring AI-generated responses specifically to the visualisations participants inquire about. RAG incorporation ensures GenAI agents access relevant and accurate knowledge for context-specific prompts(Lewis et al.,2020;Gao et al.,2023).Agentic systems(Park et al.,2023b)enable GenAI agents to address irrelevant responses generated by incomplete user prompts(White et al.,2023).As shown in Figure3,the design of the passive and proactive GenAI agents includes five main components: i) agent characteristics, ii) prompt integration, iii) knowledge database, iv) response generation, and v) exemplar behaviours.

3.2.1.Agent characteristics.

The core functionalities and interaction styles distinguish the passive and proactive GenAI agents. Passive GenAI agents are reactive, waiting for user-initiated queries before responding(Ma et al.,2023;Yan et al.,2024d).They are characterised by their ability to deliver precise and contextually relevant information without initiating unsolicited interactions. In contrast, proactive GenAI agents are more engaging and interactive(Park et al.,2023b;Oertel et al.,2020).They not only respond to user queries but also guide users through the visualisations with structured narratives and scaffolding questions (crafted by human experts; AppendixA). This dual approach allows for a comparative analysis of how different interaction styles impact user understanding and insight extraction from data visualisations.

3.2.2.Prompt integration.

Prompt integration is critical for both passive and proactive GenAI agents. For passive agents, the process starts with user-initiated queries, which are processed to generate contextually relevant responses. This involves interpreting the natural language input and matching it with the visualisation and the knowledge database using RAG. Proactive agents, however, go a step further by synthesising prompts based on user interactions and a pre-defined scaffolding narrative consisting of guiding questions that facilitate the exploration of the visualisations. They generate prompts to guide users through the visualisations, helping them understand the data more structurally and providing feedback based on participants’ responses. This is achieved through a combination of pre-stored knowledge and real-time data interpretation, ensuring accurate and tailored responses.

3.2.3.Knowledge database.

The knowledge database is the backbone of the GenAI agents, providing the necessary contextual information for accurate and relevant responses. Built using LangChain⁴⁴4https://www.langchain.com/and Chroma⁵⁵5https://www.trychroma.com/,the database is populated with task-specific materials, descriptions, and detailed explanations of visualisation components. These materials are directly drawn from the learning design documents of the healthcare simulation unit and peer-reviewed publications describing each visualisation [Anonymised]. Textual materials are converted into vector embeddings using OpenAI’s embedding model (text-embedding-ada-002), allowing for efficient semantic searches. This enables the agents to retrieve pertinent information based on the cosine similarity of the embeddings(Gao et al.,2023).The knowledge database is dynamic, evolving with each user interaction. New information from user-agent conversations is continuously ingested, enhancing the personalisation and relevance of subsequent interactions.

3.2.4.Response generation.

Response generation in both passive and proactive GenAI agents utilises the multimodal capabilities of GPT-4o. For passive agents, the process involves generating responses based on user queries by accessing the knowledge database and retrieving relevant information using RAG. The responses are designed to be precise and contextually relevant, ensuring that users receive accurate answers to their questions. Proactive agents follow pre-defined scaffolding narratives with guiding questions. Each response contains feedback on users’ prior responses and asks the user to either elaborate on the current response or move on to the next guiding question. This involves performing semantic searches across the vector embeddings to identify relevant text segments, assessing the correctness of users’ responses, and generating encouraging or corrective feedback accordingly.

3.2.5.Exemplar behaviours.

The behaviours of the passive and proactive GenAI agents are exemplified by their interaction styles (Figure4). Passive agents focus on explanation and clarity, providing users with straightforward answers to their queries. For example, when asked about the representation of a bar chart, a passive agent retrieves and presents the relevant information from the knowledge database. Proactive agents, however, offer scaffolding with guided questions and feedback. They actively engage users by asking clarifying questions, offering step-by-step explanations, and facilitating the exploration of the visualisations within a scaffolding framework (e.g., a set of guiding questions targeting each element of the visualisations)(Xun and Land,2004).For example, to facilitate users’ understanding of a social network, a proactive agent might start by asking questions about the thickness of the edges connecting two nodes. After confirming users have a clear understanding through their responses, it will then move on to questions about the direction of the edges. This scaffolding approach aims to structurally enhance user comprehension and cultivate the ability to comprehend data visualisations.

3.3.Intervention Design: Data Stories

To reliably create data visualisations enhanced with data storytelling elements, we first needed to clearly identify these storytelling techniques. We established design actions, based on the framework byZdanovic et al.(2022a),to transform the original visualisations, as presented in Figure2,into data stories (examples presented below in Figure5by adding or removing visual design elements. This process follows three main phases: i) exploring the data, ii) creating the visualisation, and iii) telling the story. While conventional visualisations typically involve only the first two phases, data stories incorporate additional storytelling elements added in the third phase(Zdanovic et al.,2022a;Echeverria et al.,2017;Ryan,2016;Knaflic,2015).These include the following main data storytelling elements:

Using the right visualisation technique:Selecting the appropriate visualisation based on the storytelling goal is critical(Knaflic,2015).This could involve switching chart types (e.g., from pie to line charts) to better illustrate trends(Tufte,2001;Few,2004).Our design process did not result in changing the chart types, as we aimed to maintain the increasing complexity from the bar chart, through the social network visualisation, to the custom-made ward map. This approach allowed us to preserve the progression of complexity across the visualisations.

Eliminating clutter:Simplifying visuals by removing unnecessary elements highlights key points and enhances readability, adhering to principles of minimising the data-ink ratio(Tufte,2001;Heer and Shneiderman,2012).

Directing attention:Emphasising important data points or trends through techniques like bold lines or contrasting colours to draw attention to key insights(Knaflic,2015;Ware,2019;Few,2004).

Adding annotations:Providing contextual or explanatory information directly on the visualisation clarifies important insights(Knaflic,2015;Hullman et al.,2013).

Adding an explanatory title:Crafting titles that summarise key findings or insights guides the viewer toward the intended interpretation(Knaflic,2015;Pozdniakov et al.,2023;Kong et al.,2018).

The process of creating visualisations enhanced with data storytelling elements was conducted collaboratively. Initially, one researcher designed a first draft of the data stories followingZdanovic et al.(2022a)’s framework. Then, a second researcher with deep expertise in data storytelling reviewed and improved the designs through discussion. The refined visualisations were further reviewed by two additional researchers, leading to a final design agreed upon by the team. This iterative process, and followingZdanovic et al.(2022a)’s framework for infusing visualisations with data storytelling elements, ensured that the visual elements were thoughtfully developed and aligned with the research objectives.

3.4.Experiment Design and Procedure

This randomised controlled study was designed following gold standards for effectiveness research(Hariton and Locascio,2018).We conducted a mixed-design (3x3) experiment, using both within-subject and between-subject comparisons to evaluate the effectiveness of data storytelling, passive GenAI agents, and proactive GenAI agents in enhancing insight extraction from data visualisations. The study was conducted using Qualtrics⁶⁶6https://www.qualtrics.com/,an online platform for designing and conducting experiments, integrated with a custom-developed website. Participants were recruited from Prolific⁷⁷7https://www.prolific.co/,compensated £8, and the study was structured to be completed within one hour. Integration between Qualtrics and Prolific streamlined participant flow, redirecting them from Prolific to the Qualtrics survey and verifying data before confirming participation, enhancing reliability. Ethics approval was obtained from [Anonymised] University (Project Number: Anonymised). Informed consent was secured from each participant.

Participants were randomly assigned to one of three intervention conditions: passive GenAI agents, proactive GenAI agents, or data storytelling. Passive GenAI agents responded to user queries without initiating further dialogue or guidance, while proactive GenAI agents guided users with scaffolding questions. The data storytelling condition presented visualisations within a narrative format, highlighting key insights and trends. Each participant interacted with their assigned condition during the intervention phase to support insight extraction from visualisations. The effectiveness of each method was evaluated through pre-, during, and post-intervention assessments.

The study followed a structured procedure to ensure consistency across conditions, comprising five main parts: i) the participants received an introduction to the study and completed demographic and background questions on Qualtrics; ii) they answered 12 items from the mini-Visualisation Literacy Assessment Test (mini-VLAT)(Pandey and Ottley,2023)to measure their visualisation literacy; iii) the participants were directed to our custom platform to complete three analytical writing tasks, each followed by six evaluation questions to assess their understanding of the visualisations; and iv) finally, the participants completed a questionnaire to provide feedback on their experience with the assigned intervention method. Figure6illustrates the experiment design and procedure. Details are elaborated below.

3.4.1.Part 1: Demographics and background questions

Participants were initially queried on demographic information, including (i) self-reported gender, (ii) age range, (iii) region, and (iv) highest level of education. Subsequently, we assessed participants’ self-reported experience in data analysis and GenAI tools. For data analysis, participants rated their experience on a scale from”None - I have little to no experience with data analysis”to”Expert - I have extensive experience and can perform data analysis tasks confidently.”Similarly, for GenAI tools, participants rated their experience from”None - I have no prior experience with generative AI tools”to”Expert - I have deep expertise and often develop or customise generative AI tools for specialised purposes.”

3.4.2.Part 2: Visualisation literacy

The 12-item mini-VLAT was used to assess participants’ visualisation literacy(Pandey and Ottley,2023).This instrument was selected due to its demonstrated reliability and validity, comparable to the full 53-item VLAT(Lee et al.,2016)while offering a more concise measure of visualisation literacy. These features made it particularly suitable for our extended study, as it minimised cognitive load on participants. The scores were corrected for guessing using the correction-for-guessing formula(Thorndike et al.,1991):

(1)

CS=\textit{R}-\textit{W}/(\textit{C}-1)

In this formula,CSrepresents the final score corrected for guessing behaviours,Rrepresents the number of correctly answered items of a given participant,Wrepresents the number of wrongly answered items, andCrepresents the number of choices per item, which is four for the mini-VLAT.

3.4.3.Part 3: Randomised controlled trial

The randomised controlled trial consisted of four main activities: i) reading contextual information about the visualisations, ii) all participants completing the same baseline analytical writing activity (Pre-intervention) where no intervention was provided, followed by six evaluation questions, iii) each participant being randomly assigned to one of the three intervention conditions (Intervention), having access to either data storytelling, passive GenAI agents, or proactive GenAI agents, followed by six evaluation questions, and iv) all participants completing another, yet different, baseline activity (Post-intervention) with the intervention removed, followed by six evaluation questions.

The visualisation format remained consistent across the pre-intervention phase, the intervention phase, and the post-intervention phase (e.g., bar chart, social network, and ward map), although the data and insights contained within these visualisations varied across these different phases. The same set of visualisations was used in all three intervention conditions. Likewise, the six evaluation questions maintained the same format, with the questions and answers tailored to the insights for each activity. Details of the contextual information, analytical writing tasks, and evaluation questions are elaborated below:

Contextual information.Participants were provided with a picture (Figure1) and a description of a healthcare simulation where two nursing students took over care of four manikin patients (Beds 1-4) during their shift. Each patient required at least two tasks, with the primary focus on the deteriorating patient in Bed 4 (Amy), while the needs of other patients (Beds 1-3) also required attention. An actor played a relative of the patient in Bed 3, frequently distracting the nurses. The simulation aimed to practice teamwork, communication, and prioritisation skills. By the end, students were expected to i) demonstrate a structured approach to patient assessment and management, ii) recognise and respond to early signs of deterioration, and iii) contribute to effective teamwork. This background was essential for participants to engage in the subsequent writing tasks effectively.

Analytical writing tasks.Participants were directed to a custom-developed website (Figure7) to complete an analytical writing task. They were asked to analyse three visualisations (bar chart, social network, and ward map) and write a 100-150 word response onhow the two nurses managed the primary patient (Amy) while attending to other beds, focusing on their task prioritisation, verbal communication, and stress levels.The website was designed with simplicity in mind(Stone et al.,2005),ensuring easy navigation and task completion.

The website had three key components. First, the display component (Figure7a) presented the visualisations one at a time to avoid information overload and visual clutter(Ellis and Dix,2007).Second, the writing space (Figure7b) allowed participants to articulate their analysis, synthesising information from the visualisations. Lastly, the instructions component (Figure7c) provided standard task instructions for the pre-intervention and post-intervention baselines and the data storytelling condition. In the passive and proactive GenAI conditions, participants interacted with GenAI agents via a chat function for additional support and guidance (Figure7d).

This design helped participants focus on the analytical writing task without distractions, facilitating clear and effective exposure to each intervention condition and providing opportunities to develop their ability to comprehend data visualisations.

Evaluation questions.After each analytical writing task, participants were asked to complete six multiple-choice questions (four choices per question; two for each visualisation). These questions were designed to evaluate their ability to comprehend data visualisations. Structured based on the first two levels of Bloom’s taxonomy, knowledge (Level 1) and comprehension (Level 2)(Bloom et al.,1984),these questions directly align with common practices in visualisation research(Arneson and Offerdahl,2018;Mnguni et al.,2016).For the knowledge questions, participants were required to identify specific data points or patterns in the visualisations, such as pinpointing the prioritisation behaviour that two nurses spent the most time on from a bar chart. These questions assessed their ability to retrieve pertinent information. For the comprehension questions, participants had to interpret and derive meaningful insights, such as comparing the spatial and verbal activities between two nurses through the ward map. These questions evaluated their ability to interpret multiple insights and identify inconsistencies or misleading information (Table1). Higher levels of Bloom’s taxonomy, such as application (Level 3), were not addressed, as the study involved online participants with limited contextual knowledge.

Two researchers familiar with the dataset and visualisations designed the six evaluation questions. A third researcher validated this design to ensure its applicability across all three sets of visualisations: pre-intervention, intervention, and post-intervention. Any discrepancies were resolved through discussion, leading to a consensus on the final evaluation questions. These questions ensured alignment with the data insights in the visualisations and the appropriate Bloom’s taxonomy levels. AppendixBprovides an example of the first set of evaluation questions for the pre-intervention condition, which share the same structure as the other two sets for the subsequent phases. While the order of the questions remained consistent across all phases, the order of the answer choices was randomised to minimise bias and improve reliability and validity(Association et al.,2014).A fifth option, ’I am not sure,’ was added to each question to reduce guessing, following recommendations in mini-VLAT and VLAT(Lee et al.,2016;Pandey and Ottley,2023).The time taken to answer each question correctly was measured to evaluate the efficiency of insights extraction, while the accuracy rate was recorded to provide insights into the effectiveness of comprehension (further elaborated in Section3.6)(Zhu,2007;Garlandini and Fabrikant,2009).

Table 1.Example level 1 and 2 evaluation questions for the bar chart.

Bloom’s Level	Evaluation Question
Knowledge (Level 1)	Which behaviour did the two nurses spend theleasttime on?
Comprehension (Level 2)	How did the nurses spend their time working on tasks for Amy compared to other tasks?

3.4.4.Part 4: Intervention experience

For the last part of the study, participants were asked to evaluate their experiences with the intervention (e.g., data storytelling, passive GenAI agents, or proactive GenAI agents) through three single-choice items and one open-ended question. Each item was rated on a five-point Likert scale ranging from strongly disagree (1) to strongly agree (5). Table2shows the items. Although the internal reliability of these single-item items can not be estimated, they have shown evident reliability in capturing individuals’ attitudes and beliefs(Wanous and Reichers,1996).The open-ended question targeted further qualitative elaborations –”Please describe your experience with the intervention. To what extent did it enhance or diminish your ability to interpret the data visualisations?”

Table 2.Intervention Experience Items

Dimension	Item
Self-efficacy	I feel more confident in my ability to interpret data visualisations after participating in the intervention.
Helpfulness	The intervention helped me better understand the data visualisations.
Satisfaction	I am satisfied with my overall experience during the intervention.

3.5.Participants

We conducted a priori power analysis for a mixed-design study with three between-subject conditions and three within-subject repeated measures using G*Power(Faul et al.,2009).The analysis indicated that a minimum sample size of 108 participants (36 per condition) was needed to detect a medium effect size (0.25) with 80% power at a 0.05 significance level. To account for an anticipated attrition rate of 15% to 20% due to potential data issues, particularly given the one-hour duration of the online study, we aimed to recruit 150 participants. Participants were required to be fluent in English and use a laptop or desktop to ensure consistent visualisation displays. We specifically recruited individuals with healthcare, medical, or nursing backgrounds to control for familiarity with the healthcare simulation context of the visualisations.

Before the main study, a pilot study with 27 participants (9 per condition) was conducted to refine the Qualtrics survey for clarity and ease of use. We also tested the custom-developed website to ensure timely responses from both passive and proactive GenAI agents. Data from the pilot study were excluded from the final analysis due to subsequent improvements in the study flow, such as mandatory reading time for each task instruction and improved clarity of several context description sentences.

For the main study, participants had to complete all three writing tasks, essential for internalising their understanding of data visualisations. Incomplete responses, including failure to complete any writing task or the entire survey, were deemed invalid. Out of the 141 responses received, 24 were invalid due to incomplete writing tasks, resulting in a final sample of 117 valid responses. Specifically, 36 participants completed the data storytelling condition, 41 completed the passive GenAI agents condition, and 40 completed the proactive GenAI agents condition.

The participants came from six different regions, with the majority coming from North America/Central America (51), Europe (28), and Africa (25). The remainder were from Australia (5), South America (4), and other regions (4). They identified as female (68), male (48), and non-binary (1). The age distribution was predominantly within the 25-34 years old range (47), followed by 18-24 years old (41), 35-44 years old (16), 45-54 years old (8), 55-64 years old (3), and 65+ years old (2). Most participants attained a Bachelor’s degree (55) or high school or equivalent education (28), while others held a Master’s degree (15), vocational training or diploma (11), Doctorate (4), or other qualifications (4). Regarding experience with data analysis, most participants were at an intermediate level (50), with a moderate level of experience. Others were beginners (36), advanced users (17), had no experience (12), or were experts (2). In terms of familiarity with GenAI tools, most participants were at an intermediate level (63), having used the tools on several occasions. Others were beginners (34), advanced users (15), experts (4), or had no prior experience (1).

3.6.Analysis

The analysis aimed to address four research questions (RQs) regarding the impacts of different interventions on enhancing participants’ effectiveness and efficiency in extracting insights from data visualisations. We first calculated a set of metrics related to the RQs using the responses and times for six evaluation questions for each task (AppendixB). These metrics were calculated for both within- and between-subject analyses, including the three intervention phases, specifically, the pre-intervention baseline (Pre), the intervention phase, and the post-intervention phase (Post), as well as for the three intervention conditions, namely data storytelling (DS), passive GenAI agents (GAI), or proactive GenAI agents (DSAI). In particular, the following metrics were calculated:

•

Correct Score:To investigate the effectiveness of insights extraction from data visualisations (RQ1-2), we computed the correct score for the six evaluation questions for each condition by summing the total number of questions that participants answered correctly (each score ranges from 0 to 6), resulting in three correct scores for each participant (Pre_score, Intervention_score, and Post_score). The correction-for-guessing formula(Thorndike et al.,1991)was applied to account for guessing in these multiple-choice questions (with four options).
•

Success Time:To investigate participants’ efficiency in data insights extraction (RQ1-2), we summed the total time they spent on answering the six evaluation questions for each condition in seconds and then divided by the corresponding correct score for that given condition, resulting in three success time metrics for each participant (Pre_time, Intervention_time, and Post_time).

In addition to the above two metrics, visualisation literacy (mini-VLAT) was calculated separately for each participant to address RQ3. This calculation was done by summing the number of correct responses. Specifically, visualisation literacy ranges from 0 to 12. For both literacy scores, the correction-for-guessing formula(Thorndike et al.,1991)was applied to account for guessing behaviours. All calculations and analyses were conducted in Python using packages including NumPy, SciPy, and Statsmodels. The following sections elaborate on the analysis for each RQ.

3.6.1.Preliminary Analysis

Before addressing each research question, we conducted a preliminary analysis to ensure that the three intervention conditions were comparable in terms of participants’ visualisation literacy (mini-VLAT), data analysis, and GenAI expertise (demographics ranging from 1–None to 5–Expert). This preliminary analysis is essential to ensure the findings for the subsequent analyses are robust and reliable(Charness et al.,2012).We used the Kruskal-Wallis test, a non-parametric test that compares the medians of more than two independent groups, to assess whether there were statistically significant differences in participants’ visualisation literacy, data analysis, and GenAI expertise(McKight and Najab,2010).A non-significant finding for each of these metrics is critical to confirm the comparability of the three intervention conditions. Additionally, boxplots were drawn to visualise the sample distribution for each condition.

3.6.2.RQ1: Within-subject Analysis

To address RQ1, we performed within-subject analyses to examine changes in participants’ effectiveness and efficiency in extracting insights from data visualisations across three repeated measures for different intervention conditions. First, we utilised the Friedman test(Sheldon et al.,1996),a non-parametric test suitable for detecting differences in scores across multiple conditions. This test was chosen because of the repeated measures design and the ordinal nature of the data, as well as violations of normality in data distribution identified using the Shapiro-Wilk W-test. The Friedman test compared median correct scores (Pre_score, Intervention_score, and Post_score) and median success times (Pre_time, Intervention_time, and Post_time) within participants across the three study phases. The effect size was calculated using Kendall’s W(Pereira et al.,2015).

Following significant results from the Friedman test, we conducted pairwise comparisons using Wilcoxon signed-rank tests(Wilcoxon et al.,1970).This non-parametric test compared two related samples to determine whether their population mean ranks differ, making it robust for handling non-normally distributed data. This allowed us to identify which specific conditions differed significantly. The Holm-Bonferroni method was used to adjust for multiple comparisons, with an initial alpha of 0.05. Effect sizes for these tests were calculated using Rank-Biserial correlation to assess the magnitude of observed differences(Kerby,2014).

3.6.3.RQ2: Between-subject Analysis

To address RQ2, we conducted between-subject analyses to compare the effectiveness and efficiency of different intervention conditions across participants. We began with the Kruskal-Wallis test, a non-parametric method suitable for this analysis due to observed violations of normality, especially for Post_score. This test compared the correct scores (Pre_score, Intervention_score, and Post_score) and success times (Pre_time, Intervention_time, and Post_time) across different intervention conditions to assess statistically significant differences between the conditions. The effect size was calculated using eta-squared.

When the Kruskal-Wallis test indicated significant differences, we used the Mann-Whitney U test to explore differences between pairs of independent groups further. This non-parametric test is appropriate for non-normally distributed data and assesses whether two independent samples come from the same distribution. The Holm–Bonferroni method was applied to adjust for multiple comparisons with an initial alpha of 0.05. We also calculated effect sizes for these tests to quantify the strength of observed differences using the Rank-Biserial correlation(Kerby,2014).

3.6.4.RQ3: Regression Analysis

For RQ3, we used ordinary least squares (OLS) regression to examine the relationship between intervention conditions and visualisation literacy on participants’ effectiveness and efficiency in extracting insights from data visualisations. This analysis considered both main effects (e.g., intervention conditions) and interaction effects (e.g., intervention conditions and visualisation literacy), holding other variables constant. The pre-intervention baseline (Pre) was excluded to focus on the effects during and after the intervention.

First, we standardised the mini-VLAT scores by subtracting the mean and dividing by the standard deviation, ensuring normal distribution. Outliers for scores (Intervention_score and Post_score) and time-related metrics (Intervention_time and Post_time) were removed using the interquartile range (IQR) method, and data were transformed using the Box-Cox transformation to stabilise variance and normalise data distribution.

For each metric (Intervention_score, Intervention_time, Post_score, Post_time), we constructed a formula to investigate the effects of visualisation literacy (mini-VLAT) across all three intervention conditions. The formula included an intercept ( $\beta_{0}$ ), main effects for the GAI condition ( $\beta_{1}$ ) and the DSAI condition ( $\beta_{2}$ ), visualisation literacy ( $\beta_{3}$ ), and interaction terms between intervention conditions and visualisation literacy ( $\beta_{4}$ and $\beta_{5}$ ), with the DS condition as the reference.

(2)		Score or Time	$\displaystyle=\beta_{0}+\beta_{1}\times\text{{condition [GAI]}}+\beta_{2}% \times\text{{condition [DSAI]}}+\beta_{3}\times\text{{mini-VLAT}}$
(2)			$\displaystyle\quad+\beta_{4}(\text{{condition [GAI]}}\times\text{{mini-VLAT}})% +\beta_{5}(\text{{condition [DSAI]}}\times\text{{mini-VLAT}})$

We fitted the models using the OLS function from the Statsmodels library. The assumptions were validated through several methods. Linearity was assessed by plotting predicted versus observed values. The normality of residuals was checked using the Shapiro-Wilk test and QQ plots. Homoscedasticity was evaluated using the Breusch-Pagan test, and the independence of residuals was assessed using the Durbin-Watson test. All assumptions were met.

3.6.5.RQ4: Experiences and Perceptions

To address RQ4, we computed the median and IQR for each of the three experience questions (Table2). These questions centred on participants’ experiences and perceptions of the interventions they received: DS, GAI, or DSAI. Boxplots were used to visually depict the differences in experiences and perceptions across the three intervention conditions. Additionally, Kruskal-Wallis tests were performed to evaluate statistical differences, using the same approach as in the analysis of RQ2. Moreover, a thematic analysis of participants’ elaborations for the perceived effectiveness of each intervention was conducted, coding all 117 responses(Braun and Clarke,2006).The initial qualitative analysis was conducted by one researcher, who identified emerging themes. These themes were then cross-verified by a second researcher and collaboratively finalised with contributions from the rest of the authors. This process ensured the reliability and validity of the thematic interpretation(Guest et al.,2012).

4.Results

4.1.Preliminary Analysis – Comparability of Intervention Conditions

To ensure the comparability of the three intervention conditions, we conducted a preliminary analysis of participants’ visualisation literacy (mini-VLAT), data analysis expertise, and GenAI expertise.

For visualisation literacy (mini-VLAT), the Kruskal-Wallis test did not reveal significant differences between the three conditions: DS (Median = 6.0, IQR = 4.0), GAI (Median = 6.67, IQR = 5.33), and DSAI (Median = 6.67, IQR = 4.0). The test results were $H(2)=0.49,n=117,p=0.784$ ,indicating that the groups were comparable in terms of visualisation literacy.

For data expertise, the Kruskal-Wallis test also did not show significant differences between the conditions: DS (Median = 3.0, IQR = 1.0), GAI (Median = 3.0, IQR = 1.0), and DSAI (Median = 3.0, IQR = 1.0). The test results were $H(2)=1.02,n=117,p=0.601$ ,confirming that the groups were comparable in terms of data analysis expertise.

For GenAI expertise, the Kruskal-Wallis test again did not reveal significant differences between the conditions: DS (Median = 3.0, IQR = 0.25), GAI (Median = 3.0, IQR = 1.0), and DSAI (Median = 3.0, IQR = 1.0). The test results were $H(2)=1.42,n=117,p=0.491$ ,indicating that the groups were comparable in terms of GenAI expertise.

These non-significant findings in visualisation literacy, data expertise, and GenAI expertise metrics are critical for comparing the three intervention conditions. Boxplots were drawn to visualise the sample distribution for each condition, further confirming the similarity of the groups (Figure8). These findings ensure that any differences observed in the subsequent analyses can be attributed to the interventions rather than pre-existing disparities among the participants.

4.2.RQ1 – Changes in Effectiveness and Efficiency between Phases

Figure9illustrates the within-subject changes across the three phases for each intervention condition. For the DS condition (N=36), the Friedman test revealed a significant difference in correct scores across the three phases, $\chi^{2}(2)=30.05,p<.001,W=0.35$ (medium effect(Pereira et al.,2015)). Post-hoc Wilcoxon signed-rank tests indicated that the correct scores significantly improved from the pre-intervention (Median = 3.0, IQR = 2.0) to the intervention phase (Median = 5.0, IQR = 1.0), $W=46,p<.001,r=0.91$ (large effect(Kerby,2014)), and from the pre-intervention to the post-intervention phase (Median = 5.0, IQR = 2.0), $W=52,p<.001,r=0.88$ (large effect). However, the change from the intervention phase to the post-intervention phase was not significant. In terms of success time for the DS condition, the Friedman test also showed significant differences, $\chi^{2}(2)=33.17,p<.001,W=0.46$ (medium effect). Wilcoxon signed-rank tests revealed that the success time significantly decreased from the pre-intervention (Median = 75.72 seconds, IQR = 62.53 seconds) to the intervention phase (Median = 39.61 seconds, IQR = 25.96 seconds), $W=76,p<.001,r=0.89$ (large effect), and from the pre-intervention to the post-intervention phase (Median = 35.58 seconds, IQR = 21.27 seconds), $W=22,p<.001,r=0.97$ (large effect). The change from the intervention to the post-intervention phase was not significant. These results suggest that the DS intervention significantly improved the participants’ ability to extract correct insights from data visualisations and reduced the time taken to do so. The improvements from the pre-intervention to the intervention phase and from the pre-intervention to the post-intervention phase were significant, indicating the effectiveness of the DS condition. However, there was no significant difference between the intervention and the post-intervention phase, suggesting that the gains in performance were maintained post-intervention.

For the GAI condition (N=41), the Friedman test indicated significant differences in correct scores, $\chi^{2}(2)=29.56,p<.001,W=0.29$ (small effect). Post-hoc tests showed significant improvements from the pre-intervention (Median = 3.0, IQR = 1.0) to the intervention phase (Median = 4.0, IQR = 1.0), $W=71,p<.001,r=0.87$ (large effect), and from the pre-intervention to the post-intervention phase (Median = 5.0, IQR = 2.0), $W=104,p<.001,r=0.81$ (large effect). The change from the intervention to the post-intervention phase was not significant. The success time for the GAI condition also showed significant differences, $\chi^{2}(2)=40.59,p<.001,W=0.49$ (medium effect). Wilcoxon signed-rank tests indicated significant reductions from the pre-intervention (Median = 72.12 seconds, IQR = 77.27 seconds) to the intervention phase (Median = 46.34 seconds, IQR = 42.88 seconds), $W=57,p<.001,r=0.93$ (large effect), and from the pre-intervention to the post-intervention phase (Median = 36.8 seconds, IQR = 19.74 seconds), $W=86,p<.001,r=0.90$ (large effect). The change from the intervention to the post-intervention phase was not significant. These findings indicate that the GAI intervention was effective in significantly enhancing participants’ correct scores and reducing their success times from the baseline to both the intervention and the post-intervention phase. However, the lack of significant differences between the intervention and the post-intervention phase suggests that the improvements were sustained post-intervention, similar to the DS condition.

For the DSAI condition (N=40), the Friedman test revealed significant differences in correct scores, $\chi^{2}(2)=46.48,p<.001,W=0.50$ (large effect). Post-hoc Wilcoxon signed-rank tests indicated significant improvements from the pre-intervention (Median = 3.0, IQR = 2.0) to the intervention phase (Median = 5.0, IQR = 1.0), $W=32,p<.001,r=0.94$ (large effect), and from the pre-intervention to the post-intervention phase (Median = 6.0, IQR = 1.0), $W=19,p<.001,r=0.97$ (large effect). Additionally, there was a significant improvement from the intervention to the post-intervention phase, $W=209,p=.019,r=0.52$ (large effect). Significant differences were also found in the success time for the DSAI condition, $\chi^{2}(2)=43.65,p<.001,W=0.55$ (large effect). Wilcoxon signed-rank tests indicated significant reductions from the pre-intervention (Median = 98.28 seconds, IQR = 76.31 seconds) to the intervention phase (Median = 41.94 seconds, IQR = 19.77 seconds), $W=20,p<.001,r=0.98$ (large effect), and from the pre-intervention to the post-intervention phase (Median = 35.75 seconds, IQR = 16.06 seconds), $W=42,p<.001,r=0.95$ (large effect). The change from the intervention to the post-intervention phase was not significant. These results demonstrate significant improvements in both correct scores and success times from the baseline to the intervention and the post-intervention phase. Unlike the other conditions, the DSAI intervention also showed significant improvement from the intervention to the post-intervention phase, indicating a continued positive improvement post-intervention.

4.3.RQ2 – Changes in Effectiveness and Efficiency between Interventions

Figure10illustrates the between-subject changes across the three conditions for each intervention phase. The Kruskal-Wallis test did not reveal significant differences between the three conditions for the correct scores in both the pre-intervention and the intervention phase. However, for the post-intervention correct scores, the Kruskal-Wallis test revealed significant differences, $H(2)=10.62,n=117,p=.005,\eta^{2}=0.08$ (medium effect). Pairwise comparisons using the Mann-Whitney U test indicated significant differences between the DSAI and DS conditions ( $U=967.5,p=.007,r=0.30$ ;medium effect), and between the DSAI and GAI conditions ( $U=1119.0,p=.003,r=0.31$ ;medium effect). Specifically, the median correct score for DSAI was 6.0 (IQR = 1.0), which was significantly higher than the median scores for DS (Median = 5.0, IQR = 2.0) and GAI (Median = 5.0, IQR = 2.0). For the success times, the Kruskal-Wallis tests revealed no significant differences between the intervention conditions across all three phases. These between-subject results suggest that while all three interventions (DS, GAI, and DSAI) were effective in improving correct scores and reducing success times within subjects (from RQ1 findings), the DSAI condition was particularly effective in achieving higher correct scores in the post-intervention phase compared to the DS and GAI conditions. This indicates that the DSAI intervention may be more effective in sustaining performance improvements post-intervention. There were no significant differences in success times between the conditions, suggesting that all interventions were similarly effective in reducing the time taken to extract insights from the visualisations.

4.4.RQ3 – Visualisation literacy

4.4.1.Intervention score and time.

The OLS regression revealed no significant main effects or interaction effects for visualisation literacy on the intervention score for all three conditions. Regarding the success time, the OLS regression also did not reveal any significant main or interaction effects for visualisation literacy across the DS, GAI, or DSAI conditions.

4.4.2.Post-intervention score and time.

The OLS regression revealed a significant main effect for visualisation literacy ( $\beta=0.41,SE=0.17,t=2.33,p=0.022$ ) and the DSAI condition ( $\beta=0.55,SE=0.25,t=2.52,p=0.013$ ) on the post-intervention score. This indicates that an increase of one standard deviation in visualisation literacy results in a 0.405 standard deviation increase in the post-intervention score across all three conditions. Additionally, being in the DSAI condition results in a 0.546 standard deviation increase in the post-intervention score compared to the DS and GAI conditions while holding visualisation literacy constant. The model explained 27.7% of the variance in post-intervention scores ( $F(5,93)=7.11,p<0.001$ ). Whereas, the OLS regression for post-intervention time revealed no significant main or interaction effects for visualisation literacy on the post-intervention time for the DS and GAI conditions. However, the model is significant but merely explained 11.8% of the variance in post-intervention scores (( $F(5,93)=2.35,p=0.047$ )).

4.5.RQ4 – Intervention Experiences and Perceptions

4.5.1.Quantitative findings.

For all three self-reported measures, the Kruskal-Wallis test did not reveal any significant differences between the conditions, indicating that participants perceived the efficacy, helpfulness, and satisfaction of the interventions similarly. As shown in Figure11,the boxplots illustrate that the medians for efficacy, helpfulness, and satisfaction were consistently high across the DS, GAI, and DSAI conditions, with median values of 4.0 or 4.5 for all measures. This suggests overall positive experiences and perceptions for all three interventions in supporting participants’ understanding of data visualisations.

4.5.2.Qualitative Findings.

The thematic analysis of participants’ responses to the intervention reveals nuanced insights into how the interventions affected their ability to comprehend data visualisations. Overall, most participants (N=85) reported that the intervention enhanced their ability to interpret data visualisations. The specific benefits and challenges for each condition are detailed below.

DS Condition.Data stories enhanced interpretation efficiency (N=7), provided clear directions and examples (N=5), and offered multiple perspectives (N=2). One participant highlighted the detail and prioritisation in interpretation:”I learned how to interpret data visualisations more quickly and accurately. It taught me to pay attention to detail and look at the entire picture at the same time, gauging the important facts and putting them first but not forgetting all the other details”(P96). Another noted their usefulness in comparing nurse-patient conversations:”It (DS) really helped to compare how much each nurse was conversing with each other and to the patient/relative”(P34). Practical examples and clear structure were praised:”It provided lucid, organised direction on analysing visual representations and extracting meaningful insights. Practical examples and established techniques were offered to enhance my comprehension of graphical data depiction”(P118). However, challenges in integrating insights from multiple visual aspects were noted:”Interpreting complex data and integrating multiple visual aspects posed occasional challenges, making it difficult to draw clear conclusions. The intervention enhanced data comprehension but required careful consideration of each visualisation’s context”(P33). Overall, data stories improved pattern recognition, provided multiple perspectives, and enhanced efficiency in interpreting visual data, though challenges remained in integrating and contextualising complex data.

GAI Condition.Participants noted benefits such as providing contextual insights (N=10), facilitating textual interpretation (N=8), and enhancing task efficiency (N=5). The passive GenAI agent was praised for providing context:”The intervention provided valuable context for interpreting data visualisations by highlighting key aspects of task prioritisation, communication, and stress levels among nurses. It effectively guided my understanding of how these factors influenced performance”(P31). Another commended the agent for clarifying visualisation aspects:”I felt that the intervention helped clarify some aspects of the data visualisations. For instance, I was able to learn the significance of the thickness of the lines in the communication network”(P98). One participant highlighted the enhanced understanding by confirming observations and converting images to words:”The intervention greatly enhanced my understanding of the visual data by confirming what I was already seeing and also converting the images to words for easy and better comprehension”(P108). Efficiency and engagement were also noted:”To be honest, I was a little sceptical at first, but I was surprised by how efficiently it addressed my queries. It made my work easier and saved me a lot of time”(P72). However, a participant with high visualisation literacy reported minimal additional insight:”The intervention did not contribute much for me as it did not provide any additional insight on the graphs; using the chatbot was both slower and less straightforward than analysing the graphs”(P21). In sum, the passive GenAI agent enhanced contextual understanding and efficiency, though utility varied for those with higher visualisation literacy.

DSAI Condition.Participants reported benefits of the proactive GenAI agent, including integrating insights for holistic understanding (N=14) and providing structured guidance and feedback (N=10). One participant noted:”I am able to put together different components that are given and make one conclusion rather than look at one part and just make a conclusion”(P48). Another stated:”It allowed me to relate the visuals to one another in order to maximise my understanding of the situation”(P17). The proactive GenAI agent was endorsed for guidance and feedback. One mentioned:”The intervention enhanced my ability to interpret the data visualisation as it provided guidance and understanding of the time allocation, stress level and communication pattern”(P101). Another appreciated the feedback:”It provided meaningful feedback and asked the right questions on what to focus on and what you have to get right. So, overall, I think it’s pretty neat and useful for bringing a better experience when you have to work with data and need some help and feedback”(P18). One participant found it helpful for consolidating assumptions:”I found that it was helpful to consolidate things our brain may have already been assuming. I also found that it was able to tell us our errors when we were assuming things that were not necessarily correct about these diagrams or images”(P27). However, one participant felt it was less useful for simple visualisations like bar charts:”In the first question, it only confirmed what could be observed easily in the chart”(P8). Another felt it wasn’t necessary for the tasks:”I thought it was helpful in guiding me along. However, I felt as though it was not completely necessary. I would value it if I needed help, but I don’t know if I really did”(P56). Overall, while the proactive GenAI agent enhanced interpretation and integration of insights through guidance and feedback, utility varied with visualisation complexity and prior experience.

5.Discussion

In this section, we outline the main findings regarding each research question, offer our critical thoughts on the implications for research and practice, and highlight the limitations of our study along with possible directions for future research.

5.1.Summary of Findings and Research Questions

As data visualisation gains widespread adoption across various domains to convey data insights(Sun et al.,2023;Park et al.,2023a;Yang et al.,2023;Schroeder et al.,2020),it becomes crucial to identify practical and scalable methods to assist the general public in understanding these visualisations. Many individuals struggle with interpreting visualisations due to low levels of visualisation literacy(Maltese et al.,2015;Börner et al.,2016;Donohoe and Costello,2020).This comparative study offers empirical evidence on three distinct approaches—data storytelling, passive GenAI agent, and proactive GenAI agent—to improve the effectiveness and efficiency of comprehending data visualisations.

Regarding the first research question, all three approaches significantly enhanced both the effectiveness (measured by correct scores) and efficiency (measured by average success time) of extracting insights from data visualisations. These improvements were maintained post-intervention for data stories and the passive GenAI agent, while the proactive GenAI agent condition showed continued enhancement even after the intervention was removed. These findings provide empirical evidence supporting prior assumptions that data storytelling enhances participants’ ability to accurately and efficiently comprehend insights conveyed through data stories more than conventional data visualisations(Ryan,2016;Zhang,2018;Zhang and Lugmayr,2019;Zhang et al.,2022;Kosara and MacKinlay,2013;Gershon and Page,2001).Such findings also provided empirical evidence to supportDykes (2015)’s foundational definition of data storytelling. Likewise, the findings also contribute empirical evidence supporting the use of GenAI agents as an alternative to data storytelling for facilitating accurate and efficient comprehension of data visualisations. This aligns with recent recommendations to leverage advanced AI systems for supporting the understanding of data visualisations(Li et al.,2024b;Fernandez-Nieto et al.,2024;Yan et al.,2024d).Furthermore, the sustained and continued improvements following the removal of interventions indicate that these supportive methods do more than serve as temporary tools; they facilitate the learning of key comprehension skills in data visualisation, extending their benefits beyond mere utility to education(Segel and Heer,2010;Asamoah,2022;Bahtaji,2020).

In terms of the second research question, we found that the proactive GenAI agent infused with scaffolding techniques significantly outperformed both data stories and the passive GenAI agent in the effectiveness of extracting insights from data visualisations after the intervention was removed. This finding supports the effectiveness of scaffolding techniques for mastering a subject(Gibbons,2002;Van de Pol et al.,2010;Kim et al.,2018),which in this study pertains to users’ ability to comprehend data visualisation. As these techniques are also frequently used in the creation of data stories(Segel and Heer,2010;Zdanovic et al.,2022a;Ryan,2016;Knaflic,2015),the proactive agent can be viewed as a combination of data storytelling and the passive GenAI agent. It actively conveys data stories by prompting users with targeted questions instead of passively responding to user prompts. In this sense, the current finding also provides empirical evidence to support the additive effects of combining data storytelling with GenAI technologies to achieve better comprehension and learning skills in the interpretation of data visualisations.

For the third research question, the absence of interactional effects between intervention conditions and visualisation literacy on participants’ effectiveness in understanding data visualisations is consistent withShao et al.’s findings(Shao et al.,2024).This indicates that all three interventions are equally beneficial for users, regardless of their level of visualisation literacy. Contrary to the common belief supported by previous studies(Ma et al.,2012;Zhang et al.,2022;Figueiras,2014;Pozdniakov et al.,2023),our results did not statistically confirm that any supportive methods, including data storytelling, are more advantageous for individuals with lower visualisation literacy. Furthermore, the significant results demonstrating the proactive GenAI agent’s ability to enhance participants’ effectiveness in comprehending data visualisations after the intervention was removed, while accounting for visualisation literacy, further validate its benefits for users across all levels of visualisation literacy.

Regarding the last research question, qualitative insights revealed that participants appreciated the enhanced efficiency, clear directions, practical examples, and multiple perspectives offered by data stories. This finding supports the proposed benefits of data storytelling, where integrating narrative elements could improve the communicative power of data visualisations(Schulz et al.,2013;Dykes,2015).However, such benefits seem to be less applicable in supporting participants to integrate insights from multiple visual aspects and contextualising interpretations, emphasising the need for integrative data storytelling techniques that support users in making connections between multiple heterogeneous facets(Chen et al.,2018;Martinez-Maldonado et al.,2020).On the other hand, the passive GenAI agent was noted for providing valuable contextual insights, facilitating textual interpretation, and improving task efficiency, and the proactive GenAI agent was praised for integrating multiple insights for a holistic understanding and offering structured guidance and feedback. These findings suggest that these GenAI agents hold considerable promise in complementing data stories, especially in scenarios where users need to synthesise complex data insights from multiple sources(Streit et al.,2011;Chen et al.,2018;Martinez-Maldonado et al.,2020).

5.2.Implications for Research

This study presents several critical implications for the field of HCI and data visualisation research. The demonstrated comparative effectiveness of GenAI agents suggests that leveraging advanced AI technologies can provide alternative and complementing methods to data storytelling in support users in interpreting complex data visualisations(Fernandez-Nieto et al.,2024;Li et al.,2024b;Yan et al.,2024d).Specifically, the proactive agent’s superior performance, especially after the intervention was removed, underscores the potential of scaffolding techniques in facilitating and cultivating essential comprehension skills of data visualisations(Gibbons,2002;Kim et al.,2018).Researchers should explore how these agents can be further developed to provide adaptive, context-aware support that evolves with the user’s growing comprehension skills(Asamoah,2022;Bahtaji,2020).

The positive findings on GenAI agents by no means invalidate the value of data storytelling. Instead, they further substantiate the foundational principles of creating effective data stories(Segel and Heer,2010;Zdanovic et al.,2022a;Ryan,2016;Knaflic,2015),as these design principles closely align with established scaffolding techniques in the field of education and training(Van de Pol et al.,2010;Kim et al.,2018).However, the current evidence is limited to enhanced effectiveness and efficiency in comprehending insights from data visualisations and does not necessarily translate to improved memory recall(Zdanovic et al.,2022a),user engagement(Boy et al.,2015;Zhao et al.,2019),or fostering empathy towards the subject matter(Boy et al.,2017;Morais et al.,2021;Liem et al.,2020).Consequently, this study contributes to and advocates for a broader research initiative that promotes the investigation of human-AI collaboration in designing and developing tools to better support users in comprehending insights from data visualisations and developing their comprehension skills during the process. This is becoming increasingly essential for work across various domains(Sun et al.,2023;Park et al.,2023a;Yang et al.,2023;Schroeder et al.,2020).

The absence of significant interactional effects between visualisation literacy and the intervention conditions is consistent withShao et al.’s findings(Shao et al.,2024).This indicates that visualisation literacy might not be as crucial as previously believed in determining the effectiveness of supportive methods(Pozdniakov et al.,2023).Nevertheless, visualisation literacy remains an important factor affecting the effectiveness of comprehending data visualisations once the support is removed. This discovery paves the way for more in-depth investigations into the complex relationship between supportive methods and visualisation literacy, promoting a more comprehensive understanding of how visualisation literacy aids in users’ comprehension of data visualisation. Additionally, these results suggest the need for further research into other individual differences, such as cognitive load or domain-specific knowledge, that might impact the effectiveness of these supportive methods.

5.3.Implications for Practice

This study provides actionable insights for enhancing data visualisation comprehension through practical methods. Incorporating data storytelling and GenAI agents into data visualisations can significantly improve users’ ability to extract and understand insights, making it a valuable strategy for data analysts, educators, and communication professionals(Dykes,2015;Martinez-Maldonado et al.,2020;Wang et al.,2019;Shan et al.,2022).The demonstrated improvements in both effectiveness and efficiency of these methods are critical in decision-making contexts, ensuring timely and accurate comprehension of insights communicated via data visualisations(Daradkeh,2021;Schroeder et al.,2020).Moreover, the sustained improvements after removing these supportive methods highlight their educational role in developing data visualisation comprehension skills, making them suitable for the general public, who often struggle with low levels of visualisation literacy(Maltese et al.,2015;Börner et al.,2016;Donohoe and Costello,2020).These methods are effective regardless of users’ visualisation literacy levels, further validating their appropriateness for diverse audiences.

Given the findings on sustained improvements in comprehension skills, practitioners should consider these supportive methods as educational tools rather than temporary aids. Training programs and workshops could integrate these tools to deepen understanding of data visualisation principles and techniques. Notably, the proactive GenAI agent demonstrated prolonged improvements from intervention to post-intervention, unlike data stories and passive GenAI agents, which only showed sustained improvements after removing the intervention. This highlights the unique advantage of the proactive GenAI agent in enhancing both performance and learning(Soderstrom and Bjork,2015).The guided experiences provided by the proactive GenAI agent make it an effective method for onboarding users to complex data visualisations, particularly when integrating insights from multiple data sources and visual aspects is required(Streit et al.,2011;Chen et al.,2018;Martinez-Maldonado et al.,2020).This learning process is valuable for in-service training across various industries, such as education, healthcare, business, sports analytics, and social work, where data visualisation is increasingly used to communicate essential insights and support decision-making(Fu and Stasko,2022;Zhi et al.,2019;Martinez-Maldonado et al.,2020;Wang et al.,2019;Lan et al.,2022;Wilkerson et al.,2021;Shan et al.,2022).

Despite the benefits of GenAI technologies in supporting data visualisation comprehension, challenges remain, especially in settings that prioritise data privacy and security(Yan et al.,2024c;Zohny et al.,2023;Yan et al.,2024a).Establishing knowledge databases for storing contextual information and transmitting such information to third-party GenAI models could pose privacy and security concerns. Therefore, integrating GenAI technologies should be carefully considered, with strict adherence to regulatory guidelines.

5.4.Limitations and Future Directions

While this study offers valuable insights, it also has several limitations that should be addressed in future research. Firstly, the study was conducted in a controlled experimental setting, which may not fully capture the complexities of real-world data visualisation tasks. Future studies should consider field experiments or longitudinal studies to examine the long-term effects of these interventions in naturalistic settings. Secondly, the participant pool was limited by the recruitment platform and available resources, resulting in an overrepresentation of participants from North America and Europe. Future research should strive to include a more diverse participant group. Thirdly, the study highlighted the potential of proactive GenAI agents but did not delve into specific design features and interaction mechanisms contributing to their effectiveness. Future research should explore the design and implementation of these agents in greater detail, optimising scaffolding techniques and feedback mechanisms to support users’ comprehension of data visualisations. Additionally, the generalisability of the findings may be limited by the specific sample and context used in this study. Future research should replicate the study with diverse populations and in different domains to validate and extend the findings. Lastly, we did not analyse the analytical writing content due to potential subjectivity in coding and grading responses. This decision was influenced by the limited insights such analyses would contribute to our study’s focus on understanding effectiveness and efficiency. In future work, we plan to use automatic essay grading and learning analytics advancements to uncover process-level insights from participants’ interactions with the supportive methods and their impacts on the writing process(Ramesh and Sanampudi,2022;Kovanovic et al.,2017).

5.5.Conclusion

This study provides empirical evidence on the effectiveness of data storytelling, passive GenAI agents, and proactive GenAI agents in enhancing users’ comprehension of data visualisations. The findings demonstrate that all three approaches significantly improve the effectiveness and efficiency of extracting insights, with the proactive GenAI agent showing the most promise in sustaining and further enhancing comprehension skills. These results have important implications for both research and practice, highlighting the need for more integrative and adaptive approaches to data visualisation support. Future research should continue to explore the potential of GenAI technologies and scaffolding techniques to develop more user-centred data visualisation tools that can cater to the diverse needs of users across different contexts and levels of expertise.

Acknowledgements.

Roberto Martinez-Maldonado’s and Dragan Gasevic’s research was partly funded by the Jacobs Foundation. Dragan Gasevic’s, Lixiang Yan’s, and Roberto Martinez-Maldonado’s research was partly funded by the Australian Government through the Australian Research Council (DP240100069, DP220101209, DP210100060) and Digital Health Cooperative Centre Ltd. Dragan Gasevic’s research was partly funded by the Defense Advanced Research Projects Agency (DARPA) through the Knowledge Management at Speed and Scale (KMASS) program (HR0011-22-2-0047). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

References

(1)
Achiam et al.(2023) Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al.2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774(2023).
Arneson and Offerdahl (2018) Jessie B Arneson and Erika G Offerdahl. 2018. Visual literacy in Bloom: Using Bloom’s taxonomy to support visual learning skills. CBE—Life Sciences Education17, 1 (2018), ar7. https://doi.org/10.1187/cbe.17-08-0178
Asamoah (2022) Daniel Asamoah. 2022. Improving Data Visualization Skills: A Curriculum Design. International Journal of Education and Development Using Information and Communication Technology18, 1 (2022), 213–235.
Association et al.(2014) American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. 2014. Standards for Educational and Psychological Testing. AERA, Washington, DC.
Aurambout et al.(2013) Jean-Philippe Aurambout, Falak Sheth, Ian Bishop, and Christopher Pettit. 2013. Simplifying climate change communication: An application of data visualisation at the regional and local scale. GeoSpatial visualisation(2013), 119–136. https://doi.org/10.1007/978-3-642-12289-7_6
Bach et al.(2018) Benjamin Bach, D. Stefaner, J. Boy, S. Drucker, L. Bartram, J. Wood, P. Ciuccarelli, Yuri Engelhardt, U. Köppen, and B. Tversky. 2018. Narrative Design Patterns for Data-Driven Storytelling. CRC Press (Taylor & Francis), 107–133. https://doi.org/10.1201/9781315281575-5
Bahtaji (2020) Michael Allan A Bahtaji. 2020. Improving students graphing skills and conceptual understanding using explicit Graphical Physics Instructions. Kıbrıslı Eğitim Bilimleri Dergisi15, 4 (2020), 843–853.
Bloom et al.(1984) Benjamin S Bloom, David R Krathwohl, Bertram B Masia, et al.1984. Bloom taxonomy of educational objectives. InAllyn and Bacon.Pearson Education London.
Blythe (2017) Mark Blythe. 2017. Research Fiction: Storytelling, Plot and Design. InProceedings of the 2017 CHI Conference on Human Factors in Computing Systems(Denver, Colorado, USA)(CHI ’17).Association for Computing Machinery, New York, NY, USA, 5400–5411. https://doi.org/10.1145/3025453.3026023
Borges et al.(2022) Mariana Borges, Claiton Marques Correa, and Milene Selbach Silveira. 2022. Fundamental elements and characteristics for telling stories using data. Journal on Interactive Systems13 (2022). Issue 1. https://doi.org/10.5753/jis.2022.2330
Börner et al.(2016) Katy Börner, Adam Maltese, Russell Nelson Balliet, and Joe Heimlich. 2016. Investigating aspects of data visualization literacy using 20 information visualizations and 273 science museum visitors. Information Visualization15, 3 (2016), 198–213. https://doi.org/10.1177/1473871615594652
Boy et al.(2015) Jeremy Boy, Francoise Detienne, and Jean-Daniel Fekete. 2015. Storytelling in Information Visualizations: Does It Engage Users to Explore Data?. InProceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems(Seoul, Republic of Korea)(CHI ’15).Association for Computing Machinery, New York, NY, USA, 1449–1458. https://doi.org/10.1145/2702123.2702452
Boy et al.(2017) Jeremy Boy, Anshul Vikram Pandey, John Emerson, Margaret Satterthwaite, Oded Nov, and Enrico Bertini. 2017. Showing people behind data: Does anthropomorphizing visualizations elicit more empathy for human rights data?. InProceedings of the 2017 CHI conference on human factors in computing systems,Vol. 2017-May. 5462–5474. https://doi.org/10.1145/3025453.3025512
Boy et al.(2014) Jeremy Boy, Ronald A. Rensink, Enrico Bertini, and Jean Daniel Fekete. 2014. A principled way of assessing visualization literacy. IEEE Transactions on Visualization and Computer Graphics20 (2014). Issue 12. https://doi.org/10.1109/TVCG.2014.2346984
Braun and Clarke (2006) Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology3, 2 (2006), 77–101.
Bravo and Maier (2020) Andrea Bravo and Anja M Maier. 2020. Immersive visualisations in design: Using augmented reality (AR) for information presentation. InProceedings of the Design Society: DESIGN Conference,Vol. 1. Cambridge University Press, 1215–1224. https://doi.org/10.1017/dsd.2020.33
Burns et al.(2020) Alyxander Burns, Cindy Xiong, Steven Franconeri, Alberto Cairo, and Narges Mahyar. 2020. How to evaluate data visualizations across different levels of understanding. Proceedings - 8th Evaluation and Beyond: Methodological Approaches for Visualization, BELIV 2020. https://doi.org/10.1109/BELIV51497.2020.00010
Charness et al.(2012) Gary Charness, Uri Gneezy, and Michael A Kuhn. 2012. Experimental methods: Between-subject and within-subject design. Journal of economic behavior & organization81, 1 (2012), 1–8.
Chen et al.(2021) Chih-Ming Chen, Jung-Ying Wang, and Li-Chieh Hsu. 2021. An interactive test dashboard with diagnosis and feedback mechanisms to facilitate learning performance. Computers and Education: Artificial Intelligence2 (2021), 100015.
Chen et al.(2024) Qing Chen, Wei Shuai, Jiyao Zhang, Zhida Sun, and Nan Cao. 2024. Beyond Numbers: Creating Analogies to Enhance Data Comprehension and Communication with Generative AI. InProceedings of the CHI Conference on Human Factors in Computing Systems.1–14.
Chen et al.(2018) Siming Chen, Jie Li, Gennady Andrienko, Natalia Andrienko, Yun Wang, Phong H Nguyen, and Cagatay Turkay. 2018. Supporting story synthesis: Bridging the gap between visual analytics and storytelling. IEEE transactions on visualization and computer graphics26, 7 (2018), 2499–2516.
Chung et al.(2022) John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar, and Minsuk Chang. 2022. TaleBrush: Sketching stories with generative pretrained language models. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems.1–19.
Dado and Bodemer (2017) Marielle Dado and Daniel Bodemer. 2017. A review of methodological applications of social network analysis in computer-supported collaborative learning. Educational Research Review22 (2017), 159–180.
Dang et al.(2023) Hai Dang, Sven Goller, Florian Lehmann, and Daniel Buschek. 2023. Choice over control: How users write with large language models using diegetic and non-diegetic prompting. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems.1–17.
Daradkeh (2021) Mohammad Kamel Daradkeh. 2021. An empirical examination of the relationship between data storytelling competency and business performance: The mediating role of decision-making quality. Journal of Organizational and End User Computing33 (2021). Issue 5. https://doi.org/10.4018/JOEUC.20210901.oa3
De Laat et al.(2007) Maarten De Laat, Vic Lally, Lasse Lipponen, and Robert-Jan Simons. 2007. Investigating patterns of interaction in networked learning and computer-supported collaborative learning: A role for Social Network Analysis. International Journal of Computer-Supported Collaborative Learning2 (2007), 87–103.
Dibia (2023) Victor Dibia. 2023. LIDA: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models. arXiv preprint arXiv:2303.02927(2023).
Dimond et al.(2013) Jill P. Dimond, Michaelanne Dye, Daphne Larose, and Amy S. Bruckman. 2013. Hollaback! The Role of Storytelling Online in a Social Movement Organization. InProceedings of the 2013 Conference on Computer Supported Cooperative Work(San Antonio, Texas, USA)(CSCW ’13).Association for Computing Machinery, New York, NY, USA, 477–490. https://doi.org/10.1145/2441776.2441831
Donohoe and Costello (2020) David Donohoe and Eamon Costello. 2020. Data visualisation literacy in higher education: An exploratory study of understanding of a learning dashboard tool. International Journal of Emerging Technologies in Learning15 (2020). Issue 17. https://doi.org/10.3991/ijet.v15i17.15041
Dykes (2015) Brent Dykes. 2015. Data storytelling: What it is and how it can be used to effectively communicate analysis results. Applied Marketing Analytics1, 4 (2015), 299–313.
Echeverria et al.(2017) Vanessa Echeverria, Roberto Martinez-Maldonado, and Simon Buckingham Shum. 2017. Towards data storytelling to support teaching and learning. ACM International Conference Proceeding Series. https://doi.org/10.1145/3152771.3156134
Echeverria et al.(2018) Vanessa Echeverria, Roberto Martinez-Maldonado, Simon Buckingham Shum, Katherine Chiluiza, Roger Granda, and Cristina Conati. 2018. Exploratory versus Explanatory Visual Learning Analytics: Driving Teachers’ Attention through Educational Data Storytelling. Journal of Learning Analytics5 (2018). Issue 3. https://doi.org/10.18608/jla.2018.53.6
Ellis and Dix (2007) Geoffrey Ellis and Alan Dix. 2007. A taxonomy of clutter reduction for information visualisation. IEEE transactions on visualization and computer graphics13, 6 (2007), 1216–1223.
Fan et al.(2022) Arlen Fan, Yuxin Ma, Michelle Mancenido, and Ross Maciejewski. 2022. Annotating Line Charts for Addressing Deception. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems(¡conf-loc¿, ¡city¿New Orleans¡/city¿, ¡state¿LA¡/state¿, ¡country¿USA¡/country¿, ¡/conf-loc¿)(CHI ’22).Association for Computing Machinery, New York, NY, USA, Article 80, 12 pages. https://doi.org/10.1145/3491102.3502138
Faul et al.(2009) Franz Faul, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. Statistical power analyses using G* Power 3.1: Tests for correlation and regression analyses. Behavior research methods41, 4 (2009), 1149–1160.
Feigenbaum and Alamalhodaei (2020) Anna Feigenbaum and Aria Alamalhodaei. 2020. The data storytelling workbook. Routledge. https://doi.org/10.4324/9781315168012
Fernandez-Nieto et al.(2024) Gloria Milena Fernandez-Nieto, Roberto Martinez-Maldonado, Vanessa Echeverria, Kirsty Kitto, Dragan Gašević, and Simon Buckingham Shum. 2024. Data storytelling editor: A teacher-centred tool for customising learning analytics dashboard narratives. InProceedings of the 14th Learning Analytics and Knowledge Conference.678–689.
Few (2004) Stephen Few. 2004. Show me the numbers:Designing Tables & Graphs to Enlighten. (2004).
Figueiras (2014) Ana Figueiras. 2014. Narrative visualization: A case study of how to incorporate narrative elements in existing visualizations. Proceedings of the International Conference on Information Visualisation. https://doi.org/10.1109/IV.2014.79
Firat et al.(2022) Elif E Firat, Alark Joshi, and Robert S Laramee. 2022. Interactive visualization literacy: The state-of-the-art. Information Visualization21, 3 (mar 2022), 285–310. https://doi.org/10.1177/14738716221081831
Fog et al.(2005) Klaus Fog, Christian Budtz, and Baris Yakaboylu. 2005. Storytelling: Branding in practice. Springer. https://doi.org/10.1007/b138635
Fu and Stasko (2022) Yu Fu and John Stasko. 2022. Supporting Data-Driven Basketball Journalism through Interactive Visualization. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems(New Orleans, LA, USA)(CHI ’22).Association for Computing Machinery, New York, NY, USA, Article 598, 17 pages. https://doi.org/10.1145/3491102.3502078
Gao et al.(2023) Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997(2023).
Garlandini and Fabrikant (2009) Simone Garlandini and Sara Irina Fabrikant. 2009. Evaluating the effectiveness and efficiency of visual variables for geographic information visualization. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)5756 LNCS. https://doi.org/10.1007/978-3-642-03832-7_12
Gershon and Page (2001) Nahum Gershon and Ward Page. 2001. What storytelling can do for information visualization. Commun. ACM44, 8 (2001), 31–37. https://doi.org/10.1145/381641.381653
Gibbons (2002) Pauline Gibbons. 2002. Scaffolding language, scaffolding learning. Heinemann Portsmouth, NH.
Goldsberry (2012) Kirk Goldsberry. 2012. Courtvision: New visual and spatial analytics for the nba. In2012 MIT Sloan sports analytics conference,Vol. 9. 12–15.
Gu (2022) Zuguang Gu. 2022. Complex heatmap visualization. Imeta1, 3 (2022), e43.
Guest et al.(2012) Greg Guest, Kathleen MacQueen, and Emily Namey. 2012. Validity and Reliability (Credibility and Dependability) in Qualitative Research and Data Analysis. SAGE Publications, Inc., Thousand Oaks, 79–103. https://doi.org/10.4135/9781483384436
Gupta et al.(2024) Rajan Gupta, Gaurav Pandey, and Saibal Kumar Pal. 2024. Automating Government Report Generation: A Generative AI Approach for Efficient Data Extraction, Analysis, and Visualization. Digital Government: Research and Practice(2024).
Hariton and Locascio (2018) Eduardo Hariton and Joseph J Locascio. 2018. Randomised controlled trials—the gold standard for effectiveness research. BJOG: an international journal of obstetrics and gynaecology125, 13 (2018), 1716.
Heer and Shneiderman (2012) Jeffrey Heer and Ben Shneiderman. 2012. Interactive Dynamics for Visual Analysis: A Taxonomy of Tools That Support the Fluent and Flexible Use of Visualizations. Queue10, 2 (feb 2012), 30–55. https://doi.org/10.1145/2133416.2146416
Hicks (2009) Martin Hicks. 2009. Perceptual and Design Principles for Effective Interactive Visualisations. Springer London, London, 155–174. https://doi.org/10.1007/978-1-84800-269-2_7
Hou et al.(2024) Irene Hou, Sophia Mettille, Owen Man, Zhuo Li, Cynthia Zastudil, and Stephen MacNeil. 2024. The Effects of Generative AI on Computing Students’ Help-Seeking Preferences. InProceedings of the 26th Australasian Computing Education Conference.39–48.
Hullman et al.(2013) Jessica Hullman, Steven Drucker, Nathalie Henry Riche, Bongshin Lee, Danyel Fisher, and Eytan Adar. 2013. A deeper understanding of sequence in narrative visualization. IEEE Transactions on Visualization and Computer Graphics19 (2013). Issue 12. https://doi.org/10.1109/TVCG.2013.119
Ji et al.(2023) Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. Comput. Surveys55, 12 (2023), 1–38.
Kalyuga (2009) Slava Kalyuga. 2009. The expertise reversal effect. InManaging cognitive load in adaptive multimedia learning.IGI Global, 58–80. https://doi.org/10.4018/978-1-60566-048-6.ch003
Kerby (2014) Dave S Kerby. 2014. The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology3 (2014), 11–IT.
Kim et al.(2024) Jeong Soo Kim, Minseong Kim, and Tae Hyun Baek. 2024. Enhancing User Experience With a Generative AI Chatbot. International Journal of Human–Computer Interaction(2024), 1–13.
Kim et al.(2018) Nam Ju Kim, Brian R Belland, and Andrew E Walker. 2018. Effectiveness of computer-based scaffolding in the context of problem-based learning for STEM education: Bayesian meta-analysis. Educational Psychology Review30 (2018), 397–429.
Knaflic (2015) Cole Nussbaumer Knaflic. 2015. Storytelling with data: a data visualization guide for business professionals.
Kong et al.(2018) Ha-Kyung Kong, Zhicheng Liu, and Karrie Karahalios. 2018. Frames and Slants in Titles of Visualizations on Controversial Topics. InProceedings of the 2018 CHI Conference on Human Factors in Computing Systems(Montreal QC, Canada)(CHI ’18).Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3173574.3174012
Kong et al.(2019) Ha-Kyung Kong, Wenjie Zhu, Zhicheng Liu, and Karrie Karahalios. 2019. Understanding Visual Cues in Visualizations Accompanied by Audio Narrations. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems(Glasgow, Scotland Uk)(CHI ’19).Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3290605.3300280
Kosara and MacKinlay (2013) Robert Kosara and Jock MacKinlay. 2013. Storytelling: The next step for visualization. Computer46 (2013). Issue 5. https://doi.org/10.1109/MC.2013.36
Kovanovic et al.(2017) Vitomir Kovanovic, Srecko Joksimovic, Dragan Gašević, Marek Hatala, and George Siemens. 2017. Content analytics: The definition, scope, and an overview of published research. Handbook of learning analytics(2017), 77–92.
Krum (2013) Randy Krum. 2013. Cool infographics: Effective communication with data visualization and design. John Wiley & Sons.
Lan et al.(2022) Xingyu Lan, Yanqiu Wu, Yang Shi, Qing Chen, and Nan Cao. 2022. Negative Emotions, Positive Outcomes? Exploring the Communication of Negativity in Serious Data Stories. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems(New Orleans, LA, USA)(CHI ’22).Association for Computing Machinery, New York, NY, USA, Article 28, 14 pages. https://doi.org/10.1145/3491102.3517530
Lee et al.(2015) Bongshin Lee, Nathalie Henry Riche, Petra Isenberg, and Sheelagh Carpendale. 2015. More Than Telling a Story: Transforming Data into Visually Shared Stories. IEEE Computer Graphics and Applications35 (2015). Issue 5. https://doi.org/10.1109/MCG.2015.99
Lee et al.(2023) Kenton Lee, Mandar Joshi, Iulia Raluca Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, and Kristina Toutanova. 2023. Pix2struct: Screenshot parsing as pretraining for visual language understanding. InInternational Conference on Machine Learning.PMLR, 18893–18912.
Lee et al.(2016) Sukwon Lee, Sung-Hee Kim, and Bum Chul Kwon. 2016. Vlat: Development of a visualization literacy assessment test. IEEE transactions on visualization and computer graphics23, 1 (2016), 551–560.
Leiser et al.(2024) Florian Leiser, Sven Eckhardt, Valentin Leuthe, Merlin Knaeble, Alexander Maedche, Gerhard Schwabe, and Ali Sunyaev. 2024. Hill: A hallucination identifier for large language models. InProceedings of the CHI Conference on Human Factors in Computing Systems.1–13.
Lewis et al.(2020) Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al.2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems33 (2020), 9459–9474.
Li et al.(2024b) Haotian Li, Yun Wang, and Huamin Qu. 2024b. Where are we so far? understanding data storytelling tools from the perspective of human-ai collaboration. InProceedings of the CHI Conference on Human Factors in Computing Systems.1–19.
Li et al.(2024a) Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, and Zhicheng Dou. 2024a. From matching to generation: A survey on generative information retrieval. arXiv preprint arXiv:2404.14851(2024).
Liem et al.(2020) J. Liem, C. Perm, and J. Wood. 2020. Structure and Empathy in Visual Data Storytelling: Evaluating their Influence on Attitude. Computer Graphics Forum39 (2020). Issue 3. https://doi.org/10.1111/cgf.13980
Liu et al.(2022) Vivian Liu, Han Qiao, and Lydia Chilton. 2022. Opal: Multimodal image generation for news illustration. InProceedings of the 35th Annual ACM Symposium on User Interface Software and Technology.1–17.
Lowe and Matthee (2020) Joy Lowe and Machdel Matthee. 2020. Requirements of data visualisation tools to analyse big data: A structured literature review. InResponsible Design, Implementation and Use of Information and Communication Technology: 19th IFIP WG 6.11 Conference on e-Business, e-Services, and e-Society, I3E 2020, Skukuza, South Africa, April 6–8, 2020, Proceedings, Part I 19.Springer, 469–480. https://doi.org/10.1007/978-3-030-44999-5_39
Ma et al.(2012) Kwan Liu Ma, Isaac Liao, Jennifer Frazier, Helwig Hauser, and Helen Nicole Kostis. 2012. Scientific storytelling using visualization. IEEE Computer Graphics and Applications32 (2012). Issue 1. https://doi.org/10.1109/MCG.2012.24
Ma et al.(2023) Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, and Dongmei Zhang. 2023. Demonstration of InsightPilot: An LLM-empowered automated data exploration system. arXiv preprint arXiv:2304.00477(2023).
Maltese et al.(2015) Adam V Maltese, Joseph A Harsh, and Dubravka Svetina. 2015. Data visualization literacy: Investigating data interpretation along the novice—expert continuum. Journal of College Science Teaching45, 1 (2015), 84–90. https://doi.org/10.2505/4/jcst15_045_01_84
Martinez-Maldonado et al.(2020) Roberto Martinez-Maldonado, Vanessa Echeverria, Gloria Fernandez Nieto, and Simon Buckingham Shum. 2020. From data to insights: A layered storytelling approach for multimodal learning analytics. InProceedings of the 2020 chi conference on human factors in computing systems.1–15. https://doi.org/10.1145/3313831.3376148
McKight and Najab (2010) Patrick E McKight and Julius Najab. 2010. Kruskal-wallis test. The corsini encyclopedia of psychology(2010), 1–1.
Mikolov et al.(2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems26 (2013).
Milesi and Martinez-Maldonado (2024) Mikaela Elizabeth Milesi and Roberto Martinez-Maldonado. 2024. Data Storytelling in Learning Analytics? A Qualitative Investigation into Educators’ Perceptions of Benefits and Risks. InProceedings of the 14th Learning Analytics and Knowledge Conference(Kyoto, Japan)(LAK ’24).Association for Computing Machinery, New York, NY, USA, 167–177. https://doi.org/10.1145/3636555.3636865
Mnguni et al.(2016) Lindelani Mnguni, Konrad Schönborn, and Trevor Anderson. 2016. Assessment of visualisation skills in biochemistry students. South African Journal of Science112, 9-10 (2016), 1–8. https://doi.org/10.17159/sajs.2016/20150412
Morais et al.(2021) Luiz Morais, Yvonne Jansen, Nazareno Andrade, and Pierre Dragicevic. 2021. Can anthropographics promote prosociality? a review and large-sample study. InProceedings of the 2021 CHI Conference on Human Factors in Computing Systems.1–18. https://doi.org/10.1145/3411764.3445637
Muller et al.(2024) Michael Muller, Anna Kantosalo, Mary Lou Maher, Charles Patrick Martin, and Greg Walsh. 2024. GenAICHI 2024: Generative AI and HCI at CHI 2024. InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems.1–7.
Noroozi et al.(2019) Omid Noroozi, Iman Alikhani, Sanna Järvelä, Paul A Kirschner, Ilkka Juuso, and Tapio Seppänen. 2019. Multimodal data to design visual learning analytics for understanding regulation of learning. Computers in Human Behavior100 (2019), 298–304.
Oertel et al.(2020) Catharine Oertel, Ginevra Castellano, Mohamed Chetouani, Jauwairia Nasir, Mohammad Obaid, Catherine Pelachaud, and Christopher Peters. 2020. Engagement in human-agent interaction: An overview. Frontiers in Robotics and AI7 (2020), 92.
Ojo and Heravi (2018) Adegboyega Ojo and Bahareh Heravi. 2018. Patterns in award winning data storytelling: Story types, enabling tools and competences. Digital journalism6, 6 (2018), 693–718. https://doi.org/10.1080/21670811.2017.1403291
Okonkwo and Ade-Ibijola (2021) Chinedu Wilfred Okonkwo and Abejide Ade-Ibijola. 2021. Chatbots applications in education: A systematic review. Computers and Education: Artificial Intelligence2 (2021), 100033.
Ooi et al.(2023) Keng-Boon Ooi, Garry Wei-Han Tan, Mostafa Al-Emran, Mohammed A Al-Sharafi, Alexandru Capatina, Amrita Chakraborty, Yogesh K Dwivedi, Tzu-Ling Huang, Arpan Kumar Kar, Voon-Hsien Lee, et al.2023. The potential of generative artificial intelligence across disciplines: Perspectives and future directions. Journal of Computer Information Systems(2023), 1–32.
Pandey and Ottley (2023) Saugat Pandey and Alvitta Ottley. 2023. Mini-VLAT: A Short and Effective Measure of Visualization Literacy. InComputer Graphics Forum,Vol. 42. Wiley Online Library, 1–11.
Park et al.(2023a) Eunji Park, Yugyeong Jung, Inyeop Kim, and Uichin Lee. 2023a. Charlie and the Semi-Automated Factory: Data-Driven Operator Behavior and Performance Modeling for Human-Machine Collaborative Systems. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems.1–16. https://doi.org/10.1145/3544548.3581457
Park et al.(2023b) Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023b. Generative agents: Interactive simulacra of human behavior. InProceedings of the 36th annual acm symposium on user interface software and technology.1–22.
Pereira et al.(2015) Dulce G Pereira, Anabela Afonso, and Fátima Melo Medeiros. 2015. Overview of Friedman’s test and post-hoc analysis. Communications in Statistics-Simulation and Computation44, 10 (2015), 2636–2653.
Pozdniakov et al.(2023) Stanislav Pozdniakov, Roberto Martinez-Maldonado, Yi-Shan Tsai, Vanessa Echeverria, Namrata Srivastava, and Dragan Gasevic. 2023. How Do Teachers Use Dashboards Enhanced with Data Storytelling Elements According to Their Data Visualisation Literacy Skills?. InLAK23: 13th International Learning Analytics and Knowledge Conference(Arlington, TX, USA)(LAK2023).Association for Computing Machinery, New York, NY, USA, 89–99. https://doi.org/10.1145/3576050.3576063
Ramesh and Sanampudi (2022) Dadi Ramesh and Suresh Kumar Sanampudi. 2022. An automated essay scoring systems: a systematic literature review. Artificial Intelligence Review55, 3 (2022), 2495–2527.
Ren et al.(2023) Pengkun Ren, Yi Wang, and Fan Zhao. 2023. Re-understanding of data storytelling tools from a narrative perspective. Visual Intelligence1, 1 (2023), 11. https://doi.org/10.1007/s44267-023-00011-0
Roberts et al.(2018) Jonathan C. Roberts, Panagiotis D. Ritsos, James R. Jackson, and Christopher Headleand. 2018. The Explanatory Visualization Framework: An Active Learning Framework for Teaching Creative Computing Using Explanatory Visualizations. IEEE Transactions on Visualization and Computer Graphics24, 1 (2018), 791–801. https://doi.org/10.1109/TVCG.2017.2745878
Rodrigues et al.(2019) Sara Rodrigues, Ana Figueiras, and Ilo Alexandre. 2019. Once upon a time in a land far away: guidelines for spatio-temporal narrative visualization. In2019 23rd International Conference Information Visualisation (IV).IEEE, 44–49. https://doi.org/10.1109/IV.2019.00017
Ryan (2016) L. Ryan. 2016. The Visual Imperative: Creating a Visual Culture of Data Discovery. Elsevier Science. https://doi.org/10.1016/c2015-0-00786-9
Ryan (2018) Lindy Ryan. 2018. Visual data storytelling with tableau: story points, telling compelling data narratives. Addison-Wesley Professional.
Saket et al.(2018) Bahador Saket, Alex Endert, and Çağatay Demiralp. 2018. Task-based effectiveness of basic visualizations. IEEE transactions on visualization and computer graphics25, 7 (2018), 2505–2512.
Scheers and De Laet (2021) Hanne Scheers and Tinne De Laet. 2021. Interactive and explainable advising dashboard opens the black box of student success prediction. InTechnology-Enhanced Learning for a Free, Safe, and Sustainable World: 16th European Conference on Technology Enhanced Learning, EC-TEL 2021, Bolzano, Italy, September 20-24, 2021, Proceedings 16.Springer, 52–66.
Schroeder et al.(2020) Kay Schroeder, Batoul Ajdadilish, Alexander P Henkel, and André Calero Valdez. 2020. Evaluation of a financial portfolio visualization using computer displays and mixed reality devices with domain experts. InProceedings of the 2020 CHI Conference on Human Factors in Computing Systems.1–9. https://doi.org/10.1145/3313831.3376556
Schulz et al.(2013) Hans-Jörg Schulz, Marc Streit, Thorsten May, and Christian Tominski. 2013. Towards a characterization of guidance in visualization. InPoster at IEEE Conference on Information Visualization (InfoVis),Vol. 2.
Segel and Heer (2010) Edward Segel and Jeffrey Heer. 2010. Narrative visualization: Telling stories with data. IEEE Transactions on Visualization and Computer Graphics16 (2010), 1139–1148. Issue 6. https://doi.org/10.1109/TVCG.2010.179
Shan et al.(2022) Xurong Shan, Dawei Wang, and Jiaxiang Li. 2022. Research on Data Storytelling Strategies for Cultural Heritage Transmission and Dissemination. InHCI International 2022 – Late Breaking Posters,Constantine Stephanidis, Margherita Antona, Stavroula Ntoa, and Gavriel Salvendy (Eds.). Springer Nature Switzerland, Cham, 344–353.
Shao et al.(2024) Hongbo Shao, Roberto Martinez-Maldonado, Vanessa Echeverria, Lixiang Yan, and Dragan Gasevic. 2024. Data Storytelling in Data Visualisation: Does it Enhance the Efficiency and Effectiveness of Information Retrieval and Insights Comprehension?. InProceedings of the CHI Conference on Human Factors in Computing Systems.1–21.
Sheldon et al.(1996) Michael R Sheldon, Michael J Fillyaw, and W Douglas Thompson. 1996. The use and interpretation of the Friedman test in the analysis of ordinal-scale data in repeated measures designs. Physiotherapy Research International1, 4 (1996), 221–228.
Shuster et al.(2021) Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. 2021. Retrieval augmentation reduces hallucination in conversation. arXiv preprint arXiv:2104.07567(2021).
Siriwardhana et al.(2023) Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, Rajib Rana, and Suranga Nanayakkara. 2023. Improving the domain adaptation of retrieval augmented generation (RAG) models for open domain question answering. Transactions of the Association for Computational Linguistics11 (2023), 1–17.
Soderstrom and Bjork (2015) Nicholas C Soderstrom and Robert A Bjork. 2015. Learning versus performance: An integrative review. Perspectives on Psychological Science10, 2 (2015), 176–199.
Srinivasan et al.(2018) Arjun Srinivasan, Steven M Drucker, Alex Endert, and John Stasko. 2018. Augmenting visualizations with interactive data facts to facilitate interpretation and communication. IEEE transactions on visualization and computer graphics25, 1 (2018), 672–681.
Stone et al.(2005) Debbie Stone, Caroline Jarrett, Mark Woodroffe, and Shailey Minocha. 2005. User interface design and evaluation. Elsevier.
Streit et al.(2011) Marc Streit, Hans-Jorg Schulz, Alexander Lex, Dieter Schmalstieg, and Heidrun Schumann. 2011. Model-driven design for the visual analysis of heterogeneous data. IEEE Transactions on Visualization and Computer Graphics18, 6 (2011), 998–1010.
Sun et al.(2023) Yuling Sun, Xiaojuan Ma, Silvia Lindtner, and Liang He. 2023. Data Work of Frontline Care Workers: Practices, Problems, and Opportunities in the Context of Data-Driven Long-Term Care. Proceedings of the ACM on Human-Computer Interaction7 (2023), 1–28. Issue 1 CSCW. https://doi.org/10.1145/3579475
Therón (2020) Roberto Therón. 2020. Visual learning analytics for a better impact of big data. Radical Solutions and Learning Analytics: Personalised Learning and Teaching Through Big Data(2020), 99–113.
Thorndike et al.(1991) Robert M Thorndike, George K Cunningham, Robert Ladd Thorndike, and Elizabeth P Hagen. 1991. Measurement and evaluation in psychology and education. Macmillan Publishing Co, Inc.
Tufte (2001) Edward R Tufte. 2001. The visual display of quantitative information.Vol. 2. Graphics press Cheshire, CT. https://doi.org/10.4135/9781071812082.n670
Van de Pol et al.(2010) Janneke Van de Pol, Monique Volman, and Jos Beishuizen. 2010. Scaffolding in teacher–student interaction: A decade of research. Educational psychology review22 (2010), 271–296.
Wang et al.(2019) Zezhong Wang, Harvey Dingwall, and Benjamin Bach. 2019. Teaching Data Visualization and Storytelling with Data Comic Workshops. InExtended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems(Glasgow, Scotland Uk)(CHI EA ’19).Association for Computing Machinery, New York, NY, USA, 1–9. https://doi.org/10.1145/3290607.3299043
Wanous and Reichers (1996) John P Wanous and Arnon E Reichers. 1996. Estimating the reliability of a single-item measure. Psychological Reports78, 2 (1996), 631–634.
Ware (2019) Colin Ware. 2019. Information visualization: perception for design. Morgan Kaufmann. https://doi.org/10.1016/C2016-0-02395-1
Watson and Setlur (2015) Benjamin Watson and Vidya Setlur. 2015. Emerging research in mobile visualization. MobileHCI 2015 - Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct. https://doi.org/10.1145/2786567.2786571
White et al.(2023) Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. 2023. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382(2023).
Wilcoxon et al.(1970) Frank Wilcoxon, S Katti, Roberta A Wilcox, et al.1970. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected tables in mathematical statistics1 (1970), 171–259.
Wilkerson et al.(2021) Michelle Wilkerson, William Finzer, Tim Erickson, and Damaris Hernandez. 2021. Reflective Data Storytelling for Youth: The CODAP Story Builder. InProceedings of the 20th Annual ACM Interaction Design and Children Conference(Athens, Greece)(IDC ’21).Association for Computing Machinery, New York, NY, USA, 503–507. https://doi.org/10.1145/3459990.3465177
Wu et al.(2021) Aoyu Wu, Yun Wang, Xinhuan Shu, Dominik Moritz, Weiwei Cui, Haidong Zhang, Dongmei Zhang, and Huamin Qu. 2021. Ai4vis: Survey on artificial intelligence approaches for data visualization. IEEE Transactions on Visualization and Computer Graphics28, 12 (2021), 5049–5070.
Wu et al.(2023) Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155(2023).
Xun and Land (2004) GE Xun and Susan M Land. 2004. A conceptual framework for scaffolding III-structured problem-solving processes using question prompts and peer interactions. Educational technology research and development52, 2 (2004), 5–22.
Yan et al.(2024a) Lixiang Yan, Samuel Greiff, Ziwen Teuber, and Dragan Gašević. 2024a. Promises and Challenges of Generative Artificial Intelligence for Human Learning. Nature Human Behaviour(2024). Accepted for publication.
Yan et al.(2024b) Lixiang Yan, Roberto Martinez-Maldonado, and Dragan Gasevic. 2024b. Generative Artificial Intelligence in Learning Analytics: Contextualising Opportunities and Challenges through the Learning Analytics Cycle. InProceedings of the 14th Learning Analytics and Knowledge Conference.101–111.
Yan et al.(2024c) Lixiang Yan, Lele Sha, Linxuan Zhao, Yuheng Li, Roberto Martinez-Maldonado, Guanliang Chen, Xinyu Li, Yueqiao Jin, and Dragan Gašević. 2024c. Practical and ethical challenges of large language models in education: A systematic scoping review. British Journal of Educational Technology55, 1 (2024), 90–112.
Yan et al.(2024d) Lixiang Yan, Linxuan Zhao, Vanessa Echeverria, Yueqiao Jin, Riordan Alfredo, Xinyu Li, Dragan Gaševi’c, and Roberto Martinez-Maldonado. 2024d. VizChat: Enhancing Learning Analytics Dashboards with Contextualised Explanations Using Multimodal Generative AI Chatbots. InInternational Conference on Artificial Intelligence in Education.Springer, 180–193.
Yang et al.(2023) Kexin Bella Yang, Vanessa Echeverria, Zijing Lu, Hongyu Mao, Kenneth Holstein, Nikol Rummel, and Vincent Aleven. 2023. Pair-Up: Prototyping Human-AI Co-orchestration of Dynamic Transitions between Individual and Collaborative Learning in the Classroom. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems.1–17. https://doi.org/10.1145/3544548.3581398
Ye et al.(2024) Yilin Ye, Jianing Hao, Yihan Hou, Zhan Wang, Shishi Xiao, Yuyu Luo, and Wei Zeng. 2024. Generative ai for visualization: State of the art and future directions. Visual Informatics(2024).
Zanan and Aziz (2022) Muhammad Faris Basheer B.Mohd Zanan and Madihah Sheikh Abdul Aziz. 2022. A Review On The Visual Design Styles In Data Storytelling Based On User Preferences And Personality Differences. Proceedings of the 2022 IEEE 7th International Conference on Information Technology and Digital Applications, ICITDA 2022. https://doi.org/10.1109/ICITDA55840.2022.9971409
Zdanovic et al.(2022a) Dominyk Zdanovic, Tanja Julie Lembcke, and Toine Bogers. 2022a. The influence of data storytelling on the ability to recall. CHIIR 2022 - Proceedings of the 2022 Conference on Human Information Interaction and Retrieval. https://doi.org/10.1145/3498366.3505755
Zdanovic et al.(2022b) Dominyk Zdanovic, Tanja Julie Lembcke, and Toine Bogers. 2022b. The influence of data storytelling on the ability to recall information. InProceedings of the 2022 Conference on Human Information Interaction and Retrieval.67–77.
Zhang (2018) Yangjinbo Zhang. 2018. Converging data storytelling and visualisation. InEntertainment Computing–ICEC 2018: 17th IFIP TC 14 International Conference, Held at the 24th IFIP World Computer Congress, WCC 2018, Poznan, Poland, September 17–20, 2018, Proceedings 17,Vol. 11112 LNCS. Springer, 310–316. https://doi.org/10.1007/978-3-319-99426-0_36
Zhang and Lugmayr (2019) Yangjinbo Zhang and Artur Lugmayr. 2019. Designing a user-centered interactive data-storytelling framework. InProceedings of the 31st Australian Conference on Human-Computer-Interaction.428–432. https://doi.org/10.1145/3369457.3369507
Zhang et al.(2022) Yangjinbo Zhang, Mark Reynolds, Artur Lugmayr, Katarina Damjanov, and Ghulam Mubashar Hassan. 2022. A Visual Data Storytelling Framework. Informatics9 (2022). Issue 4. https://doi.org/10.3390/informatics9040073
Zhao et al.(2019) Zhenpeng Zhao, Rachael Marr, Jason Shaffer, and Niklas Elmqvist. 2019. Understanding partitioning and sequence in data-driven storytelling. InInformation in Contemporary Society: 14th International Conference, iConference 2019, Washington, DC, USA, March 31–April 3, 2019, Proceedings 14,Vol. 11420 LNCS. Springer, 327–338. https://doi.org/10.1007/978-3-030-15742-5_32
Zhi et al.(2019) Qiyu Zhi, Suwen Lin, Poorna Talkad Sukumar, and Ronald Metoyer. 2019. GameViews: Understanding and Supporting Data-Driven Sports Storytelling. InProceedings of the 2019 CHI Conference on Human Factors in Computing Systems(Glasgow, Scotland Uk)(CHI ’19).Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3290605.3300499
Zhu (2007) Ying Zhu. 2007. Measuring effective data visualization. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)4842 LNCS. Issue PART 2. https://doi.org/10.1007/978-3-540-76856-2_64
Zohny et al.(2023) Hazem Zohny, John McMillan, and Mike King. 2023. Ethics of generative AI. ,79–80 pages.

Appendix AScaffolding Prompts and Questions

A.1.Scaffolding Prompts

Scaffolding Approach.Guide participants through understanding each visualisation step-by-step. For each visualisation, start by providing a one-sentence description and ask only one question at a time. Avoid asking repeated questions. Give feedback on participants’ responses. Once all questions for one visualisation are covered, direct participants to the next visualisation by asking them to click the right arrow. Only ask questions related to the visualisation that was sent to you. Ensure you cover all theScaffolding Questionsfor each visualisation, including 1 for the Bar Chart, 2 for the Communication Network, and 3 for the Ward Map. Use a friendly and conversational tone. If participants get something wrong, provide a hint and ask them to rethink and re-answer.

A.2.Visualisation Descriptions and Scaffolding Questions

Bar Chart

•

Description:It shows the proportion of time the two nurses spent on different tasks.
•
Scaffolding Question:
1. (1)
  
  What do you notice about Amy’s proportion of time spent on tasks, and what does this indicate about her prioritisation?

Communication Network

•

Description:It shows verbal communication interactions among each student and their communication with the doctor, relative, and patient manikins. Arrows indicate the direction of communication, and edge thickness indicates the duration of communication.
•
Scaffolding Questions:
1. (1)
  
  Who are the main communicators in the network, and what does the thickness of the arrows tell us?
2. (2)
  
  How often did the nurses communicate with each other compared to the doctor, patient, and relative?

Ward Map

•

Description:It shows the verbal and spatial distribution of each student during a simulation session. Saturated colours indicate frequent verbal communication, while the hexagons’ locations show spatial distributions. The peak heart rate of each student is also displayed on the Ward Map, represented by a heart shape.
•
Scaffolding Questions:
1. (1)
  
  Which areas of the ward did the nurses spend the most time in, and how does that relate to their task prioritisation?
2. (2)
  
  How can the colour intensity give us insights into verbal communication patterns?
3. (3)
  
  What does the peak heart rate tell us about the stress levels of the nurses in different areas of the ward?

Appendix BEvaluation Questions

Table 3.Evaluation Questions for the Pre-Intervention Phase

Visualisation	Bloom’s Level	Evaluation Question
Bar Chart	Knowledge	Q:Which behaviour did the two nurses spend theleasttime on? – Working together on other tasks – Working individually on other tasks – Working together on tasks for Amy – Working individually on tasks for Amy
Bar Chart	Comprehension	Q:How did the nurses spend their time working on tasks for Amy compared to other tasks? – Two nurses spent more time working individually on tasks for Amy than on other tasks. – Two nurses spent more time working together on other tasks than on tasks for Amy. – Two nurses spent the same amount of time working individually on other tasks and on tasks for Amy. – Two nurses spent the same amount of time working on other tasks and on tasks for Amy.
Social Network	Knowledge	Q:Who did the Primary Nurse 2 communicate with? – Secondary Nurse 1, Patient, and Doctor – Secondary Nurse 1, Patient, and Relative – Primary Nurse 1, Doctor, and Relative – Primary Nurse 1, Patient, and Relative
Social Network	Comprehension	Q:Which of the following statements is correct? – Primary Nurse 2 communicated more with the Relative than Primary Nurse 1 – Primary Nurse 2 communicated more with the Patient than Primary Nurse 1 – Primary Nurse 1 communicated more with the Relative than Primary Nurse 2 – Primary Nurse 1 communicated more with the Patient than Primary Nurse 2
Ward Map	Knowledge	Q:Which of the following statements best describes the behaviours of the two nurses? – The two nurses spent a significant amount of time but did not talk much around Bed 4. Primary Nurse 1’s heart rate peaked at 131 around Bed 4. – The two nurses spent a significant amount of time but did not talk much around Bed 3. Primary Nurse 2’s heart rate peaked at 95 around Bed 4. – The two nurses spent a significant amount of time and talked a lot around Bed 4. Primary Nurse 1’s heart rate peaked at 131 around Bed 4. – The two nurses spent a significant amount of time and talked a lot around Bed 3. Primary Nurse 2’s heart rate peaked at 95 around Bed 4.
Ward Map	Comprehension	Q:Which of the following statements is correct? – Primary Nurse 1’s spatial and verbal activities were more around Bed 1 than Primary Nurse 2’s. Primary Nurse 1 had a higher increase in heart rate than Primary Nurse 2. – Primary Nurse 1’s spatial and verbal activities were more around Bed 2 than Primary Nurse 2’s. Primary Nurse 1 had a higher increase in heart rate than Primary Nurse 2. – Primary Nurse 2’s spatial and verbal activities were more around Bed 1 than Primary Nurse 1’s. Primary Nurse 2 had a higher increase in heart rate than Primary Nurse 1. – Primary Nurse 2’s spatial and verbal activities were more around Bed 2 than Primary Nurse 1’s. Primary Nurse 2 had a higher increase in heart rate than Primary Nurse 1.