2024 Research Symposium

Each year, LAS undertakes a research program involving partners from a variety of academic, industry, and government communities. The outcomes of these research projects are of interest to our intelligence community stakeholders, as well as the respective communities of our academic and industry partners.

Projects

We invite you to learn about this year’s unclassified research projects. Projects are grouped into three themes: Content Triage, Human-Machine Teaming, and Operationalizing AI/ML.

Content Triage

Content triage projects demonstrate novel ways to address mission challenges around the ever-present need to process and exploit large data volumes, such as video, images, audio, and text.

Gedas Bertasius, Md Mohaiminul Islam (Department of Computer Science, University of North Carolina at Chapel Hill); Stephen W., Lori Wachter (LAS)

Most video captioning models are designed to process short video clips of a few seconds and output text describing low-level visual concepts (e.g., objects, scenes, atomic actions). However, most real-world videos last for minutes or hours and have a complex hierarchical structure spanning different temporal granularities. We propose Video ReCap, a recursive video captioning model that can process video inputs of dramatically different lengths (from 1 second to 2 hours) and output video captions at multiple hierarchy levels. The recursive video-language architecture exploits the synergy between different video hierarchies and can process hour-long videos efficiently. We utilize a curriculum learning training scheme to learn the hierarchical structure of videos, starting from clip-level captions describing atomic actions, then focusing on segment-level descriptions, and concluding with generating summaries for hour-long videos. Furthermore, we introduce the Ego4D-HCap dataset by augmenting Ego4D with 8,267 manually collected long-range video summaries. Our recursive model can flexibly generate captions at different hierarchy levels while also being useful for other complex video understanding tasks, such as VideoQA on EgoSchema.
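The recursive idea above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: a trivial word-truncating `summarize` function stands in for the learned video-language model, and captions at one hierarchy level become the inputs for the next, coarser level.

```python
def summarize(texts, max_words=8):
    """Hypothetical stand-in for the learned captioning model: condense a
    list of captions into one shorter caption."""
    words = " ".join(texts).split()
    return " ".join(words[:max_words])

def recursive_captions(clip_captions, group_size=3):
    """Recursively merge captions: clip level -> segment level -> video level."""
    levels = [clip_captions]
    current = clip_captions
    while len(current) > 1:
        current = [
            summarize(current[i:i + group_size])
            for i in range(0, len(current), group_size)
        ]
        levels.append(current)
    return levels  # levels[0] = clip captions, levels[-1] = whole-video summary

clips = ["person opens door", "person enters room", "person sits down",
         "person reads book", "person closes book", "person leaves room"]
hierarchy = recursive_captions(clips)
```

Each pass over the hierarchy reuses the same model, which is what lets the approach scale from seconds to hours of video.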

> Watch Video

Paul N., Nhat N., Jacob C. (LAS)

SANDGLOW investigates the application of multi-modal large language models (LLMs) in summarizing videos up to an hour long, leveraging both textual and visual information to effectively condense complex video content into concise summaries. Our approach enables users to quickly grasp the main ideas and key concepts presented in a video, thereby enhancing their understanding and comprehension. Furthermore, we introduce a novel technique that utilizes image summaries to create time-bounded boxes for search terms, allowing users to pinpoint specific moments within the video related to their queries. This innovation significantly improves the search functionality of our system, enabling users to quickly locate relevant content within a vast library of videos. To further enhance this technology, future research aims to incorporate Object Detection techniques, which will allow for more accurate identification and retrieval of videos of interest. This integration is expected to significantly improve the efficiency and effectiveness of video analysis.
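The time-bounded-box idea can be illustrated with a small sketch (invented data, not the SANDGLOW pipeline): each sampled interval of the video gets an image summary, and a search term maps to the merged time ranges whose summaries mention it.

```python
def time_boxes(interval_summaries, term, interval_sec=60):
    """interval_summaries: one text summary per fixed-length interval.
    Returns merged (start_sec, end_sec) windows whose summary contains term."""
    boxes = []
    for i, summary in enumerate(interval_summaries):
        if term.lower() in summary.lower():
            start, end = i * interval_sec, (i + 1) * interval_sec
            if boxes and boxes[-1][1] == start:      # merge adjacent hits
                boxes[-1] = (boxes[-1][0], end)
            else:
                boxes.append((start, end))
    return boxes

summaries = ["empty runway", "truck arrives", "truck unloading",
             "workers on foot", "truck departs"]
print(time_boxes(summaries, "truck"))  # [(60, 180), (240, 300)]
```

A user searching for "truck" would then be pointed directly at those two windows rather than scrubbing through the full recording.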

> Blog Post: SANDGLOW

Stephen W., Brent Younce, Ethan DeBandi, Hayden White, Aadil Tajani (LAS)

The LAS EYERECKON project is developing an advanced unclassified video content processing pipeline to showcase the full capabilities of object-centric triage. Using unclassified video content from customers, LAS has created a demo that highlights how video content can be searched and prioritized within an analyst workflow. Key research areas include ML-assisted labeling, object detection, processing and storage of ML output, user search capabilities, and summarization of video data, such as geolocation-focused summaries. Unique triage capabilities enabled by state-of-the-art object detection algorithms allow for rapid identification and prioritization of critical objects, significantly enhancing the efficiency and accuracy of intelligence analysis. This project underscores the potential for innovation in video content analysis.

Sam Saltwick, Will Gleave, Kyle Rose, Julie Hong (Amazon Web Services (AWS)); Brent Younce, Ethan DeBandi, Stephen W., Lori Wachter (LAS)

Developing computer vision models for tasks like object detection and activity recognition typically requires a significant amount of human-curated labeled data. This issue continues after model development and into production, as additional human labeling is required to keep models up to date with drifting data and changing modeling objectives. Advances in thoroughly pretrained foundational models may reduce this data labeling burden through few- or zero-shot learning. These foundational models may be hundreds of millions or billions of parameters and are trained on broad data, making them ideal for transferring to various tasks, with the downside of significant training and inference costs.

In this work, we reduce the data labeling burden and the increased inference cost of foundational models through automatic knowledge distillation. This process uses a foundational model as a teacher, fine-tuned on a limited amount of domain-specific data, to build a large weakly-supervised dataset for training student models. The approach spends additional compute on data curation and model training in exchange for a more efficient model at inference time. We find that this method improves object detection performance by 33% over training on a small, fully hand-labeled dataset, while operating 20x faster than the foundational teacher model. We also seek to identify model performance gaps caused by data drift through image-level metrics. In response to data drift, we adapt models to new data domains and modeling objectives with weakly-supervised labels. This allows for continuous retraining of student models with relevant data and with near-zero human labeling.
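The weak-supervision step can be sketched as follows. This is a hedged illustration, not the project's code: the `teacher` here is a stand-in with fixed outputs rather than a fine-tuned foundation model, and detections above a confidence threshold become pseudo-labels for training the smaller student.

```python
def build_pseudo_labeled_set(unlabeled_images, teacher, conf_threshold=0.6):
    """Keep teacher detections above threshold as weak labels for the student."""
    dataset = []
    for image in unlabeled_images:
        detections = teacher(image)  # list of (label, confidence) pairs
        weak_labels = [(lbl, c) for lbl, c in detections if c >= conf_threshold]
        if weak_labels:
            dataset.append((image, weak_labels))
    return dataset

# Toy stand-in teacher with fixed outputs (invented for illustration):
fake_outputs = {
    "img1": [("car", 0.9), ("person", 0.4)],
    "img2": [("dog", 0.2)],
    "img3": [("truck", 0.8), ("car", 0.7)],
}
dataset = build_pseudo_labeled_set(fake_outputs, lambda im: fake_outputs[im])
```

Low-confidence detections are simply dropped, which trades some recall for a cleaner weakly-supervised training set.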

> Blog Post: EYEGLASS

Sambit Bhattacharya, Catherine Spooner, Jesse Claiborne, Ashley Sutherland, Landon Wilkerson, Anita Amofah, Miriam Delgado, Keturah Johnson (Department of Math and Computer Science, Fayetteville State University); John Slankas, James S., Felicia ML. (LAS)

This project focused on developing an advanced artificial intelligence (AI) and machine learning (ML) model capable of detecting rare and physically small items of interest in image data. Detecting such objects presents a significant challenge due to their limited visual attributes and the scarcity of high-quality datasets required for training AI/ML models. This project has addressed these challenges, creating knowledge that be used for innovative applications in security and surveillance. One of the primary achievements of this project is the creation of synthetic datasets that effectively simulate the visual characteristics of small objects. By generating high-quality synthetic data, we have overcome the limitations posed by the lack of real-world training data. This approach has enabled us to train our AI/ML models more effectively, ensuring high accuracy in detecting small items. Additionally, we have integrated contextual information into our detection algorithms, allowing the AI to understand the environment in which these small objects are located. This contextual awareness significantly enhances the model’s ability to identify and track items that might otherwise be overlooked. Furthermore, our project has employed domain adaptation techniques to ensure that the AI/ML models perform well across different environments and conditions. This adaptability is crucial for real-world applications, where lighting, background, and object appearance can vary widely. The practical applications of our work in the realm of security and surveillance, are identification and tracking of physically small and rarely seen objects of interest in camera footage, such as weapons or suspicious small items in a crowd. The outcomes of this project include research articles which have been submitted or are in progress, comprehensive datasets, and prototype AI software that effectively solve critical problems related to the detection of rare and physically small items.

> Video: Small Object Detection Model Comparisons on CCTVs

> Read more: Advancing Research and Engagement with Minority Serving Institutions at the Laboratory for Analytic Sciences

Carlos Busso (Department of Electrical & Computer Engineering, The University of Texas at Dallas); Liz Richerson, Donita R. (LAS)

Emotion plays a key role in human-human interactions. Stressful scenarios are often characterized by emotional exchanges elicited as a response to fear, anxiety, and threats. It is imperative for the Intelligence Community (IC) to identify effective content triage strategies to direct time and resources toward evaluating only relevant videos. Tracking emotional traits during human interactions can be instrumental in determining which videos to triage. These algorithms should be able to identify localized segments with strong emotional content. We are particularly interested in speech, face, and text. The multimodal solutions should be resilient to missing information across the modalities (occluded faces, silent subjects, blurred frames). The models should also be flexible, providing lightweight alternatives that can run without powerful computers.

This project will build robust, efficient, and deployable multimodal solutions that continuously identify emotional behaviors during recordings. We aim to

  1. continuously infer emotional behaviors during long audiovisual recordings with robust machine learning models that are resilient to missing information across modalities,
  2. summarize the emotional content in a recording by highlighting emotional hotspots where the externalization of emotional behaviors is conveyed, and
  3. explore lightweight implementations of the proposed models that can run in real-time on edge devices.
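The robustness goal in aim 1 can be illustrated with a minimal late-fusion sketch (illustrative only; the project's models are learned, not a simple average): per-modality emotion scores are averaged over whichever modalities are actually present, so an occluded face or silent segment simply drops out.

```python
def fuse_emotion_scores(speech=None, face=None, text=None):
    """Average emotion scores across available modalities; return None
    if every modality is missing."""
    available = [s for s in (speech, face, text) if s is not None]
    if not available:
        return None
    return sum(available) / len(available)

print(fuse_emotion_scores(speech=0.8, face=None, text=0.2))  # 0.5
```

A learned fusion model would weight modalities by reliability rather than uniformly, but the missing-data handling follows the same pattern.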

> Video: Language-specific adaptation for cross-lingual speech emotion recognition

Robert Pless (Department of Computer Science, School of Engineering & Applied Science, George Washington University); Abby Stylianou (Department of Computer Science, School of Engineering and Science, St. Louis University); Nathan Jacobs (McKelvey School of Engineering, Washington University in St. Louis); Eric Xing, Pranavi Koluju, Manal Sultan

Investigative image search systems support analysts in finding relevant results in large databases of images. This project focused on improving machine learning models for text-guided image retrieval tasks, where an analyst uses natural language to express what they want to find in images in the database. The standard model used for this is the Contrastive Language-Image Pretraining (CLIP) model, which is trained to have aligned image and text representations, so that, for example, the text “a picture of a hotel room” and an actual image of a hotel room have similar features. Pre-trained CLIP models, however, struggle with complex, fine-grained queries that may occur in investigative tasks. To better understand the types of queries that existing CLIP models can support, we have developed an approach to predicting the zero-shot accuracy of a given CLIP model and have deployed this prediction capability in a publicly available tool called “Will It Zero Shot?”
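The aligned embedding space described above underlies retrieval. The sketch below uses invented three-dimensional vectors, not real CLIP features: images are ranked by cosine similarity between the query's text embedding and each image embedding.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_embedding, image_embeddings):
    """Return image ids sorted by similarity to the text query, best first."""
    return sorted(image_embeddings,
                  key=lambda img_id: cosine(query_embedding, image_embeddings[img_id]),
                  reverse=True)

images = {"hotel_room": [0.9, 0.1, 0.0],
          "beach": [0.0, 0.2, 0.9],
          "lobby": [0.6, 0.5, 0.1]}
query = [1.0, 0.2, 0.0]  # pretend embedding of "a picture of a hotel room"
print(retrieve(query, images)[0])  # hotel_room
```

The project's contribution lies in making these embeddings handle linguistically complex queries, where off-the-shelf CLIP features break down.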

We additionally have developed a novel training loss for fine-tuning CLIP to support more linguistically complex queries by including sentence structure information in the training process. This model achieves state-of-the-art performance on composed image retrieval baselines and is far superior as the complexity of an analyst’s query increases. To support the training and evaluation of these models on complex text queries, we developed an LLM-based approach to generate a large number of complex captions describing the salient differences between visually similar images.

Finally, we have deployed new text-guided image search capabilities into TraffickCam, a production investigative image search system deployed at the National Center for Missing and Exploited Children for investigations of human trafficking and child sexual abuse.

Agata Bogacki, Kirk Swilley, Logan Perry, Kristina Chong, Patrick Dougherty (SAS); Liz Richerson, Nicole F., Jacque J., Pauline M., Donita R., Edward S. (LAS)

The SAS team developed workflows enabling non-technical analysts to triage, sort, summarize, and curate specific technical details from a large corpus of unstructured, open-source technical information, such as patent applications, technical manuals, and research papers.

This project explored the use of various machine learning and visualization techniques to develop a framework that allows non-technical analysts to ingest, analyze, and visualize new sources of unstructured technical data in a semi-automated manner. The project focused on:

  • Determining what automated tasks the machine can perform for extraction and analysis of text and non-text information;
  • Understanding how metadata analysis, text mining, natural language processing techniques, and identification of non-textual elements can enable information discovery; and
  • Capturing best practices or lessons learned for supporting analysts through interactive dashboards and visualizations to consume and triage the information.

> Blog Post: Automated Workflows to Triage Technical Intelligence

Edward S., Nicole F., John Nolan, Liz Richerson, Pauline M., Jacque J., Donita R. (LAS)

Triage of technical intelligence information presents significant challenges due to the sheer volume, complexity, and diverse formats of source materials, such as technical reports, research papers, and diagrams. Analysts tasked with identifying relevant insights must navigate domain-specific lexicons, foreign-language content, and intricate technical details, all of which can slow the discovery of actionable intelligence. The need for efficient tools to aid in the extraction of critical information from these documents is paramount.

To address these challenges, we are developing a tool that leverages Large Vision Language Models (LVLMs) to extract relevant information directly from technical documents, with a specific focus on the relationship between text and associated images. The tool downloads documents—initially from Arxiv—extracts images, and applies LVLMs to identify and highlight relevant sections of text based on the content of the images. Analysts can review this extracted content, visually compare the text to images, annotate findings, and provide feedback, thereby enhancing the validation process. This tool enables a deeper, more nuanced interpretation of technical documents, helping analysts quickly focus on critical technical content.

By leveraging human feedback with AI-driven information extraction, we aim to answer key research questions regarding the feasibility of leveraging LVLMs in triaging technical intelligence, and the accuracy of LVLMs in identifying image-related text information on domain-specific graphics or diagrams. Additionally, this tool will enable analysts to explore potential LVLM-assisted tasks to incorporate into a technical intelligence triage workflow, and in the future help understand the impact of image-based information extraction on reducing the time required for technical content triage. Ultimately, the project aims to understand possible mechanisms to streamline the triage process, reduce analyst workload, and improve the efficiency and accuracy of technical intelligence analysis through AI-assisted workflows.

> Blog Post: Enhancing Technical Document Triage: Large Vision Language Models for Image-Text Information Extraction

Sean L., Aaron W., Beck Durling, Furkan Karabulut, John Nolan, Abe Tasissa, Leah Whaley (LAS)

An NC State Computer Science Senior Design team has partnered with the Laboratory for Analytic Sciences to build a custom scheduling system for cloud compute analytics. The scheduler offers system owners the capability to define their desired system load (i.e., the amount of total cloud resources utilized over time) and to generate an analytic schedule that should produce an actual load similar to the desired one. Efficiency gains and system stability are the expected outcomes of this scheduling capability. A basic UI was developed, and initial results on a test cluster indicate that the scheduler works as intended.
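The load-matching idea can be sketched with a greatly simplified greedy placement (invented representation, not the actual scheduler): each analytic job has a resource cost, and jobs are placed into the time slot with the most remaining headroom under the desired load curve.

```python
def schedule(jobs, desired_load):
    """jobs: {name: resource cost}; desired_load: per-slot capacity targets.
    Returns (assignment of job -> slot, actual per-slot load)."""
    remaining = list(desired_load)
    assignment = {}
    # place the most expensive jobs first
    for name, cost in sorted(jobs.items(), key=lambda kv: -kv[1]):
        slot = max(range(len(remaining)), key=lambda s: remaining[s])
        assignment[name] = slot
        remaining[slot] -= cost
    actual = [desired_load[s] - remaining[s] for s in range(len(desired_load))]
    return assignment, actual

jobs = {"analytic_a": 4, "analytic_b": 3, "analytic_c": 2, "analytic_d": 1}
assignment, actual = schedule(jobs, desired_load=[5, 5])
```

Here the greedy heuristic happens to match the desired curve exactly; a production scheduler would also handle job durations, dependencies, and deadlines.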

> Blog Post: LOADWOLF

James S., Felicia ML., Donita R., Skip S. (LAS); Prashanth BusiReddyGari, Hozaifa Owaisi, Shirsendu Mondal, Salma Mifdal, Reginald Oxendine, Mohammad Kawsar, Brian Torres, India Godwin, Jacob Dane Mattison (University of North Carolina at Pembroke)

The Laboratory for Analytic Sciences has partnered with senior design students from the University of North Carolina at Pembroke in 2024 to develop a system that uses AI/ML to visualize the integrity of the Internet. U.S. policymakers aim to enhance the security and resilience of both U.S. and global networks by promoting the widespread adoption of the Resource Public Key Infrastructure (RPKI) framework. To support this effort, there is a need for a system that can effectively capture and display the security status of the U.S. Internet. RPKI and Route Origin Authority (ROA) are key to securing global internet routing by allowing network operators to detect and prevent fraudulent or erroneous Border Gateway Protocol (BGP) route announcements, which can disrupt traffic flow and make networks unreachable. The success of these services relies on their adoption by all network operators.
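The Route Origin Authorization check that RPKI enables can be sketched as follows. This is a simplified illustration (real validators follow RFC 6811): an announcement is "valid" if some ROA covers its prefix with a matching origin ASN and a prefix length no longer than the ROA's maxLength.

```python
import ipaddress

def rov_state(prefix, origin_asn, roas):
    """roas: list of (roa_prefix, max_length, asn) tuples.
    Returns an RFC 6811-style state: 'valid', 'invalid', or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.subnet_of(roa_net):
            covered = True
            if asn == origin_asn and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"

roas = [("192.0.2.0/24", 24, 64500)]
print(rov_state("192.0.2.0/24", 64500, roas))   # valid
print(rov_state("192.0.2.0/24", 64501, roas))   # invalid: wrong origin ASN
```

A system visualizing Internet routing integrity would aggregate states like these across the announcements observed for U.S. networks.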

> Blog Post: Internet Routing Integrity

> Learn more: Advancing Research and Engagement with Minority Serving Institutions at the Laboratory for Analytic Sciences

Blake Hartley, Mike Geide (PUNCH Cyber); Lori Wachter, Stephen W., Pauline M., Al J., Bo L. (LAS)

Recommender systems have seen a growing application in the field of cyber security. Over the past three years, PUNCH Cyber has been working to apply recommender system techniques to help analysts more efficiently study and retain SIGINT data. In 2024, PUNCH introduced CARD (Context Aware Recommendation of Data), a front-end interface for viewing recommended data that integrates user control and multiple recommendation fidelities. CARD presents individual recommendations to users and collects their explicit feedback to update a recommender pipeline. This front end will serve as the interface for a high-side recommender system that generates recommendations based on analyst interaction with cyber data.
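Folding explicit feedback back into a recommender can be sketched minimally (an invented scoring scheme, not the CARD pipeline): accepted recommendations boost the score of their source, and rejected ones lower it.

```python
def update_scores(scores, feedback, step=0.1):
    """scores: {source: float preference}; feedback: list of
    (source, accepted_bool) pairs from explicit analyst responses."""
    updated = dict(scores)
    for source, accepted in feedback:
        updated[source] = updated.get(source, 0.5) + (step if accepted else -step)
    return updated

scores = {"feed_a": 0.5, "feed_b": 0.5}
new = update_scores(scores, [("feed_a", True), ("feed_b", False), ("feed_a", True)])
```

Real recommender pipelines use far richer update rules, but the loop is the same: present, collect feedback, re-rank.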

Human-Machine Teaming

Human-machine teaming projects demonstrate ways to address mission challenges around the need to enhance the effectiveness of human analysts partnering with automated technology.

Robert Capra, Jaime Arguello, Bogeum Choi, Jiaming Qu (School of Information and Library Science, University of North Carolina at Chapel Hill); Liz Richerson, Patti K., Sue Mi K., Christine Shahan Brugh, James S. (LAS)

Intelligence analysis of audio data involves a triage stage in which data is transcribed, interpreted, and annotated by analysts making operator comments (OCs). OCs are meant to help other analysts make sense of the conversation. In our 2023 work, we developed a taxonomy of OC types based on a study with 30 intelligence analysts and investigated challenges that analysts face when making/viewing OCs.

Our 2024 work focused on three main threads. First, we developed and refined a prototype system for making structured OCs based on the taxonomy of OC types. The prototype enables analysts to specify the type of OC being made and to complete information fields that are relevant to the OC type. We developed the prototype using an iterative design and development process that involved feedback from analysts who tested different versions.
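A structured OC under a taxonomy-based approach might look like the sketch below. The OC types and fields here are invented placeholders for illustration, not the project's actual taxonomy.

```python
from dataclasses import dataclass, field

@dataclass
class OperatorComment:
    oc_type: str                     # e.g., "translation-note", "context"
    transcript_span: tuple           # (start_char, end_char) in the transcript
    fields: dict = field(default_factory=dict)  # type-specific info fields

oc = OperatorComment(
    oc_type="context",
    transcript_span=(120, 164),
    fields={"referent": "the meeting mentioned earlier",
            "confidence": "probable"},
)
```

The benefit of typed records over free text is that downstream analysts (and tools) can filter, validate, and aggregate OCs by type.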

Second, we conducted a study with intelligence analysts to compare our prototype OC tool against a baseline text-editor system. During the study, participants were asked to:

  1. make hypothetical OCs using transcripts from the LBJ White House Tapes using both the baseline and our prototype tool
  2. complete surveys after using each system
  3. complete a survey comparing both systems
  4. participate in a brief interview about their experiences using each system.

The results of this study will help inform the design of the taxonomy of OC types, the prototype tool, and training materials.

Finally, we conducted a series of tests to evaluate the ability of large-language models to assist analysts with making OCs. We found promising results using LLMs to

  1. identify locations in transcripts where OCs were needed,
  2. identify types of OCs to make, and
  3. propose information relevant to the OC.

This component of the project is designed to explore human-machine teaming approaches for making and verifying OCs.

> Blog Post: Structured Annotation of Audio Data

Matthew Peterson, Helen Armstrong, Rebecca Planchart, Kweku Baidoo, Ashley Anderson, Kayla Rondinelli (College of Design, North Carolina State University); Christine Shahan Brugh, Brent Younce, Patti K., Sue Mi K., Mike G., Tim S. (LAS)

As information is increasingly presented to intelligence analysts through large language model (LLM)-generated summaries, it is increasingly necessary to help those analysts gauge confidence and uncertainty in granular LLM outputs, and to calibrate their trust within human-machine teams. This design project envisions a Multiple Agent Validation System (MAVS), consisting of an analytic agent that produces summaries from copious intelligence traffic, and an evaluative agent that estimates types and levels of uncertainty in those summaries. There are presently no visual conventions for signaling confidence and uncertainty, especially in text, and thus new visualization strategies are necessary to design new intelligence analysis summary interfaces. To that end, eight viable uncertainty visualization strategies have been developed. These are implemented in a MAVS simulation interface for demonstration purposes.

This interface is effectively a proposal for a feature set for explainable artificial intelligence that should educate users on AI’s nature and improve their results when collaborating with AI. Additional speculative interface animations have been created that reveal how generative user interface implementation could enhance MAVS and related systems in intelligence workflows, extending project concerns further into the technological future. Project outcomes also include a basic intelligence-relevant framework for uncertainty in LLM outputs, differentiating meaning, reference, conjecture, credibility, and evidence uncertainty.

Helen Armstrong, Kweku Baidoo, Soumya Batra, Alexis Boone, Parinita Das, Aimsley McDaniel, Sophia Milligan, Olha Novikova, Vaishnavi Parni, Aashka Patel, Rebecca Planchart, Leah Tatu, Gabrielle Thorpe, Leighann Vinesett (College of Design, North Carolina State University); Lori Wachter, Jacque J., Pauline M., Tim S., Al J. (LAS)

Target Digital Network Analysts (TDNAs) are expected to understand and utilize the parameters of the global communication landscape in order to help satisfy intelligence requirements. To perform effectively, TDNAs must learn how to apply that knowledge across a varied and numerous set of tools and analytics that have few unclassified proxies (which also means TDNAs face the additional challenge of learning sophisticated tradecraft and techniques). There is therefore a need for a system that aids analysts in the cognitively intensive, non-linear flow of learning and applying foundational knowledge to tools and tasks. Thus, the focus of the 2024 Spring Design Studio was to visualize the movement of data through the network and enable analysts to understand how the data they are seeing fits into their own workflows; that is, students visualized ways to integrate and embed foundational knowledge into knowledge workflows. In doing so, the design studio challenged students to apply AI solutions that fit both the vast expanse of the knowledge domain and the granularity of practical data tasks.

> Learn more: Students Design User Interfaces for Intelligence Analysts

Cara Widmer, Amy Summerville, Louis Marti (Kairos Research); Christine Shahan Brugh, Sue Mi K., Jacque J. (LAS)

Artificial agents have the potential to significantly enhance the efficiency of human analysts by processing large volumes of source material and making targeted recommendations about where the analyst can most likely find relevant information worthy of deeper exploration. However, this potential can only be realized if human users trust AI recommendations – analysts are unlikely to rely on recommendations made by an AI agent that they do not understand or trust. The current study explores the role of alignment between an AI recommendation system and human preferences for evaluating information relevance in fostering trust and improving performance in intelligence analysis tasks. We designed a simplified analysis task in which participants searched through synthetic emails from a fictional company to answer questions about events described in these emails.

Participants were presented with different subsets of emails from this corpus, based on distinct AI selection systems. We compared how the attributes used by each selection system (e.g. sender, subject keywords, date) aligned with participants’ own preferred selection criteria to assess the impact on trust and task performance, as well as ways to mitigate the effects of misalignment between AI and human preferences. Through this work, we aim to inform guidelines for development of AI recommendation systems that enable analysts to build calibrated trust and ultimately improve efficiency and accuracy in high-stakes decision-making environments like intelligence analysis.
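One simple way to quantify the attribute alignment discussed above (an illustrative measure, not necessarily the study's own) is the Jaccard overlap between the attributes an AI selection system uses and those a participant prefers.

```python
def alignment(ai_attributes, human_attributes):
    """Jaccard similarity between two attribute sets, in [0, 1]."""
    ai, human = set(ai_attributes), set(human_attributes)
    if not (ai | human):
        return 1.0  # two empty criteria sets are trivially aligned
    return len(ai & human) / len(ai | human)

score = alignment({"sender", "date"}, {"sender", "subject_keywords"})  # 1/3
```

A score of 1.0 means the AI selects on exactly the criteria the analyst would use; lower scores flag the misalignment conditions the study manipulates.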

> Video: Impact of Alignment on Trust in Recommendation Agents

Bill Rand, Xiaoxia Champon (Department of Business Management, Poole College of Management, North Carolina State University), Ana-Maria Staicu, Krishna Marripati (Department of Statistics, North Carolina State University); Christine Shahan Brugh, Tim S. (LAS)

A systematic understanding of user engagement with technology in analyst task-specific contexts is critical for enhancing the efficiency of human-machine collaboration. Our project seeks to explore different styles of interaction in human-machine contexts by analyzing time-dependent activity logs. Our first goal is to identify effective strategies for the use of existing tools based on individual experiences. Ultimately, we aim to optimize technology use and provide balanced support for analytic workflows. This work is driven by the Analyst Workflow Dataset, created by researchers at the University of Kentucky and North Carolina State University with funding from the Laboratory for Analytic Sciences. The dataset was generated by having a group of analysts work through a fictional analysis setting called “The Insider Threat Game.” The dataset’s workflow graphs capture user activity; previous attempts to predict user actions in this context may overlook human variability, and predictions of subsequent actions from neural networks often lack transparency.

Our analysis focuses on two key approaches:

  • Interpretable Categorical Functional Analysis: We analyze temporal user activities to cluster behavior patterns and quantify time-dependent action types in terms of probability at each step. This provides an explanatory framework that can demonstrate individual decision processes, offering a potential path to assist in developing effective, personalized tools, as well as shedding light on potential training directions and strategic resource allocations based on analysts’ characteristics.
  • Monte Carlo Simulation: We employ model-free value estimation to identify optimal strategies for intelligence analysts by uncovering the most valuable state-action pairs, improving decision-making based on observed outcomes.
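The Monte Carlo approach can be sketched as first-visit value estimation over logged analyst sessions. The episodes, states, and actions below are invented for illustration: each state-action pair's value is estimated as the average return observed after first visiting it.

```python
from collections import defaultdict

def mc_state_action_values(episodes, gamma=1.0):
    """episodes: list of [(state, action, reward), ...] trajectories.
    First-visit Monte Carlo estimation of state-action values."""
    returns = defaultdict(list)
    for episode in episodes:
        # compute the return G_t at every step, working backwards
        g_at = [0.0] * len(episode)
        g = 0.0
        for t in reversed(range(len(episode))):
            g = episode[t][2] + gamma * g
            g_at[t] = g
        seen = set()
        for t, (s, a, _) in enumerate(episode):
            if (s, a) not in seen:            # first visit only
                seen.add((s, a))
                returns[(s, a)].append(g_at[t])
    return {sa: sum(v) / len(v) for sa, v in returns.items()}

episodes = [
    [("search", "filter", 0.0), ("read", "flag", 1.0)],
    [("search", "filter", 0.0), ("read", "skip", 0.0)],
]
values = mc_state_action_values(episodes)
```

Being model-free, this needs no model of how analyst actions change the task state, only logged trajectories with outcomes.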

In future work, we plan to generalize these methods to a broader range of analyst tasks, enhancing the applicability and adaptability of our tools.

> Blog Post: Personalize User Workflow and Improve Technological Engagement

Christine Shahan Brugh, Eve Vazquez, James Peters, Taylor Stone (LAS)

Cognitive load, understood as the demand a task places on a person’s working memory, is an increasingly important determinant of the performance of human-machine systems. Grounded in the framework of inherent limits on human memory capacity, high cognitive load has been shown to be associated with errors in task performance. For example, an analyst overloaded with information or task complexity might rely on less sophisticated ways of understanding the problem, such as cognitive biases or shortcuts.

The primary goal of this effort is to identify metrics for cognitive load and assess their validation, with a secondary goal of exploring how they might be implemented within an operational setting. To achieve our primary goal, we undertook a systematic review of the literature in which we cataloged relevant metrics for cognitive load, alongside characteristics of each metric such as type, user burden, and validation. In total, we reviewed 125 articles related to metrics of cognitive load, published in academic journals between 1998 and 2024. From these articles, we derived a set of 129 unique metrics for cognitive load, drawn from 90 unique articles. For our secondary goal, we gathered details of each experimental task in which cognitive load metrics were applied and assessed. We examined the details of these tasks for potential relevance to analytic work in operational settings.

> Blog Post: Survey of Metrics for Cognitive Load in Intelligence Community Settings

Richard Lamb (Department of Physiology and Pharmacology, University of Georgia); Lori Wachter, Stephen S., Bo L., Christine Shahan Brugh, Kenneth Thompson (LAS)

Historically, the enhancement of analyst skills within the intelligence community has primarily concentrated on the establishment and improvement of new and existing data collection systems, with little consideration of underlying cognition. The purpose of this study is to examine the roles of cognitive control and cognitive demand as a potential mechanism to tune automated information summaries for analysis and cognitive augmentation purposes. Technology tools, design choices, and the arrangement of information in an effort to reduce cognitive demand during information processing are at the core of human cognitive augmentation. This study used a mixed block-event design, presenting counter-balanced stimuli to 35 participants. Results illustrate that the selected models have adequate fit to the data. The relationship between cognitive demand and cognitive control within the context of text summary analysis is intricately linked to the hemodynamic responses observed in analysts and provides evidence of a leveraging point for adaptive technologies.

David Gotz, Wenyuan Wang (School of Information and Library Science, University of North Carolina at Chapel Hill); R. Jordan Crouser (Department of Computer Science, Smith College); Christine Shahan Brugh, Sue Mi K., Stephen S., Bo L. (LAS)

Knowledge workers face a common dilemma: weighing the desire for more information against constraints of time and other limited resources. Striking a balance between comprehensive analysis and timely decision-making is crucial in nearly every domain: the IC, industry, academia, journalism, and everyday life. Analysts must determine when they have “seen enough” to establish a fact, identify a pattern, or make a decision. While knowing when to dig deeper versus when to stop is a critical need, we know relatively little about the underlying factors that influence information satiety.

In this project, we conducted an online user study to measure the influence of individual differences on information satiety within a controlled information task with the aim of generalizing our findings toward more complex information scenarios. In the online study, participants were initially presented with a very small number of points on a scatterplot. Users were allowed to request more points as needed in order to draw an estimate of the best-fit trendline for the data. Participants were financially incentivized to achieve two contradictory goals:

  • view fewer data points, and
  • draw more accurate trendlines.

Data with different levels of correlation were presented to control task difficulty. The results from the study confirm the expectation that easier tasks were completed more accurately and with less data. The results also showed, as expected, that visualization literacy is a key indicator of accuracy: training or experience yields better task performance. Critically, however, visualization literacy was not predictive of when participants would decide to stop obtaining more data. The point of information satiety was instead significantly associated with two personality traits: agreeableness and openness. Participants with high ratings in these traits (based on a BFI-10 personality profile) tended to look at more data before deciding they had seen enough to complete a task.

> Blog Post: “Enough is Enough”: Exploring Information Satiety in Knowledge Workers

Liz Richerson, Sue Mi K., Stephen S., Tim S. (LAS)

Large amounts of information encoded in knowledge graphs (KGs), large language models (LLMs), or a combination of the two are poised to improve the text content triage and discovery process. Specifically, summarization and synthesis of text present opportunities for visual text analysis tools to incorporate novel capabilities afforded by KGs (e.g., recommendations, structured representations) and LLMs (e.g., disambiguation of language, handling of unstructured and unfamiliar text, summarization). This work prototyped a visual text analysis tool (VisPile) that integrates KGs and LLMs at specific points in the document analysis and triage process. For example, users can ask an LLM to extract entities, find connections, or summarize collections of text; similarly, the KG can be used to discover relationships. We designed VisPile and studied the impact these features have on analysis. Our findings yield design guidelines for how LLMs and KGs can be integrated into future visual analytic tools. Additionally, our user studies indicate the effectiveness of specific operations during analysis, as well as potential challenges when incorporating LLMs and KGs.

> Video: Incorporating Knowledge Graphs and Large Language Models into Visual Text Analysis Tools

Srijan Sengupta, Karl Pazdernik, Matthew Singer (Department of Statistics, North Carolina State University); Liz Richerson, Stephen S. (LAS)

A knowledge graph (KG) is a graph-based representation of information about various entities (such as individuals, organizations, locations, or concepts) and their relationships. The transformation of unstructured language data into a structured KG facilitates downstream tasks like information retrieval, association mining, question answering, and machine reasoning. A celebrated example is the Google Knowledge Graph of 800 billion facts on 8 billion entities, from which Google serves relevant information in an infobox beside its search results.

KG construction involves three steps:

  1. Named Entity Recognition (NER), i.e., identifying the mention of named entities
  2. Named Entity Disambiguation (NED), i.e., determining the identity of a named entity from context (e.g., “Jordan” could mean the country, the basketball player, or someone else)
  3. Relationship Extraction (RE), i.e., determining the relationships among these entities.

These steps typically employ black-box machine learning tools that do not produce statistically interpretable uncertainty quantification metrics, making it impossible to objectively assess how reliable the KG is. The goal of this project is to develop statistically principled uncertainty quantification techniques for KGs by modeling how uncertainty propagates through the three steps. This will be accomplished by leveraging the proven statistical technique of conformal prediction.
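
The core of conformal prediction is compact. As a hedged illustration only (not the project's actual implementation), a split-conformal sketch in Python: calibrate a threshold on held-out nonconformity scores, then emit a prediction set, e.g. the candidate identities for an ambiguous mention like “Jordan”, that covers the true label with probability at least 1 − alpha. All function names and scores below are invented for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: given nonconformity scores for the true
    labels on a held-out calibration set, compute the finite-sample-corrected
    (1 - alpha) quantile to use as a threshold."""
    n = len(cal_scores)
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)  # 1-indexed rank
    return sorted(cal_scores)[rank - 1]

def prediction_set(candidate_scores, qhat):
    """Every candidate whose score is within the threshold; the true label
    is included with probability >= 1 - alpha."""
    return {label for label, s in candidate_scores.items() if s <= qhat}
```

Applied to NED, a model's uncertainty about “Jordan” becomes a set of candidate identities rather than a single guess; large sets flag low-confidence regions of the resulting KG.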

> Blog Post: Conformal prediction for knowledge graphs

Lillian Thistlethwaite, Catherine Chapman (Parenthetic); Jascha Swisher, Stephen S., Jacqueline C., Victor C. (LAS)

Identifying events and the key attributes that characterize them is a useful capability in intelligence analysis, as it supports an analyst’s ability to arrive at higher-level patterns more rapidly. Many event extraction models use closed-domain event ontologies that often do not cover the event types of most interest to intelligence analysts, who frequently look for patterns in documents from a niche domain. In 2023, Parenthetic worked with LAS to build a domain-specific pipeline that creates training datasets for the event extraction task for specific event types of interest, using analyst-provided hand-labeled “seed” sentences and data augmentation. The work helps automatically build high-quality, custom training datasets for event extraction. We found that the best data augmentation solution used large language models (LLMs) to generate robust training datasets. While a proof of concept was established, remotely served LLMs (e.g., ChatGPT) are less cost-effective and cannot support air-gapped environments. As a result, in 2024, we sought to refine our pipeline by targeting three key improvements:

  1. improve latency for inference
  2. replace ChatGPT with local (non-proprietary) LLMs
  3. develop downstream analytics from event data.
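
The augmentation idea can be sketched minimally, assuming nothing about Parenthetic's actual pipeline: compose a generation prompt from analyst seed sentences, then filter the LLM's output for near-duplicates so the training set stays diverse. The prompt wording, similarity threshold, and helper names are all illustrative.

```python
def build_augmentation_prompt(event_type, seed_sentences, n_variants=5):
    """Compose a prompt asking a local LLM to paraphrase analyst-labeled
    seed sentences into new training examples for a niche event type."""
    seeds = "\n".join(f"- {s}" for s in seed_sentences)
    return (
        f"You are generating training data for an event-extraction model.\n"
        f"Event type: {event_type}\n"
        f"Analyst-labeled seed sentences:\n{seeds}\n"
        f"Write {n_variants} new sentences describing the same event type "
        f"with different wording, entities, and structure."
    )

def jaccard(a, b):
    """Token-overlap similarity between two sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def filter_generations(seeds, generated, max_overlap=0.8):
    """Drop generated sentences that nearly duplicate a seed, keeping the
    augmented dataset diverse."""
    return [g for g in generated
            if all(jaccard(g, s) < max_overlap for s in seeds)]
```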

> Blog Post: Findings from the 2024 Data Augmentation for Event Extraction (DAFEE) Project

Operationalizing AI/ML

These projects demonstrate novel ways to address mission challenges around the ever-present need for machine learning (ML) and artificial intelligence (AI) to remain useful under operational environment constraints, whether financial, temporal, or cognitive.

Matt Groh, Negar Kamali, Aakriti Kumar, Karyn Nakamura, Angelos Chatzimparmpas, Jessica Hullman (Kellogg School of Management, Northwestern University); Jacque J., Aaron W., Lori Wachter, Candice G. (LAS)

Diffusion model-generated images can appear indistinguishable from authentic photographs, but these images often contain artifacts and implausibilities that reveal their AI-generated provenance. Given the challenge to public trust in media posed by the possibility of photorealistic image generation by AI, we conducted a large-scale experiment to evaluate the photorealism – as measured by human detection accuracy – of 450 Diffusion model-generated images and 150 real images. We analyzed 731,513 observations and 36,465 comments from 49,827 participants and found scene complexity of an image, artifact types within an image, display time of an image, and human curation of AI-generated images play a significant role in how accurately people distinguish real from AI-generated images. Based on these results, we propose a taxonomy for characterizing cues that distinguish between real and diffusion model-generated images organized along the following five dimensions: anatomical implausibilities, stylistic artifacts, functional implausibilities, violations of physics, and sociocultural implausibilities.

> Blog Post: 5 Telltale Signs That a Photo Is AI-generated

Jacque J., Aaron W., Candice G., Hilary F., Heather B., Aziza J. (LAS)

What tools and resources can we provide intelligence analysts to strengthen their ability to recognize Generative AI and manipulated multimedia? Additionally, how can we bolster the application of analytic integrity and standards (AIS) throughout the analytic workflow, given the evolving implications of Generative AI for intelligence analysis? These are some of the foundational questions driving LAS’ work with its partners in the Operationalizing AI and Machine Learning space in 2024. In seeking to answer them, LAS has partnered with Dr. Matt Groh of Northwestern University to provide analysts with a how-to guide aimed at improving their ability to detect potentially manipulated and AI-generated images. Crucial to that partnership has been LAS’ engagement with Candice Gerstner, Co-Chair of ODNI’s Multimedia Authentication Steering Committee, who has helped steer the development of the guide so that it aligns with existing training offered across the intelligence community. LAS has also worked in vital partnership with the Senior Intelligence Analysis Authority and the Election Security Group to explore ways to integrate the research, as well as AIS fundamentals, into training and tradecraft resources for analysts.

> Blog Post: Generative AI and Intelligence Analysis

Tim Menzies, Lohith Senthilkumar (Department of Computer Science, North Carolina State University)

We seek methods that leverage the knowledge embedded in an LLM to make decision-making faster and better. To that end, we explore the use of an LLM as a “warm start” for sequential model optimization. Our results show that, across 20+ problem domains, this approach dramatically outperforms state-of-the-art incremental learners.
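
The abstract summarizes the method only briefly, so the following Python caricature is a guess at where the warm start slots in, not the authors' implementation: the first few evaluations of the optimization budget go to candidates an LLM proposed from a natural-language problem description, and the remainder are sampled as the underlying optimizer normally would (randomly, here, for simplicity).

```python
import random

def optimize(objective, candidates, warm_start, budget=10, seed=0):
    """Sequential optimization with a warm start: evaluate the suggested
    configurations first, spend any remaining budget sampling at random,
    and return the best candidate seen."""
    rng = random.Random(seed)
    tried = list(warm_start[:budget])
    while len(tried) < budget:
        tried.append(rng.choice(candidates))
    return min(tried, key=objective)
```

If the LLM's suggestions land near the optimum, the learner starts from a good region instead of spending its budget discovering one.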

> Blog Post: Next-Gen Prompt Engineering: Provably Better

Aaron W., Jaylan Argo, Claudine Bunao, Aidan Nelson, Molly Owens, Sean Hinton, Sean L., Stephen W. (LAS)

An NC State Computer Science Senior Design team has partnered with the Laboratory for Analytic Sciences to build a prototype AI artifact management system that loads and stores the AI metadata that stakeholders require to adjudicate an AI model or dataset. The AI artifact management system will allow users to submit a new AI model or dataset for adjudication and will allow AI review members to track and manage the AI adjudication process. With search and filter functionality, the system will also allow all users to discover the results of previous AI model and dataset review decisions. The system will increase efficiency within the AI management process and encourage collaboration across AI experts.

> Blog Post: AI Artifact Management System

Troy West, Ethan D. (LAS)

LAS is investigating methods for consistently storing and deploying machine learning (ML) models for testing and evaluation. Building on last year’s successful Model Deployment Service (MDS) prototype, LAS continued innovating in this space by deploying and testing a more user-friendly interface during SCADS. This allowed a wider range of users to interact with and test the system, providing the ability to upload new models and to start and stop models available in the MDS cluster. Through these interactions, we discovered new best practices for converting models for API deployment and for how users successfully interact with models via API. In addition to our UI efforts, we’ve increased system stability and experimented with providing access to LLMs through MDS deployments. This ML infrastructure supports broader LAS goals by making models available to a wider audience through greater ease of access, and it allows LAS projects to utilize resource-intensive models on scalable infrastructure. Finally, experience in this space generates insight that helps LAS advise larger ML infrastructure efforts. A simple prototype will be demonstrated at the symposium.

> Blog Post: Maturing the LAS Model Deployment Service

Chris Argenta (Rockfish Research); Liz Richerson, Aaron W., Troy W., Bo L. (LAS)

OpenTLDR aids knowledge workers by automatically tailoring vast information down to a concise report – optimizing each recipient’s attention to the most relevant information for them. With this modular framework, TLDR researchers can rapidly explore, prototype, and evaluate their own end-to-end analytic implementations and user interface concepts.

The acronym “TLDR” is a playfully ambiguous combination of the well-known “Too Long; Didn’t Read” (TL;DR) and the concept of a “Tailored Daily Report.” It denotes a process of analyzing, filtering, prioritizing, and reducing information content based on what each user will find most relevant. Knowledge workers can use a TLDR to efficiently recognize useful information amidst a tornado of less relevant content.

OpenTLDR implements a complete TLDR system as a modular pipeline of AI/ML-enabled capabilities, including: initialize reference knowledge, ingest active content, connect information into a knowledge graph, recommend relevant content for each user’s requests, generate a tailored abstractive summary of content, and produce a TLDR report.
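
The modular pipeline can be pictured as a chain of stages passing a shared state along. The stage bodies below are toys standing in for OpenTLDR's real ingest/connect/recommend/summarize modules; all names and logic are illustrative, not the framework's API.

```python
def run_tldr_pipeline(documents, user_request, steps):
    """Chain modular TLDR stages; each stage takes and returns a shared
    state dict, so any stage can be swapped for a researcher's own."""
    state = {"documents": documents, "request": user_request}
    for step in steps:
        state = step(state)
    return state

def recommend(state):
    """Toy recommender: keep documents mentioning the user's request."""
    req = state["request"].lower()
    state["relevant"] = [d for d in state["documents"] if req in d.lower()]
    return state

def summarize(state):
    """Toy summarizer: first sentence of each relevant document."""
    state["report"] = " | ".join(d.split(".")[0] for d in state["relevant"])
    return state
```

Because every stage shares the same state-in, state-out contract, replacing one notebook with another leaves the rest of the pipeline, and the end-to-end evaluation, untouched.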

The OpenTLDR “playground” empowers researchers to easily replace working modular notebooks with their own solutions and automatically evaluate the effect of their changes on the end-to-end results, while OpenTLDR’s user interface package offers a live web-based server that tailors real-world news to actual users’ requests. During SCADS 2024, OpenTLDR enabled rapid prototyping and demonstration of new analytic capabilities and user interface concepts. In this demonstration, we provide an overview of the OpenTLDR framework and some of the capabilities that have been built on top of it.

> Video: OpenTLDR

James S., Felecia ML., Donita R. (LAS)

The Laboratory for Analytic Sciences (LAS) has been actively collaborating with Minority-Serving Institutions in North Carolina to enhance research in STEM and foreign languages. These partnerships, initially established through the Advancing Research, Innovation Solutions through Engagement (ARISE) Cooperative Research and Development Agreement, provide unique benefits by harnessing diverse perspectives and experiences, thereby fostering creativity and driving innovative solutions. Through these collaborations, the LAS has engaged with subject matter experts and students on real-world research projects that will have a mission impact. Additionally, the LAS has been instrumental in offering mentorship, training, and professional development opportunities to historically underrepresented communities. These initiatives not only strengthen the relationships but also contribute to building a broader talent pipeline of students pursuing careers in STEM and foreign language fields.

> Blog Post: Advancing Research and Engagement with Minority Serving Institutions at LAS

Prashanth BusiReddyGari, Harry Lamchhine, Fahim Tanzi, Raj Shinde (University of North Carolina at Pembroke); James S., Felecia ML., Donita R. (LAS)

The Laboratory for Analytic Sciences collaborated with senior design students from the University of North Carolina at Pembroke in Spring 2024 to develop a proof-of-concept website that generates synthetic cyber knowledge graphs using AI/ML techniques based on realistic cyber campaigns. As data in cyberspace continues to grow, effectively triaging relevant information against the 5 Vs of Big Data (volume, value, variety, velocity, and veracity) becomes increasingly challenging, especially with limited or sensitive datasets. Knowledge graphs (KGs) offer a powerful solution by organizing, understanding, and visualizing large datasets, capturing entities, relationships, and context. When combined with synthetic data generation—artificial data that mirrors real datasets—synthetic KGs can address privacy concerns, expand datasets, and enhance AI/ML capabilities for cybersecurity. By generating synthetic KGs in STIX 2.1 format, researchers can simulate realistic cyber scenarios and build algorithms that extract valuable insights without compromising sensitive cyber information.
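
To make the STIX 2.1 target concrete, here is a hedged sketch (not the students' implementation) of emitting a tiny synthetic graph as plain STIX 2.1-shaped dictionaries: two domain objects joined by a `uses` relationship, wrapped in a bundle. The object names and fixed timestamp are invented.

```python
import uuid

TS = "2024-01-01T00:00:00.000Z"  # fixed timestamp, acceptable for synthetic data

def stix_object(obj_type, **props):
    """Minimal STIX 2.1-style object as a plain dict (illustrative only)."""
    return {"type": obj_type, "spec_version": "2.1",
            "id": f"{obj_type}--{uuid.uuid4()}",
            "created": TS, "modified": TS, **props}

def synthetic_campaign_graph():
    """A tiny synthetic cyber knowledge graph: a fictional actor 'uses' a
    fictional malware family, bundled for consumption by STIX tooling."""
    actor = stix_object("threat-actor", name="Synthetic Actor 7")
    malware = stix_object("malware", name="FauxLoader", is_family=False)
    uses = stix_object("relationship", relationship_type="uses",
                       source_ref=actor["id"], target_ref=malware["id"])
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}",
            "objects": [actor, malware, uses]}
```

Because the graph is entirely synthetic, it can be shared, enlarged, and used to train extraction algorithms without exposing any sensitive cyber data.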

> Blog Post: SAKURA: Synthetic Cyber Knowledge Graph

James S., Felecia ML., Jasmin A. (LAS); James Brown, Cameron Kelly, Marc Brown, Matthew Carby, David Nnanna, Isaiah Johnson (Shaw University)

The Laboratory for Analytic Sciences has partnered with Shaw University’s Fall 2024 senior design students to explore the use of WebGPU for near real-time processing of AI/ML models on edge devices. Running machine learning algorithms in near real-time on edge devices, such as laptops, tablets, and mobile phones, faces challenges due to limited processing power for tasks like image recognition and natural language processing. Leveraging the parallel processing capabilities of graphics processing units (GPUs) can help, as they enable faster, high-throughput operations and low-latency predictions. Traditionally, AI/ML models are processed on cloud-based systems before deployment, but a new technology called WebGPU, introduced in 2021, allows modern browsers to access a device’s GPU using standard web tools like HTML, CSS, and JavaScript. This approach could expand AI/ML capabilities to a broader range of devices, raising the research question: “Does WebGPU allow for efficient near real-time processing of AI/ML models on edge devices?”

> Blog Post: Exploring WebGPU’s Potential for Near Real-Time AI on Edge Devices

Majiroghene Evhi-Eyeghre, Simran Saini, Anita Amofah, Miriam Julia Delgado, Joshua Lockart, Vedh Srivatsa, Jonathankeith Murchison, Jeremy Graves, Garret Davis, Paul Rodriguez, Dr. Sambit Bhattacharya (Fayetteville State University); James S., Felecia ML. (LAS)

This project focuses on developing near real-time AI capabilities at the edge to enhance crowd behavior analysis in high-stakes environments. By deploying AI on edge devices like Raspberry Pi 5, NVIDIA Jetson Nano, and Lambda TensorBook, the project addresses challenges related to real-time responsiveness, cost-efficiency, and resource optimization. The system processes video data locally, reducing latency and bandwidth by retaining processing on-site. Using a hybrid AI model (YOLOv5 for object detection, DeepSORT for tracking, and POCO for pose estimation), it distinguishes normal from abnormal crowd behaviors. A custom chaotic video database, generated using synthetic data, supplements real-world datasets to improve the accuracy and robustness of behavior detection models. A key focus is on AI model quantization, a technique that improves efficiency on edge devices with limited computational power. Quantization reduces model weight precision (e.g., FP32 to INT8), decreasing memory usage and accelerating inference without sacrificing real-time decision-making capabilities. Success is measured through:

  • Inference Time: Quantized models must show faster processing compared to non-quantized counterparts.
  • Model Accuracy: Minimal accuracy degradation is targeted while maintaining effectiveness in detecting abnormal behavior.
  • Resource Utilization: Quantized models should use less CPU, GPU, and memory.
  • Power Consumption: Reducing power use on edge devices, crucial for field deployments, is another metric.

This quantization enhances model performance while maintaining a balance between efficiency and accuracy for real-time edge applications.
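
The FP32-to-INT8 mapping described above reduces each weight to an 8-bit integer plus a shared scale and zero point. A minimal, framework-independent sketch of asymmetric (affine) post-training quantization:

```python
def quantize_int8(weights):
    """Affine quantization: map the observed FP32 range [lo, hi] onto the
    INT8 range [-128, 127] via a shared scale and zero point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0          # guard against a constant tensor
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate FP32 values; error is bounded by the scale."""
    return [(qi - zero_point) * scale for qi in q]
```

Real deployments (e.g., on a Jetson Nano) layer per-channel scales and calibration data on top of this idea, but the memory saving (4x fewer bits per weight) and the bounded rounding error are already visible here.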

Sean L., Patti K., Edward S., Kanv Khare (LAS)

The LAS developed a capability for improving speech-to-text (STT) results using large language models. With the general goal of improving a given line of STT output, five situational cases are considered, defined by the amount of information made available to the models. In the extreme case of no additional data, improvement can come only from the advanced capabilities of LLMs relative to the language models integrated into STT algorithms. In other cases, various forms of supplementary data are provided to augment the audio data (e.g., literal context, speaker background, ground-truth translations, metadata). Results to date are quite interesting, with some cases showing strong improvement via the methods developed and others remaining a work in progress.
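
The abstract does not publish its prompts, so as an illustration only, the situational cases can be pictured as a prompt builder that attaches whatever supplementary data is available before asking an LLM to repair a hypothesis; all wording below is invented.

```python
def build_correction_prompt(stt_line, context=None, speaker=None, metadata=None):
    """Compose an LLM prompt to repair an STT hypothesis; each optional
    argument corresponds to a situational case with more available data."""
    parts = ["Correct likely transcription errors in the line below.",
             "Change only words that are probably misrecognized."]
    if context:
        parts.append(f"Surrounding dialogue: {context}")
    if speaker:
        parts.append(f"Speaker background: {speaker}")
    if metadata:
        parts.append(f"Metadata: {metadata}")
    parts.append(f"STT line: {stt_line}")
    return "\n".join(parts)
```

The no-additional-data case is simply the prompt with every optional argument omitted, leaving the LLM's own language knowledge to do the repair.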

> Blog Post: ECHOLAB: Leveraging LLMs to Boost Speech-to-Text Accuracy in Challenging Audio Environments

Sean L., Patti K., Skip S., Stephen S., Rachel Londe, Jimmy Do, Lennox Latta, Tony Youssef, Mohamed Tawous (LAS)

An NC State Computer Science Senior Design team has partnered with the Laboratory for Analytic Sciences to develop a software library that can capture fine-grained user-monitoring information (e.g. mouse movements, observed data, timestamps, etc) related to a user’s interactions with audio/speech-to-text data. While focusing on instrumenting a specific taxonomy of actions users perform on speech-to-text data was the primary goal, the software library developed can fairly easily be adapted for most web-based applications. Instrumentation data of the nature captured by this library is useful for a host of purposes: observing and understanding expert tradecraft; identifying pain points; informing business intelligence decisions; and developing training corpora to enable the infusion of various AI capabilities into workflows.

> Blog Post: Instrumenting STT

Kyle Rose, Will Gleave, Julie Hong (Amazon Web Services (AWS)); Jacqueline S.G., Stacie K., Mike G., Edward S., Aaron W., Jacqueline C., Matt Schmidt, Brent Younce, Kelli C., Peter M. (LAS)

Report tearlines provide the content of an intelligence report at a lower classification level or with less restrictive dissemination controls. They are a powerful mechanism for enabling wider dissemination of intelligence information. However, writing tearlines is costly in analyst time and effort and slows an organization’s production of valuable intelligence. In this work we seek to ease the cognitive burden on analysts and accelerate the intelligence reporting workflow by investigating the use of Large Language Models (LLMs) for automated tearline generation. The generation of a tearline report by an LLM is guided by a set of rules that ensure the content is at the appropriate classification level and does not reveal sensitive sources or methods. While many of these rules are explicitly defined in reporting guidance and security classification guides, some are implicit, often learned from experience, and unique to each office. Here we develop an LLM-based pipeline to codify these unwritten rules using report-tearline document pairs. Both analysts and LLMs can use these rules to drive the production of higher-quality tearlines. To assess tearline generation quality, we develop a custom LLM-based evaluation suite for this task, which includes assessments of rule-following ability, similarity to ground truth, and overall summary quality. Using a proxy dataset containing lay summaries of technical papers, we compare generation quality across various foundation models, model parameters, and prompting techniques. We find that prompting a foundation model leads to a 48% average improvement in overall generation-quality metrics compared to training a seq2seq model. We also demonstrate that foundation models can follow rules to generate proxy tearlines, correcting on average 78% of the source documents that contained a violation of a specific rule. To build analyst trust, we develop explainability techniques such as providing in-line citations of the rules used to write the tearline.
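
The rule-following assessment can be pictured as a checker that applies codified rules to a draft tearline. A toy sketch with invented rule names and patterns (the project's actual rules are learned from report-tearline pairs and are far richer than regexes):

```python
import re

# Illustrative codified rules: regex patterns that must NOT appear in a tearline.
RULES = {
    "no_source_descriptions": r"according to a (sensitive|clandestine) source",
    "no_collection_methods": r"\b(intercept|signals collection)\b",
}

def check_rules(tearline, rules):
    """Return the names of codified rules violated by a draft tearline."""
    return [name for name, pattern in rules.items()
            if re.search(pattern, tearline, re.IGNORECASE)]
```

A checker of this shape supports both halves of the pipeline: it scores an LLM's rule-following during evaluation, and its per-rule output is exactly what in-line citations of violated or applied rules would be built from.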

> Blog Post: Automated Tearline Generation and Evaluation with Large Language Models

Tim Menzies (Department of Computer Science, North Carolina State University); John Slankas, Mike G., Pauline M., Paul N., Aaron W. (LAS)

In 2024, we investigated trends in the application of large language models (LLMs) and how these trends might impact the work of intelligence analysis. We consider the evolution of ‘chain of thought’ reasoning in prompt generation strategies, retrieval-augmented generation, the evaluation of complex AI systems, and agentic frameworks that decompose workflows into achievable tasks, orchestrate the execution of those tasks, and evaluate and provide feedback in order to improve results. LLMs have been shown to perform better when prompted to think step by step, or to decompose complex tasks into simpler steps as a linear process, a decision tree, or a graph of actions to be traversed. In addition, recent research suggests that composing a prompt to supply as much context as possible helps ground inference results and minimize hallucinations and other errors. LLMs are being used in systems of agents where different models may be used for different tasks or may call external tools to help perform them. Roles in agentic systems include planning, tool use, reflection, and multi-agent collaboration. The results of our experiments led to insights that can be used for determining appropriate analytic workflows, creating working aids for unique tasks, performing information retrieval and synthesis, and integrating audio, image, and textual information in sense-making processes.
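
As a schematic only, unrelated to the experiments' actual frameworks, the agentic pattern can be caricatured as a plan of steps, each dispatched to a named tool, with prior results fed forward; every name below is invented.

```python
def run_agent(plan, tools):
    """Toy agentic loop: execute each planned step with its named tool,
    passing the accumulating results forward; a real framework would add
    reflection and feedback between steps."""
    results = []
    for step, tool_name in plan:
        results.append(tools[tool_name](step, results))
    return results[-1]
```

The planner/tool split is the essential idea: decomposition turns one hard prompt into a graph of small, checkable steps.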

> Blog Post: Large Language Models for Intelligence Analysis