SCADS 2025: Help for the Overburdened Analyst
At the fourth Summer Conference on Applied Data Science, the intelligence community teamed up with academia and industry to sharpen recommender systems, improve summarization tools, and study user interactions.

Intelligence analysts face an overwhelming volume of data to assess. From video footage to audio recordings to text transcripts and other documents, even knowing where to start looking for what’s most important can be a daunting task.
“The amount of data we have compared to 20 years ago is immense,” says one intelligence analyst. “We often have 27 browser tabs that don’t interact with each other to make sense of a storyline we’re trying to process.”
An AI-powered tailored daily report tool can help with that task. This summer, a collaborative initiative at NC State’s Laboratory for Analytic Sciences (LAS) marked the fourth year of work towards solving a grand challenge proposed by the intelligence community: how can we generate tailored daily reports (TLDRs) for knowledge workers that capture information relevant to their objectives and interests?
The 2025 Summer Conference on Applied Data Science (SCADS) brought together 39 participants from academia, industry, and the government for eight weeks to address the persistent problem of “too much data, not enough time.” This year saw a convergence of past efforts into a single, end-to-end software framework called OpenTLDR. Participants made advancements in tailored recommender systems, automatic summarization and retrieval methods, and human-computer interactions.
LAS is a mission-driven collaborative research lab that focuses on integrating technology into intelligence analysis. SCADS is a core component of this mission, fostering a low-stakes environment where people can fail fast and learn what works to address intelligence community challenges. Now in its fourth year, SCADS continues to develop practical, actionable AI/ML solutions that directly improve analyst workflows and ensure usability in high-stakes environments where resources may be limited.
Participants this year came from 13 universities – Dartmouth College, NC State University, Penn State, Rochester Institute of Technology, Kenyon College, UNC-Chapel Hill, UNC Charlotte, University of Colorado – Boulder, University of Georgia, University of Texas at San Antonio, Vanderbilt University, Virginia Tech, and Worcester Polytechnic Institute – and four companies: Dell, PUNCH Cyber, Rockfish Research, and RTX BBN. Their expertise in areas like linguistics, computer science, design, psychology, and information science brought diverse perspectives to the problems shared by analysts and software engineers from the National Security Agency.
OpenTLDR: The Central Framework

OpenTLDR, originally prototyped in the second year of SCADS by NC State alumnus Chris Argenta, a software engineer from Rockfish Research, has become the central end-to-end framework for tailored daily reports at SCADS. Its modular design facilitates rapid prototyping and testing of new ideas in summarization, recommendation, and human-computer interaction, and allows evaluation of how changing those modules affects the end-to-end system results.
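To make the modular idea concrete, the sketch below shows how a recommend-then-summarize workflow can be composed from swappable parts. The class and function names here are purely illustrative – they are not OpenTLDR’s actual API – and the scoring and summarization logic are deliberately naive stand-ins.

```python
from abc import ABC, abstractmethod

# Hypothetical interfaces sketching a swappable recommend-then-summarize
# pipeline; these are NOT OpenTLDR's actual classes.
class Recommender(ABC):
    @abstractmethod
    def rank(self, documents: list[str], interests: list[str]) -> list[str]:
        """Return documents ordered by relevance to the analyst's interests."""

class Summarizer(ABC):
    @abstractmethod
    def summarize(self, document: str) -> str:
        """Return a short summary of one document."""

class KeywordRecommender(Recommender):
    def rank(self, documents, interests):
        # Naive relevance: count how many interest terms appear in each document.
        score = lambda doc: sum(term.lower() in doc.lower() for term in interests)
        return sorted(documents, key=score, reverse=True)

class TruncatingSummarizer(Summarizer):
    def summarize(self, document):
        # Stand-in for an LLM-based or extractive summarizer.
        return document[:200] + ("..." if len(document) > 200 else "")

def build_tldr(documents, interests, recommender: Recommender,
               summarizer: Summarizer, top_k: int = 3) -> list[str]:
    """Assemble a tailored daily report from the top-ranked documents."""
    ranked = recommender.rank(documents, interests)[:top_k]
    return [summarizer.summarize(doc) for doc in ranked]

docs = [
    "Zendia announced new shipping restrictions in its northern ports.",
    "A regional sports league released its season schedule.",
]
report = build_tldr(docs, ["shipping", "Zendia"],
                    KeywordRecommender(), TruncatingSummarizer())
```

Because each stage implements the same interface, a researcher can swap the keyword recommender for an embedding-based one, or the truncating summarizer for an LLM, and then measure how that change affects the end-to-end results.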
“It’s the scaffolding that provides researchers with a shared terminology and ways to explore, add to, alter, and change the end-to-end TLDR workflow,” says Abigail Browning, a 2025 SCADS participant. Browning earned her Ph.D. at NC State in communication, rhetoric, and digital media and is now a senior user experience (UX) researcher at Rockfish Research. To further develop OpenTLDR, the SCADS cohort needed to expand its data sets for testing.

Progress on the Tailored Daily Report (TLDR)
- SCADS 2022: Starting with the notional goal of automatically generating a President’s Daily Brief for everyone, the team produced innovations in the Recommender → Summarizer → Human-Computer Interaction (HCI) structure and an early TLDR prototype.
- SCADS 2023: Embracing the disruptive technology of large language models (LLMs), the team accelerated progress in each research category, achieving state-of-the-art results in some cases, and began experimenting with retrieval-augmented generation (RAG) – a pattern sketched just after this timeline. Two new prototypes were generated.
- SCADS 2024: With the foundation in place and the tooling around LLMs matured, the team was empowered to generate new ideas and prototypes quickly.
- SCADS 2025: The team shifted its focus to converging capabilities into a single framework, OpenTLDR, for the purposes of research and development, demonstration, evaluation, testing, and potential use.
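For readers unfamiliar with RAG, the pattern is simple: retrieve the passages most relevant to a question, then hand them to a language model as grounding context so the answer stays tied to the source material. The passages, question, and retrieval method below are illustrative only and are not drawn from the SCADS prototypes.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Zendia closed two northern ports for unscheduled customs inspections.",
    "The Zendian transport ministry denied any link to recent fuel shortages.",
    "A regional sports league released its season schedule.",
]
question = "What is happening at Zendia's northern ports?"

# 1. Retrieve: rank passages by TF-IDF cosine similarity to the question.
vectorizer = TfidfVectorizer().fit(passages + [question])
scores = cosine_similarity(
    vectorizer.transform([question]), vectorizer.transform(passages)
)[0]
top_passages = [p for _, p in sorted(zip(scores, passages), reverse=True)[:2]]

# 2. Augment: build a prompt that grounds the model in the retrieved text.
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(f"- {p}" for p in top_passages) +
    f"\n\nQuestion: {question}"
)

# 3. Generate: `prompt` would then be sent to an LLM of choice; the retrieval
#    and prompt construction above are the RAG-specific steps.
print(prompt)
```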
Practicing with Data – Real and Fake

In the 1950s, the NSA developed a training exercise for cryptanalysts about a fictional adversary, a country called Zendia. The accompanying data set is composed of artificial intercepted radio messages that mirror the properties of real-world data. At SCADS, participants have been working to expand the Zendia data set with additional fake data, like AI-generated images and characters, for the purpose of training machine learning models and assessing prototypes.
Two years ago at SCADS, Will A., a software engineer, began building out the Zendia data set as a way to familiarize himself with the newest breakthrough in AI, ChatGPT. This year, he took it even further.
Synthetic data generation is its own field of study now, and SCADS participants with deep expertise in the area approached Will, ready to help. “We are making that data set shine,” he says. “We’re generating images, source documents, and synthetic phone conversations to go with the fictional events in this fictional universe. It’s a fully fleshed-out, extensive data set.”
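As a rough illustration of how LLM-driven synthetic data generation of this kind can work – the model name, prompt, and event fields below are hypothetical, not the Zendia data set’s actual schema or the team’s tooling – a script can prompt a model to produce a transcript tied to a fictional event:

```python
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY

client = OpenAI()

# Hypothetical fictional-event record; the real Zendia schema is not shown here.
event = {
    "date": "2025-06-14",
    "location": "Port of Valla, Zendia",
    "summary": "Two cargo ships delayed by an unannounced customs inspection.",
}

prompt = (
    "Write a short, realistic phone conversation transcript (two speakers, "
    "under 150 words) between port workers discussing this fictional event: "
    f"{event['summary']} (Date: {event['date']}, Location: {event['location']})"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

Tying every generated artifact – image, document, or conversation – back to the same fictional event record is what keeps a synthetic universe like Zendia internally consistent.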
Unified synthetic data is crucial for consistent UX testing and training across an organization, but real-world data isn’t always so tidy. Take, for example, the Abbottabad data set. Nearly half a million files were pulled from phones, laptops, and other devices seized during the 2011 raid on Osama bin Laden’s compound in the city of Abbottabad, Pakistan. The CIA publicly released those files in 2017, but their unstructured mix of formats – images, videos, and text documents – makes them difficult to analyze.
“It’s messy, it’s big,” and it’s in more than one language, says Browning, the UX researcher at Rockfish Research. With limited time, the team decided to focus on uploading text summaries of video content from the Abbottabad data set into the OpenTLDR framework.
“We’re documenting our process so that it can be replicated by analysts on the high side,” she says, referring to the classified network used by the intelligence community. “A case study [like this] allows analysts to not have to start from scratch. It provides the OpenTLDR framework, code, and a data set that is closer to what they would typically use.”
Will A. says the Abbottabad data set makes for a realistic proxy. “It gives researchers some empathy for analysts when the data is large, messy, and non-English,” he says. “Suddenly, everything gets more challenging.”
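A first pass over a corpus like this typically starts with simple triage: walking the files, grouping them by type, and routing only what the pipeline can currently handle. The sketch below illustrates that step; the directory path and the downstream ingest step are placeholders, not OpenTLDR’s real interface.

```python
from pathlib import Path
from collections import defaultdict

VIDEO = {".mp4", ".avi", ".mov"}
TEXT = {".txt", ".doc", ".docx", ".pdf"}
IMAGE = {".jpg", ".jpeg", ".png", ".gif"}

def triage(root: str) -> dict[str, list[Path]]:
    """Group every file under `root` by coarse media type."""
    buckets = defaultdict(list)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        if ext in VIDEO:
            buckets["video"].append(path)
        elif ext in TEXT:
            buckets["text"].append(path)
        elif ext in IMAGE:
            buckets["image"].append(path)
        else:
            buckets["other"].append(path)
    return dict(buckets)

buckets = triage("abbottabad_release/")  # placeholder path
print({kind: len(files) for kind, files in buckets.items()})

# The 2025 team's focus: turn video content into text summaries first, then
# load those summaries into the TLDR pipeline (summarize_video and ingest
# are hypothetical stand-ins, not real OpenTLDR functions).
# for video in buckets.get("video", []):
#     ingest(summarize_video(video))
```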
Cross-Sector Perspectives Help Bridge the Intelligence Gap

Swahili is spoken by an estimated 100 million to 200 million people, many of them in regions where major global events unfold, but it is considered a low-density language because there are comparatively few Swahili language resources that can be used to train an AI/ML model in translation, retrieval, and summarization. Viviane Ito, a third-year Ph.D. student at the UNC School of Information and Library Science, worked with Hemanth Kandula, a senior research engineer at RTX BBN, and Judy K., a government science and technical analyst, to build a multi-source tailored daily report in Swahili.
“Because of my linguistics background, I’d worked with African languages before,” says Ito. “For a translation system to be successful, you have to have good data.”
Successful they were. Judy K. verified that the machine’s translations of telephone conversations in Swahili were correct, and Kandula built an interface where users can ask questions about the data in the conversations. Ito developed a system for measuring the team’s weekly progress.
“We have a map of our methods so future participants can think about how they would apply it to other languages,” says Ito.
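To make the translation step concrete, here is a minimal sketch using the openly available NLLB-200 model through Hugging Face Transformers. The model choice and the sample sentence are illustrative; the article does not specify which toolchain the team actually used.

```python
# pip install transformers sentencepiece torch
from transformers import pipeline

# NLLB-200 covers Swahili ("swh_Latn" in its FLORES-200 language codes).
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="swh_Latn",
    tgt_lang="eng_Latn",
)

segments = [
    "Habari za asubuhi, tutakutana saa ngapi leo?",  # sample conversational line
]
for seg in segments:
    result = translator(seg, max_length=200)
    print(result[0]["translation_text"])
```

In practice, low-density language work hinges less on the model call itself than on gathering enough good parallel text to evaluate it – which is where human verification like Judy K.’s comes in.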
After spending the first week of SCADS learning about the grand challenge and problem set, participants worked together in interdisciplinary teams focused on human-computer interaction, automatic summarization, and recommender systems. Participants from government, industry, and academia seamlessly collaborated across locations in Maryland and North Carolina this year, embracing online collaboration platforms like Slack, Miro, Google Meet, and GitHub. The value of that partnership outweighs the challenges of hybrid collaboration, participants say.
“We’re designing something that the government can use,” says Browning. “[Feedback from] government participants allows us to create what works well, so the likelihood of adoption is higher.”
SCADS 2025 has demonstrably advanced the goal of creating effective, tailored daily reports for intelligence analysts. In the eighth week, the teams shared their results with government stakeholders through presentations, Q&A sessions, and live demos. Many projects emphasized readiness for high-side deployment, including transferable documentation and solutions. Future SCADS participants should expect to build on OpenTLDR and continue to tackle issues like data messiness and the need for explainable AI.
A technical report about SCADS 2025 is forthcoming.
Author’s note: This article was written with assistance from Google NotebookLM, a research and note-taking tool that uses artificial intelligence (AI) to assist users in interacting with their documents.