Collaborators Build Reporting Tools for Intelligence Community at Third Annual SCADS
Academic, industry and government researchers spent the summer creating prototypes to help tailor daily intelligence reports.
Between the U.S. presidential race, a global tech outage, the Paris Olympics, and intensifying regional wars, the world witnessed historic geopolitical moments this summer. Events like these generate overwhelming amounts of information. Data analysts in the intelligence community must sift through it all to distill useful insights that governments rely on to make important decisions. The newest AI promises to help, but can we trust it?
Researchers put their heads together for eight weeks at NC State’s third annual Summer Conference on Applied Data Science (SCADS) to find out. The interdisciplinary group worked to solve a grand challenge: how can we generate tailored daily reports for knowledge workers that capture information relevant to their interests?
Hosted by the Laboratory for Analytic Sciences (LAS), SCADS prioritizes cross-sector collaboration. Forty-six participants from industry, academia and government convened to boost the intelligence community’s capability to analyze big data using AI. This year’s SCADS participants included 12 students and six faculty from 13 academic institutions, plus five industry professionals. Twenty-three government participants from the signals intelligence community also attended.
Don’t Skip the Recap
Every organization produces something. At intelligence agencies, the product is reports. SCADS aims to create a tailored daily report, shortened to TLDR – an interactive, customized interface that helps analysts as they begin their day. It’s like the President’s Daily Brief, but for everyone.
“When you come into work Monday morning, what do you want your tailored daily report to say?” asks Riley D., a government data scientist. Many analysts are looking for a recap of their previous day’s work, suggested next steps, and summaries of potentially useful documents and datasets.
“AI is fundamentally reshaping the landscape of information interfaces,” says Lane Harrison, a professor in computer science and data science at Worcester Polytechnic Institute. “But people have to trust the AI’s recommendation. To trust AI, they have to understand AI.”
Analysts need the ability to understand why the AI recommended a particular document or article. That builds trust. They also need to be able to dig deeper into source material if a summary piques their interest, so citations and sources are important.
Building on last year’s SCADS efforts, participants focused on four research areas at SCADS 2024:
- Summarization was one of the most active research areas, partly due to the rise of large language models (LLMs) and their application to summarization tasks. Efforts focused on advancing automatic summarization techniques and exploring various methods and models to enhance the quality and relevance of generated summaries. These projects addressed multiple dimensions of summarization, from optimizing chunk sizes to leveraging reinforcement learning and cross-lingual capabilities.
- Teams working on Recommender Systems focused on enhancing the effectiveness and scalability of recommender models to better align with user needs and improve accuracy. These projects explored various methodologies, from leveraging LLMs alongside their summarization colleagues to integrating simulated user behavior for more robust system development.
- Knowledge Representation projects focused on advancing techniques for embedding space alignment, out-of-domain detection, entity resolution, and multilingual data extraction. These projects aimed to enhance knowledge management and analytics, ensuring the systems developed are robust, scalable, and capable of handling diverse datasets and user needs.
- Researchers working in Human-Computer Interaction designed novel interfaces and studied analyst behavior to improve the effectiveness and usability of the TLDR. These projects aimed to align technology with the cognitive needs and workflows of analysts, ensuring that the systems developed are not only efficient but also user-friendly and trustworthy.
Pulling together insights and innovations from these domains, SCADS teams also developed and tested five end-to-end prototype TLDR systems.
For example, Riley’s team built and demoed a prototype called Electric Augery that combines state-of-the-art full-text search, personalized ranking, entity tagging, geotagging, and network visualization. The search interface looks a lot like one most people are familiar with – Netflix.
“We learned that you can’t disrupt an analyst’s workflow,” says Riley. “The interface has to be obvious.”
Collaboration and Learning Across Sectors
In addition to making technical progress on problems of strategic interest, SCADS organizers also want participants to learn new skills from each other and build relationships that last beyond SCADS.
“SCADS serves as a vehicle to enable collaboration that wouldn’t typically happen,” says Matthew Schmidt, principal investigator of LAS.
Attendees led lunch-and-learn sessions for each other on topics such as design critiques, using large language models, and working in Python. They also attended weekly writing workshops and twice-weekly bull sessions where anyone could workshop ideas and parts of their technical reports. When many expressed interest in exploring retrieval-augmented generation (RAG), LAS staff quickly arranged for LAS collaborator Amazon Web Services to train attendees in RAG techniques.
This year’s SCADS cohort included graduate students, faculty members, industry professionals, and government attendees from the following universities and companies: Dell Technologies, Elemendar, North Carolina State University, Penn State University, Resilient Cognitive Solutions, Rochester Institute of Technology, Rockfish Research, RTX BBN, Smith College, University of Arizona, University of Arkansas at Little Rock, University of Georgia, University of North Carolina at Chapel Hill, University of Rhode Island, Vanderbilt University, Washington University in St. Louis, and Whitman College. Government participants hailed from New Zealand’s Government Communications Security Bureau (GCSB), the United Kingdom’s Government Communications Headquarters (GCHQ), and every directorate at the U.S. National Security Agency (NSA).
LAS also organized ways for the attendees to engage and work with experts who could not attend the full summer program. This included six week-long visitors from national labs and federally funded research and development centers, including Pacific Northwest National Laboratory, the Software Engineering Institute at Carnegie Mellon University, and Sandia National Laboratories.
Conference organizers invited six outside experts who met with project teams in week four to provide critical feedback on projects. Additionally, three analysts-in-residence attended SCADS for a week each and provided an analyst’s viewpoint as the projects progressed. At least 12 research projects were influenced by participants’ engagement with these analysts-in-residence.
I loved being at SCADS!! It was an amazing experience and I walked away with lots of useful ideas and feeling excited to tell others at PNNL about them!
Michelle
Pacific Northwest National Laboratory
Data Science, Real Life
For students and faculty researchers, publishing papers and presenting results at academic conferences are common goals. Government and industry participants bring a different perspective to the grand challenge of SCADS.
“Don’t be afraid to try something new,” says Violet B., a data scientist with the intelligence community.
She says SCADS is an opportunity for government staff to dive into cutting-edge technology. After participating in SCADS last year, she helped write this year’s problem book. She was surprised to see academic and industry participants take parts of the grand challenge in unexpected directions.
For example, one team worked on a system that would give intelligence analysts concise recaps of conversations in foreign languages. The project was led by Hemanth Kandula, a research engineer with RTX BBN (formerly Raytheon).
“Conversation summarization is much harder than document summary,” Kandula says. “Documents are more structured. In a chat, people talk out of context and it’s not connected.”
This cross-lingual speech summarization would minimize time spent on the tedious task of looking through conversations for data. Analysts could customize their reports to bulleted lists, short paragraphs, or more lengthy summaries of conversation topics.
“As industry practitioners, our goal is building a deployable system,” Kandula says. “At SCADS, I can communicate directly with the end user and find out the problems they are facing. With that immediate feedback, I can build a system more curated to them.”
The Final Readout
Research at SCADS this summer emphasized the technology, the application space, and the people who will be using it. The conference ended with a final readout day where participants shared their results with stakeholders through 60-second lightning talks.
In the first session, they shared the TLDR-enabling technologies they developed, like Kandula’s conversation summarizer. In session two, they showcased many of these capabilities in action through live system demonstrations, like Riley’s Netflix-inspired searchable interface. After lunch, they returned for the final session on evaluation and user-focused projects, like De’s research on trusting what the AI recommends.
The final presentations aimed to align this emerging technology with the cognitive needs and workflows of analysts, ensuring that the systems we develop are efficient, user-friendly, and trustworthy.
After the lightning talks, participants held interactive poster sessions where they connected with stakeholders, shared their prototypes, and answered questions.
Participants gained a deeper understanding of the intelligence community’s needs and how innovative data science and technology can help.
“I think it goes underappreciated how much work the intelligence community does to keep the country safe,” says Harrison, the professor from Worcester Polytechnic Institute. “I’m coming away hopeful because there’s a lot of technology and conceptual ideas that could shape it for the better. SCADS is targeting that and knows what the needs are, and there’s cause for hope.”
LAS plans to publish a technical report with complete findings later this year. Applications for SCADS 2025 will open for industry and faculty participants in October and graduate students in December.
About LAS
The Laboratory for Analytic Sciences is a partnership between the intelligence community and North Carolina State University that develops innovative technology and tradecraft to help solve mission-relevant problems. Founded in 2013 by the National Security Agency and NC State, each year LAS brings together collaborators from three sectors – industry, academia and government – to conduct research that has a direct impact on national security.