The Laboratory for Analytic Sciences (LAS) is an academic-industry-government partnership focused on advancing analysis through the development of new methods, techniques, and tools. LAS is designed to facilitate projects organized around specific challenges and applications of interest to the Intelligence Community (IC). LAS participants work together, in the context of mission-relevant examples, to develop broadly applicable technology and tradecraft that can be applied with operational impact.
All LAS projects are required to be mission-relevant, which means that they address questions of interest to the IC. LAS projects are also selected to encourage collaboration among academia, industry, and government, which allows us to apply the latest research to IC problems and rapidly prototype solutions. LAS prototyping spans a broad spectrum: writing a new research paper, creating a new analytic technique or tradecraft, developing a specific analytic, or building a research prototype. Almost all of the work done at LAS is unclassified to maximize opportunities for collaboration and to leverage diverse expertise. Deliverables can include documented workflows, storyboards, training aids, implemented technological components, or initial prototype systems.
For 2019, LAS will support work relating to four research challenges, described in Section 2, and a variety of mission-relevant applications, described in Section 3. For a high-level description of current IC priorities, see the National Intelligence Strategy of the United States of America 2014.
One way that work is organized at LAS is around challenges that affect the development and application of analytic tradecraft and technology across most, if not all, application domains. This section describes four key analytic research challenges and provides examples of research topics that we think could drive understanding and development of analytic solutions. LAS prioritizes projects that address some aspect of the challenges described in this section.
2.1 Human-Machine Collaboration
Analysts traditionally view their computers as tools: something the analyst uses to accomplish their objectives. In this interaction model, the computational resources available to an analyst are vast, but the cognitive effort an analyst needs to make use of those resources can be equally vast. Analysts will often spend more time deciding which computational tool to use, or how best to use it, than the computer takes to actually run the computation. These tasks come on top of the even more important cognitive work of interpreting and synthesizing the results of the computation.
The challenge of human-machine collaboration is to enable a computer system to act as a true collaborator in the analysis process, as opposed to a simple tool. A collaborative computer would be able to decide on, set up, and run analytics without requiring a human analyst to interrupt their own workflow either to initiate the computation or to integrate the results. Additionally, the analytics chosen by a collaborative computer could include those that the analyst never would or could have chosen to perform themselves.
While not an exhaustive list, LAS would be interested in work that could help address some of the following open questions within the challenge of human-machine collaboration:
- What are the right tasks for the machine to take on? Are there systems that can help us observe workflows, classify workflows, and identify points in those workflows that are appropriate for technological interventions? How can we make the machine more proactive?
- How should the machine represent the workflow of the human?
- Do collaboration methods between humans translate to collaboration between machines and humans?
- How does the human-machine interface support human-machine collaboration? How does the machine display the results of its computation? How much explicit direction should the human be giving the machine?
- What are workflow features that are indicators of successful workflows and sub-workflows? Can we recommend better workflows for the user? What is the right “nudge” to give a user to improve their workflow?
- Can we better understand the “science of personalization”? How can analysts receive computational assistance tailored for them and their tasks in real time? Can we easily develop machine learning models that are tailored to an analyst’s task? Can we better design technologies to help analysts by tailoring them to an analyst?
2.2 Integrity for Analytic Methods
Analytic methods used to infer relevant information from underlying datasets can be subject to a variety of complications that may negatively affect the accuracy and trustworthiness of the resulting product. Human expertise and mathematical models are both susceptible to biases. Corrupt or incomplete data can be used to give an inaccurate picture of a situation. A lack of context about a dataset or analytic product may result in that information being interpreted and used incorrectly.
The challenge of integrity in analytic methods is to enable analytic tradecraft that provides high confidence in analytic products in the face of these potential complications. This is not a new challenge for analysis, but it has drawn increased interest as more automated methods, enabled by advancements in artificial intelligence and machine learning, are integrated into analysts’ tradecraft.
While not an exhaustive list, LAS would be interested in work that could help address some of the following open questions within the challenge of integrity for analytic methods:
- How do we ensure that the mathematical models learned through machine learning techniques are accurate and up to date? What about the mental models of subject matter experts? How do we build trust in the output of these models?
- What sort of knowledge infrastructure is needed to scale the development and training of accurate and relevant models? How can this infrastructure be used to evaluate the performance of these models?
- How can we provide privacy protections while enabling the large-scale application of machine learning techniques? How do we ensure that models are constructed and used in such a way that civil liberties are not violated?
- How can we protect and validate the training, evaluation, and deployment of machine learning models to ensure that adversaries are not able to manipulate or defeat them?
- What tools, techniques or training could be effective at integrating critical thinking and analytic rigor into an analyst’s tradecraft? How can we measure the effectiveness of analytic methods and tradecraft at ensuring or encouraging analytic integrity?
- Are there ways to provide clear, accurate assessments of the confidence of the output of both manual and automated analytic methods? Are there better ways to communicate confidence than probabilities and percentages?
- Can we help analysts communicate and understand the provenance of the data underlying their analytic method and any processing or inference that has been applied to the data? Can we characterize the sensitivity of analytic methods to errors or gaps in the underlying data?
While these are broad research interests, new and continuing proposals selected in this area will coordinate with existing LAS activities in support of the Integrity for Analytic Methods objectives. For context, the following are examples of activities where LAS has pursued these objectives in the past:
- Development and evaluation of tools to support structured analytic tradecraft;
- Focused observational experiments regarding changes to analytic tradecraft and procedures;
- Tools for labeling data, validating models, and supporting active learning for machine learning practitioners;
- Recommenders for analytic workflows, and domain-specific languages (with associated tools) for describing analytic workflows.
2.3 Forecasting and Anticipation
One of the primary objectives of intelligence analysis is to enable decision-makers to avoid tactical and strategic surprises. Despite this need, current processes for analyzing threats and opportunities in future scenarios are not well supported by either technology or training. Although some analysts have been shown to be more effective than others at these future-oriented tasks, the reasons why are not well understood. The challenge of forecasting and anticipation is to improve analysts’ abilities both to anticipate future scenarios and to forecast their likelihood. Anticipating future scenarios involves generating ideas about the conditions under which events occur, identifying their second-, third-, and n-th-order consequences, and developing explicit potential alternatives to a given scenario. Forecasting involves assessing the relative likelihood of alternative scenarios and outcomes, understanding and characterizing the conditional likelihood of a scenario and the events on which it may be conditional, and communicating the forecast to decision-makers.
While not an exhaustive list, LAS would be interested in work that could help address some of the following open questions within the challenge of forecasting and anticipation:
- How can we train and develop analysts’ skills to think about the future? What are the underlying cognitive mechanisms used to think about the future? How can decision aids support creating mental models of the future?
- Can we develop tools and methodologies to support future-oriented analysis? How should data and analytic results be integrated into these tools and methodologies? What automation and analytics can enable future-oriented analysis?
- How can we measure an individual’s ability to think about and anticipate future scenarios? How do we measure the success of future-oriented tools and methodologies?
- How can we use available data to generate timely forecasts for well-defined future events and their characteristics (i.e., who, what, when, where, and how)? How should we update these forecasts based on newly discovered or streaming data?
- How should we communicate the output of future-oriented analyses to decision-makers and other analysts in a simple and streamlined fashion? How can we communicate uncertainty in these forecasts? How can we distinguish what is known and not known?
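The questions above about measuring forecasting ability can be grounded in standard proper scoring rules. As one illustration only (not a metric prescribed by LAS), the sketch below computes the Brier score: the mean squared difference between forecast probabilities and observed 0/1 outcomes, where lower is better. The analyst names and forecasts are hypothetical.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes.

    0.0 is perfect; always forecasting 0.5 scores exactly 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    for p in forecasts:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    for o in outcomes:
        if o not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1: {o}")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical forecasts for the same four events, scored once they resolve.
outcomes = [1, 0, 0, 1]
analysts = {
    "analyst_a": [0.9, 0.2, 0.1, 0.7],
    "analyst_b": [0.6, 0.5, 0.4, 0.5],
    "uninformed": [0.5, 0.5, 0.5, 0.5],
}
for name, probs in analysts.items():
    print(f"{name}: {brier_score(probs, outcomes):.3f}")
```

In this made-up example, analyst_a scores well below the 0.25 of an uninformed 50/50 forecaster. A single Brier score mixes calibration and discrimination, so in practice it would likely be paired with a calibration check.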
2.4 Analytics, Artificial Intelligence, and Machine Learning
LAS is interested in developing and deploying new methods, techniques, and tools for analyzing data in the context of mission-relevant applications. Of particular interest are questions that arise in the context of the “4 Vs” of big data: volume, velocity, variety, and veracity. While LAS may fund a limited amount of basic methodological development, our primary interest is in the creation of methods, techniques, and tools to address specific applications (Section 3). Methodological areas of interest include but are not limited to:
- Triage of an unstructured information corpus, to include topic identification, document clustering, and automatic summarization. How can analysts understand increasingly large sets of data and rapidly triage the data to find the information they need?
- Sensemaking is the collection and organization of information for deeper understanding to facilitate insight and subsequent action. How can technology support the hypothesize-test-evaluate discovery cycle? What methods allow for goal and intent recognition? Can multimedia narratives be automatically generated from corpora of data, including unstructured text?
- Inference and uncertainty quantification for heterogeneous information. How do we combine heterogeneous data to characterize behavior and understand the uncertainties of our inferences? How do we assess the utility of open source data from the internet, press, television, video, photos, and social media?
- Visualization. What are the most effective visualization strategies for particular analytic tasks? What individual user characteristics make visualizations effective for particular analysts? Is there a correlation between task complexity and the utility of personalized visualizations?
- Anomaly detection. What methods are effective for anomaly detection in spatio-temporal and network data?
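As a point of reference for the anomaly detection topic, one very simple baseline for temporal data flags any point that lies more than k standard deviations from the mean of a trailing window. The sketch below is illustrative, not a recommended method: the window size, threshold, and daily event counts are hypothetical assumptions, and it does not address the spatial or network structure the question asks about.

```python
import statistics
from typing import List, Sequence


def rolling_zscore_anomalies(
    series: Sequence[float], window: int = 7, threshold: float = 3.0
) -> List[int]:
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    anomalies = []
    for i in range(window, len(series)):
        history = list(series[i - window : i])
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if series[i] != mean:
                anomalies.append(i)
            continue
        if abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily event counts with a single spike on day 10.
counts = [20, 22, 19, 21, 20, 23, 22, 21, 20, 22, 60, 21, 20]
print(rolling_zscore_anomalies(counts))  # [10]
```

A trailing baseline like this adapts to slow drift, but a single spike also inflates the baseline for the following window, reducing sensitivity there; robust statistics such as the median and median absolute deviation are a common alternative.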
Most projects at LAS can also be organized around application domains. These are meant to be used as examples or experiments to motivate, demonstrate, or test the possible mission benefit of analytic methods, tools, and tradecraft. The potential application areas for 2019 have been chosen, in part, to facilitate project-level engagement with analysts at LAS. This section describes these application areas in more detail and provides examples of specific applied research questions of interest to LAS stakeholders. LAS prioritizes projects that address one of the analytic challenges in Section 2 within the context of one of the application areas in Section 3.
3.1 Cybersecurity and Insider Threat
Great progress has been achieved in the areas of detecting cyber attacks, designing defensible networks, and maintaining situational awareness of network activity. However, much work remains to be done, as the volume, variety, and severity of intrusions continue to outpace human analysts’ ability to scale their analysis.
At its core, cybersecurity is still essentially reactive. For 2019, LAS is interested in moving beyond the reactive to enable scalable defense without increasing the number of analysts. How do we fundamentally change the playing field to level it, or even to give defenders the advantage? How do we increase the odds of detecting malicious activity while decreasing the time and effort required to do so? The topics of interest intersect with all of the challenges described in Section 2. Potential questions include, but are not limited to:
- Moving beyond anomaly detection: Can we use anticipatory analytic tradecraft to anticipate or predict a cyber event or adversary action? Can we anticipate or predict notable events in noisy environments without reliance on hard signatures or pattern matching schemes? How is that event or action modeled and updated over time? Can we devise methods to steer adversarial actions away from their strengths and toward ours?
- Scaling through technology-enabled tradecraft: How can we make analysts’ lives easier while simultaneously improving performance and accuracy? How do we let analysts spend time only on tasks that machines cannot yet do and that truly require a trained human? Can we develop techniques that use “big data” to improve performance and accuracy, without expecting an analyst to sift through all of it, and thereby optimize time-bounded human-machine interactions?
- Technology-enabled structured analytic tradecraft environment: How do we make use of cyber knowledge sharing mechanisms to enhance automated and human collaborative analysis? What computing platforms support structured analytic tradecraft and analytic rigor within a cyber context?
- Managing and modeling cyber analytics: Could an analyst simply request data related to a given intrusion, vulnerability, or technique, and get everything relevant?
- Robust cyber indications and warning: Can we combine multiple weak, high-false-positive signatures to generate high-value, low-false-positive strong indicators? Can we incorporate other knowledge to eliminate false positives? Are there better techniques to triage, prioritize, and discover threats and indications in time-bound, context-sensitive environments? Can we provide related (possibly “non-cyber”) context to alerts?
- Incorporating disparate sources: How can we significantly reduce the manual processes involved in correlating intrusion detection system alerts with text reports of intrusion activity? Similarly, can open source knowledge be used to correlate seemingly unassociated events? Can the integration of disparate data create more meaningful alerts? Is it possible to enhance cyber defense success without examining packets on a “wire”?
- Insider threat detection: Can we model individual user behaviors, identifying both outliers and significant changes? Is there a technique to find behavioral evidence that will lead to discovery of a prior compromise in a system or network? Can we anticipate threat behavior with enough confidence to take action?
- High-confidence attribution: Are there reliable methods to attribute and track the phylogeny of malicious code? How robust and specific does the malicious behavior model have to be? What contextual information is of most benefit?
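To make the question of combining weak, high-false-positive signatures concrete, the sketch below fuses signature outcomes with a naive-Bayes log-odds update. It rests on strong and often unrealistic assumptions: that signatures fire independently given whether activity is malicious, and that each signature's detection and false-positive rates are known. The signature names, rates, and prior are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Signature:
    """A detection signature with (hypothetical) known error rates in (0, 1)."""
    name: str
    true_positive_rate: float   # P(signature fires | malicious)
    false_positive_rate: float  # P(signature fires | benign)


def posterior_malicious(
    signatures: Iterable[Signature], fired: Dict[str, bool], prior: float
) -> float:
    """Return P(malicious | which signatures fired) via naive-Bayes fusion.

    Assumes signatures fire independently given the true class, so each
    signature's log-likelihood ratio is simply added to the prior log-odds.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for sig in signatures:
        tpr, fpr = sig.true_positive_rate, sig.false_positive_rate
        if fired[sig.name]:
            # A hit is evidence for malice, weighted by how much likelier it is.
            log_odds += math.log(tpr / fpr)
        else:
            # A miss is (weaker) evidence against malice.
            log_odds += math.log((1.0 - tpr) / (1.0 - fpr))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical weak signatures, each firing on 10-20% of benign sessions.
sigs = [
    Signature("unusual_port", 0.8, 0.2),
    Signature("rare_user_agent", 0.7, 0.15),
    Signature("off_hours_login", 0.6, 0.1),
]
prior = 0.01  # hypothetical base rate of malicious sessions

one_fired = {"unusual_port": True, "rare_user_agent": False, "off_hours_login": False}
all_fired = {s.name: True for s in sigs}
print(f"one signature: {posterior_malicious(sigs, one_fired, prior):.4f}")
print(f"all three:     {posterior_malicious(sigs, all_fired, prior):.4f}")
```

In this made-up example, one signature firing while the other two stay quiet leaves the posterior below the 1% prior, whereas all three firing together raise it to roughly 53%. Real signatures are frequently correlated, which would make this kind of estimate overconfident.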
3.2 Industrial Internet of Things and Critical Infrastructure
Critical infrastructures affect all areas of daily life, including: electric power, natural gas and petroleum production and distribution, telecommunications (information and communications), transportation, water supply, banking and finance, emergency and government services, agriculture, and other fundamental systems and services that are critical to the security, economic prosperity, and social well-being of the nation. It is a national priority to keep this critical infrastructure robust against disruptions from either unexpected situations or malicious attacks. In 2019, LAS is interested in research projects focused on understanding two particular aspects of the critical infrastructure problem: the interdependencies within the nation’s critical infrastructure and the security implications that come from the introduction of Industrial Internet of Things (IIoT) devices into the infrastructure.
The nation’s critical infrastructure is highly interconnected and mutually dependent in complex ways: physically, geographically, and logically, through a host of information and communication technologies. These interdependencies and the resulting infrastructure topologies can create subtle interactions and feedback mechanisms that often lead to unintended behaviors and consequences during disruptions. Identifying, understanding, and analyzing such interdependencies is essential to assessing the robustness of the overall infrastructure. A number of Industrial Control Systems (ICS) users (e.g., smart transportation, smart grids, and smart medical) are transitioning to architectures that integrate cyber physical systems (CPS) with the Internet of Things using cloud computing services. These IIoT architectures give ICS users better visibility into and insight about operations and assets; however, they also have the potential to introduce numerous security challenges into critical infrastructures.
In 2018, LAS established an IoT Laboratory at NC State University to perform controlled experiments to research and analyze communications of low-power, long-range IoT networks and devices, such as those used in IIoT architectures. This laboratory may be a resource to projects addressing some of the following applied questions in critical infrastructure and IIoT that LAS is interested in pursuing in 2019.
The topics of interest intersect with all of the challenges described in Section 2. Potential questions include, but are not limited to:
- How do we reduce the security risks to critical infrastructure that migrates to an IIoT architecture? How can network owners implement security that mitigates risks without impacting productivity?
- How can we help decision-makers quickly identify threats and define problems in interdependent critical infrastructure systems? How can we identify normal and abnormal activity in these complex systems? How might we identify the interdependencies of any specific sector, and what interventions might be necessary?
- How can we analyze and understand the large volumes of data/information generated by IIoT networks to provide a common operating picture of the environment? Can we use the communication behavior of devices (not simply the content of their communications) to understand more about the physical environment? Can we translate this understanding into improved analytics?
- What data/information is needed to better anticipate and mitigate tactical, operational and strategic surprises? How can we provide a common operating picture between the cyber and physical environments? What about across interdependent sectors?
3.3 Computational Social Science
LAS is interested in studying social phenomena of national security interest. Of particular interest are projects in which social science questions can be effectively studied by applying computational social science methods to model, simulate, and analyze the phenomena of interest. Some of these projects may require solving analytics challenges, as described in Section 2.4. The following are examples of the kinds of questions that might be of interest, but proposals for other questions and areas of investigation are also welcome.
- Impact of Instability. Fragile and failing states frequently engage in armed conflict for power, and this conflict is often underpinned by illicit economies connected to global illicit networks. As these states drift further into chaos, illicit networks become an attractive option for warlords and warring factions to gain necessary resources, further destabilizing the situation. Complicating the matter, outsiders take advantage of instability for their own illicit activities, such as using refugee camps as a breeding ground for radicalization and as a source of manpower for terrorist acts. More stable state and non-state actors may use failing states as staging grounds for their influence campaigns or to mask their own illicit activities. Can we understand the relative importance of factors that signal the impending collapse or deterioration of a fragile or failing state? How well do machine learning models trained on historical events predict failing states? What are the critical elements that facilitate existing means and methods of illicit transactions that support conflict states? Can critical paths be identified in clandestine (non-attributable) and covert (mis-attributable) networks?
- Adverse Influence. Human beings, both singly and in groups, are influenced every day from every angle. Examples abound, from Russian activities in Ukraine and its near abroad to the influence that mass media, social-media-based “fake news,” or cyber campaigns had on recent US and European elections. Influence on decision-making is difficult to observe and weigh. Some research shows that even when we recognize an attempt to influence us, that recognition does not nullify the effect of the influence. What are scientifically rigorous methods to measure the effects of direct and indirect influence? What methods and tools are effective for identifying key direct and indirect influencing strategies for high-stakes decision-making, such as those used in marketing, lobbying, fundraising, recruiting, propaganda, and information campaigns?
- Veracity and Reliability of Media. Self-communication platforms have generated a myriad of outlets and news producers. This has produced new patterns of information dissemination that compete with mainstream narratives and, at the same time, the reliability of well-known traditional media outlets has been challenged. It has become increasingly difficult to distinguish reliable sources from those that have ulterior motives. What features are predictive of reliable media? Is it possible to automatically detect propaganda?
3.4 Emerging Technology
LAS is interested in understanding the risks and opportunities presented by emerging technology domains as well as the tradecraft around understanding and providing related intelligence. Within these domains, LAS is particularly interested in projects that address some of the following applied research questions:
- What would a repeatable process to help analysts and models bootstrap learning about a new domain look like? How could this process be automated? What machine learning techniques can be applied to support domain learning?
- What processes exist for quickly building and maintaining a robust knowledge base for an emerging technology domain? How can we detect bias and assess veracity when drawing on various sources for this information?
- What types of dashboards or systems would an analyst need to make sense of the data, information, and knowledge?
- What are the security implications and second-order effects that may result from capabilities enabled by the emerging technologies? What other things might have to adapt to these new technologies?
The following are specific examples of emerging technology domains that could potentially drive applied research experiments and development at LAS in 2019:
- Anonymization Services and Encryption
- Blockchains, Cryptocurrencies, and Smart Contracts
- Data Brokerages and Data Aggregation Services
These examples are in addition to the applications discussed elsewhere in Section 3.
Dr. Matt Schmidt, LAS Director of Programs, firstname.lastname@example.org
Jamie Roseborough, Project Manager, email@example.com