Symposium 2024 Blog Posts

Internet Routing Integrity 

LAS: Felecia ML., James S., Donita R., Skip S.
Academic Collaborator: The University of North Carolina at Pembroke


Background

Picture this: You’re settling in for some routine online banking, coffee in hand, ready to pay your monthly bills. The website looks exactly like your bank’s—same logo, same layout, same everything. You type in your username and password, confident in your computer’s up-to-date security systems and antivirus protection.  But something has gone terribly wrong.

Without your knowledge, cyber criminals have just intercepted your login credentials. Not because of a weak password.  Not because of malware on your device. But because they’ve pulled off something far more insidious: a Border Gateway Protocol (BGP) hijacking attack.

In this sophisticated attack, hackers have essentially redrawn the Internet’s road map, redirecting your bank’s entire web traffic through their own networks.

The most unsettling part? This vulnerability exists because many of our Internet security protocols remain optional. Despite the availability of powerful security frameworks like Resource Public Key Infrastructure (RPKI) and Wi-Fi Protected Access 2 (WPA2), vast sections of the Internet operate without these crucial safeguards.  It’s akin to leaving your front door unlocked in a neighborhood where everyone knows the value of locks—but installing them is somehow still considered optional.

This precarious situation hasn’t gone unnoticed. U.S. policymakers are now sounding the alarm, pushing for mandatory implementation of these security measures—particularly RPKI—to fortify our digital infrastructure.  The stakes couldn’t be higher: in an increasingly connected world, the integrity of Internet routing isn’t just about protecting individual users—it’s about safeguarding our entire digital economy.

To support this initiative, the Laboratory for Analytic Sciences (LAS) is collaborating with senior design students from the University of North Carolina at Pembroke (UNCP) and leading UNCP cybersecurity professor, Dr. Prashanth BusiReddyGari, throughout the 2024 academic year to develop a system that utilizes Artificial Intelligence and Machine Learning techniques to analyze and visualize the integrity of the internet across the U.S.

Resource Public Key Infrastructure (RPKI)

RPKI and its Route Origin Authorizations (ROAs) offer essential safeguards for global internet routing. Without them, network operators may unknowingly or maliciously announce incorrect routes, leading to internet disruptions, degraded traffic flow, and inaccessible networks. When used properly, they enable network operators to detect and reject fraudulent or erroneous BGP route announcements, helping prevent both cyberattacks and accidental routing errors. Ensuring routing integrity depends on widespread adoption of RPKI and ROAs by network operators.

A Route Origin Authorization (ROA) is a cryptographically signed digital object within the RPKI framework. It explicitly authorizes a specific Autonomous System Number (ASN) to originate routes for a particular IP address prefix, up to a stated maximum prefix length. When BGP announcements propagate through networks, routers can verify each route’s legitimacy by checking its prefix and origin ASN against valid ROAs. This verification process helps prevent route hijacking and unauthorized routing changes.

The RPKI validity states are as follows:

  • Not Found (reported as Unknown by some tools)
    • No ROA exists that covers the announced prefix
    • The route announcement cannot be validated against RPKI data
    • This is the default state for prefixes without RPKI coverage
  • Valid 
    • The route announcement matches an existing ROA
    • Both the prefix and originating ASN align with the ROA’s specifications
    • The announced prefix length does not exceed the ROA’s maximum length
  • Invalid
    • At least one ROA covers the announced prefix, but none of them matches the announcement, because for each covering ROA either:
      • The originating ASN is not the ASN authorized by the ROA
      • The announced prefix length exceeds the maximum length specified in the ROA
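
The three states above can be sketched in a few lines of Python. This is a minimal illustration of route origin validation, not the project's implementation: the `Roa` class and `validate_route` function are hypothetical names, each ROA is simplified to a single prefix, and the prefixes and ASNs used with it come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Roa:
    """Simplified ROA (an assumption): one prefix, one origin ASN, one max length."""
    prefix: str
    asn: int
    max_length: int


def validate_route(prefix: str, origin_asn: int, roas: list[Roa]) -> str:
    """Classify a BGP announcement as 'valid', 'invalid', or 'not-found'."""
    announced = ip_network(prefix)
    # A ROA covers the route when the announced prefix lies inside the ROA prefix.
    covering = [
        roa for roa in roas
        if ip_network(roa.prefix).version == announced.version
        and announced.subnet_of(ip_network(roa.prefix))
    ]
    if not covering:
        return "not-found"
    # Valid requires a single covering ROA to match both the ASN and the length.
    for roa in covering:
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"
```

For example, with a single ROA `Roa("192.0.2.0/24", 64496, 24)`, an announcement of 192.0.2.0/24 from AS64496 is valid, the same prefix from AS64497 is invalid, a more specific 192.0.2.0/25 from AS64496 is invalid because it exceeds the maximum length, and 198.51.100.0/24 is not found.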

The Datasets

The American Registry for Internet Numbers (ARIN) serves as the regional Internet registry responsible for managing Internet resources across the United States, Canada, and various Caribbean and North Atlantic islands. ARIN’s Bulk WhoIs database includes a range of critical information, such as

  1. registered organizations,
  2. autonomous system numbers (ASNs),
  3. IP address allocations, and
  4. contact points for each registered entity. 

Since the Bulk WhoIs dataset only captures a snapshot in time, the LAS and UNCP developed a MongoDB database to efficiently store and organize both current and historical snapshots of this dataset.

However, relying solely on the Bulk WhoIs dataset doesn’t provide sufficient information to determine each organization’s RPKI validity status. To address this, the team designed a process using additional internet tools and services, including Routinator 3000, RIPE NCC services, and MaxMind’s GeoLite2 databases, to conduct route origin validation based on ARIN’s Bulk WhoIs data.

By combining the Bulk WhoIs data with route origin validation results, the team is now able to effectively capture and visualize the RPKI status for each ARIN-registered organization (network operator). While there are other regional internet registries worldwide (see Figure 1), this proof-of-concept focused on ARIN’s registry specifically.

Figure 1: Worldwide Internet Routing Registries

Cybersecurity and Infrastructure Security Agency (CISA) Sector Categorization Using AI

There are 16 critical infrastructure sectors defined by the Cybersecurity and Infrastructure Security Agency (CISA).  Because there are over 3 million registered organizations in ARIN, the team developed a process to categorize the ARIN organizations by sectors using Sentence Transformer and ChatGPT.  This allows the user to quickly identify organizations by sector and identify which critical infrastructures are pushing the adoption of RPKI.

Sentence Transformer embeds sentences into high-dimensional vectors that capture semantic information, supporting tasks such as sentence similarity, clustering, and categorization.  The ARIN organization names were encoded along with all 16 CISA sectors into feature vectors, and each organization name was compared against the sectors by cosine similarity.  Any organization name that showed no meaningful cosine similarity to a sector (i.e., could not be categorized) was sent to the GPT-4o Mini API to determine the sector. If the organization name still could not be categorized by sector, it was categorized as “undetermined”.
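
The two-stage process described above can be sketched as follows. This is an illustration under stated assumptions rather than the team's code: the function names, the similarity threshold of 0.5, and the three-dimensional toy vectors are all hypothetical, real embeddings would come from a Sentence Transformer model, and the `fallback` callable stands in for the GPT-4o Mini request.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def categorize(name, name_vec, sector_vecs, threshold=0.5, fallback=None):
    """Pick the closest sector; defer to a fallback, then to 'undetermined'."""
    best_sector, best_score = None, -1.0
    for sector, vec in sector_vecs.items():
        score = cosine_similarity(name_vec, vec)
        if score > best_score:
            best_sector, best_score = sector, score
    # Stage 1: accept the embedding match only if it is similar enough.
    if best_sector is not None and best_score >= threshold:
        return best_sector
    # Stage 2: ask the fallback (e.g. an LLM call) and accept only known sectors.
    if fallback is not None:
        answer = fallback(name)
        if answer in sector_vecs:
            return answer
    return "undetermined"


# Toy sector vectors (hypothetical values chosen for illustration only).
SECTOR_VECS = {"Energy": [1.0, 0.0, 0.0], "Financial Services": [0.0, 1.0, 0.0]}

print(categorize("Example Power Co", [0.9, 0.1, 0.0], SECTOR_VECS))
print(categorize("Allan Young", [0.0, 0.0, 1.0], SECTOR_VECS))
```

Restricting the fallback's answer to the known sector list keeps a free-text model response from introducing categories outside the 16 CISA sectors.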

The preliminary results of categorizing ARIN’s Bulk WhoIs data into CISA sectors using Sentence Transformer and GPT-4o Mini are:

  • Total Number of US-Registered Organizations: 1,815,961
  • Categorized Organizations: 1,783,123 organizations were successfully categorized under 16 CISA sectors.
  • Unclassified/Unrecognized Organizations: 32,838 (1.8%) of the 1,815,961 organizations could not be classified by Sentence Transformer or the GPT-4o Mini API and are therefore excluded.
  • CISA Sub-Sector Categorized: 1,783,123 (98.2%) of the 1,815,961 organizations were successfully categorized under the 16 CISA sub-sectors.

Figure 2: CISA Sector Categorization of the ARIN Registered Organizations Using Sentence Transformer and ChatGPT 4o Mini

Figure 3: ARIN Registered Organizations Categorized by Sector

The Capability

An integrated analysis platform (Figure 4) was developed that combines a MongoDB backend database with a user-friendly web interface to visualize and explore RPKI adoption trends across ARIN’s jurisdiction. The platform processes data directly from ARIN’s registration database, presenting it through interactive visualizations including a pie chart showing RPKI validation state distribution and a dynamic U.S. map displaying regional adoption patterns. Further, an interactive treemap was added to display CISA sectors, allowing users to hover over each sector and subsector to view their respective RPKI adoption percentages.

Figure 4: Internet Routing Integrity Analysis Platform

A search feature (Figure 5) allows users to search by organization name, ASN, or IP prefix to access detailed RPKI validation statistics, with results displaying aggregated validation states and adoption metrics for each queried entity. The search pages are interactive: from an organization page, users can navigate to a specific ASN or network page, and vice versa. Users can also check the validation status of a network’s netblocks when paired with specific ASNs, making it possible to see which ASN and netblock pairs are valid and which have another status.

This comprehensive approach enables network operators, researchers, and policymakers to better understand and monitor RPKI deployment across U.S. networks. By exploring specifics of the dataset interactively, they can make informed decisions about individual networks or overall policies for internet routing security in a quickly evolving landscape.

Figure 5: Internet Routing Integrity Analysis Platform: Search Results

Outcomes

The initial ARIN dataset contained 3,418,280 networks, which generated 3,457,979 prefixes from the associated netblocks. However, only 800,845 of these networks included the origin AS numbers required to validate a route. To extend coverage, RIPE’s Whois service was used to identify additional origin AS numbers, yielding 88,834 more routes that could be validated, for a total of 889,679. This increased the number of routes available for validation by about 11% and improved the dataset’s completeness.  After validating the routes, the following results were produced:

ARIN

Total routes: 889,679

  • Valid: 62,547
  • Invalid:  429,711
  • Not-found: 397,421

The results deviated from the initial assumption that route states would rank, from largest to smallest, as valid, not found, and invalid. To diagnose this discrepancy, the validation process was reanalyzed, with particular attention to the additional routes derived from RIPE’s Whois service. These routes, though relatively few, were separated out to determine their influence; removing them altered the overall results only slightly. To investigate further, all ARIN prefixes were cross-referenced with RIPE’s Whois service to look for comparative insights that could explain the observed distribution, producing the following results:

RIS Whois

Total routes: 89,410

  • Valid: 31,695
  • Invalid: 1,750
  • Not-found: 55,965

Although this process yielded only a small number of routes, the results came closer to the initial assumption: invalid routes formed by far the smallest group, although not-found routes (55,965) still outnumbered valid routes (31,695). To refine the analysis further, prefixes were restricted to networks updated within the past two years. This approach was based on the theory that assessing RPKI certificates within their two-year lifespan, during which resource ownership is verifiable, would produce a distribution more consistent with the original expectations.  The following was the result:

ARIN 2021 – 2023

Total routes: 216,059

  • Valid: 8,467
  • Invalid: 162,856
  • Not-found: 44,736
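
The three result sets are easier to compare as shares of their totals. The short sketch below recomputes those shares and the ranking of states from the counts reported above; the names `RESULTS`, `shares`, and `ranking` are illustrative and not part of the project.

```python
# Route counts copied from the three result sets reported in this section.
RESULTS = {
    "ARIN": {"valid": 62_547, "invalid": 429_711, "not-found": 397_421},
    "RIS Whois": {"valid": 31_695, "invalid": 1_750, "not-found": 55_965},
    "ARIN 2021-2023": {"valid": 8_467, "invalid": 162_856, "not-found": 44_736},
}


def shares(counts):
    """Return each state's percentage of the total route count."""
    total = sum(counts.values())
    return {state: 100 * n / total for state, n in counts.items()}


def ranking(counts):
    """Return the states ordered from most to fewest routes."""
    return sorted(counts, key=counts.get, reverse=True)


for name, counts in RESULTS.items():
    pct = shares(counts)
    summary = ", ".join(f"{state} {pct[state]:.1f}%" for state in ranking(counts))
    print(f"{name} ({sum(counts.values()):,} routes): {summary}")
```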

The results still fell short of expectations, likely due to the voluntary nature of ARIN’s data collection, which may leave resource holders’ information outdated or poorly maintained. This issue is evident in several inconsistencies within the ARIN dataset. For example, there is no strict policy for naming organizations, resulting in variations like “Amazon, Inc” and “Amazon, Inc.” differing only by a period. Some organization names begin with unusual characters such as commas or plus signs, while others are formatted as personal names followed by long strings of numbers, such as “Allan Young02112011103952126.” These irregularities complicated the classification process and introduced inaccuracies. Discrepancies were also noted in the recorded locations of registered organizations, further reducing the dataset’s reliability for analysis.
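
One way to reduce the naming variations described above is to normalize names before grouping or classifying them. The sketch below is only an illustration: the function name, the choice of punctuation to strip, and the rule that eight or more trailing digits are noise are all assumptions, and rules like these risk merging organizations that are genuinely distinct.

```python
import re
import unicodedata


def normalize_org_name(name: str) -> str:
    """Return a comparison key for an organization name (heuristic, lossy)."""
    text = unicodedata.normalize("NFKC", name).strip()
    # Drop long trailing digit runs, as in "Allan Young02112011103952126".
    text = re.sub(r"\d{8,}$", "", text)
    # Remove leading and trailing punctuation such as commas, plus signs, periods.
    text = text.strip(" \t,+.;:-")
    # Collapse internal whitespace and ignore case when comparing.
    text = re.sub(r"\s+", " ", text)
    return text.casefold()


print(normalize_org_name("Amazon, Inc") == normalize_org_name("Amazon, Inc."))
print(normalize_org_name("Allan Young02112011103952126"))
```

The key is intended for matching only; the original registered name should still be stored and displayed.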

ARIN utilizes ISO 3166 country codes to specify organization locations, with the iso3166-1 field indicating the country code and name and the iso3166-2 field representing the state or province code. However, inconsistencies in this data create challenges. For example, Puerto Rico may appear as iso3166-1: “PR”, “Puerto Rico” with iso3166-2: “PR”, or alternatively as iso3166-1: “US”, “United States” with iso3166-2: “Puerto Rico”. These variations result in numerous edge cases that complicate the determination of an organization’s registered location, requiring additional handling to ensure accurate classification.
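
The Puerto Rico case above can be handled by mapping both variants to a single canonical pair. This is a minimal sketch under stated assumptions: the function and table names are hypothetical, only the Puerto Rico example from the text is included in the table, and treating the territory as a subdivision of “US” (rather than as its own country code) is a design choice, not something the ARIN data prescribes.

```python
# Territory code -> name; only the example from the text, extend as needed.
TERRITORY_CODES = {"PR": "Puerto Rico"}
_NAME_TO_CODE = {name.casefold(): code for code, name in TERRITORY_CODES.items()}


def normalize_location(country_code: str, subdivision: str) -> tuple[str, str]:
    """Map the iso3166-1 code and iso3166-2 value to one (country, subdivision) pair."""
    country = country_code.strip().upper()
    sub = subdivision.strip()
    # Variant 1: the territory is recorded as its own country, e.g. ("PR", "PR").
    if country in TERRITORY_CODES:
        return ("US", country)
    # Variant 2: the territory name is spelled out, e.g. ("US", "Puerto Rico").
    if country == "US" and sub.casefold() in _NAME_TO_CODE:
        return ("US", _NAME_TO_CODE[sub.casefold()])
    return (country, sub.upper())


print(normalize_location("PR", "PR"))
print(normalize_location("US", "Puerto Rico"))
```

Both calls yield the same pair, so the two record styles land in the same bucket on the U.S. map.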

Future Direction

This research underscores key recommendations to advance RPKI adoption and strengthen the security of internet routing infrastructure. Enhanced tracking methods are crucial for monitoring RPKI implementation across CISA-defined critical infrastructure sectors, enabling the identification of vulnerabilities over time. To improve ARIN’s data quality, the implementation of standardized naming conventions is recommended, addressing inconsistencies where organizations are listed under multiple variations of their name (e.g., “Amazon,” “Amazon Inc.”). Such improvements would ensure accurate organization tracking and better database management. 

Finally, preparing for the impact of quantum computing is vital, as its capabilities could undermine existing cryptographic systems. Strengthening cryptographic frameworks like RPKI will help secure global internet infrastructure against future quantum-driven cyber threats, ensuring long-term resilience and stability.

About the University of North Carolina at Pembroke

The University of North Carolina at Pembroke (UNCP) is a Native American-Serving Nontribal Institution located in Pembroke, NC and was designated a Center of Academic Excellence in Cyber Defense for 2023-2028.

In 2023, the LAS supported the establishment of a Minority Serving Institution (MSI) Cooperative Research and Development Agreement (CRADA) with UNCP, paving the way for expanded partnerships with the Department of Defense in cybersecurity. Through the MSI CRADA, the LAS has collaborated with Dr. Prashanth BusiReddyGari, Director of the Cyber Defense Education Center at UNCP, on two senior design projects: Generating Synthetic Cyber Knowledge Graphs (Fall 2023/Spring 2024) and Internet Routing Integrity (Spring/Fall 2024). These projects are part of the LAS’s broader commitment to strengthening academic outreach to MSIs in North Carolina, giving students hands-on experience with real-world initiatives that support the DoD’s mission.

This material is based upon work done, in whole or in part, in coordination with the Department of Defense (DoD). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the DoD and/or any agency or entity of the United States Government.