Cloud Infrastructure Visibility at Scale: Automated Resource Discovery and Interactive Topology Mapping

In collaboration with Utrecht University of Applied Sciences

This study examined the automated identification and visualization of cloud resources and their interrelationships within AWS environments. It addresses a critical operational problem: complex infrastructures often grow faster than an organization's ability to document them. As cloud adoption accelerates and infrastructures become more complex, blind spots emerge in the understanding of how resources relate to each other, causing friction in troubleshooting, security audits, and cost optimization. The study examines how combining SQL-based cloud querying with interactive graph visualization can give organizations intuitive, real-time insight into their cloud topology without vendor lock-in.

Research question and methodology

The central research question is: "How can a functionality be developed that generates a visual representation of cloud infrastructure, giving the user a clear overview of their infrastructure and how the various components are connected to each other?"

This question is particularly relevant because many organizations work with under-documented environments, or with infrastructure that falls (partly) outside Infrastructure as Code, which makes gaining insight manually at scale practically impossible.

Research design and techniques

The study follows a three-part methodology based on practical feasibility and business-driven requirements.

First, semi-structured interviews were conducted with three internal stakeholders (the Product Owner, the Business Supervisor, and the Lead Platform Engineer) to prioritize the required information using the MoSCoW method. In line with the project's research objective, the stakeholders considered demonstrating technical feasibility and identifying areas for further development more important than a refined user interface.

Secondly, five candidate tools (CloudMapper, Steampipe, AWS Resource Explorer, InfraMap, and CloudQuery) were evaluated systematically against criteria such as active maintenance, open-source licensing, multi-cloud support, and hierarchical grouping capabilities. Based on this analysis, Steampipe was selected as the data collection engine because of its cloud-agnostic architecture, fully open-source plugin ecosystem, and cost-efficient operation with native AWS Resource Explorer integration.

Thirdly, Cytoscape.js was chosen as the visualization library because of its specialized graph rendering capabilities, native support for compound nodes (for hierarchical representations), and the existing expertise within the development team.
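As a minimal sketch of how compound nodes can encode the hierarchy, the Python function below turns a flat inventory into Cytoscape.js element definitions. The record shape (`id`, `type`, `parent`) and the function name `to_cytoscape_elements` are hypothetical, not the project's data model; the only Cytoscape.js convention relied on is that a node whose `data.parent` holds another node's `id` is drawn inside that node.

```python
import json
from typing import Optional, TypedDict


class Resource(TypedDict):
    # Hypothetical inventory record; not the project's actual schema.
    id: str
    type: str
    parent: Optional[str]


def to_cytoscape_elements(resources: list[Resource],
                          edges: list[tuple[str, str]]) -> list[dict]:
    """Convert a flat inventory into Cytoscape.js element definitions.

    Nodes whose data carries a `parent` are rendered inside that parent
    (compound nodes), which yields a VPC > subnet > instance nesting.
    """
    known_ids = {r["id"] for r in resources}
    elements: list[dict] = []
    for r in resources:
        data = {"id": r["id"], "label": f'{r["type"]}: {r["id"]}'}
        # Skip dangling parent references so every compound child points
        # at a node that is present in the same element list.
        if r["parent"] is not None and r["parent"] in known_ids:
            data["parent"] = r["parent"]
        elements.append({"group": "nodes", "data": data})
    for source, target in edges:
        elements.append({
            "group": "edges",
            "data": {"id": f"{source}->{target}", "source": source, "target": target},
        })
    return elements


inventory: list[Resource] = [
    {"id": "vpc-1", "type": "vpc", "parent": None},
    {"id": "subnet-a", "type": "subnet", "parent": "vpc-1"},
    {"id": "i-web", "type": "ec2", "parent": "subnet-a"},
    {"id": "i-db", "type": "ec2", "parent": "subnet-a"},
]
elements = to_cytoscape_elements(inventory, [("i-web", "i-db")])
print(json.dumps(elements, indent=2))
```

The resulting JSON could be supplied as the `elements` option when initializing Cytoscape.js; the layout (for example cola) is configured separately and does not affect the nesting.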

Results: trade-offs between automated discovery and resource heterogeneity

The proof of concept convincingly demonstrated technical feasibility for the core infrastructure visualization use cases. Steampipe's SQL-based query abstraction proved effective at identifying diverse AWS resources, such as Virtual Private Clouds, subnets, EC2 instances, and S3 buckets, spread across multiple regions and availability zones. Security group relationships made it possible to accurately map authorized network communication between compute instances, using an SQL query that analyzed ingress rules and identified cross-references between security groups. Cytoscape.js rendered these relationships as interactive, hierarchical graphs, using the cola.js force-directed layout algorithm to prevent node overlap and keep complex topologies readable.
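The study's security group query is not reproduced here. The Python sketch below illustrates the same idea on in-memory data: an instance in group A may reach an instance in group B when B's group has an ingress rule that references group A. The `IngressRule` shape and `permitted_flows` are hypothetical, and the sketch deliberately ignores CIDR-based sources, protocols, and port ranges, which a real query would also need to handle.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified ingress rule between two security groups."""
    group_id: str             # group that owns the rule (traffic destination)
    referenced_group_id: str  # group allowed as traffic source
    port: int


def permitted_flows(instance_groups: dict[str, set[str]],
                    rules: list[IngressRule]) -> set[tuple[str, str, int]]:
    """Return (source_instance, target_instance, port) triples that
    group-to-group ingress rules allow."""
    # Invert instance -> groups into group -> member instances.
    members: dict[str, set[str]] = {}
    for instance, groups in instance_groups.items():
        for group in groups:
            members.setdefault(group, set()).add(instance)

    flows: set[tuple[str, str, int]] = set()
    for rule in rules:
        sources = members.get(rule.referenced_group_id, set())
        targets = members.get(rule.group_id, set())
        for src, dst in product(sources, targets):
            # Skip self-loops (an instance in both the source and target group).
            if src != dst:
                flows.add((src, dst, rule.port))
    return flows


instances = {"i-web": {"sg-web"}, "i-app": {"sg-app"}, "i-db": {"sg-db"}}
rules = [IngressRule("sg-app", "sg-web", 8080), IngressRule("sg-db", "sg-app", 5432)]
print(sorted(permitted_flows(instances, rules)))
# [('i-app', 'i-db', 5432), ('i-web', 'i-app', 8080)]
```

Each triple maps naturally onto a directed graph edge from the source instance to the target instance, labelled with the permitted port.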

At the same time, attempts to generalize resource identification beyond virtual machines revealed significant limitations. The researchers explored elastic network interfaces (ENIs) as a universal discovery primitive, hypothesizing that every resource within a subnet exposes ENIs that can be linked to a parent resource. This approach worked well for EC2 instances, where the [attachedinstanceid] field provides a direct parent relationship, but proved problematic for other resource types. EFS and Lambda ENIs carry their parent identifiers only in the description field, which requires fragile regex extraction, while RDS ENIs contain an identifier in neither field, which makes parent linking impossible. This finding shows that AWS resource metadata does not follow consistent patterns across resource types, which fundamentally limits generic discovery approaches.
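The ENI-based parent lookup, and where it breaks down, can be sketched as follows. The record keys (`attached_instance_id`, `description`) and the EFS and Lambda description formats are assumptions for illustration, not verified AWS formats; the sketch treats these free-text descriptions as unstable, which is precisely why regex extraction is fragile.

```python
import re
from typing import Optional

# Assumed description formats (illustrative only; verify against real ENIs).
EFS_PATTERN = re.compile(r"EFS mount target for (fs-[0-9a-f]+)")
# Lambda: capture the function name and drop an optional trailing UUID.
LAMBDA_PATTERN = re.compile(
    r"^AWS Lambda VPC ENI-(?P<name>.+?)"
    r"(?:-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})?$"
)


def resolve_parent(eni: dict) -> Optional[str]:
    """Best-effort parent lookup for a single network interface record.

    Returns a parent identifier, or None when the ENI carries no usable
    reference in either field.
    """
    attached = eni.get("attached_instance_id")
    if attached:
        return attached  # EC2: direct parent reference, no parsing needed
    description = eni.get("description") or ""
    efs = EFS_PATTERN.search(description)
    if efs:
        return efs.group(1)  # EFS: file system ID parsed from free text
    lam = LAMBDA_PATTERN.match(description)
    if lam:
        return "lambda:" + lam.group("name")  # Lambda: function name parsed from free text
    return None  # e.g. RDS: nothing to link to


enis = [
    {"attached_instance_id": "i-0abc", "description": ""},
    {"attached_instance_id": None, "description": "EFS mount target for fs-12ab34cd (fsmt-56ef78ab)"},
    {"attached_instance_id": None, "description": "AWS Lambda VPC ENI-orders-api-1b2c3d4e-0000-1111-2222-333344445555"},
    {"attached_instance_id": None, "description": "RDSNetworkInterface"},
]
for eni in enis:
    print(resolve_parent(eni))
# prints, one per line: i-0abc, fs-12ab34cd, lambda:orders-api, None
```

Only the first branch relies on a structured field; the other two depend on text formats that could change without notice, and the last case has no fallback at all.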

Critical implications and architectural considerations

The research shows that cloud infrastructure visualization requires a balance between three competing objectives: automated resource discovery, hierarchical semantic representation, and alignment between tool and user. This underscores that architectural choices during infrastructure introspection directly determine which forms of visualization are possible downstream.

Future path: multi-cloud strategy and resource abstraction

The study concludes that successful multi-cloud resource visualization depends on identifying generic abstraction primitives, such as VPCs/VNets, subnets, firewall rules, compute instances, and storage services, that are consistent across cloud providers. Steampipe's plugin architecture for AWS, Azure, and GCP provides a solid foundation for this, as these platforms expose similar resources. For example, Azure Network Security Groups and GCP VPC firewall rules mirror AWS security groups, enabling similar connection detection. The study recommends adopting open-source, cloud-agnostic tooling as a guiding principle to keep vendor lock-in, costs, and architectural complexity manageable.
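One way to express such generic primitives is a lookup table that maps provider-specific resource types to a shared vocabulary. The sketch below assumes Steampipe table names from the AWS, Azure, and GCP plugins, which should be verified against each plugin's documentation; the primitive names, `TABLE_INDEX`, and the `normalize` helper are hypothetical.

```python
# Generic primitive -> provider -> Steampipe table (assumed names; verify per plugin).
PRIMITIVES: dict[str, dict[str, str]] = {
    "network": {"aws": "aws_vpc", "azure": "azure_virtual_network", "gcp": "gcp_compute_network"},
    "subnet": {"aws": "aws_vpc_subnet", "azure": "azure_subnet", "gcp": "gcp_compute_subnetwork"},
    "firewall": {"aws": "aws_vpc_security_group", "azure": "azure_network_security_group",
                 "gcp": "gcp_compute_firewall"},
    "compute": {"aws": "aws_ec2_instance", "azure": "azure_compute_virtual_machine",
                "gcp": "gcp_compute_instance"},
    "storage": {"aws": "aws_s3_bucket", "azure": "azure_storage_account", "gcp": "gcp_storage_bucket"},
}

# Reverse index: provider table -> (provider, generic primitive).
TABLE_INDEX: dict[str, tuple[str, str]] = {
    table: (provider, primitive)
    for primitive, by_provider in PRIMITIVES.items()
    for provider, table in by_provider.items()
}


def normalize(table: str, resource_id: str) -> dict[str, str]:
    """Tag a provider-specific resource with its generic primitive.

    Raises KeyError for tables without a mapping, so unmapped resource
    types surface explicitly instead of being guessed.
    """
    provider, primitive = TABLE_INDEX[table]
    return {"primitive": primitive, "provider": provider, "id": resource_id}


print(normalize("azure_network_security_group", "nsg-frontend"))
# {'primitive': 'firewall', 'provider': 'azure', 'id': 'nsg-frontend'}
```

With resources normalized this way, the visualization layer could group and connect nodes by primitive rather than by provider-specific type.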

Future work should focus on three priorities: developing alternative discovery strategies for problematic resource types such as RDS; implementing availability zone visualizations that support multi-parent relationships; and using VPC Flow Logs to supplement security-group-based connectivity mapping with actually observed network traffic. The research concludes that mature infrastructure visualization is not about pursuing complete feature coverage, but about accepting architectural limitations and building layered abstractions that degrade in a controlled manner when a perfect semantic representation is not feasible.
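Controlled degradation can be illustrated with a simple placement rule: attach each resource to the most specific container that could be resolved, and fall back one level at a time instead of dropping it. The fallback order, field names, and the `unplaced` group below are hypothetical choices for this sketch; under them, a resource whose parent cannot be resolved, as with RDS above, could still be shown in its subnet if the subnet is known.

```python
from typing import Optional

# Hypothetical fallback order, most specific container first.
FALLBACK_LEVELS = ("parent_id", "subnet_id", "vpc_id")
UNPLACED = "unplaced"


def place(resource: dict[str, Optional[str]], known_nodes: set[str]) -> tuple[str, str]:
    """Return (container_id, level_used) for a resource.

    Tries each level in order and uses the first one that names a node
    already in the graph; otherwise the resource goes to a visible
    'unplaced' group instead of disappearing from the view.
    """
    for level in FALLBACK_LEVELS:
        candidate = resource.get(level)
        if candidate is not None and candidate in known_nodes:
            return candidate, level
    return UNPLACED, UNPLACED


known = {"i-1", "subnet-a", "vpc-1"}
print(place({"parent_id": "i-1", "subnet_id": "subnet-a"}, known))  # ('i-1', 'parent_id')
print(place({"parent_id": None, "subnet_id": "subnet-a"}, known))   # ('subnet-a', 'subnet_id')
print(place({"parent_id": None}, known))                             # ('unplaced', 'unplaced')
```

Returning the level used alongside the container lets the view style degraded placements differently, so users can see where the representation is less precise.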
