Cloud Infrastructure Visibility at Scale: Automated Resource Discovery and Interactive Topology Mapping

Research Question and Methodology

The primary research question addressed is: “How can a functionality be developed that generates a visual representation of cloud infrastructure, so that the user has a clear overview of their infrastructure and how its components are connected to each other?” This question is particularly urgent given that many organizations operate with underdocumented environments or infrastructure absent from Infrastructure as Code systems, making manual visibility nearly impossible at scale.

Research Design and Techniques

The study follows a three-phase methodology grounded in practical feasibility assessment and business-driven requirements. First, semi-structured interviews with three internal stakeholders (the Product Owner, Business Supervisor, and Lead Platform Engineer) established information priorities using the MoSCoW method. Stakeholders emphasized that demonstrating technical feasibility and identifying areas requiring further development took precedence over a polished user interface, reflecting the research-focused nature of the project. Second, a systematic technology evaluation examined five candidate tools (CloudMapper, Steampipe, AWS Resource Explorer, InfraMap, and CloudQuery) against criteria including active maintenance, open-source status, multi-cloud support, and hierarchical grouping capabilities. Based on this analysis, Steampipe was selected as the data collection engine for its cloud-agnostic architecture, fully open-source plugin ecosystem, and cost-effective operation with native AWS Resource Explorer integration. Third, Cytoscape.js was chosen as the visualization library for its specialized graph-rendering capabilities, its native support for compound nodes, which enables hierarchical representations, and the development team's existing experience with the library.

Findings: Automated Discovery vs. Resource Heterogeneity Trade-offs

The proof-of-concept demonstrated technical viability for the core infrastructure visualization use cases. Steampipe’s SQL-based query abstraction proved effective at identifying diverse AWS resources (Virtual Private Clouds, subnets, EC2 instances, and S3 buckets) across multiple geographic regions and availability zones. Authorized network communication between compute instances was mapped from security groups: a single SQL query parsed ingress rules and identified the rules that reference other security groups. Cytoscape.js rendered these connections as interactive, hierarchical graphs, with the cola.js force-directed layout used to prevent node overlap and keep complex topologies readable.
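The security-group mapping described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's actual query: it assumes the instances and ingress rules have already been fetched (for example through Steampipe) into plain dictionaries, and the keys used here (id, subnet_id, security_groups, group_id, referenced_group_id) are hypothetical names chosen for the example. The output follows the element JSON shape that Cytoscape.js accepts, where a node's parent field places it inside a compound node.

```python
from itertools import product


def build_elements(instances, ingress_rules):
    """Build Cytoscape.js-style elements from instances and ingress rules.

    instances: dicts with hypothetical keys id, subnet_id, security_groups.
    ingress_rules: dicts with hypothetical keys group_id (the group that
        receives traffic) and referenced_group_id (the group allowed to send).
    """
    elements = []

    # One compound (parent) node per subnet, in first-seen order.
    for subnet in dict.fromkeys(i["subnet_id"] for i in instances):
        elements.append({"group": "nodes", "data": {"id": subnet}})

    # Each instance becomes a child node of its subnet.
    for inst in instances:
        elements.append(
            {"group": "nodes", "data": {"id": inst["id"], "parent": inst["subnet_id"]}}
        )

    # Index instance ids by the security groups they belong to.
    members = {}
    for inst in instances:
        for sg in inst["security_groups"]:
            members.setdefault(sg, []).append(inst["id"])

    # A rule on group G that references group S authorizes S -> G traffic.
    seen = set()
    for rule in ingress_rules:
        sources = members.get(rule["referenced_group_id"], [])
        targets = members.get(rule["group_id"], [])
        for src, dst in product(sources, targets):
            if src == dst or (src, dst) in seen:
                continue
            seen.add((src, dst))
            elements.append(
                {
                    "group": "edges",
                    "data": {"id": f"{src}->{dst}", "source": src, "target": dst},
                }
            )
    return elements


if __name__ == "__main__":
    demo_instances = [
        {"id": "i-web", "subnet_id": "subnet-a", "security_groups": ["sg-web"]},
        {"id": "i-db", "subnet_id": "subnet-b", "security_groups": ["sg-db"]},
    ]
    demo_rules = [{"group_id": "sg-db", "referenced_group_id": "sg-web"}]
    for element in build_elements(demo_instances, demo_rules):
        print(element)
```

Edges are directed from the referenced (sending) group to the group that owns the rule, so the graph reflects permitted traffic rather than observed traffic.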

However, critical limitations emerged when attempting to generalize resource identification beyond virtual machines. The researchers explored querying elastic network interfaces (ENIs) as a universal discovery primitive, theorizing that every resource residing in a subnet exposes an ENI that can be linked back to its parent resource. This approach worked for EC2 instances, where the attached_instance_id field provides a direct link to the parent, but it proved problematic for other resource types. ENIs belonging to EFS and Lambda resources expose the parent identifier only in the free-text description field, which requires fragile regex extraction, while ENIs belonging to RDS instances carry no identifier in either field, making parent linkage impossible. This finding shows that AWS resource metadata lacks consistent patterns across resource types, which fundamentally constrains generic resource discovery approaches.
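The three cases above (direct link, regex on the description, no link at all) can be made concrete with a short Python sketch. It is not the study's implementation. The description formats in the patterns are assumptions based on how such ENIs commonly appear; AWS does not document them as a stable contract, which is exactly why this approach is fragile. The function name resolve_parent and the row shape (a dictionary with attached_instance_id and description keys) are hypothetical.

```python
import re

# Assumed description formats (not a documented, stable AWS contract).
_DESCRIPTION_PATTERNS = [
    # e.g. "EFS mount target for fs-12345678 (fsmt-abcdef01)"
    ("efs", re.compile(r"EFS mount target for (fs-[0-9a-f]+)")),
    # e.g. "AWS Lambda VPC ENI-<function name>-<uuid>"
    (
        "lambda",
        re.compile(
            r"AWS Lambda VPC ENI-(.+)-"
            r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
        ),
    ),
]


def resolve_parent(eni):
    """Return (resource_type, resource_id) for an ENI row, or None.

    None means the parent cannot be determined from the ENI alone,
    which is the situation described for RDS.
    """
    # Case 1: EC2 instances are linked directly.
    instance_id = eni.get("attached_instance_id")
    if instance_id:
        return ("ec2", instance_id)

    # Case 2: fall back to pattern matching on the free-text description.
    description = eni.get("description") or ""
    for kind, pattern in _DESCRIPTION_PATTERNS:
        match = pattern.search(description)
        if match:
            return (kind, match.group(1))

    # Case 3: nothing usable in either field.
    return None
```

Returning None instead of guessing keeps unresolved interfaces visible to the caller, so they can be reported or handled by a resource-specific query.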

Critical Implications and Architectural Trade-offs

The research demonstrates that cloud infrastructure visualization requires balancing three competing objectives: automated resource discovery, hierarchical semantic representation, and tool-user alignment. This reflects the reality that architectural decisions made during infrastructure introspection directly determine what downstream visualization becomes possible.

Path Forward: Multi-Cloud Strategy and Resource Abstraction

The research reveals that successful multi-cloud resource visualization depends on identifying generic abstraction primitives (VPCs/VNets, subnets, firewall rules, compute instances, and storage services) that map consistently across provider boundaries. Steampipe’s plugin architecture for AWS, Azure, and GCP provides a foundation for cross-provider expansion, because these platforms expose analogous resources: Azure Network Security Groups and GCP VPC firewall rules mirror AWS security groups, enabling similar connection-discovery approaches. The study recommends keeping open-source and cloud-agnostic tooling as guiding principles in order to limit vendor lock-in, costs, and architectural complexity.

Future work should prioritize three areas: developing alternative resource discovery strategies for problematic resource types such as RDS, implementing visualization solutions for availability zones that accommodate multi-parent relationships, and leveraging VPC Flow Logs to augment security-group-based connection mapping with actual observed network traffic patterns. The research concludes that infrastructure visualization maturity lies not in chasing feature completeness but in accepting architectural constraints while building layered abstractions that degrade gracefully when perfect semantic representation becomes impossible.
