Reconnaissance

Introduction

Reconnaissance is the first and arguably most critical phase of a penetration test. Before any vulnerability can be identified or exploited, the tester must develop a thorough understanding of the target environment. In authorized security assessments, reconnaissance involves systematically collecting information about an organization's infrastructure, technologies, personnel, and digital footprint to identify potential attack surfaces.

The quality of reconnaissance directly determines the success of subsequent penetration testing phases. A thorough reconnaissance effort can reveal misconfigurations, exposed services, leaked credentials, and forgotten infrastructure that would otherwise go unnoticed. Conversely, skipping or rushing this phase often leads to missed vulnerabilities and incomplete assessments.

All reconnaissance activities described in this article must be performed only within the scope of a formal Rules of Engagement (ROE) document and with explicit written authorization from the target organization. Unauthorized reconnaissance against systems you do not own or have permission to test is illegal in most jurisdictions.

"Give me six hours to chop down a tree and I will spend the first four sharpening the axe." -- This principle, often attributed to Abraham Lincoln, perfectly encapsulates the importance of reconnaissance in penetration testing.

Passive vs. Active Reconnaissance

Reconnaissance techniques fall into two broad categories based on whether they involve direct interaction with the target. Understanding this distinction is essential for managing detection risk and staying within the boundaries of an engagement's scope.

Passive Reconnaissance

Passive reconnaissance involves gathering information without directly interacting with the target's systems. Because the tester never sends packets to the target's infrastructure, the target organization generally cannot detect that reconnaissance is taking place. All information comes from publicly available sources or from third parties that have already collected it.

Common passive reconnaissance techniques include:

WHOIS lookups -- querying domain registration databases for ownership details, contact information, and name server records
DNS record analysis -- examining cached or publicly available DNS records without querying the target's DNS servers directly
Search engine queries -- using Google, Bing, and other search engines to find indexed pages, cached content, and metadata
Social media analysis -- reviewing employee profiles on LinkedIn, Twitter/X, and other platforms for organizational intelligence
Public code repositories -- searching GitHub, GitLab, and Bitbucket for accidentally committed secrets, internal documentation, or architecture details
Job postings -- analyzing job listings to determine technology stacks, security tools, and infrastructure components
SEC filings and press releases -- reviewing public financial documents for infrastructure investments and vendor relationships

Active Reconnaissance

Active reconnaissance involves direct interaction with the target's systems. This generates network traffic and log entries that could potentially alert the target's security team. Active techniques include port scanning, banner grabbing, and direct DNS queries against the target's name servers.

Because active reconnaissance is detectable, it requires explicit authorization and should be carefully scoped. Many penetration testing engagements begin with a passive-only phase before transitioning to active techniques once the tester has a preliminary understanding of the target environment.

Characteristic	Passive Reconnaissance	Active Reconnaissance
Target Interaction	None -- uses publicly available data only	Direct contact with target systems
Detection Risk	Very low -- no traffic reaches the target's systems	May trigger IDS/IPS alerts and log entries
Information Depth	Broad but surface-level	Deep, specific technical details
Legal Considerations	Generally legal (public information)	Requires explicit written authorization
Examples	WHOIS, Google dorking, Certificate Transparency searches	Port scanning, banner grabbing, DNS zone transfer attempts
Speed	Can be slow (manual research)	Often automated and fast
Accuracy	May be outdated or incomplete	Current, real-time data

Open Source Intelligence (OSINT)

Open Source Intelligence (OSINT) is the collection and analysis of information from publicly available sources. In the context of penetration testing, OSINT forms the backbone of passive reconnaissance. The term originates from military and intelligence communities, where it refers to intelligence gathered from non-classified, publicly accessible sources.

OSINT is not limited to internet sources. It encompasses any publicly available information, including newspapers, academic publications, government records, radio broadcasts, and publicly accessible databases. However, in modern penetration testing, the vast majority of OSINT is collected from online sources.

OSINT Frameworks and Methodology

Effective OSINT collection follows a structured methodology rather than ad hoc searching. Several frameworks have been developed to guide practitioners:

OSINT Framework (osintframework.com) -- a curated collection of tools organized by category: username search, email search, domain/IP investigation, social networks, and more
Maltego -- a graphical link analysis tool that performs automated OSINT collection and visualizes relationships between entities such as people, companies, domains, and IP addresses
SpiderFoot -- an open-source OSINT automation tool that queries over 100 data sources to build a comprehensive intelligence profile
Recon-ng -- a modular reconnaissance framework written in Python, modeled after the Metasploit Framework, with modules for various OSINT collection tasks
theHarvester -- a tool for gathering email addresses, subdomains, virtual hosts, and open ports from public sources like search engines and PGP key servers

Social media platforms are among the richest sources of intelligence for penetration testers. Employees frequently share information that reveals organizational structure, technology decisions, security practices, and even credentials. Key sources include:

LinkedIn -- reveals employee names, job titles, reporting structures, technology skills, and professional connections. Job postings on LinkedIn often detail specific software, hardware, and security tools used by the organization
GitHub/GitLab -- employees may accidentally commit API keys, database credentials, internal URLs, or configuration files to public repositories
Twitter/X -- employees may discuss technical issues, conferences attended, or internal projects
Stack Overflow -- developers asking questions may inadvertently reveal internal architecture or code structure
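As a rough illustration of how accidentally committed secrets are found, the sketch below greps the working tree of a locally cloned repository for a few well-known credential patterns. The function name `scan_secrets` and the three patterns are assumptions chosen for this example, not a complete rule set; dedicated scanners cover many more credential formats and also inspect commit history.

```shell
#!/bin/sh
# scan_secrets DIR: print file:line:text for lines matching a few common
# secret patterns. The patterns are illustrative assumptions only:
#   - AWS access key IDs (AKIA followed by 16 uppercase letters or digits)
#   - PEM private key headers
#   - simple password/secret assignments
scan_secrets() {
    grep -rnE \
        -e 'AKIA[0-9A-Z]{16}' \
        -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
        -e '(password|passwd|secret)[[:space:]]*[=:][[:space:]]*[^[:space:]]+' \
        "$1"
}

# Usage (hypothetical path): scan_secrets ./cloned-repo
```

A simple pattern match like this produces false positives, so every hit still needs manual review before it is reported.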

"The best reconnaissance feels like research, not hacking. If you are doing it right, you are reading, not scanning." -- Jason Haddix, security researcher and bug bounty hunter

DNS Enumeration

DNS enumeration is the process of discovering as many of the DNS records associated with a target domain as possible. The Domain Name System is one of the most information-rich services available to a penetration tester, as it maps human-readable domain names to IP addresses and reveals the structure of an organization's network infrastructure.

Key DNS record types of interest during reconnaissance:

Record Type	Purpose	Intelligence Value
`A`	Maps hostname to IPv4 address	Reveals server IP addresses and hosting providers
`AAAA`	Maps hostname to IPv6 address	May reveal additional infrastructure on IPv6
`MX`	Mail exchange servers	Identifies email infrastructure and providers
`NS`	Name servers	Reveals DNS infrastructure and hosting
`TXT`	Arbitrary text (SPF, DKIM, etc.)	Reveals email security config, domain verification, cloud services
`CNAME`	Canonical name (alias)	Reveals cloud services, CDNs, and third-party integrations
`SOA`	Start of Authority	Reveals primary DNS server and admin contact
`SRV`	Service locator	Reveals specific services and ports
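To show how a single record type turns into intelligence, here is a minimal sketch that pulls the `include:` domains out of an SPF string found in a TXT record. These domains usually name the third-party mail and cloud providers an organization relies on. The helper name `spf_includes` and the sample record are invented for illustration.

```shell
#!/bin/sh
# spf_includes: read SPF TXT record text on stdin and print one
# include: domain per line.
spf_includes() {
    tr -d '"' | tr ' ' '\n' | sed -n 's/^include://p'
}

# Sample record (made up). In practice the input could come from:
#   dig +short TXT example.com
sample='"v=spf1 include:_spf.google.com include:spf.protection.outlook.com ip4:203.0.113.0/24 -all"'
printf '%s\n' "$sample" | spf_includes
```

For the sample record this lists `_spf.google.com` and `spf.protection.outlook.com`, which suggests both Google and Microsoft mail services are in use.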

Common DNS enumeration techniques include:

Zone transfer attempts (AXFR) -- if a DNS server is misconfigured, it may allow a full zone transfer, revealing all records in the domain. This is an active technique and should only be attempted with authorization.
Subdomain brute-forcing -- using wordlists to discover subdomains by querying for common names like mail, vpn, dev, staging, admin
Certificate Transparency logs -- publicly trusted SSL/TLS certificates are recorded in public Certificate Transparency (CT) logs, which can be queried to discover subdomains. Services like crt.sh provide searchable interfaces
Reverse DNS lookups -- querying PTR records for IP ranges to discover hostnames associated with specific addresses
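Two of these techniques come down to constructing the right names to query. The sketch below shows that step only, with no network traffic: `subdomain_candidates` expands a wordlist into candidate hostnames, and `ptr_name` builds the `in-addr.arpa` name that a reverse lookup asks about. Both function names are hypothetical, and resolving the generated names against a target is an active step that needs authorization.

```shell
#!/bin/sh
# subdomain_candidates DOMAIN: read labels on stdin (one per line) and print
# label.DOMAIN for each, skipping blank lines and lines starting with '#'.
subdomain_candidates() {
    domain=$1
    while IFS= read -r label; do
        case $label in ''|'#'*) continue ;; esac
        printf '%s.%s\n' "$label" "$domain"
    done
}

# ptr_name IPV4: print the in-addr.arpa name whose PTR record a reverse
# lookup queries (the octets are reversed).
ptr_name() {
    printf '%s\n' "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

printf 'mail\nvpn\ndev\n' | subdomain_candidates example.com
ptr_name 203.0.113.50
```

Each candidate would then be passed to a resolver such as `host` or `dig`, and only the names that resolve are kept.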

Example DNS enumeration commands used in authorized testing:

# Basic DNS lookups (many servers refuse or minimize ANY queries, see RFC 8482, so query specific types)
dig +short example.com A
dig +short example.com MX
dig +short example.com TXT

# Attempt zone transfer (requires authorization)
dig @ns1.example.com example.com AXFR

# Subdomain enumeration with dnsenum
dnsenum --dnsserver ns1.example.com example.com

# Certificate Transparency log search
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u

# Reverse DNS sweep of a subnet
for ip in $(seq 1 254); do
  host "192.168.1.$ip" | grep "domain name pointer"
done
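Raw Certificate Transparency results tend to be noisy: the same name appears on many certificates, case can vary, and wildcard entries such as `*.example.com` are mixed in. A small cleanup step, sketched here under the assumption that the names arrive one per line (for example from the crt.sh query above), makes the list usable as a subdomain inventory. The function name `normalize_ct_names` is hypothetical.

```shell
#!/bin/sh
# normalize_ct_names: read hostnames on stdin (one per line), lowercase them,
# strip a leading "*." wildcard label, drop empty lines, and print the
# unique names in sorted order.
normalize_ct_names() {
    tr '[:upper:]' '[:lower:]' | sed 's/^\*\.//' | grep -v '^$' | LC_ALL=C sort -u
}

# Sample input (made up) with a wildcard, mixed case, and a duplicate.
printf '%s\n' '*.example.com' 'www.example.com' 'API.example.com' 'www.example.com' \
    | normalize_ct_names
```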

WHOIS and Network Footprinting

WHOIS is a query-response protocol used to look up domain registration information. It reveals the registered owner of a domain, administrative and technical contacts, registration and expiration dates, name servers, and the registrar. While many organizations now use WHOIS privacy services, significant intelligence can still be gathered.

Network footprinting extends beyond WHOIS to map the entire network presence of an organization. This includes identifying IP address ranges, autonomous system numbers (ASN), network blocks, and relationships between different parts of the infrastructure.

Key footprinting data sources:

WHOIS databases -- the five Regional Internet Registries (ARIN, RIPE NCC, APNIC, LACNIC, and AFRINIC) maintain registration data for the IP address blocks and ASNs allocated in their regions
BGP routing tables -- reveal autonomous system relationships and IP prefix announcements
Netcraft -- provides web server technology detection, hosting history, and site reports
BuiltWith -- identifies web technologies, frameworks, analytics tools, and content management systems
Wayback Machine -- archives historical versions of websites, potentially revealing old pages, removed content, or previous technology stacks

# WHOIS lookup for a domain
whois example.com

# WHOIS lookup for an IP address
whois 203.0.113.50

# ASN lookup
whois -h whois.radb.net -- '-i origin AS12345'

# Find all IP blocks for an organization
whois -h whois.arin.net "o Example Corp"
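WHOIS responses are free-form text, so it helps to reduce them to the handful of fields that matter for footprinting. The following sketch assumes the `Field: value` layout and field names commonly seen in .com registry output (`Registrar`, `Creation Date`, `Registry Expiry Date`, `Name Server`); other registries label these fields differently, so the list would need adjusting. The function name `whois_summary` is hypothetical.

```shell
#!/bin/sh
# whois_summary: read WHOIS text on stdin and print selected fields as
# "Field: value". Splits each line at the first colon only, so values that
# contain colons (such as timestamps) are kept intact.
whois_summary() {
    awk '{
        i = index($0, ":")
        if (i == 0) next
        key = substr($0, 1, i - 1)
        val = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t\r]+$/, "", key)
        gsub(/^[ \t]+|[ \t\r]+$/, "", val)
        k = tolower(key)
        if (k == "registrar" || k == "creation date" ||
            k == "registry expiry date" || k == "name server")
            print key ": " val
    }'
}

# Usage: whois example.com | whois_summary
```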

Google Dorking

Google dorking (also called Google hacking) is the technique of using advanced search operators to discover information that is publicly indexed but not easily found through normal searches. These operators allow penetration testers to find exposed files, login pages, directory listings, error messages, and other sensitive information that organizations may not realize is publicly accessible.

This is a purely passive technique -- the tester is only querying Google's cached index, not the target's servers directly. However, the information discovered may reveal significant security issues.

Operator	Syntax	Purpose
`site:`	`site:example.com`	Restrict results to a specific domain
`filetype:`	`filetype:pdf site:example.com`	Find specific file types
`intitle:`	`intitle:"index of" site:example.com`	Find pages with specific title text
`inurl:`	`inurl:admin site:example.com`	Find URLs containing specific strings
`intext:`	`intext:"password" filetype:log`	Find pages containing specific body text
`cache:`	`cache:example.com`	View Google's cached version of a page (Google retired this operator in 2024; the Wayback Machine is the usual substitute)
`ext:`	`ext:sql site:example.com`	Find files by extension (similar to filetype)

Example queries used during authorized assessments:

# Find exposed directory listings
site:example.com intitle:"index of"

# Find configuration files
site:example.com filetype:xml OR filetype:conf OR filetype:env OR filetype:ini

# Find login pages
site:example.com inurl:login OR inurl:signin OR inurl:admin

# Find exposed documents
site:example.com filetype:pdf OR filetype:doc OR filetype:xlsx

# Find error messages revealing technology stack
site:example.com "Fatal error" OR "Warning:" OR "mysql_connect"

# Find potential backup files
site:example.com filetype:bak OR filetype:old OR filetype:backup
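Queries like these follow a regular pattern, so they are easy to generate for each in-scope domain. As a small sketch, the hypothetical helper `dork_filetypes` below builds a site-restricted query that matches any of the given file types. It only prints the query string and does not contact any search engine.

```shell
#!/bin/sh
# dork_filetypes DOMAIN EXT...: print a query restricted to DOMAIN that
# matches any of the listed file types, joined with OR.
dork_filetypes() {
    domain=$1
    shift
    query="site:$domain"
    sep=" "
    for ext in "$@"; do
        query="$query${sep}filetype:$ext"
        sep=" OR "
    done
    printf '%s\n' "$query"
}

dork_filetypes example.com pdf doc xlsx
# prints: site:example.com filetype:pdf OR filetype:doc OR filetype:xlsx
```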

The Google Hacking Database (GHDB), maintained by Offensive Security, catalogs thousands of tested dorks organized by category. It serves as an invaluable reference during the reconnaissance phase of authorized security assessments.

Shodan and Internet-Wide Scanning

Shodan is a search engine that indexes internet-connected devices by scanning the entire IPv4 address space and recording the banners and metadata returned by services. Unlike Google, which indexes web content, Shodan indexes services -- web servers, databases, SCADA systems, IoT devices, and anything else with a network-accessible port.

For penetration testers conducting authorized assessments, Shodan provides a passive way to identify an organization's internet-facing services without sending a single packet to the target. Because Shodan continuously scans the internet independently, querying it is considered passive reconnaissance.

Key Shodan capabilities:

Service identification -- discover open ports and running services on target IP ranges
Banner analysis -- examine service banners for version information, configuration details, and indicators of default credentials
Vulnerability correlation -- Shodan tags hosts with known CVEs based on detected service versions
SSL/TLS certificate analysis -- review certificate details, expiration dates, and cipher suite configurations
Historical data -- view how a host's services have changed over time

Similar platforms include Censys, which provides comparable internet-wide scanning data with a focus on TLS certificates and web server configurations, and ZoomEye, a Chinese counterpart that indexes cyberspace devices and services.

# Shodan CLI examples (requires API key)
shodan search "hostname:example.com"
shodan host 203.0.113.50
shodan search "org:\"Example Corp\" port:22"

# Censys CLI examples
censys search "services.tls.certificates.leaf.subject.common_name: example.com"

# Shodan filters for common security issues
# (the org name is quoted inside the query so that both words belong to the org: filter)
shodan search "default password org:\"Example Corp\""
shodan search "port:3389 org:\"Example Corp\""    # RDP
shodan search "port:27017 org:\"Example Corp\""   # MongoDB
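Search results become more useful once they are summarized. The sketch below counts hosts per exposed port, assuming the results were saved as tab-separated `ip<TAB>port` lines (for instance with the Shodan CLI's `--fields ip_str,port` option). The input layout and the function name `port_summary` are assumptions made for this example.

```shell
#!/bin/sh
# port_summary: read "ip<TAB>port" lines on stdin and print "port count",
# one line per port, sorted by port number.
port_summary() {
    awk -F'\t' 'NF >= 2 && $2 != "" { count[$2]++ }
                END { for (p in count) print p, count[p] }' | LC_ALL=C sort -n
}

# Sample data (made up, documentation address range).
printf '203.0.113.10\t22\n203.0.113.11\t22\n203.0.113.12\t3389\n' | port_summary
```

For the sample data this reports two hosts on port 22 and one on port 3389, which gives a quick view of where remote-access services are exposed.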

Reconnaissance Tools Comparison

The following table compares commonly used reconnaissance tools, their primary functions, and whether they perform passive or active reconnaissance. All tools should only be used within the scope of an authorized engagement.

Tool	Type	Primary Function	License
Maltego	Passive	Graph-based OSINT and link analysis	Community / Commercial
Recon-ng	Passive / Active	Modular reconnaissance framework	Open Source (GPL)
theHarvester	Passive	Email, subdomain, and IP harvesting	Open Source (GPL)
SpiderFoot	Passive	Automated OSINT collection	Open Source (MIT)
Amass	Passive / Active	Subdomain enumeration and mapping	Open Source (Apache 2.0)
Sublist3r	Passive	Fast subdomain enumeration	Open Source
Shodan	Passive	Internet-wide service discovery	Commercial / Free tier
Censys	Passive	Certificate and host analysis	Commercial / Academic
dnsenum	Active	DNS enumeration and zone transfers	Open Source
Fierce	Active	DNS reconnaissance and brute-forcing	Open Source

Defensive Countermeasures

Understanding reconnaissance techniques is essential for defenders, as it reveals what information an attacker can gather and how to minimize the organization's exposure. Effective countermeasures include:

WHOIS privacy protection -- use domain privacy services to shield registration details from public WHOIS queries
DNS hardening -- disable zone transfers to unauthorized requestors, use split-horizon DNS to separate internal and external records, and regularly audit DNS records for stale or unnecessary entries
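Search engine hygiene -- use noindex meta tags or X-Robots-Tag headers to keep sensitive pages out of search results, and protect anything truly sensitive with access controls. A robots.txt file only asks crawlers not to fetch a path and is itself publicly readable, so it should not list sensitive locations. Regularly search for your own organization using Google dorks to identify exposed information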
Employee awareness training -- educate employees about the risks of sharing technical details on social media, public forums, and code repositories
Code repository auditing -- use tools like truffleHog, git-secrets, and gitleaks to scan for accidentally committed credentials and secrets
Attack surface management -- deploy continuous monitoring solutions to track your internet-facing assets and receive alerts when new services are exposed
Certificate management -- be aware that Certificate Transparency logs make all issued certificates publicly discoverable; plan subdomain naming accordingly

Organizations should regularly conduct their own reconnaissance exercises against themselves -- a practice often called attack surface mapping -- to identify and remediate exposed information before adversaries can exploit it.
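One simple way to act on such exercises is to keep a dated inventory of discovered hostnames and compare each new run with the previous one. The sketch below, with a hypothetical function name and file names, prints the hosts that appear in the newer inventory but not in the older one.

```shell
#!/bin/sh
# new_assets OLD NEW: print hostnames present in file NEW but not in file OLD.
# Both files hold one hostname per line; they are sorted into temporary
# copies first because comm requires sorted input.
new_assets() {
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$old_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"
    # -13 hides lines unique to OLD and lines common to both.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Usage (hypothetical files): new_assets hosts-last-week.txt hosts-today.txt
```

Any hostname this prints is a newly exposed asset that should be traced back to an owner and reviewed.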

To continue learning about the next phase of penetration testing, see Scanning and Enumeration. For related network security topics, explore Firewalls and Intrusion Detection Systems.
