Introduction

Malware analysis is the process of studying malicious software to understand its functionality, origin, and potential impact. Security professionals analyze malware to develop detection signatures, build defensive countermeasures, assess the scope of an intrusion, and attribute attacks to threat actors. As the volume and sophistication of malware continue to grow (the AV-TEST Institute had registered more than one billion malware samples by 2024), malware analysis has become one of the most critical specializations in cybersecurity.

The field divides broadly into two complementary approaches: static analysis, which examines malware without executing it, and dynamic analysis, which observes malware behavior during controlled execution. Advanced analysts combine both approaches with reverse engineering to achieve a deep understanding of how malware operates at the code level.

"Malware analysis is to cybersecurity what pathology is to medicine. You must understand the disease to develop the cure." -- Lenny Zeltser, SANS Institute instructor and malware analysis expert

Static Analysis

Static analysis involves examining a malware sample without executing it. Because the code never runs, this approach carries little risk, but obfuscation, packing, and encryption can hide the true nature of the code and limit what it reveals. Static analysis ranges from basic file identification to advanced disassembly and decompilation.

File Identification and Hashing

The first step in any malware analysis is identifying the sample. Hash functions produce compact fingerprints that allow analysts to identify known samples, share intelligence, and track malware families across organizations. Cryptographic hashes such as SHA-256 identify one exact file, while fuzzy hashes and import hashes help group related samples.

Hash Algorithm | Output Length | Purpose in Malware Analysis | Example Tool
MD5 | 128 bits (32 hex chars) | Quick identification (legacy, collision-prone) | md5sum
SHA-1 | 160 bits (40 hex chars) | Sample identification (being phased out) | sha1sum
SHA-256 | 256 bits (64 hex chars) | Primary identification standard | sha256sum
ssdeep (fuzzy hash) | Variable | Identifying similar variants of the same malware | ssdeep
imphash | 128 bits (32 hex chars) | Import table hash for PE files; groups malware families | pefile (Python)
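The cryptographic hashes in the table can be computed with Python's standard hashlib module. The following is a minimal sketch; `file_hashes` is a hypothetical helper name, and it reads the sample in chunks so large files never have to fit in memory. ssdeep and imphash need third-party libraries and are not shown.

```python
import hashlib


def file_hashes(path: str, chunk_size: int = 65536) -> dict:
    """Return MD5, SHA-1, and SHA-256 hex digests of a file in a single pass."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as sample:
        # Feed each chunk to all three hash objects so the file is read only once.
        for chunk in iter(lambda: sample.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

For example, `file_hashes("sample.bin")["sha256"]` yields the 64-character value that most intelligence platforms use as the primary key for a sample.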

Fuzzy hashing (also called context-triggered piecewise hashing) is particularly valuable because it can identify samples that are similar but not identical. Malware authors frequently make minor modifications to evade signature-based detection, but fuzzy hashes can still link variants together.
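To make the idea concrete, the sketch below implements a deliberately simplified context-triggered piecewise hash. It is not ssdeep's algorithm or output format (ssdeep uses its own rolling hash, adapts the block size to the file, and scores matches differently); the window width, block size, and helper names are illustrative assumptions, and the demo data is synthetic. Chunk boundaries are chosen by a rolling sum over the last few bytes, so a small edit changes only the signature characters near it.

```python
import difflib
import hashlib
import random

WINDOW = 7  # rolling-window width in bytes (illustrative choice)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _chunk_char(chunk: bytes) -> str:
    # Compress each chunk's SHA-256 digest to a single base64-style character.
    return ALPHABET[hashlib.sha256(chunk).digest()[0] % 64]


def piecewise_hash(data: bytes, block_size: int = 64) -> str:
    """Toy piecewise hash: one signature character per content-defined chunk."""
    signature = []
    chunk_start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Keep a running sum of the last WINDOW bytes only.
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        # The trigger depends only on nearby bytes, not on absolute file offsets.
        if rolling % block_size == block_size - 1:
            signature.append(_chunk_char(data[chunk_start:i + 1]))
            chunk_start = i + 1
    if chunk_start < len(data):
        signature.append(_chunk_char(data[chunk_start:]))
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> float:
    """Similarity of two signatures from 0.0 (unrelated) to 1.0 (identical)."""
    return difflib.SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


# Demo on synthetic data: a one-byte patch versus an unrelated buffer.
rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(10_000))
variant = bytearray(original)
variant[5000] ^= 0xFF
unrelated = bytes(rng.getrandbits(8) for _ in range(10_000))
print(similarity(piecewise_hash(original), piecewise_hash(bytes(variant))))
print(similarity(piecewise_hash(original), piecewise_hash(unrelated)))
```

Because boundaries re-synchronize a few bytes after the patched offset, the variant's signature should differ from the original's in only a few characters and score close to 1.0, while the unrelated buffer should score far lower. A cryptographic hash, by contrast, would change completely for both.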

String Extraction

Extracting readable strings from a binary can reveal a surprising amount about its behavior. Analysts look for embedded URLs, IP addresses, file paths, registry keys, API function names, error messages, and command-and-control (C2) server addresses. The strings command on Linux and the Sysinternals Strings utility on Windows are standard tools. FLOSS (the FLARE Obfuscated String Solver, originally named FireEye Labs Obfuscated String Solver) goes further by automatically deobfuscating strings that have been XOR-encoded or otherwise hidden.
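The core of the strings utility is easy to reproduce. The sketch below (helper names are hypothetical) pulls printable ASCII runs and UTF-16LE runs, the encoding Windows wide-character APIs use, then adds a crude single-byte XOR brute force. FLOSS does far more than this, for example by emulating the sample's own decoding routines, so treat the brute force only as an illustration of why encoded strings are invisible to plain strings output.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs and UTF-16LE runs of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found


def xor_candidates(blob: bytes, min_len: int = 6) -> dict:
    """Try every single-byte XOR key; keep keys whose output contains readable strings."""
    results = {}
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in blob)
        hits = extract_strings(decoded, min_len)
        if hits:
            results[key] = hits
    return results
```

On real binaries the brute force produces many false positives; filtering its candidates against patterns such as URLs or registry paths would be a natural next step.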

PE File Analysis

Windows malware typically comes in the Portable Executable (PE) format. Analyzing the PE headers reveals critical metadata: the compilation timestamp, imported libraries and functions (Import Address Table), exported functions, the number and names of sections, and whether the file has been packed or modified. Tools like pestudio, PE-bear, and the Python pefile library allow detailed PE inspection.
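In practice the pefile library handles this parsing, including the import table. To show where the header fields live, the following standard-library sketch reads the COFF file header and section table directly with struct; the function name and the returned dictionary layout are illustrative, and it skips the optional header and imports entirely.

```python
import struct
from datetime import datetime, timezone


def parse_pe_headers(data: bytes) -> dict:
    """Read COFF file header fields and the section table from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, n_sections, timestamp, _symtab, _n_symbols,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    # Section headers (40 bytes each) start right after the optional header.
    offset = e_lfanew + 24 + opt_header_size
    sections = []
    for _ in range(n_sections):
        name, virtual_size, virtual_address, raw_size, raw_pointer, *_ = struct.unpack_from(
            "<8sIIIIIIHHI", data, offset)
        sections.append({
            "name": name.rstrip(b"\x00").decode("latin-1"),
            "virtual_size": virtual_size,
            "virtual_address": virtual_address,
            "raw_size": raw_size,
            "raw_pointer": raw_pointer,
        })
        offset += 40
    return {
        "machine": machine,
        "timestamp": timestamp,
        "compiled_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "characteristics": characteristics,
        "sections": sections,
    }
```

Called on the bytes of a sample, it returns the section names, sizes, and timestamp, which are the fields checked by the indicators below.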

Common indicators of suspicious PE files include:

  • Section names that deviate from standard names (.text, .data, .rsrc); packers often create their own sections, such as UPX0 and UPX1 (UPX), .aspack (ASPack), or random strings
  • Very high entropy in code sections (suggesting encryption or packing)
  • A small import table with only LoadLibrary and GetProcAddress (indicating runtime API resolution)
  • Mismatched or zeroed compilation timestamps
  • Abnormal section sizes (raw size of zero with a large virtual size)
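Several of these indicators can be checked mechanically. A minimal sketch, assuming sections have already been parsed into dictionaries with `name`, `data`, `raw_size`, and `virtual_size` keys and imports into a flat list of function names; the 7.0 bits-per-byte entropy threshold and the set of standard section names are assumptions to tune, not fixed rules.

```python
import math
import time
from collections import Counter

STANDARD_SECTIONS = {".text", ".data", ".rdata", ".rsrc", ".reloc", ".pdata", ".idata", ".bss", ".tls"}
RESOLVER_APIS = {"LoadLibraryA", "LoadLibraryW", "LoadLibraryExA", "LoadLibraryExW", "GetProcAddress"}
ENTROPY_THRESHOLD = 7.0  # assumed cut-off; compressed or encrypted data approaches 8.0


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 = constant, 8.0 = uniformly random)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def pe_red_flags(sections: list, imports: list, timestamp: int) -> list:
    """Return human-readable reasons a PE file looks packed or tampered with."""
    flags = []
    for section in sections:
        name = section["name"]
        if name not in STANDARD_SECTIONS:
            flags.append(f"non-standard section name {name!r}")
        if shannon_entropy(section["data"]) > ENTROPY_THRESHOLD:
            flags.append(f"high entropy in {name}")
        if section["raw_size"] == 0 and section["virtual_size"] > 0:
            # Empty on disk but reserved in memory, a common target for unpacked code
            # (a legitimate .bss section can also trigger this).
            flags.append(f"{name} has zero raw size but virtual size {section['virtual_size']:#x}")
    if imports and set(imports) <= RESOLVER_APIS:
        flags.append("imports limited to runtime API resolution")
    if timestamp == 0:
        flags.append("zeroed compilation timestamp")
    elif timestamp > time.time():
        flags.append("compilation timestamp in the future")
    return flags
```

Each flag is a lead rather than a verdict: legitimate installers and protected commercial software can trip several of them at once.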

Dynamic Analysis

Dynamic analysis involves executing malware in a controlled environment and observing its behavior. This approach reveals what the malware actually does -- regardless of obfuscation or packing -- but requires careful containment to prevent the malware from spreading or causing damage.

Sandboxing

A sandbox is an isolated environment designed for safely executing suspicious files. Modern malware sandboxes run samples inside virtual machines or containers, monitor all system interactions, and produce detailed behavioral reports. The sandbox captures network traffic, file system modifications, registry changes, process creation, and API calls.

Sandbox Platform | Type | Key Features | Deployment
Cuckoo Sandbox | Open source | Full behavioral analysis, memory dumps, network capture | Self-hosted
Any.Run | Commercial/free tier | Interactive sandbox, real-time observation | Cloud
Joe Sandbox | Commercial | Deep analysis, anti-evasion, multi-OS support | Cloud/on-premise
Hybrid Analysis | Free (CrowdStrike) | Falcon Sandbox engine, community sharing | Cloud
CAPE Sandbox | Open source | Config extraction, payload dumping, Cuckoo fork | Self-hosted
Windows Sandbox | Built-in (Windows 10+) | Lightweight disposable desktop, limited instrumentation | Local

Behavioral Monitoring

During dynamic analysis, analysts use specialized tools to monitor every action the malware takes:

  • Process Monitor (ProcMon): Captures real-time file system, registry, and process activity on Windows
  • Process Explorer: Provides detailed information about running processes, DLLs, and handles
  • Wireshark / tcpdump: Captures and analyzes network traffic generated by the malware
  • Regshot: Takes snapshots of the registry before and after execution, highlighting changes
  • API Monitor: Intercepts and logs API calls made by the malware
  • FakeNet-NG: Simulates network services to capture C2 communications without allowing real connections
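Regshot's before-and-after approach generalizes to any state that can be enumerated. As a stand-in that runs anywhere, the sketch below applies the same idea to a directory tree instead of the Windows registry: hash every file, run the sample, hash again, and diff. The function names are hypothetical, and a real monitoring setup would cover far more locations, including the registry itself.

```python
import hashlib
import os


def snapshot(root: str) -> dict:
    """Map each file path under root (relative) to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    state[os.path.relpath(path, root)] = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                continue  # locked or vanished files are skipped, not fatal
    return state


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify paths as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

Comparing content hashes rather than timestamps catches files that malware rewrites while preserving their modification times.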

Reverse Engineering

Reverse engineering is the most advanced and time-intensive form of malware analysis. It involves disassembling or decompiling the malware binary to understand its logic at the instruction level. This technique is essential for understanding custom encryption algorithms, unpacking routines, zero-day exploit mechanisms, and sophisticated C2 protocols.

The primary tools for reverse engineering are:

  • IDA Pro: The industry-standard interactive disassembler and debugger. Its Hex-Rays decompiler can reconstruct C-like pseudocode from assembly. Widely used in both government and private sector analysis.
  • Ghidra: A free, open-source reverse engineering suite released by the NSA in 2019. It includes a disassembler, decompiler, and scripting engine. Ghidra has rapidly become the most popular free alternative to IDA Pro.
  • x64dbg / x32dbg: Open-source user-mode debuggers for Windows that support plugins and scripting. Commonly used for unpacking and dynamic debugging.
  • Radare2 / Cutter: Radare2 is an open-source reverse engineering framework with a command-line interface; Cutter is a Qt-based GUI that originally ran on radare2 and has been built on Rizin, a radare2 fork, since 2020.
  • Binary Ninja: A modern reverse engineering platform with an intuitive interface and powerful intermediate language (IL) for cross-architecture analysis.

"Reverse engineering malware is like reading someone else's mind through their code. Every decision they made, every mistake, every clever trick -- it is all there in the binary." -- Chris Eagle, author of The IDA Pro Book

Analysts must be proficient in x86 and x86-64 assembly language, understand Windows internals (PE format, Win32 API, kernel structures), and be familiar with common compiler patterns to distinguish compiler-generated code from hand-written assembly.
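As a small example of recognizing compiler patterns, many functions compiled with frame pointers begin with a standard prologue that sets up a stack frame. The sketch below scans raw code bytes for three common encodings; it is a heuristic only, since optimized code frequently omits frame pointers and these byte sequences can also occur by coincidence in data. Disassemblers such as IDA Pro and Ghidra use far richer function-discovery logic, and the compiler attributions in the comments describe typical output rather than guarantees.

```python
PROLOGUES = {
    b"\x55\x8b\xec": "push ebp; mov ebp, esp (x86, encoding typical of MSVC)",
    b"\x55\x89\xe5": "push ebp; mov ebp, esp (x86, encoding typical of GCC)",
    b"\x55\x48\x89\xe5": "push rbp; mov rbp, rsp (x86-64)",
}


def find_prologues(code: bytes) -> list:
    """Return sorted (offset, description) pairs for each prologue occurrence."""
    hits = []
    for pattern, description in PROLOGUES.items():
        start = code.find(pattern)
        while start != -1:
            hits.append((start, description))
            start = code.find(pattern, start + 1)
    return sorted(hits)
```

Offsets returned this way are candidate function starts that an analyst would then confirm in a disassembler.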

Indicators of Compromise

Indicators of Compromise (IOCs) are forensic artifacts that identify malicious activity on a system or network. Extracting IOCs is one of the primary outputs of malware analysis, and these indicators are shared across organizations to enable collective defense.

IOC Type | Description | Examples | Longevity
File Hashes | Cryptographic hashes of malicious files | SHA-256 of malware binary | Low (easily changed)
IP Addresses | C2 server addresses | 203.0.113.45 (documentation-range example) | Low (infrastructure changes)
Domain Names | C2 domains, phishing domains | evil-update.com | Medium
URLs | Specific malicious URLs | /gate.php, /panel/login | Medium
Registry Keys | Persistence mechanisms | HKLM\Software\Microsoft\Windows\CurrentVersion\Run | High
Mutex Names | Named mutexes for single-instance enforcement | Global\MalwareMutex123 | High
File Paths | Dropped files, staging directories | %TEMP%\svchost.exe | Medium
YARA Signatures | Pattern-based detection rules | Byte sequences, string patterns | High
TTPs | Tactics, techniques, and procedures (MITRE ATT&CK) | T1059 (Command and Scripting Interpreter) | Very high (hardest to change)
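Several of these IOC types are commonly pulled out of strings output or sandbox reports with regular expressions. The patterns below are a simplified sketch with hypothetical helper names: they ignore defanged notation such as hxxp or [.], may capture trailing punctuation in URLs, and leave out domain names because a bare-domain pattern also matches file names like gate.php.

```python
import re

# Simplified patterns; real extractors also handle defanged IOCs and many more types.
IOC_PATTERNS = {
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "registry_key": re.compile(
        r"\bHK(?:LM|CU|EY_LOCAL_MACHINE|EY_CURRENT_USER)\\[^\s\"']+", re.IGNORECASE
    ),
    "mutex": re.compile(r"\b(?:Global|Local)\\[\w.-]+"),
}


def extract_iocs(text: str) -> dict:
    """Return de-duplicated, sorted matches for each IOC type."""
    return {kind: sorted(set(pattern.findall(text))) for kind, pattern in IOC_PATTERNS.items()}
```

Regex output still needs review: private addresses, vendor URLs, and Windows' own registry paths show up in benign samples too.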

David Bianco's Pyramid of Pain framework ranks IOC types by how difficult they are for attackers to change. Hash values sit at the bottom (trivially changed), while TTPs sit at the top (fundamental to the attacker's methodology). Effective threat intelligence focuses on the higher levels of the pyramid.

YARA Rules

YARA is a pattern-matching tool designed to help malware researchers identify and classify malware samples. Created by Victor Alvarez at VirusTotal, YARA allows analysts to write rules that describe malware families based on textual or binary patterns, file properties, and logical conditions.

A YARA rule typically has three sections: meta (descriptive metadata), strings (patterns to search for), and condition (logic that determines a match). Only the condition section is mandatory. Here is an example:

rule Emotet_Loader
{
    meta:
        description = "Detects Emotet loader variants"
        author = "Security Analyst"
        date = "2024-01-15"
        reference = "https://malpedia.caad.fkie.fraunhofer.de"

    strings:
        $mz = "MZ"
        $api1 = "VirtualAlloc" ascii
        $api2 = "CreateProcessW" ascii
        $xor_loop = { 8A 04 ?? 34 ?? 88 04 ?? 4? 75 F? }
        $mutex = "Global\\M" ascii

    condition:
        $mz at 0 and filesize < 500KB and 2 of ($api*) and ($xor_loop or $mutex)
}

YARA rules can match on hex byte patterns (including wildcards and jumps), regular expressions, PE module attributes (imports, exports, sections), file size constraints, and more. The open-source community maintains large collections of YARA rules for known malware families, including the YARA-Rules repository on GitHub and Malpedia's curated rule sets.
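To illustrate what hex-string wildcards and jumps mean, the sketch below translates a YARA-style hex string into a Python bytes regular expression and runs it against a buffer. This is not how YARA itself matches (YARA uses its own optimized engine); the helper names are hypothetical, and the sketch assumes space-separated tokens and supports only exact bytes, `??`, single-nibble wildcards such as `4?`, and jumps such as `[2-4]`. Alternatives such as `( 4D | 5A )` are not handled.

```python
import re


def _byte(value: int) -> bytes:
    # Render one byte as a regex escape like \x8a so no byte needs special quoting.
    return b"\\x%02x" % value


def yara_hex_to_regex(hex_string: str) -> re.Pattern:
    """Translate a space-separated YARA-style hex string into a compiled bytes regex."""
    parts = []
    for token in hex_string.strip().strip("{}").split():
        if token.startswith("["):
            # Jump: [n], [n-m], [n-] or [-] arbitrary bytes.
            low, sep, high = token.strip("[]").partition("-")
            if not sep:
                parts.append(b".{%d}" % int(low))
            elif high:
                parts.append(b".{%d,%d}" % (int(low or 0), int(high)))
            else:
                parts.append(b".{%d,}" % int(low or 0))
        elif token == "??":
            parts.append(b".")  # any single byte
        elif token[1] == "?":
            # High nibble fixed: 4? matches 0x40-0x4F.
            high_nibble = int(token[0], 16) << 4
            parts.append(b"[" + _byte(high_nibble) + b"-" + _byte(high_nibble | 0x0F) + b"]")
        elif token[0] == "?":
            # Low nibble fixed: ?4 matches 0x04, 0x14, ..., 0xF4.
            low_nibble = int(token[1], 16)
            parts.append(b"[" + b"".join(_byte((n << 4) | low_nibble) for n in range(16)) + b"]")
        else:
            parts.append(_byte(int(token, 16)))
    # DOTALL lets "." match every byte value, including 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


xor_loop = yara_hex_to_regex("{ 8A 04 ?? 34 ?? 88 04 ?? 4? 75 F? }")
buffer = b"\x00\x00" + bytes([0x8A, 0x04, 0x0E, 0x34, 0x5A, 0x88, 0x04, 0x0E, 0x46, 0x75, 0xF4])
match = xor_loop.search(buffer)
print(match.start() if match else None)  # prints 2
```

The demo buffer is synthetic: it simply contains bytes that satisfy the example rule's `$xor_loop` pattern starting at offset 2.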

Online Analysis Platforms

Several online platforms allow analysts to submit samples for automated analysis and compare results against community intelligence:

VirusTotal: Acquired by Google in 2012, VirusTotal scans uploaded files against 70+ antivirus engines and provides behavioral analysis, network indicators, community comments, and relationship graphs. It is among the most widely used malware intelligence platforms. VirusTotal Intelligence allows searching across all submitted samples using YARA rules, content searches, and metadata queries.
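Looking a sample up by hash avoids uploading the file itself, since uploaded files may be shared with other VirusTotal customers and can tip off an attacker. The sketch below assumes a VirusTotal API key and uses the v3 REST endpoint for file reports with only the standard library; the helper names are hypothetical, and an unknown hash makes `urlopen` raise `HTTPError` with status 404.

```python
import json
import urllib.request

VT_FILE_REPORT = "https://www.virustotal.com/api/v3/files/{}"


def build_vt_request(sha256: str, api_key: str) -> urllib.request.Request:
    """Prepare a GET request for a file report; the key goes in the x-apikey header."""
    return urllib.request.Request(
        VT_FILE_REPORT.format(sha256),
        headers={"x-apikey": api_key, "accept": "application/json"},
    )


def detection_stats(report: dict) -> dict:
    """Pull the per-verdict engine counts out of a v3 file report."""
    return report["data"]["attributes"]["last_analysis_stats"]


def lookup_hash(sha256: str, api_key: str, timeout: float = 30.0) -> dict:
    """Fetch a file report over the network and return its detection counts."""
    with urllib.request.urlopen(build_vt_request(sha256, api_key), timeout=timeout) as response:
        return detection_stats(json.load(response))
```

Keeping request construction and report parsing separate from the network call makes both easy to test offline.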

MalwareBazaar: Operated by abuse.ch, MalwareBazaar is a free platform for sharing malware samples with the security community. It provides tagged samples, YARA rule matching, and integration with threat intelligence feeds.

Malpedia: Maintained by Fraunhofer FKIE, Malpedia is a curated repository of malware families with associated YARA rules, analysis reports, and actor attribution. It serves as a reference library for malware researchers.

MITRE ATT&CK: While not an analysis platform per se, MITRE ATT&CK provides a comprehensive knowledge base of adversary tactics and techniques. Analysts map their findings to ATT&CK to standardize reporting and enable cross-organization comparisons.
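Mapping largely means translating observed behaviors into technique IDs. A minimal sketch, where the behavior labels on the left are hypothetical names an analysis pipeline might emit and the IDs on the right are real ATT&CK techniques; a production mapping would be far larger and would record the evidence behind each technique.

```python
# Hypothetical behavior labels (left) mapped to real ATT&CK technique IDs (right).
ATTACK_MAP = {
    "registry_run_key": ("T1547.001", "Registry Run Keys / Startup Folder"),
    "process_injection": ("T1055", "Process Injection"),
    "spawns_shell": ("T1059", "Command and Scripting Interpreter"),
    "http_c2": ("T1071.001", "Application Layer Protocol: Web Protocols"),
    "packed_payload": ("T1027.002", "Obfuscated Files or Information: Software Packing"),
    "vm_checks": ("T1497", "Virtualization/Sandbox Evasion"),
    "debugger_checks": ("T1622", "Debugger Evasion"),
}


def map_to_attack(observed: list) -> tuple:
    """Split observed behaviors into sorted ATT&CK techniques and unmapped leftovers."""
    techniques, unmapped = set(), []
    for behavior in observed:
        if behavior in ATTACK_MAP:
            techniques.add(ATTACK_MAP[behavior])
        else:
            unmapped.append(behavior)  # keep these for manual review
    return sorted(techniques), unmapped
```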

Anti-Analysis Techniques

Sophisticated malware employs numerous techniques to hinder analysis and evade detection:

  • Packing and Crypting: Compressing or encrypting the malware payload so that static analysis reveals only the unpacker stub. Common packers include UPX, Themida, and VMProtect. Custom packers are increasingly common in advanced threats.
  • Anti-VM Detection: Checking for virtual machine artifacts (VMware tools, VirtualBox guest additions, specific MAC address prefixes, registry keys) and refusing to execute or altering behavior in virtual environments.
  • Anti-Debugging: Using API calls like IsDebuggerPresent(), timing checks (rdtsc), hardware breakpoint detection, and structured exception handling tricks to detect and evade debuggers.
  • Code Obfuscation: Control flow flattening, dead code insertion, opaque predicates, and string encryption to make reverse engineering more difficult and time-consuming.
  • Environment Checks: Requiring specific system conditions (screen resolution, number of CPU cores, amount of RAM, presence of user files, mouse movement patterns) to distinguish real user machines from analysis environments.
  • Time Bombs: Delaying execution by a set period or until a specific date to evade short-duration sandbox analysis.
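Some of these techniques leave recognizable traces during static triage. The following sketch searches a binary for a handful of illustrative markers: debugger-detection API names, virtualization artifacts, and the raw opcodes for rdtsc (0F 31) and cpuid (0F A2). The marker list is an assumption and far from complete; two-byte opcode patterns also match by coincidence in ordinary data, and strings hidden by packing will not appear at all, so hits are leads for a disassembler rather than verdicts.

```python
ANTI_ANALYSIS_MARKERS = {
    b"IsDebuggerPresent": "anti-debugging API",
    b"CheckRemoteDebuggerPresent": "anti-debugging API",
    b"VBoxGuest": "VirtualBox artifact",
    b"vmtoolsd": "VMware Tools process name",
    b"\x0f\x31": "rdtsc instruction (timing check)",
    b"\x0f\xa2": "cpuid instruction (hypervisor check)",
}


def scan_anti_analysis(data: bytes) -> list:
    """Return sorted (offset, meaning) pairs for every marker occurrence."""
    hits = []
    for marker, meaning in ANTI_ANALYSIS_MARKERS.items():
        start = data.find(marker)
        while start != -1:
            hits.append((start, meaning))
            start = data.find(marker, start + 1)
    return sorted(hits)
```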

Analysis Workflow

A structured malware analysis workflow ensures thorough and repeatable results. Most organizations follow a phased approach that progressively deepens the level of analysis:

  1. Triage: Compute hashes, check against known samples on VirusTotal, classify the file type, and assess priority. Most samples are known variants and can be handled with automated tools.
  2. Static Analysis: Extract strings, analyze PE headers, identify packing, examine imports and exports. Look for obvious IOCs without running the sample.
  3. Dynamic Analysis: Execute in a sandbox environment, capture behavioral data. Monitor network connections, file drops, registry modifications, and process activity.
  4. Advanced Static Analysis: Disassemble and decompile the binary. Analyze code logic, encryption routines, C2 protocols, and exploitation techniques.
  5. Reporting: Document findings, extract IOCs, write YARA rules, map to MITRE ATT&CK, and share intelligence with the community and stakeholders.
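The triage step lends itself to automation. A minimal sketch of a first pass, assuming the sample is already loaded as bytes: it records the size and SHA-256 and guesses the container type from well-known magic bytes. The helper names are hypothetical and the signature table is a small illustrative subset; dedicated tools such as the file utility recognize far more formats.

```python
import hashlib

MAGIC_SIGNATURES = [
    (b"MZ", "Windows PE/DOS executable"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP container (also DOCX, XLSX, JAR, APK)"),
    (b"%PDF", "PDF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
    (b"\xca\xfe\xba\xbe", "Mach-O universal binary or Java class file"),
]


def identify_file_type(data: bytes) -> str:
    """Return the first matching magic-byte label, or "unknown"."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def triage(data: bytes) -> dict:
    """First-pass triage record: size, SHA-256 for lookups, and a file-type guess."""
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "type": identify_file_type(data),
    }
```

The SHA-256 in this record is the value a hash lookup against a service such as VirusTotal would use, and the type guess decides which static-analysis tools apply next.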

To explore the types of malware that analysts encounter, see our pages on viruses, worms, trojans, ransomware, spyware, and rootkits.

References

  • Sikorski, M., & Honig, A. (2012). Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press.
  • Eagle, C. (2011). The IDA Pro Book: The Unofficial Guide to the World's Most Popular Disassembler. No Starch Press.
  • Zeltser, L. (2021). "Malware Analysis and Reverse Engineering Cheat Sheet." SANS Institute.
  • Bianco, D. (2013). "The Pyramid of Pain." Enterprise Detection & Response.
  • Alvarez, V. (2014). "YARA Documentation." VirusTotal / Google.
  • MITRE Corporation. "ATT&CK Framework." https://attack.mitre.org/
  • AV-TEST Institute. (2024). "Malware Statistics." https://www.av-test.org/
  • National Security Agency. (2019). Ghidra. https://ghidra-sre.org/
  • VirusTotal. "VirusTotal Documentation." https://docs.virustotal.com/
  • abuse.ch. "MalwareBazaar." https://bazaar.abuse.ch/