Introduction
A computer virus is a type of malicious software that, when executed, replicates itself by modifying other computer programs and inserting its own code into them. The term "virus" was introduced in this context by computer scientist Fred Cohen, who credited his advisor Leonard Adleman with suggesting the name, in his 1984 paper "Computer Viruses: Theory and Experiments", and he developed the idea further in his 1986 PhD dissertation. The name draws an analogy to biological viruses, which replicate by injecting their genetic material into host cells.
The defining characteristic that distinguishes viruses from other malware is parasitic self-replication. A virus cannot exist independently -- it must attach itself to a host program, document, or boot sector. When the host is executed, the virus code runs as well, infecting additional hosts. This dependence on a host and on human action to spread differentiates viruses from worms, which spread autonomously without host programs or human intervention.
"A computer virus is a program that can 'infect' other programs by modifying them to include a possibly evolved copy of itself. Every program that gets infected can also act as a virus, and thus the infection grows." -- Fred Cohen, "Computer Viruses: Theory and Experiments" (1984)
History of Computer Viruses
The conceptual foundations of self-replicating programs predate modern computing. In 1949, mathematician John von Neumann described the theoretical possibility of self-reproducing automata. The first programs that could be considered virus-like appeared in the early 1970s, though they were experimental rather than malicious.
| Year | Virus/Event | Significance |
|---|---|---|
| 1971 | Creeper | First self-replicating program on ARPANET; displayed "I'm the creeper, catch me if you can!" |
| 1982 | Elk Cloner | First virus to spread in the wild; infected Apple II boot sectors via floppy disk |
| 1986 | Brain | First IBM PC virus; boot sector virus created by Basit and Amjad Farooq Alvi in Pakistan |
| 1987 | Vienna, Cascade, Jerusalem | Early file infectors; Jerusalem triggered payload on Friday the 13th |
| 1988 | Morris Worm | First major internet worm (technically a worm, not a virus); prompted creation of CERT |
| 1992 | Michelangelo | Boot sector virus that caused worldwide media panic despite limited actual infections |
| 1995 | Concept | First macro virus; infected Word documents; demonstrated a new attack vector |
| 1999 | Melissa | Macro virus/worm hybrid; spread via Outlook; caused an estimated $80 million in damage |
| 2000 | ILOVEYOU | VBScript virus; an estimated $10 billion in damage; reportedly reached about 10% of internet-connected computers |
| 2003 | Sobig.F | Fastest spreading email virus at the time; generated 1 million copies in 24 hours |
| 2010 | Stuxnet | Sophisticated worm (often loosely called a virus), widely attributed to nation-state actors, targeting Iranian nuclear centrifuges; used four zero-days |
How Viruses Work
Infection Mechanism
When a virus infects a program, it modifies the host file to include the virus code. The modification must be done in such a way that the virus code executes before (or instead of) the original program code. Common techniques include:
- Prepending: The virus inserts itself at the beginning of the host file. When the program runs, the virus executes first, then transfers control to the original code.
- Appending: The virus adds itself to the end of the host file and modifies the entry point to jump to the virus code. After executing, the virus jumps to the saved original entry point so that the host program runs as usual.
- Cavity Infection: The virus inserts itself into unused spaces (code caves) within the host file without changing its size, making detection more difficult.
- Entry Point Obscuring (EPO): Rather than modifying the file's entry point, the virus patches a call instruction somewhere in the middle of the host code to redirect to the virus, making it harder to find.
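From the defender's side, the appending technique above leaves a recognizable trace: the entry point ends up in the last section of the file instead of the main code section. The following is a minimal sketch of that heuristic, assuming a simplified section table. The `Section` class and the sample addresses are hypothetical stand-ins for values a real scanner would read from the executable's headers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Section:
    """Hypothetical, simplified row of an executable's section table."""
    name: str
    virtual_address: int  # where the section starts in memory
    virtual_size: int     # how many bytes it occupies in memory


def section_containing(entry_point: int, sections: list[Section]) -> Optional[Section]:
    """Return the section whose address range contains the entry point."""
    for section in sections:
        start = section.virtual_address
        if start <= entry_point < start + section.virtual_size:
            return section
    return None


def entry_point_is_suspicious(entry_point: int, sections: list[Section]) -> bool:
    """Flag entry points outside every section or inside the last section."""
    if not sections:
        return True
    owner = section_containing(entry_point, sections)
    last = max(sections, key=lambda s: s.virtual_address)
    return owner is None or (len(sections) > 1 and owner == last)


# Illustrative layout only; the addresses are made up for the example.
layout = [
    Section(".text", 0x1000, 0x5000),
    Section(".data", 0x6000, 0x1000),
    Section(".rsrc", 0x7000, 0x2000),
]
print(entry_point_is_suspicious(0x1200, layout))  # entry point in .text
print(entry_point_is_suspicious(0x7800, layout))  # entry point in the last section
```

A hit is only a lead for further analysis, since legitimate packers and protectors can also move the entry point. Entry point obscuring is designed to defeat this kind of check.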
The Virus Lifecycle
A virus typically passes through four phases:
- Dormant Phase: The virus is present on the system but inactive. Not all viruses have this phase; some activate immediately.
- Propagation Phase: The virus replicates itself by infecting new host files. It searches for uninfected targets (executables, documents, boot sectors) and modifies them to include a copy of itself. Most viruses include an infection marker to avoid infecting the same file twice.
- Triggering Phase: The virus activates its payload based on a trigger condition. Common triggers include specific dates (Jerusalem: Friday the 13th), boot counts (Elk Cloner: every 50th boot), or random conditions.
- Payload Phase: The virus executes its malicious payload. Payloads range from harmless messages and graphical effects to destructive actions like file deletion, disk formatting, or data corruption.
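Date-based triggers are predictable, so defenders can work out in advance when a payload of this kind would fire. As a small illustration, the sketch below lists every Friday the 13th in a given year using only the standard library. The function name is an assumption made for this example.

```python
from datetime import date


def friday_the_13ths(year: int) -> list[date]:
    """Return every Friday the 13th in the given year, in calendar order."""
    return [
        date(year, month, 13)
        for month in range(1, 13)
        if date(year, month, 13).weekday() == 4  # Monday is 0, so Friday is 4
    ]


for day in friday_the_13ths(2026):
    print(day.isoformat())
```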
Types of Viruses
Boot Sector Viruses
Boot sector viruses infect the Master Boot Record (MBR) or Volume Boot Record (VBR) of storage devices. They load into memory during the boot process, before the operating system, giving them control over the system from the earliest stage. Boot sector viruses spread when an infected storage medium (historically floppy disks, later USB drives) is used to boot a computer.
The Brain virus (1986), the first IBM PC virus, was a boot sector virus. It replaced the boot sector with its own code, moving the original boot sector to another location on the disk. When the system booted from the infected disk, Brain loaded first, installed itself in memory, and intercepted disk access calls to hide its presence and infect other disks.
Boot sector viruses were dominant in the late 1980s and early 1990s but declined as floppy disk usage decreased. However, the concept lives on in modern bootkits that target the UEFI boot process.
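Because a boot sector virus must replace the boot code, one defensive check is to compare the boot code against a fingerprint taken when the disk was known to be clean. The sketch below assumes the classic MBR layout: 446 bytes of boot code, a 64-byte partition table, and the 0x55AA signature in the last two bytes. It works on an in-memory 512-byte sector rather than reading a real disk, and the function names are illustrative.

```python
import hashlib

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446          # boot code area in the classic MBR layout (assumed)
BOOT_SIGNATURE = b"\x55\xaa"  # final two bytes of a valid boot sector


def boot_code_fingerprint(sector: bytes) -> str:
    """Return the SHA-256 of the boot code area after basic sanity checks."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(sector)}")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    return hashlib.sha256(sector[:BOOT_CODE_SIZE]).hexdigest()


def boot_code_changed(sector: bytes, known_good: str) -> bool:
    """Compare a sector's boot code against a previously recorded fingerprint."""
    return boot_code_fingerprint(sector) != known_good


# Synthetic sectors for demonstration; no real disk is read.
clean_sector = bytes(510) + BOOT_SIGNATURE
tampered_sector = b"\xeb\x3c" + bytes(508) + BOOT_SIGNATURE

baseline = boot_code_fingerprint(clean_sector)
print(boot_code_changed(clean_sector, baseline))     # False
print(boot_code_changed(tampered_sector, baseline))  # True
```

The check is only reliable when the sector is read from a clean boot environment. As the Brain example shows, a memory-resident boot virus can intercept disk reads and return the original sector.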
File Infectors
File infector viruses attach themselves to executable programs (on Windows, typically .exe and .dll files). When the infected program runs, the virus code executes and searches for other executables to infect. File infectors were the most common virus type through the 1990s and early 2000s.
File infectors vary in their targeting strategy:
- Direct-action viruses: Search for files to infect each time they execute, then transfer control to the host. They do not stay resident in memory.
- Memory-resident viruses: Install themselves in memory and intercept system calls. They infect files as they are opened, executed, or copied, remaining active until the system is rebooted.
- Companion viruses: Create a separate file with the same name but a different extension that takes precedence in execution order (e.g., creating PROGRAM.COM alongside PROGRAM.EXE, since DOS executed .COM files first).
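Companion infections leave the original executable untouched, so they are easier to find by looking at file names than at file contents. The following sketch flags base names that appear with both a .COM and an .EXE extension in one directory listing. The function name and the sample listing are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath


def find_companion_pairs(filenames: list[str]) -> list[str]:
    """Return base names that exist as both .COM and .EXE (case-insensitive)."""
    extensions_by_stem: dict[str, set[str]] = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        extensions_by_stem[path.stem.upper()].add(path.suffix.upper())
    return sorted(
        stem
        for stem, extensions in extensions_by_stem.items()
        if {".COM", ".EXE"} <= extensions
    )


listing = ["PROGRAM.COM", "PROGRAM.EXE", "EDIT.COM", "readme.txt", "tool.exe", "Tool.com"]
print(find_companion_pairs(listing))  # ['PROGRAM', 'TOOL']
```

A matching pair is a reason to inspect the .COM file, not proof of infection, because some software legitimately ships both.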
Macro Viruses
Macro viruses exploit the macro programming languages built into applications like Microsoft Word, Excel, and Access. Rather than infecting executable files, they infect documents and templates. When a user opens an infected document, the macro executes and infects the global template (Normal.dot in Word), which then infects every subsequently opened or created document.
The Concept virus (1995) was the first macro virus, demonstrating that data files could carry executable code. The Melissa virus (1999) combined macro virus techniques with email propagation, sending infected documents to the first 50 entries in the victim's Outlook address book. Macro viruses dominated the malware landscape from 1995 to 2002.
Microsoft's decision to disable macros by default in Office applications, followed by its 2022 announcement that macros in files downloaded from the internet would be blocked by default, has significantly reduced the macro virus threat. Malicious macro documents nevertheless remain a common delivery mechanism for trojans.
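A first triage step for a suspicious document is simply to check whether it carries macro code at all. In the ZIP-based Office Open XML formats (for example .docm and .xlsm), VBA code is stored in a part named `vbaProject.bin`, so the sketch below looks for that part with the standard `zipfile` module. It does not handle the older binary .doc and .xls formats discussed above, and `build_package` is a hypothetical helper that creates tiny stand-in archives for the demonstration.

```python
import io
import zipfile


def contains_vba_project(data: bytes) -> bool:
    """Return True if a ZIP-based Office package contains a vbaProject.bin part."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False  # not a ZIP container (legacy binary formats land here)
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return any(
            name.lower().endswith("vbaproject.bin") for name in package.namelist()
        )


def build_package(part_names: list[str]) -> bytes:
    """Hypothetical helper: build an in-memory ZIP with placeholder parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name in part_names:
            package.writestr(name, b"placeholder")
    return buffer.getvalue()


plain = build_package(["word/document.xml"])
with_macros = build_package(["word/document.xml", "word/vbaProject.bin"])
print(contains_vba_project(plain))        # False
print(contains_vba_project(with_macros))  # True
```

The presence of a VBA project only shows that macros exist. Deciding whether they are malicious requires inspecting the macro source.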
Evasion Techniques
Polymorphic Viruses
Polymorphic viruses encrypt their code with a variable encryption key each time they replicate, producing different-looking copies that evade signature-based detection. The virus body is encrypted and therefore looks different in each infected file, but the decryption routine (the "decryptor") must remain in plaintext to decrypt the virus at runtime.
Early antivirus programs could detect polymorphic viruses by scanning for the decryptor code. In response, advanced polymorphic viruses generate unique decryptors for each copy using techniques like register reassignment, instruction substitution, code reordering, and garbage code insertion. The Mutation Engine (MtE), created by the virus author "Dark Avenger" in 1991, was the first polymorphic engine available as a toolkit, allowing any virus writer to make their creations polymorphic.
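The effect of variable-key encryption on signature scanning can be shown with harmless data. In the sketch below, a placeholder marker stands in for a known byte pattern, single-byte XOR stands in for the virus's encryption (real engines use more varied schemes), and a brute-force pass over all 256 keys stands in for the cryptanalytic "X-ray" scanning that antivirus tools have used against weak ciphers. All names and values are illustrative assumptions.

```python
from typing import Optional

# Harmless placeholder standing in for a known byte pattern.
SIGNATURE = b"HARMLESS-DEMO-MARKER"


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding and decoding are identical)."""
    return bytes(b ^ key for b in data)


def naive_scan(data: bytes) -> bool:
    """Plain signature match: look for the pattern exactly as stored."""
    return SIGNATURE in data


def xray_scan(data: bytes) -> Optional[int]:
    """Try all 256 single-byte keys and return the first that reveals the pattern."""
    for key in range(256):
        if SIGNATURE in xor_bytes(data, key):
            return key
    return None


body = b"padding " + SIGNATURE + b" more padding"
copy_a = xor_bytes(body, 0x5A)
copy_b = xor_bytes(body, 0xC3)

print(copy_a != copy_b)                            # the two copies share no fixed form
print(naive_scan(copy_a), naive_scan(copy_b))      # the plain signature misses both
print(xray_scan(copy_a) == 0x5A, xray_scan(copy_b) == 0xC3)
```

Brute force works here only because the key space is tiny. Against stronger or layered encryption, scanners instead rely on recognizing the decryptor or on emulating it until the body is decrypted in memory.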
Metamorphic Viruses
Metamorphic viruses go further than polymorphism by rewriting their entire code with each generation. Rather than encrypting a static body, the virus contains a metamorphic engine that analyzes its own code, disassembles it into an intermediate representation, applies transformations (instruction substitution, register reassignment, code transposition, subroutine permutation), and reassembles a functionally equivalent but structurally different version.
| Technique | Polymorphic | Metamorphic |
|---|---|---|
| Core mechanism | Encrypt virus body with variable key | Rewrite entire code each generation |
| Decryptor needed | Yes (plaintext stub; advanced engines also mutate it) | No (no encryption used) |
| Code variation | Encrypted body looks different; decryptor is similar or mutated | Entire code is structurally different |
| Detection difficulty | Medium-High (decryptor analysis, emulation) | Very High (no consistent signatures) |
| Implementation complexity | Moderate | Very high (requires code analysis engine) |
| Notable examples | Storm Worm, Virlock, MtE-based viruses | Zmist (Zombie.Mistfall), Simile, Regswap |
The Zmist (Zombie.Mistfall) virus, created by the virus author "Z0mbie" in 2001, is often described as one of the most sophisticated metamorphic viruses ever written. It could disassemble its host program, integrate its own code into the host's code flow (code integration), and reassemble the result, making it very difficult to distinguish virus code from host code.
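Register reassignment on its own is one of the weaker transformations listed above, because a scanner can normalize register names before comparing code. The toy sketch below works on textual x86-style listings rather than real machine code: it renames registers to R0, R1, and so on in order of first appearance, so two variants that differ only in register choice reduce to the same form. The register list and sample listings are assumptions for illustration, and the approach does not handle instruction substitution or code transposition.

```python
import re

REGISTERS = ("eax", "ebx", "ecx", "edx", "esi", "edi")
REGISTER_PATTERN = re.compile(r"\b(" + "|".join(REGISTERS) + r")\b")


def normalize_registers(listing: list[str]) -> list[str]:
    """Rename registers to R0, R1, ... in order of first appearance."""
    names: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        register = match.group(1)
        if register not in names:
            names[register] = f"R{len(names)}"
        return names[register]

    return [REGISTER_PATTERN.sub(rename, line.lower()) for line in listing]


variant_a = ["mov eax, 5", "add ebx, eax", "push ebx"]
variant_b = ["mov ecx, 5", "add edx, ecx", "push edx"]
unrelated = ["mov eax, 5", "sub ebx, eax", "push ebx"]

print(normalize_registers(variant_a))  # ['mov R0, 5', 'add R1, R0', 'push R1']
print(normalize_registers(variant_a) == normalize_registers(variant_b))  # True
print(normalize_registers(variant_a) == normalize_registers(unrelated))  # False
```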
Notable Viruses in History
ILOVEYOU (2000): A VBScript virus distributed as an email attachment named "LOVE-LETTER-FOR-YOU.TXT.vbs". When opened, it overwrote files with copies of itself, emailed itself to all Outlook contacts, and downloaded a password-stealing trojan. It is estimated to have infected tens of millions of computers within days, causing an estimated $10 billion in damage and prompting the Pentagon, CIA, and British Parliament to shut down their email systems. The virus was traced to two programmers in the Philippines, but the charges against them were dropped because the Philippines had no computer crime laws at the time.
CIH / Chernobyl (1998): One of the most destructive viruses ever created. It infected Windows PE executables using the cavity infection technique, splitting its code across unused gaps between the file's sections so that infected files did not grow. On April 26 (the anniversary of the Chernobyl disaster, hence the nickname), it attempted to overwrite the system's BIOS flash memory, potentially rendering the computer unbootable. It also overwrote the first megabyte of the hard drive, destroying the partition table. Its creator, Chen Ing-Hau, was identified by Taiwanese authorities but was not convicted, reportedly because no victims pressed charges under the laws then in force.
Stuxnet (2010): One of the most sophisticated pieces of malware discovered to date, Stuxnet is widely reported to have been a joint US-Israeli operation (codenamed "Olympic Games") targeting Iran's Natanz uranium enrichment facility, although neither government has officially confirmed this. Strictly speaking it is a worm rather than a virus: it spread on its own via USB drives and network shares, used four zero-day exploits, and carried stolen digital certificates. Its payload specifically targeted Siemens S7-315 and S7-417 PLCs controlling centrifuge motors, causing them to spin at destructive speeds while reporting normal readings to operators. Stuxnet is estimated to have destroyed approximately 1,000 of Iran's 6,000 centrifuges.
"Stuxnet was the first true cyber weapon. It crossed the line from the digital world into the physical world, causing actual physical destruction." -- Ralph Langner, the researcher who first analyzed Stuxnet's payload
Antivirus and Defense
The computer virus threat drove the creation of the antivirus industry in the late 1980s. Detection techniques have evolved significantly:
| Detection Method | How It Works | Strengths | Limitations |
|---|---|---|---|
| Signature-Based | Compares file content against a database of known virus byte patterns | Fast, accurate for known viruses, low false positives | Cannot detect unknown or polymorphic viruses |
| Heuristic Analysis | Analyzes code structure and behavior patterns for suspicious characteristics | Can detect unknown viruses with similar patterns | Higher false positive rate |
| Emulation/Sandboxing | Executes suspicious code in a virtual environment to observe behavior | Defeats encryption and packing; reveals true behavior | Slow; anti-emulation techniques exist |
| Behavioral Monitoring | Monitors running programs for virus-like behavior (mass file modification, etc.) | Detects zero-day viruses based on actions, not signatures | Can only detect after execution begins |
| Machine Learning | Classifies files using ML models trained on millions of malware and benign samples | Can generalize to unknown variants | Adversarial samples can evade ML models |
| Integrity Monitoring | Detects unauthorized changes to system files and executables | Reliably detects file infection | Generates alerts for legitimate updates |
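Two rows of the table are simple enough to sketch in a few lines. The example below shows signature-based scanning as a byte-pattern lookup and integrity monitoring as a comparison of SHA-256 hashes against a recorded baseline. The signature names and byte patterns are made up for the demonstration, and a dictionary of in-memory contents stands in for files on disk.

```python
import hashlib

# Made-up signature database: detection name -> byte pattern (illustrative only).
SIGNATURES = {
    "Demo.Marker.A": bytes.fromhex("deadbeef4841524d"),
    "Demo.Marker.B": b"HARMLESS-DEMO-MARKER",
}


def scan(data: bytes) -> list[str]:
    """Signature-based detection: report every known pattern found in the data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Integrity baseline: record a SHA-256 hash for each file's contents."""
    return {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}


def changed_paths(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Report files that are new or whose hash differs from the baseline."""
    return sorted(path for path in current if baseline.get(path) != current[path])


files = {"app.exe": b"original program bytes", "notes.txt": b"hello"}
baseline = snapshot(files)

# Simulate a modification that appends a known pattern to one file.
files["app.exe"] += b"HARMLESS-DEMO-MARKER"

print(scan(files["app.exe"]))                    # ['Demo.Marker.B']
print(scan(files["notes.txt"]))                  # []
print(changed_paths(baseline, snapshot(files)))  # ['app.exe']
```

The two methods complement each other: the signature match names a known threat, while the hash comparison notices the change even when no signature exists, at the cost of also flagging legitimate updates.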
Viruses vs. Other Malware
The term "virus" is often used colloquially to refer to all malware, but in technical usage it refers specifically to self-replicating code that requires a host. Understanding the distinctions is important:
| Characteristic | Virus | Worm | Trojan |
|---|---|---|---|
| Self-replication | Yes (requires host) | Yes (standalone) | No |
| Requires host program | Yes | No | Disguises as legitimate program |
| Requires human action to spread | Yes (running infected program) | No (spreads autonomously) | Yes (user must execute) |
| Primary spread method | Infected files, removable media | Network exploitation | Social engineering |
| Speed of spread | Slow (depends on sharing) | Very fast (automated) | Varies (depends on distribution) |
Modern Relevance
Traditional file-infecting viruses have declined significantly since the 2000s, replaced by trojans, ransomware, and fileless malware as the dominant threat categories. Several factors contributed to this decline:
- The shift from local file sharing to internet-based distribution reduced the effectiveness of file infection as a spreading mechanism
- Improved OS security features made file infection more difficult: code signing exposes modified executables, while DEP and ASLR hinder the exploits used to deliver malware
- Antivirus software became ubiquitous and effective against known virus techniques
- Cybercrime shifted toward profit-motivated attacks where trojans and ransomware are more effective
However, virus techniques remain relevant. File infection is still used by some advanced threats, macro-based attacks continue to be a major initial access vector, and the evasion techniques pioneered by virus writers (polymorphism, metamorphism, anti-analysis) are now standard features of all sophisticated malware. For analysis of modern malware, see malware analysis.
References
- Cohen, F. (1986). "Computer Viruses." PhD Dissertation, University of Southern California.
- Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley Professional.
- Filiol, E. (2005). Computer Viruses: From Theory to Applications. Springer.
- Ludwig, M. (1993). The Giant Black Book of Computer Viruses. American Eagle Publications.
- Zetter, K. (2014). Countdown to Zero Day: Stuxnet and the Launch of the World's First Digital Weapon. Crown.
- Langner, R. (2013). "To Kill a Centrifuge: A Technical Analysis of What Stuxnet's Creators Tried to Achieve." The Langner Group.
- von Neumann, J. (1966). Theory of Self-Reproducing Automata (A. W. Burks, Ed.). University of Illinois Press. Based on lectures delivered in 1949.
- AV-TEST Institute. (2024). "Malware Statistics and Trends." https://www.av-test.org/
- Microsoft. (2022). "Macros from the Internet Will Be Blocked by Default in Office." Microsoft Tech Community.
- CERT/CC. (1988). "CERT Advisory CA-1988-01: Internet Worm." Carnegie Mellon University.