
The simple COM file infector which we just developed might be good instruction on the basics of how to write a virus, but it is severely limited. Since it only attacks COM files in the current directory, it will have a hard time proliferating. In this chapter, we will develop a more sophisticated virus that will overcome these limitations: a virus that can infect EXE files and jump from directory to directory and from drive to drive. Such improvements make the virus much more complex, and also much more dangerous. We started with something simple and relatively innocuous in the last chapter. You can’t get into too much trouble with it. However, I don’t want to leave you with only children’s toys. The virus we discuss in this chapter, named INTRUDER, is no toy. It is very capable of finding its way into computers all around the world, and of deceiving even a very capable computer whiz.

The Structure of an EXE File

An EXE file is not as simple as a COM file. The EXE file is designed to allow DOS to execute programs that require more than 64 kilobytes of code, data and stack. When loading an EXE file, DOS makes no a priori assumptions about the size of the file, or what is code or data. All of this information is stored in the EXE file itself, in the EXE Header at the beginning of the file. This header has two parts to it, a fixed-length portion, and a variable length table of pointers to segment references in the Load Module, called the Relocation Pointer Table. Since any virus which attacks EXE files must be able to manipulate the data in the EXE Header, we’d better take some time to look at it. Figure 10 is a graphical representation of an EXE file. The meaning of each byte in the header is explained in Table 1.

Figure 10: The layout of an EXE file. From the start of the file: EXE File Header, Relocation Pointer Table, EXE Load Module.

When DOS loads the EXE, it uses the Relocation Pointer Table to modify all segment references in the Load Module. After that, the segment references in the image of the program loaded into memory point to the correct memory location. Let’s consider an example (Figure 11): Imagine an EXE file with two segments. The segment at the start of the load module contains a far call to the second segment. In the load module, this call looks like this:
Address        Assembly Language             Machine Code
0000:0150      CALL    FAR 0620:0980         9A 80 09 20 06
From this, one can infer that the start of the second segment is 6200H (= 620H x 10H) bytes from the start of the load module. The Relocation Pointer Table would contain a vector 0000:0153 to point to the segment reference (20 06) of this far call. When DOS loads the program, it might load it starting at segment 2130H, because DOS and some memory resident programs occupy locations below this. So DOS would first load the Load Module into memory at 2130:0000. Then it would take the relocation pointer 0000:0153 and transform it into a pointer, 2130:0153, which points to the segment in the far call in memory. DOS will then add 2130H to the word in that location, resulting in the machine language code 9A 80 09 50 27, or CALL FAR 2750:0980 (see Figure 11).

Figure 11: An example of relocating code.

Table 1: Structure of the EXE Header.

Offset  Size  Name                   Description
0       2     Signature              These bytes are the characters M and Z
                                     in every EXE file and identify the file
                                     as an EXE file. If they are anything
                                     else, DOS will try to treat the file as
                                     a COM file.
2       2     Last Page Size         Actual number of bytes in the final 512
                                     byte page of the file (see Page Count).
4       2     Page Count             The number of 512 byte pages in the
                                     file. The last page may only be
                                     partially filled, with the number of
                                     valid bytes specified in Last Page Size.
                                     For example, a file of 2050 bytes would
                                     have Page Count = 5 and Last Page
                                     Size = 2.
6       2     Reloc Table Entries    The number of entries in the relocation
                                     pointer table.
8       2     Header Paragraphs      The size of the EXE file header in 16
                                     byte paragraphs, including the
                                     Relocation table. The header is always a
                                     multiple of 16 bytes in length.
0AH     2     MINALLOC               The minimum number of 16 byte paragraphs
                                     of memory that the program requires to
                                     execute. This is in addition to the
                                     image of the program stored in the file.
                                     If enough memory is not available, DOS
                                     will return an error when it tries to
                                     load the program.
0CH     2     MAXALLOC               The maximum number of 16 byte paragraphs
                                     to allocate to the program when it is
                                     executed. This is normally set to FFFF
                                     Hex, except for TSR’s.
0EH     2     Initial ss             This contains the initial value of the
                                     stack segment relative to the start of
                                     the code in the EXE file, when the file
                                     is loaded. This is modified dynamically
                                     by DOS when the file is loaded, to
                                     reflect the proper value to store in the
                                     ss register.
10H     2     Initial sp             The initial value to set sp to when the
                                     program is executed.
12H     2     Checksum               A word oriented checksum value such that
                                     the sum of all words in the file is FFFF
                                     Hex. If the file is an odd number of
                                     bytes long, the lost byte is treated as
                                     a word with the high byte = 0. Often
                                     this checksum is used for nothing, and
                                     some compilers do not even bother to set
                                     it properly. The INTRUDER virus will not
                                     alter the checksum.
14H     2     Initial ip             The initial value for the instruction
                                     pointer, ip, when the program is loaded.
16H     2     Initial cs             Initial value of the code segment
                                     relative to the start of the code in the
                                     EXE file. This is modified by DOS at
                                     load time.
18H     2     Relocation Tbl Offset  Offset of the start of the relocation
                                     table from the start of the file, in
                                     bytes.
1AH     2     Overlay Number         The resident, primary part of a program
                                     always has this word set to zero.
                                     Overlays will have different values
                                     stored here.

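As a rough illustration of Table 1, the fixed-length part of the header can be unpacked as a two-byte signature followed by thirteen little-endian words. The sketch below is written in Python rather than assembler purely for readability. The function names and field identifiers are made up for this example, and page_fields assumes the convention stated in the table, that Page Count includes a partially filled final page.

```python
import struct

# Field order follows Table 1; the Python identifiers are illustrative only.
FIELDS = (
    "signature", "last_page_size", "page_count", "reloc_entries",
    "header_paragraphs", "minalloc", "maxalloc", "initial_ss",
    "initial_sp", "checksum", "initial_ip", "initial_cs",
    "reloc_table_offset", "overlay_number",
)


def parse_exe_header(data: bytes) -> dict:
    """Unpack the 28-byte fixed portion of an EXE header into a dict."""
    if len(data) < 28:
        raise ValueError("need at least 28 bytes of header")
    values = struct.unpack_from("<2s13H", data, 0)
    header = dict(zip(FIELDS, values))
    if header["signature"] != b"MZ":
        raise ValueError("not an EXE file (signature is not MZ)")
    return header


def page_fields(file_size: int) -> tuple:
    """Return (Page Count, Last Page Size) for a file of file_size bytes."""
    page_count, last_page_size = divmod(file_size, 512)
    if last_page_size:
        page_count += 1  # the partially filled final page still counts
    return page_count, last_page_size


# A hand-built header with sample values, just to exercise the parser.
sample = struct.pack("<2s13H", b"MZ", 2, 5, 0, 2, 0, 0xFFFF,
                     0, 0x100, 0, 0, 0, 0x1C, 0)
header = parse_exe_header(sample)
print(header["page_count"], header["last_page_size"])
print(page_fields(2050))
```

Reading the header into named fields first keeps later arithmetic on it (header size, relocation table position, file length) readable.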
Note that a COM program requires none of these calisthenics since it contains no segment references. Thus, DOS just has to set the segment registers all to one value before passing control to the program.
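The load-time fix-up in the far-call example can be checked with a few lines of arithmetic. This is only a sketch of what the text says DOS does for a single relocation pointer: apply_relocation is a hypothetical helper name, and the 512-byte buffer merely stands in for a load module.

```python
import struct


def apply_relocation(image: bytearray, ptr_seg: int, ptr_off: int,
                     load_seg: int) -> None:
    """Add load_seg to the 16-bit word a relocation pointer refers to."""
    pos = ptr_seg * 16 + ptr_off  # position within the load module
    (word,) = struct.unpack_from("<H", image, pos)
    struct.pack_into("<H", image, pos, (word + load_seg) & 0xFFFF)


# Stand-in load module holding CALL FAR 0620:0980 at 0000:0150.
image = bytearray(0x200)
image[0x150:0x155] = bytes([0x9A, 0x80, 0x09, 0x20, 0x06])

# Relocation pointer 0000:0153, program loaded at segment 2130H.
apply_relocation(image, 0x0000, 0x0153, 0x2130)
patched = image[0x150:0x155].hex(" ").upper()
print(patched)  # 9A 80 09 50 27
```

A COM file would need no such pass, since it has no relocation pointers to process.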

Infecting an EXE File

A virus that is going to infect an EXE file will have to modify the EXE Header and the Relocation Pointer Table, as well as adding its own code to the Load Module. This can be done in a whole variety of ways, some of which require more work than others. The INTRUDER virus will attach itself to the end of an EXE program and gain control when the program first starts. This will require a routine similar to that in TIMID, which copies program code from memory to a file on disk, and then adjusts the file.
INTRUDER will have its very own code, data and stack segments. A universal EXE virus cannot make any assumptions about how those segments are set up by the host program. It would crash as soon as it finds a program where those assumptions are violated. For example, if one were to use whatever stack the host program was initialized with, the stack could end up right in the middle of the virus code with the right host. (That memory would have been free space before the virus had infected the program.) As soon as the virus started making calls or pushing data onto the stack, it would corrupt its own code and self-destruct.
To set up segments for the virus, new initial segment values for cs and ss must be placed in the EXE file header. Also, the old initial segments must be stored somewhere in the virus, so it can pass control back to the host program when it is finished executing. We will have to put two pointers to these segment references in the relocation pointer table, since they are relocatable references inside the virus code segment.
Adding pointers to the relocation pointer table brings up an important question. To add pointers to the relocation pointer table, it may sometimes be necessary to expand that table’s size. Since the EXE Header must be a multiple of 16 bytes in size, relocation pointers are allocated in blocks of four four-byte pointers. Thus, if we can keep the number of segment references down to two, it will be necessary to expand the header only every other time. Alternatively, the virus may choose not to infect the file, rather than expanding the header. There are pros and cons for both possibilities. On the one hand, a load module can be hundreds of kilobytes long, and moving it is a time-consuming chore that can make it very obvious that something is going on that shouldn’t be. On the other hand, if the virus chooses not to move the load module, then roughly half of all EXE files will be naturally immune to infection. The INTRUDER virus will take the quiet and cautious approach that does not infect every EXE. You might want to try the other approach as an exercise, and move the load module only when necessary, and only for relatively small files (pick a maximum size). Suppose the main virus routine looks something like this:
VSEG   SEGMENT
VIRUS:
       mov     ax,cs               ;set ds=cs for virus
       mov     ds,ax
       .
       .
       .
       mov     ax,SEG HOST_STACK   ;restore host stack
       cli
       mov     ss,ax
       mov     sp,OFFSET HOST_STACK
       sti
       jmp     FAR PTR HOST        ;go execute host
Then, to infect a new file, the copy routine must perform the following steps:
  1. Read the EXE Header in the host program.
  2. Extend the size of the load module until it is an even multiple of 16 bytes, so cs:0000 will be the first byte of the virus.
  3. Write the virus code currently executing to the end of the EXE file being attacked.
  4. Write the initial values of ss:sp, as stored in the EXE Header, to the locations of SEG HOST_STACK and OFFSET HOST_STACK on disk in the above code.
  5. Write the initial value of cs:ip in the EXE Header to the location of FAR PTR HOST on disk in the above code.
  6. Store Initial ss=SEG VSTACK, Initial sp=OFFSET VSTACK, Initial cs=SEG VSEG, and Initial ip=OFFSET VIRUS in the EXE header in place of the old values.
  7. Add two to the Relocation Table Entries in the EXE header.
  8. Add two relocation pointers at the end of the Relocation Pointer Table in the EXE file on disk (the location of these pointers is calculated from the header). The first pointer must point to SEG HOST_STACK in the instruction
           mov     ax,SEG HOST_STACK
     The second should point to the segment part of the
           jmp     FAR PTR HOST
     instruction in the main virus routine.
  9. Recalculate the size of the infected EXE file, and adjust the header fields Page Count and Last Page Size accordingly.
  10. Write the new EXE Header back out to disk.
All the initial segment values must be calculated from the size of the load module which is being infected. The code to accomplish this infection is in the routine INFECT in Appendix B.

A Persistent File Search Mechanism

As in the TIMID virus, the search mechanism can be broken down into two parts: FIND_FILE simply locates possible files to infect, and FILE_OK determines whether a file can be infected.
The FILE_OK procedure will be almost the same as the one in TIMID. It must open the file in question and determine whether it can be infected and make sure it has not already been infected. The only two criteria for determining whether an EXE file can be infected are whether the Overlay Number is zero, and whether it has enough room in its relocation pointer table for two more pointers. The latter requirement is determined by a simple calculation from values stored in the EXE header. If
16*Header Paragraphs-4*Relocation Table Entries-Relocation Table Offset
is greater than or equal to 8 (=4 times the number of relocatables the virus requires), then there is enough room in the relocation pointer table. This calculation is performed by the subroutine REL_ROOM, which is called by FILE_OK.
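The room calculation is simple enough to check by hand. A minimal sketch in Python, with a hypothetical function name and made-up header values, showing how much unused space remains between the end of the relocation pointer table and the end of the header:

```python
def reloc_table_free_bytes(header_paragraphs: int, reloc_entries: int,
                           reloc_table_offset: int) -> int:
    """Unused bytes between the end of the relocation table and the
    end of the header, per the formula in the text."""
    return 16 * header_paragraphs - 4 * reloc_entries - reloc_table_offset


# Made-up example 1: a 512-byte header, 3 entries, table at offset 1EH.
roomy = reloc_table_free_bytes(32, 3, 0x1E)

# Made-up example 2: a 32-byte header, 1 entry, table at offset 1CH.
tight = reloc_table_free_bytes(2, 1, 0x1C)

print(roomy, roomy >= 8)  # 470 True
print(tight, tight >= 8)  # 0 False
```

Each relocation pointer occupies four bytes, which is why the entry count is multiplied by 4 and why the threshold for two pointers is 8.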
To determine whether the virus has already infected a file, we put an ID word with a pre-assigned value in the code segment at a fixed offset (say 0). Then, when checking the file, FILE_OK gets the segment from the Initial cs in the EXE header. It uses that with the offset 0 to find the ID word in the load module (provided the virus is there). If the virus has not already infected the file, Initial cs will contain the initial code segment of the host program. Then our calculation will fetch some random word out of the file which probably won’t match the ID word’s required value. In this way FILE_OK will know that the file has not been infected. So FILE_OK stays fairly simple.
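The same lookup is what a locator or scanner needs: work out where Initial cs:0000 falls in the file and read the word stored there. The sketch below assumes the load module starts immediately after the header, that is, Header Paragraphs x 16 bytes into the file. The function name and the sample value 0xBEEF are invented for illustration; the text does not state the actual ID word.

```python
import struct
from typing import Optional


def word_at_initial_cs(data: bytes, offset: int = 0) -> Optional[int]:
    """Return the 16-bit word at Initial cs:offset, or None if that
    position lies outside the file."""
    if len(data) < 0x18 or data[:2] != b"MZ":
        return None
    (header_paragraphs,) = struct.unpack_from("<H", data, 0x08)
    (initial_cs,) = struct.unpack_from("<H", data, 0x16)
    pos = (header_paragraphs + initial_cs) * 16 + offset
    if pos + 2 > len(data):
        return None
    return struct.unpack_from("<H", data, pos)[0]


# Build a tiny stand-in file: 32-byte header, Initial cs = 1, so
# cs:0000 sits 48 bytes into the file.
EXAMPLE_ID = 0xBEEF  # hypothetical marker value for this demo only
fake = bytearray(64)
fake[0:2] = b"MZ"
struct.pack_into("<H", fake, 0x08, 2)   # Header Paragraphs
struct.pack_into("<H", fake, 0x16, 1)   # Initial cs
struct.pack_into("<H", fake, 48, EXAMPLE_ID)

found = word_at_initial_cs(bytes(fake))
print(found == EXAMPLE_ID)  # True
```

As the text notes, a match on a single word is only probabilistic evidence, so a real locator would want to compare more than two bytes before reporting a file.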
However, we want to design a much more sophisticated FIND_FILE procedure than TIMID’s. The procedure in TIMID could only search for files in the current directory to attack. That was fine for starters, but a good virus should be able to leap from directory to directory, and even from drive to drive. Only in this way does a virus stand a reasonable chance of infecting a significant portion of the files on a system, and jumping from system to system.
To search more than one directory, we need a tree search routine. That is a fairly common algorithm in programming. We write a routine FIND_BR, which, given a directory, will search it for an EXE which will pass FILE_OK. If it doesn’t find a file, it will proceed to search for subdirectories of the currently referenced directory. For each subdirectory found, FIND_BR will recursively call itself using the new subdirectory as the directory to perform a search on. In this manner, all of the subdirectories of any given directory may be searched for a file to infect. If one specifies the directory to search as the root directory, then all files on a disk will get searched.
Making the search too long and involved can be a problem though. A large hard disk can easily contain a hundred subdirectories and thousands of files. When the virus is new to the system it will quickly find an uninfected file that it can attack, so the search will be unnoticeably fast. However, once most of the files on the system are already infected, the virus might make the disk whirr for twenty seconds while examining all of the EXE’s on a given drive to find one to infect. That could be a rather obvious clue that something is wrong.
To minimize the search time, we must truncate the search in such a way that the virus will still stand a reasonable chance of infecting every EXE file on the system. To do that we make use of the typical PC user’s habits. Normally, EXE’s are spread pretty evenly throughout different directories. Users often put frequently used programs in their path, and execute them from different directories. Thus, if our virus searches the current directory, and all of its subdirectories, up to two levels deep, it will stand a good chance of infecting a whole disk. As added insurance, it can also search the root directory and all of its subdirectories up to one level deep. Obviously, the virus will be able to migrate to different drives and directories without searching them specifically, because it will attack files on the current drive when an infected program is executed, and the program to be executed need not be on the current drive.
When coding the FIND_FILE routine, it is convenient to structure it in three levels. First is a master routine FIND_FILE, which decides which subdirectory branches to search. The second level is a routine which will search a specified directory branch to a specified level, FIND_BR. When FIND_BR is called, a directory path is stored as a null terminated ASCII string in the variable USEFILE, and the depth of the search is specified in LEVEL. At the third level of the search algorithm, one routine searches for EXE files (FINDEXE) and two search for subdirectories (FIRSTDIR and NEXTDIR). The routine that searches for EXE files will call FILE_OK to determine whether each file it finds is infectable, and it will stop everything when it finds a good file. The logic of this searching sequence is illustrated in Figure 12. The code for these routines is also listed in Appendix B.

Figure 12: Logic of the file search routines.
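The depth-limited tree search itself is a general-purpose algorithm, the same one a disk scanner or a "find with maximum depth" utility uses. The sketch below shows the idea in Python: look at the files in a directory first, then recurse into subdirectories until the level budget runs out. The function name and its parameters are hypothetical, and the acceptance test is passed in as a plain callable.

```python
import os
import tempfile


def find_in_branch(directory, max_depth, is_match):
    """Return the first file under directory accepted by is_match,
    descending at most max_depth levels, or None if there is none."""
    subdirs = []
    try:
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_file() and is_match(entry.path):
                    return entry.path
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
    except OSError:
        return None  # unreadable directory: treat as empty
    if max_depth > 0:
        for sub in subdirs:
            found = find_in_branch(sub, max_depth - 1, is_match)
            if found:
                return found
    return None


# Demo on a throwaway tree: root/a/b/c/deep.exe sits three levels down.
with tempfile.TemporaryDirectory() as root:
    deep_dir = os.path.join(root, "a", "b", "c")
    os.makedirs(deep_dir)
    open(os.path.join(deep_dir, "deep.exe"), "w").close()

    def is_exe(path):
        return os.path.basename(path).lower().endswith(".exe")

    two_levels = find_in_branch(root, 2, is_exe)
    three_levels = find_in_branch(root, 3, is_exe)
    found_name = os.path.basename(three_levels)

print(two_levels)  # None
print(found_name)  # deep.exe
```

Capping the depth bounds the worst-case running time, which is the trade-off the text describes: a two-level search misses the file that a three-level search finds.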

Anti-Detection Routines

A fairly simple anti-detection tactic can make this virus much more difficult for the human eye to locate: Simply don’t allow the search and copy routines to execute every time the virus gets control. One easy way of doing that is to look at the system clock, and see if the time in ticks (1 tick = 1/18.2 seconds) modulo some number is zero. If it is, execute the search and copy routines, otherwise just pass control to the host program. This anti-detection routine will look like this:
SHOULDRUN:
       xor     ah,ah        ;read time using
       int     1AH          ;BIOS time of day service
       and     al,63
       ret
This routine returns with z set roughly one out of 64 times. Since programs are not normally executed in sync with the clock timer, it will essentially return a z flag randomly. If called in the main control routine like this:
       call    SHOULDRUN
       jnz     FINISH       ;don’t infect unless z set
       call    FIND_FILE
       jnz     FINISH       ;don’t infect without valid file
       call    INFECT
FINISH:
the virus will attack a file only one out of every 64 times the host program is called. Every other time, the virus will just pass control to the host without doing anything. When it does that, it will be completely invisible even to the most suspicious eye.
The SHOULDRUN routine would pose a problem if you wanted to go and infect a system with it. You might have to sit there and run the infected program 50 or 100 times to get the virus to move to one new file on that system. That is annoying, and problematic if you want to get it into a system with minimal risk. Fortunately, a slight change can fix it. Just change SHOULDRUN to look like this:
SHOULDRUN:
       xor     ah,ah
SR1:   ret
       int     1AH
       and     al,63
       ret
and include another routine to modify the SHOULDRUN routine,
SETSR:
       mov     al,90H          ;NOP instruction = 90H
       mov     BYTE PTR [SR1],al
       ret
which can be incorporated into the main control routine like this:
       call    SHOULDRUN
       jnz     FINISH
       call    SETSR
       call    FIND_FILE
       jnz     FINISH
       call    INFECT
FINISH:
After SETSR has been executed, and before INFECT, the SHOULDRUN routine becomes
SHOULDRUN:
       xor     ah,ah
SR1:   nop
       int     1AH
       and     al,63
       ret
since the 90H which SETSR puts at SR1 is just a NOP instruction. When INFECT copies the virus to a new file, it copies it with the modified SHOULDRUN procedure. The result is that the first time the virus is executed, it definitely searches for a file and infects it. After that it goes to the 1-out-of-64 infection scheme. In this way, you can take the virus as assembled into the EXE, INTRUDER.EXE, and run it and be guaranteed to infect something. After that, the virus will infect the system more slowly.
Another useful tactic that we do not employ here is to make the first infection very rare, and then more frequent after that. This might be useful in getting the virus through a BBS, where it is carefully checked for infectious behavior, and if none is seen, it is passed around. (That’s a hypothetical situation only, please don’t do it!) In such a situation, no one person would be likely to spot the virus by sitting down and playing with the program for a day or two, even with a sophisticated virus checker handy. However, if a lot of people were to pick up a popular and useful (infected) program that they used daily, they could all end up infected and spreading the virus eventually.
The tradeoff in restraining the virus to infect only every one in N times is that it slows the infection rate down. What might take a day with no restraints may take a week, a month, or even a year, depending on how often the virus is allowed to reproduce. There are no clear rules to determine what is best—a quickly reproducing virus or one that carefully avoids being noticed—it all depends on what you’re trying to do with it.
Another important anti-detection mechanism incorporated into INTRUDER is that it saves the date and time of the file being infected, along with its attribute. Then it changes the file attribute to read/write, performs the modifications on the file, and restores the original date, time and attribute. Thus, the infected EXE does not have the date and time of the infection, but its original date and time. The infection cannot be traced back to its source by studying the dates of the infected files on the system. Also, since the original attribute is restored, the archive bit never gets set, so the user who performs incremental backups does not find all of his EXE’s getting backed up one day (a strange sight indeed). As an added bonus, the virus can infect read-only and system files without a hitch.

Passing Control to the Host

The final step the virus must take is to pass control to the host program without dropping the ball. To do that, all the registers should be set up the same as they would be if the host program were being executed without the virus. We already discussed setting up cs:ip and ss:sp. Except for these, only the ax register is set to a specific value by DOS, to indicate the validity of the drive ID in the FCB’s in the PSP. If an invalid identifier (e.g. “D:”, when a system has no D drive) is in the first FCB at 005C, al is set to FF Hex, and if the identifier is valid, al=0. Likewise, ah is set to FF if the identifier in the FCB at 006C is invalid. As such, ax can simply be saved when the virus starts and restored before it transfers control to the host. The rest of the registers are not initialized by DOS, so we need not be concerned with them.
Of course, the DTA must also be moved when the virus is first fired up, and then restored when control is passed to the host. Since the host may need to access parameters which are stored there, moving the DTA temporarily is essential since it avoids overwriting those parameters during the search operation.

WARNING


Unlike the TIMID virus, INTRUDER contains no notice that it is infecting a file. It contains nothing but routines that will help it reproduce. Although it is not intentionally destructive, it is extremely infective and easy to overlook. . . and difficult to get rid of once it gets started. Therefore, DO NOT RUN THIS VIRUS, except in a very carefully controlled environment. The listing in Appendix B contains the code for the virus. A locator program, FINDINT, is also supplied, so if you do run the virus, you’ll be able to see which files have been infected by it.

