USER NOTE
SNS14U
Issue 1
Aug 1998
Program Debugging on the Unix Service

Contents

  • Introduction
  • Basic dbx commands
  • Example dbx session
  • Another dbx session
  • Other useful tools

     

    1. INTRODUCTION

    Run-time errors in Fortran and C programs can be difficult to track down by hand. The error messages displayed by the system are often hard to understand and do not tell you where in the source code the problem lies. The Unix system has a debugging tool called dbx which can greatly speed up the resolution of run-time errors. It may be used with programs written in either C or Fortran and compiled with the system compilers cc and f77. The example program in this note is written in Fortran 77. The GNU debugging tool gdb has similar facilities and some additional features not present in dbx. Type man gdb for more information about it.

    This user note provides a simple illustration of the use of dbx, showing the more frequently used commands. For details of all dbx facilities, use the online help within dbx, which lists all available commands and displays information on the use of each.

    Consider the following simple example. The Fortran program mistake reads an array size from a file and then reads values into an array of that size. It then calls a routine to work out the mean of the values in the array. For the purpose of this example the program is intentionally trivial and the program units have been written in two separate source files, mistake.f and average.f, to demonstrate how a program in multiple source files is handled. It is not necessarily a recommendation that all programs should be written this way.

    In order to demonstrate the use of dbx the program has been written with a deliberate mistake. The statement to work out the mean of the values in the array is as follows:
     
     

    mean = sum / nsize (where sum is the previously calculated sum of all the values in the array)

    This is OK provided that the value of nsize is not zero but, unfortunately, the program has not been written to check first that the value of nsize is valid. (Apart from being greater than zero it must also be no greater than the declared size of the array.)

    File mistake.f contains:

        program mistake
    c
    c Simple example to demonstrate dbx facilities
    c
        implicit none
        integer nsize, maxsize, i
        parameter( maxsize=10 )
        real array( maxsize ), mean
    c Open a data file
        open( unit=7, file='mistake.dat' )
    c Read array size and values
        read( 7, * ) nsize, ( array(i), i=1,nsize )
        call average( array, maxsize, nsize, mean )
        print *, 'Mean is ', mean
        stop
        end

    File average.f contains:

    c
        subroutine average( array, maxsize, nsize, mean )
        implicit none
        integer nsize, maxsize, i
        real array( maxsize ), sum, mean
        sum = 0.0
        do 1 i = 1, nsize
            sum = sum + array( i )
    1   continue
        mean = sum / nsize
        return
        end

    This program could be compiled with the following command:

    % f77 -o mistake mistake.f average.f

    This creates an executable file called mistake from the two component source files. The resulting executable file can be run simply by typing its name:
     

    % mistake (The program then displays the following on the screen:)

    Mean is NaN
    Note: IEEE floating-point exception flags raised:
    Invalid Operation;
    See the Numerical Computation Guide, ieee_flags(3M)

    The program has run to completion but the answer printed is "NaN", which stands for "Not a Number". It is the result of an expression whose value is mathematically undefined. The only other indication that something went wrong is the note at the end referring to a "floating-point exception flag" being raised, the flag in question being "Invalid Operation". In plain English, at some point the program tried to do something which is not allowed, but the message gives no indication of what the operation was or where it happened. In a big program it would be very hard to find the error given this little information.
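
    When a program is run unattended, for example from a script, a NaN in its output is easy to overlook. The following is a minimal sketch, assuming a POSIX shell with grep; the function name output_has_nan is hypothetical and is not part of f77 or dbx. It only detects the symptom and says nothing about where the error lies.

```shell
# Hypothetical helper (not part of dbx or f77): succeeds if the text on
# standard input contains "NaN", as in the program output shown above.
output_has_nan() {
    grep -q 'NaN'
}
```

    It could be used as: mistake | output_has_nan && echo "result is not a number"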

    The system tries a bit harder if you compile the program with the option -fnonstd. Note that this option applies only to the f77 compiler:

    % f77 -fnonstd -o mistake mistake.f average.f

    Running the program again gives the following output:

    % mistake

    Floating point exception 7, invalid operand, occurred at address 10400000.

    Abort

    This time the system has attempted to show where the error occurred, but all it gives is an address in the executable file, and this cannot be related to the original lines of Fortran. It does not attempt to print an answer, since the option -fnonstd causes execution to stop at the point where the invalid operation occurred.

     

    2. BASIC DBX COMMANDS

    Life is made very much easier by running the program under the control of the debugging tool dbx. This is done by first compiling the program with the extra option -g:

    % f77 -g -fnonstd -o mistake mistake.f average.f

    Note that the option -fnonstd must still be used as well since we still want the program to stop in the event of a run-time error.

    The program is then run within dbx by typing:

    % dbx mistake

    The program does not run automatically. Instead, after displaying a few lines of information, dbx presents you with its prompt:

    (dbx)

    You may then type a variety of dbx commands, the most commonly used being:

    run  To start program execution.
    stop  To define a statement or program unit where the program is to stop (to allow you to look at the values of program variables, for example). These locations where execution is to stop are sometimes called "breakpoints". A breakpoint should normally be defined before running the program; otherwise it will run without any interaction with the user, although that may in fact be enough to find the error on some occasions. Breakpoints are useful if, having found where the error occurs, you then want to find out why.
    print  Print the value of a program variable, array (or in Fortran 77 a section of an array).
    step  Execute the next program statement. You can do this only if the program has previously stopped at a breakpoint.
    next  The same as step except that it does not go "into" the code of any called routines or procedures.
    cont  Continue execution until the next breakpoint is reached (or the program stops).
    list  Display source lines. Useful in determining where to set a breakpoint.
    where  Display the stack of subroutine or procedure calls which are active at the time. 
    help  By itself gives a list of all the dbx commands. When followed by a command name it displays information about that command.

    3. EXAMPLE DBX SESSION

    The dbx commands listed above are best appreciated by looking at a simple example. Now run the program mistake, this time under the control of dbx. Firstly compile with the option -g (and -fnonstd).

    % f77  -g  -fnonstd  -o mistake  mistake.f  average.f

    Then run dbx. In the session below, user input follows the % or (dbx) prompt and comments on each command follow in parentheses.
     

    % dbx mistake (Start a dbx session, loading the executable file "mistake". Dbx then prints a few information messages before displaying its prompt (dbx) to show that it is ready for the user to type commands.)

    Reading symbolic information for mistake
    Reading symbolic information for rtld /usr/lib/ld.so.1
    Reading symbolic information for /opt/SUNWspro/lib/libF77.so.3
    Reading symbolic information for /opt/SUNWspro/lib/libsunmath.so.1
    Reading symbolic information for /opt/SUNWspro/lib/libm.so.1
    Reading symbolic information for /usr/lib/libc.so.1
    Reading symbolic information for /usr/lib/libdl.so.1
    Reading symbolic information for /usr/platform/SUNW,Ultra-2/lib/libc_psr.so.1
     

    (dbx) run  (Run the program without setting any breakpoints. Often this will be enough to determine the location of the error.) 

    Running: mistake
    (process id 2360)
    Floating point exception 7, invalid operand, occurred at address 10400000.
    signal ABRT (abort) in _kill at 0xef5f434c
    _kill+8: bgeu _kill+0x30
     

    Current function is average
     
    10 mean = sum / nsize        (Dbx prints out the name of the routine and the source line where the error occurred. The line number is relative to the start of the source file which contains the routine.)
     
    (dbx) list 1,10  (Display source lines to try to determine how the error occurred. The lines displayed are from the currently active routine. The dbx command "func" can be used to change the scope in order to display lines from another part of the program.)

     1 c
     2     subroutine average( array, maxsize, nsize, mean )
     3     implicit none
     4     integer nsize, maxsize, i
     5     real array( maxsize ), sum, mean
     6     sum = 0.0
     7     do 1 i = 1, nsize
     8         sum = sum + array( i )
     9 1   continue
    10     mean = sum / nsize
     

    (dbx) stop in average (Prepare to run the program again by setting a breakpoint.  In this case the breakpoint is the first executable statement in routine average. The program can be run as often as required within a single dbx session.) 
    (2) stop in average
    (dbx) run (Run the program again.)

    Running: mistake
    (process id 2532)
    stopped in average at line 6 in file "average.f"

    6 sum = 0.0 (Dbx displays the line at which it has stopped.)
    (dbx) stop at 10 (Set another breakpoint, further on at line number 10 in the currently active source file.) 

    (3) stop at "average.f":10

    (dbx) cont (Continue execution.)

    stopped in average at line 10 in file "average.f"

    10 mean = sum / nsize (Dbx again displays the line where it has stopped. Note that this line has not yet been executed. It will be the next statement to execute when the user types "next", "step" or "cont".)
    (dbx) print sum (Display the current value of the program variable "sum".)

    sum = 0.0

    (dbx) print array (Sum has an unexpected value of zero. Now print out the values in the array to try to get more information. Note that print array(4:6) for example would print just the 3 elements of the array starting with the 4th. If array had had 2 dimensions then print array(2:4,5:7) would print a 3x3 section. The printing of array sections works for Fortran only.)
    array =
    (1) 0.0
    (2) 0.0
    (3) 0.0
    (4) 0.0
    (5) 0.0
    (6) 0.0
    (7) 0.0
    (8) 0.0
    (9) 0.0
    (10) 0.0
    (dbx) print nsize (The array values don’t seem correct so now print the value of the array size.) 
    nsize = 0
    (dbx) quit (An array size of zero suggests incorrect values in the data file so the dbx session can be terminated.)
     
    % cat mistake.dat  (All becomes clear when the data file is examined; it contains just one zero. In reality an examination of the data file would have been the first course of action.) 

    0
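
    A check of the data file like this can also be done automatically before the program is run. The following is a minimal sketch, assuming a POSIX shell with awk. The function name check_nsize is hypothetical, and the default limit of 10 is the value of maxsize in mistake.f. It assumes the array size is the first value in the file, written as a plain unsigned integer and separated from the following values by blanks or commas.

```shell
# Hypothetical helper: check that the first value in a data file is a
# valid array size for mistake.f, that is 1 <= nsize <= maxsize.
# Usage: check_nsize datafile [maxsize]   (maxsize defaults to 10)
check_nsize() {
    datafile=$1
    maxsize=${2:-10}
    # Treat commas as blanks, then take the first field of the first
    # non-empty line.
    nsize=$(awk '{ gsub(/,/, " ") } NF { print $1; exit }' "$datafile")
    case $nsize in
        ''|*[!0-9]*)
            echo "$datafile: first value '$nsize' is not an unsigned integer" >&2
            return 1 ;;
    esac
    if [ "$nsize" -lt 1 ] || [ "$nsize" -gt "$maxsize" ]; then
        echo "$datafile: nsize=$nsize is outside the range 1..$maxsize" >&2
        return 1
    fi
    echo "$datafile: nsize=$nsize is valid"
}
```

    For the data file above, check_nsize mistake.dat reports that nsize=0 is outside the range 1..10 and returns a non-zero status, so a script could refuse to run the program.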

     

    4. ANOTHER DBX SESSION TO ILLUSTRATE MORE COMMANDS

    % dbx mistake

    Reading symbolic information for mistake
    Reading symbolic information for rtld /usr/lib/ld.so.1
    Reading symbolic information for /opt/SUNWspro/lib/libF77.so.3
    Reading symbolic information for /opt/SUNWspro/lib/libsunmath.so.1
    Reading symbolic information for /opt/SUNWspro/lib/libm.so.1
    Reading symbolic information for /usr/lib/libc.so.1
    Reading symbolic information for /usr/lib/libdl.so.1
    Reading symbolic information for /usr/platform/SUNW,Ultra-2/lib/libc_psr.so.1

    (dbx) stop in MAIN  (Stop at the first executable statement in the program. Note that MAIN must be in upper case.)
    (2) stop in MAIN
    (dbx) help (Help by itself displays a list of all dbx commands.)

    Command Summary

    Execution and Tracing

    cancel catch clear cont delete fix
    fixed handler ignore intercept next pop
    replay rerun restore run save status
    step stop trace unintercept when whocatches

    Displaying and Naming Data

    assign call demangle dis display down dump
    examine exists frame hide inspect print undisplay
    unhide up whatis where whereami whereis which

    Accessing Source Files

    bsearch cd edit file files func
    funcs line list loadobject loadobjects module
    modules pathmap pwd search use

    Debugging Multiple Threads

    lwp lwps thread threads

    Run Time Checking

    check showleaks suppress uncheck unsuppress

    Miscellaneous

    collector dalias dbxbugreport dbxenv debug
    detach document help history import
    kalias kill language make quit
    setenv sh source ! !!

    Debugger

    button menu toolenv unbutton unmenu

    Machine Level

    examine listi nexti stepi stopi tracei wheni

    Language Specific Information

    c++

    Other Topics

    alias callbacks changes .dbxrc editing FAQ
    events fix-pitfalls follow-fork forwardref invocation ksh
    lwpid MT path redirection registers rtc
    rtc8M scope signals startup tid

    The command `help <cmdname>' provides additional
    help for each command or topic. See `help changes'
    for new and changed features, and `help FAQ' for
    answers to frequently asked questions about dbx.
     
     

    (dbx) help step (Display information on the command "step".)

    step # Single step one line (step INTO calls).

    # With MT programs when a function call is skipped
    # over, all LWPs are implicitly resumed for the
    # duration of that function call in order to
    # avoid deadlock.
    # Non-active threads cannot be stepped.

    step <n> # Single step <n> lines (step INTO calls)

    step up # ... and out of the current function

    step ... -sig <sig> # ... and deliver the given signal

    step ... <tid> # Step the given thread. Does not apply to `step up'.

    step ... <lwpid> # Step the given LWP. Will not implicitly resume all LWPs when skipping a function.

    When an explicit <tid> or <lwpid> is given, the deadlock avoidance measure of the generic `step' is defeated.

    (dbx) run  (Start the program running. It stops immediately because of the breakpoint set at the first statement in the main program.) 

    Running: mistake
    (process id 2943)
    stopped in MAIN at line 10 in file "mistake.f"

    10 open( unit=7, file='mistake.dat' )
     

    (dbx) step (Execute the next statement.)

    stopped in MAIN at line 12 in file "mistake.f"

    12 read( 7, * ) nsize, ( array(i), i=1,nsize )
     

    (dbx) step (Execute the next statement.)

    stopped in MAIN at line 13 in file "mistake.f"

    13 call average( array, maxsize, nsize, mean )

    (dbx) quit (Finish this session.) 
    %

     

    5. OTHER USEFUL TOOLS FOR FORTRAN PROGRAMS

    5.1 Array bound checking

    A common error is to allow an array subscript to take a value beyond the declared size of the array. In Fortran this can be guarded against very simply by using the -C option when compiling. This will cause all array subscripts to be checked at run-time and an error message will be displayed if one goes out of range.

    5.2 F77 Tools

    The Unix service has a suite of tools for analysing and transforming Fortran 77 programs. The most useful of these is nag_pfort, which performs a very strict check on the source code and may find errors which the compiler does not detect.

    Type:

    % nag_pfort prog.f (Assuming that prog.f is the name of the source file. In the case of a program in two or more source files these would have to be concatenated into a single file before using nag_pfort. )
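
    The concatenation step for a program in several source files can be sketched as follows, assuming a POSIX shell. The function name combine_sources is hypothetical; it simply wraps the standard cat command, and the output file must not be one of the input files.

```shell
# Hypothetical helper: concatenate Fortran source files into one file
# so that a single-file checker can read the whole program.
# Usage: combine_sources output.f input1.f [input2.f ...]
combine_sources() {
    if [ "$#" -lt 2 ]; then
        echo "usage: combine_sources output.f input.f ..." >&2
        return 1
    fi
    out=$1
    shift
    # The remaining arguments are the input files, copied in order.
    cat "$@" > "$out"
}
```

    For the example program in this note, combine_sources prog.f mistake.f average.f would create prog.f, which could then be passed to nag_pfort as shown above.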

    For more information about nag_pfort type man nag_pfort.

    For more information about other F77 tools type man nag_analysers or man nag_transformers.

    ©Lancaster University

Lancaster University
Bailrigg
Lancaster LA1 4YW United Kingdom
+44 (0) 1524 65201