Computational Chemistry: Minimization Tutorial and Exercises

Minimization in Computational Chemistry

Python Programming Tips

The programs in the previous pages had very limited usability: for example, our first program allowed us to find only the minimum of of a harmonic potential that has parameters k = 2743.0 and r_eq = 1.1283. That might be fine if all we are interested in is carbon monoxide, but what if you decide that you want to minimize the structures of oxygen or hydrogen fluoride instead? Furthermore, the program always starts the search at r_ini = 1.55 and finishes when the force is less than 0.01. However, to validate the program, one should verify that the minimum found is independent of the starting point and that the convergence criterion 0.01 is sufficient. Of course, one can change these values in the program but this gets difficult as the programs grow larger. In general, it is best to write programs that have a broad applicability and give the user full control over all the options without changing the program code.

Writing programs that offer grater flexibility requires learning a few programming techniques. We will illustrate two basic techniques, reading data from text files on the disk, and receiving program input from the arguments given to the program on the command line. The principles are quite similar across different languages, but implementations differ significantly across different computer languages. The examples here are given in Python because of the simplicity and power of the language.

Receiving Arguments from the Command Line

Let's modify the first program such that we can give the program two input arguments. The first input argument will be the name of a text file that has parameters for the potential energy function and the second argument is the value of the starting distance. In other words, we would start the program by typing:

HarmOsc_SD2.py CO_HarmOsc.par 1.40

In python, the zeroth argument is the name of the program (in our case, HarmOsc_SD2.py) and thus the name of the parameter file (CO_HarmOsc.par) will be the first argument. To tell the program that the first argument is a filename that needs to be opened for reading, we add a following line to the program:

par_file = open( sys.argv[ 1 ] )

The new program must also read in the starting distance, which will be given as the second argument. To tell the program that the second argument is a value for the initial distance, we add the following line to the program:

r_ini   = float(sys.argv[ 2 ])

The statement float() is needed because otherwise Python would interpret the second argument as a text string "1.55", which cannot be subtracted from the equilibrium distance.

Python Modules

One of the powers of Python as as programming language is its extensibility with modules. The core of Python provides a rather limited set of built-in functions. For example, float() in the above example is a built-in function. Other functions are contained in modules that must be imported before the function is used. For example, argv is part of the module that defines system-specific functionality; in order to use sys.arg, the sys module must be imported as:

import sys

You can access underlying operating system commands via the os module. Mathematical functions such as log and sin are found in the math module. Linear algebra operations, such as inv are in the numpy module. More advanced data scientific data analysis operations are collected in the scipy module. Many Python modules are explained at PythonInfo Wiki.

Reading a Text File

So far the program knows to open a file CO_HarmOsc.par for reading and associates the content of this file with par_file object. The content of the CO_HarmOsc.par file could be:

CO   2743.0   1.1283

To actually read the parameters from this file, readlines() and split() functions are used. They do what their names imply: the first reads individual lines from the file, the second splits individual lines into fields. Notice that the example below is more general than what is needed in this case because our file will have only one line for CO.

parameters = par_file.readlines()

for parameter in parameters:
    fields = parameter.split()
    mol  = (fields [0] )                        # Name of the molecule or bond
    k    = float (fields [1] )                  # Harmonic force constant
    r_eq = float (fields [2] )                  # Potential minimum distance

The example above also introduces a for loop. This for loop is repeated for each line in the parameter file; in this case only once. For loops are commonly used to perform iterations, such as repeated evaluation of energy and derivatives in the minimization program. For loop is one example of a program control structures. Two other common program control structures in Python are the while loop and the if statement.

Producing Nicer Output

The first program found a minimum all right but the output it produced was not particularly well organized. This was so because the print command produces unformatted output. Using

print "%3d   %6.4f      %8.3f    %8.3f " % (i, r, E, F)

instead of

print i, r, E, F

will give the desired format in which the loop counter is 3-digit integer, the distance value is printed with four decimal points, and energy and forces are given with three decimal points.

Graphical Output

In addition to producing text output, some computational programs also generate graphical output. The latter can be either displayed in the real time through a graphical user interface, or stored on a disk for for later display. For example, it would be nice to see how the function we minimize looks, and what is the path that a minimizer takes. Programming for graphical output is generally more challenging because one has to be familiar with the capabilities of the graphical user interface or the graphing programs. A powerful free scientific graphing program for UNIX computer is Xmgrace. The program accepts text-encoded input and produces plots that can be saved in a variety of formats. The text-encoded input that specifies the appearance of the plot can be automatically generated by the minimization program after adding the following lines

a_pot = open("harmosc_steepest_descent.tbl", 'w')

print >> a_pot, '@ TITLE "Steepest Descent Search Path" '
print >> a_pot, '@ XAXIS LABEL "Distance, Ang" '
print >> a_pot, '@ YAXIS LABEL "Energy, kcal/mol" '
print >> a_pot, '@ S0 line type 1'
print >> a_pot, '@ S1 line type 3'
print >> a_pot, '@ S1 line linestyle 2'
print >> a_pot, '@ S1 symbol 1'
print >> a_pot, '@ S1 symbol size 0.5'  

for i in range(40,195):
   r = 0.01*i
   E = 0.5 * k * (r - r_eq)**2
   print >> a_pot, r, E

Program Organization: Defining Functions

The last aspect of programming concerns the organization of the program. So far, we have used explicit formulas, such as E = 0.5 * k * (r - r_eq)**2 every time we need to calculate the energy. This is fine in short programs and short formulas but if the energy expression gets longer and is required in many parts of the program, the program gets difficult to read and maintain. It would be helpful if we can define energy once and later just refer to this energy function when needed. This is done in Python using the def statement

def Hamiltonian(r):
    pot = 0.5 * k * (r - r_eq)**2
    return pot

A More Flexible Minimization Program

We are now equipped to write a more useful minimization program. The program code is given in the next page but before you continue you should take time and try to write yourself a program that performs steepest descent minimization of a harmonic potential and gives the user a freedom to choose which parameter file to use for bond description, and where to start the minimization. The program should define commonly used functions and generate formatted text output. This exercise gives you more practice in programming and prepares you for the final assignment.