Elsevier Science Home
Computer Physics Communications Program Library
Full text online from Science Direct
Programs in Physics & Physical Chemistry
CPC Home

Manuscript Title: L2CXCV: A FORTRAN 77 PACKAGE FOR LEAST SQUARES CONVEX / CONCAVE DATA SMOOTHING
Authors: I. C. Demetriou
Program title: L2CXCV
Catalogue identifier: ADXM_v1_0
Distribution format: tar.gz
Journal reference: Comput. Phys. Commun. 174(2006)643
Programming language: FORTRAN 77.
Computer: PC Intel Pentium, Sun Sparc Ultra 5, Hewlett-Packard HP UX 11.0.
Operating system: WINDOWS 98, WINDOWS 2000, Unix/Solaris 7, Unix/HPUX11.0.
RAM: O(n), where n is the number of data
Word size: 8 bits in a byte
Keywords: Antarctica temperatures, approximation, convex, concave, data fitting, divided difference, increasing / diminishing rates of change, inflection point, growth, least squares, noise, quadratic programming, S-curve, sigmoid, smoothing, spline, utility function.
PACS: 02.60.Ed, 02.60.Pn.
Classification: 4.9.

Nature of problem:
  • Analysis of processes that show initially increasing and then decreasing rates of change (sigmoid shape), as for example in heat curves, reactor stability conditions, evolution curves, photoemission yields, growth models, utility functions etc.
  • Identifying an unknown convex / concave (sigmoid) function from some measurements of its values that contain random errors. Also, identifying the inflection point of this sigmoid function.

Solution method:
Univariate data smoothing by minimizing the sum of the squares of the residuals (least squares approximation) subject to the condition that the second order divided differences of the smoothed values change sign at most once. Ideally, this is the number of sign changes in the second derivative of the underlying function. The remarkable property of the smoothed values is that they consist of one separate section of optimal components that give nonnegative second divided differences (convexity) and one separate section of optimal components that give nonpositive second divided differences (concavity). The solution process finds the joint (that is the inflection point estimate of the underlying function) of the sections automatically. The underlying method is iterative, each iteration solving a structured strictly convex quadratic programming problem in order to obtain a convex or a concave section over a subrange of data.

Restrictions:
Number of data, n, is not limited in the software package, but is limited to 2000 in the main driver. The total work of the method requires 2n-2 structured quadratic programming calculations over subranges of data, which in practice does not exceed the amount of O(n2) computer operations.

Unusual features:
Summary; L2CXCV is a package of Fortran 77 subroutines for least squares smoothing to n univariate data values contaminated by random errors subject to one sign change in the second divided differences of the smoothed values, where the location of the sign change is unknown. The piecewise linear interpolant to the smoothed values satisfies the same sign condition in the second derivative, so it gives a convex / concave fit to the data. The underlying algorithm is based on the property that in a best convex / concave fit, the convex and the concave section are both optimal and separate. The algorithm is iterative, each iteration solving a strictly convex quadratic programming problem for the best convex fit to the first k data, starting from the best convex fit to the first k-1 data. By reversing the order and sign of the data, the algorithm obtains the best concave fit to the last n-k data. Then it chooses that k as the optimal position of the required sign change (which defines the inflection point of the fit), if the convex and the concave components to the first k and the last n-k data, respectively, form a convex-concave vector that gives the least sum of squares of residuals. In effect the algorithm requires at most 2n-2 quadratic programming calculations over subranges of data. The package employs a technique for quadratic programming, which takes advantage of a B-spline representation of the smoothed values and makes use of some efficient O(k) updating procedures, where k is the number of data of a subrange. The package has been tested on a variety of data sets and it has performed very efficiently, terminating in an overall number of active set changes that is about n, thus exhibiting quadratic performance in n. The Fortran codes have been designed to minimize the use of computing resources. Attention has been given to computer rounding errors details, which are essential to the robustness of the software package. A numerical example with output is provided to help the use of the software. Distribution material that includes driver programs, technical details of the installation of the package and test examples that demonstrate the use of the software is available independently of this paper in an ASCII file.

Additional comments:
The single / double precision version of the package consists of 2866 Fortran lines including comments. The distribution material amounts to 800 kb and includes driver programs, single and double precision versions of L2CXCV, installation guidelines and test examples.

Running time:
CPU time on a PC with an Intel 733 MHz processor operating in Windows 98: About 2 seconds to smooth n=1000 noisy measurements that follow the shape of the sine function over one period.