`gnuplot` includes a few general-purpose routines for interpolation and approximation of data; these are grouped under the smooth option. More sophisticated data processing may be performed by preprocessing the data externally or by using fit with an appropriate model.

Syntax:

smooth {unique | frequency | cumulative | kdensity | csplines | acsplines | bezier | sbezier}

`unique`, `frequency`, and `cumulative` plot the data after making them monotonic. Each of the other routines uses the data to determine the coefficients of a continuous curve between the endpoints of the data. This curve is then plotted in the same manner as a function, that is, by finding its value at uniform intervals along the abscissa (see samples) and connecting these points with straight line segments (if a line style is chosen).
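The uniform sampling step can be pictured with a short Python sketch. This is only an illustration, not gnuplot's code: the helper name `sample_abscissa` is hypothetical, and the default of 100 mirrors gnuplot's default `samples` setting.

```python
def sample_abscissa(x_min, x_max, samples=100):
    """Return `samples` equally spaced x-values from x_min to x_max inclusive."""
    if samples < 2:
        raise ValueError("need at least two samples")
    step = (x_max - x_min) / (samples - 1)
    return [x_min + i * step for i in range(samples)]

# The smoothed curve is evaluated at each of these x-values, and the
# resulting points are joined by straight line segments.
xs = sample_abscissa(0.0, 4.0, samples=5)
print(xs)
```

Raising the sample count makes the piecewise-linear rendering follow the underlying curve more closely.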

If autoscale is in effect, the ranges will be computed such that the plotted curve lies within the borders of the graph.

If autoscale is not in effect, and the smooth option is either `acsplines` or `csplines`, the generated curve is sampled across the intersection of the x range covered by the input data and the fixed abscissa range defined by xrange.
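As a rough illustration of that intersection rule, the following Python sketch computes the interval over which a spline would be sampled. The function name `sampling_interval` and the tuple representation of xrange are assumptions made for this example only.

```python
def sampling_interval(data_xs, xrange):
    """Intersect the x-extent of the data with a fixed (low, high) xrange.

    Returns the (low, high) interval to sample, or None if the two
    ranges do not overlap.
    """
    low = max(min(data_xs), xrange[0])
    high = min(max(data_xs), xrange[1])
    return (low, high) if low <= high else None

# Data span [1, 5], fixed xrange [0, 3]: the curve is sampled on [1, 3].
print(sampling_interval([1.0, 2.0, 5.0], (0.0, 3.0)))
```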

If too few points are available to allow the selected option to be applied, an error message is produced. The minimum number is one for `unique` and `frequency`, four for `acsplines`, and three for the others.

The smooth options have no effect on function plots.

— ACSPLINES —

The `acsplines` option approximates the data with a "natural smoothing spline". After the data are made monotonic in x (see `smooth unique`), a curve is constructed piecewise from segments of cubic polynomials whose coefficients are found by weighting the data points; by default the weights are taken from the third column in the data file. That default can be modified by the third entry in the using list, e.g.,

plot 'data-file' using 1:2:(1.0) smooth acsplines

Qualitatively, the absolute magnitude of the weights determines the number of segments used to construct the curve. If the weights are large, the effect of each datum is large and the curve approaches that produced by connecting consecutive points with natural cubic splines. If the weights are small, the curve is composed of fewer segments and thus is smoother; the limiting case is the single segment produced by a weighted linear least squares fit to all the data. The smoothing weight can be expressed in terms of errors as a statistical weight for a point divided by a "smoothing factor" for the curve so that (standard) errors in the file can be used as smoothing weights.

Example:

sw(x,S)=1/(x*x*S)
plot 'data_file' using 1:2:(sw($3,100)) smooth acsplines
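To see how the smoothing factor scales the weights, the same formula can be evaluated in Python. This is a sketch for illustration: the error values are made up, and the function simply mirrors the `sw` definition from the example above.

```python
def sw(error, smoothing_factor):
    """Statistical weight 1/error**2 divided by the smoothing factor."""
    return 1.0 / (error * error * smoothing_factor)

errors = [0.1, 0.2, 0.5]  # made-up standard errors, one per data point

# A larger smoothing factor shrinks every weight by the same ratio,
# which pushes the curve toward the smoother, fewer-segment limit.
for S in (1.0, 100.0):
    print(S, [sw(e, S) for e in errors])
```

Points with smaller errors always receive larger weights, so they pull the curve more strongly whatever the value of the smoothing factor.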

— BEZIER —

The `bezier` option approximates the data with a Bezier curve of degree n-1, where n is the number of data points. The curve connects the endpoints and uses the intermediate points as control points.
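A minimal Python sketch of evaluating such a curve in Bernstein form is shown below. It treats the data points as control points and is meant only to illustrate the idea; the function name `bezier_point` and the sample data are hypothetical, and this is not gnuplot's implementation.

```python
from math import comb

def bezier_point(points, t):
    """Evaluate the Bezier curve with the given control points at 0 <= t <= 1."""
    n = len(points) - 1  # polynomial degree of the Bernstein basis
    x = y = 0.0
    for i, (px, py) in enumerate(points):
        # Bernstein basis polynomial B(i, n) evaluated at t
        b = comb(n, i) * t**i * (1.0 - t)**(n - i)
        x += b * px
        y += b * py
    return x, y

pts = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 0.0)]
print(bezier_point(pts, 0.0))  # first data point
print(bezier_point(pts, 0.5))  # interior point of the curve
print(bezier_point(pts, 1.0))  # last data point
```

The curve passes through the first and last points only; the interior points attract the curve without necessarily lying on it.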

— CSPLINES —

The `csplines` option connects consecutive points by natural cubic splines after rendering the data monotonic (see `smooth unique`).
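For readers unfamiliar with natural cubic splines, here is a compact Python sketch of the textbook construction: second derivatives are set to zero at both ends and a tridiagonal system is solved for the interior ones. It assumes strictly increasing x-values (as after `smooth unique`); the function names are hypothetical and gnuplot's own code may be organized differently.

```python
from bisect import bisect_right

def natural_spline_second_derivs(xs, ys):
    """Second derivatives at the knots of the natural cubic spline."""
    n = len(xs) - 1
    m = [0.0] * (n + 1)          # natural end conditions: m[0] = m[n] = 0
    if n < 2:
        return m
    h = [xs[i + 1] - xs[i] for i in range(n)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm: forward elimination on the tridiagonal system
    for k in range(1, n - 1):
        factor = h[k] / diag[k - 1]
        diag[k] -= factor * h[k]
        rhs[k] -= factor * rhs[k - 1]
    # Back substitution
    sol = [0.0] * (n - 1)
    sol[-1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
    m[1:n] = sol
    return m

def spline_eval(xs, ys, m, x):
    """Evaluate the spline at x, which should lie within [xs[0], xs[-1]]."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    a, b = xs[i + 1] - x, x - xs[i]
    return ((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h)
            + (ys[i] / h - m[i] * h / 6.0) * a
            + (ys[i + 1] / h - m[i + 1] * h / 6.0) * b)

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
m = natural_spline_second_derivs(xs, ys)
print(spline_eval(xs, ys, m, 0.5))
```

The resulting curve passes through every data point and has continuous first and second derivatives at the interior knots.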

— SBEZIER —

The `sbezier` option first renders the data monotonic (`unique`) and then applies the `bezier` algorithm.

— UNIQUE —

The `unique` option makes the data monotonic in x; points with the same x-value are replaced by a single point having the average y-value. The resulting points are then connected by straight line segments.
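The averaging step can be sketched in a few lines of Python. This is an illustration of the described behavior, with a hypothetical function name and made-up data, rather than gnuplot's actual code.

```python
from collections import defaultdict

def smooth_unique(points):
    """Sort by x and average the y-values of points sharing an x-value."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]

data = [(2.0, 4.0), (1.0, 1.0), (2.0, 6.0), (3.0, 9.0)]
print(smooth_unique(data))  # [(1.0, 1.0), (2.0, 5.0), (3.0, 9.0)]
```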

— FREQUENCY —

The `frequency` option makes the data monotonic in x; points with the same x-value are replaced by a single point whose y-value is the sum of their y-values. The resulting points are then connected by straight line segments. See also smooth.dem.
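A common use of this summing behavior is to build a histogram by mapping each value to a bin and giving it a y-value of 1. The Python sketch below illustrates the idea; the function name, the sample values, and the choice of `floor` for binning are assumptions made for the example.

```python
import math
from collections import defaultdict

def smooth_frequency(points):
    """Sort by x and sum the y-values of points sharing an x-value."""
    totals = defaultdict(float)
    for x, y in points:
        totals[x] += y
    return sorted(totals.items())

values = [0.2, 0.7, 1.1, 2.5, 2.9, 2.3]            # made-up measurements
binned = [(math.floor(v), 1.0) for v in values]     # unit-width bins, count 1 each
print(smooth_frequency(binned))  # [(0, 2.0), (1, 1.0), (2, 3.0)]
```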

— CUMULATIVE —

The `cumulative` option makes the data monotonic in x; points with the same x-value are replaced by a single point containing the cumulative sum of y-values of all data points with lower x-values (i.e. to the left of the current data point). This can be used to obtain a cumulative distribution function from data. See also smooth.dem.
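The following Python sketch shows how a running sum turns samples into an empirical cumulative distribution function. It assumes that the running total at each x includes the points at that x, so that the last value equals the sum of all y-values; that assumption, the function name, and the sample data are not taken from the text above.

```python
from collections import defaultdict
from itertools import accumulate

def smooth_cumulative(points):
    """Sort by x, merge equal x-values, and replace y by a running total."""
    totals = defaultdict(float)
    for x, y in points:
        totals[x] += y
    xs = sorted(totals)
    # Assumption: the total at x includes the contribution of x itself.
    running = accumulate(totals[x] for x in xs)
    return list(zip(xs, running))

samples = [3.0, 1.0, 2.0, 2.0]                       # made-up observations
weighted = [(s, 1.0 / len(samples)) for s in samples]
print(smooth_cumulative(weighted))  # [(1.0, 0.25), (2.0, 0.75), (3.0, 1.0)]
```

Giving every sample the weight 1/N makes the final value 1, as expected of a cumulative distribution function.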

— KDENSITY —

The `kdensity` option plots a kernel density estimate (a smooth histogram) for a random collection of points, using Gaussian kernels. A Gaussian is placed at the location of each point in the first column, and the sum of all these Gaussians is plotted as a function. The value in the second column is taken as the weight of the Gaussian. (To obtain a normalized histogram, this should be 1/number-of-points.) The value of the third column, if supplied, is taken as the bandwidth for the kernels. If only two columns have been specified, or if the value of the third column is zero or less, gnuplot calculates the bandwidth that would be optimal if the input data were normally distributed. (This will usually be a very conservative, i.e. broad, bandwidth.)
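The sum-of-Gaussians idea can be sketched in Python as follows. The sketch assumes each kernel is a unit-area Gaussian scaled by its weight and takes the bandwidth as an explicit argument; the function name and sample data are hypothetical, and gnuplot's automatic bandwidth choice is not reproduced here.

```python
import math

def kdensity(xs, weights, bandwidth, x):
    """Weighted sum of Gaussians of width `bandwidth`, evaluated at x."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)  # makes each kernel unit-area
    return sum(w * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) / norm
               for xi, w in zip(xs, weights))

xs = [1.0, 2.0, 2.5, 4.0]                 # made-up sample locations
weights = [1.0 / len(xs)] * len(xs)       # 1/number-of-points normalizes the estimate

# Crude Riemann sum over a range that comfortably covers all kernels:
# with weights of 1/N the area under the curve should be close to 1.
step = 0.01
area = step * sum(kdensity(xs, weights, 0.5, -5.0 + k * step) for k in range(1500))
print(round(area, 6))
```

A smaller bandwidth produces narrower, taller bumps around each sample, while a larger one blends neighboring samples into a broader curve.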