# Rewriting plot.qcc using ggplot2 and grid

The free and open-source R statistics package is a great tool for data analysis. The free add-on package qcc provides a wide array of statistical process control charts and other quality tools, which can be used for monitoring and controlling industrial processes, business processes or data collection processes. It’s a great package and highly customizable, but the one feature I wanted was the ability to manipulate the control charts within the grid graphics system, and that turned out to be not so easy.

I went all-in and completely rewrote qcc’s plot.qcc() function to use Hadley Wickham’s ggplot2 package, which itself is built on top of grid graphics. I have tested the new code against all the examples provided on the qcc help page, and the new ggplot2 version works for all the plots, including X-bar and R, p- and u- and c-charts.

In qcc, an individuals and moving range (XmR or ImR) chart can be created simply:

library(qcc)
my.xmr.raw <- c(5045,4350,4350,3975,4290,4430,4485,4285,3980,3925,3645,3760,3300,3685,3463,5200)
x <- qcc(my.xmr.raw, type = "xbar.one", title = "Individuals Chart\nfor Wheeler sample data")
x <- qcc(matrix(cbind(my.xmr.raw[1:length(my.xmr.raw)-1], my.xmr.raw[2:length(my.xmr.raw)]), ncol = 2), type = "R", title = "Moving Range Chart\nfor Wheeler sample data")


This both generates the plot and creates a qcc object, assigning it to the variable x. You can generate another copy of the plot with plot(x).

To use my new plot function, you will need to have the packages ggplot2, gtable, qcc and grid installed. Download my code from the qcc_ggplot project on Github, load qcc in R and then run source("qcc.plot.R"). The ggplot2-based version of the plotting function will be used whenever a qcc object is plotted.

library(qcc)
source("qcc.plot.R")
my.xmr.raw <- c(5045,4350,4350,3975,4290,4430,4485,4285,3980,3925,3645,3760,3300,3685,3463,5200)
x <- qcc(my.xmr.raw, type = "xbar.one", title = "Individuals Chart\nfor Wheeler sample data")
x <- qcc(matrix(cbind(my.xmr.raw[1:length(my.xmr.raw)-1], my.xmr.raw[2:length(my.xmr.raw)]), ncol = 2), type = "R", title = "Moving Range Chart\nfor Wheeler sample data")


Below, you can compare the individuals and moving range charts generated by qcc and by my new implementation of plot.qcc():

The qcc individuals chart as implemented in the qcc package.

The qcc individuals chart as implemented using ggplot2 and grid graphics.

The qcc moving range chart as implemented in the qcc package.

The qcc moving range chart as implemented using ggplot2 and grid graphics.

### New features

In addition to the standard features in qcc plots, I’ve added a few new options.

size or cex
Set the size of the points used in the plot. This is passed directly to geom_point().
font.size
Sets the size of text elements. Passed directly to ggplot() and grid’s viewport().
title = element_blank()
Eliminate the main graph title completely, and expand the data region to fill the empty space. As with qcc, with the default title = NULL a title will be created, or a user-defined text string may be passed to title.
new.plot
If TRUE, creates a new graph (grid.newpage()). Otherwise, will write into the existing device and viewport. Intended to simplify the creation of multi-panel or composite charts.
digits
The argument digits is provided by the qcc package to control the number of digits printed on the graph, where it either uses the default option set for R or a user-supplied value. I have tried to add some intelligence to calculating a default value under the assumption that we can tell something about the measurement from the data supplied. You can see the results in the sample graphs above.

### Lessons Learned

This little project turned out to be somewhat more difficult than I had envisioned, and there are several lessons-learned, particularly in the use of ggplot2.

First, ggplot2 really needs data frames when plotting. Passing discrete values or variables not connected to a data frame will often result in errors or just incorrect results. This is different than either base graphics or grid graphics, and while Hadley Wickham has mentioned this before, I hadn’t fully appreciated it. For instance, this doesn’t work very well:

my.test.data <- data.frame(x = seq(1:10), y = round(runif(10, 100, 300)))
my.test.gplot <- ggplot(my.test.data, aes(x = x, y = y)) +
geom_point(shape = 20)
index.1 <- c(5, 6, 7)
my.test.gplot <- my.test.gplot +
geom_point(aes(x = x[index.1], y = y[index.1]), col = "red")
my.test.gplot


Different variations of this sometimes worked, or sometimes only plotted some of the points that are supposed to be colored red.

However, if I wrap that index.1 into a data frame, it works perfectly:

my.test.data <- data.frame(x = seq(1:10), y = round(runif(10, 100, 300)))
my.test.gplot <- ggplot(my.test.data, aes(x = x, y = y)) +
geom_point(shape = 20)
index.1 <- c(5, 6, 7)
my.test.subdata <- my.test.data[index.1,]
my.test.gplot <- my.test.gplot +
geom_point(data = my.test.subdata, aes(x = x, y = y), col = "red")
my.test.gplot


Another nice lesson was that aes() doesn’t always work properly when ggplot2 is called from within a function. In this case, aes_string() usually works. There’s less documentation than I would like on this, but you can search the ggplot2 Google Group or Stack Overflow for more information.

One of the bigger surprises was discovering that aes() searches for data frames in the global environment. When ggplot() is used from within a function, though, any variables created within that function are not accessible in the global environment. The work-around is to tell ggplot which environment to search in, and a simple addition of environment = environment() within the ggplot() call seems to do the trick. This is captured in a stack overflow post and the ggplot2 issue log.

my.test.data <- data.frame(x = seq(1:10), y = round(runif(10, 100, 300)))
my.test.gplot <- ggplot(my.test.data, environment = environment(), aes(x = x, y = y)) +
geom_point(shape = 20)
index.1 <- c(5, 6, 7)
my.test.subdata <- my.test.data[index.1,]
my.test.gplot <- my.test.gplot +
geom_point(data = my.test.subdata, aes(x = x, y = y), col = "blue")
my.test.gplot


Finally, it is possible to completely and seamlessly replace a function created in a package and loaded in that package’s namespace. When I set out, I wanted to end up with a complete replacement for qcc’s internal plot.qcc() function, but wasn’t quite sure this would be possible. Luckily, the below code, called after the function declaration, worked. One thing I found was that I needed to name my function the same as the one in the qcc package in order for the replacement to work in all cases. If I used a different name for my function, it would work when I called plot() with a qcc object, but qcc’s base graphics version would be used when calling qcc() with the parameter plot = TRUE.

unlockBinding(sym="plot.qcc", env=getNamespace("qcc"));
assignInNamespace(x="plot.qcc", value=plot.qcc, ns=asNamespace("qcc"), envir=getNamespace("qcc"));
assign("plot.qcc", plot.qcc, envir=getNamespace("qcc"));
lockBinding(sym="plot.qcc", env=getNamespace("qcc"));


### Outlook

For now, the code suits my immediate needs, and I hope that you will find it useful. I have some ideas for additional features that I may implement in the future. There are some parts of the code that can and should be further cleaned up, and I’ll tweak the code as needed. I am certainly interested in any bug reports and in seeing any forks; good ideas are always welcome.

### References

• R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
• Scrucca, L. (2004). qcc: an R package for quality control charting and statistical process control. R News 4/1, 11-17.
• H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
• Wheeler, Donald. “Individual Charts Done Right and Wrong.” Quality Digest. 2 Feb 20102 Feb 2010. Print. <http://www.spcpress.com/pdf/DJW206.pdf>.

# Individuals and Moving Range Charts in R

Individuals and moving range charts, abbreviated as ImR or XmR charts, are an important tool for keeping a wide range of business and industrial processes in the zone of economic production, where a process produces the maximum value at the minimum costs.

While there are many commercial applications that will produce such charts, one of my favorites is the free and open-source software package R. The freely available add-on package qcc will do all the heavy-lifting. There is little documentation on how to create a moving range chart, but the code is actually quite simple, as shown below.

The individuals chart requires a simple vector of data. The moving range chart needs a two-column matrix arranged so that qcc() can calculate the moving range from each row.

library(qcc)
my.xmr.raw <- c(5045,4350,4350,3975,4290,4430,4485,4285,3980,3925,3645,3760,3300,3685,3463,5200)
#' Create the individuals chart and qcc object
my.xmr.x <- qcc(my.xmr.raw, type = "xbar.one", plot = TRUE)
#' Create the moving range chart and qcc object. qcc takes a two-column matrix
#' that is used to calculate the moving range.
my.xmr.raw.r <- matrix(cbind(my.xmr.raw[1:length(my.xmr.raw)-1], my.xmr.raw[2:length(my.xmr.raw)]), ncol=2)
my.xmr.mr <- qcc(my.xmr.raw.r, type="R", plot = TRUE)


This produces the individuals chart:

The qcc individuals chart.

and the moving range chart:

The qcc moving range chart.

The code is also available as a gist.

### References

• R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
• Scrucca, L. (2004). qcc: an R package for quality control charting and statistical process control. R News 4/1, 11-17.
• Wheeler, Donald. “Individual Charts Done Right and Wrong.” Quality Digest. 2 Feb 20102 Feb 2010. Print. <http://www.spcpress.com/pdf/DJW206.pdf>.

# Scientific Consensus

I have seen a number of comments lately that “consensus” has no place in science, and that claims that there is a “scientific consensus” are just thinly veiled political double-speak. I have to take issue with such criticisms. In fact, consensus is one of the cornerstones of science.

This is not quite the same consensus used in politics or everyday life. Consensus means “general agreement” and is usually achieved through some form of discussion and negotiation. Consensus is therefore agreement over opinions, and is often agreement over a course of action despite differing opinions.

In science, consensus is derived from data and independent replication of experiments. It is the consensus that an idea—specifically, a testable hypothesis—is correct. It is the expression of scientists that a hypothesis is (a) scientifically testable and falsifiable, (b) that it has not been falsified and (c) that it explains the universe better than competing hypotheses. It is a consensus derived from the replication of observations or tests by other researchers.

Part of the problem, here, is that science has a habit of taking everyday words and developing very specific meanings around them. This happens because scientists need to communicate clearly and exactly at times, and language is messy and full of fuzzy concepts. The same thing happens in a lot of occupations. For example, accountants also develop specific meanings for everyday words.

Scientists no longer argue over the validity of Newton’s hypothesis on the gravitational force because there is broad consensus that the hypothesis is correct (as far as it goes). Objects attract each other according to their mass and the distance between them, and there have been plenty of independent experiments confirming the specific relationship, F = -G M1m2 / r2. The consensus is so strong that it’s referred to as Newton’s law of gravity. Likewise, scientists no longer argue over the geocentric model of the universe because there is broad consensus—derived from data collected over centuries by many independent researchers—that the Earth is not at the center of the solar system, let alone the universe.

Conversely, there is very little consensus when it comes to the accelerating expansion of the universe. Cosmologists agree that the universe is expanding faster than can be explained by our current understanding of the universe, but there are many, conflicting hypotheses about the causes. There is consensus over the fact of the accelerating expansion but the data does not yet support consensus on the underlying physical processes or mechanism causing it, and so there is no consensus about the physical process.

In one sense, scientific consensus is stronger, or more robust, than we are used to thinking about in politics and everyday life, precisely because it is based on observation and careful analysis by independent groups. It’s not just consensus based on what we think might be true, or what we want to be true, but based on what careful observation tells us must be true. Scientific consensus is not subject to whim.

From another perspective, though, scientific consensus is much weaker than we are used to. In science, there is no downside to abandoning or overturning a consensus when the data points in a different direction. In fact, there is significant benefit to being the person who can overturn a previous consensus; we remember Galileo, Newton, Darwin, Einstein and others precisely because their work, collecting and analyzing data, was so pivotal in altering the scientific consensus. In everyday life, if you back out of a consensus agreement, it’s likely that others party to the agreement will feel betrayed. There can be a significant social cost to pay for backing out of a consensus, even when you are convinced that you are right. Scientists may sometimes feel this same social pressure, but the scientific method provides clear guidance for adopting or abandoning consensus, and it doesn’t focus on the people involved but rather on external, objective observations of how the universe works. Scientific consensus is, perhaps, more readily changed than conventional consensus.

So consensus does exist is science and it plays an important role in science. We should be careful to distinguish, though, between consensus based on independent replication of results and consensus based on preconceptions and social negotiation.

# And You Thought Physics Was *YAWN*

Part of my day job involves monitoring the renewable energy market, and particularly keeping abreast of storage technologies. It’s like combining my hobby with my job.

A new wind turbine design was recently announced, the SeaTwirl. It’s an off-shore turbine design using vertical blades. The key technological advance here is that it includes a method to store energy, so that it can continue to produce electricity when the wind stops blowing. This ability to deliver a constant output is important, because, as you may have heard, wind energy is intermittent; you only get electricity when the wind blows, and only to the degree that it’s blowing. Demand, unfortunately, doesn’t follow wind’s intermittancy—nobody stops to check that the wind is blowing before they turn on their lights—and the utilities, transmission system operators and distribution system operators all have to supply electricity to meet demand.

The SeaTwirl stores energy by integrating an unusual wind turbine design with a pumped hydro system. It has the turbine blades on a large circular ring, which rotates parallel to the water’s surface kind of like a hula-hoop. This ring is hollow (more or less), and is filled up with water when the wind blows. When the wind stops blowing, the momentum of the water keeps the tube spinning, generating electricity, and the water is allowed to drain back out, spinning a hydro turbine to generate electricity.

The nice thing about this is that the storage can always be “recharged” and it’s “free.” Or at least it seems to be. If you’ve ever held a bucket while spinning around, you know that spinning with an empty bucket takes a lot less effort than spinning around with a full bucket. In part, this is because of a property known as the moment of inertia. The heavier or larger a spinning object gets, the more it resists changes to its rate of rotation (or rpm).

If the SeaTwirl is filling up this horizontal “hula-hoop” with water, then the weight of the tube is increasing and so is the moment of inertia. As the moment of inertia increases, the energy needed to reach a given rpm increases. Wind turbines normally generate electricity in proportion to the wind speed, because the rpm of the blades is proportional to the wind speed. Increase the moment of inertia and you decrease the rpm, which means you generate less electricity for a given wind speed.

Now comes the physics. For something shaped like a hula-hoop, the moment of inertia, I, is calculated from the mass, M, and the radius of the hoop, R, according to:

$I = M R^{2}$

The energy, E, in a spinning object is equal to the moment of inertia times the speed of rotation, ω, according to

$E=\frac{1}{2} I \omega ^{2}$

If we know the energy (because we know the wind speed), then we can calculate the speed of rotation, ω, by rearranging that equation to get ω on the left-hand side:

$\omega = \sqrt{\frac{2E}{I}}$

We can then replace I with mass and radius from the first equation to get

$\omega = \sqrt{\frac{2E}{MR^{2}}}$

So we can see that, if we don’t change the energy E (or don’t change the wind speed), and don’t change the radius R of the spinning hoop, then increasing the mass M results in a slower rate of rotation.

From SeaTwirl’s website and press releases, we can estimate how big the SeaTwirl is, which will let us estimate how much slower a full SeaTwirl will spin than an empty one, and therefore how much less electricity must be generated. We can calculate this by taking the ratio of ω full to ω empty, so that the parts that we don’t have to know E and R.

$\frac{\omega_{full}}{\omega_{empty}} = \frac{\sqrt{\frac{2E}{M_{full}R^{2}}}}{\sqrt{\frac{2E}{M_{empty}R^{2}}}} = \sqrt{\frac{\frac{2E}{R^{2}}}{\frac{2E}{R^{2}}}}\sqrt{\frac{M_{empty}}{M_{full}}}=\sqrt{\frac{M_{empty}}{M_{full}}}$

For the SeaTwirl, we now have to find out what R is, and estimate M for both filled and empty

The whole turbine assembly is made of composite materials, which probably have a density, $\rho_{c}$, of around 2500 kilograms per cubic meter (similar to fiberglass). Water has a density, $\rho_{w}$, of near 1000 kilograms per cubit meter (depending on temperature). The diameter of the turbine will be near 180 meters, so the radius, R, of our “hula-hoop” is half that, or 90 meters. From the pictures, it looks like the thickness of that hula-hoop is a few percent of the total diameter of the turbine, so we can figure an outside diameter of the “hula-hoop” of about 2 meters, for a radius, r, of 1 meter. Figure that at least ten percent of this is composite, and the rest is the hollow, water-filled portion.

To estimate the weight of the water in the “hula-hoop,” we can approximate the water as being a cylinder of radius $r_{w} = 0.9r$ and length equal to the circumference of the “hula-hoop,” $l = 2\pi R$. The volume of such a cylinder is equal to the cross-sectional area of the water column, $A_{w}=\pi r_{w}^{2}$ times the length of the column, l. The total mass of the water, $m_{w}$ is the density times this volume.

$m_{w} = \rho_{w}\pi r_{w}^{2}2\pi R = \rho_{w}3\pi (0.9r)^{2} R$

Plugging in our estimates for the above values gives us

$m_{w} = 1000 \cdot 3\pi (0.9)^{2} 90 = 687000$ kg

That’s a lot of water.

Now for the empty “hula-hoop.” We can treat it in the same way: a cylinder of material of radius r, length $l = 2\pi R$. However, we don’t want to calculate for a solid cylinder of composite; we have to subtract out the hollow part with radius $r_{w}$. So the mass of the composite is

$m_{c} = \rho_{c} 2 \pi R ( \pi r^{2} - \pi r^{2}_{w} )$

$m_{c} = 2500 \cdot 3 \pi 90 (1^{2} - 0.9^{2} ) = 403000$ kg

So the water more than doubles the weight of the hoop.

From the picture, you can see that there’s another hoop at the top, and the two hoops are connected by the turbines, which combined are probably worth at least another hoop in weight, so we can further assume that the mass of this bottom hoop, empty, is roughly one-third of the total mass of the movable parts of turbine.

The mass of the turbine, empty, is therefore about 1200000 kg, or 1200 tons. Filled with water, this goes up to about 1900000 kg, or 1900 tons. Empty, that’s maybe twice the largest off-shore turbine currently in existence, but this thing is easily twice as big as any current turbine, so our estimate appears to be in the right neighborhood.

Now we go back to our equation for the ratio of the rotational velocities, ω, and plug in these weights:

$\frac{\omega_{full}}{\omega_{empty}} = \sqrt{\frac{M_{empty}}{M_{full}}} =\sqrt{\frac{1200}{1900}} = 0.8$

So we get about 80% as much electricity from a water-filled turbine as from an empty one, when the wind blows. This is a direct efficiency loss due to the storage of energy in the spinning-water-hoop.

In addition, there’s the efficiency losses in loading water into the hoop, or “charging” the hoop, and the efficiency losses of “discharging” the hoop, running the water back out through a hydro turbine. Pumped hydro is usually about 72% efficient, or less, in each direction, so the total round-trip efficiency of storage + discharge is about 50% efficient. There are a lot of other storage technologies that do at least this good, if not better.

These two figures, the 80% efficiency loss of just operating the turbine and the 50% storage-discharge efficiency, can be used to directly compare SeaTwist with other wind turbine + storage technology solutions. Any storage technology that has at least a 50% round-trip efficiency and increases the total system cost by less than 20% over the system’s operating lifetime will outperform SeaTwist in terms of return on investment.

# Is there a Market for Premium R Packages?

Nathan Yau, of the excellent FlowingData blog, recently asked on his Twitter stream:

I wonder if there’s a market for premium R packages, like there is for say, @wordpress themes and plugins

There are some great packages available for R, all of which are currently free. I think it would be great if authors like Hadley Wickham and Ian Fellows received remuneration for their efforts. However, I see a trap here.

From my perspective, R has two main barriers to adoption: the learning curve and IT support.

The learning curve is steep enough that casual users will not get very far, and infrequent users tend to slide backwards and have to relearn (I’ve had to develop a mind map of common functions to help mitigate this problem for myself). R packages generally address the learning curve. There aren’t many packages that provide functions that the user couldn’t have created for themselves with base R, but the packages make using their functions much easier. ggplot2 and its support packages plyr and reshape make a perfect example. The default R graphical output is pretty good, but ggplot2 offers better aesthetic defaults and provides an easier path to advanced functions, like transforming data and adding fitted curves and “ribbons.”

IT departments will not have any readily available, professional support should problems arise with any R installation. I’ve seen a couple of IT departments balk at supporting open source software for this very reason, and one of them balked at supporting R for this reason. IT departments must evaluate software through the lenses of incident response and down time. However good the community, open source software leaves a big uncertainty when planning for support budgets. The only solution that I’ve found for IT is to convince them that they don’t have to support R; I can do it myself. I’m sure some of you are luckier that way, and it seems that Revolution is slowly addressing this issue, but it’s not an issue that has been generally addressed in the community. In addition, IT departments are usually responsible for ensuring that all software installed on their organization’s computers is legally licensed to the organization. With everything currently free, that’s a problem that is easily overcome.

Make ggplot2, or any other package, available only to those who can pay, and you exacerbate the two main problems with R: great functionality that flattens the learning curve will be lost to a large segment of users (i.e. casual or infrequent users and cash-strapped users like students), and IT departments will have to choose between actively supporting R and simply banning it. Providing user-installable R packages where some are freely licensed and others are not would create an environment where some IT departments would simply ban R rather than have to sort out the licensing issues. I suspect that many would ban R.

We need a way to repay package authors for their time, without losing the benefits of freely available packages. Donationware seems like a good first step, even though the response rate is typically very low.

# R Function Reference

Updated below

The R Function Reference is a mind map that I created as a guide for novice and intermediate users of the R statistics language. When you first open it, I suggest that you collapse all the nodes by clicking on the “Expand/Collapse all nodes” button in the bottom left of the screen to make the map easier to navigate. You can also adjust the zoom level with the slider next to that button.

The top-level nodes of the R Function Reference

The mind map is arranged in eight sections, or main branches, arranged by task. What do you want to do? Each branch covers a general set tasks, such as learning to use R, running R, working with data, statistical analysis or plotting data. The end of each string of nodes is generally a function and example. The Reference provides code fragments, rather than details of the function or complete reproducible code blocks. Once you’ve followed the Reference and have an idea of how to accomplish something, you can look up the details in R’s help system (e.g. “?read.csv” to learn more about using the read.csv() function), or search Google or the online R-Help mailing list archives for answers using the function name.

There are a lot of useful nodes and examples, especially in the “Graphs” section, but the mind map is not complete; some trails end before you get to a useful function reference. I am sorry for that, but it’s a work in progress, and will be slowly updated over time.

### Update 1

In comments, several users reported problems opening the mind map. With a little investigation, it appears that the size of the mind map is the problem. To try to fix the problems , I have split the mind map out into several small mind maps, all linked together.

The new main mind map is the R Function Reference, Main. The larger branches on this main map no longer expand to their own content, but contain a link to a “child” mind map. The link looks like a sheet of paper with an arrow pointing to the right, click on it and little cartoon speech bubble will pop up with a link that you have to click on to go to the child mind map. Likewise, the central nodes on the child mind maps contain a link back to the main mind map.

Due to load times and the required extra clicks, this may slightly reduce usability for users who didn’t have a problem with the all-in-one version, but will hopefully make the mind map accessible to a broader audience.

I have to offer praise to the developers of Mind42. Though I couldn’t directly split branches off into their own mind maps or duplicate the mind map, it was very easy to export the mind map as a native Mind42 file and then import it multiple times, editing the copies without any loss of data or links. The ability to link directly between mind maps within Mind42 was also a key enabling feature. Considering that this is a free web app, its capabilities are most impressive. They were also quick to respond when I posted a call for help on the Mind42 forum.

Please let me know how the new, “improved” version works.

The old mind map, containing everything, is still available, but I will not update it.

# Process Stability

(Updated below)

While performing a web search, I remembered how difficult the concept of “process stability” can be. How do you know when a process is stable?

D. C. Montgomery, one of the recognized authorities on the subject of statistical process control, seems to give conflicting advice on this. For instance, he’s careful to point out the assumptions underlying all of the measures that one would use on a process, and unstable processes invalidate most or all of these assumptions. How do you know if a process is stable if none of your analyses are applicable?

Process stability needs an operational definition. Luckily, there are at least two:

1) No signals on the appropriate process behavior chart (a.k.a. control chart);

2) Cpk / Ppk == 1 and Cp / Pp == 1

Signals on a process behavior chart do not necessarily mean that a process is out of control (i.e. false signals are possible, and expected at certain mathematically determinable rates), but we can be sure of process stability if there are no signals.

Likewise, we can take issue with using the process capability indices Pp, Ppk, Cp and Cpk in this manner. All assume a normal distribution, which you only get with a stable process, so you shouldn’t trust them as measures of process capability. In this case, that’s fine: don’t report the actual values; just report the ratio of Cp to Pp or Cpk to Ppk. When the ratio is 1, the process is stable; the larger the ratio, the worse the process. Donald Wheeler discusses this use of Ppk and Cpk, and the measures’ relation to production costs, in his latest column for Quality Digest.

Whether or not the process is economical (i.e. Cpk and Ppk are high enough) is a question completely separate from stability.

### Update:

I was discussing this with a friend who, for various reasons, needs to allow for some process drift. In other words, a Ppk less than Cpk is expected and acceptable, but only up to a certain point. The nice thing about the Cpk/Ppk ratio is that it’s simple: a ratio of 1 means the process is stable; a ratio greater than 1 means the process is not stable; a ratio of less than 1 means someone has made a mistake or is lying. If we need to allow for some process drift, we lose this simplicity.

So suppose that we have a Cpk of 1.66. There are then five standard deviations between the process mean and the nearest specification limit. Assuming a process drift of 1.5 Sigmas, our Ppk is 1.16, giving us a ratio Cpk/Ppk of 1.43. If, however, our Cpk is 1.00, then a process drift of 1.5 Sigmas gives us a Cpk/Ppk ratio of 2.00.

With an allowed process drift of a fixed number of Sigma, it’s no longer so simple to determine, from the Cpk/Ppk ratio, whether or not a process is “stable” within the limits set by management.

A slightly more sophisticated calculation is needed, then. What we can calculate is the ratio

(Short Term SigmaLong Term Sigma) / Allowed Process Drift

If the result is less than or equal to 1, then the process is “good enough” (i.e. within our allowed drift). If the ratio is greater than 1, then the process is considered out of control and action needs to be taken to eliminate sources of variation. If the ratio is less than 0, then someone made a mistake or is lying (i.e. long-term Sigma can never be less than short-term Sigma).