R Function Reference

(Updated below)

The R Function Reference is a mind map that I created as a guide for novice and intermediate users of the R statistics language. When you first open it, I suggest that you collapse all the nodes by clicking on the “Expand/Collapse all nodes” button in the bottom left of the screen to make the map easier to navigate. You can also adjust the zoom level with the slider next to that button.

R Function Reference screenshot

The top-level nodes of the R Function Reference

The mind map is arranged in eight sections, or main branches, organized by task: What do you want to do? Each branch covers a general set of tasks, such as learning to use R, running R, working with data, statistical analysis or plotting data. The end of each string of nodes is generally a function and example. The Reference provides code fragments, rather than details of the function or complete reproducible code blocks. Once you’ve followed the Reference and have an idea of how to accomplish something, you can look up the details in R’s help system (e.g. “?read.csv” to learn more about using the read.csv() function), or search Google or the online R-Help mailing list archives for answers using the function name.
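For example, once the Reference points you to read.csv(), the rest of the lookup might go something like this (a minimal sketch; “mydata.csv” is a stand-in file name):

    ?read.csv                       # open the help page for read.csv()
    df <- read.csv("mydata.csv")    # read the file into a data frame
    str(df)                         # inspect the structure of the result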

There are a lot of useful nodes and examples, especially in the “Graphs” section, but the mind map is not complete; some trails end before you get to a useful function reference. I am sorry for that, but it’s a work in progress, and will be slowly updated over time.

Comments and suggestions are welcome.

Update 1

In comments, several users reported problems opening the mind map. With a little investigation, it appears that the size of the mind map is the problem. To try to fix this, I have split the mind map into several smaller mind maps, all linked together.

The new main mind map is the R Function Reference, Main. The larger branches on this main map no longer expand to their own content, but contain a link to a “child” mind map. The link looks like a sheet of paper with an arrow pointing to the right; click on it, and a little cartoon speech bubble will pop up with a link that you have to click to go to the child mind map. Likewise, the central nodes on the child mind maps contain a link back to the main mind map.

Due to load times and the required extra clicks, this may slightly reduce usability for users who didn’t have a problem with the all-in-one version, but will hopefully make the mind map accessible to a broader audience.

I have to offer praise to the developers of Mind42. Though I couldn’t directly split branches off into their own mind maps or duplicate the mind map, it was very easy to export the mind map as a native Mind42 file and then import it multiple times, editing the copies without any loss of data or links. The ability to link directly between mind maps within Mind42 was also a key enabling feature. Considering that this is a free web app, its capabilities are most impressive. They were also quick to respond when I posted a call for help on the Mind42 forum.

Please let me know how the new, “improved” version works.

The old mind map, containing everything, is still available, but I will not update it.

Process Stability

(Updated below)

While performing a web search, I remembered how difficult the concept of “process stability” can be. How do you know when a process is stable?

D. C. Montgomery, one of the recognized authorities on the subject of statistical process control, seems to give conflicting advice on this. For instance, he’s careful to point out the assumptions underlying all of the measures that one would use on a process, and unstable processes invalidate most or all of these assumptions. How do you know if a process is stable if none of your analyses are applicable?

Process stability needs an operational definition. Luckily, there are at least two:

1) No signals on the appropriate process behavior chart (a.k.a. control chart);

2) Cpk/Ppk = 1 and Cp/Pp = 1.

Signals on a process behavior chart do not necessarily mean that a process is out of control (i.e. false signals are possible, and expected at certain mathematically determinable rates), but we can be sure of process stability if there are no signals.

Likewise, we can take issue with using the process capability indices Pp, Ppk, Cp and Cpk in this manner. All assume a normal distribution, which you only get with a stable process, so you shouldn’t trust them as measures of process capability. In this case, that’s fine: don’t report the actual values; just report the ratio of Cp to Pp or Cpk to Ppk. When the ratio is 1, the process is stable; the larger the ratio, the worse the process. Donald Wheeler discusses this use of Ppk and Cpk, and the measures’ relation to production costs, in his latest column for Quality Digest.

Whether or not the process is economical (i.e. Cpk and Ppk are high enough) is a question completely separate from stability.

Update:

I was discussing this with a friend who, for various reasons, needs to allow for some process drift. In other words, a Ppk less than Cpk is expected and acceptable, but only up to a certain point. The nice thing about the Cpk/Ppk ratio is that it’s simple: a ratio of 1 means the process is stable; a ratio greater than 1 means the process is not stable; a ratio of less than 1 means someone has made a mistake or is lying. If we need to allow for some process drift, we lose this simplicity.

So suppose that we have a Cpk of 1.66. There are then five standard deviations between the process mean and the nearest specification limit. Assuming a process drift of 1.5 Sigmas, our Ppk is 1.16, giving us a ratio Cpk/Ppk of 1.43. If, however, our Cpk is 1.00, then a process drift of 1.5 Sigmas gives us a Cpk/Ppk ratio of 2.00.

With an allowed process drift of a fixed number of Sigma, it’s no longer so simple to determine, from the Cpk/Ppk ratio, whether or not a process is “stable” within the limits set by management.

A slightly more sophisticated calculation is needed, then. What we can calculate is the ratio

(Long Term Sigma - Short Term Sigma) / Allowed Process Drift

If the result is less than or equal to 1, then the process is “good enough” (i.e. within our allowed drift). If the ratio is greater than 1, then the process is considered out of control and action needs to be taken to eliminate sources of variation. If the ratio is less than 0, then someone made a mistake or is lying (i.e. long-term Sigma can never be less than short-term Sigma).
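As a minimal sketch of both calculations in R, with hypothetical numbers (and assuming, per the sign logic above, that the numerator is the long-term Sigma minus the short-term Sigma):

    # Worked example from the update above: Cpk of 1.66 with a 1.5-Sigma drift
    cpk   <- 1.66
    drift <- 1.5                       # assumed process drift, in Sigmas
    ppk   <- (3 * cpk - drift) / 3     # 1.16
    cpk / ppk                          # about 1.43; 1 = stable, larger = worse

    # The drift-tolerant ratio, with hypothetical Sigma estimates
    sigma_short   <- 1.0               # short-term Sigma
    sigma_long    <- 1.2               # long-term Sigma
    allowed_drift <- 0.5               # allowed drift, in Sigma units
    (sigma_long - sigma_short) / allowed_drift
    # <= 1: within the allowed drift; > 1: act on the variation; < 0: a mistake (or a lie)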

Definitions

I was recently asked a question that raised some good design issues. The question went “why should changing this cause a change in that characteristic?”

The immediate and obvious answer was that it wouldn’t and couldn’t. Theoretically, a large decrease in this (X) might cause an increase of a few percent in that (Y); nothing more. Yet someone was claiming that decreasing X decreased Y, too.

They were right. No, the theoretical relationship isn’t wrong. It’s right.

The theoretical calculation is fairly straightforward. You put so much of X in, and, after some calculation, you get so much of Y out. The less X you have, the more Y you get. The hard part is figuring out just how much of X you’re putting in.

The measurement of Y introduces a bunch of variation based on other factors. You measure by changing certain conditions A, B and C. These, in turn, affect some other factors, M and N. X, A, M and N together determine what value you measure for Y.

So decreasing X affects the other factors in such a way that the net effect is a decrease in the measured value of Y.

“Oh, sure,” you respond. “But the theoretical calculation should account for that.”

Not really. The theoretical calculation should tell us what the best case is…what our target should be. The actual measurement is going to produce different results based on various factors, some of which we control and some we can’t. A calculation based on the measurement process would require uncertainty ranges and return a probability distribution, not a single value. Messy.

Engineers and researchers need to consider both of these as definitions. If you’re designing for some characteristic, as a researcher or engineer you’re usually going to be concerned with the theoretical calculations. This is how you were taught in school, and you’ll naturally be interested in getting as close to the best case as possible. However, not everyone is going to be interested in the theoretical calculation. The folks in Quality who are checking the product for conformance will be more interested in how it’s measured, the operational definition, than in the theoretical definition. The manufacturing plant only wants to hear about the operational definition; for them, the world would be a better place without the theoretical definition.

As a design engineer, you need to be more concerned about the operational definition. You’ll be arguing that you designed a part for Y performance (or to “do Y”). The next question that management and your customers should (and probably will) ask is, how do you know you designed it to do that? The answer is always by data analysis. How do you get the data? Via the operational definition. What you know is determined by how you measure, and that’s the operational definition.

This has applicability well outside of engineering design. Physicists have been arguing this very point ever since Bohr and Heisenberg developed the Copenhagen interpretation of quantum physics. Management by objectives depends on the ability to close the loop by measuring outcomes. This means that management by objectives requires operational definitions of every objective (though few organizations actually get this far, and management by objectives becomes management by manager gut feeling). Even more enlightened management techniques, such as those advocated by Deming and Scholtes, require operational definitions to enable an organization’s performance improvement (e.g. through the use of control charts, which are only possible with operational definitions).

Use the theoretical definition to tell you the best possible case, but be sure to design according to the operational definition.

Beginning to End

Product development covers all activities from program initiation and concept development through the start of production or service delivery. There are many process models for product development, among them the classic waterfall, the spiral, the Systems V model, Lean and Agile. In the U.S. automotive industry, the product development process is defined, or at least constrained, by the Advanced Product Quality Planning (APQP) manual from the Automotive Industry Action Group, or AIAG. The standard in academia seems to be laid out in Ulrich and Eppinger’s Product Design and Development (U&E).

Most of these have some common features. Many start with defining the business goals and authorizing the project. The rest start with the next step: identifying customer needs. They also end somewhere between the hand-off to manufacturing and post-manufacturing support.

We can see that product development is a process that starts with the customer and ends with the customer. The output of product development is customer fulfillment, not merely an engineering design. The input is customer needs, not a product specification. Product development is not simply an engineering activity; it’s a blend of business and engineering activities, the goal of which is to maximize company profit through customer fulfillment. Product development is a customer-focused process, and it looks something like this cycle:

Product Development as Customer-Focused Process

From a customer’s perspective, though, this process looks much simpler:

Product Development as Customer-Focused Process, From the Customer’s Perspective

One of the primary problems with product development is this delay. For you, the developer, all of the technical and market risk is wrapped up in this delay, and the market risk is the more troublesome of the two. Market risk is the risk that the customer will change their mind, developing a new set of priorities, or that a competitor will enter the market with a similar product before you do.

One of the key mitigation strategies for product development is the reduction of this delay. Design and manage your product development processes to bring the customer closer to their fulfillment.

To achieve this in a consistent and effective manner, you have to understand the economics of your development projects and the market. Every decision in product development is a trade-off, and these trade-offs need to be focused on the goal: increasing profit by increasing the gap between value and costs. For instance, you will be faced with a choice: spending more time in requirements gathering and analysis vs. decreasing the delay in delivering product to the customer. Just how you balance this depends on the cost of a performance shortfall (technical risk) vs. the cost of delay. With highly risk-averse customers, the cost of a performance shortfall is much greater than the cost of delay, which is probably why aerospace projects are notorious for falling behind schedule yet often held up as the gold standard for safety and technical performance. In contrast, consumer electronics tend to have a very short time-to-market, but notoriously poor reliability; the customers value immediate fulfillment over technical performance.

It is important, too, to recognize that these kinds of trade-offs are not made just once; they are made on a daily basis. The upper management of the development organization needs to understand these economics so that they can design the product portfolio and product development strategy (e.g. selecting between more modular designs that shift technology development off the critical path of customer deliverables, and more integrated designs that are more tailored to fit customer needs). The program managers and design responsible engineers need this knowledge, too, in order to intelligently design and manage the product.

Your product development processes, then, must be designed to provide rapid feedback to project managers and engineers relative to these trade-offs between risks, and to assist them in making consistent decisions. The natural result of this line of reasoning is the development of decision tools, standard cost models and standard measurements focused on technical performance risk, project expenses, product costs and delays.

The Value of Standard Work

I recently had a conversation with a colleague about standardizing and documenting some of our work. He commented, in a mix of humor and exasperation, “nothing we do here is standard.”

I’ve spent most of my career in R&D, and usually management has stated things a little more seriously and strongly: “we can’t standardize what we do; it’s not possible.” Their argument usually takes one of two forms: (A) this is R&D, so we can’t possibly know what the next step is, and therefore we cannot standardize; or (B) this is R&D, and standardization is the enemy of the creativity that is needed.

Hogwash.

What I’ve found, and others have also reported, is that standard work is the best and surest way to improve R&D effectiveness and efficiency. Standard work enables and facilitates

  • Avoidance of errors, assuring that lessons learned are utilized and not forgotten;
  • Team learning and training;
  • Improvements to make the work more effective;
  • Reduction in variability;
  • Creation of meaningful job descriptions;
  • Greater innovation by reducing the mental and physical overhead of repetitive or standardized work.

In one job, I had the responsibility to develop a small problem-solving group, responsible for initiating and overseeing root cause analysis and corrective action activities. The problem solving activities had been performed on an as-needed basis by another group of experts, but were largely ad hoc. There was an element of customer interface, and my job was to maintain customer satisfaction through timely resolution of problems while reducing the overall cost of the work.

The workload increased by five to ten times during my tenure, but total, bottom-line costs remained roughly constant, representing a reduction of roughly eighty percent in the cost per case. These cost savings came about almost entirely by developing standard work: documenting processes and developing a suitable set of tools.

Mind you, this wasn’t cut-and-dried work; it was problem-solving at its most difficult and “creative.” We were identifying and tracking down new problems with no idea of where we would end up and little indication of where to start. We didn’t have a pre-defined roadmap for tracking down the problems, and the information we had going into each case sometimes looked the same as other cases, yet we would end up with completely different root causes and corrective actions. The work required a high degree of thoughtful assessment and planning of next steps, with a very narrow look-ahead window (in almost every case, the next step would depend on what we learned from the test that we were initiating).

Despite the fact that we were learning at each step and determining what to do next based on newly-available data, we were able to standardize much of the work, reducing the error rate, reducing the effort required for each case and reducing the variation in effort required from case to case.

Standard work does not preclude flexibility. You can still do a lot of different jobs, and be able to address new problems. Standard work just takes the things you do repeatedly and makes them routine, so you don’t waste time thinking about them.

Individual and Team Learning

If product development is about learning, then there must be at least two kinds of learning going on: individual learning and team learning. By their very nature, individuals and teams must learn in different ways, so our product development and management processes need to support both kinds of learning. I will lay the groundwork for future posts by looking at how people and teams learn and what sort of behaviors they engage in as part of the learning process. Learning and behavior are open fields of research, with volumes of published material. I will be brief.

Nancy Leveson, in her book Safeware: System Safety and Computers, has a couple of excellent chapters on human learning and behavior, from which I’ll borrow. I recommend her book; the first ten chapters or so are well worth reading even if you’re not involved with computers. Borrowing from Jens Rasmussen, she discusses three levels of cognitive control: skill-based behavior; rule-based behavior; and knowledge-based behavior.

Knowledge-based behavior sits at the highest level of cognitive learning and control. Performance is controlled by explicit goals and actions are formulated through a conscious analysis of the environment and subsequent planning. One of the primary learning tools used at this level is the scientific method of hypothesis formulation, experimentation and evaluation.

Rule-based behavior develops when the environment is familiar and fairly unchanging. Situations are controlled through the application of heuristics, or rules, that are acquired through training and experience, and that are triggered by conditions or indicators of normal events or states. This sort of behavior is very efficient, and learning is achieved primarily through experimentation, or trial and error, that leads to further refinement of the rules and improved identification of conditions under which to apply those rules. People transition back to knowledge-based behavior when the environment changes in unexpected ways, but only as they become aware that the rule-based behaviors are failing to produce the usual results.

Skill-based behavior “is characterized by almost unconscious performance of routine tasks, such as driving a car on a familiar road.” The behavior requires a trade-off, often between speed and accuracy, and learning involves constant tests of the limits of that trade-off. These tests are experimental in nature, but largely sub-conscious (not planned). Only by occasionally crossing the limits (making “mistakes”) can learning be achieved. As with rule-based behavior, people transition from skill-based behavior to rule-based behavior only after they become aware of a change in the environment that is having a negative impact on the outcomes of the skill-based behavior.

When people learn new skills they typically progress from knowledge-based behaviors to rule-based behaviors and eventually to skill-based behaviors. A surprising amount of engineering is based on heuristics rather than knowledge. This is often a good thing, as it allows us to efficiently deal with very complex problems and systems, making it possible to arrive at approximately-correct solutions much faster than through more explicit planning and evaluation. It can also go badly wrong when signs incorrectly lead one to apply the wrong rules to a situation that appears familiar but is not.

What might not be obvious is that learning at any level is only possible through a mix of successful and non-successful experiments. Unexpected outcomes (“mistakes,” “errors” or “failures”) are a necessary part of learning. In fact, the rate of learning is maximized (learning is most efficient) when the rate of unexpected outcomes is 50%.

Teams learn when individuals communicate, sharing and synthesizing the knowledge and heuristics that they’ve learned. This occurs primarily through two behaviors or tools: assessment and feedback. Assessment involves observation and reflection on what behaviors are working or not working in pursuit of the team goals. It should be performed both by individuals and by the team as a whole. Feedback consists of constructive observations provided by others and objective measurements, and is the input to assessment. Because feedback and assessment are so important to team learning and growth, they should be planned, structured and ongoing.

For assessment to be effective at generating team learning and growth, some action needs to follow. An action might be to change an existing condition (e.g. change how meetings are run, explore design alternatives, etc.), or to document a process or norm that “works” and share the result with team members (e.g. documenting the work flow, standardizing parts, implementing decision processes, etc.).

Repeated and effective use of both individual and team learning behaviors results in team learning cycles. In any product development project, these learning cycles occur in different domains simultaneously. The main forms of feedback in product design are testing and design review, and there are multiple ways to assess and plan those feedback loops. Project feedback comes primarily from monitoring the schedule performance. At the same time, the team should be eliciting feedback about its ability to make decisions, work together and work with the rest of the organization, and assessing that feedback.

Peter Scholtes’ The Team Handbook is a well-written and practical guide to implementing team processes and behaviors, and is geared toward both team members and team leaders. It’s the kind of book that a new team could refer to throughout a project to guide them in becoming more cohesive and effective. His The Leader’s Handbook: Making Things Happen, Getting Things Done also provides an excellent supplement, geared more toward team leaders and business managers.

To summarize, product development processes and organizations must be designed to support repeated structured and “accidental” experimentation by individuals and team processes of feedback, assessment, decision-making and actions that result in either change or standardization.

Waste in TPS vs. Lean PD

The Toyota Production System (TPS), or Lean, is widely considered a remarkable and effective system for improving operations. People naturally try to apply the same system to other product streams and other activities. However, I have found that people often struggle with the correct application of the concepts, especially to product development. For instance, Glen Reynolds, an experienced project manager, is working to understand Agile, and has recently come across the statement by some Agile advocates that “testing is a waste.” Calling testing a waste, as a general statement, is nonsense, and Glen is engaged in some excellent discussion on the subject.

I have encountered a fair number of people with experience in product development, and a fair number with experience in Lean manufacturing, but few people with sufficient overlap in the two domains to combine them. The TPS is very formulaic, laying out clear rules or guidelines that are easily followed. Unfortunately, product development is sufficiently different from manufacturing that the TPS cannot be applied directly.

The TPS defines waste as any activity (or inactivity) that does not add value to the product. Value is defined from the customer’s perspective as any activity that the customer is willing to pay for as a part of the product. Assembling the product adds value; holding the product in inventory does not. Shipping product in exchange for payment adds value; the time spent processing that payment through Accounting does not.

Testing is used in manufacturing to detect defects. The customer is willing to pay for a correctly-manufactured product; not for the rejects. From this perspective, testing is a waste, because it does nothing to add value to the product. In fact, all of product development should be considered waste, because product development does not produce any product (unless, of course, your product happens to be product designs for others to manufacture, as with IDEO).

However, in product development, testing can add value. In fact, testing is probably the only way to maximize value creation in product development.

Manufacturing is a highly repetitive activity, which takes as its input a product and process design and tooling, and produces as its output a physical product. The physical product generates revenue. Ideally, the product is manufactured exactly to specification, and there are no activities required that do not lead directly to the manufacture of the product. There is no uncertainty as to what the output should be. Waste, then, is any deviation from this ideal state. Economically, the assumption is that the product is designed to maximize revenue generation, and the process is designed to correctly manufacture the product. Since nothing is perfect, the TPS seeks to improve profit by eliminating or minimizing those activities that do not generate revenue.

In contrast, product development is a highly variable activity, with variable inputs. The goal of product development is not simply to produce product designs according to some predefined specification. If it was, then product development would be a simple exercise in transforming specifications into conforming drawings. In fact, product development operates under a high degree of uncertainty; from the start of the project, the desired outcome is not fully known. In order to overcome this uncertainty, product developers have to learn. They have to learn about the customer—the end user—and about the technology that they are working with. Value is created as the uncertainties are resolved. The goal of product development is to maximize the rate of learning, thereby maximizing the rate of value creation.

The best way to learn is through trial and error, following the scientific method of hypothesis testing. So testing is not a waste if it generates new knowledge. Testing that does not generate new knowledge is a waste. If you generate data that you already had, you’re generating waste. If you generate data that you don’t use, you’re generating waste. If the organization learns something, then you’ve created value.

Notice that this definition of product development follows the same principle that is used in the TPS: do what creates (or produces) value; eliminate or minimize everything else. The details change in the presence of uncertainty about the desired outcome.

We can take this a step further, to develop a more sophisticated approach to managing product development projects, if we understand the economics of our project. The uncertainty that is reduced through testing is technical risk: the risk that the product will not meet the market’s requirements. Reducing that risk translates into increased expected sales volumes. Testing in product development also adds time, which means a later entry into the market and possibly reduced sales volumes. If we have some estimate of what the cost of delay and the cost of risk are, we can then perform a cost-benefit analysis on the proposed testing to determine whether or not it results in a net creation of value.
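As a minimal sketch of such a cost-benefit calculation, in R with entirely made-up numbers:

    # Hypothetical economics of one proposed test (all values invented)
    cost_of_delay  <- 50000    # profit lost per week of delayed market entry
    shortfall_cost <- 400000   # expected cost if the product misses requirements
    risk_reduction <- 0.10     # reduction in shortfall probability the test buys
    delay_weeks    <- 2        # delay the test adds to market entry

    benefit <- risk_reduction * shortfall_cost   # value of the risk retired
    cost    <- delay_weeks * cost_of_delay       # cost of the added delay
    benefit - cost     # > 0: the test creates net value; <= 0: skip or rescope it

In this made-up case the proposed test costs more than the risk it retires, so it destroys value; with a shorter delay or a bigger risk retired, the sign would flip.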

Such economic models do not have to be complex, and they do not have to be highly accurate. They need to be just accurate enough that they do not lead to very bad decisions and they must be usable. This suggests, as Reinertsen has advocated, that new product development projects should develop economic models as early as possible and derive decision rules that project managers and lead engineers can use on a daily basis.

Developing Products Is About Learning

Product development is a learning activity, fundamentally no different than any class you’ve had in school.

You start off with a goal, and only a general idea of how to get there. As you learn more about the product you are developing you get closer to that goal. Sometimes you hit roadblocks that you cannot find any way to overcome. Once in a while, things go exactly the way you thought they would. Most of the time product development falls somewhere in between these extremes.

I think that we all recognize this. The path through a product development project is uncertain, making project planning difficult. How many times are we going to design/test/redesign? What testing will we be doing? We can guess, but we don’t know. The durations of even “standard” activities have a lot of variability, and may easily vary by a factor of ten. Telling the boss or a customer when you expect to complete a product development project is very difficult, because the schedule is necessarily uncertain.

Instead of being an exercise in executing a project plan, product development is a sequence of learning cycles; it’s about constantly climbing the learning curve.

Project planning for product development can take this into account. Instead of drawing up plans where we say “specify this, then design it, then build it” in very sequential and defined steps, we can instead focus on the unknowns and the learning curve. Our plans can be focused on the unknowns; on the things that we need to learn in order to successfully complete the development. This turns our development process into a series of learning cycles, with known activities worked in around the learning activities.

More importantly, it forces us to plan. At the very beginning of the development process, we will have to think about the things that we don’t know and think about ways to turn them into designed product. We will have to think ahead. Planning like this is priceless.

Like any good project plan, there will be a planning horizon. We will progressively add detail as we go. At the earliest stages, we will just set out blocks of “we need to figure this out.” The team that will be doing the work needs to be involved in this, and they need to estimate the duration and possible variability of such events. As we get closer, the team needs to plan in greater detail, until we get to well-defined “learning events” with two- to four-week durations. Each “learning event” needs to have the desired outputs specified and the activities for each member of the team clearly laid out. At the end of each learning event, the team needs to regroup, share what they’ve learned and plan the next event. The level of planning conducted and the duration of each event needs to be determined by the amount of risk that the organization is willing to accept.

Of course, not all learning events are going to come out with the desired results. That’s the nature of learning, and the cause of uncertainty and variation in product development. In fact, if you have an effective organization, about fifty percent of your learning cycles will have to be repeated. Unlike manufacturing, where repeating the same work—rework—is considered waste, this “rework” is value-added, because learning is a necessary part of the job. If your organization does not have mature processes for organizational learning, then you may have to repeat as much as ninety percent of the work. These extra repeats are not value-added; they are waste.

There are ways to reduce the cost associated with the learning cycles, and to reduce the number of cycles. TRIZ, for instance, can greatly shorten learning cycles and reduce the number of them by enabling engineers to learn from the hard-earned lessons of others. Processes that help the organization better understand the customer, such as QFD, add cycles at the beginning of the project. However, these cycles cost little and are entirely value-added, and they prevent or significantly reduce the number of costly, non-value-added cycles “fixing,” or reworking, a product design late in the development process.

When you have a role in product development—whether as engineer, customer interface, product testing or manufacturing—keep this in mind: in product development, the real value to your organization comes from learning. Design your organization and processes accordingly.

I will discuss TRIZ, QFD, organizational design for product development and planning for product development in more detail in the future. In the meantime, I suggest reading Joel Spolsky’s thoughts on obtaining reliable schedules in product development.

Innumeracy

A number of blogs that I follow are talking about a recent article in the Wall Street Journal, We’re Number One, Alas. The author argues that the U.S.’s corporate tax rate is too high, claiming that countries with somewhat lower corporate tax rates generate more revenue from those taxes as a fraction of GDP. He uses the graph below to make his point.

Corporate Taxes and Revenue, 2004

The Laffer Curve on this graph is claimed to show the relationship between revenue and tax rate. A Laffer curve is based on the hypothesis that a government generates zero revenue when the tax rate is at either zero percent or one hundred percent, and that the maximum revenue from taxes falls somewhere in between these two extremes. The author is claiming that the optimum is below the current U.S. rate, and illustrates this by placing the U.S. on the far side of a big cliff.

This is an egregious case of innumeracy. I told myself when I started blogging that I would steer clear of the blog echo chamber as much as possible, but it is not all that uncommon to see similar presentations in the corporate world. Some data points are plotted, and then some chartjunk is added to tell a story…a story that may not be supported by the data at all. There are a few things that managers and engineers can do to combat this. For instance, if there’s supposed to be a correlation between values, we can ask for the correlation coefficient.

In this case, the squared correlation coefficient (R², the coefficient of determination) is about 0.1. This is equivalent to saying that just ten percent of the variation in revenue from taxes is explained by the tax rate. If you were working on process improvement, you would not want to focus on a factor that only accounted for ten percent of the variation; you would be looking for a factor that explained greater than fifty percent.

Another approach would be to ask for a hypothesis test. The null hypothesis would be that there is no correlation; the alternative hypothesis is that there is some correlation. As a business manager, you want to select the level of risk that you’re willing to accept. This is an economic decision, as risk analysis usually is. For the sake of argument, we will accept a risk of five percent. This is our “alpha” value (α-value), which we’ll express as a fraction: 0.05. We now need to perform the appropriate hypothesis test and compare the resulting p-value against our α-value. If the p-value is lower than the α-value, we reject the null hypothesis and conclude that there is a correlation; if the p-value is greater than the α-value, the data give us no evidence of a correlation.
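In R, for instance, cor.test() performs exactly this test. A minimal sketch with hypothetical data (the WSJ’s country-level figures are not reproduced here):

    # Hypothetical data, only weakly related by construction
    set.seed(42)
    rate    <- runif(30, 0, 0.4)                  # corporate tax rate (fraction)
    revenue <- 0.02 * rate + rnorm(30, 0, 0.02)   # tax revenue as fraction of GDP

    alpha <- 0.05                  # the risk level chosen in advance
    ct    <- cor.test(rate, revenue)
    ct$estimate                    # sample correlation coefficient
    ct$p.value < alpha             # TRUE -> reject the "no correlation" null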

There are plenty of statistics packages out there that can perform these analyses for us. Some are easier to use than others; some are more powerful than others. We use Minitab at work, and I find it indispensable. I also drop into R occasionally. R is much more powerful and free, but it’s all command-line programming, so it also has a much steeper learning curve.

The p-value for this data is greater than 0.05, which means we have no evidence of a linear correlation between the revenue and the tax rate.

Linear, though, means you have a straight line, and the Laffer curve is not linear by definition. The data fails our first tests, but the assumptions in our tests may have driven us to a false failure.

Let’s go back and start by plotting just the data.

Corporate Taxes and Revenue, 2004, data only

The exaggerated Laffer curve in the original presentation is not evident in this data. Excluding the outlier where revenue is 0.1 of GDP (looking at the Wall Street Journal’s graph, we see that this is for Norway), the data is roughly linear: zero revenue at a zero percent tax rate, and slightly increasing revenue with increasing tax rate. There may be a slight rounding-off or flattening in the tax rate range 0.20 to 0.35.

Since we do not know what shape the Laffer curve should take—where the maximum should be—and we don’t have enough data to find it, we can use the Lowess algorithm to create a smoothed curve.

Corporate Taxes and Revenue, 2004, with Lowess curve

This confirms our observation that the relationship is essentially linear, with a possible rounding off above 0.20. I’ve added a rug plot to the axes, which gives a tick for every data point. This is useful because it helps us to focus on the distribution of the data, much as a separate histogram would.
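For readers who want to try this themselves, here is a minimal base-R sketch, reusing the hypothetical data from the cor.test() example above (again, not the WSJ’s actual figures):

    set.seed(42)
    rate    <- runif(30, 0, 0.4)                  # corporate tax rate (fraction)
    revenue <- 0.02 * rate + rnorm(30, 0, 0.02)   # tax revenue as fraction of GDP

    plot(rate, revenue, xlab = "Corporate tax rate",
         ylab = "Tax revenue (fraction of GDP)")
    lines(lowess(rate, revenue))   # locally weighted (lowess) smoothed curve
    rug(rate, side = 1)            # one tick per data point along the x axis
    rug(revenue, side = 2)         # ...and along the y axis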

Where does all this get us? It tells us that the author’s curve was most likely drawn in just to make his point and does not fit any data or data analysis. It also tells us that the author’s story had nothing to do with the data.

I have seen this many times in the corporate world. Graphs of neat lines, where all the data points have been averaged out and left out. Graphs where a preconceived curve is fitted to data without regard for how well (or poorly) the curve fits the data. Data may be removed completely and fitted curves smoothed until they “look right.”

Combating this is not hard. It just takes some thought and a few questions.

First, make sure you actually see the data, and not just some prettified lines. Graphs should contain data points, and real data usually is not terribly clean or smooth.

Second, when a curve is fit to the data, ask what the correlation is. This should be a number, and less than 0.5 (or 50%) means there is no useful correlation. The closer to 1 (or 100%), the better. Ask, too, what the basis of the fitted line is: is this just some high-order polynomial or spline that was selected because of the high correlation, or is there a solid physical—or theoretical—basis for the selected line? If there is no physical basis, straight lines are preferable to curves.

Third, ask for a numerical analysis. Graphs are powerful; they allow us to see all the data, to see the trends, and to determine what sorts of numerical analyses are possible. However, graphs are not a substitute for numerical, or statistical, analysis. The two complement each other. So ask for the hypothesis statement and the alternative hypothesis. Ask for the p-value and the α-value (which should have been selected before the experiment was conducted).

I realize that this is unfamiliar territory for a lot of managers, who have little mathematical background and often no training in statistics. It’s not hard to ask the questions, and it’s not hard to understand the answers—they’re pretty straightforward—and I don’t think that you need to have a deep understanding of the mathematics. Let the experts be the experts, but ask questions to make sure that you are covering the risk.

Lean Product Development Implementation Summit

A couple of weeks ago, I attended the Lean Product Development Implementation Summit, hosted by the Management Roundtable and chaired by Don Reinertsen, author of Developing Products in Half the Time and Managing the Design Factory. Don’s been a big advocate of understanding the theoretical underpinnings of Lean Manufacturing and applying them, sensibly, to new product development. His basic approach, if I can summarize in a single sentence, is to understand the development project’s economics, make sound decisions based on cost/value trade-offs and engage in practices that encourage the flow of work. Since this is Lean, the goal is to be approximately correct rather than wasting time trying to be perfect.

The Summit was delivered as a series of case studies, presented by the people actually implementing Lean in new product development (NPD). Presenters came from backgrounds ranging from software development to aircraft design.

It was extremely interesting, and generated a lot of ideas for me. Some of these ideas I have already implemented in my day job; others I’m working with on a longer time-frame, and some I’ve squirreled away for when I need them. For instance, Bernard North, V.P. of Global R, D & E at Kennametal, and David J. Anderson, Senior Director of Software Engineering at Corbis, both presented (independent) case studies showing how simple visual controls allowed them and their teams to control both planned work and work in progress, greatly reducing cycle times and accelerating output. In the case of Kennametal, work in progress in the test labs was significantly reduced, eliminating the need to prioritize incoming requests and actually decreasing the amount of work in each request. Within five years, mean lead times were less than twenty percent of previous levels and the percent of revenue from new products had doubled. At Corbis, a very similar visual system enabled them to reduce “hot” requests while greatly accelerating the rate of software releases, all by limiting the number of active tasks or projects (applying WIP constraints).

These are two very different environments, but both used essentially the same solution to their problems of cycle time and communication.

Boeing implemented Lean PD, Theory of Constraints and Six Sigma in their development process. If I read the numbers right, it looks like they have cut their development cycle time in half by working to achieve flow in product development and managing projects to the buffers. Other presenters showed how creating very short cycle times (reducing batch sizes) and pushing testing further upstream greatly accelerates product development and reduces design defects.

Another company explicitly re-engineered their new product development process (PDP) around the idea that product development is a learning activity. Their PDP starts with a team meeting in which the goals for the next learning cycle are laid out and the work is divvied up. The team then breaks out for about two weeks to complete the planned work, and completes the cycle with another team meeting, in which they integrate what they’ve learned and plan the next cycle. These planning steps are essentially design reviews, but very explicitly focused on learning and team processes.

I have now put up a whiteboard for warranty returns, in the warranty processing area. It is divided up according to the phases of our analysis process. Each return is now tracked visually with a sticky note. Queued work (returns that have not been touched) is visible, as are heavy loads on resources. We can now calculate the lead time for our entire process, as well as for major phases within our process. In just a week’s time, the flow of work has smoothed out noticeably, and our ability to communicate the status of each return–and problems such as resource overload–has changed from chaotic to dirt simple.

In the past week, we have realized that we also need to track dates, so that we know how old each request is. Our implementation is to simply write the date on the sticky notes each time we move them. This sort of change is o.k.; we implemented the system knowing it was not perfect, and that we would improve along the way. This is continuous improvement in action.

There are still clear areas for improvement. One of the big ones is that we still have a push process rather than a pull process. I have not figured out what to do about that, as I have no control over the rate of incoming returns and our customer generally pushes hard to see activity on each return.

Still, this is a simple and effective technique, and can be applied in all areas of R, D & E. One of my long-term projects is the transfer of this technique outside of my sphere of authority.

Presenter David Anderson has posted his own thoughts on the Summit.