Introduction to R for Excel Users

Download the PDF

As the saying goes, when all you have is a hammer, everything looks like a nail. Excel was designed to do simple financial analyses and to craft financial statements. Though its capabilities have been expanded over the years, it was never designed to perform the sort of data analysis that industry scientists, engineers and Six Sigma belts need to perform on a daily basis.

Most data analyses performed in Excel look more like simple financial spreadsheets rather than actual data analysis, and this quality of work translates into bad—or at least sub-optimal—business decisions. There are alternatives to Excel, and the free, open-source data analysis platform R is one of them.

Unfortunately, R has a steep learning curve. I’m offering, for free, a short primer on R [PDF] where I’ve sought to make that learning curve a little less painful for engineers and scientists who normally work in Excel.

Background

A couple of years ago, I was developing a short course to teach R to scientists and engineers in industry who normally used Excel. The goal was to help them transition to a more capable tool. My course design notes morphed into a handout, and when plans for the course fell through, that handout grew into a self-study guide, which I later adapted into this seventy-page, stand-alone introduction for Excel users.

Organization

The primer walks the reader through the basics of R, starting with a brief overview of capabilities, then diving into installation, basic operations, graphical analysis and basic statistics. I believe that a picture is worth a thousand words, so it’s light on text and heavy on examples and visuals.

The end of the book rounds out with a look at some of the most useful add-ons, the briefest of introductions to writing your own, custom functions in R, and a cross-reference of common Excel functions with their equivalents in R.

The text is broken up into chapters and fully indexed so that it can be used either as a walk-through tutorial or as a quick reference.

And You Thought Physics Was *YAWN*

Part of my day job involves monitoring the renewable energy market, and particularly keeping abreast of storage technologies. It’s like combining my hobby with my job.

A new wind turbine design was recently announced, the SeaTwirl. It’s an off-shore turbine design using vertical blades. The key technological advance here is that it includes a method to store energy, so that it can continue to produce electricity when the wind stops blowing. This ability to deliver a constant output is important, because, as you may have heard, wind energy is intermittent; you only get electricity when the wind blows, and only to the degree that it’s blowing. Demand, unfortunately, doesn’t follow wind’s intermittancy—nobody stops to check that the wind is blowing before they turn on their lights—and the utilities, transmission system operators and distribution system operators all have to supply electricity to meet demand.

The SeaTwirl stores energy by integrating an unusual wind turbine design with a pumped hydro system. It has the turbine blades on a large circular ring, which rotates parallel to the water’s surface kind of like a hula-hoop. This ring is hollow (more or less), and is filled up with water when the wind blows. When the wind stops blowing, the momentum of the water keeps the tube spinning, generating electricity, and the water is allowed to drain back out, spinning a hydro turbine to generate electricity.

The nice thing about this is that the storage can always be “recharged” and it’s “free.” Or at least it seems to be. If you’ve ever held a bucket while spinning around, you know that spinning with an empty bucket takes a lot less effort than spinning around with a full bucket. In part, this is because of a property known as the moment of inertia. The heavier or larger a spinning object gets, the more it resists changes to its rate of rotation (or rpm).

If the SeaTwirl is filling up this horizontal “hula-hoop” with water, then the weight of the tube is increasing and so is the moment of inertia. As the moment of inertia increases, the energy needed to reach a given rpm increases. Wind turbines normally generate electricity in proportion to the wind speed, because the rpm of the blades is proportional to the wind speed. Increase the moment of inertia and you decrease the rpm, which means you generate less electricity for a given wind speed.

Now comes the physics. For something shaped like a hula-hoop, the moment of inertia, I, is calculated from the mass, M, and the radius of the hoop, R, according to:

I = M R^{2}

The energy, E, in a spinning object is equal to the moment of inertia times the speed of rotation, ω, according to

E=\frac{1}{2} I \omega ^{2}

If we know the energy (because we know the wind speed), then we can calculate the speed of rotation, ω, by rearranging that equation to get ω on the left-hand side:

\omega = \sqrt{\frac{2E}{I}}

We can then replace I with mass and radius from the first equation to get

\omega = \sqrt{\frac{2E}{MR^{2}}}

So we can see that, if we don’t change the energy E (or don’t change the wind speed), and don’t change the radius R of the spinning hoop, then increasing the mass M results in a slower rate of rotation.

From SeaTwirl’s website and press releases, we can estimate how big the SeaTwirl is, which will let us estimate how much slower a full SeaTwirl will spin than an empty one, and therefore how much less electricity must be generated. We can calculate this by taking the ratio of ω full to ω empty, so that the parts that we don’t have to know E and R.

\frac{\omega_{full}}{\omega_{empty}} = \frac{\sqrt{\frac{2E}{M_{full}R^{2}}}}{\sqrt{\frac{2E}{M_{empty}R^{2}}}} = \sqrt{\frac{\frac{2E}{R^{2}}}{\frac{2E}{R^{2}}}}\sqrt{\frac{M_{empty}}{M_{full}}}=\sqrt{\frac{M_{empty}}{M_{full}}}

For the SeaTwirl, we now have to find out what R is, and estimate M for both filled and empty

The whole turbine assembly is made of composite materials, which probably have a density, \rho_{c}, of around 2500 kilograms per cubic meter (similar to fiberglass). Water has a density, \rho_{w}, of near 1000 kilograms per cubit meter (depending on temperature). The diameter of the turbine will be near 180 meters, so the radius, R, of our “hula-hoop” is half that, or 90 meters. From the pictures, it looks like the thickness of that hula-hoop is a few percent of the total diameter of the turbine, so we can figure an outside diameter of the “hula-hoop” of about 2 meters, for a radius, r, of 1 meter. Figure that at least ten percent of this is composite, and the rest is the hollow, water-filled portion.

To estimate the weight of the water in the “hula-hoop,” we can approximate the water as being a cylinder of radius r_{w} = 0.9r and length equal to the circumference of the “hula-hoop,” l = 2\pi R. The volume of such a cylinder is equal to the cross-sectional area of the water column, A_{w}=\pi r_{w}^{2} times the length of the column, l. The total mass of the water, m_{w} is the density times this volume.

m_{w} = \rho_{w}\pi r_{w}^{2}2\pi R = \rho_{w}3\pi (0.9r)^{2} R

Plugging in our estimates for the above values gives us

m_{w} = 1000 \cdot 3\pi (0.9)^{2} 90 = 687000 kg

That’s a lot of water.

Now for the empty “hula-hoop.” We can treat it in the same way: a cylinder of material of radius r, length l = 2\pi R. However, we don’t want to calculate for a solid cylinder of composite; we have to subtract out the hollow part with radius r_{w}. So the mass of the composite is

m_{c} = \rho_{c} 2 \pi R ( \pi r^{2} - \pi r^{2}_{w} )

m_{c} = 2500 \cdot 3 \pi 90 (1^{2} - 0.9^{2} ) = 403000 kg

So the water more than doubles the weight of the hoop.

From the picture, you can see that there’s another hoop at the top, and the two hoops are connected by the turbines, which combined are probably worth at least another hoop in weight, so we can further assume that the mass of this bottom hoop, empty, is roughly one-third of the total mass of the movable parts of turbine.

The mass of the turbine, empty, is therefore about 1200000 kg, or 1200 tons. Filled with water, this goes up to about 1900000 kg, or 1900 tons. Empty, that’s maybe twice the largest off-shore turbine currently in existence, but this thing is easily twice as big as any current turbine, so our estimate appears to be in the right neighborhood.

Now we go back to our equation for the ratio of the rotational velocities, ω, and plug in these weights:

\frac{\omega_{full}}{\omega_{empty}} = \sqrt{\frac{M_{empty}}{M_{full}}} =\sqrt{\frac{1200}{1900}} = 0.8

So we get about 80% as much electricity from a water-filled turbine as from an empty one, when the wind blows. This is a direct efficiency loss due to the storage of energy in the spinning-water-hoop.

In addition, there’s the efficiency losses in loading water into the hoop, or “charging” the hoop, and the efficiency losses of “discharging” the hoop, running the water back out through a hydro turbine. Pumped hydro is usually about 72% efficient, or less, in each direction, so the total round-trip efficiency of storage + discharge is about 50% efficient. There are a lot of other storage technologies that do at least this good, if not better.

These two figures, the 80% efficiency loss of just operating the turbine and the 50% storage-discharge efficiency, can be used to directly compare SeaTwist with other wind turbine + storage technology solutions. Any storage technology that has at least a 50% round-trip efficiency and increases the total system cost by less than 20% over the system’s operating lifetime will outperform SeaTwist in terms of return on investment.

Successful Labs

I’ve worked in and managed a few R&D and test labs in my career. Lately, I’ve been thinking about success factors for those labs. At a high level, I believe that a successful Lab has four key attributes: good data; the right data; the ability to communicate the data clearly and effectively; and a relentless pursuit of perfection.

Good Data

Good data is accurate, has a known precision, and is reproducible by your lab and by third parties. This requires good calibration practices, careful evaluation of measurement uncertainty and documented test methods. In short, you need:

  • good calibration procedures and a calibration schedule that keeps equipment in calibration;
  • good procedures for measuring and documenting the measurement uncertainty and sources of error;
  • good procedures for how to set up tests, collect data and then utilize the information from calibration and measurement system analysis in your data analysis.

Yes, this all boils down to standard work. I’ve said it before, and I’ll say it again. Despite the common opinion that standard work is an impediment creative R&D-type work, I’ve found just the opposite is true.

This is harder to accomplish than it sounds, but there’s plenty of resources available to help. There’s even an international standard that a lab can be accredited against: ISO 17025.

The Right Data

Ensuring that you’re collecting the right data makes collecting good data look easy. Getting the right data means performing a test that provides useful information. There are two components to this: testing the expected conditions; and testing the boundary conditions.

If you’re doing your own testing, then testing the expected conditions is easy. You know what you’re thinking and what you expect, and you go test it. If I want to test if ice freezes at zero degrees Celsius, then I test it. However, if the testing is outsourced, then things get complicated. Suppose I live in Denver, Colorado, and I want to test if ice freezes at zero degrees Celsius in the winter. To test it myself, I might stick a thermometer in a glass of water, put it outside and wait. Suppose, though, that the testing is outsourced to a lab in a place like Bangladesh, India, that’s hotter, lower in altitude and more humid. They can provide an answer, but will they address my intent? As the test requester, I may not ask the right question; as the test group, they may leap to test without fully understanding why I’m asking. This sort of confusion actually happens quite frequently, even when the test group and the requester are in the same building. It can be months before people realize that their question was only partially answered.

Paradoxically, testing the boundary conditions is difficult when you’re doing your own testing, while the communication errors described above make it easier for an outside lab to test the boundary conditions. It’s almost inevitable that they’ll test some boundary conditions. The reason for this is that boundary conditions are defined by one’s assumptions, and people are generally pretty poor at identifying and thinking through their own assumptions. Mentally, we tear right through the assumptions to the interesting bits. The outside lab, though, isn’t going to be quite testing the expected conditions; they’ll always be nearer at least one set of boundary conditions.

One common solution used by labs is to develop a detailed questionnaire to try to force their customers to detail their request in the lab’s terms. I’ve done this myself. It doesn’t work. A questionnaire, like a checklist, can help capture the things you know you’d otherwise overlook, but neither a questionnaire nor a checklist can bridge a communication gap.

The solution that works is to send the lab personnel out into the gemba; to go and see the customer’s world and understand what they’re doing and why they’re making their request. This is difficult. The lab may be geographically distant from the gemba. People working in a lab often got there by being independent thinkers and workers, and not by being very gregarious. Labs are also paid or evaluated for the testing they do; not for the customer visits they make. A lab manager needs to be able to overcome these obstacles.

Communication

Once the lab has the right, good data, there’s still one big challenge left. All that data goes to waste if it isn’t communicated effectively. This means understanding that the data tells a story, knowing what that story is, and then telling that story honestly and with clarity. One could write several books on this subject. I direct your attention to the exceptional works of Edward Tufte, especially The Visual Display of Quantitative Information, an excellent NASA report by S. Katzoff titled Clarity in Technical Reporting, and the deep works of William S. Cleveland, including The Elements of Graphing Data and Visualizing Data. If you don’t have these in your library, then you’re probably not communicating as clearly and effectively as you should.

Effective communication is critical even when you’re doing your own testing. It provides a record of your work so that others can follow in your footsteps. Without effective communication, whatever you learn stays with you, and you lose the ability to leverage the ideas and experience of others.

Pursuing Perfection

While we have to get the job done today, we probably haven’t delivered everything your customer needed. There’s always opportunity for improvement. A good lab recognizes this, constantly engages in self reflection and finds ways to improve. This is a people-based activity. A good manager enables this critical self-reflection and supports and encourages the needed changes.

Such critical self-reflection is not always easy. In many business environments, an admission of imperfection is an invitation to be attacked, demoted, or fired. Encouraging the self-reflection that is crucial to improvement requires building trust with your employees; providing a safe environment. Employees have to be comfortable talking about their professional faults, having others talk about those faults, and they have to believe that they can improve.

Being genuine and honest can help a manager move their group in this direction. Ensuring that there are no negative consequences to the pursuit of perfection will also help. Unfortunately, it’s not entirely up to the manager; it’s a question of politics, policies and corporate culture. Effective managers and leaders need to navigate these waters for the good of their team. They’ll do this better with the support and involvement of their team.