What is *Really* the Cheapest Source of Energy?


Spoilers: It’s not elephant power.

A friend of mine in college recently posted the following open question:

There has been a lot of brouhaha recently about “there are X times the number of jobs in ‘green energy’ than in coal.” Relating to the number of jobs, can someone tell me how many megawatts/gigawatts are produced on a per capita basis in each of these energy “sectors?”

My initial thought was that this was really a loaded question: a fallacious argument that 'green energy' isn't as efficient, when in reality the cost to build out new capacity (often referred to as Capital Expenditure, or CAPEX) far exceeds the operating expenditures (OPEX) required to maintain power generation from existing plants. Of course coal plants are less labor and cost intensive; they already exist! I also knew from my previous research on this subject that the overall costs of solar plants were approaching parity with traditional coal plants. My initial instinct was to pour cold water on the whole argument[i].

However, before I hit send, I thought about the question a bit more. I recalled what I knew about oil refining in the US, where building new capacity usually costs much more than simply buying or expanding existing facilities, which is why no new refineries have been built in decades. Maybe my friend was right, and reports of cost parity for green energy sources relied on a fallacy of their own by comparing against the wrong alternative: not new coal plants, but maintenance and expansion of existing units.

I left a comment saying this might be something that would be interesting to look into. I assume Facebook’s algorithm took note of my reply because it then put a comment made by another of my friends five days earlier on the top of my feed:

It’s amazing to me that the solar industry flaunts its terrible productivity as a selling point. “We produce the least terawatt-hours using the most workers!” That’s not a benefit!

The comment came with a link to a Fast Company article trumpeting that solar energy in the US now provides twice as many jobs as coal.

I do agree with both of my friends that claiming certain investments create jobs is dubious. I would rather focus on the cost of each alternative in dollar terms, to make sure these investments are economically sustainable (or at least approaching sustainability) so that any jobs they create do not suddenly vanish once political winds change or subsidies expire. For that reason, the exercise I took on here is understanding how close green energy is to competing with legacy energy sources economically. Note that when I say energy in the scope of this post, I am referring to electric power generation, not fuels directly used for personal transport.

The Data

The first data point I wanted to explore was something that was said in the aforementioned Fast Company article, which stated:

While 40 coal plants were retired in the U.S. in 2016, and no new coal plants were built, the solar industry broke records for new installations, with 14,000 megawatts of new installed power.

If my hypothesis was that existing coal capacity would be more competitive than newly-built capacity, the fact that 40 existing coal plants were shut down with no new ones built would seem to indicate that this is not true. However, following the oil refinery model where refineries have been closing for decades without replacements being built, this could also just mean the lost capacity was being offset by increases in production in other units. Following the article’s source for the statement led me back to the US Energy Information Administration (EIA), a reputable government source which I have used multiple times for other pieces on this blog. The EIA provides tons of data as well as projections, both of which can be used to infer how different technologies will compete for utilization now and in the future.

The EIA provides monthly spreadsheets tracking almost all US power generators, and each appears to contain details for every individual US power generation plant with a capacity over 1 MW[ii], as well as planned plant retirement dates. There are 20,870 plants listed in the “operating” tab of the nearly 7-megabyte Excel spreadsheet, with 1,183,011 MW of total listed nameplate capacity.[iii]

By filtering the data in the most recent spreadsheet, from March 2017, I can see that plants representing 26,614 MW of capacity have planned retirement dates between March 2017 and December 2021. Only 215.5 MW of this retired capacity represents 'green energy', and of that, 207.6 MW is the planned retirement of part of the capacity of the Wanapum hydroelectric plant in Washington State, in operation since 1963 (although a quick Google search suggests that capacity is actually going to be replaced and expanded). In terms of capacity, most of the retirements affect coal (12,163 MW), natural gas (9,320 MW), and petroleum liquids (778 MW) facilities.
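
For anyone who wants to repeat the filtering, here is a rough sketch of what I did, translated from Excel to Python/pandas. The file name and column headers here are from memory and may not match the actual EIA-860M workbook exactly, so treat them as placeholders:

```python
# Sketch of the retirement filtering described above. Column names such as
# "Nameplate Capacity (MW)", "Technology", and "Planned Retirement Year"
# are assumptions -- check the headers in the actual EIA workbook before use.
import pandas as pd

ops = pd.read_excel("march_generator2017.xlsx",   # hypothetical file name
                    sheet_name="Operating",
                    skiprows=1)  # EIA files put a title row above the headers

retiring = ops[ops["Planned Retirement Year"].between(2017, 2021)]
print(retiring["Nameplate Capacity (MW)"].sum())    # ~26,614 MW

# Capacity retired by technology: coal, natural gas, petroleum liquids, ...
print(retiring.groupby("Technology")["Nameplate Capacity (MW)"]
              .sum().sort_values(ascending=False))
```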

While the capacity being retired between now and the end of 2021 is small relative to the total, replacing it with renewable resources would account for over two-thirds of the EIA's current projected green energy growth between 2017 (207,200 MW) and 2021 (244,630 MW)[iv]. The “Planned” plant tab of the EIA generator spreadsheet backs this up: the 113,698 MW of capacity planned to start up between now and 2027 is much more heavily tilted towards green (or at least “carbon-free”) energy sources than the existing energy mix. These plants include wind (22,570 MW, 19.9%), solar (9,949 MW, 8.8%), nuclear (5,000 MW, 4.4%), and hydroelectric and geothermal combining for an additional 997 MW (0.9%).

This is good and bad news for the future of green energy. On one hand, it would appear to support my position that renewables have significant room to grow and compete with other sources anywhere new infrastructure is required. The downside is that following the EIA projections through 2050 only gives a total green energy capacity of 433,490 MW, still only about a third of even today's 1,183,011 MW electrical generation asset mix.

The Alternative

You might have noticed that the “carbon-free” energy sources only represent about a third of the total planned capacity to be added, yet there are no new coal plants replacing the ones being shut down. Instead, almost two-thirds of the planned new plants utilize natural gas (72,659 MW, 63.9%).

If green energy has a real threat going forward, natural gas is a…uh, erm…natural choice[v]. In terms of price per unit energy, natural gas has cost only a fraction of what oil does, even when oil prices crashed[vi]. The current price of natural gas is approximately $3/million BTU, and a barrel of oil contains about 5.8 million BTU, making natural gas cost approximately $17.40 per barrel of oil equivalent (BOE).
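
The conversion is simple enough to sanity-check in a couple of lines (this is just a sketch of the arithmetic above, nothing more):

```python
# Convert a natural gas price in $/million BTU to barrel-of-oil-equivalent terms.
MMBTU_PER_BOE = 5.8   # a barrel of oil contains ~5.8 million BTU

def gas_price_per_boe(dollars_per_mmbtu):
    return dollars_per_mmbtu * MMBTU_PER_BOE

print(gas_price_per_boe(3.0))   # ~$17.40/BOE at today's ~$3/million BTU
```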

From an environmental perspective, natural gas is cleaner than coal, producing about half the CO2 per unit of energy. This is primarily because natural gas has a higher hydrogen-to-carbon ratio, which also shows up as a higher heating value per unit weight of fuel (coal heating values generally run between 7,000 and 14,000 BTU/lb, while natural gas is up to 21,500 BTU/lb). In fact, the US's aggressive shift to natural gas-based electricity generation was cited by many as a reason the Paris climate accord was unfair to the US, as we had already reduced our carbon emissions quite a bit because of technological advances that allowed the US to replace chunks of coal power with natural gas:

As I indicated in my comments yesterday, and the president emphasized in his speech, this — this administration and the country as a whole — we have taken significant steps to reduce our CO2 footprint to levels of the pre-1990s.

What you won’t hear — how did we achieve that? Largely because of technology, hydraulic fracturing and horizontal drilling, that has allowed a conversion to natural gas and the generation of electricity. You won’t hear that from the environmental left. –EPA Head Scott Pruitt, June 2nd, 2017

I don’t want to wander back into climate change again (although that is and will continue to be a recurring theme here), but I do bring this up because when judging renewable energy on its cost merits, I believe too much emphasis has been placed on coal and not enough on natural gas.
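
Since the coal-versus-gas CO2 comparison does a lot of work in this argument, here is a quick sanity check on the "about half" figure using approximate EIA emission coefficients (exact numbers vary by coal rank, so treat these as ballpark values rather than gospel):

```python
# Approximate CO2 emitted per million BTU of heat released (EIA ballpark figures).
CO2_LB_PER_MMBTU = {
    "natural gas": 117.0,
    "bituminous coal": 205.7,   # subbituminous and lignite run even higher
}

ratio = CO2_LB_PER_MMBTU["natural gas"] / CO2_LB_PER_MMBTU["bituminous coal"]
print(f"Natural gas emits {ratio:.0%} of coal's CO2 per unit energy")  # ~57%
```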

And the Winner Is?

So, if you think the EIA has been pretty informative about this whole topic, you're right. In fact, they basically have the answer to the cost question already spelled out, but that wouldn't have made for a great discussion. Here are the EIA's CAPEX and OPEX numbers for natural gas compared to the most cost-effective wind and solar options (for those of you who did not bother to click the link above to see the raw data, EIA notes these are unsubsidized costs in 2016 dollars). The CAPEX and OPEX rows are pulled straight from EIA, while the hypothetical CAPEX and OPEX rows were calculated in Excel for a theoretical 100 MW plant running year-round.

First up, fired combined cycle natural gas:

| Natural Gas (Fired Combined Cycle) | Conventional | Advanced | Advanced with Carbon Capture and Sequestration |
|---|---|---|---|
| CAPEX ($/kW) | 969 | 1,013 | 2,153 |
| OPEX (Fixed, $/kW-yr) | 10.93 | 9.94 | 33.21 |
| OPEX (Variable, $/MW-hr) | 3.48 | 1.99 | 7.08 |
| Hypothetical CAPEX, 100 MW Plant | $96,900,000 | $101,300,000 | $215,300,000 |
| Hypothetical Annual OPEX, 100 MW Plant | $4,141,480 | $2,737,240 | $9,523,080 |

Next, the fired natural gas combustion turbine (this is what we use on the FPSO, or Floating Production, Storage, and Offloading vessel, where I work)[vii]:

| Natural Gas (Fired Combustion Turbine) | Conventional | Advanced |
|---|---|---|
| CAPEX ($/kW) | 1,092 | 672 |
| OPEX (Fixed, $/kW-yr) | 17.39 | 6.76 |
| OPEX (Variable, $/MW-hr) | 3.48 | 10.63 |
| Hypothetical CAPEX, 100 MW Plant | $109,200,000 | $67,200,000 |
| Hypothetical Annual OPEX, 100 MW Plant | $4,787,480 | $9,987,880 |

Finally, Wind and Solar:

| Wind, Solar | Solar (Photovoltaic) | Wind (Onshore) |
|---|---|---|
| CAPEX ($/kW) | 2,277 | 1,686 |
| OPEX (Fixed, $/kW-yr) | 21.66 | 46.71 |
| OPEX (Variable, $/MW-hr) | 0 | 0 |
| Hypothetical CAPEX, 100 MW Plant | $227,700,000 | $168,600,000 |
| Hypothetical Annual OPEX, 100 MW Plant | $2,166,000 | $4,671,000 |
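
If you'd like to reproduce the hypothetical rows without Excel, the arithmetic looks like this. Note that the variable OPEX term implicitly assumes the plant runs all 8,760 hours of the year:

```python
# Reproduce the "hypothetical 100 MW plant" rows from the tables above.
HOURS_PER_YEAR = 365 * 24   # 8,760 -- assumes round-the-clock operation
PLANT_MW = 100

def plant_costs(capex_per_kw, fixed_per_kw_yr, variable_per_mwh):
    kw = PLANT_MW * 1000
    capex = capex_per_kw * kw
    annual_opex = (fixed_per_kw_yr * kw
                   + variable_per_mwh * PLANT_MW * HOURS_PER_YEAR)
    return capex, annual_opex

for name, specs in {
    "Gas CC (conventional)": (969, 10.93, 3.48),
    "Gas CC (advanced)":     (1013, 9.94, 1.99),
    "Gas CC (adv. w/ CCS)":  (2153, 33.21, 7.08),
    "Solar (photovoltaic)":  (2277, 21.66, 0),
    "Wind (onshore)":        (1686, 46.71, 0),
}.items():
    capex, opex = plant_costs(*specs)
    print(f"{name}: CAPEX ${capex:,.0f}, annual OPEX ${opex:,.0f}")
```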

Based on the EIA numbers, the cheapest option by far is an advanced or conventional natural gas plant. However, once you include carbon capture and sequestration (CCS) costs, wind and solar come out on top (solar only marginally so, with its much lower OPEX making up for a CAPEX slightly higher than the gas-with-CCS option's).

The need for CCS creates a significant cost differential that shifts the equation. However, given that natural gas without CCS still significantly cuts overall CO2 emissions when it replaces coal facilities, there remains an environmental argument for deploying it as a “stepping stone” to environmentally friendlier power generation.

For those looking for a talking point against natural gas from a long-term environmental viewpoint: cheap natural gas could stall efforts to install permanent green energy solutions, and by some estimations could eventually leave us in a worse position than we are in currently. Also, cutting our carbon footprint in half is great, but if a country like India increased its per capita electricity usage to even a quarter of the US's using natural gas, the net impact would be an increase in CO2 emissions[viii]. It may seem ironic that our great achievement in cutting CO2 emissions through the installation of natural gas fired generation would result in absolutely massive overall increases if replicated throughout the world, but that is a natural consequence of living in a fully developed country with energy demands an order of magnitude higher than those of the developing world.

In any event, the EIA projects that on our current path, even the application of both of these technologies will not result in a dramatic shift in CO2 emissions by 2050, with or without adherence to the Obama Administration's Clean Power Plan (CPP). The US per-capita carbon footprint is projected to fall from its current 16.3 metric tons/year to either 12.7 (a 22.1% reduction) with the CPP or 14.0 (a 14.1% reduction) without it.

I won’t be researching the projected external costs and consequences of climate change in this article, but I can state with confidence that investment in carbon-neutral energy will have to accelerate at a much faster pace if we plan to effectively mitigate them. Of course, as is the case with emerging technologies, I'm not sure what green energy might look like in 5 or 10 years. I used to think that green energy was a great idea but an investment with costs an order of magnitude higher than conventional fuels. That is no longer the case, and if there are even a few marginal breakthroughs left to be found in the field, the situation could easily be flipped on its head, with carbon on the losing end. I don't know how, if, or when this might happen, but I may take that on as a separate entry later.

How Does This Money Support Jobs?

Going back to the original question in this post: CAPEX money is generally a one-time cost for construction of a plant. As noted, construction is very labor-intensive, which is why the solar and wind energy companies can boast about how many jobs they create compared to coal. Variable OPEX costs generally refer to fuel, which is why these are zero for wind and solar; money paid for natural gas, however, directly supports jobs in the natural gas industry. Fixed OPEX costs are more likely to include maintenance and repairs, which also require skilled labor.

For me, while I concur with my second friend that from an economic standpoint it's generally better to focus first on the per-capita value of the jobs you create rather than their quantity, I don't necessarily agree with the idea that the only thing we get in this case is energy. Without an honest discussion about how to quantify the costs of the externalities associated with CO2 emissions, we can't really pass judgement[ix].

Thanks for making it to the end. I didn’t make it easy this time, only one safari picture/sight gag (I may add some later if anyone has any ideas). As always, let me know what you think, especially if you think I screwed something up.


[i] It’s amazing how often I, and everyone else, mistake gasoline for ‘cold water’ when trying to end an internet argument. I’ll talk about that more in another post.

[ii] Although the notes on the file claim “Capacity from facilities with a total generator nameplate capacity less than 1 MW are excluded from this report.  This exclusion may represent a signifciant portion of capacity for some technologies such as solar photovoltaic generation,” there are some plants with a capacity of <1 MW in the list. Also, the word “significant” is misspelled in the report and this seems a suitable forum to issue a public service reminder that Excel doesn’t spell check cell text by default.

[iii] As mentioned in a previous blog post, US electricity consumption is 3,913,000,000,000 kW-hr/year, which converts to 446,689 MW on a continuous basis. It seems important to note that plant nameplate capacity is generally the highest designed output (whatever the highest anticipated peak usage for the facility is), which will be much higher than the average use.

[iv] From a separate projection provided by the EIA. You can find their energy projections here: https://www.eia.gov/analysis/projection-data.cfm#annualproj


[v] I can’t tell you how much I hate myself for this joke. Oddly, I can’t seem to make myself remove it.

[vi] US natural gas currently costs about $3/million BTU and has ranged between 1 and 11 dollars per thousand cubic feet for decades (https://www.eia.gov/totalenergy/data/browser/?tbl=T09.10#/?f=M), and about 5,800 cubic feet equal a barrel of oil equivalent (BOE). This puts the range of prices at approximately $5.80 to $63.80 per BOE, well below the price of a barrel of oil over most of that period.

[vii] EIA did not include estimated costs for applying Carbon Capture and Sequestration to gas turbines. I'm not certain whether this is due to a lack of data or a technical limitation that prevents CCS from being applied to gas turbines (I can't think of one, but if anyone knows, please let me know).

[viii] From the IEA (different than the EIA), US per-capita electricity consumption is 13 MWh/capita compared to 0.8 for India. India also has 3 times the population of the US. Therefore, if the US cuts the carbon footprint of its electricity generation by a factor of two through the use of natural gas, India could wipe out those gains by installing the exact same “more environmentally friendly” plants in order to lift its per capita consumption to 3 MWh/capita, less than a quarter of US per capita demand. This is why I find it dishonest to claim that countries like India aren't doing their fair share in multi-national climate accords that show their total emissions rising while countries like the US decrease.

[ix] I’ll get back to this in a later post. I’ve already written too much.

How Many Solar Panels Can You Make For the Cost of The Border Wall and Would They Even Fit?

This was originally something I looked at over the course of a few minutes and wanted to post on Twitter, but then realized there was no way I was going to make it fit. So here it is on the depository I created for the long-form version of my most boring-est thoughts.

A story broke earlier today that President Trump suggested adding solar panels to the border wall to help it pay for itself. In light of this, and earlier accounting suggesting the price of the border wall as originally envisioned could be as high as US$66.9 billion, I wondered how much solar generating capacity we could currently build for that amount of money. As it turns out (assuming my math is correct), if we used that sum to build utility-scale single-axis-tracking solar systems, we could potentially build enough solar capacity to generate about 10% of the total electricity consumption of the US.

My 30-second Excel spreadsheet, complete with a Wikipedia source, is below.

| Value | Unit | Source |
|---|---|---|
| 1.49 | $/Watt solar energy | http://www.nrel.gov/docs/fy16osti/66532.pdf |
| 66,900,000,000 | Dollars (66.9 billion) | http://time.com/4745350/donald-trump-border-wall-cost-billions/ |
| 44,899,328,859 | Total Watts | 66.9 billion / $1.49 per Watt |
| 393 billion | kWh/yr | (Total Watts/1000)*365*24 |
| 3,913 billion | US Electricity Consumption (kWh/yr) | https://en.wikipedia.org/wiki/List_of_countries_by_electricity_consumption |
| 10.1% | % of Total US Electricity Consumption | |

In fairness to myself, that Wikipedia article did do a good job of at least listing its source, the CIA World Factbook.
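
Here is the same spreadsheet redone as a script. One hedge worth adding: the kWh/yr line treats the panels as producing at full nameplate rating around the clock. A realistic capacity factor for single-axis tracking (roughly 25%, my own assumption rather than a number from the sources above) shrinks the result proportionally:

```python
# The border-wall solar spreadsheet, in script form.
WALL_BUDGET_USD = 66.9e9
COST_PER_WATT = 1.49            # NREL utility-scale single-axis tracking
US_CONSUMPTION_KWH = 3.913e12   # kWh/yr

watts = WALL_BUDGET_USD / COST_PER_WATT    # ~44.9 billion W
kwh_per_year = watts / 1000 * 365 * 24     # ~393 billion kWh/yr at 100% output
print(f"{kwh_per_year / US_CONSUMPTION_KWH:.1%}")         # ~10.1%
print(f"{0.25 * kwh_per_year / US_CONSUMPTION_KWH:.1%}")  # ~2.5% at a 25% capacity factor
```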

Now, would the wall provide sufficient surface area for all that wattage? Google tells me that solar panels generate 10-13 Watts per square foot. Based on my total wattage number above, 44,899,328,859, this means we would need about 4,489,932,885.9 square feet of area using the conservative 10 Watt/square foot number (see what I did there?). My favorite source also tells me that the US-Mexico border is 1,954 miles (10,317,120 ft). That would make the border wall need to be over 400 feet wide to accommodate the proposed wattage at 10 Watts per square foot.
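
The width math in script form:

```python
# How wide would the wall have to be to hold all those panels?
total_watts = 44_899_328_859
watts_per_sqft = 10                       # conservative end of the 10-13 range
border_feet = 1954 * 5280                 # 10,317,120 ft

area_sqft = total_watts / watts_per_sqft  # ~4.49 billion square feet
print(f"{area_sqft / border_feet:,.0f} feet wide")   # ~435 ft
```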


This doesn’t really belong here, but here’s a picture I took of an ostrich to break up all the boring text.

Yeah, so none of this seems likely, for a bunch of reasons. On top of the behemoth panel width, I'm guessing the proposed panels would be fixed rather than tracking, which brings down the output, and the installation costs would be massive and likely in addition to the actual construction costs of the wall rather than displacing any of them. As for how you would tie 1,954 miles of 400-foot-wide solar panels into a distribution grid, and how much you would lose in transmission along the way, I'm just going to say that's outside my scope. Not even I like math that much.

Feel free to check my work and let me know what I messed up if you find anything.

P.S.-After posting, I realized the formatting of the Excel table is terrible. I’m not fixing it, I just wanted to make sure you knew that I knew that it is terrible, that’s all.


Ghost Ride The (Pencil) Whip

During my undergraduate study at The University of Tulsa, I was required to take Organic Chemistry and lab as part of the core Chemical Engineering requirements. Although I enjoyed the subject, the lab portion of the class was boring and tedious, and my work tended to be sloppy and rushed.

One thing we were forced to do periodically was find the melting point temperature of whatever solid substance we had precipitated out of solution that day. I remember this task distinctly because it was always the last thing we had to do before the lab was finished. The first time we had to do this I rushed the test, turning up my Bunsen burner as high as I could to get out of the lab and on to something more interesting. My melting point was way lower than the number the literature suggested it should have been for the pure component, so I had to perform the test again to make sure there wasn't some contaminant in my precipitate that lowered the result. What I discovered was that my mercury thermometer's indication couldn't quite keep up with the actual temperature of the material when heated so quickly, and there was still some lag in the reading even when I heated the precipitate at a “normal” rate.

Following this discovery, I began performing this test more slowly to make sure my numbers were correct… Just kidding, I started reporting melting points a little bit lower than what the textbook said they should be and called it a day. I was a freshman in college and had more interesting things to do than spend an extra 30 seconds watching mercury rise. I guess the moral of that story, other than “Cory was a terrible O-Chem lab student,” is that if you tell someone to perform a test where all they need to document is a single, relatively predictable number, you can expect that number might be made up.

It seems strange now that my terrible lab work would teach me some of my best lessons. If I was going to make up or massage lab numbers, I was going to give good numbers, or at least believable ones. I figured out that different types of analysis have different biases, different ways they naturally skew data. I knew straightaway that I couldn't make my numbers too perfect, but they also couldn't be messed up in a way the test method would never produce. In the real world, this knowledge is far more valuable than most of the things you officially 'learn' in lab because, as it turns out, testing results get made up constantly.

Back in 2014 I was just starting my stint as an Offshore Process Engineer on an FPSO in Brazil. One of my first orders of business was chasing down a field test that consistently disagreed with results from an onshore lab, and I decided to make my own standard to verify that the device we were using was accurate.

The results of the field test were in parts per million (ppm), and the calibrated range of the device should have gone up to at least 100 ppm. However, when I tried to make my own 100 ppm standard, the device read it as 350 ppm. I made two more 100 ppm standards and tested them across all of the available field testing devices, getting readings ranging from 340 to 380 ppm. I may not have been a great chemistry lab student, but I'm pretty sure I can pipette a standard volume of fluid without being off by several hundred percent. At least, I had never had pipetting problems in the past, but now I wasn't so sure that maybe I had skipped the class where they explained how pipettes will secretly suck up some random quantity of liquid unless you know the magic word[i].

OK, enough pipette talk. Creating a standard wasn’t the first troubleshooting method attempted, as a colleague of mine who was convinced the issue was with the onshore lab had left on my desk four signed and dated calibration certificates from a licensed third party contractor that showed that all four devices I had tested were nearly perfect. Each certificate showed how the device read the standards created by the contractor, who presumably hadn’t been absent from the aforementioned pipette awareness day in college:

| Prepared Standard (ppm) | Device 1 Reading | Device 2 Reading | Device 3 Reading | Device 4 Reading |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0.1 |
| 10 | 9.7 | 10.2 | 10 | 10.4 |
| 50 | 49.4 | 50.3 | 50.5 | 50.7 |
| 100 | 99.6 | 100 | 101 | 99.9 |


Drawing upon my experience as a terrible lab student, I immediately knew something was wrong, or else we had gotten some sort of magical chemistry wizard out to prepare the most accurate standards I had ever seen. I refused to believe this guy had prepared a bunch of standards so precisely that he got each of the devices to read within about 1% of where they should be while I seemed to be hitting everything but the lottery with my results. When I wrote to the vendor support site, even the president of the company that manufactured the equipment chimed in to note that they could not reproduce that kind of accuracy in their own lab. The certificates were almost certainly BS.

But just to be certain…I had our certified third party calibration expert come back out and watched over his shoulder as he got the devices all calibrated and reading correctly, this time within a +/- 15% range. As suspected, all of the previously calibrated devices were off by a factor of about 3.5, and my ability to work a pipette was validated. I distinctly remember being more upset that the guy wasn’t competent enough to make up reasonable numbers than I was that he didn’t actually do any work. The latter I had already become accustomed to expecting in Brazil, the former would take me another few months to get used to as I had more run-ins with what they call the Jeitinho Brasileiro.

A few months later I ran into an issue with results coming from the same device. To be fair, the problem wasn't so much that the device itself was giving bad readings as that the people responsible for collecting data seemed to disappear for long periods of time without performing any tests. Since I had thousands of data points from them, I decided I might as well see if I could do some sort of randomization analysis of their results to see if anything was amiss[ii].

Randomization analysis is tricky. It's not often that you can say you have data that should be truly random. Even cherry-picking a single decimal place in a dataset might not work if the results cluster or there are too few significant digits. For example, imagine a test that reports to the tenths place, with results that typically fall around 1.9 to 2.3 and never below 1.8. The results bunch around that minimum, over-representing values close to it, so the tenths digit would be a poor choice of random number: it would be skewed towards the digits appearing in the most common results (1.9, 2.0, 2.1, 2.2, and so on), with a relative dearth of threes, fours, fives, sixes, and sevens since the results rarely rose much past 2. However, for a test whose results spread between 15 and 100 (increasing your number of significant digits to 3 in the process), that tenths digit might start to look very random.
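
A quick simulation makes the bunching problem concrete. This is purely illustrative; the distribution parameters are made up to mimic results clustered between roughly 1.8 and 2.3 with a hard floor at 1.8:

```python
# Simulate clustered test results and look at the tenths-digit distribution.
import random
from collections import Counter

random.seed(0)
readings = [max(1.8, random.gauss(2.05, 0.15)) for _ in range(10_000)]
tenths = [int(r * 10) % 10 for r in readings]

# Expect lots of 8s, 9s, 0s, 1s, and 2s; very few 4s through 7s.
print(sorted(Counter(tenths).items()))
```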

Fortunately, those happened to be the type of results I recently had to review. They ranged from the mid-teens to over a hundred, with no clear low-end asymptote to skew that last tenths digit one way or another. Perhaps if some of the operators performing the test rounded off results to the nearest whole number I would expect to see an excess of zeroes, but other than that I couldn't think of a reason the tenths digit wouldn't appear random. Most importantly, I had been diligently recording all of these results into an Excel spreadsheet every day for over a year, and everyone knows that you can't have more than a hundred numbers in any given spreadsheet without some sort of dubious statistical analysis being performed on it[iii].

I took all of the results, deleted the entries where no reading was taken (these blanks would register as 0.0 and throw off the analysis), then used Excel's MOD function to get the remainder when each result is divided by 1. Multiplying this by 10 gave me a neat set of whole numbers between 0 and 9. I used the Excel data analysis add-in to run a histogram on these numbers and found that out of 7,358 readings, the tenths digit was zero only 531 times. Using Excel's binomial distribution function, I calculated the odds that a random sample of 7,358 digits between 0 and 9 would contain 531 or fewer zeroes. The odds of that happening naturally in a set without any inherent bias are, to put it lightly, low.

This is the part where I want to caution that an unlikely outcome does not necessarily mean foul play was involved. For instance, you would only have a 12.5% chance of winning a coin flip three times in a row, but I wouldn't automatically call you a liar or a cheat if that happened. However, the odds of the results containing so few zeroes are the same as winning that coin flip 56 times in a row. As they like to say at the coin-flip table in Vegas: “Fool me 55 times, shame on you, fool me 56 times, shame on me.” To make matters worse, if any real results were rounded to the nearest whole number, the actual number of legitimate zeroes would be even lower. Also, it just so happens that when human beings try to create random numbers, they generally select 0 a lot less frequently than other digits[iv].

| Digit | Frequency |
|---|---|
| 0 | 531 |
| 1 | 843 |
| 2 | 873 |
| 3 | 661 |
| 4 | 818 |
| 5 | 705 |
| 6 | 658 |
| 7 | 795 |
| 8 | 750 |
| 9 | 724 |
| Total | 7358 |
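
For anyone who would rather not trust Excel's BINOM.DIST, here is the same check in Python (a sketch using SciPy, with the counts taken from the table above):

```python
# Odds of drawing 531 or fewer zeroes from 7,358 uniform random digits.
from scipy.stats import binom

n, zeroes = 7358, 531
print(n * 0.1)                   # ~736 zeroes expected, so ~200 are "missing"
p = binom.cdf(zeroes, n, 0.1)    # P(531 or fewer zeroes by pure chance)
print(p)                         # astronomically small
print(0.5 ** 56)                 # the 56-coinflips-in-a-row comparison
```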


So there are approximately 200 zeroes missing, plus or minus maybe 30, as there is about an 8% chance you would get fewer than 700 zeroes, which is back within the realm of possibility. On average, you'd have to make up 10 readings without a zero for one zero to go missing, and you would have to make up 2,000 readings for 200 zeroes to go missing. And that assumes the fabricated readings never land on round numbers. If the made-up readings were whole numbers 5% of the time instead of never, then about 4,000 of the readings would be fabricated, as it would then take 20 bad readings to get rid of each zero. And of course, if any of those guys ever rounded off their real numbers, artificially boosting the zero count, the situation could be even worse than this indicates.

For me, the moral here is that if a number looks strange, investigate the number first. This doesn't just apply to numbers supplied by humans, either; transmitters slide out of calibration or break for other reasons all the time[v]. I have found data scrambled in so many ways that it would be impossible to remember them all. What I do know is that it's much easier to troubleshoot why a number is wrong than to scour your process in vain searching for explanations for garbage data.

[i] Skittles

[ii] This is apparently not everybody's first response to a problem like this. In fact, my former rotating alternate had this to say about my analysis: “Wow, you really went full Beautiful Mind on that. I am surprised that I did not go into our cabin to find a bunch of newspaper articles cut out with pins and red string connecting them. Or photos of people with the eyes cut out. I think that was a different movie.”

[iii] Cough, six sigma, cough.

[iv] I know I read this somewhere, but I can’t remember where I got this from. It makes sense to me though, I can certainly imagine I would instinctively put a non-zero number at the end of my bogus data if my goal was to create random looking numbers, or at least numbers that didn’t seem made up.

[v] It seems odd to leave a footnote one sentence before the end of the article, but it seems wrong to leave out one of my favorite bad-data troubleshooting stories entirely. Two machines testing a refined product specification in a refinery had never disagreed until one day, they did…repeatedly. Nobody could figure out why, but they could pinpoint that the machines started disagreeing after the specification in question was changed from -40 degrees to -50 or so. Eventually somebody found out that the first machine was configured to read Celsius, while the second one read Fahrenheit (-40 being the one temperature where the two scales agree). I guess that just goes to show that there's nothing like the Jet A cloud point specification to bring together the English and SI systems of measurement.