
Virgil

Stress and Frustration at Work

I hate to write an angry blog. I have so many ideas for positive blogs just waiting to be written (Walt Whitman’s poetry, the great baseball season the Baltimore Orioles had this year, coping with Matthew’s terrible twos, etc.), but time and laziness have prevented me from writing them. Venting, though, is an incredible motivator, and so this one gets written. Be aware this is a work-related blog, so names have been changed and details are avoided for internet anonymity. I’m not even sure anyone here will care about this one. It’s long and tedious.

Six months ago, the week before an upper management review of my project, one of the principal sub elements of the system we’re developing suffered a major failure with catastrophic breakages. To give a little background, the system we’re developing is composed of three principal sub elements, and when you integrate them all together it will be a unique system. Actually, all three of the sub elements are fairly unique in themselves. That’s about as specific as I want to get. The broken sub element was our only prototype, and building another from scratch probably would have taken a year and a lot of money. But it was repairable. We just had to make sure we understood whatever caused it to break and addressed it in the design. I briefed the incident to my management, and they gave me the go-ahead to figure out what went wrong and to redesign and rebuild it. I came up with a nice program plan change in which we could progress on the other two sub elements while the broken sub element was rebuilt. And if everything worked out as planned, with the broken sub element functional again in six months, I would actually catch back up to the original schedule, and other than the cost of redesigning and refurbishing there would be no major impact.

It took almost two months to do a root cause failure analysis, which identified a rather minor design change, so minor that we were kicking ourselves for not having it in the original design. Four months after that, the broken parts had been replaced with newly fabricated parts and reassembled, and the sub element was ready to be tested. Right on schedule. That was last week, coincidentally with another management review the very next week.

And I was preparing for that management review all last week. It normally takes me a solid week to put the brief together, and that’s even with taking work home. At the beginning of the week I felt a sinus infection coming on, and it kept getting worse, so that by Wednesday morning I had a sharp, splitting pain in my head behind my sinuses; it felt like my head was going to explode. I had never had sinus pain like that before. There was no way I could go to work. I needed to get to a doctor right away. So I took the day off and got to the doctor for an antibiotic prescription, knowing full well I would be working over the weekend to get my brief done. I sent an email from my BlackBerry asking how the test of that sub element went. I never received a reply. Thursday morning I got back to work (the antibiotic worked remarkably fast) and played the messages on my phone. The sub element under test the previous night had suffered another catastrophic failure, in much the same way it had six months prior. I threw the pen I was writing with across my desk and against the wall.

So not only was I behind on my brief, but I had to rework parts of it to address what had just happened, and, without really understanding the cost, schedule, and technical implications of this second failure, I had to re-strategize to project at least some level of confidence that this is still a viable program. With poise, honesty, and even humility, I gave a pretty decent presentation. Though there were some unhappy faces, I did not get beaten to a rhetorical pulp, as can happen in these briefings. The most critical statement came from one manager who said he wanted to know what I was going to do differently this time to get to the cause. Fair enough. I certainly didn’t intend to go down the very same path as before.

And the implications of this second failure were graver. That simple fix of the first breakage took six months to get the sub element functional again…or rather, functional enough to fail again. A major redesign could mean a year, and the other two sub elements have now gone as far as they can without integrating with the third. There just isn’t much for the people on the other two sub elements to do until the broken sub element comes back online. I can’t pay people to sit around doing nothing for six months to a year waiting for that third element. I’m probably going to lose those people to other programs, and there’s no guarantee I’ll get them back when I need them. Or they may just be let go, given the economy the way it is, and it’s not getting better. And finally, if it does take a year to refurbish the broken sub element, the completed system will not be ready in time to meet the customer’s needs. A year’s schedule slip could mean the program’s termination.

So you can see the stress that we are under.

Right after the management brief, we held a team meeting to plan a go-forward strategy. The team members who did the root cause analysis six months ago were in denial. They really could not believe what had just happened. Embarrassed, yes, but beyond that, stunned. The design change either had no effect or, if anything, actually made things worse. Let me bring in George (name changed). He’s the expert on this sub element, one of the original designers. He’s got a very strong personality and gets incredibly prickly if he or his work is criticized, and because of his personality and his status as an expert, he dominates his sub team, whose members are overly deferential to him. He was particularly embarrassed too.

And he ought to be. Six months ago, after the first failure, I questioned the way the sub element had been fixtured. The fixture was different from the original one, in which vibration had kept the sub element from functioning properly. So George had designed a super stiff fixture, and for the first couple of runs it worked. I didn’t particularly like it before we tested it (it just seemed too extreme a solution), but it seemed to work. Well, shortly after that came the first failure, and as we were brainstorming I mentioned the stiff fixture. I wasn’t an expert on that technology, but when I actually did design work back in my younger days, I was pretty decent at stress analysis. When you increase the stiffness, the loads on parts increase because they can’t dissipate elsewhere. If a part is borderline for failure, the increased load could push it over into failure. That’s basic stuff. George disagreed. He insisted that the fixture was decoupled from the load transfer (in retrospect, how could that be? It can’t.), and when I pressed his team on this, he said in his prickly manner that in his “engineering judgment,” built from twenty-something years of designing this technology, it wasn’t the issue. However, to satisfy me, they would include it in their fault tree possibilities.
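The stiffness argument can be illustrated with the simplest idealization I can think of. This is an assumption, not a model of the actual (deliberately unnamed) hardware: treat the part and the fixture as two linear springs in series, with vibration imposing a fixed displacement across the pair. The function name and all numbers below are hypothetical.

```python
def part_force(k_part: float, k_fixture: float, displacement: float) -> float:
    """Force through a part mounted in series with a fixture.

    Hypothetical idealization: two linear springs in series carry the same
    force, and their combined stiffness is k_part * k_fixture / (k_part + k_fixture).
    The displacement is imposed (e.g. a vibration amplitude), not the force.
    """
    k_series = k_part * k_fixture / (k_part + k_fixture)
    return k_series * displacement


K_PART = 1000.0      # hypothetical part stiffness, arbitrary units
DISPLACEMENT = 0.01  # hypothetical imposed vibration amplitude

# A soft, a matched, and a nearly rigid fixture.
for k_fixture in (100.0, 1000.0, 1_000_000.0):
    force = part_force(K_PART, k_fixture, DISPLACEMENT)
    print(f"fixture stiffness {k_fixture:>9.0f} -> part force {force:.3f}")
```

In this idealization the force in the part climbs toward `K_PART * DISPLACEMENT` as the fixture becomes rigid, because the fixture no longer takes up any of the motion. The conclusion rests on the series assumption; a different load path would need its own analysis.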

OK, a couple of months later they completed their root cause analysis, which I thought was fairly thorough, and they identified a single part that was improperly designed. The increased stiffness was deemed a possible contributor to the failure, but only because the incorrectly designed part couldn’t accommodate the new loading. A single part needed redesign, with no changes to anything else. All right, that seemed to jibe. So off we went to manufacture replacements for the broken parts and the new part.

But after that analysis we had to address our risk mitigation plan. A risk mitigation plan (some people call it a risk register) documents all the technical and programmatic risks that might be encountered, each assessed and scored (usually low, medium, or high). For the risks rated high, mitigation plans are put into place, along with a backup plan in case the risk is realized. I called George into my office to come up with the new risk evaluation for the broken sub element. Risk mitigation planning is something I take very seriously, and I go to great lengths with it. People don’t realize how important understanding risk is to a project. As you can see, a failure can alter the best-laid plans. Wikipedia has a good write-up on risk management (http://en.wikipedia.org/wiki/Risk_management) if anyone is interested.
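A risk register can be sketched as a small data structure. This is a minimal, hypothetical sketch rather than any standard format: it assumes the common likelihood-times-consequence scoring on 1-to-5 scales, with made-up thresholds for low, medium, and high (real programs define their own). The class and function names, the scores, and the second risk are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Risk:
    """One hypothetical risk-register entry; field names are illustrative."""
    title: str
    likelihood: int   # assumed scale: 1 (remote) to 5 (near certain)
    consequence: int  # assumed scale: 1 (minimal) to 5 (severe)
    mitigation: str = ""
    backup_plan: str = ""

    def rating(self) -> Rating:
        # Assumed thresholds on likelihood x consequence, not a standard.
        score = self.likelihood * self.consequence
        if score >= 15:
            return Rating.HIGH
        if score >= 6:
            return Rating.MEDIUM
        return Rating.LOW


def missing_plans(register: list[Risk]) -> list[Risk]:
    """High-rated risks that still lack a mitigation plan or a backup plan."""
    return [
        r for r in register
        if r.rating() is Rating.HIGH and not (r.mitigation and r.backup_plan)
    ]


register = [
    Risk("Robustness of parts insufficient", likelihood=4, consequence=4),
    Risk("Long-lead parts arrive late (hypothetical)", likelihood=2, consequence=3),
]
for risk in register:
    print(f"{risk.title}: {risk.rating().value}")
print("Needs plans:", [r.title for r in missing_plans(register)])
```

The point of the structure is the rule in the paragraph above: anything rated high is not finished until it carries both a mitigation plan and a backup plan.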

So, sitting in my office, we got to “robustness of parts insufficient” for his sub element. Prior to the failure it had been rated a moderate risk, given that it had worked a few times, though with vibration. Without even a thought that it would be controversial, I said that this had to be a high risk until proven otherwise. Well, it was controversial. George didn’t like that at all. “What,” I exclaimed, “you want me to not raise a red flag after this just broke?” “Yeah,” he insisted. And here was his logic. The sub element had gone through design and had been working for a while. It failed because of a single part. We analyzed it to death, so now we completely understood it, and so it should stay at moderate, if not actually be reduced to low.
“What!? You think that because you analyzed it you have reduced the risk of something we know has already failed?”

“Yes. It was a moderate risk before the failure and now it should at most be a moderate risk again.” This went back and forth for a while, each time our voices getting louder to emphasize our points.

“No f’n way am I going to stand up in front of management and claim we have actually reduced risk after we just broke the damn thing and it’s going to cost hundreds of thousands of dollars to refurbish.”

“You’re wrong.” And he gave me this reasoning. “What’s the risk of being hit by lightning? Low, right? Let’s say you get hit by lightning and you survive. What’s the risk of being hit by lightning again? The same. It hasn’t changed. Same thing for the sub element. It was rated moderate; it broke; we fixed it; now it’s the same risk as before.”

I was flabbergasted. My Systems Engineering Lead, who was also in the room, actually started nodding in agreement with him. He should have known better, but I think he was more intimidated by George’s elevated voice than really using his brain.

“No, it doesn’t work that way,” I yelled back. In the heat of the argument I couldn’t articulate the technical flaw in his reasoning, but I knew he was wrong. Here’s why he was wrong; I was able to think it through after the meeting, once the adrenaline had stopped flowing. Being hit by lightning is a known statistical risk because enough events have been tabulated. This blog (http://www.stumblerz.com/chances-of-...-by-lightning/) claims it’s 1 in 280,000, which is a lot higher than I would have guessed, but it is something statisticians can approximate. George is right that it doesn’t change if you’ve been hit once. It’s the same risk. Here’s the difference. The sub element that broke has no statistical history associated with it, or not enough of one. When we say it’s moderate or high, it’s a judgment-based assessment drawn from very limited history. All I know is that one out of the five times we actually operated the damn thing, it broke. That’s high in my book. But there aren’t enough statistically valid events to base a prediction on. For contrast, take something you might be more familiar with: the water pump in your car. The car company has made millions of them over the years and has built up a statistical database from testing and from collecting car histories. They know (I’m making these numbers up for illustration) that there is a 95% probability it will break within 80 months of regular car usage and a 5% chance of it breaking within 36 months. So you could say you have a low risk at 36 months and a high risk at 80 months. George’s sub element doesn’t have any history from which to know a statistical probability. So it’s a judgment, and given that it has already failed once and has only been operational a few times, one has no choice but to call it a high risk. And thank goodness I left it at that.
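The point about having too few events can be made concrete. As an illustration only, and assuming each run is an independent pass/fail trial with a fixed failure probability (real hardware runs are not that tidy), the standard Clopper-Pearson exact binomial interval shows how little one failure in five runs pins down. The sketch uses only the Python standard library; the function names are my own, and the 20-in-100 comparison case is hypothetical.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f: Callable[[float], float], lo: float = 0.0, hi: float = 1.0,
                iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(failures: int, trials: int,
                    confidence: float = 0.95) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided interval for a failure probability."""
    tail = (1 - confidence) / 2
    # Lower bound: the p at which P(X >= failures) equals the tail probability.
    lower = 0.0 if failures == 0 else bisect_root(
        lambda p: (1 - binom_cdf(failures - 1, trials, p)) - tail)
    # Upper bound: the p at which P(X <= failures) equals the tail probability.
    upper = 1.0 if failures == trials else bisect_root(
        lambda p: tail - binom_cdf(failures, trials, p))
    return lower, upper


# One failure in five operations, as in the post.
low, high = clopper_pearson(1, 5)
print(f"1 failure in 5 runs:     {low:.3f} to {high:.3f}")  # about 0.005 to 0.716

# Same 20% failure rate, but with twenty times the data (hypothetical).
low, high = clopper_pearson(20, 100)
print(f"20 failures in 100 runs: {low:.3f} to {high:.3f}")
```

Both cases have the same 20% point estimate; what differs is the range of plausible failure rates. With only five runs the data can't rule out anything from a rare failure to worse than a coin flip, which is why a cautious judgment call is all that's left.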

George doesn’t know what he’s talking about. After he saw I wouldn’t back down, he said, “OK, you’re the program manager, you can do what you want. I just don’t understand risk.” And with that he left the office. Now, George’s elevated voice was not a result of anger. It was just an increase in volume to emphasize his points, and because he was trying to impose his will through that volume, I reflexively raised my voice to counter. There was no animosity. I was surprised later at how loud we must have gotten. The systems engineer who was in the room was completely taken aback. He would have folded to George’s voice. People from outside the office stopped me to ask what had happened, since they had heard the shouting. It was a very memorable moment.

And I’m sure George, sitting in the room with the entire team strategizing after the second failure, was remembering both the stiffness discussion and how wrong he had been about the risk. How could he not? And when the team consensus was that the increased stiffness had to have played a part in both breakages, I was burning inside. F’n sh*t.

And it doesn’t stop there. Here’s George’s recommendation: since we’re under a time constraint, let’s rebuild it again, run it at a lower speed (lower forces on the parts), and accept this reduced capability with the understanding that we’ll have time to redesign it for the customer later on.

What? Build it again without verifying what went wrong? How do you know what speed it can operate at? Even if I got the ok to do that, which I doubt I could, can you imagine standing in front of management with a third f’n breakage?

No. We are going to understand exactly what happened and re-perform the root cause failure analysis, and this time I am adding someone from outside the current team to manage the effort, plus another stress analyst, in whom I have complete confidence, to the failure team. We need fresh eyes looking at the problem, and I don’t care what the so-called experts on this technology seem to know.

And that’s if we even have a program. The day after the management brief I got a call from what I’ll call the “chief scientist” of the company, who wants to meet to understand the technical fundamentals of how the system meets its requirements. No problem there. I have high confidence in our modeling and system approach. But in the five years I’ve been running the program, I had never been questioned like that before. Then the day after that, the assistant to the vice president of R&D called to set up a meeting (Friday morning, 8 AM) with several members of upper management to understand the viability of the program. Yeah, that doesn’t sound good. I feel like a character in a mafia movie being asked to meet with the Godfather, where the likelihood is I’m going to get whacked as I step into the room. That may be a closing joke, but I’m not a happy camper.

Comments

  1. Hawkman
    Eeek! It's the scenario every team leader dreads. That's the trouble with teams: a team is only as strong as its weakest link, but the leader carries the can. I would be very careful when you go into that room for the meeting. Make sure there's no plastic sheeting laid over the carpet! (I'd also be wary about accepting lifts from people offering to drive you home.)

    Seriously though, it must be infuriating. R&D is always fraught, being subject to budgetary constraints and marketability, given the possible returns on the developer's investment in time, resources, and money.

    Hope the project gets back on track and you can come up with a reliable solution to the problem. I also hope you feel better for having got it off your chest!

    Live and be well - H
  2. qimissung
    lol Hawkman! Love the reference to plastic covering on the carpet.

    I'm sorry, Virgil, you poor thing. I, too, hope you feel a bit better. How frustrating to work with someone so, so arrogant! And it's not going to matter where you go or what you do; that person will follow you everywhere!

    I think that I could have told him that the stiff fixture would not work. It's an old analogy, but think of the oak and the willow: sometimes making things stiffer makes them more brittle and more likely to break. And his lightning comparison is also bad. I'm not good with a quick response, but after some thought it seems that, as you say, there is a long statistical history with lightning, but this product also has a history, albeit a short one, and it has broken two times out of how many tries? Those are simply not good odds.

    I think you have handled this as best you can. You stood up to him very well. Good luck as you begin the proceedings again.
  3. Buh4Bee
    Oh boy! This is an excellent blog, and I hope you feel better after having written it. Virgil, you are an excellent writer.

    Is the repaired sub element going to be tested further? Will you be able to establish some more statistical history on this piece to at least determine whether it is actually a high risk or a moderate risk?

    It really yanks my chain when people think they are above protocol. There is a particular way to do things, particularly when you are confronting a high-risk situation. Glad you are standing your ground.

    I can only imagine the conversation you and Puss are having before bed. Thanks for sharing!
    Updated 10-19-2012 at 08:06 PM by Buh4Bee
  4. Virgil
    OK, here's an update. I didn't get whacked...lol. We had an almost two-hour discussion covering a lot of ground. It was actually enjoyable. They did not terminate the program. That's going to depend on the customer: how long this breakage delay lasts, how much it costs, and whether the customer can tolerate it. Given the nature of the breakage, it's unclear how long it will take to sort out. But the failure analysis will be under high scrutiny. I did get what I wanted and more. From outside my team I am getting an experienced manager under me to head the failure sub team, another engineer from outside my team for that sub team (I didn't ask for this and I'm not sure what her expertise is, but perhaps it's just to give her experience), and the best stress analyst I have ever worked with, possibly the best in the industry. It's been almost ten years since I last worked with him. He's from India, or at least of Indian descent; I'm not sure if he was born there. He's got a way of letting you think your ideas are shaping his modeling, but in the end he does what he thinks is right. I'm curious to see how he and George will interact. I think he's going to let George talk and pontificate, but in the end he will do what's right. Which is perfect.
  5. Virgil
    Quote Originally Posted by Hawkman
    Thank you, Hawkman. After seeing how long the blog came out, and how complicated it was to work through my points, I didn't think anyone would actually read it.

    Actually, teams, if they are working properly, are more powerful than individuals. I'm a strong proponent of teaming. When a team is working properly, there's a self-correcting nature to it that catches mistakes before they happen. But mistakes still can happen. The problem here is that George, who is a very smart guy with a lot of experience, has a problem accepting that he might have made a mistake. And his personality dominates. I don't consider him a weak link, at least technically. In the end, they did assess the increased stiffness after the first failure, as I wanted. Now, whether the analysis was done correctly may be the issue. Or perhaps it's true that it's a minor contributor, and they missed something else. No question, though, they did something wrong.
  6. Virgil
    Quote Originally Posted by qimissung
    I am feeling better, thank you. In the engineering world there's a lot of arrogance. You can probably see a lot of arrogance in me in some of my old Lit Net discussions. I've actually been trying to keep that in check, or at least to avoid the discussions that bring out the worst in me.

    The oak/willow example illustrates stiffness, but the best example I could think of to illustrate how increased stiffness transfers loads into other parts is the shocks on your car. When shocks are worn, the elasticity (springiness) is gone, and when you hit a bump the force isn't absorbed by the spring; it travels through the frame of the car, to your behind, and up the bones of your spine. The stiffness (lack of elasticity) in the shocks doesn't allow the force to dissipate, and you feel the bump more in each part of your body. Now, before someone jumps on it, I know shocks are more complicated than that; they are a hybrid of spring and hydraulic. I simplified to make the point.
  7. Delta40
    Virgil I was really engrossed in this blog. I too hope things get back on track. IMHO it is George who is the stiff fixture. Get rid of him and the sub element will work!
  8. Virgil
    Quote Originally Posted by Buh4Bee
    Once we figure out the problem and make the changes to fix it, the sub element will be tested further. My biggest fear is that we'll have to scrap the whole thing and build a redesigned one from scratch. It took over a year to design and build the first one.

    Once we get the sub element working, it will see a lot of usage, since it will be integrated with the other two sub elements. We have two years' worth of testing to do with it once it's integrated. If it takes a year to rebuild it, you can see how that's a killer.

    LOL, actually I don't talk about the details of work much, either at home or with friends or family. You can see how long this blog was just to explain one part of a single problem. Risk management, stress analysis, root cause failure analysis: these are terms outside of common speech. If Puss were an engineer, I might. I wonder if engineers who are married to other engineers talk shop at home. I'll have to ask some of them. She does hear me cursing over work sometimes.
  9. Virgil
    Quote Originally Posted by Delta40
    Oh thank you Delta. I wasn't sure if the blog would generate any interest.

    IMHO it is George who is the stiff fixture.
    I'll have to remember that and use it as a joke somewhere.
  10. Bluebiird
    Since you read all of my long blogs I couldn't not return the favour. Writing out your frustrations is very nice. I'm glad you're feeling better now.
    I don't have anything helpful or insightful to say, so here are some phrases from my collection of fortune cookie fortunes; see if any help:
    Challenges are what make life interesting; overcoming them is what makes life meaningful.
    Victory belongs to the most persevering.
    Always forgive your enemies - nothing annoys them so much.
  11. Virgil
    I fully endorse the first proverb. I have a decorative quote hanging on my office wall with something similar. I fully agree with the second proverb. And in time I always forgive everyone, and I truly mean that. However, I must add that there really wasn't any animosity between George and me. We had a difference of opinion, and in the heat of the moment, pushing our points of view, our voices got loud. We've been working well together. It's just that he was wrong.

    Thank you and you didn't need to return the favor.
    Updated 10-21-2012 at 10:49 PM by Virgil
  12. Bluebiird
    I just put the enemy one in for fun
  13. mtpspur
    Wow, you leave for a while, log back in, and the world has been redesigned. I hope to give this the attention it deserves over the next two days. Management stories are always interesting to me, as managers seem to have lost touch with the real work being done at ground level. Hang in there.
  14. LadyLuck
    I know I'm waaaaay behind, but I hope that things are getting better by now. Work stress is never a good thing, and I hate it when I have it.
  15. mtpspur
    With the new changes here I'm more lost than ever, and work at my place has been getting bad lately since the merger and the loss of the call takers. Hope things get better. Plus I can't find the online library here. Sigh.
  16. Virgil
    Thanks Rich and LadyLuck.
  17. Joreads
    Wow, Virgil, there is nothing worse than being under pressure at work. It is actually nice to see that there are others who feel the same way. It's great being a team leader sometimes, isn't it?

    I hope that everything works out for you.
  18. Virgil
    Jo!!! It's been such a long time.

    Thanks. There's still a lot of friction, but we'll get there.
    Updated 11-29-2012 at 12:14 AM by Virgil