Announcement

**3WE** · 2016-09-13, 14:47

Originally posted by TeeVee

...atlcrew, who apparently works for an airline, though i don't believe he/she has ever disclosed what exactly he/she does. i'm guessing flight crew but...

-Occupies seat 1B (or is it 0B or -1B?), consumes O2, generates CO2.
-Monitors transponder reply light.
-Asks, "What's it doing now?"
-On extremely rare occasions makes inputs into a side-stick, allowing the FBW computers to interpret what they think he wants the plane to do and respond accordingly (within limits).

**TeeVee** · 2016-09-14, 01:59

Originally posted by Evan

Schwartz, again, I defer to your experience. I'm not an IT guy as you can probably tell. But come on... You just stated the challenge there, so why can't we step up to it? Lack of funds? Are you kidding me? Delta enjoyed historic profits last year. They returned $2.6B USD to shareholders! How much of that $2.6B do we need to get a reliable network into shape over four years of development, testing and debugging? They could fund the entire thing with just one year's profit but of course they would spread it out over many years and it would begin to pay off almost immediately by preventing the next $10M meltdown. I'm talking about a fail-operative core system providing all the things necessary to keep planes dispatched with minimal delays. That doesn't need to cover point-of-sale networks, just confirmed booking records and operations (although, as TeeVee points out, POS is probably more important to them). No, of course you don't get it right the first time; that's what the four years are for.

The first challenge is of course to map out all the little things the system is supposed to do. That is how you begin. From there it's just the challenge you described. All it takes is money, which they have, and will, which they lack.

i have a friend who is solid upper-middle management at AA in miami. two weeks ago, as we were sitting in an airport restaurant sharing some sliders before my upcoming flight, i get an automated call from AA informing me my flight had been cancelled. i curse a bit, and tell him how shitty AA is cuz they cancelled my flight one hour before departure on a perfectly clear day. he was shocked and pulls out his iphone and starts tapping away furiously to see why the flight was cancelled. though not relevant to the point i will make in a bit, it was cancelled because AA literally ran out of aircraft and had to cancel a flight to bring home a planeload of folks from cuba (charter back then but scheduled now).

anyway, a few days later we are enjoying some drinks and he starts telling me how incredibly complex the industry is and how ALL of the systems are intertwined and why and wherefore.

i've been flying since 1979. i've flown 1.3 million miles on AA and over 2 million total. i never really gave much thought to how incredibly complex an airline is.

evan wants to believe that POS doesn't have to talk to "confirmed booking" which doesn't need to talk to maintenance, which doesn't need to talk to accounting, etc, etc etc.

sorry mate, you're dead wrong.

**TeeVee** · 2016-09-14, 02:00

Originally posted by 3WE

-Occupies seat 1B (or is it 0B or -1B?), consumes O2, generates CO2.
-Monitors transponder reply light.
-Asks, "What's it doing now?"
-On extremely rare occasions makes inputs into a side-stick, allowing the FBW computers to interpret what they think he wants the plane to do and respond accordingly (within limits).

i'm gonna guess a scarebus pilot. cuz no one ever asks what a boeing is doing

**Schwartz** · 2016-09-14, 03:14

Originally posted by TeeVee

the other way to look at this is not by how much havoc a failure wreaks but on how much revenue the airline loses when there is a total system outage. it's not just that planes don't take off. they also cannot sell new tickets, which is what they are all about.

lastly, evan's original post was questioning whether the network failure created a SAFETY issue, likely as a result of having to make-up for the missed flights. that question was answered in the negative, by atlcrew, who apparently works for an airline, though i don't believe he/she has ever disclosed what exactly he/she does. i'm guessing flight crew but...

I am sure the loss was big, and probably unacceptable given the risk of fire in a data center. I am pretty sure the hit to the brand will cost them even more but that is hard to measure. I am sure someone is busy reviewing their risk assessments and disaster recovery plans.

I totally agree it is not a safety issue, and if it was, then the same safety issue would arise from weather delays, natural disaster delays etc. Policy and process is the cure for that. I guarantee you that regulation would not help, and in fact would likely make the systems worse because a ton of money would go into useless documentation instead of into the system itself.

**Schwartz** · 2016-09-14, 03:15

Originally posted by TeeVee

i have a friend who is solid upper-middle management at AA in miami. two weeks ago, as we were sitting in an airport restaurant sharing some sliders before my upcoming flight, i get an automated call from AA informing me my flight had been cancelled. i curse a bit, and tell him how shitty AA is cuz they cancelled my flight one hour before departure on a perfectly clear day. he was shocked and pulls out his iphone and starts tapping away furiously to see why the flight was cancelled. though not relevant to the point i will make in a bit, it was cancelled because AA literally ran out of aircraft and had to cancel a flight to bring home a planeload of folks from cuba (charter back then but scheduled now).

anyway, a few days later we are enjoying some drinks and he starts telling me how incredibly complex the industry is and how ALL of the systems are intertwined and why and wherefore.

i've been flying since 1979. i've flown 1.3 million miles on AA and over 2 million total. i never really gave much thought to how incredibly complex an airline is.

evan wants to believe that POS doesn't have to talk to "confirmed booking" which doesn't need to talk to maintenance, which doesn't need to talk to accounting, etc, etc etc.

sorry mate, you're dead wrong.

Yep, people regularly underestimate the complexity of this stuff. It is extremely difficult and expensive to build and run these systems.

**Schwartz** · 2016-09-14, 03:21

Originally posted by Evan

Schwartz, again, I defer to your experience. I'm not an IT guy as you can probably tell. But come on... You just stated the challenge there, so why can't we step up to it? Lack of funds? Are you kidding me? Delta enjoyed historic profits last year. They returned $2.6B USD to shareholders! How much of that $2.6B do we need to get a reliable network into shape over four years of development, testing and debugging? They could fund the entire thing with just one year's profit but of course they would spread it out over many years and it would begin to pay off almost immediately by preventing the next $10M meltdown. I'm talking about a fail-operative core system providing all the things necessary to keep planes dispatched with minimal delays. That doesn't need to cover point-of-sale networks, just confirmed booking records and operations (although, as TeeVee points out, POS is probably more important to them). No, of course you don't get it right the first time; that's what the four years are for.

The first challenge is of course to map out all the little things the system is supposed to do. That is how you begin. From there it's just the challenge you described. All it takes is money, which they have, and will, which they lack.

No offense Evan, but I have worked across 6 different industries for or with companies ranging from very small (5) to over 100,000 employees. Not a single one has ever given me an unlimited budget, unlimited schedule, or unlimited resources to work with. In fact, every single one has been constrained by budget and the available people to do the work. Every company I have ever seen doesn't have enough money or properly skilled resources for their IT needs.

So your question is academic. So I'll give you an academic answer: Of course I could build a modern system to replace the existing ones with unlimited budget and resources and I would do a very good job of it. However, it would take a long time, and by the time it was done, I would have a lot of legacy code because you end up with legacy code after your first release. If you follow best practices, it happens really fast because you should build these systems in small increments.

**Evan** · 2016-09-14, 09:18

Originally posted by Schwartz

No offense Evan, but I have worked across 6 different industries for or with companies ranging from very small (5) to over 100,000 employees. Not a single one has ever given me an unlimited budget, unlimited schedule, or unlimited resources to work with. In fact, every single one has been constrained by budget and the available people to do the work. Every company I have ever seen doesn't have enough money or properly skilled resources for their IT needs.

So your question is academic. So I'll give you an academic answer: Of course I could build a modern system to replace the existing ones with unlimited budget and resources and I would do a very good job of it. However, it would take a long time, and by the time it was done, I would have a lot of legacy code because you end up with legacy code after your first release. If you follow best practices, it happens really fast because you should build these systems in small increments.

Where exactly did I say unlimited time and unlimited budget? I said four years. If I give you four years and a mere $100M of that $2.5B in annual profits, are you telling me you can't create a fail-operational core network that will allow planes to be dispatched with minimal delays following a single point of failure? Because that is the challenge I'm describing (if you need another $100M, we have it).

**Gabriel** · 2016-09-14, 09:55

Originally posted by TeeVee

i'm gonna guess a scarebus pilot. cuz no one ever asks what a boeing is doing or makes inputs with its sidestick

Fixed!

**TeeVee** · 2016-09-14, 13:10

Originally posted by Gabriel

Fixed!

thanks!

**TeeVee** · 2016-09-14, 13:23

Originally posted by Evan

Where exactly did I say unlimited time and unlimited budget? I said four years. If I give you four years and a mere $100M of that $2.5B in annual profits, are you telling me you can't create a fail-operational core network that will allow planes to be dispatched with minimal delays following a single point of failure? Because that is the challenge I'm describing (if you need another $100M, we have it).

you're right. you never said unlimited. you did say this though: "They returned $2.6B USD to shareholders! How much of that $2.6B do we need to get a reliable network into shape over four years of development, testing and debugging? They could fund the entire thing with just one year's profit but of course they would spread it out over many years and it would begin to pay off almost immediately by preventing the next $10M meltdown."

you also talk about "fail-operative core" which is not a legitimate industry term. so what exactly do you mean? redundancy? to what level? to what extent? two data centers? four? three server hubs?

how many redundant radios are on the latest aircraft? is that what you want airlines to have for their operating networks? triple redundancy? the manpower alone to staff these things would make wall street take a shit in their pants.

here's the thing: *most* hardware is pretty robust to begin with. my company used a dell small biz server from 2002 for 12 years before we retired it due to its inability to use the newer version of the windows small biz server OS. in 10 years of non-stop operating, a single hard drive failed, which was of no moment since the server had RAID 1. but getting the same kind of performance and reliability across a world-wide network comprised of multiple sub-networks (WAN, i believe they call it) is insane.

comparing my office network of 10 users to an immense network like Delta's with tens of thousands of users, is like comparing the Wright Flyer to the F-22 Raptor.

**elaw** · 2016-09-14, 13:23

Originally posted by Evan

If I give you four years and a mere $100M...

Okay I know you hate numbers but I'm going to do it again...

The accepted definition of "modern" in the IT world is 5 years... that's the design lifetime of most servers and related equipment.

So you're basically saying airlines should spend $100M every 5 years to avoid an outage that would cost them $10M and happen once every 20 years? I'm pretty sure that plan isn't going to pass cost-benefit analysis...
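For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in this thread ($100M per 5-year refresh, a $10M outage once every 20 years). The function name and the straight-line annualization are my own simplifying assumptions, and it ignores things like brand damage and discounting.

```python
def annualized(cost_musd: float, period_years: float) -> float:
    """Spread a cost (in $M) evenly over a period (assumed straight-line)."""
    return cost_musd / period_years


# Figures taken from the thread; everything else is illustrative.
rebuild_per_year = annualized(100, 5)   # $100M refresh every 5 years
outage_per_year = annualized(10, 20)    # $10M outage once every 20 years
ratio = rebuild_per_year / outage_per_year

print(f"Rebuild: ${rebuild_per_year}M/yr, outage: ${outage_per_year}M/yr, ratio: {ratio}x")
```

On those numbers the rebuild costs about $20M a year against an expected outage cost of about $0.5M a year, so the outage would have to be far more expensive or far more frequent than assumed before the spend pays for itself.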

**ATLcrew** · 2016-09-14, 13:50

Originally posted by elaw

Okay I know you hate numbers but I'm going to do it again...

The accepted definition of "modern" in the IT world is 5 years... that's the design lifetime of most servers and related equipment.

So you're basically saying airlines should spend $100M every 5 years to avoid an outage that would cost them $10M and happen once every 20 years? I'm pretty sure that plan isn't going to pass cost-benefit analysis...

Keep in mind you're arguing with someone who on the one hand knows how to run airlines (and everything else) right, but on the other hand won't tell you when he's flown last, if ever.

**Evan** · 2016-09-14, 19:31

Originally posted by elaw

Okay I know you hate numbers but I'm going to do it again...

The accepted definition of "modern" in the IT world is 5 years... that's the design lifetime of most servers and related equipment.

So you're basically saying airlines should spend $100M every 5 years to avoid an outage that would cost them $10M and happen once every 20 years? I'm pretty sure that plan isn't going to pass cost-benefit analysis...

Again, this is an ARCHITECTURAL issue.

You start with architecture and you design a system that has fail-operational redundancy (Yes TeeVee, that is aviation industry terminology).

Then you maintain those networks, replacing hardware components only when they need replacing, but before they become failure-prone or detrimental to network performance. That hardware could also be third-party hardware, as long as it conforms to standards that as yet do not exist.

Will this cost you $100M a year? You know, I highly doubt it. The entire network might only cost a fraction of that amount and would be amortized over many years. I only threw that blue-sky figure out there to illustrate that ample funds are available to do this.

So no, I'm not saying any of that (A good rule-of-thumb to know what I'm saying is that if I didn't say it, I'm not saying it.)

Now, on the issue of cost-benefit, that cannot be the criteria for two reasons: 1) Large corporations lack—or perhaps avoid—the foresight to accurately predict costs; 2) Large corporations undervalue the benefit of reliable customer service and smooth operations. They see stock market performance gains and a near monopoly for customer retention and shrug off both cost and benefit on this issue. However, I think a fair and visionary cost-benefit analysis would favor restructuring these networks.

But since a fair cost-benefit analysis isn't in the cards, we will require regulators to impose standards.

**Schwartz** · 2016-09-15, 04:17

Originally posted by Evan

Again, this is an ARCHITECTURAL issue.

You start with architecture and you design a system that has fail-operational redundancy (Yes TeeVee, that is aviation industry terminology).

Then you maintain those networks, replacing hardware components only when they need replacing, but before they become failure-prone or detrimental to network performance. That hardware could also be third-party hardware, as long as it conforms to standards that as yet do not exist.

Will this cost you $100M a year? You know, I highly doubt it. The entire network might only cost a fraction of that amount and would be amortized over many years. I only threw that blue-sky figure out there to illustrate that ample funds are available to do this.

So no, I'm not saying any of that (A good rule-of-thumb to know what I'm saying is that if I didn't say it, I'm not saying it.)

Now, on the issue of cost-benefit, that cannot be the criteria for two reasons: 1) Large corporations lack—or perhaps avoid—the foresight to accurately predict costs; 2) Large corporations undervalue the benefit of reliable customer service and smooth operations. They see stock market performance gains and a near monopoly for customer retention and shrug off both cost and benefit on this issue. However, I think a fair and visionary cost-benefit analysis would favor restructuring these networks.

But since a fair cost-benefit analysis isn't in the cards, we will require regulators to impose standards.

Networking equipment is not a problem. Hard drive failures are not a problem. There are tons of easy solutions for those. From what I've read they lost power in their data center. That was the problem here. I'm going to venture a guess here, but without knowing any detail, I would guess their highest risks are probably threefold: Physical building (which includes power and physical network hookup, like the guy down the street cuts the cable), software screwup, and maybe hacking. This was not a networking issue, it was a physical electrical power issue. I'm surprised they didn't have a backup data center, or if they did, they weren't able to gracefully fail over. Now, if the system wasn't designed to be easily set up for graceful fail over, it gets really hard, and testing for a disaster is also really hard (are you going to risk an outage to test your disaster recovery system?). If you've never tested your fail over, it is unlikely to work correctly.
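To make the "test your fail over" point concrete, here is a toy sketch of what a failover drill looks like in principle. Everything in it is hypothetical (the `DataCenter`, `FailoverRouter` and `failover_drill` names, and the two-site setup); it says nothing about how any airline's systems actually work. It only illustrates that the backup path is exercised by deliberately taking the primary down.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """Hypothetical site that can only serve requests while it has power."""
    name: str
    has_power: bool = True

    def handle(self, request: str) -> str:
        if not self.has_power:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class FailoverRouter:
    """Tries the primary first, then falls back to the backup."""

    def __init__(self, primary: DataCenter, backup: DataCenter) -> None:
        self.sites = [primary, backup]

    def handle(self, request: str) -> str:
        for site in self.sites:
            try:
                return site.handle(request)
            except ConnectionError:
                continue  # this site is down, try the next one
        raise RuntimeError("total outage: no site available")


def failover_drill(router: FailoverRouter, primary: DataCenter) -> str:
    """Cut power to the primary on purpose, check the backup answers, restore."""
    primary.has_power = False
    try:
        return router.handle("drill")
    finally:
        primary.has_power = True


primary = DataCenter("primary")
backup = DataCenter("backup")
router = FailoverRouter(primary, backup)

normal = router.handle("booking-lookup")
drill = failover_drill(router, primary)
print(normal)
print(drill)
```

The hard part in real life is that the drill itself is a risk: if the backup path is broken, the drill is the outage. That is exactly why an untested fail over tends not to work when it is finally needed.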

Regulators would never practically be able to set the standards for this kind of complexity anyways. I also disagree that corporations lack the ability to predict costs. We predict costs quite accurately all the time. There are lots of corporations that don't, but they either have poor operational insight (incompetent management IMO) or they are subject to too many external dependencies (like exchange rates, price of oil, etc.). I also know many companies that value customer service. It usually happens in industries where there is good competition.

**Evan** · 2016-09-15, 10:08

Originally posted by Schwartz

Now, if the system wasn't designed to be easily set up for graceful fail over, it gets really hard, and testing for a disaster is also really hard (are you going to risk an outage to test your disaster recovery system?). If you've never tested your fail over, it is unlikely to work correctly.

So what if regulations were established that required airlines to:

a) have back-up systems in place and a network designed to gracefully fail over;

and

b) thoroughly test this aspect of the network before it goes online.

I think this is unlikely to work if the network is cobbled together from legacy networks left over from a litany of mergers and expansions and outsourcings. But a clean-sheet core network should easily meet these requirements, don't you think? Remember, you have four years and abundant financial resources...

Is this an aviation safety issue?
