Outage Nightmares

We host websites and email. We take this responsibility very seriously, and we understand that the margin for error is slim: 100% uptime is the only acceptable standard.

But as we administer and manage these servers to well over 99.99% uptime, we know that perfection is impossible, and that even the biggest names in corporate America have had legendary stumbles.

Here are a few...

MICROSOFT EMAIL CRASH AFFECTS MILLIONS OF USERS
September 10, 2011 by NEWSCORE

SKYPE STILL STRUGGLING AFTER WORLDWIDE OUTAGE
December 23, 2010 by Reuters

TWITTER AND FACEBOOK INVESTIGATING SERVICE DISRUPTIONS
August 6, 2009 • By Brian Prince • eWeek.com News story

REGISTER.COM SUFFERS FURTHER DOS ATTACK: MILLIONS OF SITES DOWNED AGAIN
Thursday, 2 April 2009 by Dean Pullen

FAA SAYS ONE OF TWO SYSTEMS THAT PROCESS FLIGHT PLANS IS DOWN, CAUSING MASS FLIGHT DELAYS
Aug. 26, 2008 By KATE BARRETT

APPLE DISCUSSES MOBILEME SHORTAGES
July 27th 2008 By Dee Chisamera

FACEBOOK CRASHES IN WAKE OF 'SCRABULOUS' TAKEDOWN
July 29, 2008 • CNET NEWS Posted by Caroline McCarthy

AMAZON WEB SERVICES GOES DOWN, TAKES OUT SOME WEB 2.0 SITES
February 16, 2008 • The Guardian - Jack Schofield

OUTAGE LEAVES MANY HOTMAIL USERS COLD
February 26, 2008 • Posted by Ina Fried, News.com

AMAZON’S S3 OUTAGE: IS THE CLOUD TOO COMPLICATED?
July 21st, 2008 • Posted by Larry Dignan @ 4:02 am

BLACKBERRY SUFFERS 'CRITICAL' OUTAGE
February 12, 2008 • Article from: NEWS.com.au by staff writers and wires

RIM OFFERS EXPLANATION FOR MASSIVE OUTAGE
April 19, 2007 by Marguerite Reardon, Staff Writer, CNET News.com

.MAC USERS MOCK APPLE SLOGAN DURING OUTAGE
July 31, 2006 • By Dawn Kawamoto, CNET News.com • Published on ZDNet News

CHRONOLOGY OF THE RECENT PAYPAL OUTAGE
October 12, 2004, as presented by eBay By Admin

EBAY OUTAGE BLAMED ON SOFTWARE FROM SUN AND ORACLE
June 14, 1999 • A Web Exclusive from WinInfo - Paul Thurrott • WinInfo InstantDoc #18741


Microsoft Email Crash Affects Millions Of Users
September 10, 2011
By NEWSCORE

REDMOND, Wash. - The world's largest email provider, Microsoft, was struggling to restore its services Friday after outages that reportedly affected up to 365 million users worldwide.

The service disruptions affected a variety of Microsoft email products including Office 365, MSN.com, Live@edu and Windows Live Hotmail. The extent of the disruption was unclear, but Microsoft confirmed problems in Europe and the Asia-Pacific region.

"We're rolling out a fix that we believe will resolve issues with Hotmail, SkyDrive, and our other Live properties," the company tweeted around 2:00am ET. At about 5:00am ET, the company announced that its Office 365 service, a cloud computing product with email capabilities, had been restored. Other services, including Windows Live Hotmail, appeared to be coming back online, but online forums and blogs were still reporting issues at 5:00am ET.

While it was unclear what caused the outage, there was speculation that Microsoft had been caught in a power blackout that hit large parts of the American Southwest late Thursday. On Friday afternoon, Microsoft said that it had restored its email services.

"Service was fully restored as of early this morning," Microsoft posted on Twitter around 1:30pm ET. Microsoft also said on Twitter that it did not believe the outages were related to the blackouts in California and Arizona that left up to five million people without power Thursday. The company instead blamed a Domain Name System (DNS) problem as the likely culprit.


Skype Still Struggling After Worldwide Outage
December 23, 2010 Reuters

A significant number of people are still unable to use the Skype Internet communication service nearly 24 hours after technical glitches brought down the company's network.

Around 9 a.m. EST Thursday, the company Tweeted that as many as 10 million users were now able to use the service, although the company recently reported peak usage of 25 million. This means as many as 15 million people are still unable to access the telephony network -- and advanced services such as group video calling may take longer to restore.

"In the last hour, we've seen evidence of a significant increase in the number of people online," the company wrote in a blog post. "Because of the way the Skype software works, it's not possible for anyone to obtain an exact figure, but we now estimate it to be over 10 million."

Users across the globe reported issues accessing the service Wednesday morning, prompting the company to acknowledge the issue on Twitter: "Some of you may have problems signing in to Skype -- we're investigating, and we're sorry for the disruption to your conversations."

Skype followed up with another Tweet assuring users that their "engineers and site operations are working non-stop to get things back to normal."

The problem is unconnected to the hacking attacks that disabled popular websites such as MasterCard and Visa in recent weeks. Chaim Haas, a spokesman for Skype, explained to FoxNews.com that the company's telephony network relies on millions of individual connections between computers and phones to stay up and running, referencing a blog post by the company.

"Under normal circumstances, there are a large number of supernodes available," network features which act like phone directories for Skype, Haas said. "Unfortunately, today, many of them were taken offline by a problem affecting some versions of Skype."

Engineers created new "mega-supernodes" that solved the problem, Haas said. As of about 3:30 p.m. EST, normal services had started returning to Skype, the company said, acknowledging that it may take "several more hours" before all users can sign in again.

Later that day, the company Tweeted again, this time with a request for patience.

Luxembourg-based Skype was founded in 2003 as an alternative to the standard telephone network, transmitting voice, video, and text conversations over the Internet.

Behind last night's Bing outage
December 4, 2009 9:36 AM PST by Ina Fried

Microsoft said that a configuration change that was mistakenly moved from testing onto the live Bing.com site was to blame for an outage Thursday that left Microsoft's search engine completely inaccessible for more than half an hour. A Microsoft representative told CNET on Friday that the problem appears to have come when something being tested was moved onto the live site.

"A configuration change was mistakenly propagated to production from staging," the representative said. "It was supposed to stay in the test environment--it was a mistake." In a blog posting that went up late on Thursday night, Microsoft Senior Vice President Satya Nadella said that a change made during testing had "unfortunate and unintended consequences."

"As soon as the issue was detected, the change was rolled back, which caused the site to return to normal behavior," Nadella said. "Unfortunately the detection and rollback took about half an hour, and during that time users were unable to use bing.com."

And here I thought Microsoft was just trying to be energy-efficient by running Bing only 23 hours a day.

Nadella said that Microsoft is exploring what went wrong to make sure it doesn't happen again. The outage came just a day after Microsoft announced a variety of changes to Bing, including added detail for some results and improved mapping tricks.


Twitter, Facebook Investigating Service Disruptions
August 6, 2009 • By Brian Prince • eWeek.com News story

Twitter co-founder Biz Stone confirmed in his blog that the social media site had been hit by a denial-of-service attack that knocked it offline for nearly two hours during the morning of Aug. 6. There were also online reports that Facebook was hit by a denial of service attack. However, Facebook officials would only say the company was investigating the reports and would update users as soon as possible. Both sites appeared to be operating normally by around noon EDT.

Twitter is in the midst of defending itself against an ongoing denial-of-service attack, the micro-blogging service reported this morning. Few details have been released so far about the attack. The company, however, confirmed the attack today in a blog post, and noted that even though the site is back up, officials are still working to recover and defend against the attack.

“On this otherwise happy Thursday morning, Twitter is the target of a denial of service attack,” blogged Twitter co-founder Biz Stone. “Attacks such as this are malicious efforts orchestrated to disrupt and make unavailable services such as online banks, credit card payment gateways, and in this case, Twitter for intended customers or users. We are defending against this attack now and will continue to update our status blog as we continue to defend and later investigate.”

There were also reports that Facebook had been attacked, though officials there have not confirmed it. Facebook spokesperson Malorie Lucich said the company was investigating the reports and would update users as soon as possible.

“Earlier this morning, we encountered issues within our network that resulted in a short period of degraded site experience for some visitors,” she said. “No user data was at risk and the matter is now resolved for the majority of users. We’re monitoring the situation to ensure that users continue to have the fast and reliable experience they’ve come to expect from Facebook.”

Ray Dickenson, CTO at security firm Authentium, noted that many denial-of-service attacks are launched from botnets.

“Twitter is such a high profile site, it may be just a bot-herder or one of their customers wanting to show off the power of their botnet,” he said.


Register.com suffers further DoS attack
Update: Millions of sites downed again

By Dean Pullen Thursday, 2 April 2009, 19:46

EARLIER TODAY we reported that the domain name registrar behemoth Register.com suffered from wide-scale DNS nameserver problems last night.

The problems resumed this evening, starting shortly after 19.30 GMT. Names hosted by Register.com's nameservers are not resolving, and the company website is currently inaccessible.

A variety of INQ readers have commented on the original article with further news of this new outage.

Though some pointed to a denial-of-service attack yesterday, Register.com support staff are now actively telling customers over the phone that the company's servers are definitely under fire from some form of DDoS attack.

From the sounds being heard from disgruntled customers, Register.com is poised to lose a lot of business from this fiasco.

Register.com has been pretty tight-lipped about the problem so far, and other news outlets have yet to pounce on the INQ's lead.

Names hosted by Register.com nameservers are still not resolving as this article goes to press, an hour later.

Computer Problem Causes Mass Flight Delays
FAA Says One of Two Systems That Process Flight Plans Is Down

By KATE BARRETT - Aug. 26, 2008

Travelers are facing mass flight delays today as the result of a computer problem at the Federal Aviation Administration. The FAA has two systems that process flight plans: one in Atlanta and the other in Salt Lake City. But the Atlanta system went down at 1:30pm today, and all flight plans are now being handled out of Salt Lake City. As a result, delays could pile up at airports across the country. Delays of up to 90 minutes are already surfacing at several airports.

"This was a failure mode we have not seen before," said FAA chief operating officer Hank Krakowski on Tuesday afternoon.

According to the FAA, about 6,500 airplanes are in the FAA system, though the agency has not said how many were in the sky and how many were on the ground when the problem occurred. With such a heavy volume of air traffic typically converging on the East Coast, delays could spread depending on how long it takes to iron out the problem. Krakowski said most of the delays were happening in the eastern portion of the United States, with none reported west of Dallas or Chicago.

'SCRABBLE' APP ON FACEBOOK CRASHES IN WAKE OF 'SCRABULOUS' TAKEDOWN
July 29, 2008 • CNET NEWS Posted by Caroline McCarthy

When Scrabulous, a popular game on Facebook's developer platform, was shut down earlier on Tuesday because of copyright infringement issues with the manufacturer of the Scrabble board game, word game fans weren't totally left in the dark. After all, Electronic Arts (which handles the digital rights to Scrabble for the game's parent company, Hasbro) had recently created an official beta version of Scrabble for the platform.

Problem is, the servers that were hosting the "real" Scrabble app couldn't handle the load of new migrants, and the application crashed on Tuesday afternoon. Oops!

"We'll be back up shortly," an apologetic error message read. "We're working on some tech problems and Scrabble will be ready to play as soon as possible!" The game is slated to exit the beta phase in the middle of next month, and some (my colleague Rafe Needleman among them) initially found it to be a better-quality game experience than Scrabulous had been.

But in the wake of a server crash, Facebook users weren't too pleased, as the message wall for the Scrabble application revealed. "Wow, does this suck," one Facebook user wrote. "Why can't you guys work out a licensing deal with the Scrabulous boys? Now we're back to square one and have to go through all of your debugging process."

Well, to be fair, rumor has it that Hasbro put out an acquisition offer for Scrabulous, only to have it rebuffed because its creators thought the amount offered was insufficient.

"Sucks, sucks, sucks," another Facebook user said. "Locks up at 30 percent loading. Sucks. Oh, did I mention it sucks? Get a grip, Hasbro."

Too bad "FAIL" will net you only seven points.


APPLE GETS UNUSUALLY CHATTY ABOUT MOBILEME SHORTAGES
July 27th 2008 By Dee Chisamera

This week we saw an unusually chatty Apple, at least when it comes to the problems with the MobileMe service. MobileMe had a rough start two weeks ago, when users reported issues with the “Internet service that takes the best of .Mac and more.”

Not only did Apple apologize last week, but it also offered customers a 30-day extension of their MobileMe subscriptions and promised to keep them updated on the repair process, which it did.

“Be assured people here are working 24-7 to improve matters, and we're going to favor getting you new info hot off the presses even if we have to post corrections or further updates later,” Apple's blog said on Friday.

It appears that 1 percent of MobileMe members reported a mail outage last Friday, when one of Apple's mail servers blocked their access to their MobileMe mail accounts. Apple reported fixing the problem, but the affected members can only read mail received since last Friday, not messages received before that.

The company expects to restore full access to the accounts and estimated that this should take no longer than a week. However, it appears that the affected users have lost 10 percent of the mail messages they received between July 16 and July 18.

So what exactly happened on launch day? Apple blames more traffic than it had anticipated for users' failure to access the web versions of the MobileMe applications: Mail, Contacts, Calendar, Gallery, and iDisk.

However, “we've since added server capacity and tuned our software to scale better – i.e. behave more gracefully when traffic spikes.”

Overall, Apple reported 70 bugs fixed, including the one that prevented MobileMe IMAP mail folders from syncing correctly between the web app and Mac OS X Mail or Outlook. Further details are expected next week.


OUTAGE LEAVES MANY HOTMAIL USERS COLD
February 26, 2008 10:22 AM PST
Posted by Ina Fried, News.com


Many Hotmail users got this message when trying to access their inboxes Tuesday: "SERVICE UNAVAILABLE"

Microsoft's Windows Live services experienced a significant outage Tuesday, leaving many users unable to get to their Hotmail inboxes. A company representative said all Windows Live services are affected, though not all users are reporting problems.

Microsoft said it is still trying to determine the cause of the problems.

"We are aware that some customers may be experiencing difficulty accessing their Windows Live accounts," the software maker said in a statement to CNET News.com. "We're actively investigating the cause and are working to take the appropriate steps to remedy the situation as rapidly as possible. We sincerely apologize for any inconvenience and disruption this may be causing our customers."



Amazon Web Services goes down, takes out some Web 2.0 sites
February 16, 2008
Posted by Jack Schofield, The Guardian

Some sites built on "cloud computing" got a wake-up call yesterday morning, when Amazon Web Services stopped working and took a number of Web 2.0 sites down with it. TechCrunch was quick to point out that this blew a big hole in the "cloud computing" hype that seems to be prevalent in Silicon Valley at the moment. It said:

"This could just be growing pains for Amazon Web Services, as more startups and other companies come to rely on it for their Web-scale computing infrastructure. But even if the outage only lasted a couple hours, it is unacceptable. Nobody is going to trust their business to cloud computing unless it is more reliable than the data-center computing that is the current norm. So many Websites now rely on Amazon's S3 storage service and, increasingly, on its EC2 compute cloud as well, that an outage takes down a lot of sites, or at least takes down some of their functionality. Cloud computing needs to be 99.999 percent reliable if Amazon and others want it to become more widely adopted."

Amazon Web Services is nothing like that reliable: it seems it only aspires to 99.9% availability, which would have been unacceptable in an antique mainframe, let alone a specialised fault-tolerant server. If people really want "five nines" availability, they'll have to pay for it, and at the moment it doesn't come at anything like Amazon's prices.
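
To put those "nines" in concrete terms, here is a minimal Python sketch that converts an availability percentage into the downtime it permits over a year. It assumes a 365-day year and says nothing about how any particular provider measures or credits availability; the function name is our own, for illustration only. At 99.9% the budget works out to about 8.76 hours a year, at 99.99% to about 53 minutes, and at 99.999% to a little over 5 minutes.

```python
# Illustrative sketch: yearly downtime allowed by an availability figure.
# Assumes a 365-day (non-leap) year; real SLAs may use other periods.
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability_pct: float) -> float:
    """Return the minutes of downtime per year permitted at availability_pct."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
        minutes = annual_downtime_minutes(pct)
        print(f"{label} ({pct}%): {minutes:.1f} minutes/year ({minutes / 60:.2f} hours)")
```

Each extra nine cuts the allowance by a factor of ten, so going from three nines to five nines leaves one hundredth of the downtime budget.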

One of the people promoting cloud computing is Greg Olsen, founder and chief technology officer of Coghead. Rather amusingly, the day before Amazon fell over, GigaOM published his guest column about adopting this stuff. He wrote:

"By leveraging service options like Amazon's EC2 and S3, a small company can deploy a complex, highly available and scalable multi-user software application -- without huge upfront investments in hardware or software infrastructure. Likewise, a very small company can build a simple, narrowly focused service and can cost-effectively sell it to a mass audience. Neither of these companies would have been possible only a short time ago."

Although I have a natural resistance to boosterism, I think Olsen is right and TechCrunch is wrong. Cloud computing does not need to be 99.999% reliable to get adopted by Web 2.0 companies. It makes sense to adopt it because it's cheap and because you don't need much technical competence to do it. It therefore meets Web 2.0 needs very nicely.

Of course, you'd have to be incompetent way beyond stupidity to build your banking, air traffic control, hospital or mission-critical corporate system on Amazon Web Services, because these do need to be reliable. Web 2.0 systems don't. Who really cares if Twitter goes down for a couple of hours, or even a couple of days, apart from the people who run Twitter?

There are, however, a couple of useful lessons from the debacle. The first is that "cloud computing" is still mostly hype. It will stop being mostly hype when service providers start to offer guaranteed service level agreements (SLAs) backed up by real financial guarantees.

The second is that relying on somebody else's unreliable system makes your system less reliable, not more reliable. You don't have "five nines" reliability in whatever it is you do if you're using a supplier that only has "three nines" reliability. And if you're relying on a beta Web 2.0 site that's relying on another beta service like Amazon Web Services, then you're just asking for trouble.
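
The point about suppliers can be made precise under one simplifying assumption: if your service needs every component in a chain to be up, and those components fail independently, then the availability of the whole chain is the product of the individual availabilities. A product of numbers between 0 and 1 can never exceed the smallest of them, so a three-nines supplier caps you below three nines. The Python sketch below illustrates this. The component figures are hypothetical, and real-world failures are often correlated, which this simple model ignores.

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the probabilities multiply.
    An empty chain is treated as always available (returns 1.0).
    """
    for a in components:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
    return prod(components, start=1.0)


if __name__ == "__main__":
    # Hypothetical figures: your own servers at five nines, a beta Web 2.0
    # service at 99.5%, and the cloud storage it relies on at three nines.
    own_servers = 0.99999
    beta_service = 0.995
    cloud_storage = 0.999

    print(f"you + cloud:        {serial_availability(own_servers, cloud_storage):.4%}")
    print(f"you + beta + cloud: {serial_availability(own_servers, beta_service, cloud_storage):.4%}")
```

With these made-up numbers, the first chain lands just under 99.9%. Adding the beta service drops it to roughly 99.4%, which mirrors the warning about relying on one beta service that itself relies on another.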

Web-based services are great, especially if they're free or very cheap, but it's insane to pretend they have the reliability of the electricity grid (which isn't wholly reliable) or a water utility (ditto, plus leaks). Web sites today don't guarantee reliability, availability or adequate performance, and there are lots of ways you can lose not just the service but also your data (as I wrote in a column this week). I'm not saying you shouldn't use them. I am saying that you should know what you're doing. Yesterday just showed that some people don't.


AMAZON’S S3 OUTAGE: IS THE CLOUD TOO COMPLICATED?
July 21st, 2008
Posted by Larry Dignan @ 4:02 am

Over the weekend Amazon’s S3 storage service was down for an extended period, and a bunch of Web 2.0 sites lost avatars, images and other items on their sites. Since enterprises haven’t totally jumped on the bandwagon, Amazon’s outage didn’t have broader ramifications. But Amazon’s latest outage, the second big one this year, will hamper dreams of enterprise-class services for the masses.

After all, the dream for cloud computing is enterprise reliability for pennies. In this view, the cloud will just work, uptime will always be there, and we’ll tap into this architecture and always be tethered to the Web. Michael Krigsman gives Amazon props for transparency with its latest outage, but the larger issue is reliability and how much redundancy we should expect for a few pennies a gigabyte (Techmeme).

If Amazon can’t democratize cloud computing and bring us a bunch of “9s” reliability who can?

Om Malik writes:

The outage shows that cloud computing still has a long road ahead when it comes to reliability. NASDAQ, Activision, Business Objects and Hasbro are some of the large companies using Amazon’s S3 Web Services. But even as cloud computing starts to gain traction with companies like these and most of our business and communication activities are shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous web.

Om hits the mark. The problem: The Web is one big legacy system. And cloud computing relies on millions of connections and services. In other words, it’s a troubleshooting nightmare when the cloud goes bust.

And like any company wrestling with legacy systems, cloud computing vendors will dust off a tired playbook. The solutions will be the usual: relegate legacy systems to plumbing and create more services and applications to keep infrastructure current. In other words, the cloud will likely become more of a rat’s nest. What’s scary about that prognosis is that the cloud is already too complicated, since it’s built on creaky infrastructure.


BlackBerry Suffers 'Critical' Outage
Article from: NEWS.com.au
By staff writers and wires
February 12, 2008 10:38am

A CRITICAL BlackBerry network outage in the US has hampered business deals and presidential campaign plans after users were left stranded without access to email.

The maker of BlackBerry handsets, which are ubiquitous in professional and political circles and are used to send and receive emails on the run, said its US network had experienced a "critical severity outage" today.

"This is an emergency notification regarding the current BlackBerry Infrastructure outage," said an email sent by company Research In Motion to its large BlackBerry clients.

The email said the outage affected business clients and "users of the Americas network".

Research In Motion did not say what caused the outage, when regular service was expected to be restored or how many people could be affected.

About one hour after the notification, some customers said a few emails were going through. Others said they continued to be without service.

Some BlackBerry users appeared to enjoy a respite from the device, which has been affectionately dubbed the "CrackBerry" due to its addictive nature.

On Parliament Hill in Ottawa, Liberal Party spokesman Jean-Francois Del Torchio said things seemed very relaxed for a while.

"It made my life a little bit easier, since I didn't have to reply. But when I arrived at my desktop and I saw all the e-mails I received, I was like, 'Oh, I still need to work'," he joked.

Carmi Levy of AR Communications said service reliability was a serious concern for telecommunications companies because, if problems became routine, they could drive customers away.

A massive outage in April last year crashed the BlackBerry network across the US, leaving thousands of users without access to wireless email.

Research In Motion CEO Jim Balsillie said at the time that such incidents were "very rare" and the Waterloo, Ontario-based company was taking steps to prevent such an outage from happening again.

Executives, politicians, lawyers and other professionals rely on the BlackBerry for its ability to send secure emails.

With Wojtek Dabrowski in Toronto for Reuters


RIM OFFERS EXPLANATION FOR MASSIVE OUTAGE
April 19, 2007 by Marguerite Reardon, Staff Writer, CNET News.com

Research In Motion finally offered some details late Thursday about what caused a severe outage of its BlackBerry e-mail service from Tuesday evening until Wednesday morning.

The company said in a statement that it had ruled out security and capacity issues as a cause of the outage that left millions of so-called "CrackBerry" addicts without access to their e-mail for several hours. The company also said the incident was not caused by any hardware failure or core software issue.

Ruling out those causes, the company has "determined that the incident was triggered by the introduction of a new, noncritical system routine that was designed to provide better optimization of the system's cache." In computing terms, a cache is a temporary storage area that allows data to be served up quickly.

RIM said the system routine had not been expected to affect the regular operations of the BlackBerry servers and infrastructure. Despite previous testing, the new system routine produced an unexpected effect that set off a chain reaction, triggering a series of interaction errors between the system's operational database and the cache.

After RIM isolated the database problem and tried unsuccessfully to fix the issue, it began its "failover" process to a backup system. But that also failed.

"Although the backup system and failover process had been repeatedly and successfully tested previously, the failover process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue," the company said in the statement.

RIM also said it has already identified several aspects of its testing, monitoring and recovery processes that it plans to improve as a result of the incident.

Since the outage's start--around 5 p.m. PDT Tuesday--the company had been quiet about its cause. But experts said they were convinced the issue had to do with RIM's network since subscribers were still able to make phone calls and send and receive text messages.

RIM's service is centralized and works by routing all BlackBerry e-mails through one of two main network operations centers, which are essentially large data centers. One center is located in Canada and primarily serves the Western Hemisphere as well as parts of Asia. The other, located in the U.K., handles e-mail traffic in Europe, Africa and the Middle East. Analysts had speculated that, since most of the people affected by the outage were based in North America, the problem likely occurred in the center located in Waterloo, Ontario.

By Wednesday morning, RIM said, the e-mail had begun trickling into in-boxes across North America. The service was operating normally on Thursday, the company said.

RIM has built a strong reputation as a reliable service provider that has attracted bankers, lawyers and lawmakers as subscribers. The company has recently been trying to broaden its appeal to consumers with new products, such as the BlackBerry Pearl handheld and the BlackBerry 8800.

The new strategy has helped the company rapidly expand its subscriber base. In its latest quarter, RIM reported it had added 1.02 million new subscribers, taking its total to 8 million. This is a huge increase from the 2 million subscribers the company reported a year ago, when it settled a patent infringement case with NTP. The company expects to add between 1.12 million and 1.15 million subscribers during the current quarter.



.Mac users mock Apple slogan during outage
By Dawn Kawamoto, CNET News.com
Published on ZDNet News: July 31, 2006, 11:14 AM PT

Apple Computer's latest advertising campaign, pegged to the slogan "It just works," is irritating some .Mac users as they wonder when the service will become operational again.

Over the past four days, .Mac users have struggled to get the service's Web site publishing features (iWeb) and related file-sharing capabilities (iDisk) to work. Users have complained not only about the length of the outage, but also about what they say is a tardy response from .Mac's technical support team, according to postings on Apple's discussion board.

"It is going on 96 hours for me. Completely Unacceptable," wrote a user named BK Broiler in a post to the discussion board. "The .76 IP now pings, since yesterday, but iDisk does not work still. It'll only work with the /etc/hosts trick, but not on its own. I got a canned e-mail from Apple today after 72 hours of silence from the time I sent the trouble call. Thanks, Apple, for making a joke out of long term customer loyalty, and for just not giving a ****. It may be time to switch away from Mac after 20 years."

Apple said Monday it is investigating the issue.


eBay outage blamed on software from Sun and Oracle
June 14, 1999
A Web Exclusive from WinInfo - Paul Thurrott
WinInfo InstantDoc #18741

Online auction mega-site eBay was offline for over 24 hours this weekend, causing an estimated $2-3 million loss of business for the company. But the company was eager to spread the blame and offset some of the embarrassment by attributing the outage to its reliance on software from Sun Microsystems and Oracle: eBay's site runs on Sun Solaris and Oracle's database server. Particularly damning is the fact that eBay's Web site goes crashing down on a fairly regular basis. eBay is one of the most popular destinations on the Web, but the constant problems are causing customers to look elsewhere.

"We are sorry," wrote eBay CEO Meg Whitman in a letter to the site's users. "We know that you expect uninterrupted service from eBay. We believe that this is reasonable, and we know we haven't lived up to your expectations. We want to earn back your trust that we'll provide you with this level of service."


CHRONOLOGY OF THE RECENT PAYPAL OUTAGE
As presented by eBay; posted by Admin and lifted from the eBay announcement page.
Preserved for its newsworthiness and to document the outage.
Created 10/12/2004


-----------------------------------------------
***PayPal Update***
Date: 10/12/04 Time: 07:19:37 PM PDT

We have made good progress in our efforts to restore the PayPal site functionality. The PayPal site performed well during peak traffic levels this evening, and the overall member experience has improved significantly. Most members are now able to log in to the PayPal site to access account information, use shipping functions, use PayPal debit cards, and pay for items online with no difficulty. Should you encounter any errors when attempting to log in or use different PayPal functions, please try again.

We are monitoring this situation closely, and we will continue to update you as new information is available. We appreciate your patience.

Regards,
eBay
-----------------------------------------------
***PayPal Site Update***
Date: 10/12/04 Time: 10:49:03 AM PDT

We understand the PayPal site issues may be impacting many of you and your ability to do business with PayPal on and off eBay, and we apologize for this situation. We would like to update you on the current status and the efforts being made to resolve these issues.

Today, access to PayPal continues to be intermittent. Some members are able to log in to the site and make payments and perform other activities, although they may be experiencing very slow system responses. Other members are not able to get in right away, or at all. PayPal users may also be having problems with their debit cards.

Sellers who use PayPal shipping functionality may be having problems shipping products to their buyers, and buyers may be experiencing difficulties paying sellers. We encourage members to be patient with trading partners as we work to improve PayPal access.

These PayPal issues are the result of unforeseen problems that resulted when a new code base to upgrade the site architecture was introduced to the PayPal platform on Friday morning. The code worked well when tested and during the first hours of launch. Unfortunately, problems handling peak levels of traffic developed later in the day that created intermittent availability and errors for members. These problems have continued in varying degrees since Friday.

Account data and personal information have not been compromised by these issues. eBay and PayPal technical teams are working at full force to fix the underlying problems and improve site access.

We will continue to update you on the status of this situation.

Regards,
eBay
-----------------------------------------------
***PayPal Site Issues Update***
Date: 10/11/04 Time: 07:25:16 PM PDT

A technical problem with the PayPal platform has caused intermittent errors and availability for members attempting to use the PayPal site since Friday 10/8. Activities such as paying for ended eBay listings, using the Immediate Payment feature, using PayPal shipping functionality, and accessing account information have been intermittently available. Offline use of PayPal debit cards has also been impacted intermittently, and some members have been unable to use them.

eBay and PayPal are continuing to work to resolve these issues, and we will continue to update you. We understand the inconvenience this issue has caused for some members, and we appreciate your patience.

Regards,
eBay
-----------------------------------------------
***PayPal Update***
Date: 10/11/04 Time: 02:22:06 PM PDT

We are aware that members may continue to experience intermittent errors while accessing the PayPal site or when attempting to pay for eBay items with PayPal. We are aware of the problem and are currently working on a solution.

We sincerely apologize for the inconvenience this may have caused and we appreciate your continued patience.
Regards,
eBay
-----------------------------------------------
***Errors Accessing PayPal***
Date: 10/11/04 Time: 08:55:00 AM PDT

Members may be experiencing intermittent errors while accessing the PayPal site or when attempting to pay for eBay items with PayPal. We are aware of the problem and are currently working on a solution.

We appreciate your patience at this time.

Regards,
eBay
-----------------------------------------------
***Resolved: Errors Accessing PayPal***
Date: 10/10/04 Time: 09:55:58 PM PDT

Recently members have experienced intermittent errors while accessing the PayPal site or when paying for eBay items with PayPal. We have identified the problem and corrected it.
Regards,
eBay
-----------------------------------------------
***Errors Accessing PayPal***
Date: 10/10/04 Time: 06:36:49 PM PDT

Some members may currently be experiencing intermittent errors while accessing the PayPal site. We are aware of the problem and are working on a solution.

Regards,
eBay
-----------------------------------------------