Facebook Back Online After System-Wide Outage

Users who attempted to access Facebook (NASDAQ:FB) on Friday were met not with their usual home screen or login page, but with an error message.

The social networking site suffered a brief outage in the early afternoon across both its web and mobile platforms; the cause was not immediately clear. Some users who visited the site faced an error message that read “Sorry, something went wrong,” while others saw a white “Connection error” page.

The site, which reassured users the company was working to fix the problem and restore service, appeared to come back online within about half an hour.

Facebook released a statement on the issue after the site was back up.

“Earlier this morning, some people had trouble accessing Facebook for a short time. We quickly investigated and have fully restored service for everyone. We’re sorry for the inconvenience,” a spokesperson for the company said.

What Happened to Facebook?

On June 19, the site suffered an intermittent outage that lasted 30 minutes. Forbes reported that the brief blackout, which left the site's 1.3 billion users in the cold, cost the social star $500,000 in lost ad revenue.

Stuart Lipoff, IEEE fellow and president of IP Action Partners, said Facebook is in a class of real-time systems that all share a common database. Because of that, he said, when a hardware, communications, or software failure strikes the common point at which those systems come together, it results in an outage.

What could prevent an outage like this for a website like Facebook is what the industry calls a failsoft system.

“Some of these real time systems are designed so that when there is a failure in this common point, they mitigate the outage by still allowing some functionality that can operate on the parts of the system that are still operating,” Lipoff said. “This is typically referred to as designing a ‘failsoft’ system since making a real time system with a common point can never be made totally ‘failsafe.’”

In the case of Facebook, Lipoff said a failsoft design would have allowed the website to continue to operate. Instead of an outage, users would have experienced delays and trouble posting status updates or photos to the site.

“The key to Failsafe design is having more than one independent ‘hot’ system with the end users distributed between these independent machines,” he said.
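For readers who want to see the idea in concrete terms, the following is a minimal Python sketch of the design Lipoff describes: several independent "hot" systems, with users spread across them. It is not Facebook's architecture. The names Replica, home_index and route, and the "full", "degraded" and "unavailable" modes, are all hypothetical and chosen only for illustration.

```python
import hashlib


class Replica:
    """One independent 'hot' system serving a share of the users (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


def home_index(user_id, replica_count):
    """Map a user to a home replica with a stable hash of the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % replica_count


def route(user_id, replicas):
    """Pick a replica for the user and report the level of service.

    Returns (replica, "full") when the user's home replica is healthy,
    (replica, "degraded") when the user falls back to another healthy
    replica (assumed here to mean slower or limited posting), and
    (None, "unavailable") only when every replica is down.
    """
    count = len(replicas)
    start = home_index(user_id, count)
    for offset in range(count):
        candidate = replicas[(start + offset) % count]
        if candidate.healthy:
            mode = "full" if offset == 0 else "degraded"
            return candidate, mode
    return None, "unavailable"


if __name__ == "__main__":
    replicas = [Replica("a"), Replica("b"), Replica("c")]
    replicas[0].healthy = False  # simulate a failure in one system
    for user in ("alice", "bob", "carol"):
        replica, mode = route(user, replicas)
        print(user, replica.name if replica else None, mode)
```

In this sketch a single failure only affects the users whose home system went down, and even they still reach a working system with reduced service. A total outage requires every system to fail at once.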

So what exactly caused the outage?

Lipoff noted there are a number of different things that might have gone wrong, including something as simple as traffic overload or hardware failure. Other, more complicated possibilities include a communications storm overloading the servers, or a software defect.

Shares of the social giant were 1.3% lower in afternoon trade.