Why Do Websites Go Down?
We can imagine how frustrating it is to see a well-designed and well-developed website crash after all the time you have spent delivering the perfect experience to your users.
Hearing about it from your users, customers or managers may be the most dreadful part. Even while writing this blog post, the thought alone was enough to give us the shivers. A degraded customer experience, looking unprofessional, breached SLAs, blame from senior management, lost money... the list is long, and it is the reason we built Uptimezen, so that you can:
- Learn about it before anybody else - so that you can fix it faster.
- Know the reason behind it - so that you immediately know where to look and what to say.
Don’t worry, Uptimezen will support you throughout the hassle. That being said, let’s talk about why websites actually go down.
There can be many reasons, but we have tried to cover the most common ones.
Traffic Spikes
When you chose your hosting provider and a dedicated or shared server configuration, you made some assumptions about the traffic your site would get. The bandwidth tier you selected is not the only limit. Depending on the code that serves your app or website, you can also hit CPU or RAM limits once the server receives more load than it can handle. When CPU, RAM or bandwidth runs out, you may have no idea why the site is down until you log in to the server over SSH, or through a control panel such as cPanel, and check each parameter one by one.
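Instead of checking each parameter by hand, a short script run over SSH can report the main pressure points at once. The sketch below assumes a Linux server. `os.getloadavg` is Unix-only and `/proc/meminfo` exists only on Linux. The function name `resource_snapshot` is hypothetical.

```python
import os
import shutil


def resource_snapshot(path="/"):
    """Return rough CPU, RAM and disk pressure figures (Linux/Unix assumed)."""
    snapshot = {}
    # 1-minute load average divided by core count: values near or above 1.0
    # suggest the CPUs are saturated.
    load1, _, _ = os.getloadavg()
    snapshot["cpu_load_per_core"] = load1 / (os.cpu_count() or 1)
    # Disk usage of the mount point holding `path`, as a fraction of capacity.
    usage = shutil.disk_usage(path)
    snapshot["disk_used_fraction"] = usage.used / usage.total
    # RAM: /proc/meminfo is Linux-only, so skip this figure elsewhere.
    try:
        with open("/proc/meminfo") as f:
            info = {line.split(":")[0]: int(line.split()[1]) for line in f}
        snapshot["ram_used_fraction"] = 1 - info["MemAvailable"] / info["MemTotal"]
    except (OSError, KeyError, ValueError, IndexError):
        pass
    return snapshot


if __name__ == "__main__":
    for name, value in resource_snapshot().items():
        print(f"{name}: {value:.2f}")
```

Bandwidth is not covered here. Your host's panel or a tool such as `vnstat` is usually the place to check it.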
Traffic spikes can happen for many reasons. Some are “good reasons”, such as your service going viral or an inbound marketing campaign that overachieves. Others are “bad reasons”, such as DDoS attacks, botnets or crawlers. To deal with the “bad reasons”, consider security solutions such as Intrusion Prevention Systems or Web Application Firewalls. To give your service a chance to adapt under heavy load, consider load balancing, high availability or auto-scaling systems. For instance, Uptimezen uses Kubernetes together with Docker CE to absorb excess traffic, with multiple load balancers served over a geo-balanced, active-active server infrastructure. We will cover this in a future blog post, which we hope will light the way for some of our readers. Long story short, never launch a website assuming there won’t be any excess traffic. Whatever you serve, there will come a time when traffic spikes occur!
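To illustrate what a load balancer does at its simplest, here is a toy round-robin picker that skips backends marked unhealthy. This is only an illustration of the idea, not how Kubernetes or any particular load balancer is implemented. Production balancers add active health checks, weighting and connection draining. The class name and backend addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # In a real system a failed health check would trigger this.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"])
lb.mark_down("app-2:8080")
print([lb.next_backend() for _ in range(4)])
```

With `app-2` down, requests alternate between `app-1` and `app-3`. That is why running several servers behind a balancer keeps a site up when one of them fails.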
What to Do if a Traffic Spike Occurs:
- Check the traffic logs on your server and spot the sources of traffic
- Analyse whether these are attack events or legitimate user traffic
- If it is an attack, update your firewall rules to block the IP addresses spotted in your logs
- If it is legitimate traffic, consider deploying load balancing systems; if you are using a traditional hosting provider, take a backup and contact them to increase your resources
- Analyse which resource is consumed first: CPU, RAM, I/O, storage or bandwidth
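The first three steps can be partly automated. The sketch below makes several assumptions:

- The access log uses the Common/Combined Log Format, where the client IP is the first field.
- The log path is `/var/log/nginx/access.log`.
- "Suspicious" means more than a chosen threshold of requests.

All three are assumptions to adapt. Behind a CDN or reverse proxy, the first field may be the proxy's address rather than the visitor's. The script only prints draft `iptables` rules for you to review and does not apply them.

```python
import re
from collections import Counter

# Client IP is the first field of a Common/Combined Log Format line, e.g.
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512
CLIENT_IP = re.compile(r"^(\S+) ")


def top_talkers(log_lines, threshold):
    """Return (ip, hits) pairs whose request count exceeds threshold, busiest first."""
    counts = Counter()
    for line in log_lines:
        match = CLIENT_IP.match(line)
        if match:
            counts[match.group(1)] += 1
    return [(ip, hits) for ip, hits in counts.most_common() if hits > threshold]


def block_commands(suspects):
    """Draft iptables rules (inserted at the top of INPUT) for manual review."""
    return [f"iptables -I INPUT -s {ip} -j DROP" for ip, _ in suspects]


if __name__ == "__main__":
    # Log path and threshold are assumptions; adjust them for your server.
    with open("/var/log/nginx/access.log", errors="replace") as log:
        suspects = top_talkers(log, threshold=1000)
    for ip, hits in suspects:
        print(f"{ip}\t{hits} requests")
    print("\n".join(block_commands(suspects)))
```

A high count alone does not prove an attack. A busy office behind one NAT address can look the same, which is why the second step (analysing the traffic) comes before blocking.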
Coding Errors
Even if you are a coding superstar, human beings make mistakes, whether because of weak QA or DevOps, missing unit tests, pressing deadlines or a lack of coffee. Let’s accept it. In our experience, this is the most likely reason websites crash. The cause can sit anywhere from your databases to your frontend code, and it may extend to the backend logic and algorithms you employ. You may need to inspect every single bit to get hold of the root cause. The key recommendation is simple to say but hard to apply: TEST, TEST, TEST! If you are a one-man army designing, developing and deploying, it simply means spending more time on testing. If you are part of an engineering organization or startup, it means regression tests, UAT, a well-documented and adopted release cycle, unit tests, branching, automation and DevOps. Long story short, it may even lead you to review your entire development approach.
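One cheap piece of that automation is a post-deploy smoke test. It checks that a key page answers with HTTP 200 and still contains text you expect, and it can run in your pipeline before traffic is switched over. The sketch uses only Python's standard library. The function name, URL and expected text are hypothetical placeholders.

```python
import urllib.request


def smoke_check(url, expected_text, timeout=5.0):
    """Return True if url answers HTTP 200 and its body contains expected_text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and expected_text in body
    except OSError:
        # URLError, HTTPError (non-2xx) and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Hypothetical staging URL and page text; fail the deploy if the check fails.
    ok = smoke_check("https://staging.mywebsite.com/", "Add to cart")
    raise SystemExit(0 if ok else 1)
```

A smoke test does not replace unit or regression tests. It is a last gate that catches a broken build before your users do.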
Hosting Provider Issues
Hosting providers advertise 99.9% uptime on their landing pages to assure you of their reliability. Most of them deliver, but some fail to. Hardware problems, network outages, ISP-related issues and even natural disasters can make it hard for your hosting provider to meet its SLA. That puts you in a bad position, because your users won’t care about the reason behind the outage! At Uptimezen, we work closely with many providers to help them help you meet the agreed SLA. Before choosing a hosting provider, though, you can rely on the following checklist:
- What is the SLA provided by my Hosting Provider?
- Is the SLA mentioned in the EULA?
- Upon request, would my Hosting Provider give me a copy of the SLA between them and me?
- Are there any known Websites/Apps using my Hosting Provider?
- What ratings does my Hosting Provider have on review sites such as Capterra, G2Crowd and similar?
- Does my Hosting Provider provide High Availability and Backup Services?
- Are there any Outage Mitigation Systems they employ in their infrastructure?
- How responsive is their Support Team?
The list could be extended even further, but these questions are enough to begin with. Before we forget: always keep an eye on the scheduled maintenance emails coming from your Hosting Provider!
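To make the first question concrete, it helps to turn an SLA percentage into the downtime it actually allows. The arithmetic is simply (1 - SLA/100) times the length of the period. For example, 99.9% still allows about 43 minutes per 30-day month, or roughly 8.8 hours per year. The helper below is a hypothetical illustration of that formula.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Maximum downtime (in minutes) an uptime SLA permits over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


for sla in (99.0, 99.9, 99.99):
    month = allowed_downtime_minutes(sla, 30)
    year = allowed_downtime_minutes(sla, 365)
    print(f"{sla}% -> {month:.1f} min per 30-day month, {year / 60:.2f} h per year")
```

When comparing offers, also ask over which period the SLA is measured and whether scheduled maintenance counts toward it.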
Overall Traffic Issues and ISP Outages
Even if you have everything set up perfectly and a 99.99% uptime SLA with a Hosting Provider who never fails to deliver it, the ISPs that connect your website to the internet may still have issues. Fiber cables run beneath the oceans to carry data between continents, so that a user in the United Kingdom can reach a website selling event tickets in California, US. Network engineers call this physical layer OSI Layer 1, and without it there would simply be no internet! On many occasions around the world, internet access in entire regions has been crippled, and the effects took days to mitigate. For an interesting read, search Google for “2008 Submarine Cable Disruption”. In such cases there is nothing to do but sit and wait until the ISPs fix the issue. However, there are some steps you can take to check whether that is really what is happening before you blame your Hosting Provider or your code:
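Run a traceroute to your website to confirm the network issue, and look for the hop where responses stop or latency suddenly jumps:
tracert mywebsite.com (Windows)
traceroute mywebsite.com or tracepath mywebsite.com (Linux; use traceroute on macOS)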
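A quick TCP connection test complements traceroute. If the handshake to your web port succeeds from one network (say, a mobile hotspot) but fails from another, the path between them is a more likely culprit than your server. The sketch below assumes your site listens on port 443. The function name `tcp_probe` is hypothetical. The measured time includes the DNS lookup when you pass a hostname.

```python
import socket
import time


def tcp_probe(host, port=443, timeout=3.0):
    """Try a TCP handshake; return the connect time in ms, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name (if any) and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        # Refused, unreachable, timed out or unresolvable.
        return None


if __name__ == "__main__":
    result = tcp_probe("mywebsite.com")
    print("unreachable" if result is None else f"connected in {result:.1f} ms")
```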
Plugin & Extension Errors
If you are using WordPress, Drupal, Joomla or cPanel, plugins may give you a hard time. Yes, they make it much faster to launch your website and easier to manage it. But be careful when choosing plugins, because a single problematic plugin can crash your website even when everything else is working perfectly. The best way to avoid this is to inspect a plugin in detail before you use it. Read the comments, check the ratings and, if possible, install it in a staging environment to observe its performance and how it affects that environment. Always keep backups of your plugin-enabled hosting services, and consider a backup plugin the first one to install. But what if the backup plugin itself causes the outage? You may also want to take occasional manual backups over SFTP, just to make sure there is a point you can revert to if every other option fails.
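A manual backup can be as simple as a timestamped archive of the site folder, which you then copy off the server over SFTP. The sketch assumes your site files live in a single directory. Typical locations such as `/var/www/html` are assumptions, and the function name is hypothetical.

```python
import tarfile
import time
from pathlib import Path


def backup_site(site_dir, backup_dir):
    """Write a timestamped .tar.gz of site_dir into backup_dir and return its path."""
    site = Path(site_dir)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / f"{site.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the site folder.
        tar.add(site, arcname=site.name)
    return archive


if __name__ == "__main__":
    # Hypothetical paths; copy the result off-server (e.g. via SFTP) afterwards.
    print(backup_site("/var/www/html", "/home/me/site-backups"))
```

This covers files only. A CMS such as WordPress also keeps content in a database, which needs its own dump (for MySQL, `mysqldump`). Two runs within the same second would also produce the same file name.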
Domain Name Issues
We use DNS servers to route users who type your domain name to the IP address of your servers. DNS is the Yellow Pages of the internet: it is impossible to memorize the IP address of every site we visit, so DNS does that job for us. Recursive DNS (the resolvers your visitors query) and authoritative DNS (the nameservers that hold your records) are completely different things, and a failure in either can make your website unreachable. The problem could lie with your visitors’ DNS or with the nameservers you use to publish your IP address. The things to check are quite straightforward:
- Use mxtoolbox.com or dnschecker.com to check whether your DNS records are correct and have propagated to DNS servers around the globe.
- Run nslookup mywebsite.com from your command line to see whether the domain name resolves to the correct IP address. Going further, you can run dig -x followed by your server’s IP address to check that the reverse lookup returns the expected hostname.
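The same checks can be scripted, for example from a monitoring job. The sketch below compares what your local resolver returns with the IPv4 addresses you configured at your DNS provider. It also performs a reverse (PTR) lookup, similar to `dig -x`. It only reflects the resolver the script runs against, so it does not replace a global propagation check. The function names are hypothetical.

```python
import socket


def resolved_ips(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def check_dns(hostname, expected_ips):
    """Compare resolved addresses with the ones configured at your DNS provider."""
    try:
        actual = resolved_ips(hostname)
    except socket.gaierror as err:
        return False, f"resolution failed: {err}"
    missing = set(expected_ips) - actual
    unexpected = actual - set(expected_ips)
    if missing or unexpected:
        return False, f"missing={sorted(missing)} unexpected={sorted(unexpected)}"
    return True, f"{hostname} -> {sorted(actual)}"


def reverse_lookup(ip):
    """PTR lookup, roughly what 'dig -x <ip>' reports; None if there is no record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


if __name__ == "__main__":
    # Hypothetical domain and documentation-range IP; use your real records.
    print(check_dns("mywebsite.com", {"203.0.113.10"}))
    print(reverse_lookup("203.0.113.10"))
```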
Cyber Attacks
The internet has not been a safe place for a long time. Kevin David Mitnick was arrested in 1995 on computer and wire fraud charges, in one of the most publicized hacking cases of the era. The history of such events goes back even further, but from around that time the cybersecurity industry started to emerge, and it has since grown into a billion-dollar industry. Every day a new type of cyber attack appears, making the internet a less safe place. You may think your online assets won’t be hacked because they would not interest a typical hacker, but reality is different. There are even public crawlers that constantly scan websites and apps for known vulnerabilities or weak credentials to list good targets for further attacks. This is called the reconnaissance (recon) stage of an attack. Check out the following link to learn more about the Cyber Kill Chain.
DDoS, vulnerabilities, backdoors, malware, amplification attacks, session hijacking, man-in-the-middle attacks... the list is long, and your website can be subject to all of them. Below is a good to-do list for coping with the hackers lurking around your website:
- Check your server for known vulnerabilities with 3rd party website security apps. Make sure the app checks for the OWASP Top 10 and notifies you of any findings
- A firewall is a must, but you also need a WAF (Web Application Firewall) to block application-layer attacks arriving over tcp/80 and tcp/443. You cannot block these ports with your network firewall without also blocking legitimate traffic!
- Research threat prevention before deploying your website. Identify a malware removal tool and keep it at hand for when such issues arise
- Regularly check whether your website is blacklisted! A listing is a strong sign that your service has become part of a botnet or, even worse, a command-and-control server working for someone else!
- DDoS attacks are the “hit below the belt” of all cyber attacks, as their sheer volume can knock out even large, well-provisioned services. In 2018, GitHub was hit by a 1.35 terabits-per-second memcached amplification attack that briefly took it offline. First, consider deploying an Intrusion Prevention System that can filter out amplification attacks. Second, ask your hosting provider whether they have an SLA with their ISP to increase bandwidth when such attacks occur.
- Back up your website regularly! This is the cheapest yet most important action you can take. Not if but when hackers hit you, you will have a point in time to revert to!
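Checking whether your server's IP address is on a DNS-based blocklist (DNSBL) can be automated. The technique is to reverse the IPv4 octets, append the list's zone, and resolve the result. An answer means the IP is listed, and "no such name" means it is not.

Some caveats apply:

- `zen.spamhaus.org` is used below only as an example zone, and each list has its own usage terms.
- Some lists return special error codes when queried through large public resolvers, so inspect the returned address rather than treating any answer as a listing.
- Domain-level lists such as Google Safe Browsing are separate and have their own lookup tools.

The function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError if invalid
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_lookup(ip, zone="zen.spamhaus.org"):
    """Return the list's answer (an address such as 127.0.0.x) or None if not listed."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN (or a lookup failure): treat as not listed.
        return None


if __name__ == "__main__":
    # Hypothetical server IP from the documentation range; use your own.
    print(dnsbl_lookup("203.0.113.10"))
```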
We hope this guide gives you a good place to start. Uptimezen will be present at every stage mentioned here as our product grows, with the aim of giving you a Zen state of mind when it comes to your website!