Wednesday, November 09, 2011

My experiences with VPS.net


I have used a handful of hosting providers over the past 10 years. Most have been average - a couple have been great and a couple have been awful.

Recently I have been using VPS.net as my primary hosting provider. Using a Linux VPS is not for everyone because it places the entire burden (and it is a burden) of managing the system on you, the owner. However, I have had enough training and experience in the Linux ecosystem that I felt confident I could keep up with what was required. (I have been actively using Linux since 1995, and I am not referring to KDE or Gnome or even X but rather to the shell.)

So what can you expect from VPS.net as a hosting provider? There are a lot of people recommending them, and there are a couple of good reasons for that: they offer a nice bundle of services and quite a few discounts of various kinds. Under regular usage they do perform fairly well, although they appear to have a few more outages than some of the more established providers. The proof of any service provider, however, is in how they handle problems. Do they really care about their customers, or are they only in business to make money? (This is an important thing to keep in mind when reading recommendations for hosting providers as well.)

Here is my experience (and why I am not able to recommend VPS.net to anyone even though I want to).
I have been a customer for a while (a little over a year), and while I am not a particularly large customer I do manage over 30 domains which together often pull over one thousand visitors per day (in the past 30 days I have seen over 30k unique visitors). I also want to say up front that I am not the most technically competent hosting provider. I am pretty good at what I do, and I rarely find a problem I am not able to trace and eventually fix. However, being a systems administrator is not my only hat, so it can take me a while to track down the cause of a problem, and I know there are mistakes that I make and things I overlook. If anyone reading this post has suggestions about things I could have done differently, please let me know, as I am always trying to improve.

About a month and a half ago my primary production server, which hosts about 30 domains, crashed. Well, not quite crashed - the file system switched to read-only mode due to some errors and several files were corrupted. I had to reboot to single user mode and run fsck before I could continue using the server. That took about 45 minutes, during which time all the websites hosted on this server were unavailable. After checking the logs I saw some disk errors, so I contacted support and asked them to check their logs to see if they could suggest the cause of the problem (if it is on my end I want to fix it). They reported back that there were no errors, which is puzzling. My VPS runs on Xen and the disk in question lives on a SAN, so the system does not run on a literal disk; disk access is virtualized, and any disk errors I see will be related to the infrastructure. Those errors are occurring on a piece that is outside of my control, and in theory they should be logged inside Xen or the SAN (or probably both).

Within two days I had another crash identical to the first. By this point I had tracked the problem down to a specific server application and website application: MySQL was initiating the crash, and the queries were originating from a copy of Simple Machines Forum. This still didn't explain the I/O errors, but at least I had something to go on. So although this server instance is not "full", I paid for another two nodes, set up a brand new server instance and moved that one site to the new server. It is significant to note that since this move the server that was originally affected has not had any problems, so the problem is most definitely related to that application instance.

Now with a new server I started working with support more aggressively. I was seeing the system enter read-only mode as often as every other day (once even twice in one day). When I opened a ticket, support would take a look, tweak something (I suppose) and tell me it should be fixed. Often by the time the intermittent problem returned the ticket was old enough that they had closed it, so I ended up opening 3 tickets for the same problem (apparently you cannot re-open a ticket). Each time I opened a new ticket I referenced the old one, but that did not seem to do any good because I still had to re-explain details I had already covered several times. Eventually they admitted there was a problem and "moved me to a quieter part of the network". Since then there have been two more crashes, but it has been two weeks now since the last problem, so I suspect things have stabilized. In the meantime I have had a production server up and down repeatedly, and I have had to refund my customer his hosting fees because the service I was able to provide him was poor. And finally, to add insult to injury, I missed a sale on new nodes. They did not post a close date on the sale, but with all the problems I was having I did not want to buy nodes until I saw that they could fix their problem.
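For anyone who wants to catch the read-only condition before a customer does, here is a minimal sketch of a check that could be run from cron. It assumes a Linux system that exposes the usual `/proc/mounts` format; the function name `read_only_mounts` and the list of ignored file system types are my own illustrative choices, not anything VPS.net provides.

```python
from pathlib import Path

# File system types that are often mounted read-only on purpose.
# This list is an assumption; adjust it for your own server.
IGNORED_TYPES = {"proc", "sysfs", "tmpfs", "devtmpfs", "squashfs", "iso9660"}


def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains the 'ro' flag."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in IGNORED_TYPES:
            continue
        # Compare whole options so 'errors=remount-ro' does not match.
        if "ro" in options.split(","):
            found.append(mount_point)
    return found


if __name__ == "__main__":
    proc_mounts = Path("/proc/mounts")
    if proc_mounts.exists():
        for mount_point in read_only_mounts(proc_mounts.read_text()):
            print(f"WARNING: {mount_point} is mounted read-only")
```

A check like this only reports the symptom. It would not have explained the underlying I/O errors, but it would have told me about each remount as soon as it happened.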

Everyone makes mistakes, and everyone deserves a second chance. My technical experience was awful, but my customer service experience was not necessarily over, so I decided to contact management at VPS.net and see what they had to say. Their response was interesting:
1. I missed the sale end date (a date that was kept secret), so that was that - no sale for me.
2. The outage I experienced was caused by a SAN problem that they knew about but denied (something I only learned later from reading the forums), yet it was treated as my problem. I do not qualify for a refund or any kind of discount. (I did get a weak apology, but they are unwilling to put their money where their mouth is.)
3. They are replacing some of their software (with OnApp), which will "reduce problems in the future". (However, my complaint is not so much with the technical problems or even with the financial ones but almost entirely with the way they have treated me: claiming known issues were my fault, dragging their feet on any kind of fix, and making me open more and more tickets.)
I would have to say that while they are eager to acquire new customers, they have little interest in taking care of existing customers. Customer service is 90% of the customer experience - claiming that things will improve in the future will not prevent me from going out of business today because of their mess, and they have left me feeling very much that they do not care if they lose me as a customer - they will just get another one that is not so demanding.

In conclusion I still like some things about VPS.net and I dislike some things. Their customer service leaves a lot to be desired and this is a concern because they are not addressing the issue. They say they are addressing the issue – by hiring more people – but I used to work in customer service (actually because I own and operate a business – I am very much in the customer service business right now) so I speak from experience when I say that hiring new people impacts response time – it does not change the service offered. Service is a personal thing – implemented one rep and one customer at a time.

So I find myself open to the possibility of another provider. I gave my current provider multiple opportunities to either work with me or provide assurances that they were working towards being able to, and the assurances I received were not reassuring. I do need VPS level access. I could make do without cloud scalability, but it is a nice feature and it is hard to go down the feature ladder. I have heard very good things about RackSpace and have even worked on a server hosted with them a couple of times, so they are next on my list to check out. I am actively following the new OpenStack initiative, and on top of that two of my friends even work for them, so this is probably well past due.