  1. #1
    Crazyguyonabike
    Join Date
    Nov 2003
    Location
    Albany, OR
    My Bikes
    Co-Motion Divide
    Posts
    586

    Crazyguyonabike down

    I woke up this morning to find the server completely down, not even ssh. I had to log in via the remote KVM to see that the server had been rebooted and needed a manual filesystem check, which takes a while with ext2 (I use that rather than ext3 for speed). I called Chi Networks (our colo hosting company) and was told that there was a power outage at the XO datacenter last night, so a lot of stuff went down. The guy I spoke to is still waiting for an explanation from XO Networks. This sort of thing should never happen; they have a UPS in there the size of a small bus. Anyway, I'm busy doing the filesystem check, and then I'll have to check a bunch of other stuff to make sure it's OK (database integrity, etc.), as you do after a hard crash. This is just to let people know what's going on; hopefully we'll be back up as soon as possible.

    Neil

  2. #2
    Senior Member
    Join Date
    Jul 2008
    Posts
    2,133
    Thanks for the update. I was in the middle of reading about Greg White's 2004 tour. Good stuff, but I'll have to wait to finish it.

    Good luck with the restore. If you need any help, let me know. I'm a Linux system administrator with over 10 years of experience. I'd happily donate my time to get your site back up.

  3. #3
    Crazyguyonabike
    Join Date
    Nov 2003
    Location
    Albany, OR
    My Bikes
    Co-Motion Divide
    Posts
    586
    Thanks Jeff, I think we're back up now. The bad news is that the little external USB drive I keep on there for local backup isn't registering with the system any more. It's as if it's just not there. I'm asking the datacenter guys to eyeball it to make sure it's physically present; if it is, I guess I'll have to have them send it back to me so I can test it. Rats, these things always seem to happen together, don't they? Sigh.

    Anyway, the site should be up again now, barring more datacenter outages.

    Neil

  4. #4
    Senior Member
    Join Date
    Sep 2005
    Location
    Hollister, CA
    My Bikes
    Bianchi San Jose, Mercian King of Mercia
    Posts
    455

    Not quite yet

    Hi Neil,

    You're still not up from my end. Best of luck with the trouble-shooting. I only wish I could contribute something useful.

    Mark

  5. #5
    Crazyguyonabike
    Join Date
    Nov 2003
    Location
    Albany, OR
    My Bikes
    Co-Motion Divide
    Posts
    586
    I had to bring the server down again in order to add journaling and directory indexing to the filesystem (ext2 -> ext3 for geeks, plus dir_index). On reboot the filesystem has to be checked again, which takes a wee while. Sorry about that. Hopefully ext3 will be a bit more resistant to losing files after hard crashes; it seems that we lost yesterday's log file for crazyguyonabike, which surprises me and is kind of a bummer. I knew ext2 was less robust than ext3, but I didn't realize you could lose whole files like that. Oh well, you live and learn.
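    For anyone wanting to confirm that a conversion like this took effect: `tune2fs -l /dev/sdX1` (run as root) prints the superblock, including a "Filesystem features:" line, where `has_journal` means the filesystem is ext3 and `dir_index` means hashed directory indexes are enabled. The thread doesn't say which commands Neil used, so this is only a minimal Python sketch that parses that text; the sample output is made up and `sdX1` is a placeholder.

```python
def ext_features(tune2fs_output: str) -> set:
    """Return the feature flags listed on the 'Filesystem features:' line."""
    for line in tune2fs_output.splitlines():
        # Split on the first colon only; values such as timestamps contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()


def is_ext3_with_dir_index(tune2fs_output: str) -> bool:
    """True if the filesystem has a journal (ext3) and hashed directory indexes."""
    return {"has_journal", "dir_index"} <= ext_features(tune2fs_output)


# Illustrative, made-up excerpt of `tune2fs -l` output.
sample = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super
Filesystem state:         clean
"""

print(sorted(ext_features(sample)))
print(is_ext3_with_dir_index(sample))
```

    A plain ext2 filesystem would simply lack `has_journal` on that line.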

    We should be back up shortly, sorry for the delay...

    Neil

  6. #6
    Senior Member
    Join Date
    Jul 2008
    Posts
    2,133
    Neil, you may have had filesystem corruption on the USB drive too. In that case, your drive would fail to mount.

    To see whether the system detects a USB drive at all, you can run "lsusb -v" or "dmesg" (as root). With dmesg, look for lines containing "usb-storage" and the dozen or so lines after them.

    If the drive is there, you can check whether the partitions are in order by running "fdisk -l /dev/sdX" (replace sdX with your USB drive's device name, such as sdb or sdc).

    If the partitions look good, try running a filesystem check on it with "fsck /dev/sdX1" (again, replace sdX1 with your device name and partition number, such as sdb1 or sdc1).
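    The dmesg step (find "usb-storage" and read the dozen or so lines after it) can be sketched in Python as below. This is a hypothetical helper, not part of any tool mentioned in the thread; it assumes you pass it the text that `dmesg` printed, and the sample log lines are made up for illustration.

```python
def usb_storage_context(dmesg_text: str, following: int = 12) -> list:
    """Return each dmesg line mentioning 'usb-storage' plus up to `following` lines after it."""
    lines = dmesg_text.splitlines()
    picked = []
    taken = set()  # line indices already collected, so overlapping windows aren't repeated
    for i, line in enumerate(lines):
        if "usb-storage" in line:
            for j in range(i, min(i + following + 1, len(lines))):
                if j not in taken:
                    taken.add(j)
                    picked.append(lines[j])
    return picked


# Illustrative, made-up dmesg excerpt (real messages vary by kernel version).
sample = """\
usb 1-1: new high speed USB device using ehci_hcd and address 2
usb-storage: device found at 2
usb-storage: waiting for device to settle before scanning
scsi 2:0:0:0: Direct-Access     WD       External         1.75 PQ: 0 ANSI: 4
sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
"""

for line in usb_storage_context(sample):
    print(line)
```

    An empty result means the kernel logged no usb-storage activity, which is the "it's like it's not even plugged in" situation.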

    Good luck!

  7. #7
    Hooked on Touring
    Join Date
    Mar 2004
    Posts
    2,140
    Thanx, Neil.

    Not to mention - - -
    That a lot of folks out here in blogland think you are incredible!

  8. #8
    Crazyguyonabike
    Join Date
    Nov 2003
    Location
    Albany, OR
    My Bikes
    Co-Motion Divide
    Posts
    586
    Hi Jeff,

    I don't know how to interpret all the output from lsusb -v, but I had already done a simple lsusb and dmesg, and nothing comes up. lsusb doesn't list any drives attached at all, and dmesg doesn't mention /dev/sda, which is what the drive should come up as. So it's as if it's not even plugged in. Not being next to the machine, I'm not sure what to do about this except get the techs to unplug the drive and send it back to me... or maybe, if they're up for it, plug it into one of their Linux boxes and reformat it if it's not completely fried.

    But as it never even registers, I'm thinking this must be a more severe problem than corrupt files; the device itself just never appears. Maybe there was some kind of power surge or spike when the event happened last night, and somehow that got transmitted to the poor little USB drive. I mean, it's been working fine up to now, and it seems a bit of a coincidence that we should have this major power outage and the USB drive dies right after it. I'm not sure what else to try except getting them to send the thing home to me and maybe getting an RMA from Newegg for an exchange. Maybe it was just a bad drive; it does happen. Bummer either way. I'm open to other suggestions...
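    Another quick way to ask whether the kernel ever registered the disk is /proc/partitions, which lists every block device and partition the kernel currently knows about. If "sda" isn't there, the kernel has never seen the drive, which matches what Neil describes. A minimal Python sketch (the helper names are hypothetical; the default disk name "sda" comes from Neil's post):

```python
def block_devices(proc_partitions: str) -> list:
    """Return the device names listed in /proc/partitions-style text."""
    names = []
    for line in proc_partitions.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        # The header row also has four, but starts with the word "major".
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def disk_present(proc_partitions: str, disk: str = "sda") -> bool:
    """True if the kernel lists `disk` (e.g. the USB drive) as a block device."""
    return disk in block_devices(proc_partitions)


if __name__ == "__main__":
    try:
        with open("/proc/partitions") as f:
            print("sda present:", disk_present(f.read()))
    except OSError:
        print("/proc/partitions is not available on this system")
```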

    Thanks,

    Neil

  9. #9
    Senior Member
    Join Date
    Jul 2008
    Posts
    2,133
    Hmmm, that does sound bad. If the USB drivers are loaded in the kernel (you can use "lsmod | grep usb" to check), the drive hasn't been unplugged, and it has power, then the likely conclusion is that your drive is dead (as you've already guessed). If you're lucky, the hard drive itself may be okay and only the enclosure's circuit board got fried. In that case, you could try putting the drive in another enclosure and (hopefully) get access to your backups again.
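    The "lsmod | grep usb" check can be mirrored in a few lines of Python, for instance when scripting a health check. This is a hypothetical sketch that takes lsmod's text output as input; the sample output is made up. One caveat: if usb-storage is compiled into the kernel rather than loaded as a module, it will not appear in lsmod at all, so an empty result isn't conclusive on its own.

```python
def usb_modules(lsmod_text: str) -> list:
    """Rough equivalent of `lsmod | grep usb`: names of modules whose lsmod line mentions 'usb'."""
    names = []
    for line in lsmod_text.splitlines()[1:]:  # skip the "Module  Size  Used by" header
        if "usb" in line:
            names.append(line.split()[0])
    return names


# Illustrative, made-up lsmod output.
sample = """\
Module                  Size  Used by
usb_storage            88320  0
ehci_hcd               34380  0
usbcore               134872  2 usb_storage,ehci_hcd
ext3                  117000  2
"""

print(usb_modules(sample))  # ['usb_storage', 'usbcore']
print("usb_storage" in usb_modules(sample))  # True
```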

    Jeff

  10. #10
    Crazyguyonabike
    Join Date
    Nov 2003
    Location
    Albany, OR
    My Bikes
    Co-Motion Divide
    Posts
    586
    Good news: the drive came back to life after a datacenter tech unplugged it and plugged it back in again. Now it registers just fine as /dev/sda1 and I can mount it, etc. I'm guessing it got into some kind of weird state as a result of the power outage, and even with a reboot, the power to the USB drive was maybe never completely cut off. If this happens again, maybe I'll try a complete power down via the remote KVM interface in order to "reboot" the USB drive out of its funk. Strange stuff all round.

  11. #11
    Senior Member
    Join Date
    Jul 2008
    Posts
    2,133
    That's great, Neil! Thanks for getting the site back up. I was able to finish reading Greg White's "Leave of Absence" tour.

  12. #12
    Senior Member
    Join Date
    Aug 2007
    Location
    Delaware, OH
    My Bikes
    Giant OCR2, Peugeot Altitude 21 MTB
    Posts
    166
    Quote: Originally Posted by NeilGunton
    This sort of thing should never happen; they have a UPS in there the size of a small bus. Anyway, I'm busy doing the filesystem check, and then I'll have to check a bunch of other stuff to make sure it's OK (database integrity, etc.), as you do after a hard crash. This is just to let people know what's going on; hopefully we'll be back up as soon as possible.

    Neil
    You are correct, a power outage at a data center is something that must never happen! If they do not give an acceptable reason why this happened, you may be best served by finding a different center to host the site.

    I used to work for a company that developed and manufactured UPS equipment for data centers. While there are exceptions, the majority of times when there is a loss of power to the servers (called "dropping the load"), it is caused by the owner/operator.

  13. #13
    cyclopath
    Join Date
    Apr 2006
    Location
    Victoria, BC
    My Bikes
    Surly Krampus, Surly Straggler, Pivot Mach 6, Bike Friday Tikit, Bike Friday Tandem, Santa Cruz Nomad
    Posts
    5,237
    Neil, I've been following the CGOAB down threads out of interest's sake. Frankly, I barely comprehend what you are talking about specifically, although I do understand the larger issues you are dealing with to keep the site running reliably. In any case, I have a new respect for the work that goes on behind the scenes at CGOAB.

    This summer I've met at least 10 groups of cycle tourists, and in every conversation CGOAB has come up at some point. Either they had a journal there, had found info for planning their trip on your site, or recommended a journal to me that was interesting. It has become a funny common thread across the cycle-touring culture....kind of like "regular folks" meeting at the water cooler and talking about what happened this week on "Friends"...=-)
    safe riding - Vik
    VikApproved

  14. #14
    Crazyguyonabike
    Join Date
    Nov 2003
    Location
    Albany, OR
    My Bikes
    Co-Motion Divide
    Posts
    586
    Quote: Originally Posted by ebrady
    You are correct, a power outage at a data center is something that must never happen! If they do not give an acceptable reason why this happened, you may be best served by finding a different center to host the site.

    I used to work for a company that developed and manufactured UPS equipment for data centers. While there are exceptions, the majority of times when there is a loss of power to the servers (called "dropping the load"), it is caused by the owner/operator.
    I have been told by Chi Networks that XO Networks (who run the datacenter) was doing a failover from the main UPS to the backup when the switch failed, causing the blackout. It wasn't clear whether this was a planned failover test or whether they were just doing planned maintenance on the UPS and it failed during the switchover. In any case, it was basically a bad piece of equipment: a case of the system that provides failover itself failing. I guess that kind of thing happens occasionally, and it's kind of hard to plan for... I'm not an expert in commercial-grade UPS systems, but a bad switch is a bad switch, and if it fails during failover then I guess you're screwed.

    I am in the process of setting up a dedicated server with iWeb, who are based in Montreal, Canada. Their prices seemed pretty reasonable. A larger hard drive was a primary requirement, and many hosting services have fairly anemic offerings in that department, only going to bigger drives with correspondingly more CPU, RAM, etc., which we don't really need. Anyway, they have a package with a 2.4 GHz processor, 1 GB RAM, a 320 GB SATA drive, a 10 Mbps port (upgradable to 100 Mbps for about $10), and 3000 GB of transfer per month for $69, with a $49 setup fee. (I could have had the fee waived if I signed up for a 12-month contract, but I feel more comfortable paying up front and being able to cancel month-to-month if I want to.)

    I was originally just looking for a box that I could set up as a MySQL slave and backup image repository, but with this server I could probably set it up as a warm spare in case my server completely blows up. They are setting it up with a basic install of Debian Etch, which I'll upgrade to Lenny to match the current server; then I'll build all the software so that it's the same as on our current server. It can then be our backup while I am on the road over the next couple of weeks (as we move to Oregon from St Louis). It will feel good to know there is some kind of backup server while my own workstation is offline (usually it acts as the slave and image backup). I also have two external drives here at home which are backed up nightly with rsync, but that wouldn't be happening on the road either.

    With RAID0, there is a bigger chance of a hard drive crash bringing us down completely, so it will be good to have a backup server available. I'm not sure how long it would take to actually get it up and running as the production server, but at least it'll have all the data.
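    The RAID0 point can be made concrete with a back-of-the-envelope calculation. RAID0 stripes data across every drive with no redundancy, so the array is lost if any one drive fails. Assuming (none of this is stated in the thread) that the array has n = 2 drives, that each fails independently with probability p over the same period, and using a purely illustrative p = 0.03:

```latex
P_{\text{array}} = 1 - (1 - p)^{n}
\qquad\Rightarrow\qquad
P_{\text{array}}\big|_{n=2,\;p=0.03} = 1 - 0.97^{2} = 1 - 0.9409 = 0.0591
```

    For small p this is close to np, so a two-drive stripe carries roughly twice the risk of a single drive, which is why a warm spare elsewhere is worth having.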

    Sorry to clog up the touring forum with this stuff, apologies to anyone who's not interested in the crazyguyonabike soap opera!

    Neil
