Wednesday, 7 May 2014

XenServer 6.2 and Ubuntu 14.04 LTS

I'm currently using XenServer 6.1/6.2 on my production servers to manage my VMs. I mostly need Linux on the server side, and I chose Ubuntu Server as my distribution for many reasons that are out of the scope of this post ;-)

Anyway.. everyone knows that for production servers running Ubuntu it's better to use an LTS release: 10.04, 12.04 and, for a bit less than a month now, 14.04

Today I needed to test a new service, so I decided to start testing the 14.04 release. I downloaded the .iso and started creating the new VM inside XenCenter when.. wait! Of course there's no template for 14.04, only for 12.04!
A quick Google search pointed me to this Ubuntu forum topic, which has the right solution.
Well, there are two solutions: one that requires a manual change to the Ubuntu installation process and one that requires a (small) change to the XenServer tools.
I decided to use the first one, and here it is:

  • install Ubuntu as usual, but stop before installing grub
  • choose "Back" and, from the menu, choose to open a shell
  • now enter the following commands
chroot /target
apt-get install grub
grub-install /dev/xvda
update-grub


  • the last command asks for confirmation before creating menu.lst
  • quit the shell by entering the "exit" command twice
What we've done is simply revert from grub2 to the "old" grub, which is correctly supported by the pygrub provided with this version of XenServer.

Now you should be back at the installation menu. Skip the "Install grub" option and choose to continue without installing a bootloader. There will be a warning, but don't worry: the installer doesn't know it, but you have already installed the bootloader manually.
The installation procedure requires a few more steps to complete, but the hard part is done.
Once the last step is completed, the VM will reboot and Ubuntu 14.04 LTS will boot perfectly on XenServer 6.2

Please also note that you now need to install the guest utilities from the XenServer tools ISO attached to the VM. Again, 14.04 is not recognized by the install script, but this is easily bypassed by entering two commands:

sudo mount /dev/xvdd /media/cdrom
sudo dpkg -i /media/cdrom/Linux/xe-guest-utilities_6.1.0-1031_amd64.deb
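
The version in the file name above (6.1.0-1031) depends on the tools ISO you have, so the exact path may differ on your system. As a minimal sketch, assuming the packages sit in /media/cdrom/Linux as above, a small helper (the name find_guest_utils_deb is hypothetical) can pick the .deb that matches a given architecture:

```shell
#!/bin/sh
# Hypothetical helper: print the path of the xe-guest-utilities .deb that
# matches the requested architecture.
# $1: directory holding the packages, e.g. /media/cdrom/Linux
# $2: Debian architecture name, e.g. amd64 or i386
find_guest_utils_deb() {
    dir=$1
    arch=$2
    for deb in "$dir"/xe-guest-utilities_*_"$arch".deb; do
        # If nothing matches, the glob stays literal and the file test fails.
        if [ -f "$deb" ]; then
            printf '%s\n' "$deb"
            return 0
        fi
    done
    return 1
}
```

It could then be used as: sudo dpkg -i "$(find_guest_utils_deb /media/cdrom/Linux "$(dpkg --print-architecture)")"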

That's all!

Friday, 8 November 2013

Windows XP updates, svchost and IE8

There's something really weird in the world of WinXP updates.
Here is my story.

I manage nearly 25 clients, more than half running WinXP Pro and the rest Win7 Pro (a mixed 32 and 64 bit environment).

One week ago, an XP user called me saying that her PC was too slow to let her work. I went to take a look and found a svchost.exe process eating 100% CPU. Killing that process freed all the resources and the PC was usable again.
The day after, at the next reboot, the same thing happened: 100% CPU used by svchost.exe and an unusable PC. Killing that process solved the issue for the whole working day.
"Unfortunately" the user was smart enough to kill that process on her own every day.
Even "worse", I had to travel for a few days and, in the meantime, the "trouble" spread: 2-3 other clients (physically close to the first) experienced the same problem, with the "same" solution working.

After coming back home I decided to take a closer look at the problem.
I was a bit worried about a virus spreading around, but I also knew that I have a good antivirus on all clients and that my users can be trusted.

I had no idea how to solve this issue: googling for "svchost.exe 100% CPU" is like googling for "news" and trying to find the exact news item you're looking for.

Fortunately the same problem hit another user, a skilled developer in the R&D department. He found the same issue on his XPMode VM inside Win7 Pro, and he solved it! How? "By installing Windows updates.."

Mmmmhhhh...

Windows updates are automatically installed on all clients (apart from the XPMode VMs, which are managed directly by the developers), so this didn't give me much of a hint.. but it was enough for Google!
Adding "windows update" to the other keywords gave me this as the first result:





http://answers.microsoft.com/en-us/windows/forum/windows_xp-windows_update/latest-windows-xp-update-and-svchostexe-problems/57ff2a95-3a9c-4e85-a879-b340c65acfa5

It took only a couple of minutes to gather some statistics about my WinXP clients:

  • all the PCs affected by the svchost.exe problem have IE6 installed
  • all the PCs unaffected by the svchost.exe problem have IE8 installed
Installing IE8 as described in the MS answer above (or by downloading it from this direct link) solved the issue!


The IE8 update was one of the "optional" updates for WinXP: IOW, the user can say "no" to it and continue to live happily with IE6 (we all use Firefox as the default browser)

Thursday, 18 July 2013

Ubuntu 12.04, USB3 and Suspend

The only BIG problem I have on my Dell Vostro 3550 laptop with Ubuntu (I only use LTS versions, so I'm still stuck with 12.04) is with the USB3 ports.

Looking deeper into the problem, I found that devices connected to the USB3 ports stop working (they are not even detected) only after resuming from suspend.. and my laptop is always suspended and never shut down!

I googled around and found this interesting topic on askubuntu.com which, in turn, links to this solution for the Asus K52 / Asus A52.

I adapted the suspend/resume script to my laptop as follows:

root@kimbamon:~# cat /etc/pm/sleep.d/20_dell_vostro_usb3_suspend
case "${1}" in
        hibernate|suspend)
              # Switch USB3 bus off
              echo -n "0000:0b:00.0" > /sys/bus/pci/drivers/xhci_hcd/unbind
        ;;
        resume|thaw)
              # Switch USB3 buses on
              echo -n "0000:0b:00.0" > /sys/bus/pci/drivers/xhci_hcd/bind
        ;;
esac

After editing the file with your favorite editor, don't forget to make it executable:

chmod +x /etc/pm/sleep.d/20_dell_vostro_usb3_suspend

As far as I have tested, this solves my USB3 issue: USB3 peripherals now work even after resuming from suspend.

Please note that, in case you need it, you can also run the script manually (as root, of course), passing "suspend" and then "resume" as its argument, to simulate a suspend/resume sequence and thus reset the USB3 bus.
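
The PCI address 0000:0b:00.0 in the script is specific to my laptop. A minimal sketch for finding the right one on another machine, assuming the controller is currently bound to the xhci_hcd driver (the function name list_xhci_addresses is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: list the PCI addresses currently bound to the
# xHCI (USB3) driver.
# $1: driver directory; defaults to the sysfs path used in the script above.
list_xhci_addresses() {
    driver_dir=${1:-/sys/bus/pci/drivers/xhci_hcd}
    for entry in "$driver_dir"/*; do
        name=${entry##*/}
        # Bound devices show up as entries named domain:bus:device.function,
        # e.g. 0000:0b:00.0; other entries (bind, unbind, ...) are skipped.
        case "$name" in
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7])
                printf '%s\n' "$name" ;;
        esac
    done
}
```

Calling list_xhci_addresses without arguments prints one address per line; use that value in place of mine in both echo lines of the script.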


Monday, 3 December 2012

Don't let Ubuntu kill your laptop HDD!

A simple question: which "lifestyle" makes your (laptop) hard disk last longer?
  1. keep it running all of its life, no matter whether you need to read from it or not
  2. turn it off (well.. spin it down) as soon as possible, and turn it on again as needed
If you chose 2), well.. be prepared. You are silently killing your HDD!
Spin-up is in fact quite expensive for a hard drive, which, by design, is rated only for a limited number of spin-down/spin-up sequences.
So, if you don't strictly need to preserve your laptop battery, please leave it spinning!

Why can Ubuntu be an HDD serial killer?
Because, by design, it spins down your drive quite often. Open a terminal and try the following command line:

 sudo smartctl -a /dev/sda | grep -i "Start_Stop_Count\|Load_Cycle_Count\|Power_On_Hours"
  4 Start_Stop_Count        0x0012   097   097   000    Old_age   Always       -       5238
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       719
193 Load_Cycle_Count        0x0012   095   095   000    Old_age   Always       -       55359


As you can see, on my 1-year-old laptop I got nearly 720 hours of work but 5238 (!!!) start/stop sequences and 55359 load cycles.
It's as if I had turned on (or put in standby) my laptop about 15 times a day, every day (even on Christmas), for the last year!
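
To put those counters in perspective, it helps to divide them by the power-on hours. A minimal sketch, assuming the attribute lines look like the smartctl output above with the raw value in the last column (the function name smart_wear_rates is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read smartctl attribute lines on stdin and print the
# average number of start/stop and load cycles per power-on hour.
smart_wear_rates() {
    awk '
        $2 == "Start_Stop_Count" { starts = $NF }
        $2 == "Power_On_Hours"   { hours  = $NF }
        $2 == "Load_Cycle_Count" { loads  = $NF }
        END {
            if (hours > 0) {
                printf "start/stop per hour: %.1f\n", starts / hours
                printf "load cycles per hour: %.1f\n", loads / hours
            }
        }
    '
}
```

Piping sudo smartctl -a /dev/sda into smart_wear_rates with my numbers gives about 7.3 start/stop sequences and 77.0 load cycles for every hour the disk was powered on.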

How to change this?
I found this solution, which is also useful for managing your battery consciously:
  • install laptop-mode-tools
sudo apt-get install laptop-mode-tools
  • edit /etc/laptop-mode/laptop-mode.conf with your preferred editor and look for the BATT_HD_POWERMGMT setting. If it's 1, be prepared for early disk death!
  • change it to a reasonable value (like 100; 254 is the highest, while 255 disables power management altogether). The -B switch section of man hdparm will help you understand this value
  • save the file and type
sudo service laptop-mode reload

This will apply the changes.
Just to be sure, type

sudo hdparm -B /dev/sda

to check the applied value.
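
According to man hdparm, -B values from 1 to 127 permit spin-down, values from 128 to 254 do not, and 255 disables Advanced Power Management altogether. A minimal sketch that turns a level into that description, assuming an integer argument (the function name apm_describe is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: describe an hdparm -B (APM) level, following the
# ranges documented in man hdparm. Expects an integer argument.
apm_describe() {
    level=$1
    if [ "$level" -ge 1 ] && [ "$level" -le 127 ]; then
        echo "spin-down permitted"
    elif [ "$level" -ge 128 ] && [ "$level" -le 254 ]; then
        echo "spin-down not permitted"
    elif [ "$level" -eq 255 ]; then
        echo "APM disabled"
    else
        echo "invalid level"
    fi
}
```

For example, apm_describe 254 reports that spin-down is not permitted, while apm_describe 1 (the dangerous default mentioned above) reports that it is.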

Long life to your laptop's HDD!

Thursday, 16 August 2012

Synology NAS Disaster Recovery


A good sysadmin should do good work, especially when preparing for the worst.

One of the things that new sysadmins usually understand only after it has already happened is disaster recovery.

Sooner or later, no matter how well you work or how much you pay for your hardware, something will fail or go wrong.
It may be a disk or a power supply, and that's not so bad: if you are anything more than a young student, you always use at least RAID1 and redundant power supplies. Just hot/cold plug your spare part (because.. you have spare parts right under your desk, don't you?) and you're done, back to your desk.

What if something worse happens? Don't think about fire or earthquakes: if you have only one site and everything burns or falls down, well.. no one will be working anyway, even if you can restore your mail server in just half an hour!

Let's say that you have a NAS where you store all your server data.
What if its motherboard fails?
Well.. I know it's quite rare for a CPU or chipset to die nowadays, but it can happen.
In that case all your redundant power supplies and RAID5+hot spare disks become inaccessible..

The easy and costly solution is to have two of them: move the disks from one to the other, restore the configuration and you're done: 10 minutes of downtime!
But it costs too much for something that will probably never happen in 10 years.

I faced this problem a few days ago, when I decided to replace most of my storage with a centralised NAS. A SAN costs too much for my business, so my hardware vendor suggested a good NAS: a Synology RackStation RS2212RP+.
That's a 10-bay SATA NAS with dual Gigabit Ethernet with aggregation, redundant power supplies and some really cool additional features.
One of the coolest is that... it's Linux based!!!
What can you do with a Linux-based business-line NAS without a hardware RAID controller, in case of disaster recovery?
Just substitute it with a standard Linux PC until you receive the new NAS!!

It's pretty easy, if you know at least a bit of Linux administration via the command line, to replace this NAS (at least for its basic features). It will have a $0 impact on your (always too small) IT budget (for sure you have a spare PC with 2-3 SATA ports somewhere.. don't throw it away too soon!!) and you can recover your data in less than half an hour (if you are prepared!)

In my setup I have:
  1. 4 HDDs configured as RAID5 + hot spare. This way 2 disks are enough to have a working RAID
  2. On top of this RAID5 I built a disk group, which allows me to create more than one iSCSI LUN and/or more than one NFS/Samba share
  3. I also created 1 LUN (and 1 iSCSI target) plus 1 volume shared via NFS
This is quite a complex setup and, probably, the most complex you can do with this Synology NAS. Less complex configurations (RAID1, or a single volume instead of a disk group) will require less work.
This is what I did when simulating a really bad hardware failure of my brand new NAS:
  • plug 2 of the NAS disks (not the hot spare, of course!) into an empty Linux box. I used a standard Dell desktop, without any disks.
  • boot a live distro from an Ubuntu 12.04 LTS Desktop USB stick
  • once the boot is complete, install some more packages, which are not included by default in the desktop edition:
sudo apt-get install mdadm lvm2

That's because I'll need to work with software RAID and LVM.

  • Now, a bit of scanning to find the RAID device

sudo mdadm --assemble --scan

The above command scans the physical disks to find an already created array, and it works automatically. With just 2 disks the array is degraded, but the data is accessible.
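
To confirm which array came up degraded, you can look at /proc/mdstat, where each array has a member map such as [UU_] and an underscore marks a missing disk. A minimal sketch that prints only the degraded arrays, assuming the usual mdstat layout with mdN names (the function name degraded_arrays is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat style text on stdin and print the
# name of every array whose member map contains a missing device.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        # The status line of an array ends with a map such as [UU_];
        # "U" is a working member and "_" a missing one.
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}
```

Usage: degraded_arrays < /proc/mdstat. In the recovery scenario above the assembled array is expected to show up in the output, since one of its disks was left in the NAS.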

  • It's time for LVM! Fortunately the Synology guys didn't create something weird and custom, but used all the power of Linux to provide the flexibility of their "Disk Group" feature. Scanning for LVM is similar to what I did for RAID:
  1. sudo pvscan
  2. sudo vgscan
  3. sudo vgchange -a y 
  4. sudo lvscan
Which
  1. scans for physical volumes (PV)
  2. scans the PVs to look for volume groups (VG). This operation takes a bit of time, but no more than a minute on my 2TB disks
  3. activates the VGs that were found
  4. scans the VGs to look for logical volumes (LV)
LVs are the end result of the LVM stack; if you are new to LVM, think of an LV as a standard disk partition.
If you have only created "volumes" (in Synology terms) to export via NFS/Samba, you're done. Just mount the LV and access your data!

sudo mount /dev/vg1/nfs_share /mnt/nfs
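
The VG and LV names in the mount command come from what lvscan reported on my disks, so yours may differ. A minimal sketch that turns the output of sudo lvs --noheadings -o vg_name,lv_name into device paths (the function name lv_paths is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read "vg_name lv_name" pairs on stdin, one per line,
# and print the matching /dev/VG/LV device path for each.
lv_paths() {
    awk 'NF == 2 { print "/dev/" $1 "/" $2 }'
}
```

Usage: sudo lvs --noheadings -o vg_name,lv_name | lv_paths. While recovering data from a broken NAS it may also be prudent to mount read-only, by adding -o ro to the mount command.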

If you want to access iSCSI data you have two choices:
  1. configure an iSCSI target on your Linux box to export the LV and mount it on a client. This takes a bit longer and is outside the scope of this article. Take a look here, for example, for a really good tutorial.
  2. mount the partition locally.
You cannot, AFAIK, mount the LV directly if it has been used as an iSCSI target. In fact the client (the initiator, in iSCSI terms) sees the target as a disk, so it will, at the very least, have created a partition table on it. So, before mounting, you must scan for a partition table and create the right block devices. Nowadays this is pretty easy:

sudo kpartx -a -v /dev/vg1/iscsi_0

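The above command scans the LV and, if a partition table is found, creates the corresponding devices; thanks to -v it also prints the names of the mappings it adds under /dev/mapper, so use the name it reports when mounting. In my simple test-bed I had only one partition, so I could just mount it to have my files back on-line: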

sudo mount /dev/mapper/iscsi_0p1 /mnt/iscsi
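
The mapping name may differ on your system (it can, for example, include the VG name), so it is safer to take it from what kpartx prints. A minimal sketch, assuming kpartx -a -v reports each new mapping on a line starting with "add map NAME" (the function name kpartx_devices is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read the output of "kpartx -a -v" on stdin and print
# the /dev/mapper path of every partition mapping it added.
# Assumes lines of the form: add map NAME (major:minor): 0 SIZE linear DEV OFFSET
kpartx_devices() {
    awk '$1 == "add" && $2 == "map" { print "/dev/mapper/" $3 }'
}
```

Usage: sudo kpartx -a -v /dev/vg1/iscsi_0 | kpartx_devices, then mount whichever of the printed devices holds your data.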

That's all folks!
Now, order a new Synology NAS and replace the unlucky one!


Sunday, 1 July 2012

When a RAID1 fails on your critical server

As I already said: hard disks fail. Even branded 15k rpm SCSI HDDs fail. Of course in the worst way, on the worst day.

I was enjoying my Saturday afternoon at home with my 2-year-old baby when I looked at my "sysadmin" mobile (I have one just for monitoring alarms and other notifications).
One message: not so bad, today is "full backup" day, so the system will be heavily loaded while updating the backup DB, and a warning about CPU usage is usual, but..

"*** PROBLEM Service Alert: xxxxx RAID status is CRITICAL"

xxxxx is my "main" server, which hosts most of the non-development services: DNS, ERP, DB server, wiki, administrative storage..
I immediately connected to my monitoring system via web browser to get more details about the failure. The RAID1 that failed is md0.. CRITICAL means that one of its devices has gone offline.
Of course, md0 is the root device.. even worse, of the two HDDs that make up my failed RAID1, the one that failed also has a partition that's used as swap.
What does this mean? The filesystem is OK (the RAID1 is degraded but functional), but processes go crazy because they cannot access their memory if it has been swapped out.
Of course the main processes (oracle, mysql, apache, java) have some pages swapped out, so they are locked.
But Linux is strong: I could still access the server via SSH and do some useful things to prevent further system corruption. I killed some CPU-intensive processes, remounted all filesystems read-only and, finally, tried to reboot.
Well.. reboot did not work. Even init had some pages on swap, and it couldn't do its job.
I had to go to the server room.
By pressing the power button I was able to turn off the failed server.

Luckily I had some used spare parts (the server is pretty old; new spare parts are hard to find and cost too much): two SCSI disks (larger than the one that failed, fortunately) were perfect.
After thinking about replacing the failed hard disk (which, of course, also holds the GRUB MBR), I chose a different way.
The failed hard disk is not completely broken: it just failed a few SCSI transactions (probably due to heavy swap usage) and the SCSI stack kicked it out.
As far as I could tell, I could still use the disk, at least for booting and nothing more. So I added the spare HDD to an empty slot, turned on the server and crossed my fingers.
Everything booted fine! yeah!
After that I partitioned the spare HDD pretty much like the failed one, with a bigger swap space, and:

sudo swapon /dev/sde2 #the spare part
sudo swapoff /dev/sda2 #the failed one

A small change to /etc/fstab makes the new swap settings apply on the next reboot.
I also added the spare disk to the RAID1:

sudo mdadm --manage /dev/md0 --add /dev/sde1 #the RAID partition on the spare disk (sde2 is now swap)

Now cat /proc/mdstat says that the array is rebuilding, and the CRITICAL state has turned into WARNING.
After half an hour of reconstruction, WARNING turned into OK.
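
While waiting, the rebuild percentage can be pulled out of /proc/mdstat instead of reading the whole file each time. A minimal sketch, assuming the usual progress line of the form "recovery = 12.6% (...)" below the array entry (the function name rebuild_progress is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat style text on stdin and print
# "ARRAY PERCENT" for every array that is currently rebuilding.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            # The percentage is the field right after "recovery =" or "resync =".
            for (i = 2; i < NF; i++)
                if ($i == "=" && ($(i-1) == "recovery" || $(i-1) == "resync"))
                    print name, $(i+1)
        }
    '
}
```

Usage: rebuild_progress < /proc/mdstat, for example wrapped in watch to refresh it every few seconds.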
For sure I'll have to do some maintenance on this server, but not before enjoying the rest of Saturday and the whole Sunday!!!

In the end I was lucky: my monitoring system worked well, my used spare parts were useful, and Linux is so well structured that a major issue like this was resolved before my pizza got cold. But a few things are worth remembering:

  1. always have some spare parts for your critical servers
    • hard disks of the same technology (SATA, SCSI, SAS), at least as big as the ones you are using but still compatible with your hw/sw stack
    • power supplies (especially if they're not ATX compatible)
    • RAM, even if it's not so critical. Usually a server can work with some memory banks missing, but it's better to have spares
    • the best option is a perfect clone of your working machine, turned off and ready to be turned on or to be sacrificed for spare parts. This usually cannot be done (due to budget limits), but it's easy for old servers and used hardware (e.g. from eBay). Be sure to heavily test any used hardware you purchase before counting on it for spare parts!
  2. always have a monitoring system. It's better to learn that something failed on Saturday evening, when nearly no one is working, than to find out on Monday morning after the first users notice that "there's something wrong"
  3. nearly everything, in hw and sw, should be redundant. It's no good having RAID1 for the file system when a failure of the swap device hangs your server! The bootloader should be redundant too: if you have mirrored the boot device (which usually holds the root file system), you should also install the bootloader (e.g. grub) on both mirror devices
  4. be prepared, and periodically check your monitoring, recovery system and spare parts
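
For point 3, mirroring the bootloader means running grub-install once per disk of the mirror. A minimal sketch that only prints the commands for review, assuming sdX-style member names such as sda1[0] in /proc/mdstat and an array line of the form "md0 : active raid1 ..." (the function name mirror_grub_commands is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat style text on stdin and print one
# grub-install command per whole disk backing the given array.
# $1: array name, e.g. md0. Nothing is executed, the commands are only printed.
mirror_grub_commands() {
    awk -v array="$1" '
        $1 == array && $2 == ":" {
            # Fields 3 and 4 are the state and RAID level; members follow.
            for (i = 5; i <= NF; i++) {
                disk = $i
                sub(/[0-9]*\[.*$/, "", disk)   # sda1[0] becomes sda
                print "grub-install /dev/" disk
            }
        }
    '
}
```

Usage: mirror_grub_commands md0 < /proc/mdstat, then run the printed lines with sudo after checking that they name the right disks.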

Sunday, 24 June 2012

How to re-install a WinXP downgrade from Seven (in 2012)

Hard disks fail; sometimes, without any warning, they just "die". Yesterday everything was fine and today, when you turn on your workstation, it simply does not boot.
Fortunately, this time the user was smart and recognized that something was wrong: he reported to me that his workstation was pretty slow, taking longer to boot every day. This slowness seemed to be caused by heavy disk activity. Well.. WinXP (like all Windows versions) has the strange habit of becoming slower as time passes.. this is OK if you're a human getting close to 60, but a bit weird (IMHO) if you are a 2-3 year old PC.
Having smartmontools on my Sysadmin Swiss Knife USB key, I just ran "smartctl -t short sda" (this is a WinXP machine, but smartmontools is available for Windows too and works pretty well there). The tool reported an error on a sector: usually this means that the drive will fail soon.
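
The result of a short test ends up in the drive's self-test log, which smartctl -l selftest prints. A minimal sketch for spotting failed entries in that log, assuming a POSIX shell with awk is at hand (for example on a Linux rescue key) and that log entries start with "# N" and carry a status such as "Completed: read failure" (the function name selftest_failures is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -l selftest" output on stdin, print
# every logged self-test that reports a failure, and return success (0)
# only when at least one such entry was found.
selftest_failures() {
    awk '/^# *[0-9]+ / && /failure/ { print; found = 1 } END { exit !found }'
}
```

Usage: smartctl -l selftest /dev/sda | selftest_failures && echo "replace this disk soon".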
Fortunately (again) I had just bought a spare SATA disk a few days before: after speaking with the user and his manager, we decided to re-install the whole system on the brand new HDD ASAP.

Reinstalling WinXP should be pretty easy.. or not? Well.. this is a Win7 Pro machine, one of the first we bought after WinXP reached EOL and OEMs stopped selling WinXP (does Canonical say that you cannot install Ubuntu 8.04 anymore????? I hate these policies on commercial software..)
We chose to downgrade it to WinXP Pro because ALL our machines run WinXP: why should I waste my sysadmin time managing two Windows versions if I can manage only one? Why should my users have to learn another OS if they can work with a more familiar one?

Let's go back to the install issue.. I couldn't find the original CD provided with the PC, so I took another WinXP SP2 disc and installed from that.. When the installer asked for the Product Key I entered the one printed on the PC chassis label.. and it was wrong? Damn..
Looking a bit closer at the Product Key, the label says that it's a Win7 product.. Damn (again)
Finding solutions to problems with closed source products is harder than with FOSS.
After a bit of googling I fell back on the old-fashioned "phone call to the one who sold it". The answer was something like this: "don't worry, enter the product key of another WinXP instance".
mmmhhh..  "doesn't this break the activation of the "other instance"?"
answer: "well.. it may, but usually it does not." - in my experience this is not true.. but some of you may try this at home
me: "any other solution?"
answer: "enter the product key of the other instance but do not activate on-line. Choose to activate by phone" - in 2012?!?!?!?!?!!? arggggggggghhhhhhh!!! - "a recorded voice will ask you to enter the code displayed on screen. Do not enter anything: the voice will repeat itself a few times and then hand you over to a (human) operator. Tell the operator that you have an old instance of Win7 to downgrade to WinXP" - old because you cannot downgrade anymore, sic - "he will ask you for the Win7 product key and should generate a new key to activate WinXP correctly"

Damn.. this will take a lifetime.. I bought the license, I paid for it, and now I have to waste a lot of time on the activation process.. BTW Oracle does not do anything like that, even with its database licenses. They just kick your ass if they catch you, but let you work without wasting time..
I went through the whole process and entered a big number (nearly 40 digits, I think) at the end of the activation process.

So, if you have to re-install a Win7-to-WinXP downgrade, proceed as follows:

  1. install from any WinXP CD and enter any valid Product Key you have
  2. when activating the product, DON'T activate online; do it by phone
  3. when the recorded voice asks you to enter the numbers shown on screen, do NOT enter them. Even a recorded voice will, sooner or later, get bored with your inability to press phone digits, and will redirect you to a human operator
  4. the human operator is, of course, human, so you can speak with him/her and explain what you're trying to do
  5. if he/she says that a downgrade is not possible in 2012, say that you're a sysadmin re-installing an old workstation: it HAS to be WinXP due to policy restrictions at your site. This should be enough to go ahead with the activation process
  6. tell him/her your Win7 Product Key (which is a bit hard over the phone.. but possible!)
  7. after a bit of work he/she will generate a new product key for WinXP and redirect you to the recorded voice (again!), which this time will just read you the very-long-number to enter on-screen for activation
  8. you're done. Enjoy your eXPerience with licensed products!!!!

I'm a Linux sysadmin but, of course, I have to work with Windows too.
The lack of a decent default command line shell in Windows is already a big problem for me.
The lack of a decent default REMOTE command line shell is a bigger problem for me.
The lack of a default package manager, capable of installing/upgrading/updating not only the OS but also all the installed software, is another problem for me.
But.. managing licenses and activation codes is even worse. I bought the OS license (you have to: only a few OEMs give you the option of buying a workstation without an OS!!) and now I cannot use it because MS just says that the product is EOL?

Cut the product support (which, BTW, I've never used), cut the bug/security hotfixes (which I do install, but which are meaningless without a good antivirus) but, please, let me easily install software that I have already paid for!!!