VirtualQube

VirtualQube Blog

Author Archives: Sparlee

A First Look At XenClient

If you’ve been following our blog for a while, you know that XenClient is the new client-side hypervisor from Citrix. Its purpose is to let you take your virtual desktop with you while still having an elegant way to keep it up to date and to sync your important documents. We’ve been testing the “Release Candidate” that Citrix recently made available as a public beta.

Even though it is obviously not finished code, it’s pretty impressive!

Our Dell Latitude demo system is configured with two VMs: one Windows 7 and the other Windows XP. I have Access 2003 installed on the XP image and Access 2007 installed on the Win7 image, and I’m “passing through” Access 2003 from the XP VM to the Win7 VM. In other words, I can “publish” an application from one desktop (in this case, Access 2003 from the XP desktop) and “subscribe” to it from the other desktop. In practice, this looks much like a XenApp published application running on the client device.

There are a couple of advantages to this. The obvious one is that an application that won’t run on Win7 can be installed on the XP desktop and made available to the Win7 desktop. A more subtle advantage is in the area of security. For example, let’s assume that the XP desktop is your “business desktop,” and is locked down such that the user has no administrative rights. Let’s further assume that the Win7 desktop is your “personal desktop,” and you have the rights to do whatever you want with it – which could include getting infected with malware. But the applications running on the business desktop cannot be affected by malware on the personal desktop – even if they’re being passed through.

In an earlier blog post, we linked to a Citrix TV video that demonstrated this “secure application sharing.” In that video, they’ve deliberately infected one desktop with a keylogger. You can see that any interaction with a browser running on that desktop is being logged by the keylogger. However, a browser session that is running on the other desktop, but being passed through to the infected desktop, is immune to the keylogger. Pretty cool.

With regard to functionality, I’m very hopeful that Citrix will fix some of the issues we’ve seen in the RC. Here are some of the issues reported on the Citrix online forums, some of which we’ve seen ourselves:

  • Many people are finding problems with simple devices such as mice, even on hardware that is on the Hardware Compatibility List (HCL). Smart cards are also an issue.
  • XenClient requires a few different virtualization technologies to be present in order to function correctly, so today the HCL is pretty limited. It should improve over time, but for now be sure to check the HCL carefully. There is an HCL included with the XenClient 1.0 RC User Guide.
  • HDX (High Definition) video/audio:
    • If you run both a corporate desktop and a personal desktop at the same time, only one VM can have HDX running at a time. To switch HDX functionality between VMs you have to shut them down; it cannot be done on the fly. This is unfortunate because without HDX, video is really choppy and difficult to watch. Citrix has already said this will not change before RTM (Release to Manufacturing).
    • If you are taking advantage of the feature we described earlier where you publish an application from one desktop and subscribe to it from the other, you can have HDX running in the subscribing desktop, but not in the publishing desktop.
  • We’ve not yet been able to do a successful physical-to-virtual (“P2V”) migration of a desktop OS into the XenClient environment. Citrix has said it will release a version of XenConvert that will be able to do this, but they say it probably won’t be until after RTM.
  • Integrated video cams do not work. This could be a significant issue, since the product is aimed at “road warriors,” and many of them will want to use a cam for meetings. It supposedly supports USB video cams, but we have not yet tested this. However, I’m concerned that many users will push back on having to carry an extra peripheral. We’ve been told by Citrix that this should be working by RTM.
  • OS Snapshots are not available yet but should be in a future release.
  • No support for 64-bit guests yet.
  • Graphics support for non-Intel graphics chipsets is limited.

Still, this is shaping up to be a great product that will make life easier for many a desktop administrator. If you’ve ever had to manage desktops, you’ve had to deal with this “Catch-22:”

  1. My users are breaking their desktops…I need to lock them down.
  2. When I lock them down, I end up with managers in my face because they can’t install their favorite (fill in the blank).
  3. I back off and give them local admin rights so they can install (fill in the blank).
  4. Return to Step 1, repeat ad nauseam.

XenClient gives us a glimmer of hope that we may be able, sometime soon, to break out of this cycle!

Citrix Branch Repeater VPX Licensing Tutorial

I recently implemented both the new Citrix Access Gateway (CAG) VPX and the Branch Repeater VPX in our development lab. Both are “virtual appliances” designed to run directly on a XenServer host. Both are impressive products and work great. In fact, we can use XenMotion (live migration) to move the CAG between XenServers while running video in a XenDesktop session, without even a pause in the video playback. The CAG moves with no interruption in service. NONE!

But this isn’t just a post to sing the praises of the virtual appliances. Rather, it’s about LICENSING!!! Specifically, licensing the Branch Repeater VPX.

As with many Citrix products, obtaining the license and getting it properly installed is not necessarily easy and intuitive…and in many cases (particularly with new products), we’ve found that the Citrix licensing support team does not know all the ins and outs of licensing a specific product either. That is not intended as a slam on this team. They do the best they can – but Citrix is a big company now, and sometimes it takes a while for information on new products to filter down to the front-line troops. In this case they worked with me for quite some time until we got this figured out (so there is at least one guy on the Citrix support team who now knows how this works).

So…now that I’ve gone through the pain, I thought I’d try to spare you from it if I can. (You’re welcome.)

One complication you’ll encounter is that, depending upon what you’re trying to accomplish, these appliances may require one license or two. For example, with the CAG, if you are only going to use it for running secured sessions to a Web Interface site (the equivalent of the legacy Citrix Secure Gateway), then you only need a “platform license.” However, if you also plan to run SSL VPN sessions through the CAG, you will need Access Gateway Universal licenses for your users, which will be rolled into a second license file.

Access Gateway licensing isn’t new and it’s pretty well understood. But what about the Branch Repeater? Just as with the CAG, the Branch Repeater may require one license or two, depending upon the functionality you need. If you are going to use the Branch Repeater VPX to connect to another (physical or virtual) Branch Repeater then you only need a platform license. However, if you want to take advantage of its ability to support client PCs that use the Branch Repeater Plug-in, you will need a second license to enable that feature. So we finally come to the topic of this post: how do you get the license file(s) onto your new Branch Repeater VPX?

First, you must log onto the “MyCitrix” web site with your account credentials, and access the Licensing Tool Box to activate and allocate the license. That part of the process is well documented, and if you’re a Citrix customer, you’ve probably done it at least once. The tricky part is what you have to do to download the VPX license file, what you need to enter in the Repeater itself, where to put it, and what you should see.

Here’s what we learned:

  1. On the Branch Repeater VPX Web-based management interface, access the “Manage Licenses” screen, and in the right panel, choose “local” as shown below, and click the “Apply” button.

    License Server Configuration

  2. Then click the “License Information” tab, and you will see something similar to this next image. What you need from this screen is the “Local License Server Host Id.” Write this value down; you will need it in the next step.

    Information Used for License Management

  3. Now you can download the license file from your “MyCitrix” portal. Save it to your PC, and make a note of where you saved it. As part of the process of downloading the license, you must enter the license server ID. Traditionally, you would enter the name of the Citrix license server in this field (and it was case-sensitive, which tripped up a lot of users). But in this case, the system is expecting the MAC address of the Branch Repeater VPX itself…which is what you just copied in Step 2. Another difference is that in the past the License Server Host Type was always set to “HostName.” However, there is now a drop down box with a second choice, “ETHERNET.” For the Branch Repeater VPX, you want to select “ETHERNET,” and then enter the host id that you wrote down in Step 2:

    Downloading the License File from MyCitrix


    In case you’re wondering, the MAC address we’re using is the address of the first interface on the Branch Repeater VPX, as displayed in XenCenter. To find it in XenCenter, click the VM in the left column, then select the Network tab in the right pane, and you should see it there:

    XenCenter Display

  4. Now that you have your license downloaded to your local PC, you need to add it to your Branch Repeater. Access the “Local Licenses” tab and click the Add button (note that you will not see all the content in the window as shown here until you’ve added your license):

    Local Licenses Display


    After you click Add, this screen will appear and you will need to browse to the location where you saved your license file, and click the “Install” button:

    Add License


    Now the “Local Licenses” tab should be populated with content:

    Local Licenses Display


    Next, go to the “Licensed Features” tab. You should see your features listed as shown below:

    Licensed Features

  5. As mentioned earlier, if you plan to support client PCs that have the Branch Repeater Plug-in, you will need another license to enable this feature. Once again you will need to go to your MyCitrix portal and follow the same procedure as you did for your platform license to obtain the Plug-in license. Once you have the Plug-in license you will need to add it to the Virtual Appliance in the same manner as you added the platform license. Once that’s done, if you click the down arrow under “Local Licenses” you will see both licenses:

    Manage Licenses Screen


    Finally, if you click the “Licensed Features” tab, both licenses should show up with the number of licenses available:

    Licensed Features
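You can sanity-check the value before downloading the license in step 3. The Host Id from step 2 and the MAC address XenCenter shows for the VPX should be the same address. Below is a minimal shell sketch (not a Citrix tool) that pulls the MAC with the xe CLI and compares the two values. It makes three assumptions:

- you run it somewhere the xe CLI can reach your pool;
- the VPX’s first interface is VIF device 0;
- the two values differ only in separators and letter case.

The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: cross-check the Branch Repeater VPX "Local License Server
# Host Id" against the MAC address XenServer reports for the VM.
# Assumption: the VPX's first interface is VIF device 0.

# Print the MAC address of VIF device 0 on the named VM.
vpx_first_mac() {
    xe vif-list vm-name-label="$1" device=0 params=MAC --minimal
}

# Remove ':', '.' and '-' separators and lowercase the hex digits, so that
# "00:1A:2B:3C:4D:5E" and "001a2b3c4d5e" normalize to the same string.
normalize_mac() {
    printf '%s\n' "$1" | tr -d ':.-' | tr 'ABCDEF' 'abcdef'
}

# Exit status 0 if the Host Id and the MAC refer to the same address.
hostid_matches_mac() {
    [ -n "$1" ] && [ "$(normalize_mac "$1")" = "$(normalize_mac "$2")" ]
}
```

For example, run hostid_matches_mac with the Host Id you wrote down as the first argument and the output of vpx_first_mac for your VPX’s name as the second. An exit status of 0 means the two agree.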

This should be all you need to get the Branch Repeater VPX licensed. Now you just need to get it configured correctly… but that’s another blog post.

Looking For the Citrix Acceleration Client for Win 7?

We’ve been working with the new Branch Repeater VPX virtual appliance, which supports the Branch Repeater client plug-in (unlike the hardware Branch Repeater appliances).

Since Moose Logic is a Microsoft Gold Partner, and we like to keep up with the latest releases, most of us have been running Windows 7 for a while now. But when we went looking for a Win7-compatible Branch Repeater plug-in for the Citrix Receiver, we had a tough time finding it.

It does exist, though, and now that we’ve tracked it down, we thought we’d share where it’s hiding in case you’ve been searching too.

The first thing to note is that, when you go to the Citrix download site, and search for downloads by product, you will see that the “Citrix Branch Repeater” and the “Citrix Repeater (formerly WANScaler)” are listed separately - and, since products are listed in alphabetical order, they’re quite a ways apart in the list:

Downloads by Product


If you choose “Citrix Branch Repeater,” which is what we initially did, since we were working with the Branch Repeater VPX, the latest plug-in you will see listed is v5.0.34, which is not Win7-compatible:

v5.0.34


So the secret is to choose “Citrix Repeater (formerly WANScaler)” from the product selection drop-down. Then you’ll see several later versions of the plug-in, including v5.5.2, which is Win7-compatible:

v5.5.2


Oh, and if anyone from Citrix is reading this: Please - just get rid of the plug-ins listed under “Citrix Branch Repeater,” or, better yet, either have a redirect, or a line that says “Please see ‘Citrix Repeater (formerly WANScaler)’ for Branch Repeater plug-ins.” It will make life much simpler for everyone. Thank you.

XenServer Host Is In Emergency Mode

It’s 8 pm on a Sunday evening, and I get a panicked call from a customer because he cannot connect to his XenServers™ via the XenCenter™ management tool. However, as near as he could tell, all of the hosted virtual machines were up and running and in a healthy state. He had also tried pointing XenCenter at another member of the XenServer pool, without success.

So what happened and how do you fix it?

This situation can happen for several reasons but generally it happens when there are only two servers in the XenServer pool, and the pool master suddenly fails. In essence, what happens is the surviving server (let’s just call it the “slave”) can no longer see its peer, the pool master, so it assumes it has been stranded and goes into emergency mode to protect its own VMs. There are other ways this can happen (an incorrectly configured pool with HA turned on for example), but this is the most common reason that I have personally experienced.

Depending upon the situation, you may not be able to ping the master server because it is actually down, or you may be able to ping the server but it is in an inconsistent, “locked up”, state such that it cannot answer requests to it. If you are able to connect to the console of the master server either directly with a monitor, keyboard, and mouse (the old fashioned way) or through a remote management interface (DRAC, ILO, ILOM, etc) the server may appear to be running, but you may not be able to do anything with it.

At this point you may be thinking, “This is no big deal - just reboot the machine and it will be fine.” If you are lucky that may actually solve the problem, but in many cases it will not. What you might see is that after the master reboots you will be able to connect to the master but you will not see the slave. Or it may be that your master is truly broken and you are not able to simply reboot it due to a system or hardware failure. But, of course, you’ve still got to get your pool online and working again regardless.

During this period, if you try to use a tool such as PuTTY to connect to the slave via its management interface, you may not be able to connect to it either. If you try to ping the slave’s management interface, you may not get any replies. But if you connect to the console of the slave (again, either the physical console or a remote management interface), you will probably see that the machine is running. If you look at XSconsole, however, the management interface will appear to be gone, because no IP address is shown. By now you’ll probably be scratching your head, because the strange thing is that all the VMs are running.

So at this point your master appears to be down, or at least impaired, you’ve got no management interface on the slave, your pool is broken and you cannot manage the VMs. So what do you do?

Well, if this happens to you and your VMs are still up and running the first thing you should do is take a deep breath, because more than likely it is not as bad as you might think. XenServer is a robust platform and if the infrastructure is built correctly (and I’m going to quote a customer), “you can really slam the things around and they still work”.

After you take a deep breath and let it out slowly, from the console of the slave server, you will need to access the command line and start by typing:

xe host-is-in-emergency-mode

If the server returns an answer of “True” then you’ve confirmed that the server has gone into emergency mode in order to protect itself and the VMs running on it. (If the server returns an answer of “False” then you can stop reading, because the rest of this post isn’t going to help you.)

Assuming you receive the answer of “True” the slave server is in emergency mode because it cannot see a master – either because the master is actually down, or because the management interface(s) is(are) not working. Therefore, the next step is to promote the slave to master to get it out of emergency mode. We do this by typing:

xe pool-emergency-transition-to-master

At this point the slave server should take over as the pool master and the management interface should be available again. Now if you type the xe host-is-in-emergency-mode command again you should get an answer of “False”.
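You may not want to type these two commands blind in the middle of an outage. They can be wrapped in a small script that refuses to promote the host unless it really is in emergency mode, and that re-checks the state afterward. This is a minimal sketch. It assumes you run it on the surviving host’s console and that xe prints true or false in either letter case. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: run on the console of the surviving (slave) XenServer.

# Exit status 0 if xe reports that this host is in emergency mode.
# Both "true" and "True" are accepted.
in_emergency_mode() {
    case "$(xe host-is-in-emergency-mode)" in
        [Tt]rue) return 0 ;;
        *) return 1 ;;
    esac
}

# Promote this host to pool master only if it really is in emergency mode,
# then confirm that it has left emergency mode.
recover_from_emergency_mode() {
    if ! in_emergency_mode; then
        echo "Not in emergency mode; this procedure does not apply." >&2
        return 1
    fi
    xe pool-emergency-transition-to-master || return 1
    if in_emergency_mode; then
        echo "Still in emergency mode after the transition." >&2
        return 1
    fi
    echo "This host is now the pool master. Reconnect XenCenter."
}
```

The script does nothing beyond the two commands above, so the "False means stop reading" rule still applies: the function simply exits without promoting anything.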

Now, open XenCenter again. It will first try to connect to the server that was the master, but after that times out it will attempt to connect to the new master server. Be patient, because it will eventually connect (it may take several seconds), and you will again see your pool and be able to manage your VMs. If some of the VMs are down because they were on the server that failed, you’ll be able to start them on the remaining server (assuming you have shared backend storage and sufficient processor and memory resources).

Now what about the master if it has totally failed? What do I do after I’ve fixed, say, a hardware problem in order to return it to my pool?

If the following two conditions are true:

  1. You are using shared storage so that your VMs are not stored on the XenServer local drives, and
  2. You have built your XenServers with HBAs (fiber or iSCSI) rather than using Open iSCSI, which means the connectivity information to your backend SAN will be stored within the HBA,

…then it may be much simpler and quicker just to reload the XenServer operating system. (If you do not have shared backend storage, which means your VMs are on local storage, DO NOT DO THIS). I can rebuild my XenServers from scratch in about 20 - 30 minutes and have them back in the pool and running.

If either of those two conditions is not true then, depending upon your situation, recovery may be significantly more difficult. It could be as simple as resetting your Open iSCSI settings and connecting back to your SAN (still easy but takes more time to accomplish) or it could be as painful as rebuilding your VMs because you lost your server drives. (OUCH!)

Real world example: I recently had a NIC fail on the motherboard of my master server. Of course since the NIC was on the motherboard it meant the whole motherboard had to be replaced which significantly modified the hardware configuration for that server.

In this case, when I brought that XenServer back online, it still had all the information about the old NICs showing in XenCenter, plus all the new NICs from the new hardware. Yes, I could have used xe pif-forget commands to remove the NICs that no longer existed and then reconfigured everything, but that would have taken a while to straighten out. Since I had iSCSI HBAs attached to a DataCore SAN (great product, by the way) for shared storage, all I did was reload XenServer on that machine, modify the multipath-enabled.conf file (a topic for another blog post), and rejoin the server to the pool. Because the HBAs already had all the iSCSI information saved on the card, the storage automatically reconnected all the LUNs, the network interfaces took the configuration of the pool, and I was back online and running in less than 30 minutes.

After you repair the machine that failed and get it back online, you may want it to be the master server again. To do this, type:

xe host-list

You will get a list of available servers with their UUIDs. Record the UUID of the server that you want to designate as the new master, and then type:

xe pool-designate-new-master host-uuid=[the uuid of the host you want]

After you type this your pool will again disappear from XenCenter, but after about 20 – 30 seconds (be patient) it will reappear with the new server as the master. Your pool should now be healthy, and you should again be able to manage servers as normal.
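Copying a UUID by hand is easy to get wrong at the end of a long night. The minimal sketch below looks the host up by its name-label (the name shown in XenCenter) and designates it as master. It refuses to act if no host or more than one host matches. The function name is hypothetical. Run it on the current pool master.

```shell
#!/bin/sh
# Sketch only: hand the master role back to a repaired host, looked up by
# its name-label instead of a UUID copied by hand. Run on the current master.

designate_master_by_name() {
    local name="$1" uuid
    # --minimal prints just the matching UUIDs, comma-separated.
    uuid=$(xe host-list name-label="$name" --minimal)
    case "$uuid" in
        "")  echo "No host named '$name' in this pool." >&2; return 1 ;;
        *,*) echo "More than one host named '$name'; use its UUID." >&2; return 1 ;;
    esac
    xe pool-designate-new-master host-uuid="$uuid"
}
```

For example, run designate_master_by_name with the repaired host’s name as shown in XenCenter. The same 20 to 30 second wait applies afterward.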

Implementing the New Citrix XenDesktop 4 Licensing Model

A couple of days ago, while reviewing some of the blog posts here, I happened to read Sid’s post regarding Citrix’s new per device or per user licensing model for XenDesktop 4. That led, in a somewhat convoluted way, to this post, which will focus on how you would implement this new model.

Even though I already knew about some of the changes being incorporated into this licensing model, as soon as I read his post I asked myself how, exactly, it was going to work from a technical standpoint. You see, at that very moment I was upgrading our XenDesktop (XD) and Provisioning Server (PVS) lab to XD4 and PVS 5.1 SP1, so this topic really interested me. The simple reason was that what Citrix said was supposed to happen was not what I was seeing in my lab. I was already running XenDesktop 4.0 in my lab, and I’d done nothing to put any per-user or per-device licenses in place. (I do, however, still have my previous XD Platinum licenses from my XD 3.0 build on my 11.6.1 license server.) Yet everything worked, and I was not getting any license errors. Strange, you say? I agree!

So, like any curious tech, I started what turned out to be a long and exhaustive search for information on how the new license model should be implemented. But after a few hours, and a few emails between Sid and me, I had turned up nothing, zilch, nada! In fact, the only thing I could find from Citrix is the set of XenDesktop 4 documents in the new Citrix eDocs Library. That is pretty much common knowledge at this point, because lots of people have already blogged about it. However, if you actually plow through those documents, you will discover that there is no information on how, from a technical standpoint, this new license model is supposed to be implemented.

During my search I did run across one (yes, only one) blog post which had some insight regarding how it will actually work. That blog post was by Helge Klein of Sepago, a Citrix partner in Cologne, Germany. In that post, Helge states, “If what I have been told is true, the current version of XenDesktop 4 has no licensing enforcement built in.” (emphasis added) Now that statement really got me interested, because that was consistent with what I was actually seeing in my lab, but could it really be true?

Again, my curiosity required me to verify this information one way or the other. So today I picked up the phone, called the Citrix XenDesktop support team, and asked, “How does it work?” The initial answer (which I expected and would have bet money on) was, “I don’t know!”

To the credit of the Citrix technical support person I had on the phone, he did not just let this drop. Rather, he kept digging and reviewing information until he finally turned up an “internal only” document - which, of course, he could not share with me. However, based upon what he was reading in that document, his answer - specifically regarding the named user model - was that any user who is supposed to be assigned a license will need to be placed into the OU that was created during the install of the Desktop Delivery Controller (DDC). My reaction was, “What? You guys are going to require a business to move their users from their current OU(s), which may have group policies being applied, and place them into the XD OU? That’s crazy, because businesses, especially larger enterprises, are going to laugh at us!” Then I asked, “Can we at least nest the OUs to maintain GPO and AD structure?” to which the answer again was, “I don’t know.”

Once again, to the credit of the Citrix support person, I was asked to hold and give him a few minutes so that he could go talk to the escalation team and get a definitive answer. When he returned he confirmed what he had told me about how it was supposed to work…however, he also confirmed that today in XenDesktop 4.0 there is no license enforcement mechanism coded into the product. Basically, the license enforcement is based upon the honor system and what is written in the EULA.

That’s not necessarily bad – it’s worked reasonably well for Microsoft for many years. And our experience over the years has been that nearly all businesses want to be legally licensed, and will comply with license requirements as long as (1) they understand what constitutes compliance (which hasn’t always been easy), and (2) they don’t feel like they’re being ripped off. But it’s certainly a bit unexpected, to say the least.

So, finally, I had the answer directly from the Citrix XenDesktop support team regarding how it is implemented, which left only one more question: When will license enforcement be implemented? The answer: “I don’t know!” So, until Citrix decides to shed some light on this for us, we’ll just live with EULA-enforced licensing.

One last thought: With all due respect, wouldn’t you think that Citrix would want to tell their own internal support people the details about something like this BEFORE the product actually launches? Maybe they didn’t want the world to know about the lack of license enforcement – but things like that always come out…it’s just a matter of time. (Pssst! Hey, Citrix – it’s not a secret anymore!)

Customizing XenServer’s HA Pool Timeout Settings

Recently I wrote a post about the hazards of XenServer HA and how to avoid a couple of different pitfalls that can lead to XenServer fencing. In that post I talked about the need to set the HA heartbeat timeout correctly for your environment, so that your XenServers allow enough time for a storage failover to occur. The idea, of course, is to prevent your XenServer from going into a “fence” condition, which can happen for many reasons. The cause we’re discussing here occurs when the XenServer believes its storage has suddenly become unavailable and it cannot recover its state quickly enough to prevent the HA timeout from fencing the server.

I frequently build environments that use a pair of replicated DataCore SANmelody nodes (two physical nodes) and configure my XenServers for multipathing. With this configuration my XenServers see two active paths to their storage (the status of the multipath is shown in the image below), one path to each of the two nodes. If, for example, one of the SANmelody nodes goes offline, the other node immediately takes over. However, the XenServers have to be given enough time to fully recognize that a failover has occurred and that the storage is still available; otherwise they will fence. The default HA timeout in XenServer is 30 seconds. If it takes a XenServer more than 30 seconds to realize the storage is still healthy and available, the server will fence. If the storage was in fact still available, there were most likely VM guests still running on that XenServer, and they have now been taken offline unnecessarily.

To test and tune this setting I first make sure HA is enabled on the pool, then I perform hard failover tests where, using a DRAC or iLO card if I have one, I suddenly power cycle one of the storage servers and watch to see if any XenServers fence. I run this hard power cycle test because this specific problem never comes up with simple storage stops and restarts; rather it only shows up when a storage server actually goes down suddenly, or “hard,” as we say. So I run these tests because I want to stress the system to simulate unfortunate things like power failures, sudden server reboots due to gremlins, and other things along those lines. If nothing happens then great - let’s go home and we can sleep well knowing HA is working correctly. But what if you do have one or more servers which do fence because they believe their storage is gone when in fact it is not?

The last time I had this happen to me I had to test my environment several times, and with each successive run through the hard failover test I used a different timeout setting. In the end I found that 120 seconds worked best for me. (Keep in mind I am doing this during a build and there are no live production workloads running on any of these servers.)

So what is the downside of setting your timeout this high? Well, if a XenServer really fails (for whatever reason) it will take about 120 seconds for the Pool to decide there is a problem and then take action to restart the VMs elsewhere based upon available resources and the restart priority of each VM. Personally, I’d rather wait the 120 seconds when something has really gone wrong than suffer an unnecessary fence/shutdown when all the VMs were actually still running fine.

So how did I set the timeout values? Like this:

Rather than enabling HA from the GUI, you’re going to have to do it from the command line. (I use PuTTY when I’m not actually at the XenServer console.) The command takes this form, where you substitute the UUID of your heartbeat SR and the timeout in seconds: xe pool-ha-enable heartbeat-sr-uuids=<sr-uuid> ha-config:timeout=<seconds>

But in that command string, how do you know what the sr-uuid is? The way I find it is to start with XenCenter and locate the SR (storage repository) which is going to be used for the heartbeat status disk. I locate the SCSI ID of that SR and copy the number as shown in this image:

Finding the SCSI ID of a Storage Repository



After I have that number, I connect to the master XenServer using PuTTY (the master XenServer in a pool is always the top server shown in XenCenter) and run the command xe pbd-list device-config=SCSIid: 360030d903131325f48415f4865617274. The long hexadecimal string is the SCSI ID just copied from XenCenter:
Finding the sr-uuid



What is shown above is what the output should look like. You see three records in this example because there are three hosts in this pool; notice that the host-uuid values are all different. However, also notice that the sr-uuid value is the same in each record, and this is the value we are after. Take the sr-uuid you just found and enter it into a command like this: xe pool-ha-enable heartbeat-sr-uuids=7a213624-1209-c467-42ed-6ef72a1b7699 ha-config:timeout=120
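The lookup and the enable step can also be scripted. The minimal sketch below scans the default xe pbd-list output for the record whose device-config contains your SCSI ID and prints that record’s sr-uuid. It then passes the sr-uuid to pool-ha-enable with your timeout. It assumes the layout described above: blank-line-separated records, each with an sr-uuid line. The function names are hypothetical, so check the output on your own XenServer version before relying on it.

```shell
#!/bin/sh
# Sketch only: find the heartbeat SR from the SCSI ID copied out of XenCenter,
# then enable HA with a custom timeout. Run on the pool master.

# Print the sr-uuid of the first PBD whose device-config mentions the SCSI ID.
# Assumes the default "xe pbd-list" layout: blank-line-separated records,
# each containing an "sr-uuid ( RO): <uuid>" line.
find_sr_uuid_by_scsi_id() {
    [ -n "$1" ] || return 1
    xe pbd-list | awk -v id="$1" '
        BEGIN { RS = "" }                 # paragraph mode: one PBD per record
        index($0, "SCSIid: " id) {
            n = split($0, lines, "\n")
            for (i = 1; i <= n; i++) {
                if (lines[i] ~ /^[ \t]*sr-uuid/) {
                    sub(/^[^:]*:[ \t]*/, "", lines[i])   # strip the field label
                    print lines[i]
                    exit                  # every host has a PBD for the same SR
                }
            }
        }'
}

# Enable HA using the given heartbeat SR and timeout in seconds (default 120).
enable_ha_with_timeout() {
    xe pool-ha-enable heartbeat-sr-uuids="$1" ha-config:timeout="${2:-120}"
}
```

For example, assign sr from find_sr_uuid_by_scsi_id with your SCSI ID. If the result is non-empty, run enable_ha_with_timeout with that value and 120. Stopping at the first match is deliberate: as noted above, every host’s record carries the same sr-uuid.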

It may take a while for the command to complete. Once it does, you should be able to refresh XenCenter by running either the xe-toolstack-restart or the service xapi restart command. Then, when you look at the HA tab at the pool level, you should see that HA is now turned on:

Verify that HA is now turned on



As I said previously, I found that 120 seconds worked best for me, but how did I determine that? Simple: I started by setting the HA timeout to 60 seconds (twice the default) and then ran the hard shutdown test again. One of the XenServers still fenced, so I went to 90 seconds, and then finally to 120 seconds. The point at which the XenServers no longer fence is where you want to stop. But don’t just do this test on one side of the storage! Recover your storage servers, and once everything is back online and healthy, run the same test again. This time, hard-shutdown the other storage node. If none of the XenServers fence, you are done… unless you disable and re-enable HA. As I pointed out in that earlier post, this manual timeout setting is not persistent. If you disable and re-enable HA on the pool, you will have to re-enable it from the command line to ensure that the timeout is set correctly. If it’s done from the GUI, it will revert to the 30-second default.
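The tuning procedure above is a loop: set a timeout, hard-fail a storage node, see whether anything fences, and raise the timeout if something did. The minimal sketch below walks through that loop. It assumes that the timeout can only be changed by disabling HA and enabling it again from the command line, which is consistent with the non-persistence described above. It also assumes that xe pool-ha-disable is safe to run at that point. Only do this during a build with no production workloads, as described earlier. The function names are hypothetical, and the hard failover test itself is still done by hand between rounds.

```shell
#!/bin/sh
# Sketch only: the manual HA timeout tuning loop. The hard power-cycle of a
# storage node is still done by hand between rounds.

# Re-enable HA with a new timeout. Assumption: the timeout can only be changed
# by disabling HA and enabling it again from the CLI. A failure to disable
# (for example, HA already off) is ignored; the enable reports real problems.
retune_ha_timeout() {
    xe pool-ha-disable 2>/dev/null || :
    xe pool-ha-enable heartbeat-sr-uuids="$1" ha-config:timeout="$2"
}

# Usage: tune_ha_timeout <sr-uuid> <timeout>...
# Tries each timeout in order and prints the first one at which the operator
# reports that no XenServer fenced. Prompts go to stderr.
tune_ha_timeout() {
    local sr_uuid="$1" t answer
    shift
    for t in "$@"; do
        retune_ha_timeout "$sr_uuid" "$t" >&2 || return 1
        printf 'HA timeout is now %ss. Hard-fail each storage node in turn.\n' "$t" >&2
        printf 'Did any XenServer fence? [y/n] ' >&2
        read -r answer || return 1
        case "$answer" in
            [Nn]*) echo "$t"; return 0 ;;
        esac
    done
    return 1    # every candidate timeout still fenced
}
```

For example, run tune_ha_timeout with your heartbeat SR’s sr-uuid followed by 60 90 120. This mirrors the sequence described above and prints the first timeout at which you answered n.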

Using XenApp Prep to Clone a Windows 2008 XenApp 5 Server

I have been cloning Citrix servers since the days of MetaFrame XP. Over the years I’ve cloned hundreds of systems and taught a number of people a cloning process that has worked 100% of the time. Unfortunately, that process required removing registry keys, running tools to change the SID, and “sterilizing” the image to get it ready to clone. Once this was done, you had to make a copy of the server (in the Bad Old Days we used Symantec Ghost; today we have better imaging tools, which we’ll discuss below) and then move that copy to either different hardware or a virtualization platform. Then, after copying it, you had to reverse the whole process by adding back registry keys, changing the server name, joining the domain, and finally running “chfarm” (change farm) to join the machine back to the Citrix farm.

About a year and a half ago, Citrix came out with a tool called XenApp Prep, which cuts the whole process from about 30 minutes to just a couple of minutes (not including the time it takes to copy the files). With Windows 2008, the process is simple, and I’m going to tell you exactly how I clone an image. But before I start, I want to stress that while the process is nearly the same when you use XenApp Prep to make a vDisk image for Provisioning Server, there are some slight differences, so be sure to read the “readme” file and the FAQ that come in the XenApp Prep zipped download.

Here are the high-level steps I use to create the server that I’m going to turn into a “Gold” image that I can then use as the source of my cloned image(s):

  1. First I install Windows Server 2008 and apply all critical OS patches and any optional patches I deem necessary to bring the server up to current standards. (Most IT shops have their own policies and standards for approving and applying patches, so your list may be different from mine.)
  2. Install any extra pieces that will be required by your application set: J#, .NET (whichever versions you need) with the appropriate service pack, Java, etc.
  3. Turn on the required Terminal Services roles, and, if you are going to place the Web Interface on the server (I don’t personally recommend this), turn on the IIS role.
  4. When all my prerequisites are met - and you may want to check the admin guide or the Citrix Web site to find the most recent requirements - I install XenApp 5.0.
  5. Install the most recent Citrix service packs, hotfixes, feature packs, etc.
  6. Apply any best practices and tweaks necessary. (This is a whole topic by itself, so we won’t try to cover it here.)
  7. Now, unless I’m using application streaming (another subject we’re not covering here), I install all of my applications. Generally I start with Microsoft Office, because nearly all the time, a customer requires that at least part of the Office Suite be installed. For specific “line of business” and third-party applications, I would always want to work with the customer’s Subject Matter Expert (“SME”) to verify proper operation.
  8. After each application is installed, I have the SME test it to verify that it does whatever the business needs it to do.

If the customer’s SME agrees that the applications are working correctly, I am ready to transform this server into my Gold image. This couldn’t be easier, especially if you’re virtualizing the XenApp servers. (And you know that XenServer is the best virtualization platform for XenApp, right?) Here are the steps:

  1. Hopefully I was thinking ahead and used a generic name for the server when I built it…but if for some reason I forgot to do that, I change the server name to something generic and reboot.
  2. Now I download XenApp Prep and install it on the server by running the MSI file. By default, the XenApp Prep installation places its executables in the C:\Program Files\Citrix\XenAppPrep directory:

    The XenApp Prep Directory

  3. If you are not creating an image for Provisioning Server (and we’re assuming here that you’re not), all you do is navigate to the directory shown above and double-click XenAppPrep.exe to run it. (Again, refer to the readme and FAQ that come with XenApp Prep if you are creating an image for PVS.) A command window will appear, run a few commands, and close. That’s it: that quick little process took about 15 seconds and saved you at least 10 minutes.
  4. Once XenApp Prep has completed, I clear the server’s production IP address, either by switching it to DHCP or by giving it some other static address. I prefer a static address that is not on the server’s local subnet, so that when it reboots, it cannot communicate until I want it to.
  5. I now navigate to the C:\Windows\System32\sysprep directory and double-click sysprep.exe to run it. I select the “OOBE” option (that’s “Out-of-Box Experience,” not “Out-of-Body”), select the option to shut down the server (not reboot), then click “Next,” and sysprep runs, taking only a few seconds to complete:

    The Sysprep Directory

    The Out-of-Box Experience

    Sysprep Runs
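
For the “park it off its local subnet” address, a quick check that the address really is outside the server’s home network can save a surprise on first boot. A minimal Python sketch using the standard ipaddress module; the function name and the addresses are made-up examples, not anything XenApp Prep or sysprep provides.

```python
import ipaddress


def is_off_subnet(parking_ip, server_interface):
    """True if parking_ip lies outside the network of server_interface
    (given in CIDR form, e.g. "10.1.20.15/24"), i.e. the parked clone
    is not on its home subnet."""
    home_network = ipaddress.ip_interface(server_interface).network
    return ipaddress.ip_address(parking_ip) not in home_network


# Hypothetical addresses: production server 10.1.20.15/24.
print(is_off_subnet("192.168.254.10", "10.1.20.15/24"))  # True: safe to park here
print(is_off_subnet("10.1.20.200", "10.1.20.15/24"))     # False: still on the home subnet
```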

At this point, you have your Gold image and you’re ready to deploy it over and over again. How do you do that? Again, it couldn’t be any easier:

  1. Copy the image to a new physical server using whatever imaging tool you prefer - we generally use Ultrabac’s UBDR Gold or Acronis, but whatever tool you prefer should work fine. If you’re virtualizing on XenServer, Hyper-V, or VMware all you need to do is copy the image to another storage repository.
  2. After the copying process is done - which is the longest step in the process of creating your clone - boot the server up, and follow the sysprep utility prompts (as though you just ran “setup” on a brand new server - hence the “Out of Box Experience”) to give the server its final name. This may take several minutes to complete.
    Boot Your New Server

  3. When sysprep is done, you will need to change the password in order to log on to the system.
  4. Immediately set the correct IP address and verify that the machine can ping the domain name.
  5. Go to the system properties and join the machine to your domain.
  6. Reboot.
  7. When the server comes up this time and you log on to the domain, your server should already have joined the Citrix farm and be ready to go. Just to be sure, I open a command prompt and type “qfarm” to verify that the server is now a member of the farm.
  8. Once you’ve confirmed that the server is in the farm, run the Access Suite Console and configure it to see the farm. Once it comes up, I simply drag and drop the published applications that should be assigned to the new server onto it, and it’s ready to go.
  9. After I drag the applications onto the server, just to be sure, I run qfarm again, this time as “qfarm /app,” to verify that the farm sees the new server with the newly allocated published applications on it.
  10. After you test the new server, make sure you’ve enabled logons on it.

That’s it - you now have another server in your farm, and creating more servers should only take you a few minutes for each one. (Of course the copy process is the slowest part…but you can always use that time to refill your coffee cup, comment on our blog site, or otherwise multitask if you’re really ambitious.)

How to Ruin Your Weekend and Other Hazards of Mis-Configured HA

NOTE: This was originally posted in October, 2009, and may not be a problem any more with current versions of XenServer, as some of the more recent comments would tend to verify - but we will keep the post active for historical purposes. (added by blog administrator, March 16, 2012)

The Level 1 HA (High Availability) feature that comes with Citrix Essentials for XenServer may be one of the best ways to crash your whole virtual infrastructure if you don’t understand how it works and don’t design in an appropriate level of redundancy. This of course will lead to hours of down time, unhappy management, possible data loss, and lots of extra work for you (most likely on a weekend).

The basics -
HA is designed to monitor the XenServer virtualization environment. When HA is enabled, the administrator can specify which virtual machines (VMs) need to be automatically restarted if the host server they’re running on should fail. If there is a failure of a host server, HA should then automatically restart its designated guest VMs on another host in the XenServer “resource pool.” Note that the HA function does not “live migrate” the guest VMs, because when a host fails the VMs on that host also fail. Rather, it selects another host server and restarts the VMs on that host. For all of this to happen correctly, Citrix’s HA requires two things to be true at all times:

  1. Each XenServer must be able to communicate with its peers in the pool.
  2. Each XenServer in the pool requires access at all times to the HA heartbeat disk, which is shared by all the XenServers in the pool.

If either of these two items is not true for any given XenServer in the pool, that server will “fence.” The short definition of “fencing” is that the XenServer suspects – although it’s not absolutely sure – that it is experiencing some kind of failure, so to protect against possible data corruption it shuts itself down – essentially sacrificing itself to protect the data – until a human comes along and sorts things out. If the fenced server is in a correctly configured HA pool, guest VMs that were configured for HA restart will be restarted on a surviving XenServer.
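To make the two rules concrete, here is a deliberately simplified Python model of them as stated above. It illustrates this post’s two rules only; it is not Citrix’s actual HA implementation, which is more involved. The Host class and the host names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    alive: bool = True
    sees_heartbeat_disk: bool = True
    # Names of other hosts this host can reach over the management network.
    reachable_peers: set = field(default_factory=set)


def should_fence(host, pool):
    """Simplified reading of HA rules #1 and #2: a running host fences if it
    loses the heartbeat disk or can reach no other running host in the pool."""
    if not host.alive:
        return False  # already down, so there is nothing to fence
    live_peers = {h.name for h in pool if h.alive and h.name != host.name}
    return (not host.sees_heartbeat_disk) or not (host.reachable_peers & live_peers)


# Two-host pool: xs2 fails, so xs1 has no live peer and fences too.
xs1 = Host("xs1", reachable_peers={"xs2"})
xs2 = Host("xs2", alive=False, reachable_peers={"xs1"})
print(should_fence(xs1, [xs1, xs2]))  # True
```

Add a third host that both survivors can reach and neither one fences, which is exactly the difference between the two-node and three-node scenarios below.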

Considerations -
So… you have two XenServers all set up and all your VMs configured just the way you like them, and you decide to turn on HA. Everything appears to be working until one of the hosts suffers a failure and goes offline. (Murphy’s Law says this will happen on a Saturday evening right before your BBQ party is starting.) With HA enabled, you would expect, based on the whole “High Availability” concept, that everything would be OK. Critical VMs should get restarted on the other host and you should be able to deal with the failed host on Monday.

Oh, but wait, remember HA rule #1? The XenServer host that is still running suddenly does not have any peers to talk to. It no longer knows whether or not it’s healthy so, in the interest of protecting your data from corruption, it does what it’s designed to do: it fences, and now both of your XenServers are down. They may try to reboot, but you are now in an endless loop of fencing, and to get it resolved, you’re going to have to know how to use the “xe host-emergency-ha-disable force=true” command. (And if you don’t understand that last sentence, you’re in for a long weekend.)

This results in a situation that we in IT refer to as “not good,” with a chance of “career altering,” and you’re going to miss your BBQ party.

Here’s another scenario that will spoil your party: What if both XenServers are actually healthy, and all the virtual servers are up and functioning, but the network link for the management communications between the XenServers fails? Again, each XenServer would think it was stranded from the pool and fence itself in an attempt to correct the issue. With both servers fencing, this would again create an endless loop of server fencing. In essence, one server would start to come back online and would still not see the other XenServer and would fence again, and so on, and so on.

So for those reasons a two-XenServer pool cannot successfully run HA! Just don’t do it - even though you can configure HA on a two-server pool the result can be disastrous and ruin your weekend…not to mention your next performance review.

Well, what about HA in a three node XenServer pool? Based upon the previously described scenarios, you now have a valid “pool,” in which HA will function. So you configure and enable HA, and when you test the HA functionality by killing one of the XenServers, everything works like it is supposed to. The guest VMs are restarted on the surviving XenServer hosts and you’re happy that everything is working correctly.

But here is another “gotcha!” If you have only one Ethernet interface per XenServer assigned to management, and they’re all plugged into one switch, what happens if the management link fails because a NIC fails – or even worse, the switch fails? If it’s just a NIC in one server, then that XenServer will fence – not too bad but still not what you want. If you were using a different set of NICs (as you always should) for the guest VMs to communicate with the rest of the world, then the guests on that server were probably up and working just fine until the server fenced. Sure, the critical ones will restart on the remaining servers, but you’ve lost a third of the resources in your pool unnecessarily.

Now let’s consider what would happen if the switch should fail and you had only single management ports on each XenServer all plugged into just that one switch. If this happens, it may be time to dust off the old resume, because you have just lost your entire XenServer pool. Why? Because when the switch went down, all the XenServers lost communication with one another, and each assumed that, because it was suddenly isolated from the pool, it must be experiencing some kind of failure. Therefore the whole pool fenced.

Conclusions -
Citrix’s HA does not work in a two-host pool, period. With a pool of three or more XenServers, you’ll be OK if you design the infrastructure so that there is no single point of failure in your peer communications. How? Simply bond two NICs together, dedicate the bond to the management communication function, and split each bonded pair across two separate Ethernet switches. That way you’re protected against both a NIC failure and a switch failure.
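One way to check a management-network design for these single points of failure is to fail each NIC and each switch in turn and see which hosts lose all peer contact. A minimal Python sketch, assuming the two switches are interconnected (so any two hosts that each have a working NIC-to-switch path can reach each other) and that NIC names are unique across the pool; all host, NIC, and switch names are hypothetical.

```python
def working_hosts(hosts, failed):
    """Hosts that still have an up NIC cabled to an up switch.
    hosts maps host name -> list of (nic, switch) pairs."""
    return {h for h, links in hosts.items()
            if any(nic not in failed and sw not in failed for nic, sw in links)}


def isolated_by_single_failure(hosts):
    """Map each single failed component (NIC or switch) to the hosts that
    would lose all peer contact, and therefore fence, under HA rule #1."""
    components = {c for links in hosts.values() for pair in links for c in pair}
    result = {}
    for comp in sorted(components):
        up = working_hosts(hosts, {comp})
        # A host with no path, or a path that reaches no other host, is isolated.
        isolated = {h for h in hosts if h not in up or len(up) < 2}
        if isolated:
            result[comp] = isolated
    return result


# One management NIC per host, all on one switch.
single = {f"xs{i}": [(f"xs{i}-nic0", "sw1")] for i in (1, 2, 3)}
# Bonded pair per host, split across two switches.
bonded = {f"xs{i}": [(f"xs{i}-nic0", "sw1"), (f"xs{i}-nic1", "sw2")] for i in (1, 2, 3)}

print(sorted(isolated_by_single_failure(single)["sw1"]))  # ['xs1', 'xs2', 'xs3']
print(isolated_by_single_failure(bonded))                 # {}
```

With the single switch, the switch takes out all three hosts and each NIC takes out one; with the split bonds, no single failure isolates anyone. The same exercise applies to the storage path to the heartbeat disk.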

But you’re not out of the woods yet! Don’t forget HA rule #2: servers need to see the HA heartbeat disk. This is equally important, and you must consider the topology of that side of the network (iSCSI, Fibre Channel, etc.) and be sure it is also redundant. And if you’re using iSCSI multi-pathing (e.g., with a pair of mirrored DataCore iSCSI SAN nodes), be sure to manually bump up the HA timeout interval so that if one of the SAN nodes should fail, the multi-pathing function has time to fail over to the other node before the XenServers all conclude that the HA heartbeat disk is gone; otherwise, again, they will all fence. Our testing indicates that a two-minute timeout appears to have an adequate margin of safety. The default setting of 30 seconds is definitely too short. Unfortunately, this setting does not appear to be persistent, so if you turn HA off and then back on, you’ll need to manually reset the timeout interval again. (This is probably a job for Workflow Studio, but we just haven’t had time to work through the process yet.)

NO Single Points of Failure
HA will do a fine job of protecting you, if you build the network correctly. So make sure you’ve built in enough redundancy that you have no single point of failure, and enjoy your BBQ.

P.S.: If you can’t justify more than two XenServers, but you still have one or more critical guests that need to be highly available, there is a solution: Marathon Technologies’ everRun VM. But that’s another post for another day.