Tuesday, April 21, 2009

An Interesting Day with Citrix Load Evaluators


Ever have one of those days? Well, today was it for me! We run XenApp 4.5 in two geographic production Zones. I got an urgent page from support that EVERYONE hitting one of our Zones was getting "... servers are reporting full load and cannot accept your connection ..." EEK!

For our Physical Servers, since our full migration to 4.5 a year ago, we have used an 80/20 evaluator (which I have mentioned before), but to review:
  • CPU Utilization: No load at <= 20%, Full Load > 80%
  • Memory Utilization: No load at <= 20%, Full Load > 80%
Pretty straightforward. Just an hour before the emergency, I checked my QFARM /ZONELOAD and saw the servers were running with load values between roughly 7000 and 8000, the highest at 8140. Ok, running hot and heavy, but manageable. Then BAM!

On closer analysis, the servers were all at the tipping point of that 80% threshold for memory consumption. One server could be running at 7000 with the load split evenly between Memory and CPU, while another could be running at 7000 with the load entirely Memory bound. All it takes is one more heavy process to tip the scales, and that is what happened to us. Of course, as soon as one server reports full load, the others get hammered, creating a domino effect.
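To make the tipping point concrete, here is a minimal sketch of a single 80/20 rule. It assumes the rule scales linearly between its no-load and full-load thresholds; the function name `rule_load` and the linear scaling are assumptions made for illustration, not Citrix's documented algorithm, and the sketch does not model how several rules are combined into the one number QFARM reports.

```python
FULL_LOAD = 10000  # value a server reports when it is at full load


def rule_load(utilization_pct, no_load_pct=20.0, full_load_pct=80.0):
    """Hypothetical per-rule load for one resource (CPU or memory).

    Assumption: linear scaling between the two thresholds.
    """
    if utilization_pct <= no_load_pct:
        return 0  # at or below the no-load threshold
    if utilization_pct > full_load_pct:
        return FULL_LOAD  # past the full-load threshold
    fraction = (utilization_pct - no_load_pct) / (full_load_pct - no_load_pct)
    return round(fraction * FULL_LOAD)


if __name__ == "__main__":
    for pct in (20, 50, 78, 85):
        print(pct, rule_load(pct))
```

Under that assumption, a resource sitting at 78% is only two percentage points away from pushing its rule to the full 10000, no matter how idle the other resource is.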

In looking at the servers, they were running between 85% and 93% physical memory utilization and 90% page file utilization. OUCH.

To ease the immediate burden, we changed the load evaluator from 80/20 to DEFAULT (100 Users Max plus Load Throttling). Although this does not fix the memory consumption, it at least allows new connections. Users may suffer from the lack of memory, but a little is better than nothing in a pinch.

What are we doing about it? We are ramping up our Citrix-to-VM migration, deploying 8 new VMs (more on that later). This will help us spread the load across more units. We have decided to keep the 80/20 rule in place on the physical servers for now since it has worked well for 18 months. We feel a resource-based evaluator (Mem, CPU) is better than an arbitrary one (User Count). We have also discussed using the built-in ADVANCED evaluator or creating a Memory-only load evaluator, but have decided to keep the status quo.

What to take away from this? Understand your environment and applications to make sure you have a proper load evaluator in place. Also understand that looking at /ZONELOAD or /LOAD counts is not enough (this was our mistake). You need to look at the different components of the load counter to make sure you are not operating under a false sense of security.

Use the Load Monitor! I have to admit, since going to XenApp 4.5, I live in the Citrix Access Management Console (AMC) and rarely, if ever, go into the older Presentation Server Console (PSC). In the PSC, under the servers, you can look at the Load Monitor, which shows you exactly how the load value is calculated. So if you are using 2 criteria, say CPU and RAM 80/20 like me, you can see that a 70% load might be 67% Memory Usage and 27% CPU Usage. In this case, you do not have 3000 load to give before hitting the full 10000 mark; you only have 13% Memory to give before you hit the magic 80% mark, forcing a 10000 load report.
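The headroom arithmetic in that example can be sketched in a few lines. This is only an illustration: the function names `headroom` and `tightest` and the dictionary layout are hypothetical, and the utilization figures would have to be read from the Load Monitor or a performance counter by hand.

```python
def headroom(utilization, full_load_pct=80.0):
    """Percentage points each resource can still grow before its rule hits full load."""
    return {name: max(0.0, full_load_pct - pct) for name, pct in utilization.items()}


def tightest(utilization, full_load_pct=80.0):
    """Return the resource closest to its full-load threshold and its remaining room."""
    room = headroom(utilization, full_load_pct)
    name = min(room, key=room.get)
    return name, room[name]


if __name__ == "__main__":
    server = {"memory": 67.0, "cpu": 27.0}  # figures from the example above
    print(headroom(server))   # {'memory': 13.0, 'cpu': 53.0}
    print(tightest(server))   # ('memory', 13.0)
```

The point of taking the minimum is that the tightest resource, not the combined load number, decides how soon the server stops accepting connections.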
