Monitoring in today’s data centre

The international law firm depicted in this case study owns no data centres, despite the large amount of data it stores and retrieves. Like many other high-tech companies, it rents space for its servers in data centres owned by others, Paul Gay reports.

“The sites we use are just hosts,” commented the law firm’s data centre support analyst. “The racks we have there are ours, but we rent the environment, power and the bandwidth.” Each of the firm’s offices has one or two racks of servers in a local data centre to service that site’s needs. In addition, the company has a centralised US server location, a secondary centre for backup and redundancy, and plans for new locations overseas.

At present, the analyst uses two Fluke instruments to monitor the firm’s data centres and the status of its servers in those centres. The Fluke 975 AirMeter can record 10 fundamental parameters associated with indoor air quality. Of special importance to data centres are air temperature, relative humidity, and airflow (air velocity). Since servers generate considerable heat, they must be cooled to manufacturer-specified temperatures and kept at no more than 45% to 50% relative humidity.
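The checks described above — comparing AirMeter readings against temperature and humidity limits — could be sketched roughly as follows. The threshold values and function names here are illustrative assumptions, not figures from the Fluke documentation or the firm's actual procedures; real limits come from the server manufacturer's specifications.

```python
# Hypothetical limits for illustration only. The article cites a
# 45%-50% RH ceiling; the temperature limit is manufacturer-specified.
MAX_INLET_TEMP_C = 27.0       # assumed manufacturer temperature limit
MAX_RELATIVE_HUMIDITY = 50.0  # upper end of the cited RH range

def check_reading(temp_c: float, rh_percent: float) -> list[str]:
    """Return a list of warnings for a single air-quality reading."""
    warnings = []
    if temp_c > MAX_INLET_TEMP_C:
        warnings.append(f"temperature {temp_c:.1f} degC exceeds {MAX_INLET_TEMP_C} degC")
    if rh_percent > MAX_RELATIVE_HUMIDITY:
        warnings.append(f"humidity {rh_percent:.0f}% exceeds {MAX_RELATIVE_HUMIDITY:.0f}%")
    return warnings

print(check_reading(24.5, 48.0))  # within limits -> []
print(check_reading(29.0, 55.0))  # both limits breached
```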

The Fluke Ti400 Thermal Imager makes two-dimensional representations of the surface temperatures of objects in an infrared image. The support analyst uses the imager to monitor data centres for general cooling efficiency and to inspect the law firm’s servers in those centres. Software that comes with the imager allows him to change key parameters, optimise images, and extract maximum details from collected data.

Cooling requirement
As the law firm seeks to expand its data handling capabilities, the biggest problem for the support analyst is that data centres lack the required power and cooling capabilities to support new technologies. “We want to take advantage of the latest blade servers, but it’s difficult to find data centres that can support them,” he said. “Blade servers are much more efficient (than traditional servers). We can pack six or seven virtual servers onto a single blade. An enclosure or rack holds eight blades, but each full enclosure requires a lot of power. Many data centres simply can’t supply that power. They want us to use an older technology so they can support our needs.”

Blade servers also produce considerable heat and require more cooling than many centres can supply. In such cases, the analyst uses the instruments to map cooling patterns in data centres, find faulty cooling arrangements and determine when cooling, air flow and air temperatures are inadequate to guard against breaches of server warranties.

Mapping cooling patterns allows company personnel to see the overall effects of cooling within a data centre. This may seem like a duplication of effort, since data centres themselves monitor the environment — power, air, humidity and cooling. “Our primary use for these tools is to check what we are told by the data centres,” the analyst argued. “Then, if necessary, we can suggest corrective measures to keep our servers functioning efficiently.”

The support analyst cited the local data centre as an example: “The way the room is set up right now greatly restricts the air flow going to some of the devices positioned across the room from the air-conditioning unit. Using the thermal imager, we were able to take temperature readings of surfaces in each area of the room. Then, using those readings, we were able to plot out where the cool air travels. We found that while one area of the room is cold, as we worked our way around the room, areas were gradually warmer and warmer.”

He says that the AirMeter used in conjunction with the imager allowed for a more in-depth analysis of the local data centre: “Temperature and humidity readings indicated that the hot air that should be exhausted from the room is actually being dumped back into the room in an endless cycle. The air-conditioner cools itself down and shuts off. But the circulation fans keep working. They actually kick the hot air back into the room. Overall, it’s a very poorly designed room. We’re looking to the data centre to restructure the cooling system.”

Finding faulty cooling arrangements for the law firm’s servers is another use for the instruments. “Because of the cooling requirements for blade servers, we have been using the thermal imager to monitor the temperatures of the air flowing into the fronts and out the backs of our blade racks,” he said.

A common problem experienced by their blade servers is hot air entering the front of the blades. Only cooling air should be entering the rack fronts. Two situations that lead to this problem are missing blanks on empty rack slots, and server aisles set up with servers arranged front to back. The first situation usually occurs because a user of the data centre does not need all eight slots in a rack or because the data centre lacks the capability to provide power to a full rack. In either case, there are empty slots. Blanking panels should cover those unused slots so that they are not open to the environment.

The analyst claimed that he has documented instances of missing blanking panels at the firm’s secondary centre. “There were empty spaces on the top four slots,” he said. “Nobody could believe it, but hot air from the backs of the servers was circulating over the tops of the racks and coming right back in the front. That greatly increased the temperature of the blades and decreased their efficiency. We needed to fill the tops of those racks with blanking plates.”

The best strategy for a data centre — especially a data centre with blade servers — is to install servers back-to-back in rows facing the fronts of servers in adjacent rows. This creates alternating cold aisles and hot aisles.

“A lot of data centres are set up with one row of servers after another—back to front, back to front, back to front,” he said disapprovingly. “The hot air from one row of servers blows onto the fronts of the next row, and that’s continued throughout the centre. In Europe, we are looking for a centre with alternating hot and cold aisles.”

Safeguarding server warranties is the analyst’s principal impetus for monitoring the law firm’s servers. “Our blade manufacturer has a recommended maximum temperature that servers can reach. If a server gets above that threshold, it is no longer covered under our warranties and contracts. That would be a huge problem for us.”

Because of these warranty considerations, company personnel need to verify what data centre owners tell them about the cooling in their facilities. And while the blades themselves have internal monitors that track their temperatures, the analyst needs to know how effectively the cooling supplied by the data centre is doing the job. He uses the Ti400 to collect thermal images of the fronts of the blades to determine the temperature of the air flowing in. Then, he compares this temperature to the temperature of the air coming out the back of the rack. Finally, he compares these temperatures to the blade manufacturer’s recommended temperature threshold for the servers.
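The comparison described above — inlet temperature at the front of a rack, exhaust temperature at the back, and both checked against the manufacturer's threshold — could be sketched as follows. The threshold and readings are hypothetical assumptions for illustration; actual values come from the thermal images and the blade manufacturer's warranty documentation.

```python
# Assumed manufacturer warranty threshold; not a real figure.
WARRANTY_MAX_TEMP_C = 35.0

def assess_rack(front_temp_c: float, rear_temp_c: float) -> dict:
    """Compare front (inlet) and rear (exhaust) temperatures to the limit."""
    return {
        "delta_c": rear_temp_c - front_temp_c,  # heat added by the blades
        "inlet_ok": front_temp_c <= WARRANTY_MAX_TEMP_C,
        "exhaust_over_limit": rear_temp_c > WARRANTY_MAX_TEMP_C,
    }

# Example: cool air in, hot exhaust out the back.
result = assess_rack(front_temp_c=26.0, rear_temp_c=38.5)
print(result)
```

A large inlet-to-exhaust delta with an acceptable inlet temperature suggests the blades are working hard but the supplied cooling is adequate; a high inlet temperature points to recirculated hot air, as in the missing-blanking-panel incident described earlier.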

“If necessary,” he suggested, “we can go back to a data centre’s owners and say, ‘This is the airflow that you are telling us we’re getting, and this is what we see. You’re not meeting our requirements. We need you to upgrade your systems to make sure you meet our specifications.’”

Additional capabilities
The Fluke 975 AirMeter allows the analyst to measure air temperatures precisely and to convert the air-meter data into reports for superiors or data centre operators. Using the imager and the AirMeter together “provides results from different angles,” he said. “If we use the imager to reveal the surface temperatures, then we can use the airflow meter to register the actual temperature of the air going into racks.”

The AirMeter also measures relative humidity (RH) and airflow. Excessive humidity in a data centre can lead to condensation on equipment and places an undue load on the air-conditioning system. RH readings played a significant role in uncovering difficulties at the local data centre. Regarding airflow, “In our primary and secondary data centres, the air comes in through the floors,” he commented. “We can use the airflow sensor on the 975 to make sure that airflow is at the rate required by our servers.”

Creating tracking databases
The analyst’s job includes creating and maintaining tracking databases. To do this, he periodically uses the Ti400 to make images at specific points in the primary and secondary data centres. He then logs the data into the database for review as necessary. “We can graph the data for each location and see if the temperature is rising, falling or staying the same over time,” he explained.
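The tracking idea described above — logging timestamped readings per monitoring point and classifying whether the temperature is rising, falling or staying the same — could be sketched minimally as follows. The data structure, location names and tolerance are illustrative assumptions, not the firm's actual database.

```python
# Minimal sketch of a temperature-tracking log; not the firm's real system.
from collections import defaultdict

log = defaultdict(list)  # location -> list of (timestamp, temp_c)

def record(location: str, timestamp: float, temp_c: float) -> None:
    """Log one reading taken at a specific monitoring point."""
    log[location].append((timestamp, temp_c))

def trend(location: str, tolerance_c: float = 0.5) -> str:
    """Classify the temperature trend at one monitoring point."""
    readings = sorted(log[location])          # order by timestamp
    change = readings[-1][1] - readings[0][1] # latest minus earliest
    if change > tolerance_c:
        return "rising"
    if change < -tolerance_c:
        return "falling"
    return "stable"

# Hypothetical readings at one rack front over three survey visits.
record("rack-A3 front", 1.0, 24.1)
record("rack-A3 front", 2.0, 24.2)
record("rack-A3 front", 3.0, 25.6)
print(trend("rack-A3 front"))  # rising
```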

Equipment frequently moves within data centres, as clients expand their server capacity or stop using the centre altogether. If the law firm brings a new piece of equipment into a data centre or reacts to a temperature problem by moving an existing server, the tracking database allows them to assess how the change affected ambient temperatures.

From such findings, the support analyst, in cooperation with data centre personnel, can determine when the centre needs more cooling or when equipment is packed in too densely. In general, the database lets the law firm and data centre personnel pinpoint areas where the air is more or less cool or where there is more or less airflow. The analyst speculates that as he becomes more familiar with the AirMeter and its data logging capabilities, it will play a greater role in his tracking databases.
