Virtual Southwest

Hosts Disconnecting from Cluster, Storage Issues?

4/22/2013

7 Comments

 
Here is an interesting issue I ran into recently:
  • Alarms showing loss of path redundancy to storage
  • Several hosts disconnect from a cluster
  • Cannot access the hosts via the vSphere Client or SSH
  • One or more datastores show as dead and cannot be accessed
  • CPU on several hosts is at or near 100 percent
These symptoms appeared just after several hosts raised loss of redundant path to storage alarms.
The storage is managed by a separate team, so I had them check the fabric and the storage presented to the cluster. They didn’t see any issues except an alarm around the same time as the first loss of redundant path alarms. So what is the next step? Try a rescan of the storage. I did that, and the rescan ran for several minutes and timed out, then that host disconnected from the cluster! Going back to the storage team, I had them check the LUN ID of one of the datastores that showed dead. They said it showed online and they didn’t see a problem. Finally they removed and re-presented the LUN to the cluster’s hosts.
I tried another rescan, and again it took forever and failed. So the next step: reboot a host? I had one that only had one VM on it, so I rebooted it, and the previously dead datastore was back. A few minutes later the hosts that had disconnected from the cluster reconnected and appeared fine.
I remember that back in a 4.0 environment, when someone powered off an iSCSI array, the hosts disconnected from the cluster, so I assumed that having the storage pulled out from under the hosts was still an issue in vSphere 5.0.
After doing some research and opening a case with VMware, I confirmed that this can still be an issue.
The KB linked below explains Permanent Device Loss (PDL) and All Paths Down (APD) errors. One note in the KB is:
“As the ESXi host is not able to determine if the device loss is permanent (PDL) or transient (APD), it indefinitely retries SCSI I/O, including:
  • Userworld I/O (hostd management agent)
  • Virtual machine guest I/O”
That explains why the hosts disconnected and why the CPU on some showed 100 percent. The hostd process spikes as it keeps retrying I/O, which slows the management agents so you can’t connect directly, and of course running a rescan of the storage just compounds the problem.
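To make the retry behavior concrete, here is a toy model in Python. It is not ESXi code and says nothing about how hostd is actually implemented; the function name, its parameters, and the idea of a "give up" limit are all hypothetical, chosen only to contrast retrying forever with giving up after a fixed number of attempts.

```python
def retry_io(device_recovers_at, give_up_after=None, simulation_limit=1000):
    """Count retries of one stuck I/O in a toy model (not real ESXi behavior).

    device_recovers_at: attempt number at which the device answers,
                        or None if it never comes back.
    give_up_after:      hypothetical retry limit; None means retry indefinitely.
    simulation_limit:   cap so the demo terminates even for "indefinite" retries.
    Returns a (status, attempts) tuple.
    """
    for attempt in range(1, simulation_limit + 1):
        if device_recovers_at is not None and attempt >= device_recovers_at:
            return "completed", attempt
        if give_up_after is not None and attempt >= give_up_after:
            return "failed fast", attempt
    # The caller is still stuck when the simulation ends.
    return "still retrying", simulation_limit


# Device never returns and there is no limit: the caller stays busy.
print(retry_io(None))
# Same outage, but with a limit: the caller gets an error and moves on.
print(retry_io(None, give_up_after=5))
# Transient outage: the I/O completes once the device is back.
print(retry_io(3))
```

In the first case the caller never gets an answer, which is roughly the position the management agent is in when a datastore disappears and nothing tells it the loss is permanent.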
Click here for a link to the KB article.
The KB also notes that the only way to recover is to resolve the storage access issue and reboot the hosts.  Nice…
It turns out there are some settings that can be added to help keep this issue from happening in 5.1 and in 5.0 Update 2.
For more details see Cormac Hogan’s great info on the storage features in 5.1 starting here:
(Hope he doesn’t mind me sharing this link)
Another KB states that if Storage I/O Control (SIOC) is enabled, a host cannot remount the datastore.
In my case SIOC was enabled on all of the datastores.
The KB details steps to stop the SIOC service on a host to allow the removal of the datastore.
Access this KB here-
In my case I think rebooting the hosts was the only option to clear the I/O to the lost datastore. Of course, what caused the issue on the storage side is still a mystery.
I have since added the settings to each of the hosts and to the cluster. If there is another issue like this one, I am hoping they make a difference.
If you have experienced this or a similar issue, please share your experiences.

PEX 2013

4/15/2013

3 Comments

 
Well I have recovered from attending my first VMware Partner Exchange!   I thought it was great and the breakouts were full of valuable technical information.  I also attended a boot camp, which meant being in class from Saturday till Monday, not the funnest way to spend a weekend in Las Vegas, but definitely worthwhile.
I attended several breakouts that focused on virtualizing business critical applications, such as Microsoft SQL Server.  One demo showed the use of a second or standby VM for patches and upgrades.  The demo can be seen here-
All of the presenters in the breakouts made plenty of time to answer questions during and after their presentations.  It was great to be able to put questions to one of the actual developers of an area or product.
The hands-on labs were another great place to see and learn new technologies!
For anyone else who was able to attend PEX this year, let me know your thoughts. 


Virtual Machine Backups with EMC Avamar

12/15/2012

18 Comments

 
We have been evaluating backup systems for our virtual environment for the past few months.  The requirements are to back up all virtual machines (over 1,000 and growing), offer an option for DR replication, restore a virtual machine quickly, and include the storage to house the backups.

I have used most of the popular backup products in environments of various sizes, but this was the first time I got hands-on use of EMC’s Avamar.

I don’t want to go into a bakeoff comparison of the other backup products, and one thing to note is that price was not a major requirement.  Avamar is not the least expensive, but you know the saying, you get what you pay for!

Avamar is a complete backup solution, including the backup server, data storage, and all client backup licenses such as Exchange, Oracle, SQL and file level.  Administration is handled from a single interface, available through a web browser, console client or command line.

Avamar is made up of two components, a utility node and a storage node.

Utility nodes are dedicated to providing internal Avamar server processes and services, including the administrator server, cron jobs, external authentication, Network Time Protocol (NTP) and web access.

Storage nodes include the Avamar Data Server software and are dedicated to store the actual backup data. Multi-node servers include a spare node that can be manually activated in the event of a node failure.

Avamar includes the following key features (copied from the sales literature):

  • Global data deduplication ensures that data objects are only backed up once across the backup environment.
  • Systematic fault tolerance, using RAID, RAIN, checkpoints, and replication, provides data integrity and disaster recovery protection.
  • Highly reliable, inexpensive disk storage for primary backup storage.
  • Standard IP network technologies.  Optimizes use of network for backup; dedicated backup networks are not required.  Daily full backups are possible using existing networks and infrastructure.
  • Scalable server architecture provides security and expandability.  Additional storage nodes can be added to an Avamar multi-node server to accommodate increased backup storage requirements.
  • Flexible deployment options include Avamar Virtual Edition and Avamar Data Store. Avamar supports a wide variety of client operating systems and applications, including: Windows, Linux, Unix, NDMP, Microsoft SQL, Microsoft Exchange, SharePoint, and Oracle.  With its global deduplication technology, Avamar is an efficient backup choice for VMware and remote office backup environments.
  • Centralized management. Avamar Enterprise Manager and Avamar Administrator interfaces enable remote management of Avamar servers from a centralized location.
The deduplication with Avamar is quite impressive.  The system we implemented has 32 TB of disk storage and it is backing up 75 TB of data!
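As a quick back-of-the-envelope check on those two figures, the ratio of protected data to disk capacity can be worked out in a few lines of Python. The function name is made up for illustration. This is only a rough floor on the reduction, since it assumes nothing about how full the 32 TB actually is or how many backup generations it holds.

```python
def protected_to_capacity_ratio(protected_tb, capacity_tb):
    """Ratio of data being backed up to the disk capacity holding the backups."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return protected_tb / capacity_tb


# Figures from the post: 75 TB of data backed up onto 32 TB of disk.
ratio = protected_to_capacity_ratio(75, 32)
print(f"{ratio:.2f}:1")  # prints "2.34:1"
```

So even if the system were completely full, it would be holding at least about 2.3 times its own capacity in source data.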

You can also set up replication to another Avamar server.

Replication can be configured in multiple ways to meet your requirements.  For example, replication can be used to provide disaster recovery protection of data from multiple single-node servers to a central multi-node server, as in a remote or branch office to home office scenario. It can also provide peer-to-peer disaster recovery protection from a single-node server to a single-node server, or between multi-node servers.

The two basic kinds of Avamar replication are standard (normal) and full copy (root-to-root):

Standard or normal replication copies backup data from one or more source Avamar servers to a destination Avamar server. With standard replication, an Avamar server can be both a replication source and a target for replication. And, multiple source Avamar servers can replicate to the same target Avamar server.

Full copy or root-to-root replication creates a complete logical copy of an entire source server on the destination Avamar server.

If you happen to have multiple sites and datacenters with good network connectivity, you can back up your servers, virtual or physical, to a single Avamar server.  Avamar utilizes proxies to run the backup jobs of your virtual machines.  The proxies can be deployed using an OVA file from your vCenter.

Avamar offers several options for restoring a virtual machine, including:

  • Restoring to a new virtual machine
  • Restoring to the original virtual machine
  • Restoring to a different virtual machine

You also have the option to select which host or cluster to restore to and, if you have multiple VMDK files, to select only the one you need.

So far I am very impressed with the Avamar solution, and with the tech support engineers I have worked with.

More details on Avamar and the restore options to come…


vSphere 5.1 GA Release

9/12/2012

11 Comments

 
Well it's September, and that means another VMware vSphere version release!
Yes, vSphere 5.1 is GA, along with vCloud Director 5.1.
You can check out the What's New in 5.1 here:
http://www.vmware.com/files/pdf/products/vsphere/vmware-what-is-new-vsphere51.pdf
Several interesting new features:
  • vSphere Distributed Switch: Enhancements such as Network Health Check, Configuration Backup and Restore, Roll Back and Recovery, and Link Aggregation Control Protocol support deliver more enterprise-class networking functionality and a more robust foundation for cloud computing.
  • vSphere Data Protection: Simple and cost-effective backup and recovery for virtual machines. vSphere Data Protection is a newly architected solution based on EMC Avamar technology that allows admins to back up virtual machine data to disk without the need for agents and with built-in deduplication. This feature replaces the vSphere Data Recovery product available with previous releases of vSphere.
Interesting since I am in the process of implementing Avamar at my current customer. A more detailed post on that to come.