Any help is greatly appreciated. Thanks.
I am trying to add an ESX Server to the default pool. The ESX Server is version 3.0.1 with no patches applied.
The ESX Server's virtual networking is configured as follows:
1) A virtual switch with the following:
a) a service console port with a VLAN ID of 160
b) a vmkernel port with a VLAN ID of 160
c) a virtual machine portgroup (the "default network") with a VLAN ID of 168
This virtual switch has one outbound NIC. The port on the physical switch to which this NIC is cabled is configured as follows:
interface GigabitEthernet6/11
description SSHDTEDBESX01 Mgmt SCC_10-4
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 160,168
switchport mode trunk
spanning-tree bpdufilter enable
2) A second virtual switch with a single virtual machine portgroup (the "trunked network") with a VLAN ID of 4095. This second virtual switch has one outbound NIC. The port on the physical switch to which this NIC is cabled is configured as follows:
interface GigabitEthernet5/27
description SSHDTEDBESX01 Prod SCC_10-4
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2000-2099
switchport mode trunk
spanning-tree bpdufilter enable
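One way to sanity-check the layout above is to compare each portgroup's VLAN ID with the trunk's allowed-VLAN list. The sketch below is a hypothetical helper (parse_allowed_vlans and check_portgroups are my names, not Surgient's or Cisco's); it only understands comma-separated IDs and ranges such as 2000-2099, not IOS keywords like "all" or "add". It treats VLAN 4095 as ESX does, i.e. guest tagging, where the vSwitch passes tagged frames through and the guest OS must tag with a VLAN the trunk allows.

```python
# Hypothetical helper: compares ESX portgroup VLAN IDs with an IOS trunk's
# "switchport trunk allowed vlan" list. Handles only numbers and ranges.

GUEST_TAGGING_VLAN = 4095  # ESX: VLAN 4095 on a portgroup means guest (VGT) tagging


def parse_allowed_vlans(spec: str) -> set[int]:
    """Expand a list such as "160,168" or "2000-2099" into a set of VLAN IDs."""
    vlans: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def check_portgroups(allowed_spec: str, portgroups: dict[str, int]) -> dict[str, str]:
    """Return a status string for each portgroup name."""
    allowed = parse_allowed_vlans(allowed_spec)
    status: dict[str, str] = {}
    for name, vlan in portgroups.items():
        if vlan == GUEST_TAGGING_VLAN:
            # The vSwitch passes guest-applied tags through unchanged.
            status[name] = "guest tagging"
        elif vlan in allowed:
            status[name] = "allowed"
        else:
            status[name] = "blocked"
    return status


# Values taken from the configuration described above.
print(check_portgroups("160,168", {
    "service console": 160,
    "vmkernel": 160,
    "default network": 168,
}))
print(check_portgroups("2000-2099", {"trunked network": 4095}))
```

For the trunked network, "guest tagging" means the NAIL Server (or any guest) on that portgroup has to tag its own traffic with one of VLANs 2000-2099 for it to leave the switch port.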
The VCS Server is in VLAN 160. The Surgient agent is installed on the ESX service console, which is also in VLAN 160; the agent is running, and the ESX Server shows up in the Surgient Management Console as a host that can be added to a pool. We have only one pool, the default pool. The NAIL Server is in VLAN 168 and is configured to use an IP address, subnet mask and gateway from the pool. All ports are (temporarily) open between the two VLANs in both directions.
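Surgient's own error text for this failure names a pooled address that does not fit the host's default network as a common cause, and that can be checked offline. The sketch below uses Python's standard ipaddress module; check_pooled_address is a hypothetical name, and the example addresses come from the 192.0.2.0/24 documentation range, not from this environment. You would substitute the pooled IP, mask and gateway assigned to the NAIL Server and the subnet actually used on VLAN 168.

```python
import ipaddress


def check_pooled_address(ip: str, netmask: str, gateway: str, vlan_subnet: str) -> list[str]:
    """Return problems with a pooled IP/mask/gateway; an empty list means consistent."""
    problems: list[str] = []
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")  # accepts dotted netmasks
    gw = ipaddress.ip_address(gateway)
    expected = ipaddress.ip_network(vlan_subnet, strict=False)

    if iface.network != expected:
        problems.append(f"{iface.network} does not match the VLAN subnet {expected}")
    if gw not in iface.network:
        problems.append(f"gateway {gw} is outside {iface.network}")
    if iface.ip in (iface.network.network_address, iface.network.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address")
    return problems


# Hypothetical values from the documentation range:
print(check_pooled_address("192.0.2.50", "255.255.255.0", "192.0.2.1", "192.0.2.0/24"))
```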
When I start adding the host to the pool, the NAIL Server powers up and, within a minute or so, the login prompt appears on its console. There is no error message on the NAIL Server console; everything looks OK from that screen.
The NAIL Server's guestagent log contains the following errors (these may be benign):
20080124 12:14:17.204 [ERROR] NIC - can't open /etc/resolv.conf: No such file or directory
20080124 12:14:17.213 [ERROR] NIC - can't open /etc/resolv.conf: No such file or directory
20080124 12:14:17.221 [ERROR] NIC - can't open /etc/resolv.conf: No such file or directory
The process aborted about 10 minutes after it started. When I clicked View Error in the Surgient Management Console, the following popup appeared:
"00050011 Command 'Engine.Script.initialize-nail-vm' did not complete successfully. Address: 172.16.160.103. Result: Failed. Message: 00350084 The NAIL server agent on SSHDTEDBESX01 has not registered with the VCS after 600 seconds. This error is most often the result of one of the two following problems: 1. The NAIL server's assigned pooled IP address is incompatible with its host's default network (if the 'Use Pooled IP Address' option was chosen when the host was added to the pool). 2. The NAIL server was not able to obtain an IP address from a DHCP server (if the 'DHCP' option was chosen when the host was added to the pool)."
A) The following is from the ServiceHost.log on the VCS Server:
1/24/2008 11:50:08 AM|Warning| 23 |[EventDispatcher] Caught exception while processing handler for message type Surgient.Platform.AgentDocumentParser.AgentReadyMessage : Surgient.Platform.Exceptions.NailException: 00570014 Failed to initialize NAIL VM SSHDTEDBESX01-NailServer-1: 00220007 The specified server (#SSHDTEDBESX01) is not currently pooled.
at Surgient.EventDispatcher.NailRecoveryEventModule.AgentReadyHandler(BaseMessage msg)
at Surgient.EventDispatcher.EventMessageQueueProcessor.Process(MessageRequest request)
at Surgient.Common.Utility.QueueProcessor`1.ProcessQueue()
B) The following is from the Console.log on the VCS Server:
1/24/2008 11:49:04 AM|Info| 16 |[Surgient.App.Console] Thread to add host 2 to pool 1 started.
1/24/2008 11:50:05 AM|Info| 1P|[Surgient.Platform.Persistence] Session (#94) saved.
1/24/2008 11:52:05 AM|Info| 8P|[Surgient.Platform.Persistence] Session (#94) saved.
1/24/2008 11:53:07 AM|Info| 1P|[Surgient.Platform.Persistence] Server (#2) saved.
1/24/2008 11:54:08 AM|Info| 6P|[Surgient.Platform.Persistence] Session (#94) saved.
1/24/2008 11:56:09 AM|Info| 6P|[Surgient.Platform.Persistence] Session (#94) saved.
1/24/2008 11:58:09 AM|Info| 6P|[Surgient.Platform.Persistence] Session (#94) saved.
1/24/2008 11:59:32 AM|Severe| 16 |[Surgient.App.Console] Surgient.Platform.Commands.CommandException: 00050011 Command 'Engine.Script.initialize-nail-vm' did not complete successfully. Address: 172.16.160.103. Result: Failed. Message: 00350084 The NAIL server agent on SSHDTEDBESX01 has not registered with the VCS after 600 seconds. This error is most often the result of one of the two following problems: 1. The NAIL server's assigned pooled IP address is incompatible with its host's default network (if the 'Use Pooled IP Address' option was chosen when the host was added to the pool). 2. The NAIL server was not able to obtain an IP address from a DHCP server (if the 'DHCP' option was chosen when the host was added to the pool).
at Surgient.Deployment.PoolServiceImpl.InitializeNailServerVm(Host h, NailServer nailServer, PooledIpAddress nailServerIp, Int32 poolId)
at Surgient.Deployment.PoolServiceImpl.AddHostToPool(RequestContext ctx, Int32 hostId, Int32 poolId, Int32 ramMb, Int32 vmCount, NailServerAddressType addressType)
at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.PrivateProcessMessage(RuntimeMethodHandle md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext)
at Surgient.Platform.Services.Internal.ServiceProxy.Invoke(IMessage msg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Surgient.Platform.Services.PoolService.AddHostToPool(RequestContext ctx, Int32 hostId, Int32 poolId, Int32 ramMb, Int32 vmCount, NailServerAddressType addressType)
at Surgient.Platform.Services.DynamicSingletons.PoolService_Proxy_4.AddHostToPool(RequestContext , Int32 , Int32 , Int32 , Int32 , NailServerAddressType )
at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.PrivateProcessMessage(RuntimeMethodHandle md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext)
Server stack trace:
at Surgient.Deployment.PoolServiceImpl.InitializeNailServerVm(Host h, NailServer nailServer, PooledIpAddress nailServerIp, Int32 poolId)
at Surgient.Deployment.PoolServiceImpl.AddHostToPool(RequestContext ctx, Int32 hostId, Int32 poolId, Int32 ramMb, Int32 vmCount, NailServerAddressType addressType)
at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.PrivateProcessMessage(RuntimeMethodHandle md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext)
at Surgient.Platform.Services.Internal.ServiceProxy.Invoke(IMessage msg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Surgient.Platform.Services.PoolService.AddHostToPool(RequestContext ctx, Int32 hostId, Int32 poolId, Int32 ramMb, Int32 vmCount, NailServerAddressType addressType)
at Surgient.Platform.Services.DynamicSingletons.PoolService_Proxy_4.AddHostToPool(RequestContext , Int32 , Int32 , Int32 , Int32 , NailServerAddressType )
at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.PrivateProcessMessage(RuntimeMethodHandle md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.SyncProcessMessage(IMessage msg, Int32 methodPtr, Boolean fExecuteInContext)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Surgient.Platform.Services.PoolService.AddHostToPool(RequestContext ctx, Int32 hostId, Int32 poolId, Int32 ramMb, Int32 vmCount, NailServerAddressType addressType)
at Surgient.App.Console.Controls.AddHostToPool.DoAddHostToPool(Object args)
1/24/2008 11:59:32 AM|Info| 16 |[Surgient.App.Console] Thread to add host 2 to pool 1 stopped.
C) The following is from the Scripting.log on the VCS Server:
1/24/2008 11:50:08 AM|Severe|initialize-nail-vm.80 |[ScriptEngine] Script 80 "initialize-nail-vm", SSHDTEDBESX01 (name="SSHDTEDBESX01-NailServer-1", type="NailServer") failed with exception: Surgient.Platform.Exceptions.InvalidArgumentException: 00220007 The specified server (#SSHDTEDBESX01) is not currently pooled.
at Surgient.Automation.Nail.InitializeNailVmBase.GetTargetPool()
at Surgient.Automation.Nail.InitializeNailVmBase.Run(String vmName)
at Surgient.Automation.InitializeNailVm.Start()
1/24/2008 11:50:08 AM|Info|initialize-nail-vm.80 |[Surgient.Platform.Persistence] ScriptInstance (#117) deleted. Fundamental properties of the deleted object: ScriptInstance #117 [CommandId=28; LastSavedCheckpoint=0; SerializedRequest=System.Byte[]; Request=; SerializedData=; Data=; Id=117; IsVolatile=False; LastUpdateOn=1/24/2008 4:50:08 PM; IsDcgNode=False]
1/24/2008 11:50:08 AM|Info|initialize-nail-vm.80 |[ScriptEngine] Script instance 80 "initialize-nail-vm", SSHDTEDBESX01 (name="SSHDTEDBESX01-NailServer-1", type="NailServer") has completed with status Failed.
1/24/2008 11:50:13 AM|Warning| 38P|[initialize-nail-vm-78] STP config for SSHDTEDBESX01-NailServer-1 in advanced mode is incorrectly inactive; disabling bridge interface
1/24/2008 11:50:13 AM|Severe| 38P|[Messaging] Asynchronous invocation of method NailAgentReadyHandler on object type InitializeNailServerBase threw an uncaught exception. Surgient.Platform.Exceptions.NailException: 00570049 STP state of NAIL server SSHDTEDBESX01-NailServer-1 has incorrect root bridge; check that the switch is properly configured to pass BPDU for advanced mode
at Surgient.Automation.Nail.InitializeNailServerBase.SetStpMode()
at Surgient.Automation.Nail.InitializeNailServerBase.NailAgentReadyHandler(BaseMessage msg)
at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.PrivateProcessMessage(RuntimeMethodHandle md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)
Server stack trace:
at Surgient.Automation.Nail.InitializeNailServerBase.SetStpMode()
at Surgient.Automation.Nail.InitializeNailServerBase.NailAgentReadyHandler(BaseMessage msg)
at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.PrivateProcessMessage(RuntimeMethodHandle md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase)
at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData)
at Surgient.Messaging.MessageHandler.EndInvoke(IAsyncResult result)
at Surgient.Messaging.MessageBusProvider.AsyncCallbackMethod(IAsyncResult ar)
1/24/2008 11:59:27 AM|Severe|initialize-nail-vm.78 |[ScriptEngine] Script 78 "initialize-nail-vm", SSHDTEDBESX01 (name="SSHDTEDBESX01-NailServer-1", type="NailServer", poolid="1") failed with exception: Surgient.Automation.AutomationException: 00350084 The NAIL server agent on SSHDTEDBESX01 has not registered with the VCS after 600 seconds. This error is most often the result of one of the two following problems: 1. The NAIL server's assigned pooled IP address is incompatible with its host's default network (if the 'Use Pooled IP Address' option was chosen when the host was added to the pool). 2. The NAIL server was not able to obtain an IP address from a DHCP server (if the 'DHCP' option was chosen when the host was added to the pool).
at Surgient.Automation.Nail.InitializeNailServerBase.ConfigureVm()
at Surgient.Automation.Nail.InitializeNailVmBase.Run(String vmName, Int32 poolId)
at Surgient.Automation.InitializeNailVm.Start()
1/24/2008 11:59:27 AM|Info|initialize-nail-vm.78 |[Surgient.Platform.Persistence] ScriptInstance (#116) deleted. Fundamental properties of the deleted object: ScriptInstance #116 [CommandId=28; LastSavedCheckpoint=0; SerializedRequest=System.Byte[]; Request=; SerializedData=; Data=; Id=116; IsVolatile=False; LastUpdateOn=1/24/2008 4:49:09 PM; IsDcgNode=False]
1/24/2008 11:59:27 AM|Info|initialize-nail-vm.78 |[ScriptEngine] Script instance 78 "initialize-nail-vm", SSHDTEDBESX01 (name="SSHDTEDBESX01-NailServer-1", type="NailServer", poolid="1") has completed with status Failed.
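With three VCS logs mixing Info lines and multi-line stack traces, it can help to pull out only the Warning and Severe entries. The sketch below assumes the line layout seen above (timestamp|level|thread|message, where stack-trace lines that do not start with a timestamp belong to the preceding entry); LogEntry, parse_log and problems are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable

TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. "1/24/2008 11:50:08 AM"


@dataclass
class LogEntry:
    time: datetime
    level: str
    thread: str
    message: str
    continuation: list[str] = field(default_factory=list)  # stack-trace lines


def parse_log(lines: Iterable[str]) -> list[LogEntry]:
    """Group 'time|level|thread|message' lines with their continuation lines."""
    entries: list[LogEntry] = []
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split("|", 3)
        if len(parts) == 4:
            try:
                stamp = datetime.strptime(parts[0].strip(), TIME_FORMAT)
            except ValueError:
                stamp = None
            if stamp is not None:
                entries.append(LogEntry(stamp, parts[1].strip(), parts[2].strip(), parts[3].strip()))
                continue
        # Anything else (e.g. "at Surgient...") continues the previous entry.
        if entries and line.strip():
            entries[-1].continuation.append(line.strip())
    return entries


def problems(entries: list[LogEntry], levels: tuple[str, ...] = ("Warning", "Severe")) -> list[LogEntry]:
    """Keep only entries whose level is in levels."""
    return [e for e in entries if e.level in levels]


# Hypothetical usage:
#   with open("Scripting.log", encoding="utf-8", errors="replace") as f:
#       for e in problems(parse_log(f)):
#           print(e.time, e.level, e.message[:100])
```

Subtracting the "started" and "stopped" timestamps in Console.log (11:49:04 to 11:59:32) gives 628 seconds, consistent with the 600-second registration timeout in the error.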
Questions:
1. Should the ESX Server be patched? If so, with which patches?
2. We will later want to lock down our network configuration. Which ports must be open for all required traffic from the NAIL Server to the VCS Server, and which must be open for all required traffic in the opposite direction?
3. How should we configure our switch ports?
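Whatever port list applies for question 2, a plain TCP connect test is a quick way to re-verify reachability after the lock-down. This is a minimal sketch using Python's standard socket module; it tests TCP only, and it does not know which ports Surgient actually requires, so the host and ports in the usage comment are placeholders to fill in from the product documentation.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_ports(host: str, ports: list[int], timeout: float = 3.0) -> dict[int, bool]:
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}


# Hypothetical usage, run from a machine in VLAN 168 toward the VCS in VLAN 160:
#   check_ports("<VCS address>", [<port>, <port>])
```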
(For emphasis, the logs contain these two messages: a) "STP config for SSHDTEDBESX01-NailServer-1 in advanced mode is incorrectly inactive; disabling bridge interface" and b) "STP state of NAIL server SSHDTEDBESX01-NailServer-1 has incorrect root bridge; check that the switch is properly configured to pass BPDU for advanced mode".)
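Both switch ports in the configuration above carry "spanning-tree bpdufilter enable". On Cisco IOS, BPDU filtering enabled at the interface level stops the port from sending or processing BPDUs, which may be related to the "incorrect root bridge" message in advanced mode; that link is an assumption, not something the logs confirm. The sketch below is a hypothetical helper that lists the interfaces carrying that command in a pasted configuration.

```python
from typing import Optional


def interfaces_with_bpdufilter(config: str) -> list[str]:
    """Return interface names whose block contains 'spanning-tree bpdufilter enable'."""
    flagged: list[str] = []
    current: Optional[str] = None
    for raw in config.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
        elif line == "!":
            current = None  # "!" separates interface blocks in IOS output
        elif current is not None and line == "spanning-tree bpdufilter enable":
            flagged.append(current)
    return flagged


# Abbreviated copy of the two switch ports described above.
CONFIG = """\
interface GigabitEthernet6/11
 switchport trunk allowed vlan 160,168
 spanning-tree bpdufilter enable
!
interface GigabitEthernet5/27
 switchport trunk allowed vlan 2000-2099
 spanning-tree bpdufilter enable
"""

print(interfaces_with_bpdufilter(CONFIG))
```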
Many thanks once again.