vSwitch0, Native VLANs and Host Building
I was just reading Scott’s posting regarding ESX Server and the Native VLAN. It got me started on a post to describe my logic regarding vSwitch0 design for ESX host building.
I can tell you that I suggest native VLAN use for many ESX installations. I suggest this for ease of building the server. It is not required and many network guys hate this. I understand, but it is a convenience during the building process. Depending on the environment, it may be a requirement (lights out data center for example).
A sample vSwitch0 network looks something like this:

[Image: sample vSwitch0 network diagram, not available]
The matching Cisco configuration would be something like this:
    interface GigabitEthernet4/17
     switchport trunk allowed vlan 15,254
     switchport trunk encapsulation dot1q
     switchport mode trunk
     switchport trunk native vlan 15
     no ip address
     spanning-tree portfast trunk
It is important to note that every port group that needs VLAN 15 must use VLAN 0 (native/untagged). On a given trunk, a VLAN is either tagged or untagged, but not both. I am not sure if it is a Cisco thing or a VMware thing, but if you add a port group to this vSwitch and tag it VLAN 15, the traffic will not work.
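One way to reason about why the tagged port group fails is to model how a dot1q trunk treats its native VLAN. The sketch below is a simplified model under stated assumptions, not a description of Cisco or ESX internals. The function names are made up for illustration. It assumes the switch sends native-VLAN frames untagged, and that a port group only accepts frames whose tag matches its VLAN ID (with 0 meaning untagged).

```python
NATIVE_VLAN = 15            # from "switchport trunk native vlan 15"
ALLOWED_VLANS = {15, 254}   # from "switchport trunk allowed vlan 15,254"

def trunk_egress(vlan):
    """Return the 802.1Q tag the switch puts on a frame leaving the trunk.

    None means the frame goes out untagged. VLANs that are not allowed
    on the trunk are rejected with a ValueError.
    """
    if vlan not in ALLOWED_VLANS:
        raise ValueError("VLAN %d is not allowed on this trunk" % vlan)
    return None if vlan == NATIVE_VLAN else vlan

def port_group_accepts(port_group_vlan, frame_tag):
    """Assumed rule: VLAN ID 0 takes untagged frames, otherwise the tag must match."""
    if port_group_vlan == 0:
        return frame_tag is None
    return frame_tag == port_group_vlan

tag = trunk_egress(15)               # VLAN 15 is native, so no tag on the wire
print(port_group_accepts(0, tag))    # → True  (port group set to VLAN 0)
print(port_group_accepts(15, tag))   # → False (port group tagged VLAN 15)
```

In this model the VLAN 15 frames arrive without a tag, so a port group that waits for tag 15 never sees them.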
Build Problems
I have been unsuccessful building an ESX host using kickstart on a trunk without the use of the native VLAN. The problem is that there is no vlanid option at the boot prompt. Once the ks file has been read in, we might be able to start tagging packets immediately using the --vlanid= parameter on the network line in the ks file. It would look something like this:
network --device eth4 --bootproto static --ip 192.168.15.107 --netmask 255.255.255.0 --gateway 192.168.15.1 --nameserver 192.168.18.5 --hostname esx01.whatever.com --addvmportgroup=0 --vlanid=15
As I understand it, immediately after reading in the ks file, the network stack will be reloaded with the settings configured here. But you have to get to the file first.
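If you build many hosts, it can help to generate that network line rather than edit it by hand. The helper below is a hypothetical sketch (the function name and its parameters are my own). It only assembles the options already shown in the line above, and it leaves --vlanid off when you build on the native VLAN.

```python
def ks_network_line(device, ip, netmask, gateway, nameserver, hostname, vlanid=None):
    """Assemble a kickstart 'network' line using the options shown in the post.

    vlanid=None means build on the native (untagged) VLAN, so no
    --vlanid option is emitted.
    """
    parts = [
        "network",
        "--device %s" % device,
        "--bootproto static",
        "--ip %s" % ip,
        "--netmask %s" % netmask,
        "--gateway %s" % gateway,
        "--nameserver %s" % nameserver,
        "--hostname %s" % hostname,
        "--addvmportgroup=0",
    ]
    if vlanid is not None:
        parts.append("--vlanid=%d" % vlanid)
    return " ".join(parts)

# Same values as the example line above
print(ks_network_line("eth4", "192.168.15.107", "255.255.255.0",
                      "192.168.15.1", "192.168.18.5",
                      "esx01.whatever.com", vlanid=15))
```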
To deal with this, I see three options.
- Put the kickstart cfg file on local media (floppy, cdrom, usb, …).
- Use a “build port” temporarily. Usually a port on the patch panel in the server rack that is an “access” port only used to build the host. You will need to apply the tagging to the service console post build.
- Use the native VLAN option.
Of those options I typically prefer to use the native VLAN. So far, I have not found it to be a problem.
Building is still frustrating
I have found that even with the native VLAN, I still have too many problems with the network. The main issue is that the build environment does not have many troubleshooting tools. You have to rely on the messages on tty 3 and 4, if they don’t fly off your screen. Then, to rerun the build, you have to wait for POST and such. You are also usually waiting for a network person to check or change settings between each attempt.
In an effort to minimize frustration and increase productivity, I now recommend using the Ultimate Deployment Appliance (UDA) in VMware Player or Workstation on a laptop. For me, this virtual appliance ties some pieces together, such as a web server, DHCP, and TFTP for PXE. It has limited ESX support, which is fine because I use the exact same %post scripts as I did in the past.
Here is the high level build process:
- Take your laptop to the data center and plug a cat5 cable between your laptop and the back of the server.
- Hard code the IP on your laptop and UDA. Put them on the same subnet as the service console will use. Be sure to undo this before plugging into the real network. Also, turn off the DHCP server whenever you are not using it so that you do not break anything on the real network.
- Create a “template” for your server in UDA.
- PXE boot the new ESX server and select the template name.
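Before heading to the data center, a quick sanity check on the addressing from the second step can save a build attempt. This is a minimal sketch using Python's standard ipaddress module. The laptop and UDA addresses are made-up examples, and the /24 network is assumed from the netmask in the kickstart line earlier.

```python
import ipaddress

def same_subnet(network, *addresses):
    """Return True if every address falls inside the given network."""
    net = ipaddress.ip_network(network)
    return all(ipaddress.ip_address(addr) in net for addr in addresses)

service_console = "192.168.15.107"   # from the kickstart example
laptop = "192.168.15.250"            # hypothetical
uda = "192.168.15.251"               # hypothetical

print(same_subnet("192.168.15.0/24", service_console, laptop, uda))  # → True
```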
Here is the best part. After the build is complete, unplug your laptop and plug in the real network. Now you can use the full suite of tools to troubleshoot the network. You get fancy newfangled tools like ping and such.
So, to go full circle, if you are using UDA with a laptop, you don’t really need to use the native VLAN for the service console. If you want to tag, then post-build, you will need to add the VLAN tag for the service console. I personally dislike manual tasks and prefer to use the native VLAN all the time.
Jeremy,
Great write-up! The build problems you discussed when not using the native VLAN are exactly what prompted my article, as I have a number of customers that are experiencing problems with builds when they do not use the native VLAN.
Hi,
The network security best practice is to set the Native VLAN to something that is not used anywhere else in the network. Your advice here seems to be to set the Native VLAN to be the same as the VLAN of the PXE boot server.
If you set the Native VLAN on the Cisco switch trunk port in this manner, do you expose the network to the VLAN hopping attack via double encapsulation? This is described here:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper09186a008013159f.shtml
and also mentioned by Scott Lowe here:
http://blog.scottlowe.org/2008/03/05/vmotion-and-vlan-security/
Thanks,
Quick, thanks for bringing this up. I have been meaning to get to this since Scott put together his post in March. I need to put together a post to address it fully.
The short story is that using the native VLAN does open you to some additional risk. An attack needs to originate from an access port on the same VLAN as the native VLAN. In my case, I will normally suggest a “management” vSwitch used only for ESX things like the service console, VMotion, and NFS mounts for ISOs. The only VLANs exposed to the risk would be those on the trunk other than the native VLAN. This pretty much comes down to the VMotion VLAN. Following best practice, this should be a non-routed VLAN. If I am fully understanding this, the return traffic from the VMotion network to the attacker should not make it, as there would be no return route. The attack would need to consist of a single packet, or of packets with no return necessary.
Also, many clients I work with have dedicated vlans just for ESX service console and vmkernel interfaces. This would also help narrow down the attack sources.
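To make the double encapsulation risk concrete, here is a simplified model of how such a frame could travel. It is a sketch under stated assumptions, not a description of any particular switch. It assumes the switch accepts a tagged frame on an access port when the outer tag matches the port's VLAN, strips that tag, and adds no tag on the trunk for native-VLAN traffic. The function names are made up, and VLAN 999 stands in for an unused native VLAN.

```python
def switch_forward(access_vlan, frame_tags, native_vlan):
    """Frame enters on an access port and leaves on the trunk to the ESX host.

    frame_tags lists the 802.1Q tags on the frame, outermost first.
    Returns the tags left on the frame as it crosses the trunk.
    """
    if not frame_tags or frame_tags[0] != access_vlan:
        raise ValueError("sketch only models frames whose outer tag matches the access VLAN")
    inner_tags = frame_tags[1:]          # outer tag is stripped on ingress
    if access_vlan == native_vlan:
        return inner_tags                # native VLAN: no tag is added on the trunk
    return [access_vlan] + inner_tags    # otherwise the frame is re-tagged

def vswitch_delivers_to(wire_tags):
    """VLAN ID of the port group that receives the frame (0 means untagged)."""
    return wire_tags[0] if wire_tags else 0

# Attacker sits on an access port in VLAN 15 and targets VLAN 254.
crafted = [15, 254]
print(vswitch_delivers_to(switch_forward(15, crafted, native_vlan=15)))   # → 254
print(vswitch_delivers_to(switch_forward(15, crafted, native_vlan=999)))  # → 15
```

In this model, when the native VLAN matches the attacker's access VLAN, the inner tag is what reaches the host. When the native VLAN is unused, the frame stays in the attacker's own VLAN.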
You can assess your own risk vs reward. Deploying ESX without the native vlan leaves you the following options.
1. Designate a port per rack for deployment. This would be an access port on the same vlan as your service console. Plug in the deployment cable for building. Post build, add the tagging to the service console with esxcfg-vswitch.
2. Same as #1 but build with UDA.
3. Dedicate network interfaces to the service console and use access ports. With HA, it is a best practice to have two vmnics on this switch for heartbeat redundancy.
4. Install by hand and script the post installation configuration.
So if you use blades in a lights-out datacenter, good luck.
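For the post-build tagging step in option 1, the command is short enough to script. The helper below is a sketch that only builds and prints the esxcfg-vswitch command line. It assumes the default port group name "Service Console" on vSwitch0 and uses VLAN 15 from the earlier example, so adjust to your environment. Changing the tag will likely drop a network session to the host, so treat this as something to run from the console or from a %post script.

```python
import shlex

def tag_port_group_cmd(vlan_id, port_group="Service Console", vswitch="vSwitch0"):
    """Build the esxcfg-vswitch argument list that sets a port group's VLAN ID."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be between 0 and 4095")
    return ["esxcfg-vswitch", "-p", port_group, "-v", str(vlan_id), vswitch]

cmd = tag_port_group_cmd(15)
# Print a copy-and-paste friendly form; nothing is executed here.
print(" ".join(shlex.quote(arg) for arg in cmd))
# → esxcfg-vswitch -p 'Service Console' -v 15 vSwitch0
```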
Also, your comment somehow was held in moderation by wordpress until I approved it. It may have been anti-spam. Anyway, that explains the delay in your comment appearing.
Jeremy,
Thanks for the reply. It seems you’re under the impression that our provisioning VLAN is the same as the kernel VLAN.
We have a VLAN for the Kernel, another for Service Console, and once the ESX server is built, there will be a separate vswitch just for them with multiple uplinks. We have a different VLAN for provisioning. Plus others for production traffic. Unfortunately, the same provisioning VLAN will be used for building new virtual machines and other physical host OSes as well, so setting the native VLAN to the provisioning VLAN is probably not going to fly.
Although I was hoping to avoid it, I’m pretty sure the only acceptable solution to our security folks will be to configure additional uplinks as access ports to use for provisioning and swap things around after the OS is deployed (although we’re using HP virtual connects, so we may do it there).
Perhaps eventually, the install kernels will jump into the 21st century and figure out a way to support VLANs.
Thanks,