
Solaris Xen Drop 66 – Xen System Administration

2007-07-24 11:32

Overview

Introduction to the Hypervisor

The virtual machine monitor within the Solaris Operating System can securely execute multiple virtual machines simultaneously, each running its own operating system, on a single physical system. Each virtual machine instance is called a domain. There are two kinds of domains. The control domain is called domain0, or dom0. A guest OS, or unprivileged domain, is called a domainU, or domU. Unlike virtualization using zones, each domain runs a full instance of an operating system.

A hypervisor is also known as a Virtual Machine Monitor (VMM).

How Hypervisors Work

A hypervisor is a software system that partitions a single physical machine into multiple virtual machines, to provide server consolidation and utility computing. Existing applications and binaries run unmodified.

The hypervisor controls the MMU, CPU scheduling, and interrupt controller, presenting a virtual machine to guests.

The hypervisor separates the software from the hardware by forming a layer between the software running in the virtual machine and the hardware. This separation enables the hypervisor to control how guest operating systems running inside a virtual machine use hardware resources. A hypervisor provides a uniform view of underlying hardware. Machines from different vendors with different I/O subsystems appear the same, which means that virtual machines can run on any available computer. Thus, administrators can view hardware as a pool of resources that can run arbitrary services on demand. Because the hypervisor also encapsulates a virtual machine's software state, the hypervisor layer can map and remap virtual machines to available hardware resources at any time and also live migrate virtual machines across computers. These capabilities can also be used for load balancing among a collection of machines, dealing with hardware failures, and scaling systems. When a computer fails and must go offline or when a new machine comes online, the hypervisor layer can simply remap virtual machines accordingly. Virtual machines are also easy to replicate, which lets administrators bring new services online as needed.

Containment means that administrators can suspend virtual machines and resume them at any time, or checkpoint them and roll them back to a previous execution state. With this general-purpose undo capability, systems can more easily recover from crashes or configuration errors. Containment also supports a very general mobility model. Users can copy a suspended virtual machine over a network or store and transport it on removable media. The hypervisor can also provide total mediation of all interactions between the virtual machine and underlying hardware, thus allowing strong isolation between virtual machines and supporting the multiplexing of many virtual machines on a single hardware platform. The hypervisor can then consolidate a collection of virtual machines with low resources onto a single computer, thereby lowering hardware costs and space requirements. Strong isolation is also valuable for reliability and security. Applications that previously ran together on one machine can now be separated on different virtual machines. If one application experiences a fault, the other applications are isolated from this occurrence and will not be affected. Further, if a virtual machine is compromised, the incident is contained to only that compromised virtual machine.

Resource Virtualization

As a key component of virtual machines, the hypervisor provides a layer between software environments and physical hardware that is programmable and transparent to the software above it, while making efficient use of the hardware below it.

Virtualization provides a way to bypass interoperability constraints. Virtualizing a system or component such as a processor, memory, or an I/O device at a given abstraction level maps its interface and visible resources onto the interface and resources of an underlying, possibly different, real system. Consequently, the real system appears as a different virtual system or even as multiple virtual systems.

Virtualization Types

There are two basic types of virtualization, full virtualization and paravirtualization. The hypervisor supports both models.

In full virtualization, the operating system is completely unaware that it is running in a virtualized environment. In the more lightweight paravirtualization, the operating system is both aware of the virtualization layer and modified to support it, which results in higher performance.

The paravirtualized domU operating system is ported to run on top of the hypervisor, and uses virtual network, disk, and console devices.

Since dom0 must work closely with the hypervisor layer, dom0 is always paravirtualized. DomUs can be either paravirtualized or fully virtualized, and a system can have both varieties running simultaneously.

A hardware virtual machine (HVM) domU runs an unmodified operating system. These hardware-assisted virtual machines take advantage of processors with Intel VT or AMD Secure Virtual Machine (SVM) support.

About Domains

Dom0 and domU are separate entities. Other than by login, you cannot access a domU from dom0. A dom0 should be reserved for system management work associated with running a hypervisor. This means, for example, that users should not have logins on dom0. Dom0 provides shared access to a physical network interface to the guest domains, which have no direct access to physical devices.

A Solaris domU works like a normal Solaris Operating System. All of the usual tools are available.

Domain States

A domain can be in one of six states. States are shown in virt-manager screens and in xm list displays:

Name                 ID   Mem VCPUs      State   Time(s)
Domain-0              0  2049     2     r-----   4138.5
sxc18                 3   511     1     -b----    765.5

The states are:

r, running
The domain is currently running on a CPU.

b, blocked
The domain is blocked, and not running or able to be run. This can occur because the domain is waiting on I/O (a traditional wait state) or because it has gone to sleep with nothing to run.

p, paused
The domain has been paused, usually through the administrator running xm pause. When in a paused state, the domain will still consume allocated resources like memory, but will not be eligible for scheduling by the hypervisor. Run xm unpause to place the domain in the running state.

c, crashed
The domain has crashed. Usually this state can only occur if the domain has been configured not to restart on crash. See xmdomain.cfg(5) for more information.

s, shutdown
The domain is shut down.

d, dying
The domain is in the process of shutting down or crashing.
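The State column can be filtered mechanically. As a minimal sketch, assuming the State field is the fifth whitespace-separated column, the following finds blocked domains; captured sample output is fed in via printf so the pipeline is self-contained (on a live system, pipe `xm list` in instead):

```shell
# Print the names of domains whose State column shows the 'b'
# (blocked) flag. NR > 1 skips the header line.
printf '%s\n' \
  'Name      ID   Mem VCPUs State  Time(s)' \
  'Domain-0   0  2049     2 r----- 4138.5' \
  'sxc18      3   511     1 -b----  765.5' |
awk 'NR > 1 && $5 ~ /b/ { print $1 }'
# prints: sxc18
```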

SMF Hypervisor Services

In Solaris, all of the properties from xend-config.sxp have been put into the SMF service xctl/xend as config/* properties.



To modify an existing property:



# svccfg -s xctl/xend listprop
# svccfg -s xctl/xend setprop config/dom0-cpus = 1
# svcadm refresh xctl/xend



To create a new property:



# svccfg -s xctl/xend setprop config/vncpasswd = astring: \"password\"
# svcadm refresh xctl/xend
# svcadm restart xend
# svcprop xctl/xend

Verify That the xctl Hypervisor Services Are Started

Become superuser, or assume the Primary Administrator role.

Verify that the xctl services are running.
# svcs -a | grep xctl

If the system displays the following, the services are not running:

disabled       12:29:34 svc:/system/xctl/console:default
disabled       12:29:34 svc:/system/xctl/xend:default
disabled       12:29:34 svc:/system/xctl/store:default



If the services are not running, verify that you booted an i86xpv kernel.
# uname -i
i86xpv

Reboot if necessary.



If the correct kernel is running, enable the services.
# svcadm enable xctl/store
# svcadm enable xctl/xend
# svcadm enable xctl/console

You are now ready to create guest domains (domUs).





How To Manage Guest (DomU) Domains

Example

Create a domU that uses the following .py file.



: p5b-vm[1]#; cat guest.py
name = "solaris"
vcpus = 2
memory = "512"

extra = "-k"

root = "/dev/dsk/c0d0s0"
disk = ['file:/tank/guests/solaris/disk.img,0,w']

vif = ['']

on_xend_start = "start"
on_xend_stop = "shutdown"

on_shutdown = "destroy"
on_reboot = "restart"
on_crash = "destroy"



Notice the on_xend_start and on_xend_stop entries. Either of the two can be defined independently. Both default to "invalid" if not defined.

on_xend_start = "start"

on_xend_stop = "shutdown"






Create the domain, but don't start it.

: p5b-vm[1]#; xm new -f <path to the py file>
: p5b-vm[1]#; xm list
Name                                      ID   Mem VCPUs      State   Time(s)
Domain-0                                   0  2254     2     r-----    113.3
solaris                                        512     1                 0.0
: p5b-vm[1]#;

Now you can start, suspend, and resume the domain. If it is shut down, it will still appear in the list. And it is set to boot automatically when dom0 is powered on.

: p5b-vm[1]#; xm start solaris
: p5b-vm[1]#; xm list
Name                                      ID   Mem VCPUs      State   Time(s)
Domain-0                                   0  2254     2     r-----    116.4
solaris                                    5   512     1     r-----      4.2
: p5b-vm[1]#; xm suspend solaris
: p5b-vm[1]#; xm list
Name                                      ID   Mem VCPUs      State   Time(s)
Domain-0                                   0  2254     2     r-----    129.4
solaris                                          1     1                31.2
: p5b-vm[1]#; xm resume solaris
: p5b-vm[1]#; xm list
Name                                      ID   Mem VCPUs      State   Time(s)
Domain-0                                   0  2254     2     r-----    132.6
solaris                                    6   511     2     -b----      0.1
: p5b-vm[1]#; 
: p5b-vm[1]#; xm shutdown solaris
: p5b-vm[1]#; xm list
Name                                      ID   Mem VCPUs      State   Time(s)
Domain-0                                   0  2254     2     r-----    134.2
solaris                                        511     2                 0.5
: p5b-vm[1]#;



When you suspend a domain, the state is saved in /var/lib/xend/domains. This can fill up / quickly. There will be an SMF property in xend to change the base directory where domains live. You might want to create a link for now. You can also still use save/restore to specify where to save the guest image to and load it from.
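As a sketch of the save/restore alternative, you can point the image at any filesystem with room; the /tank/guests path below is a hypothetical dataset, not one the tools create for you:

```shell
# Save the guest's state to a location of your choosing instead of
# /var/lib/xend/domains (requires a running Xen dom0).
xm save solaris /tank/guests/solaris.save

# Later, bring the guest back from that saved image.
xm restore /tank/guests/solaris.save
```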



If you modify CPUs or memory from xm or virsh, the changes are saved in the configuration file and persist across reboots.
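For example, hedged as a sketch (the domain name matches the earlier example; check xm help on your build for the exact subcommand set):

```shell
# Change the running domain's virtual CPU count and memory target;
# with a managed domain these changes are written back to the
# configuration and survive a domain reboot.
xm vcpu-set solaris 1      # use one virtual CPU
xm mem-set solaris 768     # target 768 MB of memory
```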



If you want to modify other parameters on a domain that is shut down, you can add the domain's uuid to the original .py file and re-run the xm new command.



: p5b-vm[1]#; echo 'uuid = "6dd59cf5-a17c-f7dc-255e-4efddfffb008"' >> <path to py file>
: p5b-vm[1]#; xm new -f <path to the py file>





Enable Live Migration

By default, xend listens only on the loopback address for requests from the localhost. If you want to allow other machines to live migrate to the machine, you must do the following:



Listen on all addresses (or you can specify a particular interface IP):
# svccfg -s xend setprop config/xend-relocation-address = \"\"



Create a list of hosts from which to accept migrations:
# svccfg -s xend setprop config/xend-relocation-hosts-allow = \"^flax$ ^localhost$\"



Update the config:
# svcadm refresh xend && svcadm restart xend
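With relocation enabled on the target (flax in the example above), initiating a live migration from another dom0 looks like this sketch:

```shell
# Push the running guest to the target dom0 without shutting it down.
# The guest keeps running while its memory is copied; only a brief
# pause occurs at the final switchover.
xm migrate --live solaris flax
```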





How to Debug On Xen

Debugging a Hung domU

First, connect to the domU console and verify the domain is not in kernel debugger (kmdb) or a similar state.



If a domU appears hung, always use xm dump-core to take a dump file. Place this in /net/mdb.eng/cores/ and report it when filing a bug. You can look at this file with mdb.



If you can reproduce the hang, make the following changes in /etc/system of the domU:
set cpr_debug=0x3
set xen_suspend_debug=1
set xdf:xdfdebug=0x40

and reproduce the problem. Some debugging output should go to the dom0 console. This is useful for hangs involving save, restore, migrate, shutdown, and reboot operations. It's a good idea to do all testing with these settings in place.



Try sending the domU an interrupt to get it to drop into kmdb. The gentle method is xm sysrq mydomu b. Or, you can use 'q' on the Xen console as described below.



Xen Console

Currently, the Xen console must be directed to a serial port for this to work. Type three consecutive Ctrl-A characters on the Xen console. You should see the following output on the console.



(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to 
dom0).



To exit the Xen console (and get back to the Solaris console), type three more Ctrl-A characters.



The following menu.lst example sets both Xen and dom0's console to serial port ttya.



title Solaris dom0
kernel /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1
module /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -k -B console=ttya
module /platform/i86pc/$ISADIR/boot_archive



The following commands are supported in the Xen console. Commonly used keys are:

C - force a Solaris dom0 crash dump (/var/crash/...)

q - put solaris dom0 and all solaris domUs at the kmdb prompt (assuming you booted with -k)

R - force a reboot of dom0 (for example, when the machine is hung)






(XEN) 'h' pressed -> showing installed handlers
(XEN)  key '%' (ascii '25') => Trap to xendbg
(XEN)  key 'C' (ascii '43') => trigger a crashdump
(XEN)  key 'H' (ascii '48') => dump heap info
(XEN)  key 'N' (ascii '4e') => NMI statistics
(XEN)  key 'R' (ascii '52') => reboot machine
(XEN)  key 'a' (ascii '61') => dump timer queues
(XEN)  key 'd' (ascii '64') => dump registers
(XEN)  key 'h' (ascii '68') => show this message
(XEN)  key 'i' (ascii '69') => dump interrupt bindings
(XEN)  key 'm' (ascii '6d') => memory info
(XEN)  key 'n' (ascii '6e') => trigger an NMI
(XEN)  key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN)  key 'r' (ascii '72') => dump run queues
(XEN)  key 't' (ascii '74') => display multi-cpu clock info
(XEN)  key 'u' (ascii '75') => dump numa info
(XEN)  key 'v' (ascii '76') => dump Intel's VMCS
(XEN)  key 'z' (ascii '7a') => print ioapic info





Event Channels

To dump out information on the event channels:

> ::evtchns
Type          Evtchn IRQ IPL CPU Masked Pending ISR(s) 
ipi           1      256 15  0   0      0       xc_serv
ipi           2      257 13  0   0      0       xc_serv
ipi           3      258 11  0   0      0       poke_cpu
virq:debug    4      259 15  0   0      0       xen_debug_handler
pirq          5      9   9   0   0      0       acpi_wrapper_isr
virq:timer    6      260 14  0   0      0       cbe_fire
ipi           7      261 14  0   0      0       cbe_fire
pirq          8      19  5   0   0      0       ata_intr
pirq          9      16  9   0   0      0       pepb_intx_intr
virq:console  10     262 9   0   0      0       xenconsintr_priv
pirq          11     18  1   0   0      0       uhci_intr
pirq          12     23  1   0   0      0       uhci_intr
pirq          13     17  6   0   0      0       rge_intr
ipi           14     258 11  1   0      0       poke_cpu
ipi           15     257 13  1   0      0       xc_serv
ipi           16     261 14  1   0      0       cbe_fire
ipi           17     256 15  1   0      0       xc_serv
virq:timer    18     260 14  1   0      0       cbe_fire
device        19     263 1   0   0      0       evtchn_device_upcall
evtchn        20     264 1   0   0      0       xenbus_intr
device        21     263 1   0   0      0       evtchn_device_upcall
device        22     263 1   0   0      0       evtchn_device_upcall
pirq          23     22  9   1   0      0       audiohd_intr
device        24     263 1   0   0      0       evtchn_device_upcall
evtchn        25     265 6   0   0      0       intr
evtchn        26     266 5   1   0      0       xdb_intr
evtchn        27     267 5   0   0      0       xdb_intr
>


To get more information for Type=device, pass in the event channel number as the array index. For this example, I'm looking at the following entry:

Type          Evtchn IRQ IPL CPU Masked Pending ISR(s) 
device        19     263 1   0   0      0       evtchn_device_upcall



Using event channel 19 (0t19), dump the evtsoftdata structure:

> *(port_user+(0x8*(0t19)))::print struct evtsoftdata
{
    dip = 0xfffffffec08afd68
    ring = 0xfffffffec5a10000
    ring_cons = 0x185
    ring_prod = 0x185
    ring_overflow = 0
    evtchn_wait = {
        _opaque = 0
    }
    evtchn_lock = {
        _opaque = [ 0 ]
    }
    evtchn_pollhead = {
        bsys_version = 0xc757f840
        boot_mem = 0
        bsys_alloc = 0
        bsys_free = 0x1ec
        bsys_getproplen = 0xfffffffec757f608
        bsys_getprop = 0
        bsys_nextprop = 0xfffffffec08afd68
        bsys_printf = 0
        bsys_doint = 0xfffffffec73b4dc8
        bsys_ealloc = 0xde00000000
    }
    pid = 0x1ec
}
>
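The expression *(port_user+(0x8*(0t19))) indexes an array of 8-byte pointers, so the entry for event channel N sits at port_user + 8*N. The offset arithmetic can be checked with ordinary shell arithmetic:

```shell
# For event channel 19, the byte offset into the port_user array of
# 64-bit (8-byte) pointers is 8 * 19 = 152 = 0x98.
chan=19
printf '0x%x\n' $(( 8 * chan ))
# prints: 0x98
```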



You can also determine which user process is using this event channel:

> *(port_user+(0x8*(0t19)))::print struct evtsoftdata pid | ::pid2proc | ::print proc_t p_user.u_psargs
p_user.u_psargs = [ "/usr/lib/xenstored --pid-file=/var/run/xenstore.pid" ]
>



Current Issues and Potential Solutions

xend fails to start:



[2007-05-04 14:46:08 100668] ERROR (SrvDaemon:353) Exception starting xend (not w
ell-formed (invalid token): line 19, column 0)
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 345,
 in run
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvServer.py", line 254,
 in create
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvRoot.py", line 40, in
 __init__
  File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 82, in get
  File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 52, in getobj
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvNode.py", line 30, in
 __init__
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 658, in inst
ance
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 87, in __ini
t__
  File "/usr/lib/python2.4/site-packages/xen/xend/XendStateStore.py", line 104, i
n load_state
  File "/var/tmp/pkgbuild-gbuild/SUNWPython-extra-2.4.2-build/usr/lib/python2.4/s
ite-packages/_xmlplus/dom/minidom.py", line 1915, in parse
  File "/var/tmp/pkgbuild-gbuild/SUNWPython-extra-2.4.2-build/usr/lib/python2.4/s
ite-packages/_xmlplus/dom/expatbuilder.py", line 926, in parse
  File "/var/tmp/pkgbuild-gbuild/SUNWPython-extra-2.4.2-build/usr/lib/python2.4/s
ite-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
ExpatError: not well-formed (invalid token): line 19, column 0
[2007-05-04 14:46:09 100676] INFO (SrvDaemon:331) Xend Daemon started
[2007-05-04 14:46:09 100676] INFO (SrvDaemon:335) Xend changeset: Tue May 01 17:1
2:19 2007 -0700 15014:66538ef9ecc5.
[2007-05-04 14:46:09 100676] INFO (SrvDaemon:342) Xend version: Unknown.

The failure to start is due to xend's state becoming corrupted. The solution is to do the following:



% rm -rf /var/lib/xend/state
% svcadm clear xend





Debugging a Lost Disk Interrupt

In this case, a Linux guest is running. The guest hangs trying to read or write the disk. Nothing looks wrong in ::evtchns, so look at the disk backend driver. You can see below that the frontend's (xdf) producer index is req_prod = 0xb083, and that the backend's (xdb) consumer index is xr_sring.br.req_cons = 0xb063. So there is work to do, but the backend driver doesn't know about it. Dropping down to kmdb and forcing the backend's interrupt routine to run gets the domU going again.

[0]> xdb_intr::call 0xfffffffed1e52000



# mdb -k
Loading modules: [ unix genunix specfs dtrace xpv_psm scsi_vhci ufs ip hook neti sctp arp usba fctl nca lofs zfs random emlxs md crypto fcp ptm sppp ipc ]
> ::evtchns
Type          Evtchn IRQ IPL CPU Masked Pending ISR(s) 
ipi           1      256 15  0   0      0       xc_serv
ipi           2      257 13  0   0      0       xc_serv
ipi           3      258 11  0   0      0       poke_cpu
virq:debug    4      259 15  0   0      0       xen_debug_handler
pirq          5      9   9   0   0      0       acpi_wrapper_isr
virq:timer    6      260 14  0   0      0       cbe_fire
ipi           7      261 14  0   0      0       cbe_fire
pirq          8      16  5   0   0      0       mpt_intr
virq:console  9      262 9   0   0      0       xenconsintr_priv
pirq          10     20  1   0   0      0       ehci_intr
pirq          11     21  1   0   0      0       ohci_intr
ipi           12     258 11  1   0      0       poke_cpu
ipi           13     257 13  1   0      0       xc_serv
ipi           14     261 14  1   0      0       cbe_fire
ipi           15     256 15  1   0      0       xc_serv
virq:timer    16     260 14  1   0      0       cbe_fire
ipi           17     258 11  2   0      0       poke_cpu
ipi           18     257 13  2   0      0       xc_serv
ipi           19     261 14  2   0      0       cbe_fire
ipi           20     256 15  2   0      0       xc_serv
virq:timer    21     260 14  2   0      0       cbe_fire
ipi           22     258 11  3   0      0       poke_cpu
ipi           23     257 13  3   0      0       xc_serv
ipi           24     261 14  3   0      0       cbe_fire
ipi           25     256 15  3   0      0       xc_serv
virq:timer    26     260 14  3   0      0       cbe_fire
ipi           27     258 11  4   0      0       poke_cpu
ipi           28     257 13  4   0      0       xc_serv
ipi           29     261 14  4   0      0       cbe_fire
ipi           30     256 15  4   0      0       xc_serv
virq:timer    31     260 14  4   0      0       cbe_fire
ipi           32     258 11  5   0      0       poke_cpu
ipi           33     257 13  5   0      0       xc_serv
ipi           34     261 14  5   0      0       cbe_fire
ipi           35     256 15  5   0      0       xc_serv
virq:timer    36     260 14  5   0      0       cbe_fire
ipi           37     258 11  6   0      0       poke_cpu
ipi           38     257 13  6   0      0       xc_serv
ipi           39     261 14  6   0      0       cbe_fire
ipi           40     256 15  6   0      0       xc_serv
virq:timer    41     260 14  6   0      0       cbe_fire
ipi           42     258 11  7   0      0       poke_cpu
ipi           43     257 13  7   0      0       xc_serv
ipi           44     261 14  7   0      0       cbe_fire
ipi           45     256 15  7   0      0       xc_serv
virq:timer    46     260 14  7   0      0       cbe_fire
pirq          47     17  6   0   0      0       e1000g_intr_pciexpress
pirq          48     18  6   1   0      0       e1000g_intr_pciexpress
evtchn        49     264 1   3   0      0       xenbus_intr
device        50     263 1   0   0      0       evtchn_device_upcall
device        51     263 1   0   0      0       evtchn_device_upcall
pirq          52     40  1   4   0      0       emlxs_msi_intr
pirq          53     41  1   5   0      0       emlxs_msi_intr
device        54     263 1   0   0      0       evtchn_device_upcall
device        55     263 1   0   0      0       evtchn_device_upcall
evtchn        56     265 5   7   0      0       xdb_intr
evtchn        57     266 6   0   0      0       xnb_intr
> ::prtconf ! grep xdb
        fffffffec3955008 xdb, instance #0 (driver name: xdb)
> fffffffec3955008::print struct dev_info devi_driver_data
devi_driver_data = 0xfffffffed1e52000
> 0xfffffffed1e52000::print xdb_t xs_ring | ::print xendev_ring_t xr_sring.br
{
    xr_sring.br.rsp_prod_pvt = 0xb063
    xr_sring.br.req_cons = 0xb063
    xr_sring.br.nr_ents = 0x20
    xr_sring.br.sring = 0xfffffffed0802000
}
> 0xfffffffed1e52000::print xdb_t xs_ring | ::print xendev_ring_t xr_sring.br.sring | ::print comif_sring_t
{
    req_prod = 0xb083
    req_event = 0xb064
    rsp_prod = 0xb063
    rsp_event = 0xb064
    pad = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
    ring = [ '\001' ]
}
>
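The diagnosis above rests on simple ring-index arithmetic: the producer and consumer indices are free-running counters, so the difference between the frontend's req_prod and the backend's req_cons is the number of queued-but-unserviced requests. As a sketch, using the values from the mdb output:

```shell
# Frontend producer vs. backend consumer from the dump above. A
# nonzero difference with no pending interrupt means the backend has
# work it doesn't know about.
req_prod=0xb083
req_cons=0xb063
echo $(( req_prod - req_cons ))
# prints: 32 (0x20 requests pending)
```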