Architecture of the Intel IOMMU Implementation on Linux
2007-08-24 16:58
1. Detecting whether the platform supports DMAR devices
./drivers/pci/dmar.c->int __init early_dmar_detect(void){
acpi_status status = AE_OK;
/* if we could find DMAR table, then there are DMAR devices */
status = acpi_get_table(ACPI_SIG_DMAR, 0,
(struct acpi_table_header **)&dmar_tbl);
if (ACPI_SUCCESS(status) && !dmar_tbl) {
printk (KERN_WARNING PREFIX "Unable to map DMAR\n");
status = AE_NOT_FOUND;
}
return (ACPI_SUCCESS(status) ? 1 : 0);
}
This function is called during memory initialization:
./arch/x86_64/mm/init.c:528: pci_iommu_alloc();
It determines whether DMAR devices are supported by looking up the DMA Remapping table.
./include/acpi/actbl1.h:64:#define ACPI_SIG_DMAR "DMAR" /* DMA Remapping table */
/*******************************************************************************
*
* FUNCTION: acpi_get_table
*
* PARAMETERS: table_type - one of the defined table types
* Instance - the non zero instance of the table, allows
* support for multiple tables of the same type
* see acpi_gbl_acpi_table_flag
* ret_buffer - pointer to a structure containing a buffer to
* receive the table
*
* RETURN: Status
*
* DESCRIPTION: This function is called to get an ACPI table. The caller
* supplies an out_buffer large enough to contain the entire ACPI
* table. The caller should call the acpi_get_table_header function
* first to determine the buffer size needed. Upon completion
* the out_buffer->Length field will indicate the number of bytes
* copied into the out_buffer->buf_ptr buffer. This table will be
* a complete table including the header.
*
********************************************************************************/
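The detection step above amounts to asking ACPI for a table whose signature is "DMAR". As a rough user-space illustration of what "a valid DMAR table" means at the byte level, the sketch below checks the signature, the Length field, and the checksum of a raw buffer. The function name looks_like_dmar is hypothetical and is not part of the kernel or ACPICA; the kernel relies on acpi_get_table() instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the common ACPI table header that precedes the DMAR-specific fields. */
#define ACPI_HEADER_LEN 36

/*
 * Hypothetical helper: returns 1 if buf holds a table whose signature is
 * "DMAR", whose Length field fits inside buf, and whose bytes sum to zero
 * modulo 256; returns 0 otherwise.
 */
static int looks_like_dmar(const uint8_t *buf, size_t buf_len)
{
    uint32_t len;
    uint8_t sum = 0;
    size_t i;

    if (buf_len < ACPI_HEADER_LEN)
        return 0;
    if (memcmp(buf, "DMAR", 4) != 0)
        return 0;

    /* Length is a little-endian 32-bit value at byte offset 4. */
    len = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
          ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    if (len < ACPI_HEADER_LEN || len > buf_len)
        return 0;

    /* The checksum byte at offset 9 makes the whole table sum to zero. */
    for (i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum == 0;
}
```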
2. Initializing the Intel IOMMU device
./drivers/pci/intel-iommu.c:int __init intel_iommu_init(void)
{
int ret = 0;
if (no_iommu || swiotlb || dmar_disabled)
return -ENODEV;
if (dmar_table_init())
return -ENODEV;
iommu_init_mempool();
dmar_init_reserved_ranges();
init_no_remapping_devices();
ret = init_dmars();
if (ret) {
printk(KERN_ERR "IOMMU: dmar init failed\n");
put_iova_domain(&reserved_iova_list);
iommu_exit_mempool();
return ret;
}
printk(KERN_INFO
"PCI-DMA: Intel(R) Virtualization Technology for Directed I/O\n");
force_iommu = 1;
dma_ops = &intel_dma_ops;
return 0;
}
This function is called from pci_iommu_init() in arch/x86_64/kernel/pci-dma.c:
static int __init pci_iommu_init(void)
{
#ifdef CONFIG_CALGARY_IOMMU
calgary_iommu_init();
#endif
intel_iommu_init();
#ifdef CONFIG_IOMMU
gart_iommu_init();
#endif
no_iommu_init();
return 0;
}
The same file registers pci_iommu_init() as an initialization function:
/* Must execute after PCI subsystem */
fs_initcall(pci_iommu_init);
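The pattern in pci_iommu_init() is a chain of backends tried in a fixed order, with a no-IOMMU fallback at the end. The following is a minimal user-space sketch of that pattern, assuming (this is not shown in the excerpt above) that the fallback only installs its operations when no earlier backend has done so. All names here are hypothetical.

```c
#include <stddef.h>

/* Stand-in for a DMA operations table; only the name matters in this sketch. */
struct dma_ops_sketch { const char *name; };

static const struct dma_ops_sketch hw_ops = { "hw-iommu" };
static const struct dma_ops_sketch nommu_ops = { "nommu" };

/* NULL until some backend claims DMA handling. */
static const struct dma_ops_sketch *active_ops;

/* Hypothetical hardware backend: installs its ops only if hardware exists. */
static int hw_iommu_init(int hw_present)
{
    if (!hw_present)
        return -1;
    active_ops = &hw_ops;
    return 0;
}

/* Fallback: runs last and fills in ops only if nobody else did (assumption). */
static void fallback_init(void)
{
    if (!active_ops)
        active_ops = &nommu_ops;
}

/* Runs the chain in order and reports which backend ended up active. */
static const char *run_init_chain(int hw_present)
{
    active_ops = NULL;
    (void)hw_iommu_init(hw_present);
    fallback_init();
    return active_ops->name;
}
```

With this ordering, run_init_chain(1) selects the hardware backend and run_init_chain(0) falls through to the no-IOMMU path.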
2.1 dmar_table_init
Parses the DMAR table and prints each DMAR entry in turn via dmar_table_print_dmar_entry(entry_header);
Messages like the following appear in dmesg:
ACPI DMAR:Host address width 36
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
switch (entry_header->type) {
case ACPI_DMAR_TYPE_HARDWARE_UNIT:
ret = dmar_parse_one_drhd(entry_header);
break;
case ACPI_DMAR_TYPE_RESERVED_MEMORY:
ret = dmar_parse_one_rmrr(entry_header);
break;
default:
printk(KERN_WARNING PREFIX
"Unknown DMAR structure type\n");
ret = 0; /* for forward compatibility */
break;
}
The following two entry types are parsed:
DRHD - DMA Engine Reporting Structure
RMRR - Reserved memory Region Reporting Structure
For each DRHD entry, a register function adds the corresponding DMA remapping hardware unit to a list. Each RMRR is likewise added to a global list.
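A minimal sketch of such a registration list, in user-space C. It assumes one plausible ordering policy: units whose flags have bit 0 set (the catch-all units) go to the tail so that more specific units are found first when the list is scanned. The names drhd_unit_sketch and register_drhd are hypothetical, and the real kernel uses its own list primitives.

```c
#include <stdint.h>
#include <stdlib.h>

/* One parsed DRHD entry, reduced to the fields printed in dmesg. */
struct drhd_unit_sketch {
    uint64_t reg_base;
    int include_all;                 /* bit 0 of the DRHD flags */
    struct drhd_unit_sketch *next;
};

/* Global list of registered units. */
static struct drhd_unit_sketch *drhd_list;

/* Adds a unit: catch-all units at the tail, all others at the head. */
static int register_drhd(uint64_t reg_base, uint8_t flags)
{
    struct drhd_unit_sketch *u = malloc(sizeof(*u));

    if (!u)
        return -1;
    u->reg_base = reg_base;
    u->include_all = flags & 1;

    if (u->include_all) {
        struct drhd_unit_sketch **pp = &drhd_list;

        while (*pp)
            pp = &(*pp)->next;
        u->next = NULL;
        *pp = u;
    } else {
        u->next = drhd_list;
        drhd_list = u;
    }
    return 0;
}
```

Registering the three DRHD entries from the dmesg sample above in order leaves the unit at 0xfed93000 (flags 0x1) at the end of the list.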
2.2 iommu_init_mempool
Creates slab caches for several frequently used structures:
struct iova
struct iommu_domain
struct device_domain_info
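The idea behind a per-structure cache is that freed objects of one fixed size are kept for quick reuse instead of going back to the general allocator. The sketch below is a simplified user-space analogue built on malloc, not the kernel slab API; obj_cache and its functions are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* A cache of same-sized objects; free objects are chained through their first bytes. */
struct obj_cache {
    size_t obj_size;
    void *free_list;
};

/* Prepares a cache for objects of obj_size bytes (at least pointer-sized). */
static void cache_init(struct obj_cache *c, size_t obj_size)
{
    c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
    c->free_list = NULL;
}

/* Reuses a previously freed object if one exists, else allocates a new one. */
static void *cache_alloc(struct obj_cache *c)
{
    void *p = c->free_list;

    if (p) {
        c->free_list = *(void **)p;
        return p;
    }
    return malloc(c->obj_size);
}

/* Returns an object to the cache rather than to the system allocator. */
static void cache_free(struct obj_cache *c, void *p)
{
    *(void **)p = c->free_list;
    c->free_list = p;
}
```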
2.3 dmar_init_reserved_ranges
Initializes the reserved ranges. Two kinds of ranges must be reserved:
1. IOAPIC ranges shouldn't be accessed by DMA
2. Reserve all PCI MMIO to avoid peer-to-peer access
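Once such ranges are recorded, an allocator for DMA virtual addresses has to refuse any candidate range that overlaps one of them. A minimal sketch of that overlap test follows. The two ranges in demo_reserved are illustrative values chosen for this example, not values taken from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* An address range with inclusive bounds. */
struct addr_range { uint64_t start, end; };

/* Illustrative reserved set: an interrupt-controller window and one MMIO window. */
static const struct addr_range demo_reserved[] = {
    { 0xfee00000ULL, 0xfeefffffULL },   /* assumed APIC window */
    { 0xe0000000ULL, 0xefffffffULL },   /* hypothetical PCI MMIO window */
};

/* Returns 1 if [start, end] overlaps any of the n reserved ranges. */
static int range_is_reserved(const struct addr_range *res, size_t n,
                             uint64_t start, uint64_t end)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (start <= res[i].end && res[i].start <= end)
            return 1;
    }
    return 0;
}
```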
2.4 init_no_remapping_devices
Graphics driver workarounds to provide unity map
Most GFX drivers don't call the standard PCI DMA APIs to allocate DMA buffers.
Such drivers will be broken with the IOMMU enabled. To work around this issue,
we added two options.
Once graphics devices are converted over to use the DMA APIs, this entire
patch can be removed.
a. intel_iommu=igfx_off. With this option, a DMAR unit that has only gfx devices
under it will be ignored. This mostly affects integrated gfx devices.
If the DMAR unit is ignored, the gfx devices under it will get physical addresses
for DMA.
b. intel_iommu=gfx_workaround. With this option, we set up a 1:1 mapping
of the whole of memory for gfx devices, that is, the physical address equals the
virtual address. In this way, gfx devices will use physical addresses for DMA. This
is primarily for add-in card GFX devices.
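The two options above can be read as a small decision table. The sketch below parses a comma-separated option string using the two option names from the text and then picks a DMA mode for a device. It is a user-space illustration only: the types, function names, and the comma-separated syntax are assumptions, and the kernel's own option handling may differ.

```c
#include <string.h>

/* Which of the two workarounds were requested. */
struct gfx_opts { int igfx_off; int gfx_workaround; };

/* How a device's DMA addresses are treated in this sketch. */
enum dma_mode { DMA_TRANSLATED, DMA_BYPASS, DMA_IDENTITY };

/* Parses e.g. "igfx_off" or "igfx_off,gfx_workaround"; unknown words are ignored. */
static struct gfx_opts parse_gfx_opts(const char *s)
{
    struct gfx_opts o = { 0, 0 };

    while (s && *s) {
        size_t n = strcspn(s, ",");

        if (n == 8 && strncmp(s, "igfx_off", 8) == 0)
            o.igfx_off = 1;
        else if (n == 14 && strncmp(s, "gfx_workaround", 14) == 0)
            o.gfx_workaround = 1;
        s += n;
        if (*s == ',')
            s++;
    }
    return o;
}

/* Option a: ignore units that cover only gfx devices. Option b: 1:1 map gfx devices. */
static enum dma_mode mode_for_device(struct gfx_opts o, int is_gfx,
                                     int unit_has_only_gfx)
{
    if (is_gfx && o.igfx_off && unit_has_only_gfx)
        return DMA_BYPASS;
    if (is_gfx && o.gfx_workaround)
        return DMA_IDENTITY;
    return DMA_TRANSLATED;
}
```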
2.5 init_dmars
Initializes the DMAR data structures.
TBD: diagram of the relationships between the data structures
dma_ops = &intel_dma_ops;
static struct dma_mapping_ops intel_dma_ops = {
.alloc_coherent = intel_alloc_coherent,
.free_coherent = intel_free_coherent,
.map_single = intel_map_single,
.unmap_single = intel_unmap_single,
.map_sg = intel_map_sg,
.unmap_sg = intel_unmap_sg,
};
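Assigning dma_ops swaps the whole DMA mapping backend by changing one pointer to a table of function pointers. The sketch below shows the same dispatch pattern in user-space C with two made-up backends. Everything in it (mapping_ops_sketch, REMAP_BASE, the mapping rules) is hypothetical and only illustrates the indirection, not what intel_map_single actually computes.

```c
#include <stddef.h>
#include <stdint.h>

/* A reduced operations table with a single entry point. */
struct mapping_ops_sketch {
    uint64_t (*map_single)(uintptr_t cpu_addr, size_t size);
};

/* Backend 1: the device sees the same address as the CPU. */
static uint64_t direct_map_single(uintptr_t cpu_addr, size_t size)
{
    (void)size;
    return (uint64_t)cpu_addr;
}

/* Backend 2: pretend every buffer is remapped into one fixed 4KB I/O page. */
#define REMAP_BASE 0x100000000ULL
static uint64_t remap_map_single(uintptr_t cpu_addr, size_t size)
{
    (void)size;
    return REMAP_BASE + (cpu_addr & 0xfff);
}

static const struct mapping_ops_sketch direct_ops = { direct_map_single };
static const struct mapping_ops_sketch remap_ops = { remap_map_single };

/* Callers go through this pointer, so switching backends is one assignment. */
static const struct mapping_ops_sketch *active_map_ops = &direct_ops;

static uint64_t dma_map_single_sketch(uintptr_t cpu_addr, size_t size)
{
    return active_map_ops->map_single(cpu_addr, size);
}
```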
3. DMAR ACPI Table Structure
The system BIOS is responsible for detecting the remapping hardware functions in the platform and for locating the memory-mapped remapping hardware registers in the host system address space. The BIOS reports the remapping hardware units in a platform to system software through the DMA Remapping Reporting (DMAR) ACPI table described below.
3.1 DMA Remapping Reporting Structure
Field | Byte Length | Byte Offset | Description |
Signature | 4 | 0 | “DMAR”. Signature for the DMA Remapping Description table. |
Length | 4 | 4 | Length, in bytes, of the description table including the length of the associated DMA remapping structures. |
Revision | 1 | 8 | 1 |
Checksum | 1 | 9 | Entire table must sum to zero. |
OEMID | 6 | 10 | OEM ID |
OEM Table ID | 8 | 16 | For DMAR description table, the Table ID is the manufacturer model ID. |
OEM Revision | 4 | 24 | OEM Revision of DMAR Table for OEM Table ID. |
Creator ID | 4 | 28 | Vendor ID of utility that created the table. |
Creator Revision | 4 | 32 | Revision of utility that created the table. |
Host Address Width | 1 | 36 | This field indicates the maximum DMA physical addressability supported by this platform. The system address map reported by the BIOS indicates which portions of this address space are populated. The Host Address Width (HAW) of the platform is computed as (N+1), where N is the value reported in this field. For example, for a platform supporting 40 bits of physical addressability, the value of 100111b is reported in this field. |
Flags | 1 | 37 | • Bit 0: INTR_REMAP - If Clear, the platform does not support interrupt remapping. If Set, the platform supports interrupt remapping. • Bits 1-7: Reserved. |
Reserved | 10 | 38 | Reserved (0). |
Remapping Structures[] | - | 48 | A list of structures. The list will contain one or more DMA Remapping Hardware Unit Definition (DRHD) structures, and zero or more Reserved Memory Region Reporting (RMRR) and Root Port ATS Capability Reporting (ATSR) structures. These structures are described below. |
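The table above maps directly onto a packed C structure. The following sketch mirrors the listed byte offsets and adds two small accessors for the Host Address Width and the INTR_REMAP flag. The structure and function names are chosen for this example and are not the kernel's definitions. The compile-time checks confirm that the layout matches the offsets in the table.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct dmar_header_sketch {
    char     signature[4];        /* offset 0: "DMAR" */
    uint32_t length;              /* offset 4 */
    uint8_t  revision;            /* offset 8 */
    uint8_t  checksum;            /* offset 9 */
    char     oem_id[6];           /* offset 10 */
    char     oem_table_id[8];     /* offset 16 */
    uint32_t oem_revision;        /* offset 24 */
    uint32_t creator_id;          /* offset 28 */
    uint32_t creator_revision;    /* offset 32 */
    uint8_t  host_address_width;  /* offset 36: reported value N */
    uint8_t  flags;               /* offset 37: bit 0 is INTR_REMAP */
    uint8_t  reserved[10];        /* offset 38 */
};
#pragma pack(pop)

_Static_assert(offsetof(struct dmar_header_sketch, host_address_width) == 36,
               "Host Address Width must sit at byte offset 36");
_Static_assert(sizeof(struct dmar_header_sketch) == 48,
               "remapping structures must start at byte offset 48");

/* HAW in bits is the reported value plus one. */
static unsigned dmar_haw_bits(const struct dmar_header_sketch *h)
{
    return (unsigned)h->host_address_width + 1u;
}

/* Returns 1 if the platform reports interrupt remapping support. */
static int dmar_has_intr_remap(const struct dmar_header_sketch *h)
{
    return h->flags & 0x01;
}
```

For the example in the table, a reported value of 100111b (39) gives a HAW of 40 bits.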
3.2 Remapping Structure Types
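Each Remapping Structure begins with two fields, type and length. The type field gives the kind of DMA-remapping structure, and the length field gives the size of the structure. The following table defines the possible values of type:
Value | Description |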
0 | DMA Remapping Hardware Unit Definition (DRHD) Structure |
1 | Reserved Memory Region Reporting (RMRR) Structure |
2 | Root Port ATS Capability Reporting (ATSR) Structure |
>2 | Reserved for future use. For forward compatibility, software skips structures it does not comprehend by skipping the appropriate number of bytes indicated by the Length field. |
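The forward-compatibility rule in the last row can be sketched as a walk over the structure list that advances by each Length field and simply counts the types it does not know. The code below is a user-space illustration; it assumes little-endian 2-byte Type and Length fields at offsets 0 and 2, as in the structure tables in this section, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* How many structures of each kind were seen. */
struct remap_counts { unsigned drhd, rmrr, atsr, skipped; };

/* Reads a little-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walks len bytes of remapping structures. Returns 0 on success, or -1 if a
 * Length field is too small to make progress or runs past the buffer.
 */
static int walk_remapping_structs(const uint8_t *p, size_t len,
                                  struct remap_counts *c)
{
    size_t off = 0;

    c->drhd = c->rmrr = c->atsr = c->skipped = 0;
    while (off < len) {
        uint16_t type, slen;

        if (len - off < 4)
            return -1;
        type = rd16(p + off);
        slen = rd16(p + off + 2);
        if (slen < 4 || slen > len - off)
            return -1;

        switch (type) {
        case 0: c->drhd++; break;
        case 1: c->rmrr++; break;
        case 2: c->atsr++; break;
        default: c->skipped++; break;   /* unknown type: skip by Length */
        }
        off += slen;
    }
    return 0;
}
```

Rejecting a Length smaller than the 4-byte type/length prefix guards against looping forever on a malformed table.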
3.3 DMA Remapping Hardware Unit Definition Structure
A DMA-remapping hardware unit definition (DRHD) structure uniquely represents a remapping hardware unit present in the platform. There must be at least one instance of this structure for each PCI segment in the platform.
Field | Byte Length | Byte Offset | Description |
Type | 2 | 0 | 0 - DMA Remapping Hardware Unit Definition (DRHD) structure |
Length | 2 | 2 | Varies (16 + size of Device Scope Structure) |
Flags | 1 | 4 | Bit 0: INCLUDE_PCI_ALL • If Set, this remapping hardware unit has under its scope all PCI compatible devices in the specified Segment, except devices reported under the scope of other remapping hardware units for the same Segment. If a DRHD structure with INCLUDE_PCI_ALL flag Set is reported for a Segment, it must be enumerated by BIOS after all other DRHD structures for the same Segment. A DRHD structure with INCLUDE_PCI_ALL flag Set may use the ‘Device Scope’ field to enumerate I/OxAPIC and HPET devices under its scope. • If Clear, this remapping hardware unit has under its scope only devices in the specified Segment that are explicitly identified through the ‘Device Scope’ field. Bits 1-7: Reserved. |
Reserved | 1 | 5 | Reserved (0). |
Segment Number | 2 | 6 | The PCI Segment associated with this unit. |
Register Base Address | 8 | 8 | Base address of remapping hardware register-set for this unit. |
Device Scope [] | - | 16 | The Device Scope structure contains one or more Device Scope Entries that identify devices in the specified segment and under the scope of this remapping hardware unit. |
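Decoding one DRHD structure from raw bytes follows directly from the offsets in the table. The sketch below is a user-space illustration with hypothetical names; it assumes little-endian fields and reports how many bytes of Device Scope follow the 16-byte fixed part.

```c
#include <stddef.h>
#include <stdint.h>

/* The fixed DRHD fields plus the size of the trailing Device Scope area. */
struct drhd_info {
    int      include_pci_all;   /* Flags bit 0 */
    uint16_t segment;
    uint64_t reg_base;
    uint16_t scope_len;         /* Length minus the 16 fixed bytes */
};

/* Parses a DRHD structure at p; returns 0 on success, -1 on a malformed entry. */
static int parse_drhd(const uint8_t *p, size_t avail, struct drhd_info *out)
{
    uint16_t type, length;
    int i;

    if (avail < 16)
        return -1;
    type = (uint16_t)(p[0] | (p[1] << 8));
    length = (uint16_t)(p[2] | (p[3] << 8));
    if (type != 0 || length < 16 || length > avail)
        return -1;

    out->include_pci_all = p[4] & 0x01;
    out->segment = (uint16_t)(p[6] | (p[7] << 8));
    out->reg_base = 0;
    for (i = 0; i < 8; i++)
        out->reg_base |= (uint64_t)p[8 + i] << (8 * i);
    out->scope_len = (uint16_t)(length - 16);
    return 0;
}
```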
3.3.1 Device Scope Structure
The Device Scope Structure is made up of one or more Device Scope Entries. Each Device Scope Entry may be used to indicate a PCI endpoint device, a PCI sub-hierarchy, or devices such as I/OxAPICs or HPET (High Precision Event Timer). In this section, the generic term ‘PCI’ is used to describe conventional PCI, PCI-X, and PCI-Express devices. Similarly, the term ‘PCI-PCI bridge’ is used to refer to conventional PCI bridges, PCI-X bridges, PCI Express root ports, or downstream ports of a PCI Express switch. A PCI sub-hierarchy is defined as the collection of PCI controllers that are downstream to a specific PCI-PCI bridge. To identify a PCI sub-hierarchy, the Device Scope Entry needs to identify only the parent PCI-PCI bridge of the sub-hierarchy.
Field | Byte Length | Byte Offset | Description |
Type | 1 | 0 | The following values are defined for this field. • 0x01: PCI Endpoint Device - The device identified by the ‘Path’ field is a PCI endpoint device. This type must not be used in Device Scope of DRHD structures with INCLUDE_PCI_ALL flag Set. • 0x02: PCI Sub-hierarchy - The device identified by the ‘Path’ field is a PCI-PCI bridge. In this case, the specified bridge device and all its downstream devices are included in the scope. This type must not be used in Device Scope of DRHD structures with INCLUDE_PCI_ALL flag Set. • 0x03: IOAPIC - The device identified by the ‘Path’ field is an I/O APIC (or I/O SAPIC) device, enumerated through the ACPI MADT I/O APIC (or I/O SAPIC) structure. • 0x04: MSI_CAPABLE_HPET - The device identified by the ‘Path’ field is an HPET device capable of generating MSI (Message Signaled interrupts). HPET hardware is reported through ACPI HPET structure. Other values for this field are reserved for future use. |
Length | 1 | 1 | Length of this Entry in Bytes. (6 + X), where X is the size in bytes of the “Path” field. |
Reserved | 2 | 2 | Reserved (0). |
Enumeration ID | 1 | 4 | When the ‘Type’ field indicates ‘IOAPIC’, this field provides the I/O APICID as provided in the I/O APIC (or I/O SAPIC) structure in the ACPI MADT (Multiple APIC Descriptor Table). This field is treated reserved (0) for all other ‘Type’ fields. |
Start Bus Number | 1 | 5 | This field describes the bus number (bus number of the first PCI Bus produced by the PCI Host Bridge) under which the device identified by this Device Scope resides. |
Path | 2 * N | 6 | Describes the hierarchical path from the Host Bridge to the device specified by the Device Scope Entry. For example, a device in a N-deep hierarchy is identified by N {PCI Device Number, PCI Function Number} pairs, where N is a positive integer. Even offsets contain the Device numbers, and odd offsets contain the Function numbers. The first {Device, Function} pair resides on the bus identified by the ‘Start Bus Number’ field. Each subsequent pair resides on the bus directly behind the bus of the device identified by the previous pair. The identity (Bus, Device, Function) of the target device is obtained by recursively walking down these N {Device, Function} pairs. If the ‘Path’ field length is 2 bytes (N=1), the Device Scope Entry identifies a ‘Root-Complex Integrated Device’. The requester-id of ‘Root-Complex Integrated Devices’ are static and not impacted by system software bus rebalancing actions. If the ‘Path’ field length is more than 2 bytes (N > 1), the Device Scope Entry identifies a device behind one or more system software visible PCI-PCI bridges. Bus rebalancing actions by system software modifying bus assignments of the device’s parent bridge impacts the bus number portion of device’s requester-id. |
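The Path walk described in the last row can be sketched as a loop over the {Device, Function} pairs. Every pair except the last names a bridge whose secondary bus becomes the bus for the next pair. In the code below, the callback type and the demo topology (a single bridge at 00:1c.0 leading to bus 2) are made up for illustration; a real implementation would read the bridge's secondary bus number from PCI configuration space.

```c
#include <stddef.h>
#include <stdint.h>

/* A resolved PCI address. */
struct bdf { uint8_t bus, dev, fn; };

/* Returns the secondary bus of the bridge at (bus, dev, fn), or -1 if none. */
typedef int (*secondary_bus_fn)(uint8_t bus, uint8_t dev, uint8_t fn);

/* Made-up topology for the example: one bridge at 00:1c.0 leading to bus 2. */
static int demo_secondary_bus(uint8_t bus, uint8_t dev, uint8_t fn)
{
    if (bus == 0 && dev == 0x1c && fn == 0)
        return 2;
    return -1;
}

/*
 * Resolves a Device Scope path of npairs {Device, Function} pairs, starting
 * on start_bus. Returns 0 and fills *out on success, -1 on failure.
 */
static int resolve_scope_path(uint8_t start_bus, const uint8_t *path,
                              size_t npairs, secondary_bus_fn sec,
                              struct bdf *out)
{
    uint8_t bus = start_bus;
    size_t i;

    for (i = 0; i < npairs; i++) {
        uint8_t dev = path[2 * i];       /* even offsets: Device number */
        uint8_t fn = path[2 * i + 1];    /* odd offsets: Function number */
        int next;

        if (i + 1 == npairs) {
            out->bus = bus;
            out->dev = dev;
            out->fn = fn;
            return 0;
        }
        next = sec(bus, dev, fn);
        if (next < 0 || next > 255)
            return -1;
        bus = (uint8_t)next;
    }
    return -1;   /* empty path */
}
```

With N = 1 the target stays on the start bus, which matches the Root-Complex Integrated Device case in the table.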
3.4 Reserved Memory Region Reporting Structure
BIOS may report each such reserved memory region through the RMRR structures, along with the devices that require access to the specified reserved memory region. Reserved memory ranges that are either not DMA targets, or memory ranges that may be target of BIOS initiated DMA only during pre-boot phase (such as from a boot disk drive) must not be included in the reserved memory region reporting. The base address of each RMRR region must be 4KB aligned and the size must be an integer multiple of 4KB. BIOS must report the RMRR reported memory addresses as reserved in the system memory map returned through methods such as INT15, EFI GetMemoryMap etc. The reserved memory region reporting structures are optional. If there are no RMRR structures, the system software concludes that the platform does not have any reserved memory ranges that are DMA targets.
The RMRR regions are expected to be used only for USB and UMA Graphics legacy usages for reserved memory. Platform designers must avoid or limit reserved memory regions since these require system software to create holes in the DMA virtual address range available to system software and its drivers.
Field | Byte Length | Byte Offset | Description |
Type | 2 | 0 | 1 - Reserved Memory Region Reporting Structure |
Length | 2 | 2 | Varies (24 + size of Device Scope structure) |
Reserved | 2 | 4 | Reserved. |
Segment Number | 2 | 6 | PCI Segment Number associated with devices identified through the Device Scope field. |
Reserved Memory Region Base Address | 8 | 8 | Base address of 4KB-aligned reserved memory region. |
Reserved Memory Region Limit Address | 8 | 16 | Last address of the reserved memory region. The reserved memory region size (Limit - Base + 1) must be an integer multiple of 4KB. |
Device Scope[] | - | 24 | The Device Scope structure contains one or more Device Scope entries that identify devices requiring access to the specified reserved memory region. The devices identified in this structure must be devices under the scope of one of the remapping hardware units reported in DRHD. |
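The alignment rules for an RMRR region reduce to two bit tests on the Base and Limit addresses. The check below is a small sketch; the function name rmrr_is_valid is hypothetical.

```c
#include <stdint.h>

/*
 * Returns 1 if [base, limit] is a well-formed RMRR region: the base is
 * 4KB-aligned and the size (limit - base + 1) is a multiple of 4KB.
 */
static int rmrr_is_valid(uint64_t base, uint64_t limit)
{
    if (limit < base)
        return 0;
    if (base & 0xfff)
        return 0;
    return ((limit - base + 1) & 0xfff) == 0;
}
```

Both RMRR ranges in the dmesg sample from section 2.1 pass this check: 0xed000 to 0xeffff is three 4KB pages, and 0x7f600000 to 0x7fffffff is also a whole number of pages.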
3.5 Root Port ATS Capability Reporting Structure
This structure is applicable only for platforms supporting Device-IOTLBs as reported through the Extended Capability register. For each PCI Segment in the platform that supports Device-IOTLBs, BIOS provides an ATSR structure. The ATSR structures identify PCI Express Root-Ports supporting Address Translation Services (ATS) transactions. Software must enable ATS on endpoint devices behind a Root Port only if the Root Port is reported as supporting ATS transactions.
Field | Byte Length | Byte Offset | Description |
Type | 2 | 0 | 2 - Root Port ATS Capability Reporting Structure |
Length | 2 | 2 | Varies (8 + size of Device Scope Structure) |
Flags | 1 | 4 | • Bit 0: ALL_PORTS: If Set, indicates all PCI Express Root Ports in the specified PCI Segment support ATS transactions. If Clear, indicates ATS transactions are supported only on Root Ports identified through the Device Scope field. • Bits 1-7: Reserved. |
Reserved | 1 | 5 | Reserved (0). |
Segment Number | 2 | 6 | The PCI Segment associated with this ATSR structure. |
Device Scope [] | - | 8 | If the ALL_PORTS flag is Set, the Device Scope structure is omitted. If ALL_PORTS flag is Clear, the Device Scope structure contains Device Scope Entries that identify Root Ports supporting ATS transactions. All Device Scope Entries in this structure must have a Device Scope Entry Type of 02h. |
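The ATS rule above can be written as a single predicate: a Root Port qualifies if ALL_PORTS is set, or if it appears among the ports listed in the Device Scope. The sketch below assumes the Device Scope entries have already been resolved to bus/device/function triples; the types and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A Root Port identified by its resolved PCI address. */
struct root_port { uint8_t bus, dev, fn; };

/*
 * Returns 1 if software may enable ATS on endpoints behind port, given the
 * ATSR Flags byte and the n Root Ports listed in the ATSR Device Scope.
 */
static int root_port_supports_ats(uint8_t atsr_flags,
                                  const struct root_port *scope, size_t n,
                                  struct root_port port)
{
    size_t i;

    if (atsr_flags & 0x01)   /* ALL_PORTS */
        return 1;
    for (i = 0; i < n; i++) {
        if (scope[i].bus == port.bus && scope[i].dev == port.dev &&
            scope[i].fn == port.fn)
            return 1;
    }
    return 0;
}
```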