Linux and the Device Tree
-------------------------
The Linux usage model for device tree data

Author: Grant Likely <grant.likely@secretlab.ca>

This article describes how Linux uses the device tree. An overview of
the device tree data format can be found on the device tree usage page
at devicetree.org[1].

[1] http://devicetree.org/Device_Tree_Usage

The "Open Firmware Device Tree", or simply Device Tree (DT), is a data
structure and language for describing hardware. More specifically, it
is a description of hardware that is readable by an operating system
so that the operating system doesn't need to hard code details of the
machine.

Structurally, the DT is a tree, or acyclic graph with named nodes, and
nodes may have an arbitrary number of named properties encapsulating
arbitrary data. A mechanism also exists to create arbitrary
links from one node to another outside of the natural tree structure.

Conceptually, a common set of usage conventions, called 'bindings',
is defined for how data should appear in the tree to describe typical
hardware characteristics including data busses, interrupt lines, GPIO
connections, and peripheral devices.

As much as possible, hardware is described using existing bindings to
maximize use of existing support code, but since property and node
names are simply text strings, it is easy to extend existing bindings
or create new ones by defining new nodes and properties. Be wary,
however, of creating a new binding without first doing some homework
about what already exists. There are currently two different,
incompatible, bindings for i2c busses that came about because the new
binding was created without first investigating how i2c devices were
already being enumerated in existing systems.

1. History
----------
The DT was originally created by Open Firmware as part of the
communication method for passing data from Open Firmware to a client
program (such as an operating system). An operating system used the
Device Tree to discover the topology of the hardware at runtime, and
thereby support a majority of available hardware without hard coded
information (assuming drivers were available for all devices).

Since Open Firmware is commonly used on PowerPC and SPARC platforms,
the Linux support for those architectures has for a long time used the
Device Tree.

In 2005, when PowerPC Linux began a major cleanup to merge 32-bit
and 64-bit support, the decision was made to require DT support on all
powerpc platforms, regardless of whether or not they used Open
Firmware. To do this, a DT representation called the Flattened Device
Tree (FDT) was created which could be passed to the kernel as a binary
blob without requiring a real Open Firmware implementation. U-Boot,
kexec, and other bootloaders were modified to support both passing a
Device Tree Binary (dtb) and modifying a dtb at boot time. DT was
also added to the PowerPC boot wrapper (arch/powerpc/boot/*) so that
a dtb could be wrapped up with the kernel image to support booting
existing non-DT aware firmware.

Some time later, FDT infrastructure was generalized to be usable by
all architectures. At the time of this writing, 6 mainlined
architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1
out of mainline (nios) have some level of DT support.

2. Data Model
-------------
If you haven't already read the Device Tree Usage[1] page,
then go read it now. It's okay, I'll wait....

2.1 High Level View
-------------------
The most important thing to understand is that the DT is simply a data
structure that describes the hardware.
There is nothing magical about
it, and it doesn't magically make all hardware configuration problems
go away. What it does do is provide a language for decoupling the
hardware configuration from the board and device driver support in the
Linux kernel (or any other operating system for that matter). Using
it allows board and device support to become data driven; to make
setup decisions based on data passed into the kernel instead of on
per-machine hard coded selections.

Ideally, data driven platform setup should result in less code
duplication and make it easier to support a wide range of hardware
with a single kernel image.

Linux uses DT data for three major purposes:
1) platform identification,
2) runtime configuration, and
3) device population.

2.2 Platform Identification
---------------------------
First and foremost, the kernel will use data in the DT to identify the
specific machine. In a perfect world, the specific platform shouldn't
matter to the kernel because all platform details would be described
perfectly by the device tree in a consistent and reliable manner.
Hardware is not perfect though, and so the kernel must identify the
machine during early boot so that it has the opportunity to run
machine-specific fixups.

In the majority of cases, the machine identity is irrelevant, and the
kernel will instead select setup code based on the machine's core
CPU or SoC. On ARM for example, setup_arch() in
arch/arm/kernel/setup.c will call setup_machine_fdt() in
arch/arm/kernel/devtree.c which searches through the machine_desc
table and selects the machine_desc which best matches the device tree
data. It determines the best match by looking at the 'compatible'
property in the root device tree node, and comparing it with the
dt_compat list in struct machine_desc (which is defined in
arch/arm/include/asm/mach/arch.h if you're curious).
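
As a rough illustration of the selection step, the following user-space
C sketch scores each table entry by the position of the first root
'compatible' string it matches and keeps the entry with the lowest
score. It relies on the convention, explained in the paragraphs that
follow, that the 'compatible' list runs from most specific to least
specific. The demo_* names and the simplified descriptor are
hypothetical stand-ins; this is not the kernel's struct machine_desc
or the actual code in arch/arm/kernel/devtree.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct machine_desc. */
struct demo_machine_desc {
	const char *name;
	const char *const *dt_compat;	/* NULL-terminated list of strings */
};

/*
 * Return the index of the first entry in the root 'compatible' list that
 * also appears in dt_compat, or -1 if nothing matches. A lower index
 * means a more specific match.
 */
static int demo_match_score(const char *const *root_compat,
			    const char *const *dt_compat)
{
	for (int i = 0; root_compat[i]; i++)
		for (int j = 0; dt_compat[j]; j++)
			if (strcmp(root_compat[i], dt_compat[j]) == 0)
				return i;
	return -1;
}

/*
 * Scan the whole table and return the descriptor with the lowest score,
 * or NULL if no descriptor matches. On a tie the earlier entry wins.
 */
static const struct demo_machine_desc *
demo_select_machine(const struct demo_machine_desc *table, size_t count,
		    const char *const *root_compat)
{
	const struct demo_machine_desc *best = NULL;
	int best_score = -1;

	for (size_t i = 0; i < count; i++) {
		int score = demo_match_score(root_compat, table[i].dt_compat);

		if (score >= 0 && (best == NULL || score < best_score)) {
			best = &table[i];
			best_score = score;
		}
	}
	return best;
}
```

With a root list of "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3",
a descriptor listing only "ti,omap3" scores 2 and one listing
"ti,omap3-beagleboard" scores 0, so the board-specific descriptor is
chosen even if the generic one appears first in the table.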

The 'compatible' property contains a sorted list of strings starting
with the exact name of the machine, followed by an optional list of
boards it is compatible with sorted from most compatible to least. For
example, the root compatible properties for the TI BeagleBoard and its
successor, the BeagleBoard xM board might look like, respectively:

	compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
	compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";

Here "ti,omap3-beagleboard-xm" specifies the exact model, and the
remaining entries claim that it is compatible with the OMAP 3450 SoC
and with the omap3 family of SoCs in general. You'll notice that the
list is sorted from most specific (exact board) to least specific (SoC
family).

Astute readers might point out that the Beagle xM could also claim
compatibility with the original Beagle board. However, one should be
cautioned about doing so at the board level since there is typically a
high level of change from one board to another, even within the same
product line, and it is hard to nail down exactly what is meant when one
board claims to be compatible with another. For the top level, it is
better to err on the side of caution and not claim one board is
compatible with another. The notable exception would be when one
board is a carrier for another, such as a CPU module attached to a
carrier board.

One more note on compatible values. Any string used in a compatible
property must be documented as to what it indicates. Add
documentation for compatible strings in Documentation/devicetree/bindings.

Again on ARM, for each machine_desc, the kernel looks to see if
any of the dt_compat list entries appear in the compatible property.
If one does, then that machine_desc is a candidate for driving the
machine.
After searching the entire table of machine_descs,
setup_machine_fdt() returns the 'most compatible' machine_desc based
on which entry in the compatible property each machine_desc matches
against. If no matching machine_desc is found, then it returns NULL.

The reasoning behind this scheme is the observation that in the majority
of cases, a single machine_desc can support a large number of boards
if they all use the same SoC, or same family of SoCs. However,
invariably there will be some exceptions where a specific board will
require special setup code that is not useful in the generic case.
Special cases could be handled by explicitly checking for the
troublesome board(s) in generic setup code, but doing so very quickly
becomes ugly and/or unmaintainable if it is more than just a couple of
cases.

Instead, the compatible list allows a generic machine_desc to provide
support for a wide common set of boards by specifying "less
compatible" values in the dt_compat list. In the example above,
generic board support can claim compatibility with "ti,omap3" or
"ti,omap3450". If a bug was discovered on the original beagleboard
that required special workaround code during early boot, then a new
machine_desc could be added which implements the workarounds and only
matches on "ti,omap3-beagleboard".

PowerPC uses a slightly different scheme where it calls the .probe()
hook from each machine_desc, and the first one returning TRUE is used.
However, this approach does not take into account the priority of the
compatible list, and probably should be avoided for new architecture
support.

2.3 Runtime configuration
-------------------------
In most cases, a DT will be the sole method of communicating data from
firmware to the kernel, so it also gets used to pass in runtime and
configuration data like the kernel parameters string and the location
of an initrd image.

Most of this data is contained in the /chosen node, and when booting
Linux it will look something like this:

	chosen {
		bootargs = "console=ttyS0,115200 loglevel=8";
		initrd-start = <0xc8000000>;
		initrd-end = <0xc8200000>;
	};

The bootargs property contains the kernel arguments, and the initrd-*
properties define the start and end addresses of an initrd blob. Note
that initrd-end is the first address after the initrd image, so this
doesn't match the usual semantic of struct resource. The chosen node
may also optionally contain an arbitrary number of additional
properties for platform-specific configuration data.

During early boot, the architecture setup code calls of_scan_flat_dt()
several times with different helper callbacks to parse device tree
data before paging is set up. The of_scan_flat_dt() code scans through
the device tree and uses the helpers to extract information required
during early boot. Typically the early_init_dt_scan_chosen() helper
is used to parse the chosen node including kernel parameters,
early_init_dt_scan_root() to initialize the DT address space model,
and early_init_dt_scan_memory() to determine the size and
location of usable RAM.

On ARM, the function setup_machine_fdt() is responsible for early
scanning of the device tree after selecting the correct machine_desc
that supports the board.

2.4 Device population
---------------------
After the board has been identified, and after the early configuration data
has been parsed, then kernel initialization can proceed in the normal
way. At some point in this process, unflatten_device_tree() is called
to convert the data into a more efficient runtime representation.
This is also when machine-specific setup hooks will get called, like
the machine_desc .init_early(), .init_irq() and .init_machine() hooks
on ARM.
The remainder of this section uses examples from the ARM
implementation, but all architectures will do pretty much the same
thing when using a DT.

As can be guessed by the names, .init_early() is used for any machine-
specific setup that needs to be executed early in the boot process,
and .init_irq() is used to set up interrupt handling. Using a DT
doesn't materially change the behaviour of either of these functions.
If a DT is provided, then both .init_early() and .init_irq() are able
to call any of the DT query functions (of_* in include/linux/of*.h) to
get additional data about the platform.

The most interesting hook in the DT context is .init_machine() which
is primarily responsible for populating the Linux device model with
data about the platform. Historically this has been implemented on
embedded platforms by defining a set of static clock structures,
platform_devices, and other data in the board support .c file, and
registering it en-masse in .init_machine(). When DT is used, then
instead of hard coding static devices for each platform, the list of
devices can be obtained by parsing the DT, and allocating device
structures dynamically.

The simplest case is when .init_machine() is only responsible for
registering a block of platform_devices. A platform_device is a concept
used by Linux for memory or I/O mapped devices which cannot be detected
by hardware, and for 'composite' or 'virtual' devices (more on those
later). While there is no 'platform device' terminology for the DT,
platform devices roughly correspond to device nodes at the root of the
tree and children of simple memory mapped bus nodes.

About now is a good time to lay out an example. Here is part of the
device tree for the NVIDIA Tegra board.

/{
	compatible = "nvidia,harmony", "nvidia,tegra20";
	#address-cells = <1>;
	#size-cells = <1>;
	interrupt-parent = <&intc>;

	chosen { };
	aliases { };

	memory {
		device_type = "memory";
		reg = <0x00000000 0x40000000>;
	};

	soc {
		compatible = "nvidia,tegra20-soc", "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		intc: interrupt-controller@50041000 {
			compatible = "nvidia,tegra20-gic";
			interrupt-controller;
			#interrupt-cells = <1>;
			reg = <0x50041000 0x1000>, < 0x50040100 0x0100 >;
		};

		serial@70006300 {
			compatible = "nvidia,tegra20-uart";
			reg = <0x70006300 0x100>;
			interrupts = <122>;
		};

		i2s1: i2s@70002800 {
			compatible = "nvidia,tegra20-i2s";
			reg = <0x70002800 0x100>;
			interrupts = <77>;
			codec = <&wm8903>;
		};

		i2c@7000c000 {
			compatible = "nvidia,tegra20-i2c";
			#address-cells = <1>;
			#size-cells = <0>;
			reg = <0x7000c000 0x100>;
			interrupts = <70>;

			wm8903: codec@1a {
				compatible = "wlf,wm8903";
				reg = <0x1a>;
				interrupts = <347>;
			};
		};
	};

	sound {
		compatible = "nvidia,harmony-sound";
		i2s-controller = <&i2s1>;
		i2s-codec = <&wm8903>;
	};
};

At .init_machine() time, Tegra board support code will need to look at
this DT and decide which nodes to create platform_devices for.
However, looking at the tree, it is not immediately obvious what kind
of device each node represents, or even if a node represents a device
at all. The /chosen, /aliases, and /memory nodes are informational
nodes that don't describe devices (although arguably memory could be
considered a device). The children of the /soc node are memory mapped
devices, but the codec@1a is an i2c device, and the sound node
represents not a device, but rather how other devices are connected
together to create the audio subsystem.
I know what each device is
because I'm familiar with the board design, but how does the kernel
know what to do with each node?

The trick is that the kernel starts at the root of the tree and looks
for nodes that have a 'compatible' property. First, it is generally
assumed that any node with a 'compatible' property represents a device
of some kind, and second, it can be assumed that any node at the root
of the tree is either directly attached to the processor bus, or is a
miscellaneous system device that cannot be described any other way.
For each of these nodes, Linux allocates and registers a
platform_device, which in turn may get bound to a platform_driver.

Why is using a platform_device for these nodes a safe assumption?
Well, for the way that Linux models devices, just about all bus_types
assume that their devices are children of a bus controller. For
example, each i2c_client is a child of an i2c_master. Each spi_device
is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The
same hierarchy is also found in the DT, where I2C device nodes only
ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB,
etc. The only devices which do not require a specific type of parent
device are platform_devices (and amba_devices, but more on that
later), which will happily live at the base of the Linux /sys/devices
tree. Therefore, if a DT node is at the root of the tree, then it
is probably best registered as a platform_device.

Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL)
to kick off discovery of devices at the root of the tree. The
parameters are all NULL because when starting from the root of the
tree, there is no need to provide a starting node (the first NULL) or a
parent struct device (the last NULL), and we're not using a match
table (yet).
For a board that only needs to register devices,
.init_machine() can be completely empty except for the
of_platform_populate() call.

In the Tegra example, this accounts for the /soc and /sound nodes, but
what about the children of the SoC node? Shouldn't they be registered
as platform devices too? For Linux DT support, the generic behaviour
is for child devices to be registered by the parent's device driver at
driver .probe() time. So, an i2c bus device driver will register an
i2c_client for each child node, an SPI bus driver will register
its spi_device children, and similarly for other bus_types.
According to that model, a driver could be written that binds to the
SoC node and simply registers platform_devices for each of its
children. The board support code would allocate and register an SoC
device, a (theoretical) SoC device driver could bind to the SoC device,
and register platform_devices for /soc/interrupt-controller, /soc/serial,
/soc/i2s, and /soc/i2c in its .probe() hook. Easy, right?

Actually, it turns out that registering children of some
platform_devices as more platform_devices is a common pattern, and the
device tree support code reflects that and makes the above example
simpler. The second argument to of_platform_populate() is an
of_device_id table, and any node that matches an entry in that table
will also get its child nodes registered. In the Tegra case, the code
can look something like this:

static void __init harmony_init_machine(void)
{
	/* ... */
	of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
}

"simple-bus" is defined in the ePAPR 1.0 specification as a compatible
value meaning a simple memory mapped bus, so the of_platform_populate()
code could be written to just assume simple-bus compatible nodes will
always be traversed.
However, we pass it in as an argument so that
board support code can always override the default behaviour.

[Need to add discussion of adding i2c/spi/etc child devices]

Appendix A: AMBA devices
------------------------

ARM Primecells are a certain kind of device attached to the ARM AMBA
bus which include some support for hardware detection and power
management. In Linux, struct amba_device and the amba_bus_type are
used to represent Primecell devices. However, the fiddly bit is that
not all devices on an AMBA bus are Primecells, and for Linux it is
typical for both amba_device and platform_device instances to be
siblings of the same bus segment.

When using the DT, this creates problems for of_platform_populate()
because it must decide whether to register each node as either a
platform_device or an amba_device. This unfortunately complicates the
device creation model a little bit, but the solution turns out not to
be too invasive. If a node is compatible with "arm,primecell", then
of_platform_populate() will register it as an amba_device instead of a
platform_device.
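
The decision described above amounts to a single membership test on
the node's 'compatible' list. The following user-space C sketch shows
that test in isolation. The demo_* names are hypothetical, and the
Primecell marker string is passed in as a parameter rather than
hard-coded, so nothing here should be read as the kernel's actual
implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tags for the two device types a node can become. */
enum demo_dev_kind {
	DEMO_PLATFORM_DEVICE,
	DEMO_AMBA_DEVICE,
};

/* Return true if 'wanted' appears in the NULL-terminated compatible list. */
static bool demo_is_compatible(const char *const *compat, const char *wanted)
{
	for (size_t i = 0; compat && compat[i]; i++)
		if (strcmp(compat[i], wanted) == 0)
			return true;
	return false;
}

/*
 * Choose the device type for a node from its 'compatible' list.
 * 'primecell_marker' is the compatible string that identifies a
 * Primecell; any node that does not carry it is treated as a plain
 * platform device.
 */
static enum demo_dev_kind
demo_classify_node(const char *const *compat, const char *primecell_marker)
{
	return demo_is_compatible(compat, primecell_marker) ?
		DEMO_AMBA_DEVICE : DEMO_PLATFORM_DEVICE;
}
```

Because the test looks only at the node itself, Primecell and
non-Primecell nodes can sit side by side under the same bus node and
still end up as the right kind of device.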