Generic Thermal Sysfs driver How To
===================================

Written by Sujith Thomas <sujith.thomas@intel.com>, Zhang Rui <rui.zhang@intel.com>

Updated: 2 January 2008

Copyright (c)  2008 Intel Corporation


0. Introduction

The generic thermal sysfs provides a set of interfaces for thermal zone
devices (sensors) and thermal cooling devices (fan, processor...) to register
with the thermal management solution and to be a part of it.

This how-to focuses on enabling new thermal zone and cooling devices to
participate in thermal management.
This solution is platform independent and any type of thermal zone devices
and cooling devices should be able to make use of the infrastructure.

The main task of the thermal sysfs driver is to expose thermal zone attributes
as well as cooling device attributes to the user space.
An intelligent thermal management application can make decisions based on
inputs from thermal zone attributes (the current temperature and trip point
temperature) and throttle appropriate devices.

[0-*]	denotes any non-negative integer, starting from 0
[1-*]	denotes any positive integer, starting from 1

1. thermal sysfs driver interface functions

1.1 thermal zone device interface
1.1.1 struct thermal_zone_device *thermal_zone_device_register(char *type,
		int trips, int mask, void *devdata,
		struct thermal_zone_device_ops *ops,
		const struct thermal_zone_params *tzp,
		int passive_delay, int polling_delay)

    This interface function adds a new thermal zone device (sensor) to the
    /sys/class/thermal folder as thermal_zone[0-*]. It also tries to bind all
    the thermal cooling devices that are already registered.

    type: the thermal zone type.
    trips: the total number of trip points this thermal zone supports.
    mask: Bit string: If 'n'th bit is set, then trip point 'n' is writeable.
    devdata: device private data.
    ops: thermal zone device call-backs.
	.bind: bind the thermal zone device with a thermal cooling device.
	.unbind: unbind the thermal zone device from a thermal cooling device.
	.get_temp: get the current temperature of the thermal zone.
	.get_mode: get the current mode (enabled/disabled) of the thermal zone.
	    - "enabled" means the kernel thermal management is enabled.
	    - "disabled" will prevent kernel thermal driver action upon trip points
	      so that user applications can take charge of thermal management.
	.set_mode: set the mode (enabled/disabled) of the thermal zone.
	.get_trip_type: get the type of a given trip point.
	.get_trip_temp: get the temperature above which a given trip point
			will be fired.
	.set_emul_temp: set the emulation temperature which helps in debugging
			different threshold temperature points.
    tzp: thermal zone platform parameters.
    passive_delay: number of milliseconds to wait between polls when
	performing passive cooling.
    polling_delay: number of milliseconds to wait between polls when checking
	whether trip points have been crossed (0 for interrupt driven systems).
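
    As a small illustration of the 'mask' argument described above, the
    sketch below tests whether trip point 'n' is writeable. It is plain
    userspace C rather than kernel code, and the helper name
    trip_is_writeable() is hypothetical, not part of the thermal API.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the thermal API): returns true if
 * bit 'trip' is set in 'mask', i.e. trip point 'trip' is writeable.
 * Out-of-range trip numbers are reported as not writeable.
 */
bool trip_is_writeable(int mask, int trip)
{
	if (trip < 0 || trip >= (int)(sizeof(mask) * 8))
		return false;
	return (((unsigned int)mask >> trip) & 1u) != 0;
}
```

    With mask = 0x5, trip points 0 and 2 would be writeable and trip
    point 1 would not.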


1.1.2 void thermal_zone_device_unregister(struct thermal_zone_device *tz)

    This interface function removes the thermal zone device.
    It deletes the corresponding entry from the /sys/class/thermal folder and
    unbinds all the thermal cooling devices it uses.

1.2 thermal cooling device interface
1.2.1 struct thermal_cooling_device *thermal_cooling_device_register(char *name,
		void *devdata, struct thermal_cooling_device_ops *ops)

    This interface function adds a new thermal cooling device (fan/processor/...)
    to the /sys/class/thermal/ folder as cooling_device[0-*]. It also tries to
    bind itself to all the thermal zone devices that are already registered.

    name: the cooling device name.
    devdata: device private data.
    ops: thermal cooling devices call-backs.
	.get_max_state: get the maximum throttle state of the cooling device.
	.get_cur_state: get the current throttle state of the cooling device.
	.set_cur_state: set the current throttle state of the cooling device.

1.2.2 void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)

    This interface function removes the thermal cooling device.
    It deletes the corresponding entry from the /sys/class/thermal folder and
    unbinds itself from all the thermal zone devices using it.

1.3 interface for binding a thermal zone device with a thermal cooling device
1.3.1 int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
	int trip, struct thermal_cooling_device *cdev,
	unsigned long upper, unsigned long lower, unsigned int weight);

    This interface function binds a thermal cooling device to a given trip
    point of a thermal zone device.
    This function is usually called in the thermal zone device .bind callback.

    tz: the thermal zone device
    cdev: thermal cooling device
    trip: indicates which trip point the cooling device is associated with
          in this thermal zone.
    upper: the maximum cooling state for this trip point.
           THERMAL_NO_LIMIT means no upper limit,
           and the cooling device can be in max_state.
    lower: the minimum cooling state that can be used for this trip point.
           THERMAL_NO_LIMIT means no lower limit,
           and the cooling device can be in cooling state 0.
    weight: the influence of this cooling device in this thermal
            zone.  See 1.4.1 below for more information.
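
    The meaning of THERMAL_NO_LIMIT for 'upper' and 'lower' can be
    sketched as follows. This is a userspace illustration only:
    SKETCH_NO_LIMIT is a stand-in whose value is an assumption, and
    struct sketch_limits and resolve_limits() are hypothetical names.

```c
/* Stand-in for THERMAL_NO_LIMIT; the value chosen here is an assumption. */
#define SKETCH_NO_LIMIT (~0UL)

/* Hypothetical container for the effective cooling state range. */
struct sketch_limits {
	unsigned long lower;
	unsigned long upper;
};

/*
 * Resolve the "no limit" markers as described above: no upper limit
 * means the device may go up to max_state, no lower limit means it may
 * go down to cooling state 0.
 */
struct sketch_limits resolve_limits(unsigned long upper, unsigned long lower,
				    unsigned long max_state)
{
	struct sketch_limits l;

	l.lower = (lower == SKETCH_NO_LIMIT) ? 0 : lower;
	l.upper = (upper == SKETCH_NO_LIMIT) ? max_state : upper;
	return l;
}
```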

1.3.2 int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
		int trip, struct thermal_cooling_device *cdev);

    This interface function unbinds a thermal cooling device from a given
    trip point of a thermal zone device. This function is usually called in
    the thermal zone device .unbind callback.

    tz: the thermal zone device
    cdev: thermal cooling device
    trip: indicates which trip point the cooling device is associated with
          in this thermal zone.

1.4 Thermal Zone Parameters
1.4.1 struct thermal_bind_params
    This structure defines the following parameters that are used to bind
    a zone with a cooling device for a particular trip point.
    .cdev: The cooling device pointer
    .weight: The 'influence' of a particular cooling device on this
             zone. This is relative to the rest of the cooling
             devices. For example, if all cooling devices have a
             weight of 1, then they all contribute the same. You can
             use percentages if you want, but it's not mandatory. A
             weight of 0 means that this cooling device doesn't
             contribute to the cooling of this zone unless all cooling
             devices have a weight of 0. If all weights are 0, then
             they all contribute the same.
    .trip_mask: This is a bit mask that gives the binding relation between
                this thermal zone and cdev, for a particular trip point.
                If the nth bit is set, then the cdev and thermal zone are
                bound for trip point n.
    .limits: This is an array of cooling state limits. It must have exactly
         2 * thermal_zone.number_of_trip_points entries. It is an array
         consisting of tuples <lower-state upper-state> of state limits.
         Each trip will be associated with one state limit tuple when
         binding. A NULL pointer means <THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
         on all trips. These limits are used when binding a cdev to a
         trip point.
    .match: This callback returns success (0) if the 'tz and cdev' need to
	    be bound, as per platform data.
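
    The relative nature of .weight described above can be illustrated
    with a short userspace sketch. The function weight_share_percent()
    is hypothetical and only mirrors the rules stated in the text; it
    is not how any particular governor computes its result.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the share (in whole percent) that
 * cooling device 'i' contributes, given the weights of all 'n'
 * cooling devices in the zone. If every weight is 0, all devices
 * contribute equally, as described for .weight above.
 */
unsigned int weight_share_percent(const unsigned int *weights, size_t n,
				  size_t i)
{
	unsigned long total = 0;
	size_t k;

	if (n == 0 || i >= n)
		return 0;

	for (k = 0; k < n; k++)
		total += weights[k];

	if (total == 0)
		return (unsigned int)(100 / n);

	return (unsigned int)(weights[i] * 100UL / total);
}
```

    Two devices with equal weights (1 and 1, or 1024 and 1024) each get
    a 50 percent share; a device with weight 0 next to a device with a
    non-zero weight gets no share.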

1.4.2 struct thermal_zone_params
    This structure defines the platform level parameters for a thermal zone.
    This data, for each thermal zone, should come from the platform layer.
    This is an optional feature, so some platforms can choose not to
    provide this data.
    .governor_name: Name of the thermal governor used for this zone
    .no_hwmon: a boolean to indicate if the thermal to hwmon sysfs interface
               is required. When no_hwmon == false, a hwmon sysfs interface
               will be created. When no_hwmon == true, nothing will be done.
               In case the thermal_zone_params is NULL, the hwmon interface
               will be created (for backward compatibility).
    .num_tbps: Number of thermal_bind_params entries for this zone
    .tbp: thermal_bind_params entries
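
    The .no_hwmon rules above reduce to a small decision, sketched here
    in userspace C. struct sketch_zone_params is a reduced stand-in that
    carries only the one field discussed, and hwmon_interface_wanted()
    is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for struct thermal_zone_params (assumption). */
struct sketch_zone_params {
	bool no_hwmon;
};

/*
 * Returns true if a hwmon sysfs interface would be created:
 * always when no parameters are given (backward compatibility),
 * otherwise only when no_hwmon is false.
 */
bool hwmon_interface_wanted(const struct sketch_zone_params *tzp)
{
	if (tzp == NULL)
		return true;
	return !tzp->no_hwmon;
}
```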

2. sysfs attributes structure

RO	read only value
RW	read/write value

Thermal sysfs attributes will be represented under /sys/class/thermal.
Hwmon sysfs I/F extension is also available under /sys/class/hwmon
if hwmon is compiled in or built as a module.

Thermal zone device sys I/F, created once it's registered:
/sys/class/thermal/thermal_zone[0-*]:
    |---type:			Type of the thermal zone
    |---temp:			Current temperature
    |---mode:			Working mode of the thermal zone
    |---policy:			Thermal governor used for this zone
    |---available_policies:	Available thermal governors for this zone
    |---trip_point_[0-*]_temp:	Trip point temperature
    |---trip_point_[0-*]_type:	Trip point type
    |---trip_point_[0-*]_hyst:	Hysteresis value for this trip point
    |---emul_temp:		Emulated temperature set node
    |---sustainable_power:      Sustainable dissipatable power
    |---k_po:                   Proportional term during temperature overshoot
    |---k_pu:                   Proportional term during temperature undershoot
    |---k_i:                    PID's integral term in the power allocator gov
    |---k_d:                    PID's derivative term in the power allocator
    |---integral_cutoff:        Offset above which errors are accumulated
    |---slope:                  Slope constant applied as linear extrapolation
    |---offset:                 Offset constant applied as linear extrapolation

Thermal cooling device sys I/F, created once it's registered:
/sys/class/thermal/cooling_device[0-*]:
    |---type:			Type of the cooling device(processor/fan/...)
    |---max_state:		Maximum cooling state of the cooling device
    |---cur_state:		Current cooling state of the cooling device


The following dynamic attributes are created/removed together. They represent
the relationship between a thermal zone and its associated cooling device.
They are created/removed for each successful execution of
thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.

/sys/class/thermal/thermal_zone[0-*]:
    |---cdev[0-*]:		[0-*]th cooling device in current thermal zone
    |---cdev[0-*]_trip_point:	Trip point that cdev[0-*] is associated with
    |---cdev[0-*]_weight:       Influence of the cooling device in
                                this thermal zone

Besides the thermal zone device sysfs I/F and cooling device sysfs I/F,
the generic thermal driver also creates a hwmon sysfs I/F for each _type_
of thermal zone device. E.g. the generic thermal driver registers one hwmon
class device and builds the associated hwmon sysfs I/F for all the registered
ACPI thermal zones.

/sys/class/hwmon/hwmon[0-*]:
    |---name:			The type of the thermal zone devices
    |---temp[1-*]_input:	The current temperature of thermal zone [1-*]
    |---temp[1-*]_crit:		The critical trip point of thermal zone [1-*]

Please read Documentation/hwmon/sysfs-interface for additional information.

***************************
* Thermal zone attributes *
***************************

type
	String which represents the thermal zone type.
	This is given by the thermal zone driver as part of registration.
	E.g: "acpitz" indicates it's an ACPI thermal device.
	In order to keep it consistent with the hwmon sys attribute, this
	should be a short, lowercase string containing neither spaces nor
	dashes.
	RO, Required

temp
	Current temperature as reported by thermal zone (sensor).
	Unit: millidegree Celsius
	RO, Required
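
	Because the value is in millidegrees Celsius, a user space reader
	typically converts it before display. The sketch below is one way
	to do that, assuming the attribute text (e.g. "37000\n") has
	already been read into a string; both function names are
	hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the text of a 'temp' attribute (millidegrees Celsius).
 * Returns the parsed value, or 'fallback' if the text does not start
 * with a number or the number is out of range.
 */
long parse_millicelsius(const char *text, long fallback)
{
	char *end;
	long value;

	errno = 0;
	value = strtol(text, &end, 10);
	if (end == text || errno == ERANGE)
		return fallback;
	return value;
}

/* Convert millidegrees Celsius to degrees Celsius. */
double millicelsius_to_celsius(long mdeg)
{
	return mdeg / 1000.0;
}
```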

mode
	One of the predefined values in [enabled, disabled].
	This file gives information about the algorithm that is currently
	managing the thermal zone. It can be either the default kernel based
	algorithm or a user space application.
	enabled		= enable kernel thermal management.
	disabled	= prevent kernel thermal zone driver actions upon
			  trip points so that a user application can take full
			  charge of the thermal management.
	RW, Optional

policy
	One of the various thermal governors used for a particular zone.
	RW, Required

available_policies
	Available thermal governors which can be used for a particular zone.
	RO, Required

trip_point_[0-*]_temp
	The temperature above which the trip point will be fired.
	Unit: millidegree Celsius
	RO, Optional

trip_point_[0-*]_type
	String which indicates the type of the trip point.
	E.g. it can be one of critical, hot, passive, active[0-*] for ACPI
	thermal zone.
	RO, Optional

trip_point_[0-*]_hyst
	The hysteresis value for a trip point, represented as an integer.
	Unit: Celsius
	RW, Optional

cdev[0-*]
	Sysfs link to the thermal cooling device node which represents the
	sys I/F for cooling device throttling control.
	RO, Optional

cdev[0-*]_trip_point
	The trip point with which cdev[0-*] is associated in this thermal
	zone; -1 means the cooling device is not associated with any trip
	point.
	RO, Optional

cdev[0-*]_weight
        The influence of cdev[0-*] in this thermal zone. This value
        is relative to the rest of cooling devices in the thermal
        zone. For example, if a cooling device has a weight double
        that of another, it's twice as effective in cooling the
        thermal zone.
        RW, Optional

passive
	Attribute is only present for zones in which the passive cooling
	policy is not supported by the native thermal driver. Default is zero
	and can be set to a temperature (in millidegrees) to enable a
	passive trip point for the zone. Activation is done by polling with
	an interval of 1 second.
	Unit: millidegrees Celsius
	Valid values: 0 (disabled) or greater than 1000
	RW, Optional

emul_temp
	Interface to set the emulated temperature method in thermal zone
	(sensor). After setting this temperature, the thermal zone may pass
	this temperature to the platform emulation function if registered, or
	cache it locally. This is useful in debugging different temperature
	thresholds and their associated cooling actions. This is a write-only
	node, and writing 0 to this node should disable emulation.
	Unit: millidegree Celsius
	WO, Optional

	  WARNING: Be careful while enabling this option on production systems,
	  because userland can easily disable the thermal policy by simply
	  flooding this sysfs node with low temperature values.

sustainable_power
	An estimate of the sustained power that can be dissipated by
	the thermal zone. Used by the power allocator governor. For
	more information see Documentation/thermal/power_allocator.txt
	Unit: milliwatts
	RW, Optional

k_po
	The proportional term of the power allocator governor's PID
	controller during temperature overshoot. Temperature overshoot
	is when the current temperature is above the "desired
	temperature" trip point. For more information see
	Documentation/thermal/power_allocator.txt
	RW, Optional

k_pu
	The proportional term of the power allocator governor's PID
	controller during temperature undershoot. Temperature undershoot
	is when the current temperature is below the "desired
	temperature" trip point. For more information see
	Documentation/thermal/power_allocator.txt
	RW, Optional

k_i
	The integral term of the power allocator governor's PID
	controller. This term allows the PID controller to compensate
	for long term drift. For more information see
	Documentation/thermal/power_allocator.txt
	RW, Optional

k_d
	The derivative term of the power allocator governor's PID
	controller. For more information see
	Documentation/thermal/power_allocator.txt
	RW, Optional

integral_cutoff
	Temperature offset from the desired temperature trip point
	above which the integral term of the power allocator
	governor's PID controller starts accumulating errors. For
	example, if integral_cutoff is 0, then the integral term only
	accumulates error when temperature is above the desired
	temperature trip point. For more information see
	Documentation/thermal/power_allocator.txt
	RW, Optional

slope
	The slope constant used in a linear extrapolation model
	to determine a hotspot temperature based off the sensor's
	raw readings. It is up to the device driver to determine
	the usage of these values.
	RW, Optional

offset
	The offset constant used in a linear extrapolation model
	to determine a hotspot temperature based off the sensor's
	raw readings. It is up to the device driver to determine
	the usage of these values.
	RW, Optional

*****************************
* Cooling device attributes *
*****************************

type
	String which represents the type of device, e.g:
	- for generic ACPI: should be "Fan", "Processor" or "LCD"
	- for memory controller device on intel_menlow platform:
	  should be "Memory controller".
	RO, Required

max_state
	The maximum permissible cooling state of this cooling device.
	RO, Required

cur_state
	The current cooling state of this cooling device.
	The value can be any integer between 0 and max_state:
	- cur_state == 0 means no cooling
	- cur_state == max_state means the maximum cooling.
	RW, Required
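
	A user space application that computes a cooling state may want to
	keep it inside the documented range before writing it to cur_state.
	The sketch below shows one way to do that; clamp_cur_state() is a
	hypothetical helper, and clamping (rather than rejecting the value)
	is a design choice of this example.

```c
/*
 * Hypothetical helper: force a requested cooling state into the
 * range [0, max_state] accepted by the cur_state attribute.
 */
unsigned long clamp_cur_state(long requested, unsigned long max_state)
{
	if (requested < 0)
		return 0;
	if ((unsigned long)requested > max_state)
		return max_state;
	return (unsigned long)requested;
}
```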

3. A simple implementation

An ACPI thermal zone may support multiple trip points like critical, hot,
passive, active. If an ACPI thermal zone supports critical, passive,
active[0] and active[1] at the same time, it may register itself as a
thermal_zone_device (thermal_zone1) with 4 trip points in all.
It has one processor and one fan, which are both registered as
thermal_cooling_device. Both are considered to have the same
effectiveness in cooling the thermal zone.

If the processor is listed in the _PSL method, and the fan is listed in the
_AL0 method, the sys I/F structure will be built like this:

/sys/class/thermal:

|thermal_zone1:
    |---type:			acpitz
    |---temp:			37000
    |---mode:			enabled
    |---policy:			step_wise
    |---available_policies:	step_wise fair_share
    |---trip_point_0_temp:	100000
    |---trip_point_0_type:	critical
    |---trip_point_1_temp:	80000
    |---trip_point_1_type:	passive
    |---trip_point_2_temp:	70000
    |---trip_point_2_type:	active0
    |---trip_point_3_temp:	60000
    |---trip_point_3_type:	active1
    |---cdev0:			--->/sys/class/thermal/cooling_device0
    |---cdev0_trip_point:	1	/* cdev0 can be used for passive */
    |---cdev0_weight:           1024
    |---cdev1:			--->/sys/class/thermal/cooling_device3
    |---cdev1_trip_point:	2	/* cdev1 can be used for active[0] */
    |---cdev1_weight:           1024

|cooling_device0:
    |---type:			Processor
    |---max_state:		8
    |---cur_state:		0

|cooling_device3:
    |---type:			Fan
    |---max_state:		2
    |---cur_state:		0

/sys/class/hwmon:

|hwmon0:
    |---name:			acpitz
    |---temp1_input:		37000
    |---temp1_crit:		100000
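
In the listing above, the cdev[0-*]_trip_point values are indices into the
zone's trip point list: 1 is the passive trip point and 2 is active0. The
userspace sketch below looks up such an index from the trip point type
strings, assuming they have already been read from the
trip_point_[0-*]_type files; find_trip_by_type() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the first trip point whose
 * type string equals 'wanted', or -1 if there is no such trip point.
 */
int find_trip_by_type(const char *const *types, size_t ntrips,
		      const char *wanted)
{
	size_t i;

	for (i = 0; i < ntrips; i++)
		if (strcmp(types[i], wanted) == 0)
			return (int)i;
	return -1;
}
```

With the types from the example (critical, passive, active0, active1), a
lookup of "passive" yields 1, matching cdev0_trip_point.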

4. Event Notification

The framework includes a simple notification mechanism, in the form of a
netlink event. Netlink socket initialization is done during the _init_
of the framework. Drivers which intend to use the notification mechanism
just need to call thermal_generate_netlink_event() with two arguments, viz.
(originator, event). The originator is a pointer to the struct
thermal_zone_device from which the event originated. An integer which
represents the thermal zone device will be used in the message to identify
the zone. The event will be one of: {THERMAL_AUX0, THERMAL_AUX1,
THERMAL_CRITICAL, THERMAL_DEV_FAULT}. Notification can be sent when the
current temperature crosses any of the configured thresholds.

5. Export Symbol APIs:

5.1: get_tz_trend:
This function returns the trend of a thermal zone, i.e. the rate of change
of temperature of the thermal zone. Ideally, the thermal sensor drivers
are supposed to implement the callback. If they don't, the thermal
framework calculates the trend by comparing the previous and the current
temperature values.
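
The fallback described above, comparing the previous and the current
temperature, can be sketched in userspace C as follows. The enum and the
function name are hypothetical and only illustrate the comparison; they
are not the framework's own definitions.

```c
/* Hypothetical trend values used only by this sketch. */
enum sketch_trend {
	SKETCH_TREND_STABLE,
	SKETCH_TREND_RAISING,
	SKETCH_TREND_DROPPING
};

/*
 * Derive a trend from two consecutive temperature readings
 * (both in millidegrees Celsius).
 */
enum sketch_trend trend_from_temps(int last_temp, int temp)
{
	if (temp > last_temp)
		return SKETCH_TREND_RAISING;
	if (temp < last_temp)
		return SKETCH_TREND_DROPPING;
	return SKETCH_TREND_STABLE;
}
```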

5.2: get_thermal_instance:
This function returns the thermal_instance corresponding to a given
{thermal_zone, cooling_device, trip_point} combination. Returns NULL
if such an instance does not exist.

5.3: thermal_notify_framework:
This function handles the trip events from sensor drivers. It starts
throttling the cooling devices according to the policy configured.
For CRITICAL and HOT trip points, this notifies the respective drivers,
and does actual throttling for other trip points, i.e. ACTIVE and PASSIVE.
The throttling policy is based on the configured platform data; if no
platform data is provided, this uses the step_wise throttling policy.

5.4: thermal_cdev_update:
This function serves as an arbitrator to set the state of a cooling
device. It sets the cooling device to the deepest cooling state if
possible.
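
The arbitration idea can be illustrated with a short userspace sketch.
It assumes that "deepest" means the numerically highest requested state
(consistent with cur_state == max_state meaning maximum cooling) and
that each binding of the cooling device requests one target state; the
function name is hypothetical and the real implementation may handle
further cases.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the cooling states requested for one
 * cooling device by its 'n' bindings, return the deepest (highest)
 * one. With no requests, state 0 (no cooling) is returned.
 */
unsigned long deepest_requested_state(const unsigned long *targets, size_t n)
{
	unsigned long deepest = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (targets[i] > deepest)
			deepest = targets[i];
	return deepest;
}
```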