BTRFS
=====

Btrfs is a copy-on-write filesystem for Linux aimed at
implementing advanced features while focusing on fault tolerance,
repair and easy administration. Initially developed by Oracle, Btrfs
is licensed under the GPL and open for contribution from anyone.

Linux has a wealth of filesystems to choose from, but we are facing a
number of challenges with scaling to the large storage subsystems that
are becoming common in today's data centers. Filesystems need to scale
in their ability to address and manage large storage, and also in
their ability to detect, repair and tolerate errors in the data stored
on disk.  Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review. The Btrfs disk format is
not yet finalized.

The main Btrfs features include:

    * Extent based file storage (2^64 max file size)
    * Space efficient packing of small files
    * Space efficient indexed directories
    * Dynamic inode allocation
    * Writable snapshots
    * Subvolumes (separate internal filesystem roots)
    * Object level mirroring and striping
    * Checksums on data and metadata (multiple algorithms available)
    * Compression
    * Integrated multiple device support, with several raid algorithms
    * Online filesystem check (not yet implemented)
    * Very fast offline filesystem check
    * Efficient incremental backup and FS mirroring (not yet implemented)
    * Online filesystem defragmentation


Mount Options
=============

When mounting a btrfs filesystem, the following options are accepted.
Options with (*) are default options and will not show in the mount options.

  alloc_start=<bytes>
	Debugging option to force all block allocations above a certain
	byte threshold on each block device.  The value is specified in
	bytes, optionally with a K, M, or G suffix, case insensitive.
	Default is 1MB.
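
	The size syntax above (also used by max_inline) can be illustrated
	with a minimal Python sketch.  The function name parse_size is
	hypothetical, and treating K, M and G as powers of 1024 is an
	assumption; this is not the kernel's parser.

```python
def parse_size(text):
    """Convert a string such as "1M" or "4096" to a byte count.

    Assumption: K, M and G are binary multiples (1024, 1024**2, 1024**3).
    """
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip()
    suffix = text[-1:].lower()  # the suffix is case insensitive
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)  # no suffix: the value is already in bytes


print(parse_size("1M"))  # 1048576
```

	Under that assumption, alloc_start=1m and alloc_start=1048576 name
	the same threshold.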

  noautodefrag(*)
  autodefrag
	Disable/enable auto defragmentation.
	Auto defragmentation detects small random writes into files and queues
	them up for the defrag process.  Works best for small files; not well
	suited for large database workloads.

  check_int
  check_int_data
  check_int_print_mask=<value>
	These debugging options control the behavior of the integrity checking
	module (requires the BTRFS_FS_CHECK_INTEGRITY config option).

	check_int enables the integrity checker module, which examines all
	block write requests to ensure on-disk consistency, at a large
	memory and CPU cost.

	check_int_data includes extent data in the integrity checks, and
	implies the check_int option.

	check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
	as defined in fs/btrfs/check-integrity.c, to control the integrity
	checker module behavior.

	See comments at the top of fs/btrfs/check-integrity.c for more info.

  commit=<seconds>
	Set the interval of periodic commit, 30 seconds by default. Higher
	values defer data being synced to permanent storage, so more recent
	data can be lost when the system crashes. The upper bound is not
	enforced, but a warning is printed if it's more than 300 seconds
	(5 minutes).
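
	The default and the warning threshold described above can be modelled
	roughly as follows.  This is a hedged Python sketch, not kernel code:
	the names commit_interval, DEFAULT_COMMIT_SECONDS and
	WARN_ABOVE_SECONDS are hypothetical, and values of zero or below are
	deliberately not handled.

```python
import warnings

DEFAULT_COMMIT_SECONDS = 30   # default stated in the text above
WARN_ABOVE_SECONDS = 300      # larger values are accepted, with a warning


def commit_interval(value=None):
    """Return the commit interval in seconds for a commit=<seconds> value."""
    if value is None:
        return DEFAULT_COMMIT_SECONDS
    seconds = int(value)
    if seconds > WARN_ABOVE_SECONDS:
        # The upper bound is not enforced, so only warn.
        warnings.warn("commit interval %d is above %d seconds"
                      % (seconds, WARN_ABOVE_SECONDS))
    return seconds
```

	For example, commit_interval("120") returns 120 silently, while
	commit_interval(600) still returns 600 but emits a warning.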

  compress
  compress=<type>
  compress-force
  compress-force=<type>
	Control BTRFS file data compression.  Type may be specified as "zlib",
	"lzo" or "no" (for no compression, used for remounting).  If no type
	is specified, zlib is used.  If compress-force is specified,
	all files will be compressed, whether or not they compress well.
	If compression is enabled, nodatacow and nodatasum are disabled.
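
	The four spellings above reduce to a compression type plus a force
	flag.  A minimal Python sketch of that mapping, using only the rules
	stated in this entry (parse_compress is a hypothetical helper, and
	returning None for "no" is an illustrative choice):

```python
def parse_compress(option):
    """Map a compress/compress-force option to (type, force).

    Returns (None, False) when compression is turned off with "no".
    """
    name, _, value = option.partition("=")
    if name not in ("compress", "compress-force"):
        raise ValueError("not a compression option: %r" % option)
    ctype = value or "zlib"          # zlib is used when no type is given
    if ctype not in ("zlib", "lzo", "no"):
        raise ValueError("unknown compression type: %r" % ctype)
    if ctype == "no":
        return (None, False)
    return (ctype, name == "compress-force")


print(parse_compress("compress"))            # ('zlib', False)
print(parse_compress("compress-force=lzo"))  # ('lzo', True)
```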

  degraded
	Allow mounts to continue with missing devices.  A read-write mount may
	fail with too many devices missing, for example if a stripe member
	is completely missing.

  device=<devicepath>
	Specify a device during mount so that ioctls on the control device
	can be avoided.  Especially useful when trying to mount a multi-device
	setup as root.  May be specified multiple times for multiple devices.
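
	Because device= may be repeated, a multi-device mount ends up with
	one device= entry per member in the option string.  The Python sketch
	below only builds an argument list for mount(8) and does not run it;
	btrfs_mount_argv is a hypothetical helper and the device paths and
	mount point are placeholders.

```python
def btrfs_mount_argv(devices, mountpoint, extra_options=()):
    """Build a mount(8) argument list with one device= option per device."""
    if not devices:
        raise ValueError("at least one device is required")
    options = ["device=" + dev for dev in devices] + list(extra_options)
    # Mount options are passed as a single comma-separated -o argument.
    return ["mount", "-t", "btrfs", "-o", ",".join(options),
            devices[0], mountpoint]


argv = btrfs_mount_argv(["/dev/sdb", "/dev/sdc"], "/mnt", ["degraded"])
print(" ".join(argv))
```

	The resulting list could be handed to subprocess.run() by a caller
	with sufficient privileges.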

  nodiscard(*)
  discard
	Disable/enable the discard mount option.
	Discard issues frequent commands to let the block device reclaim space
	freed by the filesystem.
	This is useful for SSD devices, thinly provisioned
	LUNs and virtual machine images, but may have a significant
	performance impact.  (The fstrim command is also available to
	initiate batch trims from userspace).

  noenospc_debug(*)
  enospc_debug
	Disable/enable debugging option to be more verbose in some ENOSPC conditions.

  fatal_errors=<action>
	Action to take when encountering a fatal error:
	  "bug" - BUG() on a fatal error.  This is the default.
	  "panic" - panic() on a fatal error.

  noflushoncommit(*)
  flushoncommit
	The 'flushoncommit' mount option forces any data dirtied by a write in a
	prior transaction to commit as part of the current commit.  This makes
	the committed state a fully consistent view of the file system from the
	application's perspective (i.e., it includes all completed file system
	operations).  This was previously the behavior only when a snapshot is
	created.

  inode_cache
	Enable free inode number caching.  Defaults to off due to an overflow
	problem when the free space crcs don't fit inside a single page.

  max_inline=<bytes>
	Specify the maximum amount of space, in bytes, that can be inlined in
	a metadata B-tree leaf.  The value is specified in bytes, optionally
	with a K, M, or G suffix, case insensitive.  In practice, this value
	is limited by the root sector size, with some space unavailable due
	to leaf headers.  For a 4k sectorsize, max inline data is ~3900 bytes.

  metadata_ratio=<value>
	Specify that 1 metadata chunk should be allocated after every <value>
	data chunks.  Off by default.

  acl(*)
  noacl
	Enable/disable support for POSIX Access Control Lists (ACLs).  See the
	acl(5) manual page for more information about ACLs.

  barrier(*)
  nobarrier
	Enable/disable the use of block layer write barriers.  Write barriers
	ensure that certain IOs make it through the device cache and are on
	persistent storage.  Using nobarrier on a device with a volatile
	(non-battery-backed) write-back cache will lead to filesystem
	corruption on a system crash or power loss.

  datacow(*)
  nodatacow
	Enable/disable data copy-on-write for newly created files.
	Nodatacow implies nodatasum, and disables all compression.

  datasum(*)
  nodatasum
	Enable/disable data checksumming for newly created files.
	Datasum implies datacow.
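
	The interactions between datacow, datasum and compression stated in
	this file can be written down as a small state model.  The Python
	sketch below is a simplified illustration of those stated rules only:
	resolve_data_flags is a hypothetical name, and applying the options
	strictly left to right is an assumption rather than a description of
	the kernel's option parser.

```python
def resolve_data_flags(options):
    """Apply the documented implications and return the resulting flags."""
    state = {"datacow": True, "datasum": True, "compress": False}
    for opt in options:
        name, _, value = opt.partition("=")
        if name == "nodatacow":
            # nodatacow implies nodatasum and disables all compression
            state.update(datacow=False, datasum=False, compress=False)
        elif name == "nodatasum":
            state["datasum"] = False
        elif name == "datacow":
            state["datacow"] = True
        elif name == "datasum":
            # datasum implies datacow
            state.update(datacow=True, datasum=True)
        elif name in ("compress", "compress-force"):
            if value == "no":
                state["compress"] = False
            else:
                # enabling compression disables nodatacow and nodatasum
                state.update(compress=True, datacow=True, datasum=True)
    return state


print(resolve_data_flags(["nodatacow"]))
# {'datacow': False, 'datasum': False, 'compress': False}
```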

  treelog(*)
  notreelog
	Enable/disable the tree logging used for fsync and O_SYNC writes.

  recovery
	Enable autorecovery attempts if a bad tree root is found at mount time.
	Currently this scans a list of several previous tree roots and tries to
	use the first readable one.

  rescan_uuid_tree
	Force check and rebuild procedure of the UUID tree. This should not
	normally be needed.

  skip_balance
	Skip automatic resume of interrupted balance operation after mount.
	May be resumed with "btrfs balance resume".

  space_cache(*)
	Enable the on-disk free space cache.
  nospace_cache
	Disable free space cache loading without clearing the cache.
  clear_cache
	Force clearing and rebuilding of the free space cache if something
	has gone wrong.

  ssd
  nossd
  ssd_spread
	Options to control ssd allocation schemes.  By default, BTRFS will
	enable or disable ssd allocation heuristics depending on whether a
	rotational or nonrotational disk is in use.  The ssd and nossd options
	can override this autodetection.

	The ssd_spread mount option attempts to allocate into big chunks
	of unused space, and may perform better on low-end ssds.  ssd_spread
	implies ssd, enabling all other ssd heuristics as well.
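
	The autodetection and override rules above can be summarised in a
	short Python sketch.  It is illustrative only: ssd_allocation is a
	hypothetical name, the rotational flag is passed in by the caller
	rather than detected, and letting nossd also clear ssd_spread is an
	assumption not stated in this file.

```python
def ssd_allocation(options, rotational):
    """Return which ssd heuristics end up enabled for a set of options."""
    ssd = not rotational      # autodetection: on for nonrotational disks
    spread = False
    for opt in options:
        if opt == "ssd":
            ssd = True
        elif opt == "nossd":
            ssd = False
            spread = False    # assumption: nossd also turns off ssd_spread
        elif opt == "ssd_spread":
            ssd = True        # ssd_spread implies ssd
            spread = True
    return {"ssd": ssd, "ssd_spread": spread}


print(ssd_allocation(["ssd_spread"], rotational=True))
# {'ssd': True, 'ssd_spread': True}
```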

  subvol=<path>
	Mount subvolume at <path> rather than the root subvolume.  <path> is
	relative to the top level subvolume.

  subvolid=<ID>
	Mount subvolume specified by an ID number rather than the root subvolume.
	This allows mounting of subvolumes which are not in the root of the mounted
	filesystem.
	You can use "btrfs subvolume list" to see subvolume ID numbers.

  subvolrootid=<objectid> (deprecated)
	Mount subvolume specified by <objectid> rather than the root subvolume.
	This allows mounting of subvolumes which are not in the root of the mounted
	filesystem.
	You can use "btrfs subvolume show" to see the object ID for a subvolume.

  thread_pool=<number>
	The number of worker threads to allocate.  The default number is equal
	to the number of CPUs + 2, or 8, whichever is smaller.
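
	The default stated above is simply the smaller of (CPUs + 2) and 8.
	A minimal Python sketch of that arithmetic, where
	default_thread_pool is a hypothetical name and os.cpu_count() stands
	in for the kernel's own CPU count:

```python
import os


def default_thread_pool(cpu_count=None):
    """Return min(CPUs + 2, 8), the default described above."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # cpu_count() may return None
    return min(cpu_count + 2, 8)


print(default_thread_pool(2))   # 4
print(default_thread_pool(16))  # 8
```

	Any machine with six or more CPUs therefore gets the cap of 8 unless
	thread_pool is given explicitly.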

  user_subvol_rm_allowed
	Allow subvolumes to be deleted by a non-root user. Use with caution.

MAILING LIST
============

There is a Btrfs mailing list hosted on vger.kernel.org. You can
find details on how to subscribe here:

http://vger.kernel.org/vger-lists.html#linux-btrfs

Mailing list archives are available from gmane:

http://dir.gmane.org/gmane.comp.file-systems.btrfs



IRC
===

Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
IRC network.



UTILITIES
=========

Userspace tools for creating and manipulating Btrfs file systems are
available from the git repository at the following location:

 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

These include the following tools:

* mkfs.btrfs: create a filesystem

* btrfs: a single tool to manage the filesystems, refer to the manpage
  for more details

* 'btrfsck' or 'btrfs check': do a consistency check of the filesystem

Other tools for specific tasks:

* btrfs-convert: in-place conversion from ext2/3/4 filesystems

* btrfs-image: dump filesystem metadata for debugging