Leo's Technical Log

ZFS Core Concepts and a Quick Start Guide

Introduction

ZFS (Zettabyte File System) is a revolutionary storage system originally developed by Sun Microsystems and now maintained by the OpenZFS community. Unlike traditional file systems, ZFS is not just a filesystem—it is a complete storage management solution that integrates volume management and filesystem functionality into a single coherent system.

In this article, we’ll walk through the core concepts behind ZFS and get hands-on by creating our first ZFS storage pool on Linux.

What Is ZFS?

First released in 2005, ZFS was designed with an ambitious goal: to build a filesystem that would never silently corrupt data. To achieve this, ZFS introduced a number of groundbreaking features, including Copy-on-Write (CoW), end-to-end checksumming, snapshots, and clones.

Key Features of ZFS

  • Strong data integrity guarantees: Every data block is protected by a checksum, allowing ZFS to detect silent data corruption and, where redundancy exists, repair it.
  • Massive scalability: 128-bit addressing gives a theoretical maximum capacity of roughly 256 quadrillion zettabytes.
  • Simplified administration: No need for manual partitioning or traditional formatting workflows.
  • Advanced built-in features: Snapshots, clones, compression, deduplication, and more.
  • Flexible RAID support: Native support for mirrors and RAID-Z configurations.

ZFS Design Philosophy

ZFS stands out largely because of a few fundamental design principles.

1. End-to-End Data Integrity

ZFS computes a checksum for every block of data and stores that checksum in the parent block rather than alongside the data itself. This design allows ZFS to detect corruption anywhere along the data path—whether caused by failing disks, faulty controllers, or firmware bugs.

If redundancy is available, ZFS can automatically repair corrupted data without user intervention.
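
To make the idea concrete, here is a toy sketch in plain shell. It is not how ZFS is implemented: a temporary file stands in for a data block, a shell variable stands in for the parent block that holds the checksum, and sha256sum is used purely for illustration. The function names are made up for this example.

```shell
#!/bin/sh
# Toy model only: a regular file stands in for a data block, and a shell
# variable stands in for the parent block that holds the checksum.
checksum_of() {
    sha256sum "$1" | cut -d' ' -f1
}

verify_block() {
    # $1 = block file, $2 = checksum recorded in the "parent"
    if [ "$(checksum_of "$1")" = "$2" ]; then
        echo "ok"
    else
        echo "corrupt"
    fi
}

block=$(mktemp)
printf 'important data\n' > "$block"
parent_checksum=$(checksum_of "$block")   # kept away from the data itself

before=$(verify_block "$block" "$parent_checksum")
printf 'imp0rtant data\n' > "$block"      # simulate silent corruption
after=$(verify_block "$block" "$parent_checksum")

echo "before corruption: $before"
echo "after corruption:  $after"
rm -f "$block"
```

Because the checksum lives outside the block it describes, damage to the block cannot also hide the evidence of that damage.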

2. Copy-on-Write (CoW)

ZFS never overwrites existing data. When data is modified, it is written to a new location, and only after the write succeeds are the metadata pointers updated.

This approach provides several important benefits:

  • Transactional semantics that keep the filesystem always consistent
  • Near-zero-cost snapshots
  • Elimination of the classic “write hole” problem
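
The write-then-switch pattern can be imitated with ordinary files. This is only an analogy under simplifying assumptions: files play the role of blocks, and a small "pointer" file plays the role of the metadata. None of the file names below come from ZFS.

```shell
#!/bin/sh
# Toy model only: files stand in for blocks, and a small "pointer" file
# stands in for the metadata that says which block is current.
dir=$(mktemp -d)

printf 'version 1\n' > "$dir/block.1"
printf 'block.1\n'   > "$dir/pointer"

# Modify the data: write a NEW block, leaving block.1 untouched.
printf 'version 2\n' > "$dir/block.2"

# Only after the new block is written, switch the pointer.
# Writing a temp file and renaming it makes the switch all-or-nothing.
printf 'block.2\n' > "$dir/pointer.tmp"
mv "$dir/pointer.tmp" "$dir/pointer"

current=$(cat "$dir/$(cat "$dir/pointer")")
previous=$(cat "$dir/block.1")   # still intact, like the data a snapshot keeps

echo "current:  $current"
echo "previous: $previous"
rm -rf "$dir"
```

If the process died before the rename, the pointer would still name block.1, so a reader would never see a half-written state.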

3. Storage Pooling

ZFS abstracts physical storage devices into a storage pool. All filesystems draw space from this shared pool, removing the rigid constraints of traditional partition-based layouts.

Filesystems can grow dynamically as needed—no resizing or re-partitioning required.

4. Simplified Management

ZFS folds the jobs of the partitioning tool, the volume manager, and the filesystem into one system. Tasks that traditionally require several tools (fdisk, mkfs, lvm, etc.) are handled by two commands, zpool and zfs, which significantly reduces operational complexity.


Core Concepts

To work effectively with ZFS, it’s essential to understand a few core abstractions.

Storage Pools

A storage pool is the foundation of ZFS. It consists of one or more virtual devices (vdevs) and represents a shared pool of storage capacity and I/O resources.

Key characteristics:

  • Pools can be expanded by adding new vdevs
  • Performance depends on the layout and type of vdevs
  • All filesystems in the pool share the same space and I/O bandwidth

Virtual Devices (vdevs)

A vdev is the basic building block of a ZFS pool. Common vdev types include:

  • Single disk – simplest setup, no redundancy

  • Mirror – similar to RAID 1, full data replication

  • RAID-Z – ZFS-native RAID with parity

    • RAID-Z1: single parity, tolerates 1 disk failure
    • RAID-Z2: double parity, tolerates 2 disk failures
    • RAID-Z3: triple parity, tolerates 3 disk failures

Important: Pool redundancy is determined by its vdevs. If any single vdev fails, the entire pool fails. For production systems, every vdev should provide adequate redundancy.
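
When choosing between layouts, a rough capacity estimate helps. The sketch below uses the simple approximation (disks - parity) × disk size for RAID-Z and one disk's worth for a mirror. Real pools lose somewhat more to padding, metadata, and reserved space, and the function names are invented for this example.

```shell
#!/bin/sh
# Rough usable capacity of one RAID-Z vdev, in the same unit as disk_size.
# Approximation: (disks - parity) * disk_size. Real pools lose a bit more
# to padding, metadata, and reserved space.
raidz_usable() {
    disks=$1; disk_size=$2; parity=$3
    if [ "$disks" -le "$parity" ]; then
        echo "error: need more disks than parity devices" >&2
        return 1
    fi
    echo $(( (disks - parity) * disk_size ))
}

# A mirror holds one disk's worth of data no matter how many copies it has.
mirror_usable() {
    disk_size=$1
    echo "$disk_size"
}

echo "3 x 4 TB raidz1: $(raidz_usable 3 4 1) TB"
echo "6 x 4 TB raidz2: $(raidz_usable 6 4 2) TB"
echo "2 x 4 TB mirror: $(mirror_usable 4) TB"
```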

Datasets

In ZFS, dataset is a generic term that includes:

  • Filesystems – mountable directory trees
  • Volumes (zvols) – block devices, commonly used for VM disks
  • Snapshots – read-only point-in-time copies
  • Clones – writable copies created from snapshots

Datasets are hierarchical and can inherit properties from their parent datasets.

Properties

Many ZFS features are controlled via dataset properties, such as:

  • compression – lz4, gzip, zstd, etc.
  • quota – space usage limits
  • reservation – guaranteed space
  • atime – access time updates
  • copies – number of data replicas

Properties can be set at any level and inherited by child datasets.
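
Inheritance boils down to walking up the dataset path until a locally set value turns up. The sketch below imitates that lookup with a hard-coded table. The table contents and function names are hypothetical; a real system answers the same question with zfs get.

```shell
#!/bin/sh
# Toy lookup table of values set locally (hypothetical).
# On a real system this information comes from: zfs get -r compression mypool
local_compression() {
    case "$1" in
        mypool/data) echo "lz4" ;;
        *) return 1 ;;
    esac
}

# Walk from the dataset up through its parents until a local value is found.
effective_compression() {
    ds=$1
    while :; do
        if value=$(local_compression "$ds"); then
            echo "$value (set on $ds)"
            return 0
        fi
        case "$ds" in
            */*) ds=${ds%/*} ;;              # strip the last path component
            *)   echo "default"; return 0 ;; # reached the pool root
        esac
    done
}

effective_compression mypool/data/projects
effective_compression mypool
```

On a live pool, zfs get -r compression mypool prints a SOURCE column that shows whether each value is local, inherited, or the default, and zfs inherit compression mypool/data/projects drops a local setting so the parent's value applies again.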

Snapshots and Clones

Snapshots are one of ZFS’s most powerful features:

  • Created almost instantly
  • Consume almost no space initially
  • Grow only as the live data changes, because they keep the old versions of modified or deleted blocks
  • Can be rolled back or sent to another system

Clones are writable datasets created from snapshots:

  • Initially share all data blocks with the snapshot
  • Consume additional space only for modified data
  • Can diverge independently over time
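
A clone always starts from a snapshot, so creating one is a two-step operation. The sketch below wraps both steps in a small helper that, by default, only prints the commands. The helper names, the DRY_RUN switch, and the dataset and snapshot names are assumptions made for this example.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually run them (needs root and an existing pool).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_clone() {
    # $1 = source dataset, $2 = snapshot name, $3 = new clone dataset
    run zfs snapshot "$1@$2" &&
    run zfs clone "$1@$2" "$3"
}

make_clone mypool/data before-upgrade mypool/data-test
```

As long as the clone exists, the snapshot it came from cannot be destroyed; zfs promote reverses that dependency if the clone should become the primary copy.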

Hands-On: Installing and Using ZFS on Linux

Let’s install ZFS and create our first storage pool. The examples below use Ubuntu/Debian.

Step 1: Install ZFS

sudo apt update
sudo apt install zfsutils-linux

Verify the installation:

zfs version
zpool version

For CentOS/RHEL:

sudo yum install epel-release
sudo yum install https://zfsonlinux.org/epel/zfs-release-2-2$(rpm --eval "%{dist}").noarch.rpm
sudo yum install kernel-devel zfs
sudo modprobe zfs

Step 2: Prepare Disks

Identify available disks:

lsblk

Warning: Creating a ZFS pool will erase all data on the selected disks.


Step 3: Create Your First Pool

Single-disk pool (not recommended for production):

sudo zpool create mypool /dev/sdb
sudo zpool status mypool

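Step 4: Create a Mirrored Pool

Steps 3 through 5 are alternatives that reuse the name mypool. To try a different layout, destroy the existing pool first with sudo zpool destroy mypool (this erases its data).

sudo zpool create mypool mirror /dev/sdb /dev/sdc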

Step 5: Create a RAID-Z Pool

sudo zpool create mypool raidz /dev/sdb /dev/sdc /dev/sdd

Step 6: Pool Management Commands

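# List all pools with their size, allocation, and health
sudo zpool list
# Show detailed status, including files affected by errors
sudo zpool status -v mypool
# Print I/O statistics every second (Ctrl+C to stop)
sudo zpool iostat mypool 1
# Show every pool property and its value
sudo zpool get all mypool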

Step 7: Create Filesystems

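# Create a filesystem; by default it is mounted at /mypool/data
sudo zfs create mypool/data
# Create a child filesystem that inherits properties from mypool/data
sudo zfs create mypool/data/projects
# List all datasets with their space usage and mountpoints
zfs list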

Set custom mountpoints:

sudo zfs set mountpoint=/mnt/mydata mypool/data

Step 8: Common Property Settings

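# Enable lz4 compression for newly written data
sudo zfs set compression=lz4 mypool/data
# Stop updating access times on reads
sudo zfs set atime=off mypool/data
# Limit the dataset and its descendants to 10 GB
sudo zfs set quota=10G mypool/data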

Step 9: Working with Snapshots

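# Create a snapshot
sudo zfs snapshot mypool/data@backup-2024-11-28
# List existing snapshots
zfs list -t snapshot
# Roll the dataset back to the snapshot (discards all changes made since)
sudo zfs rollback mypool/data@backup-2024-11-28
# Delete the snapshot once it is no longer needed
sudo zfs destroy mypool/data@backup-2024-11-28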

Step 10: Monitoring and Maintenance

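# Start a scrub: read every block and verify it against its checksum
sudo zpool scrub mypool
# Check scrub progress and any errors found
sudo zpool status mypool
# Show the log of commands that have modified the pool
sudo zpool history mypool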
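
For unattended monitoring, the script-friendly output of zpool list is easier to parse than zpool status. Here is a minimal sketch that flags any pool whose health is not ONLINE. The function name and the sample pool names are made up; the sample input is there so the filter can be tried without ZFS installed.

```shell
#!/bin/sh
# Reads "name<TAB>health" lines, as printed by:
#   zpool list -H -o name,health
# and prints the pools whose health is anything other than ONLINE.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Sample input so the sketch runs without ZFS; the pool names are made up.
printf 'mypool\tONLINE\nbackup\tDEGRADED\n' | unhealthy_pools
```

On a real system the filter would be fed with zpool list -H -o name,health | unhealthy_pools, and any output could be mailed or logged by whatever alerting is already in place.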

Practical Tips

Automated Snapshots

Use cron to create daily snapshots and clean up old ones.
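
One possible shape for such a script is sketched below. The dataset name, the daily- prefix, the seven-snapshot retention, and the --run switch are all assumptions for illustration, so review it carefully before pointing it at real data: the pruning step destroys snapshots.

```shell
#!/bin/sh
# Hypothetical daily snapshot helper; dataset, prefix, and retention
# are illustrative assumptions.
DATASET="mypool/data"
PREFIX="daily"
KEEP=7

# Build today's snapshot name, e.g. mypool/data@daily-2024-11-28.
snapshot_name() {
    echo "$1@$2-$(date +%Y-%m-%d)"
}

# Read snapshot names (oldest first) on stdin; print all but the newest $1.
to_prune() {
    awk -v keep="$1" '{ line[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print line[i] }'
}

# Nothing touches the pool unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    zfs snapshot "$(snapshot_name "$DATASET" "$PREFIX")"
    zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" |
        grep "@$PREFIX-" | to_prune "$KEEP" |
        while read -r snap; do
            zfs destroy "$snap"
        done
fi
```

Saved under a path of your choosing (say /usr/local/sbin/zfs-daily-snapshot.sh, a made-up location), it could be scheduled from root's crontab with a line such as 0 2 * * * /usr/local/sbin/zfs-daily-snapshot.sh --run.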

ZFS Send/Receive for Backups

Efficient full and incremental replication between systems using zfs send and zfs receive.
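
The difference between a full and an incremental send is a single option. The helper below only prints the pipeline so it can be inspected before anything runs. Its name, argument order, and the dataset and snapshot names are assumptions for this sketch.

```shell
#!/bin/sh
# Print (not run) a replication pipeline.
# $1 = source dataset, $2 = target dataset, $3 = snapshot to send,
# $4 = optional earlier snapshot that already exists on both sides.
replication_cmd() {
    dataset=$1; target=$2; new=$3; prev=${4:-}
    if [ -n "$prev" ]; then
        # Incremental: send only the blocks that changed between prev and new.
        echo "zfs send -i $dataset@$prev $dataset@$new | zfs receive $target"
    else
        # Full: send everything referenced by the snapshot.
        echo "zfs send $dataset@$new | zfs receive $target"
    fi
}

replication_cmd mypool/data backuppool/data monday
replication_cmd mypool/data backuppool/data tuesday monday
```

To replicate to another machine, the receiving half is usually run over SSH, for example by replacing zfs receive with ssh backuphost zfs receive, where backuphost is a placeholder for your own server.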

Performance Tuning

Adjust ARC size and monitor cache efficiency via /proc/spl/kstat/zfs/arcstats.
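
A quick way to judge cache efficiency is the ARC hit ratio, computed from the hits and misses counters in that file. The sketch below runs against a small made-up sample in the same three-column layout so it works without ZFS loaded. The function name and the sample numbers are invented.

```shell
#!/bin/sh
# Compute the ARC hit ratio (in percent) from an arcstats-style file,
# where each counter line has the form: name type value
arc_hit_ratio() {
    awk '$1 == "hits"   { h = $3 }
         $1 == "misses" { m = $3 }
         END { if (h + m > 0) printf "%.1f\n", 100 * h / (h + m) }' "$1"
}

# Made-up sample data standing in for /proc/spl/kstat/zfs/arcstats.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name                            type data
hits                            4    900
misses                          4    100
EOF

ratio=$(arc_hit_ratio "$sample")
echo "ARC hit ratio: $ratio%"
rm -f "$sample"
```

On a live system, call arc_hit_ratio /proc/spl/kstat/zfs/arcstats instead. The ARC ceiling itself is set through the zfs_arc_max module parameter (in bytes), for example with a line like options zfs zfs_arc_max=4294967296 in /etc/modprobe.d/zfs.conf; the 4 GiB figure is only an illustration.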


Frequently Asked Questions

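Can disks be removed from a ZFS pool? Partly. A disk can be detached from a mirror with zpool detach, and recent OpenZFS releases can remove a top-level single-disk or mirror vdev with zpool remove, provided the pool contains no RAID-Z vdevs. A RAID-Z vdev cannot be removed; the pool has to be rebuilt.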

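How do I expand a pool? Add new vdevs using zpool add. Alternatively, replace every disk in a vdev with a larger one; with the autoexpand pool property turned on, the extra capacity becomes available once the last disk has been replaced.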

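Is ZFS fast? Generally yes, especially with sufficient RAM, because ZFS caches aggressively in memory (the ARC). A common rule of thumb is 1 GB of RAM per TB of storage; treat it as a starting point rather than a hard requirement.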

Does ZFS support encryption? Yes. Native encryption is supported at the dataset level.


Conclusion

ZFS is a powerful and reliable storage platform that seamlessly combines volume management and filesystem functionality. While it introduces several new concepts, its core ideas are straightforward: pooled storage, copy-on-write, and end-to-end data integrity.

By following this guide, you’ve learned how to create pools, filesystems, and snapshots, and how to manage a basic ZFS setup. From here, you can explore advanced features such as replication, encryption, and deduplication.

Best practices to remember:

  • Always use redundancy in production
  • Run regular scrubs
  • Make extensive use of snapshots
  • Provision enough memory for optimal performance

Happy hacking with ZFS 🚀