From: Sergey Matveev
Date: 2021-01-11 18:17:07Z
*** David Rabkin [2021-01-11 19:57]:
>I am David Rabkin, an Israeli. It has nothing to do with the topics we
>discuss. Neither does the fact that you are Russian.
You are the only Israeli I correspond with (well, of those who told me
where they are from), plus a former MAI student -- so I remember :-). In
any case, it is not the problem of the people writing to me that I can
not recall every one of them -- it is up to me to look them up, remember
or write things down.
>I love ZFS and zsh, and I am interested in your experience and opinions
>on them. Could a five-hour letter be turned into a meaningful post?
It does not deserve a post, because in the end I do not like the letter
and I do not understand why I spent so long composing it (its size is
average). It would not add a drop of information to what I already write
here in the blog. I will just attach an excerpt from it here. I thought
about putting it straight into the blog, but decided it is not worth it.
------------------------ >8 ------------------------
>That was the last time I looked into ZFS ...
>I don't know if that memory problem still exists.
Actually there has never been that kind of "problem" -- only a myth
about it. It is true that ZFS likes caching, likes it very much: more
RAM means more cache space and better performance. Possibly it behaved
pretty badly in the middle of the 2000s, but it has been *very* heavily
optimized since then. I have run and seen many systems with 1GB of RAM
running ZFS on HDDs without any noticeable problems.
I think it is just some kind of education problem. I am sure that *most*
people think there should be plenty of "free" memory in "top"'s output.
I have seen many times that if someone sees a small amount of "free"
memory there, he concludes that his computer is in trouble and needs an
upgrade. However, all of us know that ideally there should be no "free"
memory at all: every piece of it should be used for filesystem caching.
The same applies to the ZFS cache, the ARC: it is a completely separate
subsystem that (in old times) could look like one RAM-hungry process.
Actually Linux has problems with its cache, and that is why I do not say
that Linux has production-ready ZFS support. ARC on Linux is not the
same as the ordinary page cache and can not immediately "give back"
memory to the system on demand, like the ordinary page cache has to. For
example, if you start a 2GB-heavy process and there is no room for it,
the kernel "asks" ARC to free some memory, but ARC does not do that
immediately, and the starting process receives a "not enough memory"
error. However, when you press "up and enter" to run it again, it will
start in most cases, because ARC has returned some memory by then. So
Linux has an ugly workaround: by default it just limits the ARC to half
of RAM. On a 128GB system it looks funny to see half of RAM taken by
ARC (which is good, cache is never bad) while the other half sits
completely free. But all of that applies only to Linux -- BSD/Solaris
systems have no such problems.
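On Linux that ARC ceiling can be moved with the `zfs_arc_max` module
parameter; a minimal sketch (the 8GiB value is just an illustration):

```shell
# /etc/modprobe.d/zfs.conf -- cap ARC at 8 GiB (value in bytes) at load time
options zfs zfs_arc_max=8589934592

# or change it on a running system:
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
```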
>What makes you say you won't live without ZFS, what
>features are the game-changers for you?
* complete replacement of volume managers, RAIDs, caching layers
(flashcache, whatever), fstab, /etc/exports and all that kind of
stuff. It is just too convenient to use and hard to sacrifice. If I
need some filesystem/mountpoint with atime, sync or other kinds of
options, then previously I had to use a volume manager to create a new
volume, then create a filesystem with proper tuning, then edit
/etc/fstab. That is complicated and time consuming. In ZFS it is just
a "zfs create -o option=value [...] pool/dataset" command, nothing
more. And I am not even talking about the RAID underneath all of that.
Dataset manipulations are so easy and fast (well, immediate from the
user's point of view) that often I just create a new dataset instead
of running "mkdir", because I probably want to create snapshots and
clones of exactly that directory/mountpoint and take backups of it
alone. When something is much easier and more convenient to do, it
completely changes your behaviour and tactics. For example, it was
always a pain to use branches in Subversion. But with git branches
became so easy and comfortable to use that they became one of the
main things you work with.
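As a sketch of that convenience (the pool and dataset names here are
hypothetical):

```shell
# one command replaces volume creation, mkfs tuning and an fstab entry;
# the dataset is created and mounted immediately
zfs create -o atime=off -o compression=lz4 tank/scratch
zfs create -o mountpoint=/var/db/pg tank/pgdata

# options can be changed later, live:
zfs set quota=10G tank/scratch
```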
* complete confidence in data integrity -- or at least in the detection
of its corruption. I have seen several times that data may "rot" (the
HDD silently alters it). Before ZFS I tended to create checksum files
with hashes of various files on the filesystem. It was done manually
and was obviously not convenient at all. With ZFS I can forcefully run
a "scrub" to check the integrity of all the data. Metadata has 2-3
copies by default, so it is repaired immediately. Data is repaired if
there is redundancy (mirrors and so on).
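That check is a single command (pool name hypothetical):

```shell
zpool scrub tank      # read every block, verify checksums, self-heal where redundancy allows
zpool status -v tank  # scrub progress, plus a list of any unrecoverable files
```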
I underline that RAID does not protect from bitrot, as many people
mistakenly believe. If the stripes on your RAID mirror differ,
then... which stripe is the right one? RAID keeps no information about
a stripe's integrity and has no way to decide which disk has "rotted"
and which is correct. It is literally lost data. RAID protects from a
drive failing during operation, that is all.
Moreover, RAID without built-in memory and a power supply suffers from
the write hole. I have heard that Linux mdadm only a few years ago
gained the ability to store its state on some separate drive, outside
the RAID itself, to solve that kind of problem. But... a few years
ago? That means that before then you literally had no possibility
(without expensive vendor locked-in hardware solutions with batteries
and built-in memory) to prevent data loss on RAID/power faults.
I do not remember it clearly, but I think there was some kind of ZFS
slogan: "if your data is not on ZFS, then you have already lost it".
And that is damn right, because... you can not be sure it is in a good
state (without awful manual checksumming and similar tasks). And I
think that is the main reason I use ZFS: I am confident about data
integrity and I can sleep soundly.
* higher performance. Well, ok, that is questionable. ZFS is not an
ordinary filesystem that is created, possibly tuned once, and then
forgotten. ZFS is much more complicated and complex, and it *requires*
you to understand its principles of operation. There are many tunables
in it, and you can easily reduce performance (or even waste disk
space) with tunables that are wrong for *your* workload.
For example, PostgreSQL works with 8KiB pages everywhere -- it
"tosses" 8KiB chunks of files onto the disk. By default ZFS uses
128KiB blocks. That means that by default *every* PostgreSQL (8KiB)
disk operation leads to a complete read and write of the whole 128KiB
block! So it moves 16 times more data to and from the disk than
necessary. If you use RAIDZ and place virtual machine images on ZVOLs
on it, you will lose half of the disk space! If you use RAIDZ2/3 with
some stripe widths, you can easily lose 2/3 of the disk space. Even
mirrors will "lose" less disk space.
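The PostgreSQL case is a one-option fix (dataset name hypothetical):

```shell
# match the dataset's block size to the database page size *before*
# loading the data -- recordsize only affects newly written files
zfs create -o recordsize=8k tank/pgdata
zfs get recordsize tank/pgdata
```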
So ZFS performance can differ by orders of magnitude depending on the
workload and the per-pool or per-dataset tunables. When I hear people
talking about bad performance and then see their use cases and
tunables, in most cases they just do not understand how it works (at
least its copy-on-write nature) and often make things even worse with
the tunables. Of course it is a drawback of ZFS that it requires more
knowledge and understanding of itself and of your workloads; it is not
a set-and-forget filesystem.
Some workloads will be faster in nearly all cases. For example, nearly
all writes in ZFS are written sequentially to the disk, so parallel
simultaneous writes become sequential for the disk. But file deletion
on ZFS is always slow -- I would say *very* slow, it is true. With
little RAM and high loads it will be slower than other filesystems
(but, actually, who cares, if none of them gives any guarantees about
the integrity and consistency of the whole filesystem? :-)). With more
RAM it becomes much more performant. With a lot of RAM and disks, and
fine tunables for the concrete workload, as a rule it beats anyone.
If we talk about my personal home tasks, then ZFS beats everyone just
because of its transparent LZ4 compression. Currently my laptop has
37% of its disk space compressed. Most coworkers with ZFS have 20-40%
compression on their PCs. First of all that means not so much disk
space economy as less data transferred between HDDs/SSDs. Many years
ago I had a 60GB MongoDB database. MongoDB stored all data in
relatively highly compressible BSON documents. Literally just by
placing it on ZFS with LZ4 compression (which is so fast that I have
never seen the CPU being the bottleneck -- only SSDs/HDDs) it took
only 20GB of disk space. Literally 3 times less. And that also means
that the ARC cache contains exactly those compressed blocks. So just
using ZFS there freed 40GB of RAM for caching purposes.
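Enabling and observing it is trivial (dataset name hypothetical):

```shell
zfs set compression=lz4 tank/data  # only data written from now on gets compressed
zfs get compressratio tank/data    # the ratio actually achieved, e.g. 1.58x
```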
I have not tried the relatively new Zstandard compression in ZFS (my
FreeBSD version does not have it), but it seems it will beat LZ4 in
practice even more! If you are not familiar with it: Zstandard (zstd)
is a thing that by default is faster than gzip *and* compresses better
than it. Moreover, it decompresses very fast. Previously I never
understood the importance of fast decompression, but when I replaced
nearly all my .xz files with .zst (at the highest compression ratios),
I discovered that previously I had been waiting for the CPU to
decompress all the time, not for the disks to write the decompressed
data. zstd at the highest ratios compresses slightly slower than xz
and produces slightly bigger files (1-2% or even less than 1%
difference in practice), but the decompression speed is worth it! I
literally replaced gzip/xz with zstd in all my personal use cases. But
I am not completely sure that with permanent ZFS disk activity it
won't eat all my CPU -- that needs checking.
* snapshots! That thing is really life-changing! Sometimes I am even
too lazy to do some kind of "git stash"/"git commit" and just create a
"zfs snap dataset@snap-name" snapshot, being able to "zfs rollback
dataset@snap-name" to the previous state. It is done instantly. Yeah,
some filesystems (UFS) can take snapshots -- but UFS sometimes takes
many minutes for them, without a quick rollback ability. You can not
create them easily with some kind of LVM, because you *have to* freeze
your filesystem state somehow. It is so burdensome that no one will
think about filesystem snapshotting just to test a shell script that
modifies files. With ZFS it is really, literally, easier to make a
snapshot, run the script, see that it sucked/bugged and destroyed all
your files, and make an *instant* rollback of your
filesystem's/dataset's state. If you do not want to roll back your
whole filesystem (which has email, logs, whatever), then just do not
forget to create another sub-dataset (which immediately automatically
becomes a mounted directory) only for playing with your script.
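The whole workflow is three commands (names hypothetical):

```shell
zfs snap tank/home@before-test      # instant; costs nothing until data diverges
./risky-cleanup.sh                  # hypothetical script that may trash files
zfs rollback tank/home@before-test  # instant return to the saved state
```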
Snapshots also give the ability to make backups of a whole
pool/dataset/filesystem. "zfs send dataset@snap-name" writes to stdout
a serialized representation of the whole dataset (or possibly all
datasets of your pool at once). If you want to restore the whole
filesystem/dataset with *everything* completely the same (all
tunables, metadata), just feed it to the "zfs recv dataset" command.
Previously one of the most reliable ways to back up a whole filesystem
was dd :-). And of course if you have only 10GB of useful payload
(10GB of files), then zfs send will emit only that 10GB of data, even
if it is a 10TB pool.
You can see the difference between states (snapshots) just with "zfs
diff snap-name1 snap-name2" (or the difference against the current
dataset state). You can access all files of any snapshot, without any
rollback, just by going to /path/to/dataset/.zfs/snap-name.
Ok, you can create your reliable checksummed 10GB backup today with
today's snapshot. The next day you create another snapshot. But
obviously there can be a pretty small difference between them, because
probably you just read/wrote email all that time. You can create an
incremental send-stream: "zfs send -i snap-name-previous
snap-name-current". That is all. Nearly instantly you get on stdout
only the difference between those snapshots. You can apply it on
another machine (or when you are restoring from backups) just with
"zfs recv" again. ZFS will understand that it is incremental and will
"apply" all the differences, restoring the same state as the original.
Literally you can make snapshots and send them (those 1-2MB chunks of
data) to your backup machine, being able to restore your filesystem,
*completely* and without any burden, to any minute's state. I doubt
there ever existed such easy and convenient backup abilities anywhere.
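An incremental replication sketch (host and dataset names hypothetical):

```shell
zfs snap tank/home@monday
# ... a day of work ...
zfs snap tank/home@tuesday
# ship only the Monday-to-Tuesday delta to another machine:
zfs send -i @monday tank/home@tuesday | ssh backuphost zfs recv pool/home
```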
All of that is possible because of the use of Merkle trees, hash
chains and copy-on-write. That also means that resilvering
(rebuilding) of mirror/RAIDZ arrays can be very fast. In an ordinary
RAID+LVM+FS setup, if one of your 8TB drives drops out for a second
because of a bad SAS connector, then the *whole* RAID is rebuilt on
the repaired drive -- all 8TB of data are written to it again. Because
RAID knows nothing about the LVM above it, which knows nothing about
the filesystem above it, which actually modified only 500KB of data
while the disk was out of the RAID. ZFS by default makes a checkpoint
every 5 seconds, and when it sees the "repaired" drive it just
compares the Merkle trees between the drive's latest checkpoint and
the current pool's one, understands within seconds that only that
500KB of data has changed, syncs it and... that is all! All disks are
completely resilvered/rebuilt, and we are sure about that because
everything is checksummed.
Snapshots are read-only. But you can make a "zfs clone" of any of them
-- literally a writable clone of some state. If I want to experiment
with an OS upgrade in my virtual machine, I just create a clone of it
(which is instant, because data is written to the disk only when its
blocks are changed) and destroy it (also instant) if the upgrade
failed. And all of that without turning the previous virtual machine
off, keeping it running.
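A clone-based experiment looks like this (names hypothetical):

```shell
zfs snap tank/vm/guest@pre-upgrade
zfs clone tank/vm/guest@pre-upgrade tank/vm/guest-test  # instant, writable
# ... boot a VM from the clone, try the upgrade ...
zfs destroy tank/vm/guest-test                          # instant; the original is untouched
```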
And I am not talking about its "enterprise" features, like creating
dozens of stripes made of dozens of RAIDZ2 arrays, using L2ARC (SSD used
for additional (to RAM) read cache) and SLOG (separate disk for storing
fsync-operations for performance), hotspares. I just do not have
hardware and tasks for all of that at home.
Of course, everything has its price. Apart from the higher entrance
threshold (knowledge and understanding of ZFS and of your workloads),
it is of course more resource hungry in general. Its authors give no
guarantees of its workability on 32-bit systems. It seems hard to
integrate into an OS, because it is not an ordinary FS but a whole
complex of various subsystems (it can even integrate with the NFS
daemon). Linux currently does not have as complete and good support of
it as FreeBSD/*Solaris/illumos systems do. With a small amount of free
disk space ZFS becomes very fragmented, killing performance -- you
should keep 10-20% of disk space free to compensate for fragmentation.
There are some hacks and abilities to defragment it of course, but you
have to remember about that issue. Also, ZFS kills the market of RAID
controllers :-), because with it they are not merely useless, but even
harmful.
>I have run oh-my-zsh for a while, but didn't use any special features.
>I use tcsh and can live with it
tcsh is my favourite after zsh. I do not want to try to convince you
to use zsh (tcsh is really very good too!), but I will describe my
love of it.
First of all: 99% of all articles and resources on the WWW talk about
oh-my-zsh and that kind of stuff, mainly about zsh's completion
features. I have never talked about those and have never treated them
as a big or useful advantage at all. Personally I think that all those
rich completions are even harmful, because I see how many people just
do not know about ordinary commands (like pgrep), about commands'
options, and do not even know how to get a file list from a remote
system via SSH. All those interactive menus are awful from the user
performance point of view: they require many keystrokes. And all those
completions are just damn slow! No lag should be noticeable to the
user -- it heavily hurts productivity.
"That" world of zsh sucks, in my opinion. I can not recommend it. It
makes zsh seem like yet another hipster-beloved (no offence to them
:-)) tool, for novices, but not for true hardcore professionals.
But zsh actually is a different beast, especially when we remember
that it already has 30 years of history. Compared to Bash, it is
*much* more lightweight, *much* faster (noticeable in startup times,
for example), and I think it is the most flexible of all the shells.
Flexible not in the sense of a huge quantity of features added to it,
but from the architecture point of view and its good programming
decisions. I can even call it very hackerish. Bash is really just a
mess of various features. If something is missing (like the autopushd
ability), then basically you have no way to do it well enough. zsh has
so many hooks that virtually every behaviour can be altered as you
wish. zsh is very high quality, *not* bloated software (the
oh-my-zsh world is bloated).
tcsh was a great inventor of various useful interactive shell
features. Together with ksh they invented nearly everything
time-saving in shells. Why do we use one shell instead of another? Why
not use pure dash or BSD's /bin/sh? If a shell has line-editing
abilities, it is better, because it is more convenient to use. If a
shell has parameter expansion abilities like "!$" (substituting the
last argument of the previous command), that small feature alone will
be a killer feature for me. I think I have used it dozens of times per
day for nearly 20 years already :-). Availability of history -- a very
important feature. The ability to search that history -- another
killer feature. All our shells differ in that kind of features.
Personally I use /bin/sh, with its complete lack of history, on all my
FreeBSD servers, because I log in to them very rarely and can easily
put up with the lack of nearly every feature shells can provide. Most
people in the world do not even know about the "Ctrl-R" history search
in Bash and Readline-based programs. But knowing that kind of small
but very useful feature can be life-changing for interactive work
efficiency.
* zsh is POSIX compatible and even mostly Bash compatible. tcsh is
not. I am sure that all shell scripts (except for really personal
ones) must be written in pure POSIX shell, for maximum compatibility.
I remember that Debian has a war against Bashisms. One of the oldest
Russian GNU/Linux distributions, ALT Linux, also has a jihad against
Bashisms. But unfortunately sometimes (for example Python's
virtualenvwrapper) scripts are written in Bash. I have no system with
Bash, but that virtualenvwrapper works well under zsh. tcsh depresses
me in that its scripting is completely different from the POSIX shell.
* I do not remember what exact ksh features I wanted in Bash (I tried
to work with it when I used GNU/Linuxes), but it lacked them. zsh has
all of them (and the tcsh ones, of course)
* zsh has shared history. None of tcsh, ksh and bash have it. It is a
killer feature: it literally shares one common history between all
running zsh instances. Most colleagues, as I remember, call it the
main killer feature and Bash's biggest lack
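A sketch of typical ~/.zshrc settings for it (the sizes are just
examples):

```shell
# share one history file between all running zsh instances
setopt SHARE_HISTORY
setopt HIST_IGNORE_ALL_DUPS  # keep only the newest copy of a repeated command
HISTFILE=~/.zsh_history
HISTSIZE=100000
SAVEHIST=100000
```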
* autopushd ability. I do not remember clearly, but possibly I met
that feature in tcsh and suffered from its absence in bash. There are
possible hacks for bash, but all of them break many things, and there
was no complete solution for that relatively easy task. But pushd/popd
(with autopushd) is essential for me!
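In zsh it is a one-line option:

```shell
# every `cd` silently does pushd, so the directory stack builds itself
setopt AUTO_PUSHD PUSHD_IGNORE_DUPS
# `dirs -v` then lists the stack; `cd -3` jumps three entries back
```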
* powerful completion matchers. All modern interactive shells can
complete a filename with Tab. But zsh can complete things like:
a/b/c/foo.py<TAB>, which will expand to Aram/Boom/COLD/foobar.py when
having a hierarchy like (random stupid words that got into my mind
when I entered all of that):
Abul/
Aram/Boom/COLD/foobar.py
With an "ordinary" shell, after you press TAB after that "a" you will
not see a completion, because "Abul"/"Aram" both start with "A".
Moreover, most shells are case-sensitive without the ability to be
insensitive. zsh can try to match whole hierarchies satisfying your
input. Many times fewer keystrokes!
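That behaviour comes from the completion matcher-list; this particular
matcher set is just one commonly seen ~/.zshrc choice, not the only
one:

```shell
autoload -Uz compinit && compinit
# case-insensitive matching, plus partial matches around . _ - and
# anywhere inside a word:
zstyle ':completion:*' matcher-list 'm:{a-zA-Z}={A-Za-z}' 'r:|[._-]=* r:|=*'
```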
* extended globbing, **-path expansion. I am not sure, but I do not
remember tcsh having **-path expansion, where you can enter "vi
**/foo.py" and it will search for foo.py among directories and
subdirectories. A very useful and convenient thing, used very often.
If you want to search only for executable ones: **/foo.py(*). Open the
latest foo*.py among them, but without "tmp" or "swp" in the
filename?: **/foo*.py~*(tmp|swp)*(om[1]) ("o"rder by "m"odification
time, take the first element of that array, excluding the ones that
satisfy the tmp/swp globbing). That task can be solved with "find", of
course, but I would fall asleep before finishing writing it :-)
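For comparison, here is roughly what the plain `find` version of the
"executable foo.py" case looks like (the demo tree is made up for
illustration):

```shell
# build a tiny demo hierarchy
mkdir -p demo/a/b
touch demo/a/foo.py demo/a/b/foo.py
chmod +x demo/a/b/foo.py
# equivalent of zsh's **/foo.py(*): executable regular files only
find demo -name foo.py -type f -perm -u+x
# prints only demo/a/b/foo.py
```

And that is still without the "newest first, excluding tmp/swp" part,
which would need extra sorting over stat output.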
* history expansion, globbing modifiers. For example: take the last
argument of the previous command, take its basename (strip off the
directories) and strip off its extension, then add a ".bak" one:
!$:t:r.bak
If you want to take the last argument of the currently written
command, use "!#$" instead of "!$". Retry the previous command with
all ".fs" replaced with ".iso" (I remember using that for OpenBSD
distribution files): !!:gs/.fs/.iso
All of that gets used a lot! Especially taking the directory of a
file: the ":h" modifier, for example.
* "zmv" command/plugin. It is a program written in pure zsh that can
do renames based on various patterns. I use only very simple ones,
like:
zmv '*' '$f:gs/ /_' -- replace all spaces in filenames with underscores
zmv -W '*.foo.bar' '*.baz' -- replace the .foo.bar extension with .baz
in filenames
zmv '(*)-MetalBand-AlbumName-(*)-(*).wv' '$1-$3-$2.wv' -- will rename
'01-MetalBand-AlbumName-SongName-1998.wv' to '01-1998-SongName.wv'
The last one is a synthetic example, but zmv has saved probably
hundreds of hours of my time renaming various musical stuff I get from
various sources.
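zmv ships with zsh but is not loaded by default, and its dry-run flag
is worth knowing about (the rename pattern here is just an example):

```shell
# in ~/.zshrc:
autoload -Uz zmv
# -n = dry run: print what would be renamed without touching anything
zmv -n '(*).txt' '$1.md'
```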
And there are hundreds of other various features and tunables:
directory aliases, global aliases ("ls G foo M" will expand on my
laptop to "ls | grep foo | less" -- even bash can not do aliases with
pipes), the ability to run commands based on a file's extension,
history ignore patterns (to skip some commands appearing in the
history), spellchecking/autocorrection (tcsh has it, but I do not like
that feature), and so on. All of that is just out of the box.
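Global aliases, for instance, are two lines of configuration:

```shell
# -g makes an alias expand anywhere on the line, not only in command position
alias -g G='| grep'
alias -g M='| less'
# now typing `ls G foo M` runs `ls | grep foo | less`
```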
The first time I heard of and saw many of those modifiers, expansions
and whatnot, I thought that I so seldom have those kinds of tasks that
all of those features would be useless -- if I learned some of those
magic runes, I would forget them in a few days. But I was wrong: I use
more and more of them, because they are very time-saving and
relatively easily remembered after real usage (muscle memory?).
But there are several plugins (written in pure zsh) that were created
after the FISH shell. FISH is an interesting thing, but hardly usable
in practice because it is not POSIX-compliant either. But it has
several cool ideas that are now must-haves for me, really
life-changing in everyday interactive work:
* https://github.com/zsh-users/zsh-syntax-highlighting -- command line
syntax highlighting. It seems just a fancy funny thing, but it becomes
*very* helpful. I programmed in Perl and other languages without any
syntax highlighting in my editor for years. But it actually really
helps to quickly navigate source code. It gives some kind of anchor
points for your eyes. Syntax highlighting in zsh has similar benefits:
it is much easier to understand in milliseconds where the commands are
(separated by ";", "&&", "|", whatever), their arguments, braces,
brackets, environment variables and all that kind of stuff. It is very
valuable for me now. Some older versions of that plugin were slow --
literally it consumed pretty much CPU just to colourify all those
dozens of words. But they fixed that soon.
* https://github.com/zsh-users/zsh-autosuggestions -- this thing just
shows you (in a different colour) the history element starting with
what you have typed. A tiny task, a tiny plugin. But it is the most
useful one! As in tcsh, and as can be configured in bash, if I input
"foo" and press "up", the shell will search the history for any
command starting with that "foo". But until I press "up" I do not know
which history element will be substituted first. That plugin just
shows what I will see: which history element will be taken if I press
"up". It saves a lot of time, because while I am entering "foo" I
immediately see the history entry, and if it is an acceptable
suggestion/choice I can quickly press "up+enter". Not "up", read the
suggestion, then press "enter", but "up+enter", because I already see
the suggestion. Often I see that the thing I have entered is
completely wrong or will suggest too many wrong entries from the
history -- and I see and evaluate the suggestion immediately, because
my eyes look at the display while my hands/fingers type characters in
parallel.
* https://github.com/zsh-users/zsh-history-substring-search --
actually I do not use that plugin, but have written my own simpler
implementation for my needs. The main idea is to search by patterns
and substrings inside whole commands in the history. For example I
have got some kind of command in the history:
FOO=1 bar something | cat bazbla-bla-bla
and I just remember that I have it, but I only know that it has "bar"
somewhere in it, and "baz" too. Previously I would try to enter
"bar<UP>", but because of the "FOO=1" at the beginning I won't get the
needed suggestion. And I do not know the reason: possibly it has
something before "bar", possibly I am completely wrong, possibly the
history just does not contain it anymore.
zsh-history-substring-search (the idea is taken directly from the FISH
shell) works like this: "bar baz<UP>", and it will find that string,
highlighting the words "bar"/"baz" in it. Bash allows you to search
the history like that only by typing something like:
<Ctrl-R>.*bar.*baz -- too burdensome, compared to "bar baz".
But that plugin completely rebinds the "up" button, and I actually do
*not* want to always search the history in that kind of mode. Most of
the time I want to search by the beginning of what I have entered. And
I did not find a way (obviously it can be achieved, but I know zsh
programming badly) to keep my "up" button behaving the same as before,
with "shift-up" enabling the "substring-match" search. So I have
written my own functions that behave similarly, but without
highlighting. For example, if I want to run mutt and open the
"suckless" (mailing list) mailbox, I enter "mu suck<shift-up>" (I am
not joking, I literally enter this often :-)) and "mutt -f =suckless"
is suggested; I press "enter". Because I open various mailboxes with a
"mutt -f" invocation, simply pressing "mu<up>" could take many
iterations before "suckless" is suggested.
There are several places without all that oh-my-zsh-ism, with serious
material instead. I understand and use probably only 10% of all of
that. Unfortunately zsh is like vim: you can learn it for years, for
many years finding yet another cool trick or ability (not because they
appear often (completely the opposite), but because many of them
require good use-cases and examples to be appreciated).