Which ZFS to use after the death of OpenSolaris?

Why ZFS

You probably are not reading this to discover what ZFS is (short story: it's a file system), so I'll skip that. You probably also understand the benefits, but I'll summarize them anyway, briefly:

  • Currently not available (arguably) in any mainstream modern filesystem:
    • End-to-end data integrity, through on-disk checksumming and automatic healing (if redundancy exists).
      • With exponentially growing disk capacities, "silent corruption" is approaching a mathematical certainty with current filesystems.
      • End-to-end data integrity is becoming a baseline requirement for any data you wish to remain valid over time, yet few filesystems currently address it.
      • Helps prevent or eliminate "bit rot" (a form of silent corruption).
      • Helps prevent or eliminate data corruption by any hardware component in the chain.
  • Currently still somewhat rare in modern filesystems:
    • Copy-On-Write transactional model (for more reliable on-disk writes, and as the foundation for other advanced features).
    • Double-parity RAID-Z2 (like RAID-6) and triple-parity RAID-Z3, to combat the growing near-mathematical likelihood of a second drive failing while resilvering from a first failed drive in RAID-5 configurations.
    • Elimination of the RAID-5 power-failure "Write Hole" for striped parity pools.
    • Software RAID implementation (essentially required for end-to-end data integrity) with performance rivaling or exceeding modern hardware RAID.
    • Can have redundancy for automatic bitrot healing even without mirrors or RAID-Zx, with the "copies" property.
    • Fine-grained snapshot capability.
    • Can stream snapshots to any format, medium, and/or destination.
    • Read/write snapshot clones.
    • All-in-one block device, array, and pool management - with only two basic commands. (The classic storage stack is flattened with redundancy between layers removed.)
    • Pools can be safely grown by replacing disks in an array with larger models, and/or by adding new arrays.
      • Currently neither arrays nor pools can be shrunk (usually not a concern in most use cases).
    • Advanced two-stage, SSD-aware read-caching model.
    • Transparent in-line block-level pool-wide deduplication.
    • Transparent in-line compression.
    • Transparent in-line encryption.

Which ZFS

For those requiring open-source solutions (whether on principle, cost, or both), there used to be only one real answer to this question: OpenSolaris. Sun Microsystems made history when they open-sourced Solaris. While alternatives to ZFS on [Open]Solaris started springing up, OpenSolaris (where ZFS was born) was still the de-facto (and arguably best) way to get ZFS.

That story began to change when it seemed increasingly likely that Oracle would purchase Sun Microsystems, and [Open]Solaris along with it. The fear (not completely unfounded) was that Oracle would kill the "Open" in OpenSolaris, in spite of the fact that Sun's own CDDL license would make that difficult. The fruits of that purchase have been revealed: OpenSolaris has become Oracle Solaris Express 11. It both confirms and refutes the fears. Without going into the details of how and when Oracle plans to release the CDDL code to the community, suffice to say it is less than ideal, and not very "open" at all. (In addition, it seems commonly expected that entirely proprietary code will start to replace existing CDDL'ed code).

Integrating ZFS into an operating system is not just a technical challenge, it can also be a legal one. The original engineers of ZFS themselves at Sun allegedly wanted a license model that was explicitly incompatible with the common GPL license of the Linux kernel. Sun created the "CDDL" open-source license, based on the Mozilla license, which was already well-known to be incompatible with GPL.

However, that does not mean it is impossible to include ZFS in a non-Solaris kernel. For starters, the BSD license is not incompatible with CDDL. Furthermore, the incompatibility with GPL only applies to code included in the kernel. But it can be run in user-space (e.g. FUSE), or could be installed as a Loadable Kernel Module not distributed with the kernel itself.

Availability of ZFS and ZFS-like advanced file systems, as of 2011-02-09

Stable? Platform based on ZFS ver ZPool ver Pros Cons Notes
Y Oracle Solaris 11 Express 2010.11 ON 151a 31 - Currently has the most recent implementation of ZFS.
- Arguably the easiest of the bunch (of those offering ZFS in the kernel) to install and administer.
- GUI (Gnome) is more polished and useful than FreeBSD+Gnome, NexentaCP+Gnome, and OpenSolaris.
- Free (as in beer).
- License limited to use by developers/testers (though other home users probably wouldn't be thrown in jail).
- Not free (as in speech).
- Limited driver support.
- GUI not as polished as most Linux distros.
- Administrative commands very different than BSD or Linux.
- Primitive package manager.
- Limited package repository.
- The existence of Oracle as the new owner of Solaris, and the licensing model (not necessarily on paper but recently announced in practice) are antithetical to lovers of the fully free/libre BSD model.
Y Nexenta Core Platform v3.0.1 ON 134b 26 - Solid text-mode installer can easily install root to a ZFS mirror.
- The Nexenta community ported the Debian apt-get package manager, and several ported/recompiled binary packages (that will run on the Solaris kernel) are available through it.
- Third party utilities are available to provide web interface to ZFS and other Solaris administrative tools.
- Has absolutely nothing to do with Ubuntu, in spite of product positioning.
- Gnome can be installed, but the result is very primitive and arguably not worth the effort, unless you have administrative utilities (compiled for Solaris) requiring GTK+/Gnome.
- Has a limited handful of GNU commands ported and recompiled to work on Solaris, which just winds up confusing Solaris users/admins. (GNU binaries can fortunately be disabled for less confusion.)
- In spite of some community confusion, Linux binaries (from Ubuntu or not) will not run. - See this post for a detailed clarification of the common Nexenta/Ubuntu confusion. - GUI-less installation by default.
- Plans to move base from ON 134b, to Illumos in future. (Which will make it less reliant on Oracle, arguably less risky, and arguably more likely to truly fork in the future from Oracle Solaris.)
Y FreeBSD v8.1 (BSD) 14 - OS virtually free of almost all "big brother interference". - Solid implementation of ZFS (and although currently as old as OpenSolaris 2009.06, at least it is under ongoing development).
- Absolutely ancient installer.
- Manual tinkering required for bootable root on ZFS (single or mirrored).
- No deduplication with supplied ZFS version.
- Very much an "old-school" POSIX OS.
- GUI-less installation by default.
- Able to achieve very small installation footprint.
- Manual tinkering of config files, init scripts, and daemons required for most configuration/administration.
Y Sun OpenSolaris 2009.06 ON 111b 14 - Arguably "The Last Free Solaris".
- Solid installer.
- GUI (Gnome) more polished than FreeBSD+Gnome or NexentaCP+Gnome.
- No longer maintained.
- Limited driver support.
- GUI not as polished as most Linux distros.
- Administrative commands very different than BSD or Linux.
- Primitive package manager.
- Converting root pool to a mirror requires manual tinkering.
- Limited package repository.
- Cryptic package names.
- No deduplication with this version.
- If you are satisfied with ZPool v14 (same as latest stable FreeBSD v8.1), then it's hard to go wrong with this.
- Extended ACL model is completely broken and unusable for production (classic model is fine).
- Because ZFS kernel-level CIFS/SMB service relies on broken ACL model, you must use add-on Samba server (which is solid) instead, for Windows share server.
Y ZFS-FUSE for Linux, Mac OS X, and BSD FUSE cross-platform user-space filesystem API 23 - Solid by now (in spite of getting off to a rough start and in spite of FUSE modules [being very easy for newbies to write] having something of a reputation for immaturity).
- Arguably the easiest to install and get going (including installing a modern, user-friendly Linux distro).
- Able to achieve very small installation footprint with choice of a minimalist Linux distro.
- Currently does not perform as well as in-kernel variants (but in theory it could reach parity with refinement).
- Some purists object to the notion of a filesystem in userspace. But in actual use there is little administrative difference, nor must there necessarily be a performance impact.
y KQ Infotech ZFS Linux Loadable Kernel Module (Linux) 28 - Linux Loadable kernel module should in theory outperform ZFS-FUSE for Linux. - Claims to be "GA-quality". - Appears to be targeted as an enterprise product with paid support. License terms are not clear on their website, but would have to also be CDDL (free and open-source).
n Btrfs (Linux) n/a n/a - All or most of the major benefits of ZFS (particularly COW, pooling, end-to-end checksumming, integrated management, writable snapshots)
- Pools can be grown and shrunk (ZFS can only grow).
- GPL license compatible with Linux kernel (vs GPL-incompatible CDDL license of ZFS).
- Already included in Linux kernel (for testing only).
- No plans for in-line encryption on roadmap. - Sponsored by Oracle (may be seen as a "con" by some).
- Completely different implementation approach to features than ZFS.
- Current build does not include deduplication, but it is planned.
n Illumos ON 147 22 - The Illumos project aims to replace closed-source drivers, encryption code, and other binaries with open-source code; while otherwise maintaining binary compatibility with ongoing Oracle Solaris development. This should significantly reduce the risk of another sudden "shutdown" of an open-ish Solaris-based project (as happened with OpenSolaris after Oracle purchased Sun), and provide a much "sunnier" outlook for ZFS (and the ON kernel). - Plans to be the base platform for innumerable future distributions (including OpenIndiana, Nexenta, Schillix, Belinix, etc.). - Project homepage does not currently show many updates or much recent activity.
n Schillix Illumos 22 - A convenient way to get a binary dev build of Illumos, without compiling it yourself.
N OpenIndiana ON 147 28 - Aims to stay true to the OpenSolaris experience. - Plans to move base from ON 147, to Illumos in future. (Which will make it less reliant on Oracle, arguably less risky, and arguably more likely to truly fork in the future from Oracle Solaris.)
n LLNL ZFS Linux Loadable Kernel Module (Linux) 28 - Linux Loadable kernel module should in theory outperform ZFS-FUSE for Linux. - Little available information currently. - Lawrence Livermore National Laboratory port of ZFS under contract by the US Department of Energy. Like the KQ Infotech LKM, it is a Linux Loadable Kernel Module distributed separately, thus doesn't violate Sun/Oracle's GPL-incompatible CDDL license.
n Mac ZFS (Mac OS X) 2 8 - Old version of ZFS. - A Mac OS X (Darwin) Loadable Kernel Module.
? Milax ON 128a 20 - Small footprint, low overhead. - No deduplication in this version of ZFS. - Targeted as a minimalist installation.
? EON ZFS Storage ON 134 (?) 26 - Can install to and boot from thumbdrive. - Runs from RAM.
- Minimalist installation.

If you find any factual errors, please leave a comment.

Nexenta != "Ubuntu on OpenSolaris"

Reading the many glowing reviews of Nexenta would lead one to believe that they took Ubuntu, and slapped in the good essentials of OpenSolaris (namely the amazing ZFS).

First let me disclaim that I mean no harm to the good folks at Nexenta; or even the misinformation-spreaders who almost certainly meant no harm or confusion. Although Nexenta themselves don't go to great lengths to technically explain the true, limited nature of their hybrid product, they also don't deliberately misinform about it.

The misinformation comes in part from tech blogs, and seems especially rife in forum posts about Nexenta. (It seems to me the latter is where Nexenta themselves could do a better job of clarifying their product position.)

So let's set the record straight.

Nexenta is none of the following:

  • OpenSolaris on Ubuntu
  • Ubuntu on OpenSolaris
  • OpenSolaris with an Ubuntu kernel
  • Ubuntu with an OpenSolaris kernel
  • OpenSolaris with Ubuntu userland tools

That last, oft-repeated phrase is arguably the closest to being technically accurate, but still very misleading by itself.

Nexenta is all of the following:

  • A late OpenSolaris dev build (from before Oracle bought Sun), with some additional custom bug fixes and enhancements from the Nexenta community themselves.
  • Some GNU userland utilities ported and/or recompiled for OpenSolaris.
  • The apt-get repository system from an older version of Ubuntu, ported and recompiled for OpenSolaris.
  • A comparatively small collection of applications and tools, ported and/or recompiled for OpenSolaris, available through the Nexenta-specific apt-get binary repository database.

Many references, reviews, and especially forum posts about Nexenta innocently imply (if not negligently misstate) that Nexenta can run the whole gamut of Linux applications available in the Ubuntu apt-get repository. That is not true. Let me restate:

Nexenta cannot run applications compiled for Ubuntu or any flavor of Linux.

...Whether available from the Debian or Ubuntu apt-get repositories, RPM, Yum, or any other Linux binary repository. (And by "Linux" I am of course aliasing the more technically correct phrase "a GNU/Linux distribution".) Furthermore, getting applications written for Linux to run on UNIX is not generally as trivial as it might seem (for starters Solaris uses different C libraries). It is not necessarily a simple matter of recompiling the same source code on different platforms.

Nexenta's commercial enterprise products and appliances may very well be appealing to enterprise customers (but certainly not because of anything Ubuntu-ey or Linux-ey about them). So my thoughts below don't really apply to that class.

Nexenta Core Platform ("Nexenta CP" or "NCP") is the free, open-source, GUI-less base product. Make no mistake: after installing, you have a UNIX installation. Specifically, an OpenSolaris installation along with all of [Open]Solaris' advances and quirks over more legacy-ish UNIX variants. Oh, and you also have the apt-get binary package manager, and a handful of GNU utilities common to Linux distros.

In no way, shape, or form do you have Ubuntu or anything remotely like Ubuntu.

The end result is an oftentimes confusing mishmash of Linux and UNIX commands to contend with. While you can set an option that gives you a mostly "pure" set of OpenSolaris commands (and I highly advise you use it), the opposite is not true: you cannot expose a "pure" (or even mostly pure) set of Linux commands.

I could see Nexenta CP v3 potentially appealing to casual users new to UNIX-like operating systems. Perhaps ironically, hard-core administrators/developers who are fluent in both [Open]Solaris and GNU/Linux, will probably be the group most frustrated by Nexenta (at least until they switch it into "Solaris-only flavor").

The only way to configure the core system, including network interfaces, disk drives, etc. - is via [Open]Solaris commands. There are generally no Linux equivalents.

Perhaps the most unfortunate set of expectations with Nexenta CP, is that you can install an Ubuntu GUI on top of it. You can't. With considerable effort, you can (and I did) install the X-Window system, and GNOME (and even GDM). It might superficially resemble any other GNOME installation (e.g. Ubuntu...or [Open]Solaris), but it is a woefully incomplete GNOME implementation. Furthermore, you cannot really manage the system with the provided GUI tools, as your options are very limited compared to an Ubuntu or [Open]Solaris build. If you need or wish to run a web browser and a terminal that supports cross-app copy and paste (e.g. GNOME Terminal emulator), you're better off SSH'ing in from some other client anyway (say Linux, Solaris, Mac, Windows, Commodore, iPhone, Android...etc.). In other words, it probably just isn't worth the time to get a GUI working on Nexenta. (Of course, in order to SSH in you have to have a working network configuration first. That may or may not be automagic [pun intended].)

And now for the good

As long as you go into Nexenta CP understanding its true nature, there are plenty of nice things about it to like, such as:

  • The Nexenta CP v3 text-based installer is competent and solid. (Much easier than, say FreeBSD's installer.)
  • The installer can optionally put the root system on a bootable mirror - rather than having to hack one together after the fact (as with [Open]Solaris [Express] 2009.06 and 2010.11). Nice!
  • Nexenta CP v3.0.1 supports ZFS ZPool v22, which includes deduplication (but not yet encryption as with ZPool v30 and later). It is a higher ZPool version than ZFS on FreeBSD 8.1 (v14), but not as high as Solaris 11 Express 2010.11 (v31 which supports encryption), or even ZFS on FUSE (v28).
  • Unlike the NexentaStor CE product (free but limit of 18 TB) or NexentaStor trial (45 days), the Core Platform product has no limitations - well, besides not including the NexentaStor web GUI and some other nice enterprise features. However, many free third-party tools exist to (arguably) bring Nexenta CP up to grossly approximate parity with NexentaStor in terms of features.
  • While the repository is slim compared to a typical Linux distribution, the Nexenta port of apt-get is apt-get - which means it does a good job of managing dependencies, unlike [Open]Solaris' pkg_* utilities which do not (nor claim to).

Bottom line: A more accurate Nexenta CP product positioning statement

Nexenta: It's like OpenSolaris - but with a better package manager, some ported Linux applications available through it, and a handful of GNU utilities thrown in; all without a useful GUI.


Appendix


An incomplete compendium of misinformation


Hope that clears a few things up!

OpenSolaris: Common administrative reference and troubleshooting guide

System file and folder locations

    Table

    Purpose Location As of Comments
    Network Time Protocol (NTP) /etc/inet/ntp.conf 2006.09
    X11 configuration /etc/X11/xorg.conf 2006.09
    Per-user environment variables ~/.profile 2006.09
    • The concept of "global" variables does not exist in OpenSolaris, out-of-box.
    • For path extension, append to the line starting with "export PATH=" (no quotes).
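As a rough sketch of the path-extension note above, something like the following could go in ~/.profile. The helper name path_append and the directory /opt/example/bin are made up for illustration; neither ships with OpenSolaris. The guard keeps the directory from being added twice if the profile is sourced more than once.

```shell
# Append a directory to PATH only if it is not already present.
# path_append is a hypothetical helper name, not an OpenSolaris built-in.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH: do nothing
        *) PATH="$PATH:$1" ;;     # otherwise append to the end
    esac
}

path_append /opt/example/bin      # illustrative directory only
export PATH
```

Appending (rather than prepending) means the system's own binaries still win if a name exists in both places.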

Common admin commands

    Table

    Purpose Command(s) As of Comments
    Obtain root privileges

    One of:

    • pfexec {command}
    • pfexec bash
    • su - root
    2006.09
    Find hardware descriptions and serial numbers cfgadm -lav 2006.09 Useful for identifying drives
    Itemize hardware prtdiag 2006.09 Processors, RAM, etc.
    Power-off

    One of:

    • pfexec shutdown -i5 -g0 -y
    2006.09
    Reconfigure reboot

    One of:

    • pfexec touch /reconfigure; pfexec init 6
    • pfexec reboot -- -r
    • pfexec touch /.reconfigure; pfexec reboot
    2006.09
    Kill a user login

    Steps:

    • who (To list usernames of users logged in)
    • pfexec pkill -KILL -u {username}
    2006.09
    Restart automatic networking pfexec svcadm restart nwam 2006.09 Do not use this command if you aren't using automatic network configuration (which is not the same thing as having DHCP and/or DNS addressing enabled in manual network configuration mode).

ZFS

    ZFS overview

      ZFS and disk-related terminology

      • As of at least v2006.09
      • Slice: Analogous to a "partition" in other OS' terminology, though a bit more lightweight.
      • Block device: A disk, slice, or file.
      • Array: A fault-tolerant collection of block devices, configured as a mirror or RAID-Z.
      • VDEV: A single block device or array.
      • Pool: One or more VDEVs that can be grown dynamically while on-line.

      ZFS limitations

      • As of at least v2006.09
      • Once created, an array cannot be non-destructively grown.
        • However, a pool can be non-destructively grown by adding VDEVs to it. A common and useful scenario is to add mirrored pairs to a pool.
      • An array is limited in size by its smallest block device.
        • However, a pool does not have this limitation. A common and useful scenario is to add mirrored pairs to a pool, where the drives in the mirror are the same size, but may be different in size from the drives in other arrays in the pool.
        • Additionally, an array can itself be grown, by individually swapping out all block devices with larger devices. One must observe the following caveats:
          • Must wait for array to rebuild in between each replacement.
          • Size will not expand until last replacement has completed and the array has rebuilt itself.

      ZFS future enhancements

      • Planned for v2010.03
        • Block-level deduplication
        • Encryption
        • Ability to non-destructively expand an array by adding a block device

      Solaris disk naming convention

      • As of at least v2006.09
      • If Solaris thinks they are SCSI (whether or not they actually are), which is likely:
        • Convention: c{controller#}t{target#}d{drive#}s{slice#}
        • E.g.: c0t0d0s3
      • If Solaris thinks they are ATA or SATA (whether or not they actually are):
        • Convention: c{controller#}d{drive#}s{slice#}
        • E.g.: c0d0s3
      • Note: Once a device is given a name, it supposedly does not change even if controllers or drives are moved around (not confirmed)
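To make the two naming conventions above concrete, here is a small sketch that splits a device name into its numbered parts. parse_disk_name is a hypothetical helper (not a Solaris command), and it only recognizes names that include a slice, exactly as the conventions are written above; a whole-disk name such as c0d0 produces no output.

```shell
# Split a Solaris device name into its numbered parts.
# parse_disk_name is a hypothetical helper, for illustration only.
parse_disk_name() {
    printf '%s\n' "$1" | sed -n \
        -e 's/^c\([0-9][0-9]*\)t\([0-9][0-9]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$/type=scsi controller=\1 target=\2 drive=\3 slice=\4/p' \
        -e 's/^c\([0-9][0-9]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$/type=ata controller=\1 drive=\2 slice=\3/p'
}

parse_disk_name c0t0d0s3   # SCSI-style name
parse_disk_name c0d0s3     # ATA/SATA-style name
```

The "type" label only reflects which convention the name follows, which (as noted above) is what Solaris thinks the device is, not necessarily what it actually is.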

    ZFS and disk-related commands

      Table

      Purpose Command(s) As of Comments
      List disks

      Steps:

      • pfexec format
      • Then CTRL+C to cancel.
      2006.09
      Configure a newly added drive so that Solaris 'sees' it

      Steps:

      • cfgadm -la
      • or cfgadm -lav (more detailed)
      • Look in the "Occupant" column for a line that says "unconfigured".
      • Note the "Ap_Id" for that line (in the first column)
      • pfexec cfgadm -c configure {Ap_Id}
      2006.09
      Make a bootable mirror of ZFS root pool

      Steps:

      1. su - root
        • format -e
          • Select {target}
          • fdisk
            • 3. Delete a partition
              • Delete all existing partitions.
            • 1. Create a partition
              • 1=SOLARIS2
                • 100%
            • 5. Exit (update disk configuration and exit)
          • label
            • [0] SMI Label
          • quit
        • prtvtoc /dev/rdsk/{source}s2 | fmthard -s - /dev/rdsk/{target}s2
          • Note: use s2 for both!
        • zpool attach -f rpool {source}s0 {target}s0
      2. zpool status
        • Watch until status says "resilver complete".
      3. pfexec installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/{target}s0
      2006.09
      • In general, ZFS works best with whole disks, and working with individual slices invites headache. However, as of v2006.09, mirroring a bootable root will only work with slices. (The slice can and should ideally consume the whole drive.) ZFS will let you mirror a root as whole disks, but it won't boot. These commands refer to the context of slices.
      • Notational convention
        • {source}: The existing, bootable root disk, in cXtXdX format.
        • {target}: The drive to make a mirror out of, in cXtXdX format. Must be the same size as the source, or larger.
      Create a ZFS pool pfexec zpool create [-m {mount point}] [-n] [-f] {pool name} [type] {vdev 1} [...[vdev n]] [spare {vdev 1} [... [vdev n]]] 2006.09
      • [type] can be:
        • nothing (to just stripe across a bunch of vdevs without redundancy)
        • mirror
        • raidz (Same as raidz1.)
        • raidz1 (More robust version of RAID-5.)
        • raidz2 (More robust version of "RAID-6" or dual-parity RAID-5.)
        • raidz3 (Similar to RAID-5 and RAID-6 but triple-parity.)
      • Options:
        • -n (Simulate only.)
        • -f (Do it even if ZFS complains about something; e.g. force creation when ZFS complains about vdevs of different sizes.)
      • E.g.: zpool create -m /export/mountpoint poolname raidz3 c0d0 c0d1 c0d2 c0d3 c1d0 c1d1s3 spare c2d0
      Grow a ZFS pool pfexec zpool add [-f] {poolname} (remaining syntax identical to "zpool create") 2006.09
      • E.g.: zpool add poolname raidz2 c2d0 c2d1 c2d2 c2d3
        • Creates a new raidz2 array with the named block devices, and expands the pool named "poolname" to include it
      • Notes
        • This adds a vdev to a pool...not a block device to a vdev.
        • A vdev can be an array.
        • Data is always striped across all vdevs.
        • This is how you grow a pool.
      Move a ZFS pool's L2ARC pfexec zpool add {poolname} cache {vdev 1} [...[vdev n]] 2006.09
      • L2ARC = Level 2 Adaptive Replacement Cache.
      • AKA the pool's read cache.
      • Moving L2ARC to a dedicated device (e.g. fast drive or SSD) generally greatly improves the read performance (esp. IOPS) of a pool.
      • The larger the size of the L2ARC, the better.
      • If multiple VDEVs are specified, they will be striped.
      • ZFS is "SSD-aware" and will take special advantage of L2ARCs on SSD drives.
      Move a ZFS pool's ZIL pfexec zpool add {poolname} log {vdev 1} [...[vdev n]] 2006.09
      • ZIL = ZFS Intent Log
      • By default, the ZIL is distributed in the pool.
      • Moving the ZIL (esp. to a fast drive or SSD) can improve the write performance (esp. IOPS) of a pool.
      • The maximum useful size is generally 1/2 the amount of RAM; any larger yields decreasing returns.
      Set good default ZFS options

      Steps:

      • pfexec zpool set autoreplace=on {poolname}
      • pfexec zfs set compression=on {poolname} (Compression is a filesystem property, so it is set with zfs rather than zpool.)
      • pfexec zpool set autoexpand=on {poolname} (Doesn't seem to be supported on ZFS v3.)
      2006.09
      Move a ZFS pool to another machine

      Steps:

      • pfexec zpool export {poolname}
      • Move hardware (e.g. the pool's hard drives).
      • zpool import (To see what pools are available to import)
      • pfexec zpool import {poolname}
      2006.09 Also useful for "disconnecting" a pool to ensure it isn't harmed during things such as OS upgrades.
      Replace a device (e.g. hard drive) in a ZFS pool pfexec zpool replace {pool name} {vdev to replace} {vdev replacement already online} 2006.09 Alternately, setting the autoreplace option to on will do this whenever you swap devices on the same physical connector.
      Rename a ZFS pool

      Steps:

      • pfexec zpool export {old poolname}
      • pfexec zpool import {old poolname} {new poolname}
      2006.09
      Rename a ZFS filesystem pfexec zfs rename {pool}/{orig fs name} {pool}/{new fs name} 2006.09
      List ZFS pools zpool list 2006.09
      Check ZFS pool status zpool status -v {poolname} 2006.09
      Scrub a ZFS pool pfexec zpool scrub {poolname} 2006.09 Reads all data in the pool, verifies it against its checksums, and repairs damaged blocks where redundancy exists.
      Delete a ZFS pool pfexec zpool destroy {poolname} 2006.09
      Upgrade the ZFS version of a pool pfexec zpool upgrade {poolname} 2006.09 Use zpool upgrade -v to list the versions the installed software supports.
      Report ZFS I/O stats zpool iostat -v {poolname} 2006.09
      Investigate slice/partition info prtvtoc /dev/rdsk/{vdev} 2006.09
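The "mirrored pairs" scenario mentioned in the create/grow rows above can be sketched as a tiny command builder. mirror_pairs_cmd is a hypothetical helper of my own, not part of ZFS; it only prints the zpool command so you can inspect it before running it yourself (with pfexec, and ideally with -n first when creating a pool).

```shell
# Print (do not run) a zpool command that groups disks into mirrored pairs.
# mirror_pairs_cmd is a hypothetical helper. Usage:
#   mirror_pairs_cmd {create|add} {poolname} {disk 1} {disk 2} [...]
mirror_pairs_cmd() {
    if [ $# -lt 4 ] || [ $(( ($# - 2) % 2 )) -ne 0 ]; then
        echo "usage: mirror_pairs_cmd {create|add} {poolname} {disk pairs...}" >&2
        return 1
    fi
    action=$1
    pool=$2
    shift 2
    cmd="zpool $action $pool"
    while [ $# -gt 0 ]; do
        cmd="$cmd mirror $1 $2"   # each pair becomes one mirror vdev
        shift 2
    done
    printf '%s\n' "$cmd"
}

mirror_pairs_cmd create poolname c0d0 c0d1 c1d0 c1d1
```

Because data is striped across all vdevs, each additional pair added this way grows the pool while keeping every vdev redundant.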

    ZFS troubleshooting

      Table

      Purpose Command(s) As of Comments
      ZFS snapshot service not fully working

      Some symptoms:

      • The Gnome slider snapshot service won't stay enabled.
      • Some snapshot frequency services won't stay enabled.
      • The "Restore" button on Nautilus is grayed out.
      • The command "svcs -xv" results in a bunch of snapshot related dependencies not running due to gnome-slider service not running.
    • Solution: Boot to a new default boot environment. Steps:
      1. Create a new cloned boot environment from current
        • pfexec beadm create -a {new boot environment name}
      2. Perform a cold reboot.
      3. List boot environments
        • beadm list
      4. Delete all previous boot environments except for current
        • pfexec beadm destroy {previous boot environment name}
        • Note: This is reasonably safe, as beadm will complain if you try to destroy the current boot environment.
      2006.09

VNC server

    Table

    Purpose Command(s) As of Comments
    Enable VNC server

    Steps:

    1. Make sure SUNWxvnc is installed (it is by default).
    2. Disable all VNC services except the one we want (which will be enabled later).
      • svcs | grep vnc
      • For each one listed:
        • pfexec svcadm disable {service name} (for every VNC service other than xvnc-inetd)
    3. Edit the services init file
      • pfexec nano /etc/services
        • Verify existence of, or add the line:
          • vnc-server    5900/tcp    # Xvnc
    4. Edit the GDM config file
      • pfexec nano /etc/X11/gdm/custom.conf
        • Edit sections so that the following lines are set as follows
          • [security]
            DisallowTCP=false
            AllowRoot=true
            AllowRemoteRoot=true
          • [xdmcp]
            Enable=true
    5. Configure sessions to be persistent
      • pfexec svccfg -s xvnc-inetd
        • setprop inetd/wait = boolean: true
        • quit
    6. Configure so client window is bigger
      • pfexec svccfg -s xvnc-inetd
        • listprop inetd_start/exec
          • Copy the resulting string output (without the quotes).
        • setprop inetd_start/exec = astring: {string from above} -geometry {x-pixels}x{y-pixels}
          • You may need to move the -geometry argument to just prior to the -inetd parameter.
    7. Enable VNC service
      • pfexec svcadm enable xvnc-inetd
      • Make sure the service is running
        • svcs xvnc-inetd
    2006.09

    Further reference

Network file sharing

    Why Samba rather than CIFS/SMB

      As of version 2009.06 (snv_111b), the CIFS/SMB server built into the OpenSolaris kernel and managed via ZFS commands is not a tenable solution. Furthermore, as this was written near the eve of the upcoming 2010.03 release, it appears almost certain that the problems with the built-in server will not be fixed.

      The problem is not the CIFS/SMB services themselves, but rather their exclusive reliance on the ZFS extended ACL system to manage access; specifically:

      • The ZFS extended ACL system in OpenSolaris is badly broken, to wit:
        • While the model is conceptually elegant and more similar to the robust and flexible Windows NTFS ACL model, it is almost universally agreed in the OpenSolaris community that in its current form, it is too confusing, difficult to manage, and hard to tell what is in place.
        • Part of the problem has to do with a long and convoluted console syntax, and no available GUI.
        • Standard POSIX-like tools built into OpenSolaris, that manage "traditional" POSIX security attributes, completely wipe out any custom extended security attributes.
          • These tools are myriad and difficult to track down. Often times users are completely unaware that their carefully crafted extended ACLs have been wiped out thus rendering their carefully design security useless, by utilities running sometimes automatically that they had no idea would cause problems.
          • OpenSolaris contains multiple copies of many utilities; some that operate on POSIX-standard security attributes, others that operate on extended ACL attributes. It is often not clear which is which, and too easy to accidentally use the wrong one and wipe out your extended attributes.
          • Even when some users have been able to successfully implement and roll out an extended ACL model, mixed-environment clients have often rendered their designs useless, or too difficult to predict. The most common workaround is to write a bash script that sets the extended attributes, and run it on a schedule, typically once an hour, to repair any extended security attributes that may have been broken. This is simply not a reasonable solution in any security-sensitive context!
      • Mostly due to these issues, and to the best of my research, nobody is using the ZFS extended ACL system in a serious, functioning, live environment.
      • This author, at least, strongly advises avoiding the ZFS extended ACL model (and therefore the built-in CIFS/SMB server) until it is "fixed" and made usable.

      The only other viable alternative is samba.org's Samba service. The benefits include:

      • Samba works and is managed on OpenSolaris almost exactly as it is on any other platform.
      • Samba has a huge user base, much larger than that of OpenSolaris itself. Samba is running in countless mission-critical production environments globally, on numerous host operating systems.
      • Samba is very well supported and updated frequently.
      • samba.org is working on supporting SMB2, Microsoft's total rewrite of the SMB/CIFS protocol for Vista and Windows 7. It is highly unlikely that SMB2 could be supported by an effort as small as OpenSolaris.

    Samba reference

      Table

      Purpose Command(s) As of Comments
      Install Samba packages

      pkg install SUNWsmbs SUNWsmbskr

      2006.09
      Configure PAM

      Add the line "other password required pam_smb_passwd.so.1 nowarn" to /etc/pam.conf, if it is not there already. Steps:

      • Display file contents
        • clear; cat /etc/pam.conf; echo
      • Check whether the file contains the line, typically at the end of the file. If it does not, append it to the end of the file
        • echo "other password required pam_smb_passwd.so.1 nowarn" | pfexec tee -a /etc/pam.conf
      2006.09
      • These steps tie OpenSolaris user account passwords with Samba passwords. In theory, when an administrator or user changes his/her password, it will now propagate to Samba as well.
      • Existing user account passwords will not automatically be synced; the next command reference below accomplishes that.
      Set passwords of users who will be accessing the shares pfexec passwd {existing OpenSolaris user account name} 2006.09
      • This step, when done any time after the steps from the previous command reference, synchronizes OpenSolaris user account passwords with Samba. If the OpenSolaris user account is not already recognized by Samba, an entry will be automatically created.
      • Samba does not support remapping of user names and/or groups when running in Workgroup mode; therefore set passwords for actual OpenSolaris user accounts you have already created.
      Remove Windows network username and groupname equivalencies; if any exist, they could cause problems in Workgroup mode. pfexec idmap remove -a 2006.09 This step is not necessary if you have never set up Windows network username and/or groupname equivalencies.
      Start or restart Samba and related services

      Steps:

      • pfexec svcadm disable smb/server
      • pfexec svcadm disable idmap
      • pfexec svcadm disable network/samba
      • pfexec pkill nmbd
      • pfexec svcadm enable -r idmap
      • pfexec svcadm enable -r network/samba
      • pfexec /usr/sfw/sbin/nmbd

      Or as a single command entry:

      • pfexec svcadm disable smb/server; pfexec svcadm disable idmap; pfexec svcadm disable network/samba; pfexec pkill nmbd; pfexec svcadm enable -r idmap; pfexec svcadm enable -r network/samba; pfexec /usr/sfw/sbin/nmbd
      2006.09
      • These steps also disable the internal CIFS/SMB service ("smb/server") and do not re-enable it.
      • They include "failsafe" steps to make sure the nmbd service is running, which is required to enable network browsing.
      • You will likely see a warning message to the effect of: "svcadm: svc:/milestone/network depends on svc:/network/physical, which has multiple instances." This is expected and can be safely ignored.
      Optional: Enable Samba web administration tool (SWAT)

      Steps:

      • pfexec svcadm enable swat
      • Verify that service is running
        • svcs -a | grep swat
      • Update your login rights if necessary, to be able to use SWAT
        • rolemod -K type=normal root
      • Open web browser to Samba Web Administration Tool
        • If logged in at the local console:
          • http://localhost:901/
        • To access remotely:
          • http://{server name}:901/
      2006.09
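The "check whether the line is there, otherwise append it" step from the PAM row above can be made repeatable. This is only a minimal sketch: ensure_line is a hypothetical helper name (not part of OpenSolaris or Samba), and it assumes a POSIX-style grep that understands -q, -x and -F (on OpenSolaris that may mean the GNU or /usr/xpg4/bin version rather than /usr/bin/grep).

```shell
#!/bin/sh
# ensure_line FILE LINE  (hypothetical helper, for illustration only)
# Append LINE to FILE unless an identical line is already present.
ensure_line() {
    file=$1
    line=$2
    # -q: no output, -x: match the whole line, -F: fixed string, not a regex
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present"
    else
        printf '%s\n' "$line" >> "$file"
        echo "added"
    fi
}

# Example call (commented out because it modifies a system file):
# ensure_line /etc/pam.conf "other password required pam_smb_passwd.so.1 nowarn"
```

Because the append happens inside the function, the script as a whole has to run with enough privilege to write the target file; running it a second time should leave the file unchanged.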

    Samba troubleshooting

      Table

      Purpose Command(s) As of Comments
      General

      Steps:

      • Restart Samba and related services, according to reference from previous section.
      • Verify Samba is running
        • svcs | grep network/samba
      • View Samba log, look for errors
        • cat /var/samba/log/log.smbd | more
      • Verify nmbd is running (required for network browsing)
        • ps -ef | grep nmbd
      • View nmbd log, look for errors
        • cat /var/samba/log/log.nmbd | more
      • Verify that Samba is using the same smb.conf file that you think it is
        • smbd -b | grep smb.conf
      • Verify that smb.conf syntax is correct
        • testparm /etc/sfw/smb.conf
      2006.09
      Error message from Windows clients trying to connect: 'The specified network password is not correct.'

      Possible causes and fixes:

      • Windows-to-UNIX credential remapping is in place, and you are in Workgroup mode. Solution steps:
        1. Remove Windows-to-UNIX credential mapping
          • pfexec idmap remove -a
        2. Delete and re-add passwords then re-sync, for users trying to access shares. (System will prompt for password 4 times.)
          • pfexec smbpasswd -x {username}; pfexec smbpasswd -a {username}; pfexec passwd {username}
        3. Restart Samba.
      2006.09
      Can't ping host by name from client

      Possible causes and fixes:

      2006.09
      Error logged in /var/samba/log/log.smbd: 'getpeername failed. Error was Transport endpoint is not connected'

      This problem is typically caused by conflicting CIFS ports. Solution steps:

      1. Tell Samba to listen on port 139 rather than 445. Verify presence of or add to smb.conf
        • pfexec nano /etc/sfw/smb.conf
        • Look for or add, "smb ports = 139" (no quotes)
      2. Stop listening on port 445.
        • pfexec bash
        • close port 445: iptables -I INPUT 1 -p tcp --dport 445 -j DROP
        • exit
      3. Restart Samba.
      2006.09
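For the "smb ports = 139" step in the last row above, a quick way to confirm what smb.conf actually says, rather than eyeballing it in an editor, might look like the following. It is a sketch under assumptions: smb_ports_value and check_smb_ports are hypothetical helper names, and the sed and tail used are assumed to be POSIX-style (character classes and -n supported).

```shell
#!/bin/sh
# smb_ports_value FILE  (hypothetical helper)
# Print the value of the last active "smb ports" line in FILE (empty if none).
# Lines commented out with '#' or ';' do not match, because only leading
# whitespace is allowed before the parameter name.
smb_ports_value() {
    sed -n 's/^[[:space:]]*smb ports[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# check_smb_ports FILE  (hypothetical helper)
# Succeed only if the active setting is exactly 139.
check_smb_ports() {
    value=$(smb_ports_value "$1")
    if [ "$value" = "139" ]; then
        echo "ok: smb ports = 139"
    else
        echo "smb ports is '${value:-unset}', expected 139"
        return 1
    fi
}

# Example call, using the smb.conf path from the table above:
# check_smb_ports /etc/sfw/smb.conf
```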

Miscellaneous

    Table

    Purpose Command(s) As of Comments
    Force Gnome Terminal to always open a specific size, no matter where invoked from

    Steps:

    1. pfexec gedit /usr/share/vte/termcap/xterm
    2. Find something around line 10 that looks like this
      • :co#80:it#8:li#24:\

    3. Change the first and last number, for example to double the default X and Y sizes
      • :co#160:it#8:li#48:\

    4. Save and close the file.
    5. Close any existing Gnome terminals in your session.
    2006.09
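The manual termcap edit above could also be scripted. The following is a rough sketch, not a tested procedure for any particular release: set_term_size is a hypothetical helper name, and it rewrites every line in the file that carries a co# or li# field, so it assumes the file holds only the one xterm entry you mean to change.

```shell
#!/bin/sh
# set_term_size FILE COLS ROWS  (hypothetical helper)
# Rewrite the ":co#N:" (columns) and ":li#N:" (lines) fields in a termcap
# file, keeping a copy of the original as FILE.bak.
set_term_size() {
    file=$1
    cols=$2
    rows=$3
    tmp="$file.new.$$"
    cp -p "$file" "$file.bak" || return 1
    sed -e "s/:co#[0-9]*:/:co#${cols}:/" \
        -e "s/:li#[0-9]*:/:li#${rows}:/" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example call (needs write access to the file, e.g. run under pfexec):
# set_term_size /usr/share/vte/termcap/xterm 160 48
```

As with the manual steps, any open Gnome terminals need to be closed and reopened before the new size takes effect.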

OpenSolaris: Ready for Prime-Time?

Introduction

    I've been running an OpenSolaris server for about a year now, in a live "production" environment hosting approx 8 TB of home/family/personal data. (Mostly photo and video data, which double in size approx each year. In other words, each year, we generate at least as much photo and video data as all of our digital lives before that year--a fairly common phenomenon for households and businesses.) Currently I am running the most recent "stable" release: build snv_111b (version 2009.06).

    Primer: OpenSolaris is the first (and currently only) open-source version of true UNIX (as defined by the Open Group who owns the "UNIX" trademark). It was derived from Solaris (which is still closed-source and sold commercially), a branch of UNIX owned by the former Sun Microsystems (recently purchased by Oracle). As such, it is a "POSIX"-like operating system, or at least it is supposed to be. Solaris and OpenSolaris are primarily marketed as server operating systems, and secondarily as a desktop operating system.

    In a nutshell

    • The ZFS file system is amazing, and almost makes up for everything wrong with OpenSolaris.
    • OpenSolaris is absolutely not ready for production server use, much less "mission-critical" production.
    • OpenSolaris requires way too much manual intervention, and too many workarounds, hacks, and intensive research on-line to keep running. If you insist on trying it, just be prepared to pull a significant amount of hair out.
    • OpenSolaris as a hassle-free, productive desktop OS? Not gonna happen! Or notebook OS? Even crazier still. Those are actually advertised usage scenarios. No way Hose A.
    • I do realize and appreciate that OpenSolaris is free/libre, and that many people vastly more talented than I (most employed by former Sun Microsystems) put in countless hours developing it...so that I can download it for free and criticize it. This doesn't change the fact that it is just not ready for prime-time. Yet, or likely anytime soon. And that is my opinion.

    Upshot

    • OpenSolaris shows incredible promise and potential. Version 2010.03 is right around the corner. There will be some ZFS enhancements that make it even more attractive, but from what I understand so far, few if any of the gripes I have below will go away.
    • I would actually encourage using OpenSolaris, if only to build a large, supportive community that encourages more and faster development on it, so that it can start fulfilling its great potential as a killer server environment. Just not in a production environment! At least not yet...and not for me--I don't plan on running it again in the future.
      • The only thing I really like that is unique to OpenSolaris now is the ZFS file system, which is being ported to other operating systems as I write this, some even in semi-stable release (e.g. FreeBSD and Nexenta). Give me ZFS on a more classically POSIX-like OS any day.

Background

    Hardware specs

    • Intel 5400 motherboard chipset.
    • Two quad-core Xeon CPUs.
    • 16 GiB of RAM
    • Rackmount chassis with 20 hot-swappable SATA bays.
      • Most bays populated with 7200 RPM, 1 TB, SATA-II drives
        • Typically arranged in mirrored pairs, striped as a single pool.
        • Each mirror pair typically consists of:
          • One "server-grade" SATA-II drive.
          • One "consumer-grade" SATA-II drive from a different manufacturer but otherwise purchased at the same time.
          • This strategy seems to be a reasonable compromise between the slightly longer average life of so-called "server-grade" drives, and the cheaper price of consumer grade.
      • Several "permanent" SSD drives mounted inside for things such as ZFS L2ARC and ZIL.
    • 26 SATA ports
      • On-board ICH (6 ports)
      • SiI 3114 adapter (4 ports)
      • Two LSI SAS HBAs (8 ports each)
    • Dual gigabit NICs
      • One Intel e1000 for max OpenSolaris compatibility
      • One NetGear, just so the second one isn't the same brand or chipset as the first. (Which makes sense when you understand the maze of network administration tools and how much more complicated the same make and/or chipset would make things.)
    • 1,400 watt power supply (about 250 w continuous draw when idle).
    • Humdrum NVidia video card. (Do not attempt OpenSolaris with anything else or you will be sorry, even though you probably won't care for accelerated graphics).
    • Fair-sized UPS capable of keeping it alive for about 20 minutes.
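The "mirrored pairs, striped as a single pool" layout in the drive list above corresponds to a zpool create command with one mirror group per pair. As a hedged illustration only (mirror_pool_cmd is a hypothetical helper, and the pool and device names are placeholders, not my actual devices), this prints such a command without running it:

```shell
#!/bin/sh
# mirror_pool_cmd POOL DISK...  (hypothetical helper)
# Print a "zpool create" command that pairs the given disks, in order,
# into two-way mirrors. ZFS stripes across the resulting mirror vdevs.
mirror_pool_cmd() {
    pool=$1
    shift
    if [ $# -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "need an even, non-zero number of disks" >&2
        return 1
    fi
    cmd="zpool create $pool"
    while [ $# -gt 0 ]; do
        cmd="$cmd mirror $1 $2"
        shift 2
    done
    echo "$cmd"
}

# Placeholder device names; list each pair as "server-grade, consumer-grade".
mirror_pool_cmd tank c1t0d0 c2t0d0 c1t1d0 c2t1d0
# prints: zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0
```

Printing the command first makes it easy to double-check which two drives end up in each mirror before anything is created.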

    Server and administrative use case

    Even though it is "just" a home-based file server, nevertheless it must support occasionally high-bandwidth multiple access scenarios (high IOPS and throughput). Furthermore, the requirement of uptime is no less important to me than an enterprise server--not counting in dollar terms. (And even though it has yet to achieve anywhere near the number of nines required.)

    Let me also clarify my own role as something of a quasi-"expert" OpenSolaris server administrator:

    • I don't want to be a server administrator. Not a Windows Server admin, not Linux, not UNIX. I hate administering servers. I just want the damn things to...well, serve.
    • It's true that before having a family, I did have the luxury of being an "OS fiddler/tweaker". Those days are long over.
  • Because I'm just a geek in general, I tend to do things that demand much from my computer systems--more than they can typically reliably deliver off-the-shelf.
    • This is why I always build my own computers (save for notebooks)...at least, before I had children.
  • The most frustrating--if not outright insane--part about the current state of computer technology, for me, is storage reliability.
    • Every year at least three hard drives bite the dust on me (usually more--once a record 12 drives in a single year).
      • This is due to the simple math of hard drive lifespan, multiplied by the number I use every day or often enough.
    • Recovering from drive failure means either restoring from backup, or more likely, rebuilding the OS from scratch if the drive was a system drive.
      • I always keep data on separate drives, and run mirrors, parity RAID, or some kind of redundancy. This makes life a little easier; just throw the bad hard drive away, insert a new one, and let the array rebuild.
    • Lately my data storage needs are growing so rapidly, that a regular ol' Windows Server and/or Linux box with RAID-5 just isn't cutting the mustard.
    • After much research, I jumped into OpenSolaris for its amazing ZFS file system. I don't regret the move for discovering the magic of ZFS, but I very much do regret it overall.
  • The knowledge I gained on OpenSolaris doesn't even apply to any other operating system; not Linux, BSD, or other UNIX or UNIX-like derivatives. It is just sunken brain cells. Had I known the learning curve ahead of me then, I would have figured something else out--even just cutting way back on my production of digital media so as not to need ZFS in the first place!

The Good

    ZFS pool redundancy magic

      RAID-Z
      • RAID-Z does not suffer from the potential "Write Hole" data corruption problem that most other parity-based RAID implementations (e.g. standard RAID-5) may experience under certain conditions (such as an unexpected controller, service, or system failure).
        • I experienced this problem more than once, about a decade ago on some early versions of Promise's cheap consumer-grade "fake RAID" controllers.
      • ZFS offers single-parity (e.g. RAID-5), double-parity (aka "RAID-6"), and triple-parity options for RAID-Z.
        • Single-parity RAID is essentially dead (for modern production use), due to a risk of data loss during a post-failure rebuild period that is recently approaching "certainty".
          • The problem is that hard drives are getting so large (doubling in capacity roughly every year) that rebuilding an array after a single drive failure takes long enough for the odds of another drive in the array failing during the rebuild to approach mathematical certainty. With single-parity, a second drive failure before recovering from the first would lose the entire volume and the data in it.
            • I have unfortunately experienced this problem as well. One drive on a Windows Server RAID-5 volume died. While the volume was rebuilding on a replacement drive, a second original drive died, which resulted in total volume loss. Fortunately I never relied on storage redundancy as a substitute for backups.
          • Double-parity RAID buys a few more years in the useful lifetime of parity-based RAID, and triple-parity RAID buys a few more years on top of that.
            • Eventually however, as long as we use spinning platters and they double in capacity each year, no amount of redundancy will solve the exponential problem.
            • The only robust long-term solution that scales much more reasonably with exponentially increasing drive capacity currently involves (and probably always will involve) simple mirroring at its core. (Whether or not other solutions for data protection and/or performance are layered on top or underneath, such as striping.)
      Mirroring
      • A pool (aka "volume" in other OS parlance) is always striped across available virtual devices. Considering that a "virtual device" can be a single drive, a file, or an entire array, one begins to see how flexible a ZFS pool can be.
      • Mirroring is potentially more expensive than parity-based RAID.
        • The storage cost for parity RAID can be expressed: [$ per usable TB] = ([$ per drive] / [TB per drive]) / (1 - ([parity count] / [number of drives])).
          • With a reasonable configuration, this is (variably) cheaper than mirroring.
        • For simple mirroring, the cost formula is equally simple: [$ per usable TB] = ([$ per drive] / [TB per drive]) * [drives per mirror].
          • In other words, for the common case of two-drive mirrors, your cost per TB doubles.
          • However, if you consider the fact that the price/capacity ratio for drive storage is cut in half every year, all this really means is that you will always be one year behind the price/capacity curve. That doesn't seem so bad in context, for the benefit of mirrored data.
      • Mirroring can--and, depending on the use case, should--be used in conjunction with other solutions such as striping.
        • E.g., RAID 1+0 or just "10", which means data striped across multiple mirrors.
          • Note: RAID 10 is not the same thing as RAID 01. RAID 10 is a "stripe of mirrors" and is much more tolerant of multiple drive failures. RAID 01 is a "mirror of stripes", is much less tolerant of multiple failures (in fact less reliable than just RAID 1 mirroring), and is generally not advised for any use case.
        • A mirrored set does not have to come in just pairs. Depending on the value of your data and/or your distrust of hard drives, you may wish to have three-way or even four-way mirrored sets (e.g. all four drives contain the exact same data and any three drives could fail at once without losing the volume or its data).
      • Mirrored data is computationally cheap to manage. There are no XOR parity calculations to perform, so no processor offloading is necessary, and therefore mirroring done in software is as fast (in real-world use) as hardware-based mirroring--and vastly more portable.
      • Redundancy achieved through mirrors (especially stripes of mirrors), all else being equal, generally performs significantly better than parity-based RAID, whether or not hardware offloading is performed for parity calculation. This is one reason why parity-based RAID is often considered unsuitable for database use.
      • There is no significant write performance penalty with mirrors, as there is with parity-based RAID.
        • It should be noted however, that ZFS eliminates much of the penalty for parity RAID writes, with its "Copy-On-Write" feature. In this scheme, parity for existing data does not need to be read and recalculated during a write, as it does with more traditional forms of parity RAID.
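To put numbers on the parity-versus-mirror cost comparison above, here is a small worked sketch. It simply divides total spend by usable capacity; the function names are hypothetical, the drive count, price and size are made-up inputs, and it assumes an awk that accepts -v (nawk or gawk on OpenSolaris, not the old /usr/bin/awk).

```shell
#!/bin/sh
# parity_cost_per_tb DRIVES PRICE TB_PER_DRIVE PARITY  (hypothetical helper)
# Total spend divided by usable capacity for a parity array,
# where PARITY drives' worth of space is lost to redundancy.
parity_cost_per_tb() {
    awk -v n="$1" -v p="$2" -v c="$3" -v k="$4" \
        'BEGIN { printf "%.2f\n", (n * p) / ((n - k) * c) }'
}

# mirror_cost_per_tb DRIVES PRICE TB_PER_DRIVE WAYS  (hypothetical helper)
# Same calculation for WAYS-way mirrors: usable capacity is DRIVES / WAYS drives.
mirror_cost_per_tb() {
    awk -v n="$1" -v p="$2" -v c="$3" -v m="$4" \
        'BEGIN { printf "%.2f\n", (n * p) / ((n / m) * c) }'
}

# Made-up example: six 1 TB drives at $100 each.
parity_cost_per_tb 6 100 1 2   # double parity: prints 150.00
mirror_cost_per_tb 6 100 1 2   # two-way mirrors: prints 200.00
```

With these illustrative inputs the two-way mirror costs twice the raw $ per TB, while double parity across six drives costs one and a half times as much.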

    Other ZFS benefits

    • Native support for data snapshots, which allow administrators or users to roll back to an entire system state, or just individual user data files. (Something like Mac OS X's "Time Machine" only more robust and more natural for the file system to provide.)
    • All ZFS data, regardless of whether underlying redundancy is present or not, is checksummed on-disk. This adds extra protection against not only parity RAID data corruption, but other forms of data corruption. For redundant pools, it also allows for automatic self-healing of corrupted data.
    • The ZFS file system is comparatively fast, especially with built-in support for SSD drives for second-level read caching and write logging, which greatly increases the IOPS figure (generally meaning performing well under multiple simultaneous loads).
    • A ZFS pool may be non-destructively grown, fairly simply by:
      • Adding virtual devices to the pool. (A virtual device can be a RAID-Z or mirrored array.)
      • Swapping existing drives of a virtual device (e.g. array) out with larger ones (then letting the overlaying array rebuild in between each one). When all drives of an array are swapped out with larger ones, the entire virtual device (and the pool that contains it) will expand into the additional space.
    • The administrative model, even though it is terminal-based only, is fairly easy and straightforward.
      • That is, once you get drives up, running, and recognized, prior to managing them with the ZFS model.
    • Full support for hot-swapping drives.
    • Native support for transparent on-the-fly compression.
    • The pending next release (2010.03) will also support:
      • Native, on-the-fly block level data deduplication
      • Native, on-the-fly encryption

    You may notice that everything I love about OpenSolaris has to do with ZFS. In fact, there is almost nothing I don't love about ZFS, save for the extended ACL system (which is optional).

The Bad

  • Supported on precious little hardware.
    • It was an absolute miracle that some of my hardware happened to be supported, and then only because I happen to consciously try to use the most popular and widely supported hardware when building systems.
    • Even then, I had to swap out some hardware for stuff that it would actually work on (e.g. storage controllers, NICs, video card).
  • Existing knowledge and skill working with Linux--or pretty much any other UNIX-like OS--is more or less useless on OpenSolaris.
    • Although occasionally innovative and welcomed improvements, OpenSolaris relies on so many terminal utilities that exist only on OpenSolaris (some not even on its commercial Solaris sister), that you have essentially no advantage by being a whiz with Linux, BSD, or any other UNIX-like variant.
    • You might as well hop over straight from Windows (or nothing) and have just as steep a learning curve.
    • I have no idea how OpenSolaris manages to still be considered "UNIX" by The Open Group, owner of the UNIX trademark and standard.
  • The universe of additional software that can be installed is practically non-existent (compared to any other server OS, including Windows or Linux).
  • Regular updates for bugs? Forget it! Rarely if ever happen. If OpenSolaris doesn't work for you now, you should not realistically anticipate an inter-release patch to fix it.
  • Community support is threadbare. And even then, support comes from mostly programmers and hardcore professional system administrators--so the help often assumes significant foreknowledge of OpenSolaris particulars. The fact is, just not that many people use OpenSolaris.
  • Not to contribute to the already considerable (and mostly unfounded) FUD surrounding the purchase of Sun by Oracle, but there really is yet no roadmap or bankable statements from Oracle on the future of OpenSolaris. (But in fairness, OpenSolaris never has had a publicly communicated roadmap...which was also a major drawback.)

The Ugly

  • Don't even think about letting the system power off accidentally! If it does shut down accidentally, the odds are unconscionably high that it won't fully come back up again on its own without significant administrative intervention. The various problems I've experienced after power outages were legion--too many to itemize. And most of them were the result of old, known, documented, unfixed bugs. To save yourself untold headache:
    • Get the biggest UPS you can't afford, dedicated solely to your OpenSolaris server.
    • Consider a backup generator to the UPS.
    • Put duct tape over all power buttons.
    • Make sure nothing else is stored in the same room--no other servers, not even a broom. The only time you want to be in the same locked room as the server, is when you intend to spend hours there.
    • Make sure the room the server is in can only be accessed by OpenSolaris admins.
  • Which reminds me: I can't reboot the system. It has to actually power off completely, otherwise it gets stuck in an endless self-reboot loop and never fully comes up. This makes remote management when reboots are required all but impossible (and forget about WoL support!). The problem didn't exist on version 2008.11, and I'd wager it won't on 2010.03 either. But it does exist on this hardware, across multiple attempts at installing the current 2009.06 release. It is a known bug yet to be fixed with an inter-release patch.
  • Managing network interfaces is incredibly clumsy and error-prone. I never could get Jumbo Frames and/or IPv6 working properly.
  • Bootup errors when mounting directories: If you suddenly can't log in, you may discover that there are no user home directories. (In which case you would be placed in a single-user Maintenance Mode anyway.)
    • It's because, sometimes, inexplicably, things mapped to the "/export" directory (possibly other directories as well) won't mount.
    • It is a known bug and involves the kernel trying to mount things in the wrong order (and then finding that the next directory to mount isn't empty as it could expect if mounts were executed in the correct order).
    • The manual fixes are an incredible headache, and not very well known as many people apparently just give up, reinstall, and/or move on to some other OS.
    • There is only one paragraph on the entire World Wide Web that documents how to fix it. (Well, now two with my link below.) And I only found that after poring through page after page of documentation, bugfix databases, forum posts, and chat transcripts. And it's a known bug!
  • Error messages are often cryptic, if not misleading or outright incorrect.
  • The built-in CIFS/SMB service, although easy enough to enable by itself, is tied very closely to the new ZFS ACL security model. While this new extended security model is a welcome move towards something like the more intuitive and powerful Microsoft NTFS ACL model, in practice it is a nightmare to configure and troubleshoot. And unlike the similar extended model on Linux, there are no GUIs available to help. And in this case, since the command-line model is so complex, a GUI is all but essential just to understand what security is in place at the moment.
  • Getting the alternative SMB service running (samba.org's Samba service), is rife with issues, bugs, and workarounds on OpenSolaris. (But once it works, it is arguably a better Windows file sharing service than Microsoft offers, though the same is true of Samba running on any operating system. And better yet, with Samba you don't have to mess with ZFS ACLs, instead using the simpler [albeit less powerful] legacy *nix ACL model.)
  • Getting a VNC server running requires way too many workarounds and tweaking of obscure features.
  • The ZFS Snapshot service can be very finicky and is easily broken.
  • Getting OpenSolaris to boot from a mirrored root volume is an exercise in sheer frustration. It requires all kinds of hacks and workarounds and frankly if I had to do it again, I wouldn't!

Conclusion and next steps

    The bottom line, for me at least, is that I have a file server now, finally, after a year of frustration and becoming something of an expert OpenSolaris server admin against my will. And inasmuch as it is in my control, it will never, ever be shut down or rebooted again!

    • I will still have to double available storage every year, without shutting the server down or rebooting. Fortunately I can do this without necessarily even logging in, at the console or remotely.
    • Assuming I can string enough UPSes together (because I refuse to turn it off which would be required to exchange for just one big UPS), it will survive future and too-frequent northern California storm blackouts. 12 hours seems to be an extreme upper bound without power here, and I don't think that is too unreasonable for just one server and a bank of batteries!
    • I have resolved to not touch the server at all, and just let it operate for as long as it will. The only real hardware failures I experience in that kind of scenario (e.g. in a secure, clean, and stable environment) are drives dying, which is fine; I just replace those while everything stays running and everything adjusts automagically and non-destructively.
    • Upgrading to version 2010.03 (out in mere weeks) is completely out of the question! It took me months to "recover" from upgrading from 2008.11 to 2009.06, I've just recently gotten things stabilized, so there is no way I would voluntarily subject myself to that kind of torture again.
    • It seems reasonable that it should be able to run continuously for the next five years. I've easily achieved 1+ years out of Windows Servers on cheap consumer hardware before, sitting out in the open in high-traffic rooms, and the only reasons for eventual downtime were to replace dead drives. Take dead drives out of the equation (with hot-swapping), and consider that even OpenSolaris should in theory be more solid than Windows Server, and that the server is perpetually clean and safe; and five years does not seem unrealistic to me.

    In short, I only plan on running OpenSolaris for as long as my current version continues running without intervention other than hot-swapping dead drives. Once it gives up the ghost, surely by then the ZFS file system will be supported more fully on other operating systems (it is already brewing on BSD, Nexenta [an OpenSolaris/Debian hybrid], and Linux [for now only in userland space until Oracle changes the license model]). Or, there will be another file system as robust finally out of the gate (e.g. Oracle's own btrfs). Either way, I do not anticipate that I will ever find myself running OpenSolaris (of any version) again, and I find that unfortunate--even sad (but not unfair)--to say.

This work is licensed by James R. "Jim" Collier in 2010 under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.