"Crichton: What's going on?
 D'Argo: I think they've got an override of our override of their override."

Farscape Series 4, Ep. 21
"We're So Screwed, Part 3: La Bomba"


+==============================================+
| YAP2LC - Yet Another Passwd 2 LDIF Converter |
+==============================================+

@2003 v.0.5.9 rel. 23.12.2003
Radu Negut -- radun@romsys.ro

This software is entirely licensed under the terms of the GNU General Public License.


DISCLAIMER!
-----------
As much as I'd wish it was, this program isn't perfect quality, bug-free code.
While I strive to make it so, there may be many things that have escaped my attention.
Please remember this program works with user accounts, their passwords and other
sensitive data. USE BACKUPS! On systems where we can process arbitrary passwd and
shadow files, make it so! Copy the originals somewhere and work ONLY on those COPIES.
Under no circumstances will I be held liable for any damage that may occur.


ACT NOW!
--------
Immediately go to www.savefarscape.com and help save one of the best SciFi TV shows of
all time. If you have no idea what the show is all about, either try watching one of
the reruns on SciFi Channel or try grabbing one of the DVDs. Ideally start watching it
right from the first ep. Believe me, it's not cool, it's FROSTY!


Introduction
------------
So you've decided to move to LDAP authentication, even storing most application configuration
inside the directory, or, worse yet, you HAVE TO move on to LDAP and you have only a couple
of days available. If you've got only about two dozen users, that's ok; you spend an
afternoon together with your favorite coffee machine, and you enter all data by hand. But
what if you've got, say, 30,000 (thirty thousand) user accounts? In order to save people from
visiting the intensive care unit at the nearest hospital, I've written this little program.
As all of this is (mostly) text manipulation, it's no wonder that all other similar migration
tools I've seen on the net are written in Perl or other scripting languages and are fairly slow.
You may say to yourself: "So what? I only need to run this once; I don't mind if it's not fast."
Right... The problem comes from misshapen account data (missing real names) or dumb problems
such as importing locked out or system accounts. Not to mention duplicate real name entries.
Unfortunately you only get a feel for this when you try importing the LDIF file, when it's
already too late. You usually end up running and editing scripts and account data for a couple
of days until you're completely satisfied. That's why having a fast C program do the dirty work
is better. Actually RFC 2307 pretty much tells the entire story, including implementation
suggestions.


Changes
-------
Mail alias processing has finally made its way in, with yap2lc supporting 2 database back-ends.
As a result, configuration and compile time switches were added, to allow the user to select
the appropriate back-end. Check the alias processing section below for all the gory details.
Also, since mail aliases can be really long, several buffers got bumped up a notch. If you
still run into trouble, you can always tweak "yap2lc.h" by hand.


Installation
------------
Unpack the tarball; depending on your system, one of the following might do the trick:
    tar -zxvf yap2lc*
or
    gzcat yap2lc* | tar -xvf -
or
    gunzip yap2lc* && tar xvf yap2lc*
Next switch to the newly created directory and run:
    ./configure --help
On older csh versions you may have to run
    sh ./configure --help
This will show a set of compile time options you can build yap2lc with. Depending on whether
you want alias processing or not you can run ./configure by itself or with "--with-..." switches.
Most likely you will want to run:
    ./configure --with-bdb-inc=/usr/local/BerkeleyDB.4.2/include \
      --with-bdb-lib=/usr/local/BerkeleyDB.4.2/lib --with-bdb
or whatever your path to Berkeley DB looks like.

Note: Yap2lc REQUIRES at least BerkeleyDB v4.X to process DB files. Earlier versions, such as the
one usually shipped with Linux or FreeBSD for sendmail to link against dynamically, are NOT
currently supported. You can get the latest version of BerkeleyDB from www.sleepycat.com .

Check the output for any missing stuff. If you have unusually long real names you may wish to
increase the fixed buffer sizes in "yap2lc.h". What you should be looking out for is MAX_LDIF_SIZ
out of which MAX_VAR_ALLOC gets subtracted at run time; in the improbable situation that you
get truncated entries, try increasing the values. Don't be worried; if your system supports
over 1024 characters in the real name field and you actually made use of this "feature", then
you can worry. Next, compile the program by running:
    make
If it doesn't compile, let me know about it (radun@romsys.ro). Depending on library locations, you
may wish to tweak LD_LIBRARY_PATH. Personally I've tested it on Linux/i386 (RedHat-7.2, 8.0, 9.0;
Slackware-8.0), Solaris/sparc (8, 9) and FreeBSD (4.3).

Stress tests performed with 60,000 (sixty thousand) users on an Athlon XP 2000+ with 512 MB RAM
gave the following results: with duplicate entry removal: ~500 millisecs; without duplicate entry
removal: 2 secs. On an UltraSparc IIi @ 400 MHz with 384 MB RAM I got the following, using the same
number of entries: with duplicate entry removal: ~300 millisecs; without duplicate entry removal:
15 secs. The lack of duplicate entry removal slowed things down because the files were created by
copying the same accounts over and over again and so the program had to write a lot more to the
outfile. Also, all testing was done without any mail alias processing, since in that case processing
time depends a lot on the alias nesting depth.


Compatibility and limitations
-----------------------------
The first major issue is on BSD-style systems, where the absence of fgetpwent() makes working
on alternate files/copies impossible, meaning that on those systems we have to work on live
passwd/master.passwd. USE BACKUPS!
The second major problem is entry synchronization. Yap2lc processes the passwd and shadow
style formatted files in parallel and reads every entry successively. If the files aren't
sync'ed, users will end up with passwords belonging to other users inside the directory.
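
A quick pre-flight check can catch unsynchronized files before yap2lc ever sees them. The
Python sketch below is not part of yap2lc; it only assumes the usual colon-separated layout
with the login name as the first field, and the name find_desync is made up for illustration.
It walks both files in lockstep and reports the first position where the login names differ.

```python
from itertools import zip_longest


def find_desync(passwd_lines, shadow_lines):
    """Return (position, passwd_name, shadow_name) for the first mismatch, or None.

    Both arguments are iterables of text lines in passwd/shadow format.
    Blank lines are skipped; an entry missing on one side shows up as None.
    """
    def names(lines):
        for line in lines:
            line = line.strip()
            if line:
                # The login name is everything before the first colon.
                yield line.split(":", 1)[0]

    pairs = zip_longest(names(passwd_lines), names(shadow_lines))
    for position, (p_name, s_name) in enumerate(pairs, start=1):
        if p_name != s_name:
            return position, p_name, s_name
    return None


if __name__ == "__main__":
    # Made-up sample data; in practice pass open("passwd") and open("shadow").
    passwd = ["root:x:0:0:root:/root:/bin/sh",
              "alice:x:1000:100:Alice:/home/alice:/bin/sh"]
    shadow = ["root:*:12000:0:99999:7:::",
              "bob:*:12000:0:99999:7:::"]
    print(find_desync(passwd, shadow))
```

A result of None means both files list the same login names in the same order; anything else
points at the first entry worth looking at, on the COPIES of course.
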
Yap2lc has quite a few filters and sanitizing routines to automatically remove unwanted
or unnecessary accounts as well as misshapen stuff that the LDAP server would not accept.
In other words, hand editing of these sensitive files is unnecessary at best and dangerous at worst.
I have no experience whatsoever with locale stuff; no idea if and how it finds its way into
user authentication. If your users' real names already made it into passwd, it means they're
probably already UTF-8 encoded (see RFC 2849 for details).
You should also be aware that the filters yap2lc applies are chain linked and cumulative.
This means that each and every entry successively passes through each filter and gets committed
to the LDIF file only if it passes all of them. Therefore adding up all the filter counters may
result in a larger number than existing user entries. The log file will have individual entries
for each and every problem that a particular account presents.
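
To see why the counters can add up to more than the number of accounts, here is a small Python
sketch of a cumulative filter chain. It is only an illustration of the behavior described above,
not yap2lc's actual code; the filter names, the sample accounts and the function
run_filter_chain are all hypothetical.

```python
def run_filter_chain(entries, filters):
    """Apply every filter to every entry; commit only entries no filter rejects.

    `filters` is a list of (name, rejects) pairs, where rejects(entry) returns
    True when the entry should be dropped. Returns (committed, counters).
    """
    counters = {name: 0 for name, _ in filters}
    committed = []
    for entry in entries:
        rejected = False
        for name, rejects in filters:
            if rejects(entry):
                # Counted even if an earlier filter already rejected the entry.
                counters[name] += 1
                rejected = True
        if not rejected:
            committed.append(entry)
    return committed, counters


# Hypothetical filters and accounts, for illustration only.
FILTERS = [
    ("no_password", lambda e: e["pwd"] == ""),
    ("uid_out_of_range", lambda e: not 500 <= e["uid"] <= 60000),
]
ACCOUNTS = [
    {"uname": "daemon", "uid": 1, "pwd": ""},
    {"uname": "alice", "uid": 1000, "pwd": "x"},
    {"uname": "ghost", "uid": 2, "pwd": ""},
]

if __name__ == "__main__":
    committed, counters = run_filter_chain(ACCOUNTS, FILTERS)
    print([e["uname"] for e in committed], counters)
```

With these three sample accounts, two of them trip both filters, so the counters sum to four
even though only three entries were read.
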
Yap2lc cannot (at this moment) process CDB databases; this means that if you're running qmail
with CDB you must convert the text version of your aliases to a supported database type first.
This is quite straightforward if you've got sendmail/postfix on another machine. Just be careful
about where you run 'newaliases'/'yap2lc' since databases are sensitive to machine endianness. There
is another issue regarding the version of the DB API, which is discussed below.


Mail alias processing
---------------------
As of version 0.5.9, yap2lc can automatically process and insert any mail aliases a user has on
the system. Yap2lc currently supports DB (Berkeley) and NDBM (standalone, not through the DB API).
Unfortunately, since nothing is ever perfect, there are some considerations. First of all, yap2lc 
only accesses the binary database that the MTA builds from the plain text file. Usually this file 
is called "aliases.db" and resides in either /etc or /etc/mail.
Therefore, any new aliases added without running "newaliases" or an equivalent command will not
be visible to yap2lc (just as they won't take effect in mail relaying either).
Second, there is database compatibility. Most out-of-the-box sendmail installations (for
instance on Solaris 9) come statically linked to the old Berkeley DB version, whose databases
can't be processed by the new API unless they are upgraded. Yap2lc automatically detects such
old type databases and is capable of upgrading them by itself. However, there are a couple of gotchas:
first, once the database gets upgraded, the old MTA won't be able to read it (use BACKUPS!); second,
databases aren't portable across different byte order architectures, so upgrade the database ONLY
on the same type of architecture that it was created on (big endian ONLY for big endian and little
endian ONLY for little endian). Failure to observe these issues results in corrupted databases.
For semi-old versions of the DB API (which used functions without transaction IDs) yap2lc has
built-in auto-detection and uses the old style prototypes automatically.
Regarding nesting levels, yap2lc can process aliases of unlimited nesting depth, provided there is
enough memory to do that. However, as you might imagine, deeply nested aliases will take more time
to process.
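
For readers wondering what "nested" means in practice, the following Python sketch collects
every alias that reaches a given account, directly or through other aliases. It is not how
yap2lc does it internally (yap2lc reads the binary database, and its algorithm is not described
here); the dictionary layout and the name aliases_for are assumptions made for the example.

```python
def aliases_for(user, alias_map):
    """Return a sorted list of all aliases that resolve to `user`.

    `alias_map` maps an alias name to the list of its targets, as in the
    text form of an aliases file ("postmaster: webmaster").
    """
    # Invert the map once: target -> set of aliases mentioning that target.
    pointing_at = {}
    for alias, targets in alias_map.items():
        for target in targets:
            pointing_at.setdefault(target, set()).add(alias)

    found = set()
    pending = [user]
    while pending:
        name = pending.pop()
        for alias in pointing_at.get(name, ()):
            # The `found` check also keeps looping aliases from running forever.
            if alias not in found and alias != user:
                found.add(alias)
                pending.append(alias)
    return sorted(found)


# Made-up alias data, nested three levels deep for alice.
ALIASES = {
    "webmaster": ["alice"],
    "postmaster": ["webmaster"],
    "abuse": ["postmaster", "bob"],
    "staff": ["bob"],
}

if __name__ == "__main__":
    print(aliases_for("alice", ALIASES))
    print(aliases_for("bob", ALIASES))
```

The worklist grows with the nesting depth, which is where the memory and time remarks above
come from.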


Configuring and running YAP2LC
------------------------------
After building yap2lc, move the binary to some secluded directory with plenty of free space
(how much depends on the size of your user database and LDIF template). Have a look at the sample
config and LDIF template files in the "samples" directory. Copy the sample "yap2lc.conf", your
/etc/passwd and /etc/shadow (only for non-BSD systems) into that directory and then optionally
either copy and edit the supplied "ldif.template" or create your own. Before you fire up the
program, let's go over the configuration options for a sec:
NOTE: when I talk about "removing" and "filtering out" I mean: not allowing those accounts
      to become LDIF entries; I am not deleting anything from the original files. Running
      yap2lc again, however, overwrites its own output and log files.

Command line switches (if none are present only the conf file is used):
    -p operate in "pipe" mode. This will make yap2lc read the LDIF template from stdin and write
       its output to stdout. If used without any other switch, no filtering is performed.
    -c read the yap2lc.conf file from the current directory and apply all filter definitions while
       still operating in pipe mode. passwd and shadow files are read from the current directory.
       Logging, outfile and ldifTemplate directives within the conf file are always ignored.
    -d instead of reading passwd and shadow files from current directory, read them from /etc
    -l write log to ./yap2lclog
    -h print a lovely help screen
NOTE: you can combine several cli switches together, but the -p switch is mandatory (apart from
      -h, of course :) )

Config file directives:
- in the FILEPATHS section you should specify the filenames of everything involved in the
  process. If you just copied and edited the files, the defaults should work out of the box.
- the MAIL ALIASES section helps you tweak the way yap2lc should handle the alias database:
      - 'processAliases': this switch turns on alias processing and lets yap2lc assume you
         want it to automatically fill in ALL mail aliases the account has on the system. This
	 means that once we stumble upon the $MLS token you will get as many alias entries as the
	 particular account has, not just one line.
      - 'aliasesDBType': select the type of back-end you have on your system; currently only
         Berkeley DB (bdb) and ndbm are supported. All MTAs know how to use BDB, but some have
         moved on to newer stuff, which will be supported later on. NDBM is here only for historical
         databases and no nesting processing is currently supported for this database type.
      - 'aliaseLdifEntry': what kind of tag you want mail alias addresses to have prepended. The
         supplied example is taken from the RFC. We need this since yap2lc has to build alias
	 addresses by itself, not by relying on template parsing.
- the FILTERING section allows you to specify how exactly you want yap2lc to select stuff out
  of the mess:
      - 'sanitize': this is the filtering master switch; 'no' means no filtering whatsoever;
      - 'cryptScheme': this is what gets prepended in front of '$PWD' and specifies how the
         password should be "interpreted". The default, "{crypt}", should work just fine, but
	 this is now user specifiable for better control.
      - 'removeDuplicates': whether entries with duplicate real names should be dropped; this is
         a pretty useful switch since duplicate real names are not a problem in Unix authentication
         (only user names have to be unique), yet in LDAP they produce duplicate CN entries, which
         are not allowed, so importing them fails.
      - 'allowNoPwd': whether passwordless entries should be committed; LDAP doesn't mind if entries
         are unpassworded, but you should; this refers only to entries that either have a zero
         length password or have "NP" in their password entry, NOT locked out accounts.
      - 'allowLockedOut': whether locked out accounts should be committed or not; this refers to
         accounts having either "!!" or "*LK*" as their password; this is up to you; usually
         system daemons have this, but it's easier to filter them out via the UID range. If you
         believe those accounts might after all be needed inside the directory, set this to 'yes'.
      - 'idAccountStatus': if you want yap2lc to automatically commit locked out accounts as inactive
         answer 'yes' to this one. In order for this to work you will also have to say 'yes' to
	 'allowLockedOut' and have "accountStatus: $YAS" in your LDIF template.
      - 'minUID' and 'maxUID': here you can specify the UID range that you want yap2lc to
         process. This usually speeds things up since: a) you most likely do not need the accounts
         outside that range and b) this separates system accounts from user accounts in a snap
         (e.g. under Linux user accounts start off at uid 500, under Solaris at uid 100, etc.).
      - 'excludeGID': if you wish to exclude certain user groups from being processed just put their
         GIDs here. You can specify several GIDs by separating them with commas, but do not insert
	 any spaces between the numbers and the commas. You can enter up to 10 GIDs, which I believe
	 is quite reasonable.
      - 'removeByRegexp': whether you want to remove accounts based on a regular expression
      - 'filterRegexp': the regular expression you wish to use
      - 'applyRegexpTo': this specifies which entry field you want the regex matching to be
         performed on. Currently this switch accepts only two values: 'uname' and 'rname', the
         user login and real name; you can only specify one field.
- the LOGGING section allows you to selectively log the entries that got dropped by the filters. The
  log is in the format of "uname:uid:reason_why_dropped". Logging allows you to easily identify and
  correct misshapen entries.
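
To make the password and GID rules above concrete, here is a Python sketch of the checks as
the text describes them: 'allowNoPwd' concerns empty or "NP" password fields, 'allowLockedOut'
concerns "!!" or "*LK*", and 'excludeGID' takes up to 10 comma-separated GIDs without spaces.
The function names and return labels are invented for this example and say nothing about how
yap2lc implements the checks.

```python
MAX_EXCLUDED_GIDS = 10  # limit stated for 'excludeGID' above


def classify_password(field):
    """Label a password field the way the filter descriptions above do."""
    if field == "" or field == "NP":
        return "no_password"   # governed by 'allowNoPwd'
    if field in ("!!", "*LK*"):
        return "locked_out"    # governed by 'allowLockedOut'
    return "ok"


def parse_exclude_gid(value):
    """Parse an 'excludeGID' value such as "0,1,14" into a list of ints.

    Raises ValueError for spaces, empty items, non-numeric items,
    or more than MAX_EXCLUDED_GIDS entries.
    """
    gids = []
    for part in value.split(","):
        if not part.isdigit():
            raise ValueError("bad GID %r in %r" % (part, value))
        gids.append(int(part))
    if len(gids) > MAX_EXCLUDED_GIDS:
        raise ValueError("at most %d GIDs allowed" % MAX_EXCLUDED_GIDS)
    return gids


if __name__ == "__main__":
    for field in ("", "NP", "!!", "*LK*", "$1$salt$hash"):
        print(repr(field), "->", classify_password(field))
    print(parse_exclude_gid("0,1,14"))
```
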
Inside the LDIF template, yap2lc recognizes the following tokens:

- '$UNM': the user's login name
- '$RUN': the user's real name
- '$UID': the user's UID
- '$MLS': followed by @domain.dom; this is where mail aliases will be inserted
- '$PWD': the user's password, prepended with whatever you specified in "cryptScheme"
- '$YAS': automatically fill in account status
- '$GID': the user's GID
- '$HDR': the user's home directory
- '$LSH': login shell
- '$SLC': shadow last change
- '$SMN': shadow min
- '$SMX': shadow max
- '$SWR': shadow warning
- '$SIN': shadow inactive
- '$SXP': shadow expire
- '$SFG': shadow flag

The tokens do not have any space pre/appended, e.g. "$UNM@bogus.net" is (as long as the data is correct)
a valid email account. The only noted exceptions are $PWD and $MLS. One neat trick, for instance, is
to have the user picture specified as "$HDR/$UNM.jpg".
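
The effect of token expansion can be sketched in a few lines of Python. This is a toy model of
the substitution only, assuming plain text replacement for six of the tokens listed above; it
leaves out $PWD and $MLS, which the text singles out as exceptions, and the template, the
dc=bogus,dc=net suffix and the account values are all made up.

```python
import re

# Only the simple tokens; $PWD and $MLS get special treatment and are left out.
TOKEN = re.compile(r"\$(UNM|RUN|UID|GID|HDR|LSH)")


def expand(template, account):
    """Replace each known token in `template` with the matching account value."""
    return TOKEN.sub(lambda match: account[match.group(1)], template)


# Hypothetical account data and template, for illustration only.
ACCOUNT = {
    "UNM": "alice",
    "RUN": "Alice Example",
    "UID": "1000",
    "GID": "100",
    "HDR": "/home/alice",
    "LSH": "/bin/sh",
}
TEMPLATE = (
    "dn: uid=$UNM,ou=People,dc=bogus,dc=net\n"
    "cn: $RUN\n"
    "mail: $UNM@bogus.net\n"
    "uidNumber: $UID\n"
    "gidNumber: $GID\n"
    "homeDirectory: $HDR\n"
    "loginShell: $LSH\n"
)

if __name__ == "__main__":
    print(expand(TEMPLATE, ACCOUNT))
```

Because nothing is added around a token, "$HDR/$UNM.jpg" comes out as one unbroken path.
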
Now you're ready to go; fire up "./yap2lc" and wonder how long all of this will take. If all goes
well you should see a nice stats screen at the end. Consult the log file for accounts that did not
pass some filter and why.
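
With thousands of dropped accounts, a summary per reason is easier to read than the raw log.
The Python sketch below relies only on the "uname:uid:reason_why_dropped" format mentioned in
the LOGGING section; the reason strings in the sample are invented, since the real wording
depends on yap2lc itself, and tally_reasons is a made-up helper name.

```python
from collections import Counter


def tally_reasons(log_lines):
    """Count how many log entries were dropped for each reason.

    Lines that do not have the three colon-separated fields are ignored.
    """
    reasons = Counter()
    for line in log_lines:
        line = line.rstrip("\n")
        # Split on the first two colons only, so a reason may contain colons.
        parts = line.split(":", 2)
        if len(parts) == 3:
            reasons[parts[2]] += 1
    return reasons


if __name__ == "__main__":
    # Invented sample; in practice pass an open log file instead.
    sample = [
        "daemon:1:locked out\n",
        "bin:2:locked out\n",
        "jdoe:1005:no real name\n",
    ]
    for reason, count in tally_reasons(sample).most_common():
        print(count, reason)
```
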
Optionally you can run yap2lc against the sample passwd and shadow files inside the "samples"
directory. These files contain accounts that have all the problems yap2lc can handle, so you can
get a feel for how all the sanitization and filters work. The supplied aliases.db and its text
version purposely have a twisted alias nested 6 levels deep (which refers back to a previous
alias, although it does NOT loop).

Drop me an email if you find it useful/ brainkilling/ buggy/ use it/ hate it/ like it/ etc.
Coherent bug reports and constructive criticism are most welcome.


Have nots (TODO ?)
------------------
What yap2lc is most definitely lacking is support for NIS/NIS+ and various IP Services targets.
RFC 2307 has the complete list but I wondered if all those are really used out in the real
world, so I dropped them.
CDB support is the priority right now; the problem comes from the fact that the official source
wasn't designed to be an API, and most users running CDB with qmail will most likely lack a
CDB library. Therefore yap2lc will have to have all CDB functionality built in.

Changelog
---------
v0.5.9 - added support for migrating unlimited nesting level mail aliases for two database back-ends
v0.5.5 - added $UID token to fill in user UIDs
v0.5.4 - added the ability to process data through pipes and necessary command line switches,
	 ability to use/ignore configuration filters in pipe mode; fixed a small leak in regex
	 processing
v0.5.1 - fixed a conf parsing bug and password sanitization under BSD-style systems
v0.5.0 - duplicate entry removal; regex entry filtering; added support for selectively logging 
	 dropped entries; improved "general purpose" sanitization; added a (hopefully useful) stats 
	 screen; modified the filter chain behavior; account status is now optionally automatically 
	 filled in; all shadow field targets are now supported; user specifiable crypt scheme
v0.4.1 - added UID range based account filtering
v0.4.0 - token based, parsed ldif template ensures easy extensibility to any number of LDAP profile 
	 entries; increased processing speed; removed the (now useless) stub section; split the ldif 
	 template and program configuration into different files
v0.3.0 - first public release; using arbitrary input files; sanity check; using stub section; 
	 processing timer; support for BSD-style systems
v0.2.0 - fixed a disgusting leak, modified password processing
v0.1.0 - first unpublic release

