perlography Manual

This is a Perl script for doing bibliographies using the same data structure as BibTeX. It has been tested (but not extensively) in a UNIX environment and under MacOS running MacPerl.


Why change?

For its intended use, BibTeX remains an excellent choice. The reason I began writing Perl scripts to do the same thing is that I have more uses for bibliographic data than just making TeX bibliographies. Now that TeX documents can be hyperlinked and now that some bibliographies are on web pages, I needed to be able to produce HTML versions of bibliographies from my .bib files.

Norman Gray did a nice port of the standard BibTeX style to an html version and I used this for a while, but in the preamble Gray notes:

% this produces a file which is a ‹dl›...‹/dl›, which should be incorporated
% into another html file somehow.
% There will still be ~ and -- within the output file (it's too difficult
% to get rid of them here). A post-processor should turn these
% into either &nbsp; and &enspace; or ' ' and '-' as required.
For example, to maintain my Publications web page, I needed to modify Gray's style and then post-process the .bbl file. Originally I did only the post-processing in Perl, but I eventually longed for one program to do the entire job: hence Perlography.

Perl seems a nice choice to overcome the "it's too difficult to get rid of them here" problem. It is a major programming language supported on nearly all platforms, there are many ways to learn it, and it has debugging facilities. Having programmed in both BibTeX and Perl, I prefer Perl.
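
As an illustration of the post-processing Gray describes, here is a minimal sketch in Perl. The routine name clean_tex_markup is made up for this example and is not part of Perlography, and the HTML entities chosen (&nbsp; and &ndash;) are an assumption about what a web page would want.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perlography): replace the TeX tie (~)
# and the double hyphen (--) in one piece of .bbl text. A true second
# argument asks for HTML entities, a false one for plain characters.
sub clean_tex_markup {
    my ($text, $as_html) = @_;
    if ($as_html) {
        $text =~ s/~/&nbsp;/g;      # tie becomes a non-breaking space
        $text =~ s/--/&ndash;/g;    # double hyphen becomes an en dash
    } else {
        $text =~ s/~/ /g;           # tie becomes an ordinary space
        $text =~ s/--/-/g;          # double hyphen becomes a single hyphen
    }
    return $text;
}

print clean_tex_markup("L.~R. Taylor, pp. 12--34", 1), "\n";
print clean_tex_markup("L.~R. Taylor, pp. 12--34", 0), "\n";
```

Each substitution is a single line, which is the sort of job that is awkward inside a BibTeX style file.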

An overview.

Perlography is intended to be a BibTeX emulator on BibTeX data. If passed a .aux file which works in BibTeX, it should produce identical output, provided someone has written the Perl versions of the needed style files. Anyone wishing to add to the code is welcome. Bugs in the code will also be contemplated by the management, but as Perl code is just a text file, actual corrections would be even more welcome. In any case, you are welcome to email me at taylor.2@nd.edu.

Being somewhat busy with other projects, I have not produced a full-fledged Perl version of any BibTeX style file, but I have produced a Perlography style file, "small.pst", which is a subset of the standard BibTeX "plain.bst". It does articles, books and "articles in collections" and outputs the entries in input order, which at the moment meets my needs. There is also the style file "LRT_web.pst", which does my Publications web page. These can be saved as TEXT files and are functioning Perlography style files.

Input files

The standard BibTeX arrangement of a .aux file with a style string and a data string still works. The .aux file should literally work with no changes since Perlography will append the .bib and .pst as needed. Just Drag&Drop the .aux file onto the Droplet in MacOS or use the command line interface in UNIX and you should get the expected output.

Unlike BibTeX, the .aux file may have more than one "\bibdata" entry, in which case they are all processed using the given "\bibstyle" entry. If you have a "\bibdata{*}" entry, then all the .bib files in the folder containing the .aux file are processed.
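
To make the .aux conventions concrete, here is a rough sketch of how the style and data names could be collected. The routine read_aux_text is hypothetical and is not taken from the Perlography source; it does not handle the "\bibdata{*}" case, and the comma-separated list inside one "\bibdata" is the usual BibTeX convention rather than something stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text of a .aux file, return the style name
# and a reference to the list of data-file names (without extensions).
sub read_aux_text {
    my ($aux_text) = @_;
    my ($style, @data);
    for my $line (split /\n/, $aux_text) {
        if ($line =~ /^\\bibstyle\{([^}]*)\}/) {
            $style = $1;
        } elsif ($line =~ /^\\bibdata\{([^}]*)\}/) {
            my $list = $1;
            push @data, split /\s*,\s*/, $list;   # every \bibdata line accumulates
        }
    }
    return ($style, \@data);
}

my $aux = join "\n", '\bibstyle{small}', '\bibdata{papers}', '\bibdata{books}';
my ($style, $data) = read_aux_text($aux);
print "style file: $style.pst\n";       # .pst is appended to the style name
print "data file:  $_.bib\n" for @$data; # .bib is appended to each data name
```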

Perlography also supports a second mechanism. If the .bib file has, as its first line, a string with a .pst at the end, then that string is assumed to be the name of the style file and the rest of the .bib file is used as the .bib file. This file can be used by itself - no .aux file is needed. These .bib files will be referred to as "loaded". If a .aux file calls one of these loaded .bib files, the style file from the .aux file is used and the initial string in the .bib file is ignored. As with BibTeX, text not related to an entry is ignored.
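
The test for a loaded .bib file might look something like the following sketch. The routine loaded_style is a made-up name, and treating surrounding white space as harmless is my assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text of a .bib file, return the style-file
# name when the first line ends in .pst, or undef for an ordinary .bib file.
sub loaded_style {
    my ($bib_text) = @_;
    my ($first_line) = split /\n/, $bib_text, 2;
    return undef unless defined $first_line;
    return $first_line =~ /^\s*(\S.*\.pst)\s*$/ ? $1 : undef;
}

my $bib   = "LRT_web.pst\n" . '@article{hughes, author = "B. Hughes"}' . "\n";
my $style = loaded_style($bib);
my $note  = defined($style) ? "loaded, style file $style" : "not loaded";
print "$note\n";
```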

Perlography supports "\cite" the same way BibTeX does. In particular "\cite{*}" processes all the entries in the .bib file. If you are using a loaded .bib file, again all the entries are processed.

Getting started.

Go to the Perlography page and copy the code. In UNIX, save the code in a file and make the file executable. In MacPerl, make the code into a Droplet and put the result where .aux files can easily be dropped onto it. In UNIX you will also need to adjust the shebang line to fit your system. Finally, set the path to the directory/folder where the Perlography style files are to be kept.

Then copy one or more of the style files here (or write some of your own) and you are ready to boogie. Create a .aux file and a .bib file (or just a .bib file whose first line is the name of a .pst file). Send the .aux file (or the loaded .bib file) to Perlography using Mac Drag&Drop or the UNIX command line. An output file appears in the same location as your .aux file. The precise name of this file is under the control of the style file: in "small.pst" it is just the name of the .bib file with the .bib replaced by .bbl (just as in BibTeX) and in "LRT_web.pst" the final .bib is replaced by .html. In your style file it is whatever you request.

Writing style files.

For each entry type that you want to handle, you need an output routine. The naming convention is that for an @article you need a "sub output_ARTICLE", for an @book you need a "sub output_BOOK", etc. (Case is important to Perl, so while your BibTeX .bib file can have @article, @ARTICLE or even @ArTiClE, the style file must have "sub output_ARTICLE".) If the required subroutine is missing, Perlography just alerts you to this fact, skips the entry, and continues processing. The input to each output routine is just a pointer to the data for that entry, so it is useful to understand the structure of this data.

As it is read in, each BibTeX entry is converted into a Perl associative array, or hash. The first field is 'KOE' (for kind of entry), whose value is ARTICLE, BOOK, or whatever. The next field is 'bib_entry', whose value is the key. The rest of a BibTeX entry consists of a name like 'author', 'title' or whatever, followed by an equal sign, an entry either quoted or braced and finally a comma. Each such entry is added to the hash with key 'author', 'title' or whatever and value the actual data. The output routine is then responsible for taking this data and producing the entry in the output file. If there is no output routine for a particular kind of entry, then Perlography alerts you to this fact, skips the entry and plows on. For example, I like to keep my preprints at the end of my personal .bib file so I can accumulate the data for the entry gradually as it comes in, but I do not want this appearing on my web page. The style file has no "output_PREPRINT" routine, so these entries are skipped. (Perlography also mimics BibTeX in that fields in the entry that are not understood by a style routine are just ignored, so one may have fields which are used by some styles but not by others.)
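
The following sketch shows what such a hash might look like and how an output routine could be found by name. The sample values and the dispatch_entry routine are invented for illustration, and the sketch assumes the "pointer" is a plain hash reference; the real script may pass it differently.

```perl
use strict;
use warnings;

# One parsed entry: 'KOE' is the kind of entry, 'bib_entry' is the key,
# and the remaining keys are the BibTeX field names. Values are made up.
my %entry = (
    KOE       => 'ARTICLE',
    bib_entry => 'hughes-taylor-williams',
    author    => 'B. Hughes and L. R. Taylor and E. B. Williams',
    title     => 'A sample title',
);

# A style file supplies one routine per kind of entry.
sub output_ARTICLE {
    my ($e) = @_;
    return "$e->{author}. $e->{title}.";
}

# Hypothetical dispatcher: look up output_<KOE> by name, or skip the entry.
sub dispatch_entry {
    my ($e) = @_;
    my $routine = __PACKAGE__->can("output_$e->{KOE}");
    unless ($routine) {
        warn "no output routine for $e->{KOE}; skipping $e->{bib_entry}\n";
        return;
    }
    return $routine->($e);
}

print dispatch_entry(\%entry), "\n";
```

An entry with KOE set to PREPRINT would produce only the warning, since this sketch defines no output_PREPRINT.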

You may do preprocessing on the Perl data before it is passed to the output routine by means of subroutines "sub format_author" etc. where the field-name keys are all lower case. They too receive a pointer to the entry to be processed and are usually expected to alter the data for that entry in some way. As an example, the initial author-data for multiple authors looks like

author = "B. Hughes and L. R. Taylor and E. B. Williams",

in a typical BibTeX file, whereas the output is expected to look like
B. Hughes, L. R. Taylor and E. B. Williams

It seems natural to do this sort of thing to all the entries before beginning the output process, especially since you probably want to format all the author names the same even if they are eventually going to different output routines. You also need to examine the authors' names before beginning the output, since these names usually determine the order in which the entries should be output.
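
The author transformation shown above can be sketched in a few lines. The routine join_authors is a made-up name for this example; it works on the raw string and ignores the case of an "and" protected inside braces.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "A and B and C" into "A, B and C".
sub join_authors {
    my ($field) = @_;
    my @names = split /\s+and\s+/, $field;
    return $field if @names < 2;          # zero or one author: nothing to do
    my $last = pop @names;
    return join(', ', @names) . " and $last";
}

print join_authors('B. Hughes and L. R. Taylor and E. B. Williams'), "\n";
# prints: B. Hughes, L. R. Taylor and E. B. Williams
```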

In any case, first all the "sub format_??" routines are applied to all the entries. It is not necessary to have a "sub format_??" routine for every field - for example, 'title' is often fine as read from the original .bib file. Then the entries are sorted and then they are output. There are a few wrinkles discussed below, but this is the basic strategy.
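
The first stage of that strategy might be pictured as follows. This is only a sketch: apply_formats is a hypothetical name, the sample format_title routine is invented, and entries are assumed to be plain hash references.

```perl
use strict;
use warnings;

# Sample format routine, named after the lower-case field it handles.
sub format_title {
    my ($e) = @_;
    $e->{title} =~ s/\s+/ /g;    # collapse runs of white space
}

# Hypothetical driver: call format_<field> for every field that has such a
# routine and leave all other fields exactly as they were read.
sub apply_formats {
    my (@entries) = @_;
    for my $e (@entries) {
        for my $field (sort keys %$e) {
            my $routine = __PACKAGE__->can("format_$field") or next;
            $routine->($e);
        }
    }
    return @entries;
}

my @entries = (
    { KOE => 'BOOK', bib_entry => 'b1', title => "Two   words", year => '1999' },
);
apply_formats(@entries);
print "$entries[0]{title}\n";    # title was formatted
print "$entries[0]{year}\n";     # year has no format routine and is unchanged
```

Sorting and the calls to the output routines would follow once every entry has been through this stage.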

The sub format_author routine is special because authors are used in several different ways. Initially authors are names strung together with and's. But we need a way to get at the last names for such tasks as alphabetization. Finally we need to string the names together separated by commas, except for the last entry, which needs an and. The usual goal of sub format_author is to replace the initial author data with an array. Here is a minimal such routine:

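sub format_author {
    # The argument is the pointer to the entry being processed.
    my($xyx);
    $xyx=shift(@_);
    # Hand over a reference to the 'author' field so it can be replaced
    # by the array of name parts; "N" leaves first names as written.
    &main::string_name_field(\${${$xyx}}{"author"},"N");
}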

There are some routines in Perlography meant to help produce the output files. If these are not present in the .pst-style file, they are quietly ignored.

sub Prolog -- Usually writes a preamble to the output file. Better than the BibTeX @PREAMBLE (which is also supported) since the preamble text can be modified by Perl to reflect different situations. It can also do other initialization as required. It gets as input the name of the .bib file (with .prolog appended).

sub BBL -- It gets as input the name of the .bib file (with the .bib appended) and outputs a string which is the name of the output file (usually just the name with a .html or .bbl or whatever you need appended). If BBL is omitted, .bbl is appended.

sub EachEntry -- This is a routine which is executed just before each entry is sent to its output-routine. As input it receives two entry-hashes, the one which is going to be processed next and the one which was just processed.

sub Epilog -- Usually writes some closing text to the output file. It gets as input the name of the .bib file (with .epilog appended).
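
Of these four, BBL is the simplest to illustrate. The sketch below follows the description of "LRT_web.pst", where the final .bib is replaced by .html; it assumes the file name is the only argument, which the description above does not spell out.

```perl
use strict;
use warnings;

# Sketch of a BBL routine for a style that writes HTML: take the name of
# the .bib file and return the name of the output file.
sub BBL {
    my ($bib_name) = @_;
    (my $out_name = $bib_name) =~ s/\.bib$/.html/;   # swap the final extension
    return $out_name;
}

print BBL('papers.bib'), "\n";    # papers.html
```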

Useful routines.

There are a few routines in the main Perlography script which are useful in writing .pst files. Perlography puts the routines read in from the .pst file in their own namespace, so if you need to reference routines and variables from Perlography itself, you need to precede them with 'main::' as below.

sub reverse_input_hash inverts the input order: put

sub compute_sort_hash{
&main::reverse_input_hash(@_);
}
in your style file.

sub standard_hash mimics the usual BibTeX sort, provided you have called the "sub string_name_field" subroutine earlier to break up the author name string. This is usually done in the "sub format_author" subroutine. This sorts on the actual entries in the .bib file. If you just want first initials used instead of whatever is there as a first and middle name, use "Y" in place of the "N".

Rather than produce ever more convoluted code to sort your .bib file, consider the option of keeping it so that input order is the desired output order. A second approach is to define a field called, say "sort", so that sorting on it gives the desired order.
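
Here is a sketch of the second approach. The field name "sort" is the one suggested above; order_by_sort_field is a made-up name, every entry is assumed to carry the field, and how the result would be handed back to Perlography through compute_sort_hash is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: order entry hashes by the string in their 'sort' field.
sub order_by_sort_field {
    my @sorted = sort { $a->{sort} cmp $b->{sort} } @_;
    return @sorted;
}

my @entries = (
    { bib_entry => 'second', sort => '1999b' },
    { bib_entry => 'third',  sort => '2001'  },
    { bib_entry => 'first',  sort => '1999a' },
);
my @ordered = order_by_sort_field(@entries);
print join(' ', map { $_->{bib_entry} } @ordered), "\n";
# prints: first second third
```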

sub names_to_initials takes a string (think first and middle names) and returns a string with the names replaced by first initial followed by a period. If there are several initials, the middle spaces are replaced by ties (~). It is easy using Perl to either remove these ties or to add a final tie if either is desired.
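
A rough imitation of this behaviour, for illustration only: initials_sketch is not the routine in the Perlography source, and it makes no attempt at hyphenated or braced names.

```perl
use strict;
use warnings;

# Imitation of the behaviour described above: each name becomes its first
# letter plus a period, and the initials are joined with ties.
sub initials_sketch {
    my ($names) = @_;
    my @initials = map { substr($_, 0, 1) . '.' } split ' ', $names;
    return join '~', @initials;
}

my $tied = initials_sketch('Laurence Robert');
print "$tied\n";                        # L.~R.

(my $no_ties = $tied) =~ tr/~/ /;       # replace the ties with spaces
my $final_tie = "$tied~";               # or add a final tie
print "$no_ties\n$final_tie\n";
```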

sub string_name_field takes a string of names joined by and's and replaces it by an array with the following structure: the first string is everything in the name other than the parts related to the last name; the next string is the "von" part (the part in lower case); the next string is the last name; and the final part is the Jr., IX or whatever. Then this structure is repeated with the next name and so on. Parts which are empty are reliably set to empty strings. This routine takes a second variable, "Y" or "N". If the value is "Y" the first name is run through "sub names_to_initials", otherwise it is not. Perlography supports the BibTeX scheme whereby a string of words inside a brace pair is treated as a single word.
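
To show how a style file might consume that structure, here is a sketch that walks a hand-written array four strings at a time. The sample names and the name_groups routine are invented for the example, and the array is typed in by hand rather than produced by string_name_field.

```perl
use strict;
use warnings;

# A hand-written array in the layout described above:
# first, von, last, jr, repeated once per name.
my @parts = (
    'B.',     '',    'Hughes',    '',
    'Ludwig', 'van', 'Beethoven', '',
    'E. B.',  '',    'Williams',  'Jr.',
);

# Hypothetical helper: regroup the flat array into one small hash per name.
sub name_groups {
    my @flat = @_;
    my @groups;
    while (@flat >= 4) {
        my ($first, $von, $last, $jr) = splice(@flat, 0, 4);
        push @groups, { first => $first, von => $von, last => $last, jr => $jr };
    }
    return @groups;
}

my @groups = name_groups(@parts);

# Last names alone, as needed for alphabetization.
print join(', ', map { $_->{last} } @groups), "\n";

# One full name, skipping the parts that are empty strings.
print join(' ', grep { length $_ } @{ $groups[1] }{qw(first von last)}), "\n";
```

Because empty parts are always present as empty strings, the groups stay aligned and a fixed stride of four is enough.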