This chapter covers an important but complex aspect of the Apache Perl API,
the process of controlling and customizing the Apache configuration process
itself. Using the techniques shown in this chapter, you will be able to
define new configuration file directives that provide runtime configuration
information to your modules. You will also be able to take over all or part
of the Apache configuration process and write Perl code to dynamically
configure the server at startup time.
The Apache Perl API provides a simple mechanism for passing information
from configuration files to Perl modules using the
PerlSetVar directive. As we've seen, the directive takes two arguments, the name of a
variable and its value:
PerlSetVar FoodForThought apples
Because Perl is such a whiz at parsing text, it's trivial to pass an array,
or even a hash in this way. For example, here's one way (out of a great
many) to pass an array:
# in configuration file
PerlSetVar FoodForThought apples:oranges:kiwis:mangos
# in Perl module
@foodForThought = split ":", $r->dir_config('FoodForThought');
And here's a way to pass an associative array:
# in configuration file
PerlSetVar FoodForThought apples=>23,kiwis=>12
# in Perl module
%foodForThought = split /\s*(?:=>|,)\s*/, $r->dir_config('FoodForThought');
Notice that the pattern match allows whitespace to come before or after the
comma or arrow operators, just as Perl does.
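Both idioms can be tried outside a running server. The following is a minimal sketch in which plain strings stand in for the values that $r->dir_config() would return; the variable names are ours, chosen for illustration.

```perl
use strict;
use warnings;

# Stand-ins for what $r->dir_config('FoodForThought') would return.
my $list_value = 'apples:oranges:kiwis:mangos';
my $hash_value = 'apples => 23 , kiwis=>12';

# Colon-separated list to array.
my @food_list = split ":", $list_value;

# "key=>value" pairs to hash; whitespace around "=>" and "," is ignored.
my %food_count = split /\s*(?:=>|,)\s*/, $hash_value;

print join(", ", @food_list), "\n";
print "$_: $food_count{$_}\n" for sort keys %food_count;
```

Because the pattern swallows the whitespace around each separator, the keys and values arrive already trimmed.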
By modifying the pattern match appropriately, you can pass more complex
configuration information. The only trick is to remember to put double
quotes around the configuration value if it contains whitespace, and not to
allow your text editor to wrap it to another line. You can use backslash as
a continuation character if you find long lines a pain to read:
PerlSetVar FoodForThought "apples => 23,\
kiwis => 12,\
rutabagas => 0"
If you have a really complex configuration, then you are probably better off using a separate
configuration file and pointing to it using a single PerlSetVar directive. The server_root_relative()
method is useful for specifying configuration files that are relative to
the server root:
# in server configuration file
PerlSetVar FoodConfig conf/food.conf
# in Perl module
$conf_file = $r->server_root_relative($r->dir_config('FoodConfig'));
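What the module does with the file once it has the path is up to you. As a rough sketch, suppose the file holds one "name count" pair per line with "#" comments; that format, the parse_food_conf() helper, and the sample data are all assumptions made for this example, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical parser for "name count" lines; blank lines and "#" comments
# are skipped.
sub parse_food_conf {
    my ($fh) = @_;
    my %food;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $line =~ s/#.*//;             # drop comments
        next unless $line =~ /\S/;    # skip blank lines
        my ($name, $count) = split ' ', $line;
        $food{$name} = $count;
    }
    return \%food;
}

# An in-memory filehandle stands in for the file named by FoodConfig.
my $sample = "# fruit inventory\napples 23\nkiwis 12\n\nrutabagas 0\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my $food = parse_food_conf($fh);
close $fh;

print "$_ => $food->{$_}\n" for sort keys %$food;
```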
Despite the simplicity of this approach, there are times when you may
prefer to create your own ``first class'' configuration directives. This
becomes particularly desirable when you have many different directives,
when the exact syntax of the directives is important and you want Apache to
check the syntax at startup time, or when you are planning to distribute
your module and want it to appear polished. There is also a performance
penalty associated with parsing
PerlSetVar configuration at request time, which is avoided using first class
configuration directives because they are parsed once at server startup
time.
Apache provides an API for defining configuration directives. You provide
the directive's name, syntax, and a string briefly summarizing the
directive's intended usage. You may also limit the applicability of the
directive to certain parts of the configuration files. Apache parses the
directive and passes the parsed structure to your module for processing.
Your module will then use this information to set up global variables or do
whatever initialization it needs to.
The process of defining new configuration directives is not as simple as
other parts of the Perl API. This is because configuration
directives are defined in a compiled C structure that cannot be built
dynamically at run time. To work around this restriction,
mod_perl takes the following roundabout route:
- You create an empty module with h2xs.
- You modify the newly-created Makefile.PL file to declare
an array containing the definitions for the new configuration
directives and to invoke the command_table() function from a helper
module named Apache::ExtUtils.
- You write a .pm file containing Perl callbacks for each of
the configuration directives you define.
- You run perl Makefile.PL to autogenerate a .xs file.
- You run make and make install to create the loadable
module and move it into place.
- You add a PerlModule directive to the server configuration
file to load the module at server startup time.
We'll take you through a short example first to show you the whole process
and then get into the details later. Our candidate for adding configuration
directives is the Apache::PassThru module from Chapter 7 (Listing 7.10), which transparently maps portions of
the server's document tree onto remote servers using Apache's proxy
mechanism.
As you may recall, Apache::PassThru used a single PerlSetVar
variable named PerlPassThru, which in turn contained a series of
local=>remote URI pairs stored in one long string. Although this strategy is adequate,
it's not particularly elegant. Our goal here will be to create a new first
class configuration directive named
PerlPassThru. PerlPassThru will take two arguments, a local URI and a remote URI to map it to. To map
several local URIs to remote servers, you'll be able to repeat the
directive. Because it makes no sense for the directive to appear in
directory sections or
.htaccess files, PerlPassThru will be limited to the main parts of the httpd.conf, srm.conf and access.conf files, as well as to <VirtualHost> sections.
First we'll need something to start with, so we use h2xs to create a skeletal module directory:
% h2xs -Af -n Apache::PassThru
Writing Apache/PassThru/PassThru.pm
Writing Apache/PassThru/PassThru.xs
Writing Apache/PassThru/Makefile.PL
Writing Apache/PassThru/test.pl
Writing Apache/PassThru/Changes
Writing Apache/PassThru/MANIFEST
The -A and -f command-line switches turn off the generation of autoloader stubs and the C
header file conversion steps, respectively. -n gives the module a name. We'll be editing the files Makefile.PL and PassThru.pm. PassThru.xs will be overwritten when we go to make the module, so there's no need to
worry about it.
The next step is to edit the Makefile.PL script to add the declaration of the PerlPassThru directive and to arrange for
Apache::ExtUtils' command_table() function to be executed at the appropriate moment. Listing 8.1 shows a
suitable version of the file. We've made multiple modifications to the Makefile.PL
originally produced by h2xs. First, we've placed a package
declaration at the top, putting the whole script in the
Apache::PassThru namespace. Then, after the original line use
ExtUtils::MakeMaker, we load two mod_perl-specific modules,
Apache::ExtUtils, which defines the command_table() function, and Apache::src, a small utility class that can be used to find the location of the Apache
header files. These will be needed during the make.
- Listing 8.1: Makefile.PL for the Improved Apache::PassThru
-
package Apache::PassThru;
# File: Apache/PassThru/Makefile.PL
use ExtUtils::MakeMaker;
use Apache::ExtUtils qw(command_table);
use Apache::src ();
my @directives = (
{ name => 'PerlPassThru',
errmsg => 'a local path and a remote URI to pass through to',
args_how => 'TAKE2',
req_override => 'RSRC_CONF'
}
);
command_table(\@directives);
WriteMakefile(
'NAME' => __PACKAGE__,
'VERSION_FROM' => 'PassThru.pm',
'INC' => Apache::src->new->inc,
'INSTALLSITEARCH' => '/home/httpd/lib/perl',
'INSTALLSITELIB' => '/home/httpd/lib/perl',
);
__END__
Next comes the place where we define the new configuration directives
themselves. We create a list named @directives, each element of which corresponds to a different directive. In this case,
we only have one directive to declare, so @directives is one element long.
Each element of the list is an anonymous hash containing one or more of the
keys name, errmsg, args_how and req_override (we'll see later how to implement the most common type of directive using a
succinct anonymous array form). name corresponds to the name of the directive, ``PerlPassThru'' in this case,
and errmsg corresponds to a short usage message that will be displayed in the event of
a configuration syntax error. args_how tells Apache how to parse the directive's arguments. In this case we
specify TAKE2, which tells Apache that the directive takes two (and only two) arguments.
We'll go over the complete list of parsing options later, and also show you
a shortcut for specifying parsing options using Perl prototypes. The last
key, req_override, tells Apache what configuration file contexts the directive is allowed in.
In this case we specify the most restrictive context, RSRC_CONF, which limits the directive to appearing in the main part of the
configuration files or in virtual host sections. Notice that
RSRC_CONF is an ordinary string, not a bareword function call!
Having defined our configuration directive array, we pass a reference to it
to the command_table() function. When run, this routine writes out a file named PassThru.xs to the current directory.
command_table() uses the package information returned by the Perl
caller() function to figure out the name of the file to write. This is why it was
important to include a package declaration at the top of the script.
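The role of caller() is easy to see in isolation. The sketch below is not the Apache::ExtUtils source; My::TableWriter and xs_file_for_caller() are hypothetical names that only show how a helper can derive a file name from the package of the script that called it.

```perl
use strict;
use warnings;

package My::TableWriter;    # hypothetical helper, not Apache::ExtUtils

# Derive an .xs file name from the package of whoever called us.
sub xs_file_for_caller {
    my ($package) = caller();                 # e.g. "Apache::PassThru"
    my $base = (split /::/, $package)[-1];    # last component of the name
    return "$base.xs";
}

package Apache::PassThru;   # like the declaration at the top of Makefile.PL

my $xs_file = My::TableWriter::xs_file_for_caller();
print "$xs_file\n";
```

Without the package declaration the caller would be main, and a helper written this way would have no module name to work from.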
The last part of Makefile.PL is a call to WriteMakefile(), a routine provided by
ExtUtils::MakeMaker and automatically placed in Makefile.PL
by h2xs. However, we've modified the autogenerated call in three slight but
important ways. The INC key, which MakeMaker
uses to generate include file switches, has been modified to use the value
returned by Apache::src->new->inc (a shorthand way of creating a new Apache::src object and immediately calling its
inc() method). This call will return a list of directories that contain various
header files needed to build Apache C-language modules. We've also added
the keys INSTALLSITEARCH and INSTALLSITELIB
to the parameters passed to WriteMakefile(), in each case specifying the path we use for Apache Perl API modules on
our system (you'll have to modify this for your setup). This ensures that
when we make install the module file and its loadable object will be moved to the location of
Apache-specific modules rather than into the default Perl library
directory.
The next step is to modify PassThru.pm to accommodate the new configuration directive. We start with the stock
file from Listing 7.10, and add the following lines to the top of the file:
use Apache::ModuleConfig ();
use DynaLoader ();
use vars qw($VERSION);
$VERSION = '1.00';
if($ENV{MOD_PERL}) {
no strict;
@ISA = qw(DynaLoader);
__PACKAGE__->bootstrap($VERSION);
}
This brings in code for fetching and modifying the current configuration
settings, and loads the DynaLoader module, which provides the bootstrap() routine for loading shared library code. We test the MOD_PERL environment variable to find out whether we are running inside httpd; if so, we invoke bootstrap() to load the object file that contains the compiled directory configuration
record.
Next, we add the following configuration processing callback routine to the
file:
sub PerlPassThru ($$$$) {
my($cfg, $parms, $local, $remote) = @_;
$cfg->{PassThru}{$local} = $remote;
}
The callback (known for short as the ``directive handler'') is a subroutine
that will be called each time Apache processes a PerlPassThru
directive. It is responsible for stashing the information into a
configuration record where it can be retrieved later by the
handler() subroutine. The name of the subroutine must exactly match the name of the
configuration directive, capitalization included. It should also have a
prototype that correctly matches the syntax of the configuration directive.
All configuration callbacks are called with at least two scalar arguments
($$). The first argument, $cfg, is the per-directory or per-server object where the configuration data
will be stashed. As we will explain shortly, this object can be recovered
later during startup or request time. The second argument, $parms, is an Apache::CmdParms object from which you can retrieve various other information about the
configuration.
Depending on the syntax of the directive, callbacks will be passed other
parameters as well, corresponding to the arguments of the configuration
directive that the callback is responsible for. In the case of PerlPassThru(), which is a TAKE2 directive, we expect two additional arguments, so the complete function
prototype is ($$$$).
The body of the subroutine is trivial. For all intents and purposes the
configuration object is a hash reference in which you can store arbitrary
key/value pairs. The convention is to choose a key with the same name as
the configuration directive. In this case we use an anonymous hash to store
the current local and remote URIs into the configuration object at a key
named PassThru. This allows us to have multiple mappings while guaranteeing that each
local URI is unique.
The handler() subroutine needs a slight modification as well. We remove the line
my %mappings = split /\s*(?:,|=>)\s*/, $r->dir_config('PerlPassThru');
and substitute the following:
my %mappings = ();
if(my $cfg = Apache::ModuleConfig->get($r)) {
%mappings = %{ $cfg->{PassThru} } if $cfg->{PassThru};
}
We call the Apache::ModuleConfig class method get() to retrieve the configuration object corresponding to the current request.
We then fetch the value of the configuration object's
PassThru key. If the key is present, we dereference it and store it into %mappings. We then proceed as before. For your convenience, Listing 8.2 gives the
complete code for the modified module.
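The round trip from directive handler to request handler can be exercised without a server. In this sketch an ordinary hash reference stands in for the configuration object, undef stands in for $parms, and map_uri() is a hypothetical helper that mirrors the lookup loop in handler(); unlike the listing, it quotes the local path with \Q...\E before using it in a pattern.

```perl
use strict;
use warnings;

# Same body as the directive handler shown earlier; $parms is unused here.
sub PerlPassThru ($$$$) {
    my($cfg, $parms, $local, $remote) = @_;
    $cfg->{PassThru}{$local} = $remote;
}

# Hypothetical helper mirroring the lookup done in handler().
sub map_uri {
    my ($cfg, $uri) = @_;
    my %mappings = $cfg->{PassThru} ? %{ $cfg->{PassThru} } : ();
    foreach my $src (keys %mappings) {
        # \Q...\E quotes regex metacharacters in the local path
        return $uri if $uri =~ s/^\Q$src\E/$mappings{$src}/;
    }
    return;    # no mapping applies
}

my $cfg = {};  # stand-in for the object Apache passes to the callback
PerlPassThru($cfg, undef, '/CPAN',   'http://www.perl.com/CPAN');
PerlPassThru($cfg, undef, '/search', 'http://www.altavista.com');

print map_uri($cfg, '/CPAN/README'), "\n";
```

Each call to the directive handler adds one entry under the PassThru key, which is why repeating the directive in the configuration file accumulates mappings rather than overwriting them.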
The last step is to arrange for Apache::PassThru to be loaded at server startup time. The easiest way to do this is to load
the module with a PerlModule directive:
PerlModule Apache::PassThru
The only trick to this is that you must be careful that the
PerlModule directive is called before any PerlPassThru
directives appear. Otherwise Apache won't recognize the new directive and
will abort with a configuration file syntax error. The other caveat is that PerlModule only works to bootstrap configuration directives in mod_perl versions 1.17 and higher. If you are using an earlier version, use this
configuration section instead:
<Perl>
use Apache::PassThru ();
</Perl>
<Perl> sections are described in more detail towards the end of this chapter.
Now change the old Apache::PassThru configuration to use the first-class PerlPassThru directive:
PerlModule Apache::PassThru
PerlTransHandler Apache::PassThru
PerlPassThru /CPAN http://www.perl.com/CPAN
PerlPassThru /search http://www.altavista.com
After restarting the server, you should now be able to test the
Apache::PassThru handler to confirm that it correctly proxies the
/CPAN and /search URIs.
If your server has the mod_info module configured, you should be able to view the entry for the Apache::PassThru module. It should look something like this:
Module Name: Apache::PassThru
Content handlers: none
Configuration Phase Participation: Create Directory Config, Create
Server Config
Request Phase Participation: none
Module Directives:
PerlPassThru - a local path and a remote URI to pass through
to
Current Configuration:
httpd.conf
PerlPassThru /CPAN http://www.perl.com/CPAN
PerlPassThru /search http://www.altavista.com
Now try changing the syntax of the PerlPassThru directive. Create a directive that has too many arguments, or one that has
too few. Try putting the directive inside a <Directory> section or
.htaccess file. Any attempt to violate the syntax restrictions we specified in Makefile.PL with the args_how and req_override
keys should cause a syntax error at server startup time.
- Listing 8.2: Apache::PassThru with a Custom Configuration
Directive
-
package Apache::PassThru;
# file: Apache/PassThru.pm;
use strict;
use vars qw($VERSION);
use Apache::Constants qw(:common);
use Apache::ModuleConfig ();
use DynaLoader ();
$VERSION = '1.00';
if($ENV{MOD_PERL}) {
no strict;
@ISA = qw(DynaLoader);
__PACKAGE__->bootstrap($VERSION);
}
sub handler {
my $r = shift;
return DECLINED if $r->proxyreq;
my $uri = $r->uri;
my %mappings = ();
if(my $cfg = Apache::ModuleConfig->get($r)) {
%mappings = %{ $cfg->{PassThru} } if $cfg->{PassThru};
}
foreach my $src (keys %mappings) {
next unless $uri =~ s/^$src/$mappings{$src}/;
$r->proxyreq(1);
$r->uri($uri);
$r->filename("proxy:$uri");
$r->handler('proxy-server');
return OK;
}
return DECLINED;
}
sub PerlPassThru ($$$$) {
my($cfg, $parms, $local, $remote) = @_;
unless ($remote =~ /^http:/) {
die "Argument `$remote' is not a URL\n";
}
$cfg->{PassThru}{$local} = $remote;
}
1;
__END__
We'll now look in more detail at how you can precisely control the behavior
of configuration directives.
As you recall, a module's configuration directives are declared in an array
of hashes passed to the command_table() function. Each hash contains the required keys name and errmsg. In addition, there may be any of four optional keys: func, args_how,
req_override and cmd_data.
For example, this code fragment defines two configuration directives named TrafficCopSpeedLimit and TrafficCopRightOfWay:
@directives = (
{
name => 'TrafficCopSpeedLimit',
errmsg => 'an integer specifying the maximum allowable kilobytes per second',
func => 'right_of_way',
args_how => 'TAKE1',
req_override => 'OR_ALL',
},
{
name => 'TrafficCopRightOfWay',
errmsg => 'list of domains that can go as fast as they want',
args_how => 'ITERATE',
req_override => 'OR_ALL',
cmd_data => '[A-Z_]+',
},
);
command_table(\@directives);
The required name key points to the name of the directive. It should have exactly the same
spelling and capitalization as the directive you want to implement (Apache
doesn't actually care about the capitalization of directives, but Perl does
when it goes to call your configuration processing callbacks).
Alternatively, you can use the optional func key to specify a subroutine with a different name than the configuration
directive.
The mandatory errmsg key should be a short but succinct usage statement that summarizes the
arguments that the directive takes.
The optional args_how key tells Apache how to parse the directive. There are 11 (!) possibilities
corresponding to different numbers of mandatory and optional arguments.
Because the number of arguments passed to the Perl callback function for
processing depends on the value of args_how, the callback function must know in advance how many arguments to expect.
The optional cmd_data key can be used to pass arbitrary information to the directive handler. The
handler can retrieve this information by calling the $parms object's info() method. In our example, we use this information to pass a pattern match
expression to the callback. This is how it might be used:
sub TrafficCopRightOfWay ($$@) {
my($cfg, $parms, $domain) = @_;
my $pat = $parms->info;
unless ($domain =~ /^$pat$/i) {
die "Invalid domain: $domain\n";
}
$cfg->{RightOfWay}{$domain}++;
}
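To watch the cmd_data value travel from the directive definition to the callback, the sketch below substitutes a tiny stub for the real Apache::CmdParms object. FakeParms is a hypothetical class written for this example; its only job is to return the stored string from info().

```perl
use strict;
use warnings;

# Hypothetical stand-in for Apache::CmdParms: info() returns the cmd_data string.
package FakeParms;
sub new  { my ($class, $info) = @_; return bless { info => $info }, $class }
sub info { return $_[0]{info} }

package main;

sub TrafficCopRightOfWay ($$@) {
    my($cfg, $parms, $domain) = @_;
    my $pat = $parms->info;
    unless ($domain =~ /^$pat$/i) {
        die "Invalid domain: $domain\n";
    }
    $cfg->{RightOfWay}{$domain}++;
}

my $cfg   = {};
my $parms = FakeParms->new('[A-Z_]+');   # the cmd_data from the definition

TrafficCopRightOfWay($cfg, $parms, 'mil');

# An argument that fails the pattern makes the callback die.
my $ok  = eval { TrafficCopRightOfWay($cfg, $parms, 'not valid'); 1 };
my $err = $@;
print "rejected: $err" unless $ok;
```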
req_override, another optional key, is used to restrict the directive so that it can
only legally appear in certain sections of the configuration files.
Most configuration-processing callbacks will declare function prototypes
that describe how they are intended to be called. Although in the current
implementation Perl does not check callbacks' prototypes at runtime, they
serve a very useful function nevertheless. The command_table() function can use callback prototypes to choose the correct syntax for the
directive on its own. If no args_how
key is present in the definition of the directive, command_table()
will pull in the .pm file containing the callback definitions and attempt
to autogenerate the args_how field on its own, using the Perl prototype() builtin function. By specifying the correct prototype, you can forget about args_how entirely and let
command_table() take care of choosing the correct directive parsing method for you.
If both an args_how and a function prototype are provided,
command_table() will use the value of args_how in case of a disagreement. If neither an args_how nor a function prototype is present, command_table() will choose a value of TAKE123, which is a relatively permissive parsing rule.
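The deduction rests on Perl's prototype() builtin, which returns a subroutine's prototype string or undef if it has none. The following is only a sketch of the idea, not the Apache::ExtUtils implementation: guess_args_how() and the lookup table are hypothetical, the prototype strings are the ones listed in this chapter, and the TAKE123 fallback follows the rule just described.

```perl
use strict;
use warnings;

# Prototype strings as listed in this chapter. FLAG shares "$$$" with TAKE1,
# so it cannot be told apart by prototype alone in this sketch.
my %ARGS_HOW_FOR = (
    '$$'     => 'NO_ARGS',
    '$$$'    => 'TAKE1',
    '$$$$'   => 'TAKE2',
    '$$$$$'  => 'TAKE3',
    '$$$;$'  => 'TAKE12',
    '$$$$;$' => 'TAKE23',
    '$$$;$$' => 'TAKE123',
    '$$@'    => 'ITERATE',
    '$$@;@'  => 'ITERATE2',
    '$$$;*'  => 'RAW_ARGS',
);

# Hypothetical helper: guess args_how from a callback's prototype.
sub guess_args_how {
    my ($code_ref) = @_;
    my $proto = prototype($code_ref);
    return 'TAKE123' unless defined $proto;    # fallback described in the text
    return $ARGS_HOW_FOR{$proto} || 'TAKE123';
}

sub TrafficCopLimits ($$$$)    { }
sub TrafficCopRightOfWay ($$@) { }
sub TrafficCopNoPrototype      { }

print guess_args_how(\&TrafficCopLimits), "\n";
print guess_args_how(\&TrafficCopRightOfWay), "\n";
print guess_args_how(\&TrafficCopNoPrototype), "\n";
```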
Apache supports a total of 11 different directive parsing methods. This
section lists their symbolic constants and the Perl prototypes to use if
you wish to take advantage of configuration definition shortcuts.
- NO_ARGS ("$$", or no prototype at all)
-
The directive takes no arguments. The callback will be invoked once each
time the directive is encountered. Example:
sub TrafficCopOn ($$) {
shift->{On}++;
}
- TAKE1 ("$$$")
-
The directive takes a single argument. The callback will be invoked once
each time the directive is encountered, and its argument will be passed as
the third argument. Example:
sub TrafficCopActiveSergeant ($$$) {
my($cfg, $parms, $arg) = @_;
$cfg->{Sergeant} = $arg;
}
- TAKE2 ("$$$$")
-
The directive takes two arguments. They are passed to the callback as the
third and fourth arguments.
sub TrafficCopLimits ($$$$) {
my($cfg, $parms, $minspeed, $maxspeed) = @_;
$cfg->{Min} = $minspeed;
$cfg->{Max} = $maxspeed;
}
- TAKE3 ("$$$$$")
-
This is like TAKE1 and TAKE2, but the directive takes three
mandatory arguments.
- TAKE12 ("$$$;$")
-
In this interesting variant, the directive takes one mandatory argument,
and a second optional one. This can be used when the second argument has a
default value that the user may want to override. Example:
sub TrafficCopWarningLevel ($$$;$) {
my($cfg, $parms, $severity_level, $msg) = @_;
$cfg->{severity} = $severity_level;
$cfg->{msg} = $msg || "You have exceeded the speed limit. Your license please?";
}
- TAKE23 ("$$$$;$")
-
TAKE23 is just like TAKE12, except now there are two mandatory arguments and an optional third one.
- TAKE123 ("$$$;$$")
-
In the TAKE123 variant, the first argument is mandatory and the other two are optional.
This is useful for providing defaults for two arguments.
- ITERATE ("$$@")
-
ITERATE is used when a directive can take an unlimited number of arguments. For
example, the mod_autoindex IndexIgnore
directive specifies a list of one or more file extensions to ignore in
directory listings:
IndexIgnore .bak .sav .hide .conf
Although the function prototype suggests that the callback's third argument
will be a list, this is not the case. In fact, the callback is invoked
repeatedly with a single argument, once for each argument in the list. It's
done this way for interoperability with the C API, which doesn't have the
flexible argument passing that Perl provides.
The callback should be prepared to be called once for each argument in the
directive argument list, and to be called again each time the directive is
repeated. For example:
sub TrafficCopRightOfWay ($$@) {
my($cfg, $parms, $domain) = @_;
$cfg->{RightOfWay}{$domain}++;
}
- ITERATE2 ("$$@;@")
-
ITERATE2 is an interesting twist on the ITERATE theme. This is used for directives
that take a mandatory first argument followed by a list of arguments to be
applied to the first. A familiar example is the
AddType directive, in which a series of file extensions are applied to a single
MIME type:
AddType image/jpeg JPG JPEG JFIF jfif
Like ITERATE, the callback function prototype is there primarily to provide a unique
signature that can be recognized by
command_table(). Apache will invoke your callback once for each item in the list. Each
time Apache runs your callback, it passes the routine the constant first
argument (``image/jpeg'' in the example above), and the current item in the
list (``JPG'' the first time around, ``JPEG'' the second time, and so on).
In the example above, the configuration processing routine will be run a
total of four times.
Let's say our Apache::TrafficCop needs to ticket cars parked on only the days when it is illegal, such as
street sweeping day:
TrafficCopTicket street_sweeping monday wednesday friday
The ITERATE2 callback to handle this directive would look like:
sub TrafficCopTicket ($$@;@) {
my($cfg, $parms, $violation, $day) = @_;
push @{ $cfg->{Ticket}{$violation} }, $day;
}
- RAW_ARGS ("$$$;*")
-
An args_how of RAW_ARGS instructs Apache to turn off parsing altogether. Instead it simply passes
your callback function the line of text following the directive. Leading
and trailing whitespace is stripped from the text, but it is not otherwise
processed. Your callback can then do whatever processing it wishes to
perform.
This callback receives four arguments, the third of which is a
string-valued scalar containing the text following the directive. The last
argument is a filehandle tied to the configuration file. This filehandle
can be used to read data from the configuration file starting on the line
following the configuration directive. It is most common to use a RAW_ARGS prototype when processing a ``container'' directive. For example, let's say
our TrafficCop needs to build a table of speed limits for a given district:
<TrafficCopSpeedLimits charlestown>
Elm St. 20
Payson Ave. 15
Main St. 25
</TrafficCopSpeedLimits>
By using the RAW_ARGS prototype, the third argument passed in will be ``charlestown>''; it's
up to the handler to strip the trailing
>. Now the handler can use the tied filehandle to read the following lines
of configuration, until it hits the container end token, </TrafficCopSpeedLimits>. For each configuration line that is read in, leading and trailing
whitespace is stripped, as is the trailing newline. The handler can then
apply any parsing rules it wishes to the line of data.
Example:
my $EndToken = "</TrafficCopSpeedLimits>";
sub TrafficCopSpeedLimits ($$$;*) {
my($cfg, $parms, $district, $cfg_fh) = @_;
$district =~ s/>$//;
while(defined(my $line = <$cfg_fh>)) {
last if $line =~ m:^$EndToken:o;
my($road, $limit) = ($line =~ /(.*)\s+(\S+)$/);
$cfg->{SpeedLimits}{$district}{$road} = $limit;
}
}
There is one other trick to making configuration containers work. In order
to be recognized as a valid directive, the
name entry passed to command_table() must contain the leading <. This token will be stripped by Apache::ExtUtils
when it maps the directive to the corresponding subroutine callback.
Example:
my @directives = (
{ name => '<TrafficCopSpeedLimits',
errmsg => 'a district speed limit container',
args_how => 'RAW_ARGS',
req_override => 'OR_ALL'
},
);
One other trick, which is not required but makes the module friendlier to
its users, is to provide a handler for the container end token. In our
example, the Apache configuration gears will never see the
</TrafficCopSpeedLimits> token, as our RAW_ARGS handler will read in that line and stop reading when it is seen. However in
order to catch cases in which the </TrafficCopSpeedLimits>
text appears without a preceding <TrafficCopSpeedLimits>
opening section, we need to turn the end token into a directive that simply
reports an error and exits.
command_table() includes a special test for directives whose names begin with </. When it encounters a directive like this, it strips the leading </ and trailing > characters from the name and tacks _END onto the end. This allows us to declare an end token callback like this
one:
my $EndToken = "</TrafficCopSpeedLimits>";
sub TrafficCopSpeedLimits_END () {
die "$EndToken outside a <TrafficCopSpeedLimits> container\n";
}
which corresponds to a directive definition like this one:
my @directives = (
...
{ name => '</TrafficCopSpeedLimits>',
errmsg => 'end of speed limit container',
args_how => 'NO_ARGS',
req_override => 'OR_ALL',
},
);
Now, should the server admin misplace the container end token, the server
will not start, complaining with this error message:
Syntax error on line 89 of httpd.conf:
</TrafficCopSpeedLimits> outside a <TrafficCopSpeedLimits> container
- FLAG ("$$$")
-
When the FLAG prototype is used, Apache will only allow the argument to be one of two
values, On or Off. This string value will be converted into an integer, 1 if the flag is On, 0 if it is Off. If the configuration argument is anything other than
On or Off, Apache will complain:
Syntax error on line 90 of httpd.conf:
TrafficCopRoadBlock must be On or Off
Example:
#Makefile.PL
my @directives = (
...
{ name => 'TrafficCopRoadBlock',
errmsg => 'On or Off',
args_how => 'FLAG',
req_override => 'OR_ALL',
},
);
#TrafficCop.pm
sub TrafficCopRoadBlock ($$$) {
my($cfg, $parms, $arg) = @_;
$cfg->{RoadBlock} = $arg;
}
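The calling conventions for the list-style and flag-style directives are easier to keep straight once you see them side by side. The driver below is purely hypothetical: dispatch() imitates, under our reading of the descriptions above, how a callback is invoked for ITERATE, ITERATE2 and FLAG, passing undef where Apache would pass the $parms object. Treating On and Off case-insensitively is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical driver imitating how a callback is called for a few
# args_how values. Not part of mod_perl.
sub dispatch {
    my ($args_how, $callback, $cfg, @args) = @_;
    if ($args_how eq 'ITERATE') {
        $callback->($cfg, undef, $_) for @args;            # once per argument
    }
    elsif ($args_how eq 'ITERATE2') {
        my ($first, @rest) = @args;
        $callback->($cfg, undef, $first, $_) for @rest;    # first arg repeated
    }
    elsif ($args_how eq 'FLAG') {
        my $flag = lc $args[0];
        die "must be On or Off\n" unless $flag eq 'on' || $flag eq 'off';
        $callback->($cfg, undef, $flag eq 'on' ? 1 : 0);   # On/Off becomes 1/0
    }
    else {
        $callback->($cfg, undef, @args);                   # TAKE1, TAKE2, ...
    }
}

my $cfg = {};
dispatch('ITERATE',  sub { $_[0]{RightOfWay}{$_[2]}++ }, $cfg, qw(.mil .gov));
dispatch('ITERATE2', sub { push @{ $_[0]{Ticket}{$_[2]} }, $_[3] }, $cfg,
         qw(street_sweeping monday wednesday friday));
dispatch('FLAG',     sub { $_[0]{RoadBlock} = $_[2] }, $cfg, 'On');

print join(" ", @{ $cfg->{Ticket}{street_sweeping} }), "\n";
```

The callbacks never see the whole argument list at once, which is why each one stores or accumulates its single argument into the configuration object.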
On successfully processing a directive, its handler should return a null
string or undef. If an error occurs while processing the directive, the
routine should return a string describing the source of the error. There is
also a third possibility. The configuration directive handler can return DECLINE_CMD, a constant that must be explicitly imported from Apache::Constants. This is used in the rare circumstance in which a module redeclares
another module's directive in order to override it. The directive handler
can then return DECLINE_CMD when it wishes the directive to fall through to the original module's
handler.
In addition to specifying the syntax of your custom configuration
directives, you can establish limits on how they can be used by specifying
the req_override key in the data passed to
command_table(). This option controls which parts of the configuration files the
directives can appear in, something that is called the directive's
``context'' in the Apache manual pages. This key should point to a bitmap
formed by combining the values of several C-language constants:
- RSRC_CONF
-
The directive can appear in any *.conf file outside a directory section (<Directory>, <Location> or
<Files>; also <FilesMatch> and kin). The directive is not allowed in .htaccess files.
- ACCESS_CONF
-
The directive can appear within directory sections. The directive is not
allowed in .htaccess files.
- OR_AUTHCFG
-
The directive can appear within directory sections, but not outside them.
It is also allowed within .htaccess files, provided that
AllowOverride AuthConfig is set for the current directory.
- OR_LIMIT
-
The directive can appear within directory sections, but not outside them.
It is also allowed within .htaccess files, provided that
AllowOverride Limit is set for the current directory.
- OR_OPTIONS
-
The directive can appear anywhere within the *.conf files as well as within .htaccess files provided that AllowOverride Options is set for the current directory.
- OR_FILEINFO
-
The directive can appear anywhere within the *.conf files as well as within .htaccess files provided that AllowOverride FileInfo is set for the current directory.
- OR_INDEXES
-
The directive can appear anywhere within the *.conf files as well as within .htaccess files provided that AllowOverride Indexes is set for the current directory.
- OR_ALL
-
The directive can appear anywhere. It is not limited in any way.
- OR_NONE
-
The directive cannot be overridden by any of the AllowOverride options.
The value of req_override is actually a bitmask. Apache derives the directive context by taking the
union of all the set bits. This allows you to combine contexts by combining
them with logical ORs and ANDs. For example, this combination of constants
will allow the directive to appear anywhere in a *.conf file, but forbid it
from ever being used in a .htaccess file:
'req_override' => 'RSRC_CONF | ACCESS_CONF'
As in the case of args_how, the value of the req_override key is actually not evaluated. It is simply a string that is written into
the .xs file and eventually passed to the C compiler. This means that any
errors in the string you provide for req_override will not be caught until the compilation phase.
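If the bitmask arithmetic is unfamiliar, the following sketch shows the idea in Perl. The numeric values are illustrative assumptions made for this example; the real constants are defined in Apache's C header files, and permits() is a hypothetical helper.

```perl
use strict;
use warnings;

# Illustrative bit values only (assumed for this sketch); the real constants
# are defined in Apache's C headers.
use constant {
    OR_OPTIONS  => 2,
    ACCESS_CONF => 64,
    RSRC_CONF   => 128,
};

# Hypothetical helper: true if the mask has the given context bit set.
sub permits {
    my ($mask, $bit) = @_;
    return ($mask & $bit) ? 1 : 0;
}

# ORing two contexts together sets both bits in the resulting mask.
my $req_override = RSRC_CONF | ACCESS_CONF;

printf "RSRC_CONF:   %d\n", permits($req_override, RSRC_CONF);
printf "ACCESS_CONF: %d\n", permits($req_override, ACCESS_CONF);
printf "OR_OPTIONS:  %d\n", permits($req_override, OR_OPTIONS);
```

Because no .htaccess-related bit is set in the combined mask, a directive declared this way is confined to the *.conf files.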
We've already seen one way to simplify your configuration directives by
allowing command_table() to deduce the correct args_how from the callback's function prototype. One other shortcut is available to
you as well.
If you pass command_table() a list of array references rather than hash references, then it will take the first item in each array ref to be the
name of the configuration directive, and the second item to be the
error/usage message. req_override will default to
OR_ALL (allow this directive anywhere), and args_how will be derived from the callback prototype, if present, TAKE123 if not.
By taking advantage of this shortcut, we can rewrite the list of
configuration directives at the beginning of this section more succinctly:
@directives = (
[
'TrafficCopSpeedLimit',
'an integer specifying the maximum allowable bytes per second',
],
[
'TrafficCopRightOfWay',
'list of domains that can go as fast as they want',
],
);
command_table(\@directives);
You can also mix and match the two configuration styles. The
@directives list can contain a mixture of array refs and hash
refs.
command_table() will do the right thing.
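As a rough picture of what "doing the right thing" involves, the following stand-alone sketch turns a mixed list into a uniform list of hash references. The normalize_directives() function is hypothetical and is not part of Apache::ExtUtils; it only applies the OR_ALL default described above for the array reference form and leaves args_how alone.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Apache::ExtUtils: turn a mixed list of
# array refs and hash refs into a uniform list of hash refs.
sub normalize_directives {
    my @specs = @_;
    my @out;
    for my $spec (@specs) {
        my %d;
        if (ref($spec) eq 'ARRAY') {
            # Short form: [ name, usage message ], allowed anywhere.
            %d = (name         => $spec->[0],
                  errmsg       => $spec->[1],
                  req_override => 'OR_ALL');
        }
        else {
            %d = %$spec;        # long form is copied unchanged
        }
        push @out, \%d;
    }
    return @out;
}

my @directives = (
    [ 'TrafficCopSpeedLimit',
      'an integer specifying the maximum allowable bytes per second' ],
    { name         => 'TrafficCopRightOfWay',
      errmsg       => 'list of domains that can go as fast as they want',
      req_override => 'OR_FILEINFO' },
);

for my $d (normalize_directives(@directives)) {
    print "$d->{name}: $d->{req_override}\n";
}
```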
Digging deeper, the process of module configuration is more complex than
you'd expect because Apache recognizes multiple levels of configuration
directives. There are global directives such as those contained within the
main httpd.conf file, per-server directives specific to virtual hosts contained within <VirtualHost>
sections, and per-directory configuration directives contained within
<Directory> sections and .htaccess files.
To understand why this issue is important, consider this series of
directives:
TrafficCopSpeedLimit 55
<Location /I-95>
TrafficCopRightOfWay .mil .gov
TrafficCopSpeedLimit 65
</Location>
<Location /I-95/exit-13>
TrafficCopSpeedLimit 30
</Location>
When processing URLs in /I-95/exit-13 there's a potential source of conflict because the TrafficCopSpeedLimit directive appears in several places. Intuitively, the more specific
directive should take precedence over the one in its parent directory, but
what about TrafficCopRightOfWay? Should /I-95/exit-13 inherit the value of
TrafficCopRightOfWay or ignore it?
On top of this, there is the issue of per-server and per-directory
configuration information. Some directives, such as ServerName, clearly apply to the server as a whole and have no reason to change on a
per-directory basis. Other directives, such as Options, apply to individual directories or URIs. Per-server and per-directory
configuration information should be handled separately from each other.
To handle these issues, modules may declare as many as four subroutines to
handle configuration issues: SERVER_CREATE(),
DIR_CREATE(), SERVER_MERGE() and DIR_MERGE().
The SERVER_CREATE() and DIR_CREATE() routines are responsible for creating per-server and per-directory
configuration records. If present, they are invoked before Apache has processed any of the module's configuration directives in order
to create a default per-server or per-directory configuration record.
Provided that at least one of the module's configuration directives appears
in the main part of the configuration file, SERVER_CREATE() will be called once for the main server host, and once for each virtual
host. Similarly,
DIR_CREATE() will be called once for each directory section (including <Location> and .htaccess files) in which at least one of the module's configuration directives
appears.
As Apache parses and processes the module's custom directives, it invokes
the directive callbacks to add information to the per-server and/or
per-directory configuration records. Since the vast majority of modules act
at a per-directory level, Apache passes the per-directory record to the
callbacks as the first argument. This is the $cfg argument that we saw in the previous examples. A callback that is concerned
with processing per-server directives will simply ignore this argument and
use the Apache::ModuleConfig class to retrieve the per-server configuration record manually. We'll see
how to do this later.
Later in the configuration process, one or both of the
SERVER_MERGE() and DIR_MERGE() subroutines may be called. These routines are responsible for merging a
parent per-server or per-directory configuration record with a
configuration that is lower in the hierarchy. For example, merging will be
required when one or more of a module's configuration directives appear in
both a
<Location /images> section and a <Location
/images/PNG> section. In this case, DIR_CREATE() will be called to create default configuration records for each of the /images and
/images/PNG directories, and the configuration directives' callbacks will be called to
set up the appropriate fields in these newly-created configurations. After
this, the DIR_MERGE()
subroutine is called once to merge the two configuration records together.
The merged configuration now becomes the per-directory configuration for /images/PNG.
This merging process is repeated as many times as needed. If a directory or
virtual host section contains none of a particular module's configuration
directives, then the configuration handlers are skipped and the
configuration for the closest ancestor of the directory is used instead.
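The repeated merging can be pictured with a stand-alone sketch built on the TrafficCop example from the start of this section. Everything here is an assumption made for illustration: the %records table stands in for the records that the directive callbacks would have filled in, the top-level speed limit is filed under "/" for simplicity, and effective_config() is a hypothetical stand-in for Apache's own walk through the hierarchy.

```perl
use strict;
use warnings;

# Per-location records as the directive callbacks might have left them.
# Locations with none of the module's directives simply have no entry.
my %records = (
    '/'             => { SpeedLimit => 55 },
    '/I-95'         => { SpeedLimit => 65, RightOfWay => ['.mil', '.gov'] },
    '/I-95/exit-13' => { SpeedLimit => 30 },
);

# Hypothetical stand-in for Apache's walk: merge every record whose
# location is a path prefix of the URI, least specific first.
sub effective_config {
    my ($uri) = @_;
    my %cfg;
    for my $loc (sort { length($a) <=> length($b) } keys %records) {
        my $prefix = $loc eq '/' ? '' : $loc;
        next unless $uri eq $loc || index($uri, "$prefix/") == 0;
        %cfg = (%cfg, %{ $records{$loc} });   # more specific keys win
    }
    return \%cfg;
}

my $cfg = effective_config('/I-95/exit-13/ramp');
print "limit=$cfg->{SpeedLimit} right_of_way=@{ $cfg->{RightOfWay} }\n";
```

A URI under /I-95/exit-13 ends up with the most specific speed limit while still seeing the right-of-way list set one level up, and a URI whose location has no record of its own falls back to whatever its ancestors provide.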
In addition to being called at server startup time, the
DIR_CREATE() function may be invoked again at request time, for example whenever Apache
processes a .htaccess file. The
DIR_MERGE() function is always invoked at request time in order to merge the current
directory's configuration with those of its parents.
When C modules implement configuration directive handlers they must, at the
very least, define a per-directory or per-server constructor for their
configuration data. However, if a Perl module does not implement a
constructor, mod_perl uses a default constructor that creates a hash reference blessed into the
current package's class. Later Apache calls your module's directive
callbacks to fill in this empty hash, which is, as usual, passed in as the $cfg argument.
Neither C nor Perl modules are required to implement merging routines. If
they do not, merging simply does not happen and Apache uses the most
specific configuration record. In the example at the top of this section,
the configuration record for the URI location
/I-95/exit-13 would contain the current value of
TrafficCopSpeedLimit, but no specific value for
TrafficCopRightOfWay.
Depending on your module's configuration system, you may wish to implement
one or more of the configuration creation and merging methods described
below. The method names use the all upper-case naming convention as they
are never called by other user code; instead they are invoked by mod_perl from the C level.
- DIR_CREATE()
-
If the directive handler's class defines or inherits a DIR_CREATE()
method, it will be invoked to create per-directory configuration
structures. This object is the first argument passed to all directive
handlers, which is normally used to store the configuration arguments. When
no DIR_CREATE() method is found, mod_perl will construct the configuration object for you:
bless {}, $Class;
You might use a DIR_CREATE() method to define various defaults or to use something other than a hash
reference to store the configuration values. This example uses a blessed
hash reference, and sets the value of TopLimit to a default value:
package Apache::TrafficCop;
sub new {
return bless {}, shift;
}
sub DIR_CREATE {
my $class = shift;
my $self = $class->new;
$self->{TopLimit} ||= 65;
return $self;
}
- DIR_MERGE()
-
When the <Directory> or <Location> hierarchy contains configuration entries at multiple levels, the directory
merger routine will be called on to merge all the directives into the
current, bottom-most level.
When defining a DIR_MERGE() method, the parent configuration object is passed as the first argument,
and the current object as the second. In the example DIR_MERGE() routine shown below, the keys of the current configuration will override
any like-named keys in the parent. The return value should be a merged
configuration object blessed into the module's class:
sub DIR_MERGE {
my($parent, $current) = @_;
my %new = (%$parent, %$current);
return bless \%new, ref($parent);
}
- SERVER_CREATE()
-
- SERVER_MERGE()
-
The SERVER_CREATE() and SERVER_MERGE() methods work just like
DIR_CREATE() and DIR_MERGE(). The difference is simply in the scope and timing in which they are
created and merged. The
SERVER_CREATE() method is only called once per configured virtual server. The SERVER_MERGE() method is invoked during server startup time, rather than at request time
like DIR_MERGE().
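The creation and merging methods can be exercised outside the server, which is a convenient way to check their logic before compiling anything. The sketch below is only an imitation of the sequence of calls: My::TrafficCopSketch is a hypothetical package, the callbacks are invoked by hand with undef in place of the Apache::CmdParms object, and the defaults are set directly in the constructor.

```perl
use strict;
use warnings;

package My::TrafficCopSketch;   # hypothetical package for this demo

sub DIR_CREATE {
    my $class = shift;
    return bless { TopLimit => 65 }, $class;    # default value
}

sub DIR_MERGE {
    my ($parent, $current) = @_;
    my %new = (%$parent, %$current);            # current keys win
    return bless \%new, ref($parent);
}

# Directive callbacks in the style used throughout this chapter.
sub TrafficCopSpeedLimit ($$$) {
    my ($cfg, $parms, $arg) = @_;
    $cfg->{TopLimit} = $arg;
}

sub TrafficCopRightOfWay ($$$) {
    my ($cfg, $parms, $arg) = @_;
    $cfg->{RightOfWay} = $arg;
}

package main;

# Imitate by hand what happens for a parent and a child section.
my $parent = My::TrafficCopSketch->DIR_CREATE;
My::TrafficCopSketch::TrafficCopRightOfWay($parent, undef, '.mil');

my $child = My::TrafficCopSketch->DIR_CREATE;
My::TrafficCopSketch::TrafficCopSpeedLimit($child, undef, 30);

my $merged = $parent->DIR_MERGE($child);
print "TopLimit=$merged->{TopLimit} RightOfWay=$merged->{RightOfWay}\n";
```

One consequence of combining these two particular routines is worth noticing: because DIR_CREATE() fills in TopLimit for every record, a child section that never mentions the speed limit still carries the default, and that default will override a value set explicitly in the parent. If this matters to your module, one option is to leave defaults out of DIR_CREATE() and apply them at the point where the value is read.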
The configuration mechanism uses two auxiliary classes,
Apache::CmdParms and Apache::ModuleConfig to pass information between Apache and your module.
Apache::ModuleConfig is the simpler of the two. It provides just a single method, get(), which retrieves a module's current configuration information. The return
value is the object created by the module's DIR_CREATE() or SERVER_CREATE() methods.
The get() method is called with the current request object or server object and an
optional additional argument indicating which module to retrieve the
configuration from. In the typical case, you'll omit this additional
argument to indicate that you want to fetch the configuration information
for the current module. For example, we saw this in the Apache::PassThru handler() routine:
my $cfg = Apache::ModuleConfig->get($r);
Had we used a SERVER_CREATE() method, the configuration data would be obtained using the request server
object:
my $cfg = Apache::ModuleConfig->get($r->server);
As a convenience, the per-directory configuration object for the current
module is always the first argument passed to any configuration processing
callback routine. Directive processing callbacks that need to operate on
server-specific configuration data should ignore this hash and fetch the
configuration data themselves using a technique we will discuss shortly.
It is also possible for one module to peek at another module's
configuration data by naming its package as the second argument to
get():
my $friends_cfg = Apache::ModuleConfig->get($r, 'Apache::TrafficCop');
You can now read and write the other module's configuration information!
Apache::CmdParms is a helpful class that Apache uses to pass a variety of configuration
information to modules. An
Apache::CmdParms object is the second argument passed to directive handler routines.
The various methods available from Apache::CmdParms are listed fully in the next chapter. The two you are most likely to use in
your modules are server() and path(). server() returns the
Apache::Server object corresponding to the current configuration. From this object you can
retrieve the virtual host's name, its configured port, the document root,
and other core configuration information. For example, this code retrieves
the administrator's name from within a configuration callback and adds it
to the module's configuration table:
sub TrafficCopActiveSergeant ($$$) {
my($cfg, $parms, $arg) = @_;
$cfg->{Sergeant} = $arg;
my $chief_of_police = $parms->server->server_admin;
$cfg->{ChiefOfPolice} = $chief_of_police;
}
Another place where the server() method is vital is when directive processing callbacks need to set
server-specific configuration information. In this case, the per-directory
configuration passed as the first callback argument can be ignored, and the
per-server configuration fetched by calling the Apache::ModuleConfig get()
with the server object as its argument.
Here's an example:
sub TrafficCopDispatcher ($$$) {
my($cfg, $parms, $arg) = @_;
my $scfg = Apache::ModuleConfig->get($parms->server);
$scfg->{Dispatcher} = $arg;
}
If the configuration-processing routine is being called to process a
container directive such as <Location> or
<Directory>, the Apache::CmdParms path() method will return the directive's argument. Depending on the context this
might be a URI, a directory path, a virtual host address, or a filename
pattern.
See Chapter 9 for the details on other methods that
Apache::ModuleConfig and Apache::CmdParms make available.
As a full example of creating custom configuration directives, we're going
to reimplement the standard mod_mime module in Perl. It has a total of seven different directives, each with a
different argument syntax. In addition to showing you how to handle a
moderately complex configuration setup, this example will show you in
detail what goes on behind the scenes as mod_mime associates a content handler with each URI request.
This module replaces the standard mod_mime module. You do not have to remove mod_mime from the standard compiled-in modules in order to test this module. However
if you wish to remove mod_mime anyway in order to convince yourself that the replacement actually works,
the easiest way to do this is to compile mod_mime as a dynamically loaded module and then comment out the lines in httpd.conf that load it. In either case, install Apache::MIME as the default MIME checking phase handler by putting this line in perl.conf or one of the other configuration files:
PerlTypeHandler Apache::MIME
Like the previous example, the configuration information is contained in
two files. Makefile.PL (Listing 8.3) describes the directives, and Apache/MIME.pm (Listing 8.4) defines the callbacks for processing the directives at
runtime. In order to reimplement
mod_mime, we need to reimplement a total of seven directives, including SetHandler, AddHandler, AddType and AddEncoding.
Makefile.PL defines the seven directives using the anonymous hash method. All but one
of the directives are set to use the OR_FILEINFO
context, which allows the directives to appear anywhere in the main
configuration files, and in .htaccess files as well provided that
AllowOverride FileInfo is also set. The exception, TypesConfig, is the directive that indicates where the default table of MIME types is
to be found. It only makes sense to process this directive during server
startup, so its context is given as RSRC_CONF, limiting the directive to
the body of any of the *.conf files. We don't specify the
args_how key for the directives, instead allowing
command_table() to figure out the syntax for us by looking at the function prototypes in MIME.pm.
Running perl Makefile.PL will now create a .xs file, which will be compiled into a loadable object
file during make.
Turning to Listing 8.4, we start by bringing in the DynaLoader and
Apache::ModuleConfig modules as we did in the overview example at the beginning of this section:
package Apache::MIME;
# File: Apache/MIME.pm
use strict;
use vars qw($VERSION @ISA);
use LWP::MediaTypes qw(read_media_types guess_media_type add_type add_encoding);
use DynaLoader ();
use Apache ();
use Apache::ModuleConfig ();
use Apache::Constants qw(:common DIR_MAGIC_TYPE DECLINE_CMD);
@ISA = qw(DynaLoader);
$VERSION = '0.01';
if($ENV{MOD_PERL}) {
no strict;
@ISA = qw(DynaLoader);
__PACKAGE__->bootstrap($VERSION);
}
We also bring in Apache, Apache::Constants and an LWP library called LWP::MediaTypes. The Apache and Apache::Constants
libraries will be used within the handler() subroutine, while the LWP library provides utilities for guessing MIME
types, languages and encodings from file extensions. As before, Apache::MIME needs to call bootstrap immediately after loading other modules in order to bring in its compiled
.xs half. Notice that we have to explicitly import the DIR_MAGIC_TYPE and DECLINE_CMD constants from
Apache::Constants, as these are not exported by default.
Let's skip over handler() for the moment and look at the seven configuration callbacks, TypesConfig(), AddType(),
AddEncoding() and so on.
sub TypesConfig ($$$) {
my($cfg, $parms, $file) = @_;
my $types_config = Apache->server_root_relative($file);
read_media_types($types_config);
#to co-exist with mod_mime.c
return DECLINE_CMD if Apache->module("mod_mime.c");
}
TypesConfig() has a function prototype of ``$$$'', indicating a directive syntax of TAKE1. It will be called with the name of the file holding the MIME types table
as its third argument. The callback retrieves the file name, turns it into
a server-relative path, and stores the path into a lexical variable. The
callback then calls the LWP function read_media_types() to parse the file and add the MIME types found there to an internal table
maintained by LWP::MediaTypes. When the LWP::MediaTypes
function guess_media_type() is called subsequently, this table will be consulted. Note that there is no
need, in this case, to store the configuration information into the $cfg hash reference because the information is only needed at the time the
configuration directive is processed.
Another important detail is that the TypesConfig handler will return DECLINE_CMD if the mod_mime module is installed. This gives mod_mime a chance to also read in the TypesConfig file. If mod_mime isn't given this opportunity, it will complain bitterly and abort server
startup. However we don't allow any of the other directive handlers to fall
through to mod_mime in this way, effectively shutting
mod_mime out of the loop.
sub AddType ($$@;@) {
my($cfg, $parms, $type, $ext) = @_;
add_type($type, $ext);
}
The AddType() directive callback is even shorter. Its function prototype is ``$$@;@'',
indicating an ITERATE2 syntax. This means that if the AddType directive looks like this:
AddType application/x-chicken-feed .corn .barley .oats
the function will be called three times. Each time the callback is invoked
its third argument will be ``application/x-chicken-feed'', and the fourth
argument will be successively set to ``.corn'', ``.barley'' and ``.oats''.
The function recovers the third and fourth parameters and passes them to
the LWP::MediaTypes function add_type(). This simply adds the file type and extension to LWP's internal table.
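The calling pattern is easy to see in a stand-alone sketch. The iterate2() dispatcher below is hypothetical and only imitates what Apache does with an ITERATE2 directive; AddTypeSketch() has the same argument layout as AddType() but merely records its arguments, and its prototype is left off because nothing in this demonstration inspects it.

```perl
use strict;
use warnings;

my @calls;

# Callback with the same argument layout as AddType(); it only records
# what it was given.
sub AddTypeSketch {
    my ($cfg, $parms, $type, $ext) = @_;
    push @calls, [$type, $ext];
}

# Hypothetical dispatcher imitating ITERATE2: the first word is repeated,
# and each remaining word is handed over in its own call.
sub iterate2 {
    my ($callback, $line) = @_;
    my ($first, @rest) = split ' ', $line;
    $callback->({}, undef, $first, $_) for @rest;
}

iterate2(\&AddTypeSketch, 'application/x-chicken-feed .corn .barley .oats');
print "$_->[0] $_->[1]\n" for @calls;
```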
sub AddEncoding ($$@;@) {
my($cfg, $parms, $enc, $ext) = @_;
add_encoding($enc, $ext);
}
AddEncoding() is similar to AddType(), but uses the
LWP::MediaTypes add_encoding() function to associate a series of file extensions with a MIME encoding.
More interesting are the SetHandler() and AddHandler()
callbacks:
sub SetHandler ($$$) {
my($cfg, $parms, $handler) = @_;
$cfg->{'handler'} = $handler;
}
sub AddHandler ($$@;@) {
my($cfg, $parms, $handler, $ext) = @_;
$cfg->{'handlers'}->{$ext} = $handler;
}
The job of the SetHandler directive is to force requests for the specified path to be passed to the
indicated content handler, no questions asked. AddHandler(), in contrast, adds a series of file extensions to the table consulted by
the MIME type checker when it attempts to choose the proper content handler
for the request. In both cases, the configuration information is needed
again at request time, so we have to keep it in long term storage within
the $cfg
hash.
SetHandler() is again a ``TAKE1'' type of callback. It recovers the content handler name
from its third argument and stores it in the $cfg
data structure under the key handler. AddHandler() is an ``ITERATE2'' callback which receives the name of a content handler
and a file extension as its third and fourth arguments. The callback stuffs
this information into an anonymous hash maintained in $cfg
under the handlers key.
sub ForceType ($$$) {
my($cfg, $parms, $type) = @_;
$cfg->{'type'} = $type;
}
The ForceType directive is used to force all documents in a path to be a particular MIME
type, regardless of their file extensions. It's often used within a <Directory> section to force all documents contained within the directory to be a
particular MIME type, and is helpful for dealing with legacy documents that
don't have informative file extensions. The ForceType() callback uses TAKE1 syntax in which the required argument is a MIME type. The callback recovers
the MIME type and stores it in the $cfg hash reference under the key
type.
sub AddLanguage ($$@;@) {
my($cfg, $parms, $language, $ext) = @_;
$ext =~ s/^\.//;
$cfg->{'language_types'}->{$ext} = lc $language;
}
The last directive handler, AddLanguage(), implements the
AddLanguage directive, in which a series of file extensions is associated with a
language code (e.g. ``fr'' for French, ``en'' for English). It is an ITERATE2 callback and works just like
AddHandler(), except that the dot is stripped off the file extension before storing it
into the $cfg hash. This is because of an old inconsistency in the way that mod_mime works, in which the
AddLanguage directive expects dots in front of the file extensions, while the AddType and AddHandler directives do not.
Now we turn our attention to the handler() subroutine itself. This code will be called at request time during the MIME
type checking phase. It has five responsibilities:
1. Guess the MIME content type for the requested document.
2. Guess the content encoding for the requested document.
3. Guess the content language(s) for the requested document.
4. Set the content handler for the request.
5. If the requested document is a directory, initiate special
directory processing.
Items 1 through 3 are important, but not critical. The content type,
encoding and language may well be changed during the response phase by the
content handler. In particular, the MIME type is very frequently changed
(e.g. by CGI scripts). Item 4, however, is crucial since it determines what
code will be invoked to respond to the request. It is also necessary to
detect and treat requests for directory names specially, using a
pseudo-MIME type to initiate Apache's directory handling.
sub handler {
my $r = shift;
if(-d $r->finfo) {
$r->content_type(DIR_MAGIC_TYPE);
return OK;
}
handler() begins by shifting the Apache request object off the subroutine stack. The
subroutine now does a series of checks on the requested document. First, it
checks whether $r->finfo() refers to a directory. If so, then handler() sets the request content type to a pseudo-MIME type defined by the constant DIR_MAGIC_TYPE and returns OK. Setting this content type signals Apache that the user requested a directory, causing the server to
pass control to any content handlers that list this constant among the MIME
types they handle. mod_dir and mod_autoindex are two of the standard modules that are capable of generating directory
listings.
my($type, @encoding) = guess_media_type($r->filename);
$r->content_type($type) if $type;
unshift @encoding, $r->content_encoding if $r->content_encoding;
$r->content_encoding(join ", ", @encoding) if @encoding;
If the file is not a directory, then we try to guess its MIME type and
encoding. We call on the LWP::MediaTypes function
guess_media_type() to do the work, passing it the filename and receiving a MIME type and list
of encodings in return. Although unusual, it is theoretically possible for
a file to have multiple encodings and LWP::MediaTypes allows this. The returned type is immediately used to set the MIME type of
the requested document by calling the request object's content_type() method. Likewise, the list of encodings is added to the request using content_encoding() after joining them together into a comma-delimited string. The only
subtlety here is that we honor any previously-defined encoding for the
requested document by adding it to the list of encodings returned by
guess_media_type(). This is in case the handler for a previous phase happened to add some
content encoding.
Now comes some processing that depends on the values in the configuration
hash, so we recover the $cfg variable by calling
Apache::ModuleConfig's get() method:
my $cfg = Apache::ModuleConfig->get($r);
The next task is to parse out the requested file's extensions and use them
to set the file's MIME type and/or language.
for my $ext (LWP::MediaTypes::file_exts($r->filename)) {
if(my $type = $cfg->{'language_types'}->{$ext}) {
my $ltypes = $r->content_languages;
push @$ltypes, $type;
$r->content_languages($ltypes);
}
Using the LWP::MediaTypes function file_exts(), we split out all the extensions in the
requested document's filename and loop through them. This allows a file
named ``travel.html.fr'' to be recognized and dealt with appropriately.
We first check whether the extension matches one of the extensions in the
configuration object's language_types key. If so, we use the extension to set the language code for the document.
Although it is somewhat unusual, the HTTP specification allows a document
to specify multiple languages in its Content-Language field, so we go to some lengths to merge multiple language codes into one
long list which we then set with the request object's content_languages() method.
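To see what the loop works with, here is a stand-alone sketch that does not need LWP or Apache. The file_exts_sketch() function is a stand-in written for this example and reflects an assumption about how file_exts() behaves (every dot-separated part after the first, last extension first); the %language_types table plays the role of the hash that AddLanguage() builds.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Stand-in for LWP::MediaTypes::file_exts(), written as an assumption
# about its behaviour: every dot-separated part after the first, with
# the last extension returned first.
sub file_exts_sketch {
    my ($path) = @_;
    my @parts = reverse split /\./, basename($path);
    pop @parts;                 # the part before the first dot is the name
    return @parts;
}

# Table as AddLanguage() would have built it (dots already stripped).
my %language_types = (fr => 'fr', en => 'en');

my @exts = file_exts_sketch('/docs/travel.html.fr');
my @languages;
for my $ext (@exts) {
    push @languages, $language_types{$ext} if $language_types{$ext};
}

print "extensions: @exts\n";
print "languages:  @languages\n";
```

Only the ``fr'' extension finds a match in the language table; ``html'' is left for the MIME type lookup.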
if(my $type = $cfg->{'handlers'}->{$ext} and !$r->proxyreq) {
$r->handler($type);
}
}
While still in the loop, we deal with the content handler for the request.
We check whether the extension is among the ones defined in the
configuration variable's handlers hash. If so, we call the request object's handler() method to set the content handler to the indicated value. The only catch is
that if the current transaction is a proxy request, we do not want to alter
the content handler, because another module may have set the content
handler during the URI translation phase.
$r->content_type($cfg->{'type'}) if $cfg->{'type'};
$r->handler($cfg->{'handler'}) if $cfg->{'handler'};
After looping through the file extensions, we handle the ForceType
and SetHandler directives, which have the effect of overriding file extensions. If the
configuration key type is non-empty, we use it to force the MIME type to the specified value.
Likewise, if
handler is non-empty, we again call handler(), replacing whatever content handler was there before.
return OK;
}
At the end of handler() we return OK to tell Apache that the MIME type checking phase has been
handled successfully.
Although this module was presented mainly as an exercise, with minimal work
it can be used to improve on mod_mime. For example, you might have noticed that the standard mod_mime has no ForceEncoding or
ForceLanguage directives that allow you to override the file extension mappings in the
way that you can with ForceType. This is easy enough to fix in Apache::MIME by adding the appropriate directive definitions and callbacks.
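As a starting point, the callbacks for two such directives might look like the sketch below. Both directive names, and the 'encoding' and 'language' keys, are inventions for this example; the callbacks simply follow the pattern of ForceType() and are exercised here with a plain hash so that they can be tried without a server.

```perl
use strict;
use warnings;

# Hypothetical additions to Apache::MIME; neither directive exists in
# the standard mod_mime. Both follow the pattern of ForceType().
sub ForceEncoding ($$$) {
    my ($cfg, $parms, $enc) = @_;
    $cfg->{'encoding'} = $enc;
}

sub ForceLanguage ($$$) {
    my ($cfg, $parms, $language) = @_;
    $cfg->{'language'} = lc $language;
}

# Exercise the callbacks outside the server with a plain hash.
my $cfg = {};
ForceEncoding($cfg, undef, 'x-gzip');
ForceLanguage($cfg, undef, 'FR');
print join(', ', map { "$_=$cfg->{$_}" } sort keys %$cfg), "\n";
```

To finish the job you would also add two entries to @directives in Makefile.PL and two lines to handler(), next to the ForceType and SetHandler lines, that apply the stored values with content_encoding() and content_languages().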
- Listing 8.3: Makefile.PL for Apache::MIME
-
package Apache::MIME;
# File: Makefile.PL
use ExtUtils::MakeMaker;
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
use Apache::src ();
use Apache::ExtUtils qw(command_table);
my @directives = (
{ name => 'SetHandler',
errmsg => 'a handler name',
req_override => 'OR_FILEINFO' },
{ name => 'AddHandler',
errmsg => 'a handler name followed by one or more file extensions',
req_override => 'OR_FILEINFO' },
{ name => 'ForceType',
errmsg => 'a MIME type',
req_override => 'OR_FILEINFO' },
{ name => 'AddType',
errmsg => 'a mime type followed by one or more file extensions',
req_override => 'OR_FILEINFO' },
{ name => 'AddLanguage',
errmsg => 'a language (e.g., fr), followed by one or more file extensions',
req_override => 'OR_FILEINFO' },
{ name => 'AddEncoding',
errmsg => 'an encoding (e.g., gzip), followed by one or more file extensions',
req_override => 'OR_FILEINFO' },
{ name => 'TypesConfig',
errmsg => 'the MIME types config file',
req_override => 'RSRC_CONF'
},
);
command_table \@directives;
WriteMakefile(
'NAME' => __PACKAGE__,
'VERSION_FROM' => 'MIME.pm',
'INC' => Apache::src->new->inc,
);
__END__
- Listing 8.4: Apache::MIME Reimplements the Standard
mod_mime module.
-
package Apache::MIME;
# File: Apache/MIME.pm
use strict;
use vars qw($VERSION @ISA);
use LWP::MediaTypes qw(read_media_types guess_media_type add_type add_encoding);
use DynaLoader ();
use Apache ();
use Apache::ModuleConfig ();
use Apache::Constants qw(:common DIR_MAGIC_TYPE DECLINE_CMD);
@ISA = qw(DynaLoader);
$VERSION = '0.01';
if($ENV{MOD_PERL}) {
no strict;
@ISA = qw(DynaLoader);
__PACKAGE__->bootstrap($VERSION);
}
sub handler {
my $r = shift;
if(-d $r->finfo) {
$r->content_type(DIR_MAGIC_TYPE);
return OK;
}
my($type, @encoding) = guess_media_type($r->filename);
$r->content_type($type) if $type;
unshift @encoding, $r->content_encoding if $r->content_encoding;
$r->content_encoding(join ", ", @encoding) if @encoding;
my $cfg = Apache::ModuleConfig->get($r);
for my $ext (LWP::MediaTypes::file_exts($r->filename)) {
if(my $type = $cfg->{'language_types'}->{$ext}) {
my $ltypes = $r->content_languages;
push @$ltypes, $type;
$r->content_languages($ltypes);
}
if(my $type = $cfg->{'handlers'}->{$ext} and !$r->proxyreq) {
$r->handler($type);
}
}
$r->content_type($cfg->{'type'}) if $cfg->{'type'};
$r->handler($cfg->{'handler'}) if $cfg->{'handler'};
return OK;
}
sub TypesConfig ($$$) {
my($cfg, $parms, $file) = @_;
my $types_config = Apache->server_root_relative($file);
read_media_types($types_config);
#to co-exist with mod_mime.c
return DECLINE_CMD if Apache->module("mod_mime.c");
}
sub AddType ($$@;@) {
my($cfg, $parms, $type, $ext) = @_;
add_type($type, $ext);
}
sub AddEncoding ($$@;@) {
my($cfg, $parms, $enc, $ext) = @_;
add_encoding($enc, $ext);
}
sub SetHandler ($$$) {
my($cfg, $parms, $handler) = @_;
$cfg->{'handler'} = $handler;
}
sub AddHandler ($$@;@) {
my($cfg, $parms, $handler, $ext) = @_;
$cfg->{'handlers'}->{$ext} = $handler;
}
sub ForceType ($$$) {
my($cfg, $parms, $type) = @_;
$cfg->{'type'} = $type;
}
sub AddLanguage ($$@;@) {
my($cfg, $parms, $language, $ext) = @_;
$ext =~ s/^\.//;
$cfg->{'language_types'}->{$ext} = lc $language;
}
1;
__END__
We've just seen how you can configure Perl modules using the Apache
configuration mechanism. Now we turn it around to show you how to configure
Apache from within Perl. Instead of configuring Apache by hand editing a
set of configuration files, the Perl API allows you to write a set of Perl
statements to dynamically configure Apache at run time. This gives you
limitless flexibility. For example, you can create complex
configurations involving hundreds of virtual hosts without manually typing
hundreds of <VirtualHost> sections into
httpd.conf. Or you can write a master configuration file that will work without
modification on any machine in a ``server farm.'' You could even look up
configuration information at run time from a relational database.
The key to Perl-based server configuration is the <Perl> directive. Unlike the other directives defined by mod_perl, this directive is paired to a corresponding </Perl> directive, forming a Perl
section.
When Apache hits a Perl section during startup time, it passes everything
within the section to mod_perl. mod_perl, in turn, compiles the contents of the section by evaluating it inside the
Apache::ReadConfig package. After compilation is finished,
mod_perl walks the Apache::ReadConfig symbol table looking for global variables with the same names as Apache's
configuration directives. The values of those globals are then fed into
Apache's normal configuration mechanism as if they'd been typed directly
into the configuration file. The upshot of all this is that instead of
setting the account under which the server runs with the User
directive:
User www
you can write this:
<Perl>
$User = 'www';
</Perl>
This doesn't look like much of a win until you consider that you can set
this global using any arbitrary Perl expression, as for example:
<Perl>
my $hostname = `hostname`;
$User = 'www' if $hostname =~ /^papa-bear/;
$User = 'httpd' if $hostname =~ /^momma-bear/;
$User = 'nobody' if $hostname =~ /^goldilocks/;
</Perl>
The Perl global that you set must match the spelling of the corresponding
Apache directive. Globals that do not match known Apache directives are
silently ignored. Capitalization is not currently significant.
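To get a feel for the symbol table walk, here is a stand-alone sketch that can be run without Apache. It is only an approximation of the idea: the way mod_perl actually inspects the package and hands values to Apache is more involved, and the "Directive value" output format is invented for the demonstration.

```perl
use strict;
use warnings;

# Stand-in for the body of a <Perl> section.
my $section = q{
    $User = 'www';
    $Port = 8080;
    @DirectoryIndex = qw(index.html index.htm);
};

# Compile it inside the package that mod_perl uses for this purpose.
eval "package Apache::ReadConfig; no strict 'vars'; no warnings 'once'; $section; 1"
    or die $@;

# Walk the package's symbol table and collect the scalars and arrays
# that were set. The output format is our own invention.
my @lines;
{
    no strict 'refs';
    for my $name (sort keys %{'Apache::ReadConfig::'}) {
        my $scalar = ${"Apache::ReadConfig::$name"};
        my @array  = @{"Apache::ReadConfig::$name"};
        push @lines, "$name $scalar" if defined $scalar;
        push @lines, "$name @array"  if @array;
    }
}
print "$_\n" for @lines;
```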
In addition to single-valued directives such as User, Group and
ServerRoot, you can use <Perl> sections to set multivalued directives such as DirectoryIndex and AddType. You can also configure multipart sections such as <Directory> and <VirtualHost>. Depending on the directive, the Perl global you need to set may be a
scalar, an array or a hash. To figure out which type of Perl variable to
use, follow these rules:
- Directive Takes no Arguments
-
There are few examples of configuration directives that take no arguments.
The only one that occurs in the standard Apache modules is
CacheNegotiatedDocs, which is part of mod_negotiation. To create a non-argument directive, set the corresponding scalar variable
to the empty string '':
$CacheNegotiatedDocs = '';
- Directive Takes one Argument
-
This is probably the most common case. Set the corresponding global to the
value of your choice. Example:
$Port = 8080;
- Directive Takes Multiple Arguments
-
These include directives such as DirectoryIndex and AddType. Create a global array with the name of the directive and set it to the
list of desired arguments. For example:
@DirectoryIndex = map { "index.$_" } qw(html htm shtml cgi);
An alternative to this is to create a scalar variable containing the usual
value of the directive as a string, for example:
$DirectoryIndex = "index.html index.htm index.shtml index.cgi";
- Directive is Repeated Multiple Times
-
If a directive is repeated multiple times with different arguments each
time, you can represent it as an array of arrays. This example using the AddIcon directive shows how:
@AddIcon = (
[ '/icons/compressed.gif' => qw(.Z .z .gz .tgz .zip) ],
[ '/icons/layout.gif' => qw(.html .shtml .htm .pdf) ],
);
- Directive is a Block Section with Begin and End Tags
-
Configuration sections like <VirtualHost> and <Directory>
are mapped onto Perl hashes. Use the directive's argument (the hostname,
directory or URI) as the hash key, and make the value stored at this key an
anonymous hash containing the desired directive/value pairs. This is easier
to see than to describe. Consider the following virtual host section:
<VirtualHost 192.168.2.5:80>
ServerName www.fishfries.org
DocumentRoot /home/httpd/fishfries/htdocs
ErrorLog /home/httpd/fishfries/logs/error.log
TransferLog /home/httpd/fishfries/logs/access.log
ServerAdmin [email protected]
</VirtualHost>
You can represent this in a <Perl> section by the following code:
$VirtualHost{'192.168.2.5:80'} = {
ServerName => 'www.fishfries.org',
DocumentRoot => '/home/httpd/fishfries/htdocs',
ErrorLog => '/home/httpd/fishfries/logs/error.log',
TransferLog => '/home/httpd/fishfries/logs/access.log',
ServerAdmin => '[email protected]',
};
There is no special Perl variable that maps to the <IfModule>
directive container; however, the Apache module() method provides the same functionality. Example:
if(Apache->module("mod_ssl.c")) {
push @Include, "ssl.conf";
}
The Apache define method can be used to implement an
<IfDefine> container. Example:
if(Apache->define("MOD_SSL")) {
push @Include, "ssl.conf";
}
Certain configuration blocks may require directives to be in a particular
order. As you should know, Perl does not maintain hash values in any
predictable order. Should you need to preserve order with hashes inside <Perl> sections, simply install Gurusamy Sarathy's Tie::IxHash module from CPAN. Once installed, mod_perl will tie the %VirtualHost, %Directory, %Location and %Files hashes to this class, preserving their order when the Apache configuration
is generated.
- Directive is a Block Section with Multiple Same-Value Keys
-
The Apache named virtual host mechanism provides a way to configure virtual
hosts using the same IP address. For example:
NameVirtualHost 192.168.2.5
<VirtualHost 192.168.2.5>
ServerName one.fish.net
ServerAdmin [email protected]
</VirtualHost>
<VirtualHost 192.168.2.5>
ServerName red.fish.net
ServerAdmin [email protected]
</VirtualHost>
In this case, the %VirtualHost syntax from the previous section would not work, since assigning a hash
reference for the given IP address will overwrite the original entry. The
solution is to use an array reference whose values are hash references, one
for each virtual host entry. For example:
$VirtualHost{'192.168.2.5'} = [
{
ServerName => 'one.fish.net',
...
ServerAdmin => '[email protected]',
},
{
ServerName => 'red.fish.net',
...
ServerAdmin => '[email protected]',
},
];
- Directive is a Nested Block
-
Nested block sections are mapped onto anonymous hashes, much like main
sections. For example, to put two <Directory> sections inside the virtual host of the previous example, you can use this
code:
<Perl>
my $root = '/home/httpd/fishfries';
$VirtualHost{'192.168.2.5:80'} = {
ServerName => 'www.fishfries.org',
DocumentRoot => "$root/htdocs",
ErrorLog => "$root/logs/error.log",
TransferLog => "$root/logs/access.log",
ServerAdmin => '[email protected]',
Directory => {
"$root/htdocs" => {
Options => 'Indexes FollowSymlinks',
AllowOverride => 'Options Indexes Limit FileInfo',
Order => 'deny,allow',
Deny => 'from all',
Allow => 'from fishfries.org',
},
"$root/cgi-bin" => {
AllowOverride => 'None',
Options => 'ExecCGI',
SetHandler => 'cgi-script',
},
},
};
</Perl>
Notice that all the usual Perlisms, such as interpolation of the $root
variable into the double-quoted strings, still work here. Another thing to
see in this example is that in this case we've chosen to write the
multi-valued Options directive as a single string:
Options => 'Indexes FollowSymlinks',
The alternative would be to use an anonymous array for the directive's
arguments, as in:
Options => ['Indexes','FollowSymlinks'],
Both methods work. In Perl, there's always more than one way to do it. The
only gotcha is that you must always be sure of what is an argument list and
what isn't. In the Options directive, ``Indexes'' and ``FollowSymlinks'' are distinct arguments and
can be represented as an anonymous array. In the Order directive, the string ``deny,allow'' is a single argument, and representing
it as the array
['deny','allow'] will not work, even though it looks like it should (use the string ``deny,allow''
instead).
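The mapping rules above can be summarized in a short standalone sketch that turns the Perl data structures back into configuration text. The render() helper is hypothetical and is not how mod_perl does the conversion internally; it assumes that only VirtualHost, Directory, Location and Files are block sections, and it sorts hash keys purely to make the output reproducible.

```perl
use strict;
use warnings;

# Names treated as block sections in this sketch (an assumption).
my %is_section = map { $_ => 1 } qw(VirtualHost Directory Location Files);

# Hypothetical helper: render one directive, or one family of sections,
# as Apache configuration text.
sub render {
    my ($name, $value, $indent) = @_;
    $indent = '' unless defined $indent;

    if ($is_section{$name}) {
        my $text = '';
        for my $arg (sort keys %$value) {
            my $body = $value->{$arg};
            # A key holds one hash ref, or an array ref of hash refs
            # when the same argument is used for several sections.
            for my $block (ref($body) eq 'ARRAY' ? @$body : $body) {
                $text .= "$indent<$name $arg>\n";
                for my $key (sort keys %$block) {
                    $text .= render($key, $block->{$key}, "$indent  ");
                }
                $text .= "$indent</$name>\n";
            }
        }
        return $text;
    }

    # An array of arrays means the directive is repeated.
    if (ref($value) eq 'ARRAY' && ref($value->[0]) eq 'ARRAY') {
        return join '', map { render($name, $_, $indent) } @$value;
    }

    # A plain array is an argument list; a scalar is used as is.
    my $args = ref($value) eq 'ARRAY' ? join(' ', @$value) : $value;
    return "$indent$name $args\n";
}

my $root = '/home/httpd/fishfries';
my %VirtualHost = (
    '192.168.2.5:80' => {
        ServerName => 'www.fishfries.org',
        Directory  => {
            "$root/htdocs" => {
                Options => ['Indexes', 'FollowSymlinks'],
                Order   => 'deny,allow',
            },
        },
    },
);

print render(VirtualHost => \%VirtualHost);
```

Running this prints a <VirtualHost> block containing a nested <Directory> block, with Options expanded from the anonymous array and Order left as the single argument deny,allow.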
<Perl> sections are available if you built and installed mod_perl
with the PERL_SECTIONS configuration variable set (Appendix B). They are evaluated in the order in
which they appear in httpd.conf, srm.conf and
access.conf. This allows you to use later <Perl> sections to override values declared in earlier parts of the configuration
files.
If there is a syntax error in the Perl code causing it to fail during
compilation, Apache will report the problem and the server will not start.
One way to catch Perl syntax errors ahead of time is to structure your
<Perl> sections like this:
<Perl>
#!perl
#... code here ...
__END__
</Perl>
You can now directly syntax check the configuration file using the Perl
interpreter's -cx switches. -c makes Perl perform a syntax check, and -x tells the interpreter to ignore all junk prior to the
#!perl line:
% perl -cx httpd.conf
httpd.conf syntax OK
If the Apache configuration generated from your Perl code produces a syntax
error, this message will be sent to the server error log, but the server
will still start. In general, it is always a good idea to look at the error log
after starting the server to make sure startup went smoothly. If you have
not picked up this good habit already, we strongly recommend you do so when
working with <Perl>
configuration sections.
Another helpful trick is to build mod_perl with the PERL_TRACE
configuration option set to true. Then, when the environment variable
MOD_PERL_TRACE is set to s, httpd will output diagnostics showing how the <Perl> section globals are converted into directive string values.
Another tool that is occasionally useful is the
Apache::PerlSections module. It defines two public routines named
dump() and store(). dump() dumps out the current contents of the <Perl> section as a pretty-printed string. store() does the same, but writes the contents to the file of your choice. Both
methods are useful for making sure that the configuration you are getting
is what you expect.
Apache::PerlSections requires the Perl Devel::Symdump and
Data::Dumper modules, both available on CPAN. Here is a simple example of its use:
<Perl>
#!perl
use Apache::PerlSections();
$User = 'nobody';
$VirtualHost{'192.168.2.5:80'} = {
ServerName => 'www.fishfries.org',
DocumentRoot => '/home/httpd/fishfries/htdocs',
ErrorLog => '/home/httpd/fishfries/logs/error.log',
TransferLog => '/home/httpd/fishfries/logs/access.log',
ServerAdmin => '[email protected]',
};
print STDERR Apache::PerlSections->dump();
__END__
</Perl>
This will cause the following to appear on the command line at server
startup time:
package Apache::ReadConfig;
#scalars:
$User = 'nobody';
#arrays:
#hashes:
%VirtualHost = (
'192.168.2.5:80' => {
'ServerAdmin' => '[email protected]',
'ServerName' => 'www.fishfries.org',
'DocumentRoot' => '/home/httpd/fishfries/htdocs',
'ErrorLog' => '/home/httpd/fishfries/logs/error.log',
'TransferLog' => '/home/httpd/fishfries/logs/access.log'
}
);
1;
__END__
The output from dump() and store() can be stored to a file and reloaded with a require statement. This allows you to create your configuration in a modular
fashion:
<Perl>
require "standard_configuration.pl";
require "virtual_hosts.pl";
require "access_control.pl";
</Perl>
More information about Apache::PerlSections can be found in Appendix A.
If the Perl configuration syntax seems a bit complex for your needs, there
is a simple alternative. The special variables $PerlConfig
and @PerlConfig are treated as raw Apache configuration data. Their values are fed directly
to the Apache configuration engine and treated just as if they were static
configuration data.
Examples:
<Perl>
$PerlConfig = "User $ENV{USER}\n";
$PerlConfig .= "ServerAdmin $ENV{USER}\@$hostname\n";
</Perl>
<Perl>
for my $host (qw(one red two blue)) {
$host = "$host.fish.net";
push @PerlConfig, <<EOF;
Listen $host
<VirtualHost $host>
ServerAdmin webmaster\@$host
ServerName $host
# ... more config here ...
</VirtualHost>
EOF
}
</Perl>
One more utility method is available: Apache->httpd_conf, which simply pushes each argument onto the @PerlConfig array, tacking a newline onto the end of each. Example:
Apache->httpd_conf(
"User $ENV{USER}",
"ServerAdmin $ENV{USER}\@$hostname",
);
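The behavior just described is simple enough to imitate outside the server. The following sketch uses a hypothetical stand-in function, httpd_conf(), and a plain @PerlConfig array; the values of $user and $hostname are made-up placeholders. In a real <Perl> section both variables must be defined in the section that uses them.

```perl
use strict;
use warnings;

our @PerlConfig;    # stand-in for the special @PerlConfig global

# Hypothetical stand-in for Apache->httpd_conf: append each argument
# to @PerlConfig with a trailing newline.
sub httpd_conf {
    my @lines = @_;
    push @PerlConfig, map { "$_\n" } @lines;
    return scalar @PerlConfig;
}

my $user     = 'nobody';             # placeholder value
my $hostname = 'www.example.com';    # placeholder value

httpd_conf(
    "User $user",
    "ServerAdmin $user\@$hostname",
);

print @PerlConfig;
```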
For a complete example of an Apache configuration constructed with
<Perl> sections, we'll look at Doug's setup. As a freelance
contractor, Doug must often configure his development server in a brand new
environment. Rather than creating a customized server configuration file
each time, Doug uses a generic configuration that can be brought up
anywhere simply by running:
% httpd -f $HOME/httpd.conf
This one step automatically creates the server and document roots if they
don't exist, as well as the log and configuration directories. It also
detects the user that it is being run as, and configures the
User and Group directives to match.
Listing 8.5 shows a slightly simplified version of Doug's
httpd.conf. It contains only two hard-coded Apache directives:
# file: httpd.conf
PerlPassEnv HOME
Port 9008
There's a PerlPassEnv directive with the value of ``HOME'', required in order to make the value
of this environment variable visible to the code contained within the <Perl> section, and there's a
Port directive set to Doug's favorite port number.
The rest of the configuration file is entirely written in Perl:
<Perl>
#!perl
$ServerRoot = "$ENV{HOME}/www";
The <Perl> section begins by choosing a path for the server root. Doug likes to have
his test environment set up under his home directory in ~/www, so the variable $ServerRoot is set to $ENV{HOME}/www. The server root will now be correctly configured regardless of whether
users' directories are stored under /home, /users, or /var/users.
unless (-d "$ServerRoot/logs") {
for my $dir ("", qw(logs conf htdocs perl)) {
mkdir "$ServerRoot/$dir", 0755;
}
require File::Copy;
File::Copy::cp($0, "$ServerRoot/conf");
}
Next, the code detects whether the server root has been properly
initialized, and if not, creates the requisite directories and
subdirectories. It looks to see whether $ServerRoot/logs exists and is
a directory. If not, the code proceeds to create the directories, calling mkdir() repeatedly to create first the server root and subsequently logs, conf, htdocs and perl subdirectories beneath it. The code then copies the generic httpd.conf file that is currently running into the newly-created conf subdirectory, using the File::Copy module's cp() routine. Somewhat magically,
mod_perl arranges for the Perl global variable $0 to hold the path of the *.conf file that is currently being processed.
if(-e "$ServerRoot/startup.pl") {
$PerlRequire = "startup.pl";
}
Next, the code checks whether there is a startup.pl present in the configuration directory. If this is the first time the
server is being run, the file won't be present, but there may well be one
there later. If the file exists, the code sets the $PerlRequire global to load it.
$User = getpwuid($>) || $>;
$Group = getgrgid($)) || $);
$ServerAdmin = $User;
The code sets the User, Group, and ServerAdmin directives next. The user and group are taken from the Perl magic variables $> and
$), corresponding to the effective user and group IDs of the person who launched the
server. Since this is the default when Apache is run from a non-root shell,
this has no effect now, but will be of use if the server is run as root at
a later date. Likewise $ServerAdmin is set to the name of the current user.
$ServerName = `hostname`;
$DocumentRoot = "$ServerRoot/htdocs";
my $types = "$ServerRoot/conf/mime.types";
$TypesConfig = -e $types ? $types : "/dev/null";
The server name is set to the current host's name by setting the
$ServerName global, and the document root is set to
$ServerRoot/htdocs. We look to see whether the configuration file
mime.types is present, and if so use it to set $TypesConfig to this value. Otherwise, we use /dev/null.
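One detail worth knowing about backticks is that the captured output normally ends with a newline. The sketch below shows two ways to get a clean host name: trimming the captured string, or asking the standard Sys::Hostname module. The trimmed() helper and the sample host name are illustrative assumptions and are not part of Doug's configuration.

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# Hypothetical helper: strip trailing whitespace, including the newline
# that backticks leave on the end of a command's output.
sub trimmed {
    my ($raw) = @_;
    $raw =~ s/\s+\z//;
    return $raw;
}

# Simulated output of `hostname`; the name itself is made up.
my $simulated = "papa-bear.example.com\n";

our $ServerName = trimmed($simulated);
print "ServerName $ServerName\n";

# Sys::Hostname returns the name without a trailing newline
# and does not need to spawn a subprocess on most platforms.
my $portable = hostname();
print "Sys::Hostname reports $portable\n";
```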
push @Alias,
["/perl" => "$ServerRoot/perl"],
["/icons" => "$ServerRoot/icons"];
Next, the <Perl> section declares some directory aliases. The URI /perl is aliased to $ServerRoot/perl, and /icons is aliased to $ServerRoot/icons. Notice how the @Alias global is set to an array of arrays in order to express that it contains
multiple
Alias directives.
my $servers = 3;
for my $s (qw(MinSpareServers MaxSpareServers StartServers MaxClients)) {
$$s = $servers;
}
Following this the code sets the various parameters controlling Apache's
preforking. The server doesn't need to handle much load, since it's just
Doug's development server, so MaxSpareServers and friends are all set to a low value of three. We use ``symbolic'' or
``soft'' references here to set the globals indirectly. We loop through a
set of strings containing the names of the globals we wish to set, and
assign values to them as if they were scalar references rather than plain
strings. Perl automagically updates the symbol table for us, avoiding the
much more convoluted code that would be required to create the global using
globs or by accessing the symbol table directly. Note that this technique
will be blocked if strict reference checking is turned on with use strict 'refs'.
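Here is a small standalone sketch of the same symbolic reference technique under use strict. The only assumption is that we spell out the package name and relax strict 'refs' inside the loop body alone; the final eval shows the assignment being refused where strict references are still in force.

```perl
use strict;
use warnings;

package Apache::ReadConfig;

our ($MinSpareServers, $MaxSpareServers, $StartServers, $MaxClients);

my $servers = 3;
for my $name (qw(MinSpareServers MaxSpareServers StartServers MaxClients)) {
    no strict 'refs';                        # relaxed for this block only
    ${"Apache::ReadConfig::$name"} = $servers;
}

print "MaxClients $MaxClients\n";

# Outside the block, strict refs still applies and the same
# assignment through a plain string dies at run time.
my $succeeded = eval {
    my $name = 'StartServers';
    $$name = 5;
    1;
};
print "strict refs blocked the assignment\n" unless $succeeded;
```

Confining no strict 'refs' to the smallest possible block keeps the rest of the section protected against accidental symbolic references.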
for my $l (qw(LockFile ErrorLog TransferLog PidFile ScoreBoardFile)) {
$$l = "logs/$l";
#clean out the logs
local *FH;
open FH, ">$ServerRoot/$$l";
close FH;
}
We use a similar trick to configure the LockFile, ErrorLog,
TransferLog and other logfile-related directives. A few additional lines of code
truncate the various log files to zero length if they already exist. Doug
likes to start with a clean slate every time he reconfigures and restarts a
server.
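The truncation step can be written a little more defensively. This sketch is a variation and not Doug's code: it uses a lexical filehandle, the three-argument form of open(), and error checks, and it works in a temporary directory (created with the core File::Temp module) in place of the real server root. The truncate_log() name is our own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary stand-in for $ServerRoot so the example touches nothing real.
my $server_root = tempdir(CLEANUP => 1);
mkdir "$server_root/logs" or die "Cannot create logs directory: $!";

# Hypothetical helper: truncate a file to zero length, creating it if needed.
sub truncate_log {
    my ($path) = @_;
    open my $fh, '>', $path or die "Cannot truncate $path: $!";
    close $fh or die "Cannot close $path: $!";
    return;
}

# Create a log file with some old content, then clean it out.
my $log = "$server_root/logs/ErrorLog";
open my $out, '>', $log or die "Cannot write $log: $!";
print {$out} "old entry\n";
close $out or die "Cannot close $log: $!";

truncate_log($log);
print "log is empty\n" if -z $log;
```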
my @mod_perl_cfg = qw{
SetHandler perl-script
Options +ExecCGI
};
$Location{"/perl-status"} = {
@mod_perl_cfg,
PerlHandler => "Apache::Status",
};
$Location{"/perl"} = {
@mod_perl_cfg,
PerlHandler => "Apache::Registry",
};
The remainder of the configuration file sets up some directories for
running and debugging Perl API modules. We create a lexical variable named @mod_perl_cfg that contains some common options, and then use it to configure the /perl-status and /perl <Location> sections. The /perl-status URI is set up so that it runs Apache::Status
when retrieved, and /perl is put under the control of
Apache::Registry for use with registry scripts.
use Apache::PerlSections ();
Apache::PerlSections->store("$ServerRoot/ServerConfig.pm");
The very last thing that the <Perl> section does is to write out the current configuration into the file $ServerRoot/ServerConfig.pm. This snapshots the current configuration in a form that Doug can review
and edit, if necessary. Only the configuration variables set within the <Perl> section are captured in the snapshot. The PerlPassEnv
and Port directives, which are outside the section, are not captured and will have
to be added manually.
This technique makes possible the following interesting trick:
% httpd -C "PerlModule ServerConfig"
The -C switch tells httpd to process the directive
PerlModule, which in turn loads the module file ServerConfig.pm. Provided that Perl's PERL5LIB environment variable is set up in such a way that Perl will be able to find
the module, this has the effect of reloading the previously-saved
configuration and setting Apache to exactly the same state it had before.
- Listing 8.5: Doug's Generic httpd.conf
-
# file: httpd.conf
PerlPassEnv HOME
Port 9008
<Perl>
#!perl
$ServerRoot = "$ENV{HOME}/www";
unless (-d "$ServerRoot/logs") {
for my $dir ("", qw(logs conf htdocs perl)) {
mkdir "$ServerRoot/$dir", 0755;
}
require File::Copy;
File::Copy::cp($0, "$ServerRoot/conf");
}
if(-e "$ServerRoot/startup.pl") {
$PerlRequire = "startup.pl";
}
$User = getpwuid($>) || $>;
$Group = getgrgid($)) || $);
$ServerAdmin = $User;
$ServerName = `hostname`;
$DocumentRoot = "$ServerRoot/htdocs";
my $types = "$ServerRoot/conf/mime.types";
$TypesConfig = -e $types ? $types : "/dev/null";
push @Alias,
["/perl" => "$ServerRoot/perl"],
["/icons" => "$ServerRoot/icons"];
my $servers = 3;
for my $s (qw(MinSpareServers MaxSpareServers StartServers MaxClients)) {
$$s = $servers;
}
for my $l (qw(LockFile ErrorLog TransferLog PidFile ScoreBoardFile)) {
$$l = "logs/$l";
#clean out the logs
local *FH;
open FH, ">$ServerRoot/$$l";
close FH;
}
my @mod_perl_cfg = qw{
SetHandler perl-script
Options +ExecCGI
};
$Location{"/perl-status"} = {
@mod_perl_cfg,
PerlHandler => "Apache::Status",
};
$Location{"/perl"} = {
@mod_perl_cfg,
PerlHandler => "Apache::Registry",
};
use Apache::PerlSections ();
Apache::PerlSections->store("$ServerRoot/ServerConfig.pm");
__END__
</Perl>
When mod_perl is configured with the server, configuration files can be
documented with POD. There are only a handful of POD directives that
mod_perl recognizes, but enough so you can mix POD with actual server
configuration. The recognized directives are as follows:
- =pod
-
When a =pod token is found in the configuration file, mod_perl will soak up the file
line-by-line, until a =cut token or a special
=over token (as described below) is reached.
- =cut
-
When a =cut token is found, mod_perl will turn the configuration processing back over
to Apache.
- =over
-
The =over directive can be used in conjunction with the =back
directive to hand sections back to Apache for processing. This allows the pod2* converters to include the actual configuration sections in their output. In
order to allow for =over to be used elsewhere, mod_perl will only hand these sections back to Apache
if the line contains the string apache. Example:
=over to apache
- =back
-
When mod_perl is inside a special =over section as described above, it will go back to POD soaking mode once it
sees a =back directive. Example:
=back to pod
- __END__
-
While __END__ is not a POD directive, mod_perl recognizes this token when present in a
server configuration file. It will simply read in the rest of the
configuration file, ignoring each line until there is nothing left to read.
Complete example:
=pod
=head1 NAME
httpd.conf - The main server configuration file
=head2 Standard Module Configuration
=over 4
=item mod_status
=over to apache
#Apache will process directives in this section
<Location /server-status>
SetHandler server-status
...
</Location>
=back to pod
=item ...
...
=back
=cut
__END__
The server will not try to process anything here
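The rules for these tokens amount to a small state machine, sketched below as a standalone filter. This is only our reading of the behavior described in this section, not mod_perl's source. The apache_lines() helper is hypothetical, the match on the string apache is assumed to be case-sensitive, and __END__ is assumed to be recognized only while Apache is processing lines.

```perl
use strict;
use warnings;

# Hypothetical helper: given the lines of a configuration file, return
# only those that would be handed to Apache for processing.
sub apache_lines {
    my @config = @_;
    my $mode = 'apache';          # one of: apache, pod, handback
    my @kept;
    for my $line (@config) {
        if ($mode eq 'apache') {
            last if $line =~ /^__END__/;             # ignore the rest
            if ($line =~ /^=pod\b/) { $mode = 'pod'; next }
            push @kept, $line;
        }
        elsif ($mode eq 'pod') {
            # Soak up everything except the two tokens that end soaking.
            if    ($line =~ /^=cut\b/)          { $mode = 'apache' }
            elsif ($line =~ /^=over\b.*apache/) { $mode = 'handback' }
        }
        else {
            # Inside "=over to apache": pass lines through until =back.
            if ($line =~ /^=back\b/) { $mode = 'pod'; next }
            push @kept, $line;
        }
    }
    return @kept;
}

my @config = (
    'Port 9008',
    '=pod',
    '=head1 NAME',
    '=over 4',
    '=item mod_status',
    '=over to apache',
    '<Location /server-status>',
    'SetHandler server-status',
    '</Location>',
    '=back to pod',
    '=back',
    '=cut',
    'User nobody',
    '__END__',
    'The server will not try to process anything here',
);

print "$_\n" for apache_lines(@config);
```

Only the Port line, the three lines of the <Location> block and the User line survive the filter. The ordinary =over 4 is soaked up because it does not contain the string apache.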
We've now covered the entire Apache module API, at least as far as Perl is
concerned. The next chapter presents a complete reference guide to the Perl
API, organized by topic. This is followed in Chapter 10 by a reference
guide to the C language API, which fills in the details that C programmers
need to know about.