Wednesday, May 12, 2010

Web Applications and Dispatching

Historically, finding the code that handles a web request was trivial: the server simply mapped the path part of the request onto the filesystem to find the CGI program that produced the requested page. It got more complicated when we moved from the CGI 'one program equals one page' model to web applications that serve many pages.
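The post itself shows no code, so here is a small illustrative sketch in Python of that CGI-era lookup. It assumes a hypothetical script directory `/srv/cgi-bin`, and `resolve_cgi_script` is an invented helper. A real web server does considerably more than this; the point is only that the path alone decides which program runs.

```python
import posixpath

def resolve_cgi_script(docroot: str, url_path: str) -> str:
    """Map the path part of a URL directly onto a file below docroot."""
    # Normalise with exactly one leading "/" so "a/../b" collapses and
    # leading ".." segments cannot climb above the document root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    return posixpath.join(docroot, clean.lstrip("/"))

# One address, one program: the URL path picks the script to execute.
print(resolve_cgi_script("/srv/cgi-bin", "/guestbook.cgi"))     # /srv/cgi-bin/guestbook.cgi
print(resolve_cgi_script("/srv/cgi-bin", "/../../etc/passwd"))  # /srv/cgi-bin/etc/passwd
```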

A tree (or DAG) is the structure underlying both the path part of a URL and the library (or class) hierarchies of many programming languages, so there is a natural mapping between them. To keep the code that handles related pages in the same file, we can extend that mapping so that the last part of the path is interpreted as a subroutine name.
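A minimal Python sketch of that literal mapping, where the leading path segments select a class and the last segment selects the method. The `Blog` class, the `LIBRARIES` registry and `dispatch_literal` are hypothetical names for illustration, not any framework's API. Note that it happily calls the internal helper too, which is exactly the exposure problem raised in the next paragraph.

```python
class Blog:
    def index(self):
        return "list of posts"

    def archive(self):
        return "archive page"

    def _load_posts(self):
        # Internal helper, never meant to be a page.
        return ["draft", "published"]

# Hypothetical registry standing in for "the library found at this path".
LIBRARIES = {"blog": Blog}

def dispatch_literal(path: str):
    """'/blog/archive' -> Blog().archive(): the last segment is the method name."""
    *dirs, method_name = path.strip("/").split("/")
    cls = LIBRARIES["/".join(dirs)]
    return getattr(cls(), method_name)()

print(dispatch_literal("/blog/archive"))      # archive page
print(dispatch_literal("/blog/_load_posts"))  # ['draft', 'published'] (the helper leaks out)
```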

That idea gets more complicated once we realize that we don't want all of our methods (and libraries or classes), many of them internal, to be callable from outside: we need a way to mark some of them as 'external' and expose only those.
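One simple way to mark methods as external is an explicit whitelist on the class. This Python sketch assumes a hypothetical `EXTERNAL` attribute and an invented `dispatch_external` function; it only illustrates the idea and is not modelled on any particular framework's API.

```python
class Blog:
    # Hypothetical whitelist: only these method names are reachable from a URL.
    EXTERNAL = frozenset({"index", "archive"})

    def index(self):
        return "list of posts"

    def archive(self):
        return "archive page"

    def _load_posts(self):
        return ["draft", "published"]

def dispatch_external(obj, method_name: str):
    """Call obj.<method_name>() only if its class marks it as externally callable."""
    if method_name not in getattr(type(obj), "EXTERNAL", frozenset()):
        raise LookupError(f"{method_name!r} is not externally callable")
    return getattr(obj, method_name)()

print(dispatch_external(Blog(), "archive"))  # archive page
# dispatch_external(Blog(), "_load_posts") would raise LookupError.
```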

Of course, sometimes we don't want such a literal mapping at all: we may need to hide the internal code structure (security by obscurity), want more elegant URLs, or have other requirements. In those cases we need to extend the mapping or replace it entirely.

The Perl web frameworks (the ones I know best) can be roughly divided into the following styles according to how they use that mapping (many use more than one style, so the boundaries are fuzzy):


  1. The 'old' CGI (or PHP) paradigm of one address, one program.

  2. Using the mapping of paths to libraries, and selecting the 'externally callable' methods by placing them on a list (run_modes in CGI::Application).

  3. Using the mapping of paths to libraries, but dispatching to methods configured via a hash (also run_modes in CGI::Application).

  4. Using the mapping of paths to libraries (but not to methods) and calling the 'get' or 'post' method, according to the HTTP request method, in the landing library (as in Tatsumaki).

  5. Configuring the dispatching with code attributes on the methods (as in Catalyst).

  6. Using anonymous subroutines instead of methods, which makes it easy to attach the dispatching configuration to the code with a DSL (as in Dancer).

  7. Dispatching configuration external to the classes that are dispatched to.
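To make three of these styles concrete, here is a Python sketch of styles 3, 4 and 7. All class, table and function names (`RUN_MODES`, `dispatch_hash`, `ROUTES` and so on) are hypothetical; they only mimic the shape of the Perl approaches and are not the actual APIs of CGI::Application or Tatsumaki.

```python
# Style 3: a hash maps URL-visible names to (possibly differently named) methods.
class Guestbook:
    RUN_MODES = {"show": "render_entries", "sign": "add_entry"}

    def render_entries(self):
        return "entries"

    def add_entry(self):
        return "signed"

def dispatch_hash(obj, name: str):
    method_name = type(obj).RUN_MODES[name]  # KeyError for names not in the hash
    return getattr(obj, method_name)()

# Style 4: the path picks the class only; the HTTP method picks get() or post().
class Article:
    def get(self):
        return "article body"

    def post(self):
        return "comment saved"

def dispatch_verb(cls, http_method: str):
    if http_method.upper() not in {"GET", "POST"}:
        raise LookupError(f"{http_method} not allowed")
    return getattr(cls(), http_method.lower())()

# Style 7: the routing table lives outside the classes it dispatches to.
ROUTES = {
    ("GET", "/guestbook"): (Guestbook, "render_entries"),
    ("POST", "/guestbook"): (Guestbook, "add_entry"),
    ("GET", "/article"): (Article, "get"),
}

def dispatch_table(http_method: str, path: str):
    cls, method_name = ROUTES[(http_method, path)]
    return getattr(cls(), method_name)()

print(dispatch_hash(Guestbook(), "sign"))   # signed
print(dispatch_verb(Article, "POST"))       # comment saved
print(dispatch_table("GET", "/guestbook"))  # entries
```

The trade-off shows up directly: in styles 3 and 4 the class itself decides what is reachable, while in style 7 a reader has to look at a separate table to learn which URL reaches which method.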


Having the dispatching configuration close to the subroutine code lets you avoid switching between two places in the source when adding a new action or changing an existing one. I think one of the greatest conveniences of Catalyst was that this configuration was not only in the same file but directly in the subroutine definition.
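Python has no code attributes, but a decorator gives a similar feel: the routing rule sits directly on the function definition. This is a rough, hypothetical analogue (the `route` decorator and the `ROUTES` table are invented for this sketch), not a description of how Catalyst or Dancer work internally.

```python
ROUTES = {}  # (HTTP method, path) -> handler function

def route(http_method: str, path: str):
    """Decorator that records the dispatch rule right next to the handler."""
    def register(func):
        ROUTES[(http_method, path)] = func
        return func
    return register

@route("GET", "/hello")
def hello():
    return "Hello, world"

@route("POST", "/hello")
def say_hello_back():
    return "Thanks for the greeting"

def dispatch(http_method: str, path: str) -> str:
    return ROUTES[(http_method, path)]()

print(dispatch("GET", "/hello"))  # Hello, world
```

Adding a new action means writing one decorated function; there is no second file or table to keep in sync.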

2 comments:

Anonymous said...

Many eons ago I made a framework without knowing I was doing that. It had the convention that
/foo/bar
would call method do_bar() of class foo. Should a method be callable by URL? Name it do_xxxx. Simple in a perlish way, and no security issues.
(And /foo/do_bar would try to call do_do_bar and fail.)
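The prefix convention from this comment fits in a few lines. A Python sketch, where the `CLASSES` registry and `dispatch_prefixed` are hypothetical names used only for illustration:

```python
ACTION_PREFIX = "do_"

class Foo:
    def do_bar(self):
        return "bar page"

    def helper(self):
        # No do_ prefix, so it can never be reached from a URL.
        return "internal"

CLASSES = {"foo": Foo}  # hypothetical registry: first path segment -> class

def dispatch_prefixed(path: str):
    class_name, action = path.strip("/").split("/", 1)
    handler = getattr(CLASSES[class_name](), ACTION_PREFIX + action, None)
    if handler is None:
        raise LookupError(f"no action for {path!r}")
    return handler()

print(dispatch_prefixed("/foo/bar"))  # bar page
# "/foo/helper" looks up do_helper and "/foo/do_bar" looks up do_do_bar; both raise LookupError.
```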

zby said...

That's similar to what I did in WebNano, but instead of a 'do_' prefix I use an '_action' postfix. In fact you could also say it is similar to the ': Action' code attribute in Catalyst. Many people don't like how it gives semantics to names, but I like how low-tech it is. And inheriting from a base class always gives semantics to names anyway: point 4 above makes the 'get' method special, and what we do here is make that a bit more general by giving semantics to all methods with a given prefix (or postfix), i.e. all methods named 'get_*' (by the way, maybe 'get_' would make a more acceptable prefix?). I need to write a longer post about that.
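A sketch of the 'get_' idea from the comment above, assuming the prefix is simply the lower-cased HTTP method, so that `GET /foo/bar` reaches `get_bar` and `POST /foo/bar` reaches `post_bar`. The names are hypothetical, and this is not WebNano's actual mechanism.

```python
class Foo:
    def get_bar(self):
        return "show the form"

    def post_bar(self):
        return "form submitted"

    def bar(self):
        # No verb prefix, so not reachable from a URL.
        return "internal"

def dispatch_by_verb_prefix(obj, http_method: str, action: str):
    """GET /foo/bar -> obj.get_bar(), POST /foo/bar -> obj.post_bar()."""
    name = f"{http_method.lower()}_{action}"
    handler = getattr(obj, name, None)
    if handler is None:
        raise LookupError(f"no handler named {name!r}")
    return handler()

print(dispatch_by_verb_prefix(Foo(), "GET", "bar"))   # show the form
print(dispatch_by_verb_prefix(Foo(), "POST", "bar"))  # form submitted
```

This generalizes style 4: instead of one special `get` method per class, every `get_*` method becomes a GET action.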