Thursday, June 24, 2010

Implementing file upload progress bar

At work I was assigned the task of replacing the previous hodgepodge of tools that provide the progress bar functionality for forms with file uploads. At first glance this seems like a trivial thing to do - you periodically observe how much of the file you've got and update the progress bar, but there are details that make it hard to do with the current tools. I report here how I solved that problem - not because I think this is the optimal way - but rather to open the discussion and because I could not find any description of solving it when I googled for it.

First of all, the common programming tools (like CGI.pm or Mason, which we use here) assume that the page handler receives the whole request as input - and the whole request is not available until after the file is uploaded. So, for example, 'my $q = CGI->new' will not return until it is too late to measure the upload progress. The solution is to use another page to report the upload progress and to call that page via Ajax from the JavaScript code that updates the progress bar. This would work great, except that the file is normally uploaded to a temporary file with a random name, and the other script has no chance of guessing it. So we need to generate a new random file name in the form page and pass that name both to the form handler script, so that it saves the data to that file, and to the Ajax script, which checks the size of that file in parallel.
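To make the idea concrete, here is a minimal sketch of the two helper pieces: generating the name in the form page and measuring the file in the Ajax script. The function names, the 32-character hex format of the id and the directory argument are all my assumptions for illustration, not something CGI.pm or Mason provides.

```perl
use strict;
use warnings;

# Form page: make up a random upload id (hypothetical format: 32 hex chars).
# rand() is not cryptographically strong; it is only good enough for a sketch.
sub new_upload_id {
    return join '', map { sprintf '%02x', int rand 256 } 1 .. 16;
}

# Map an id to a file inside $dir, refusing anything that is not plain hex
# so that a crafted id cannot point outside the upload directory.
sub upload_path {
    my ($dir, $id) = @_;
    return unless defined $id && $id =~ /\A[0-9a-f]{32}\z/;
    return "$dir/$id";
}

# Ajax script: bytes received so far is simply the current size of the file
# that the form handler is writing. A missing file counts as 0 bytes.
sub bytes_received {
    my ($dir, $id) = @_;
    my $path = upload_path($dir, $id) or return 0;
    my $size = -s $path;
    return defined $size ? $size : 0;
}
```

The Ajax script would print the result of bytes_received() and the JavaScript side would compare it with the file size it expects; how the total is obtained is left out here.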


To save the data into a specified filename I used the CGI.pm callback feature:

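use CGI;
use IO::Handle;    # provides $fh->flush on older perls

# $fh must already be open for writing on the agreed file name.
# The third argument (false) tells CGI.pm not to write its own temp file.
my $q = CGI->new( \&hook, $fh, undef );
...
# CGI.pm calls this for every chunk of the upload it reads.
sub hook {
    my ($filename, $buffer, $bytes_read, $fh) = @_;
    print $fh substr($buffer, 0, $bytes_read);
    $fh->flush();    # make the new size visible to the progress script
}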

It is described in the subsection called "Progress bars for file uploads and avoiding temp files" of the CGI.pm documentation, but it is a great leap of thought to say that it supports progress bar implementation: you still cannot use it directly to get the progress from the CGI object on the form landing page, and you still need a separate script measuring the progress. For my solution all I needed was a way to pass the target file name to the code saving the data, and that could be easier than writing the callback above. And the callback is still not everything - I also need a way to pass the generated file name from the form page to the handler script - and not via form parameters, because, remember, they are not available at that stage. So how can that be done? Simple - as PATH_INFO - which is available in the %ENV hash even before the params are parsed by CGI.pm.
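A rough sketch of the PATH_INFO part could look like the code below. It assumes the form's action URL ends with the generated name, for example upload.cgi/<id>, and that the name is a 32-character hex string; both the URL layout and the function name are hypothetical choices of mine. Since PATH_INFO comes straight from the client, the value is validated before it gets anywhere near a file name.

```perl
use strict;
use warnings;

# Pull the upload id out of PATH_INFO (for example "/3f2a...").
# Returns undef when PATH_INFO is missing or is not "/" plus 32 hex digits
# (the hex format is an assumption of this sketch).
sub upload_id_from_path_info {
    my ($path_info) = @_;
    return unless defined $path_info;
    return $1 if $path_info =~ m{\A/([0-9a-f]{32})\z};
    return;
}

# In the form handler this would run before CGI->new is called, e.g.:
#   my $id = upload_id_from_path_info($ENV{PATH_INFO})
#       or die "bad upload id";
#   open my $fh, '>', "$upload_dir/$id" or die "open: $!";
# where $upload_dir is a hypothetical directory shared with the Ajax script.
```

The strict pattern is deliberate: anything containing a slash or dots is rejected, so the id can be joined to a directory name without further escaping.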

This is the skeleton of the solution - there are a few more details in the actual implementation - but the code will be published soon as Open Source - so I hope everyone will be able to look them up there.
