Using ISINDEX for server-side searches

Searching with the ISINDEX element is not difficult. It can appear tricky at first, but becomes clearer once you remember two basic things:

that ISINDEX tells the browser to collect data (typed text) from the user, which the browser then forwards to an HTTP server.
that a server does not usually know how to do a search, and instead passes the data on to a second program, called a gateway program, that does all the work.

This gateway program, the interface between the HTTP server and the database tools, is a script or program usually placed in the server's "cgi-bin" directory. Gateway programs are accessed via URLs such as http://www.foo.com/cgi-bin/foo, where foo is the name of the script or program, and /cgi-bin/ is a special path that references the directory containing the programs and scripts that the server is allowed to execute. The directory does not need to be named /cgi-bin/, and in fact many sites have several gateway program directories, each reflecting a particular project or task.

9.1.1 Example usage of ISINDEX

I have the file /u/www/Webdocs/Personnel on my HTTP server. I want to allow someone to search this file for names, using a WWW browser, and I want to do this using the ISINDEX element.

1. The Server-side Script

Step one is to create a script to interface the server and browser with the search program (here a program named grep). My script is srch-example, which is found in my server's cgi-bin directory.

2. How Does This Script Work?

What happens? When the script is accessed it always prints the line Content-type: text/html. This is sent to the HTTP server, which in turn forwards it back to the browser. This particular line is a MIME content-type header, and tells the browser what type of data is being sent back. Here, this line tells the browser to expect a text/html document.
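As a minimal sketch of this step, written in Python rather than in the language of the original script (which is not reproduced here), a gateway program might emit the header as follows. The function name content_type_header is hypothetical. One detail worth noting: under the CGI convention, a blank line must separate the header from the document that follows it.

```python
import sys


def content_type_header(mime_type="text/html"):
    """Return the MIME header line plus the blank line that ends the headers.

    content_type_header is a hypothetical helper name, used only for
    illustration.
    """
    return "Content-type: " + mime_type + "\n\n"


if __name__ == "__main__":
    # The header goes out first; the HTML document follows it.
    sys.stdout.write(content_type_header())
    sys.stdout.write("<html><body><p>Hello from a gateway program.</p></body></html>\n")
```

The server reads everything the program writes to standard output and relays it to the browser, so the header must be the first thing printed.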

3. ISINDEX Signals a Search

The if statement checks whether there are any command-line arguments to the script, that is, whether the program was launched as if it had been typed at a command prompt with arguments following its name.

Arguments are passed from the browser to the server script via the URL: arguments are added to the end of the URL, separated from the regular URL by a question mark. On the first access there are no arguments, so we execute the first branch of the if. This section of the program echoes some standard HTML markup, and then sends the ISINDEX element. This tells the browser that this is a search, and that it should prompt the user for text input.
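The first branch can be sketched roughly as follows. This is a Python approximation under stated assumptions, not the original script: the helper names has_search_terms and prompt_page, the page title, and the prompt text are all invented for illustration.

```python
import sys


def has_search_terms(argv):
    """True when anything follows the program name on the command line."""
    return len(argv) > 1


def prompt_page():
    """Build the reply sent when the script is run without arguments.

    The title and paragraph text are placeholders; the essential part is
    the ISINDEX element, which asks the browser to prompt for input.
    """
    return (
        "Content-type: text/html\n"
        "\n"
        "<html><head>\n"
        "<title>Personnel search</title>\n"
        "<isindex>\n"
        "</head><body>\n"
        "<p>Enter a name to search for.</p>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # No arguments: this is the first access, so ask for a search string.
    if not has_search_terms(sys.argv):
        sys.stdout.write(prompt_page())
```

Checking the argument count is enough to tell the two cases apart, because the server only supplies arguments when the URL carries a search string after the question mark.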

The browser displays the received document and prompts the user for a search string. For example, Mosaic will present a fill-in template, where you type the desired search string. When you press return, the browser re-accesses the same URL as before, but this time appends the search string to the URL.
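What the browser does at this point can be imitated in a few lines of Python. This is only a sketch: the function name search_url is hypothetical, and the URL combines the www.foo.com host used earlier with the script name srch-example purely as an assumed example.

```python
from urllib.parse import quote_plus


def search_url(base_url, search_string):
    """Append an encoded search string to base_url after a question mark.

    quote_plus replaces each space with "+", the usual convention for
    separating words in a query string, and percent-encodes other
    characters that are unsafe in a URL.
    """
    return base_url + "?" + quote_plus(search_string)


# Assumed example URL; only the "?ian" suffix matters here.
print(search_url("http://www.foo.com/cgi-bin/srch-example", "ian"))
```

The part before the question mark is unchanged from the first access, which is why the same script ends up handling both the prompt and the search.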

4. Second URL Access: Search Results

This second URL again accesses the program srch-example, but this time with an argument (ian in this example), so that the second branch of the if is executed. This branch echoes new headings, indicating what was searched for, and runs the grep program to search the file. By default the output of grep is echoed, so the search results are sent to the browser. ISINDEX is NOT added here, as this branch provides the results of the search, but does not contain a second box for user input. The returned result is a document containing the search results.
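The search branch can be sketched in Python as well. This is a stand-in for the original script's call to grep, under stated assumptions: search_lines does a plain substring match (grep itself matches patterns), the helper names are hypothetical, and the sample records are made up rather than taken from the real Personnel file.

```python
import html


def search_lines(lines, term):
    """Return the lines that contain term, much like a plain `grep term file`."""
    return [line.rstrip("\n") for line in lines if term in line]


def results_page(term, matches):
    """Build the results document. Note that it contains no ISINDEX element."""
    body = "\n".join(html.escape(m) for m in matches)
    return (
        "Content-type: text/html\n"
        "\n"
        "<html><head><title>Search results</title></head><body>\n"
        "<h1>Results for: " + html.escape(term) + "</h1>\n"
        "<pre>\n" + body + "\n</pre>\n"
        "</body></html>\n"
    )


# Made-up sample records, standing in for the contents of the searched file.
sample = ["Ian Graham  x1234\n", "Jane Doe  x5678\n"]
print(results_page("Ian", search_lines(sample, "Ian")))
```

Escaping the search term and the matched lines before placing them in the page is a precaution this sketch adds; it keeps characters such as < and & in the data from being read as markup.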

5. A Demonstration

That is, briefly, the whole story. If you've patiently read until now, you can test this example and see this script in action by accessing an appropriate test URL.

Data Encoding in URLs

The data typed in by the user must be specially encoded when placed in a URL, to avoid possible misinterpretation (e.g., accidentally breaking a URL at a space character). Text typed into an ISINDEX query box is encoded in the same way, to ensure safe transmission. The encoding mechanisms are rather complicated, and involve converting what are called "unsafe" ISO-Latin 1 characters into their hexadecimal encodings. For example, the space character becomes the code "%20" (a percent sign followed by the hexadecimal character code for the space character). For the details of how this all works you should either read my book (O.K., that was a cheap plug), or consult the detailed on-line documentation on URLs.
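The percent-encoding described above can be tried out with Python's standard urllib.parse module. This is a small illustration only; the sample strings are arbitrary.

```python
from urllib.parse import quote, unquote

# A space is character code 32, which is 20 in hexadecimal.
print(format(ord(" "), "x"))            # 20

# quote() percent-encodes characters that are unsafe in a URL.
encoded = quote("ian graham")
print(encoded)                          # ian%20graham

# unquote() reverses the encoding, as the receiving end must do.
print(unquote(encoded))                 # ian graham

# A non-ASCII ISO-Latin 1 character is encoded from its single byte value.
print(quote("é", encoding="latin-1"))   # %E9
```

Unreserved characters such as letters and digits pass through unchanged, so only the troublesome characters grow into three-character codes.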