by Brett Payne, of Tauranga, New Zealand
A simple method for transforming tabular or columnar data into web page format
Introduction
Tabular data saved as tab-delimited text files can be viewed directly by most browsers. However, the column widths are predetermined, and the original alignment of the text is therefore usually destroyed.
There are several proprietary and shareware programmes which will convert tabular data into web pages. Most of these employ the standard <TABLE> format used by HTML for tables. If the file is viewed in the form of a spreadsheet, using a programme such as Microsoft Excel, the same effect can be produced by using the /Save As HTML/ feature. However, HTML tables have some disadvantages. While it is possible to manipulate various parameters to encourage or prevent text wrap-around, column widths can be more difficult to control. In addition, the use of standard tables in HTML is accompanied by substantial increases in file size.
This article presents a simple solution for transforming a tab-delimited text file into an ordinary text file in which each tab symbol is replaced by the right number of spaces to maintain column alignment. Two alternative routes can be used: directly from the tab-delimited text file to ordinary text using Microsoft Word, or via a spreadsheet programme such as Microsoft Excel. The latter option is often useful, as it permits sorting and other manipulation of the data prior to output.
MS Word is first used to view the tab-delimited text file. It is necessary to ensure that the tabs are arranged at suitable intervals. In other words, care should be taken that none of the text overflows the formatted column widths. With wider tables it may be necessary to change the page width under the /Page Setup/ option accordingly.
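For readers who prefer a script to a word processor, the same tab-to-space replacement can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure described in this article: the function names `align_columns` and `to_html`, the `gap` parameter and the sample rows are all invented for the example, and wrapping the aligned text in a <pre> element is an assumption about how the spacing might be preserved in a browser.

```python
import html


def align_columns(tab_text: str, gap: int = 2) -> str:
    """Replace tabs with spaces so every column starts at the same offset."""
    rows = [line.split("\t") for line in tab_text.splitlines()]
    n_cols = max((len(row) for row in rows), default=0)

    # Each column is as wide as its longest cell, so no text can overflow.
    widths = [0] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    aligned = []
    for row in rows:
        padded = [cell.ljust(widths[i] + gap) for i, cell in enumerate(row)]
        # Trailing spaces add file size without adding information.
        aligned.append("".join(padded).rstrip())
    return "\n".join(aligned)


def to_html(aligned_text: str) -> str:
    """Wrap aligned text in <pre> (an assumption, not taken from the article)."""
    return "<pre>\n" + html.escape(aligned_text) + "\n</pre>"


# Invented sample data, for illustration only.
sample = "Surname\tForename\tYear\nSmith\tJohn\t1851\nFitzwilliam\tAnn\t1877"
print(align_columns(sample))
```

Because the column widths are computed from the longest cell in each column, the check that no text overflows its column is done automatically rather than by eye.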
Microsoft Excel Spreadsheet
Web Page created using MS Excel /Save as HTML/ facility
Microsoft Word Document
Web Page created using MS Word /Save as HTML/ facility
Aligned Text File
Web Page created using TAB2HTML method
All six of the above files contain exactly the same original data, merely displayed in different formats. The web page produced using the MS Word /Save as HTML/ facility failed because the columns of text were no longer aligned. Modern internet browsers generally include a facility to view *.txt (and *.xls or *.doc) files. However, as clicking on the link to the Tab-delimited Text File above demonstrated, the results are sometimes less than satisfactory. Using the TAB2HTML technique produces an HTML file twice the size of the original tab-delimited text file, but results in a space-saving of more than 80% over that generated with the MS Excel /Save as HTML/ method.
In other words, if tabular data is presented on the web using the methods described, nearly six times as much data can be fitted in the same space as that taken up by HTML files produced by the standard facility in MS Excel. If the same data is stored as an ordinary MS Word or Excel file, it can use twice as much space again.
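The link between a percentage saving and "times as much data" is simple arithmetic: if a file is smaller by a fraction s, then 1 / (1 - s) times as much data fits in the same space. The short sketch below works this through. The function name `capacity_factor` is invented for the example, and the 83% figure is a hypothetical value chosen to illustrate "more than 80%"; it is not a measurement from this article.

```python
def capacity_factor(saving: float) -> float:
    """Return how many times more data fits in the same space for a given saving."""
    if not 0 <= saving < 1:
        raise ValueError("saving must be a fraction in the range [0, 1)")
    return 1.0 / (1.0 - saving)


# 0.80 is the lower bound quoted in the text; 0.83 is a hypothetical value.
for saving in (0.80, 0.83):
    print(f"{saving:.0%} saving -> {capacity_factor(saving):.1f} times as much data")
```

A saving of exactly 80% corresponds to five times as much data, so a factor of nearly six implies a saving a few percentage points above that.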
This has significant implications for those who transcribe and index genealogical data such as parish registers, census data or cemetery records, and wish to publish this information on the internet. After entry into a simple text or spreadsheet file using "off the shelf" software, the data can be readily compiled, manipulated, sorted or edited in a variety of ways before transfer to an HTML file, without the need for great expertise or expensive database programmes. The data may also be shared between users without difficulty because only widely recognised file formats are employed.