DOM parsing script

  • This is a Version 5 script.
  • The attributes are not shown by Ice Browser.
  • Explorer 5 on Mac has a bug in numbering the LI's in the nodemap: every OL starts with the number following the number of the previous LI, regardless of in which OL this LI is situated.

     

    This developer script prints out (part of) the document structure for you.

    When you're writing your own scripts to generate HTML and something goes wrong (nothing shows up, for example), you'll desperately want to see the HTML your script has generated. In this situation 'View Source' doesn't help because it only shows the original HTML. This script, however, views the generated source for you.

    First some words about text nodes between HTML tags, followed by an example of the nodemap and then on to the actual script.

    Text nodes

    When I tested this script, I discovered that the output of Netscape and that of Explorer (surprise!) don't quite match. Netscape and Explorer 5 on Mac create a lot of empty text nodes: one wherever there's whitespace between a closing tag and the next opening tag.

    For instance, take this bit of HTML:

    <H2>The title</H2>
    
    <P>The first paragraph</P>
    

    What about the whitespace between the </H2> and the <P>? Netscape and Explorer 5 on Mac turn it into an empty text node.

    Only when you place the tags like

    <H2>The title</H2><P>The first paragraph</P>
    

    the text node between the tags disappears.

    Another way to make the text node disappear is not closing your tag. If you do

    <P>The first paragraph
    
    <P>The second paragraph</P>
    

    there's no more text node between the paragraphs.

    To the script below I added a routine that hides empty text nodes from the nodemap.
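
    As a rough illustration of what "empty" means here, the check can be sketched as a small standalone function. The name isEmptyTextNode and the sample objects are my own assumptions, not part of the script below, and the sketch uses a regular expression where the script loops over the characters one by one.

```javascript
// Hypothetical helper, not part of the original script.
// Returns true when a node is a text node (nodeType 3) holding only whitespace.
function isEmptyTextNode(node)
{
	if (node.nodeType != 3) return false;
	// \s matches spaces, tabs, newlines and carriage returns
	return /^\s*$/.test(node.nodeValue);
}

// Plain objects stand in for DOM nodes, so the sketch also runs outside a browser.
var betweenTags = {nodeType: 3, nodeValue: '\n\n\t'};
var realText = {nodeType: 3, nodeValue: 'The title'};
var heading = {nodeType: 1, nodeValue: null};
```

    Here isEmptyTextNode(betweenTags) is true, while realText and heading are both kept.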

    Example

    Even with the empty text node problem solved, there are plenty of incompatibilities in the document structure, the strangest of which is that Explorer refuses to print the values of the form fields. Anyway, load this page in Explorer 5 on Windows, Explorer 5 on Mac and Netscape 6 (any platform), view the nodemap of thirdtest (the form) and have fun puzzling out the differences.

    Below you see the form that rules the script. Fill in the ID of the element where the script should start, check whatever you want to check and press the button. For testing purposes, I gave several elements in this page an ID.
    You can fill in id firsttest to view the nodemap of this paragraph, id secondtest for the nodemap of the special DIV that contains the document up to this paragraph and id thirdtest for the nodemap of the P containing the form.
    If you don't enter anything or the ID doesn't exist, the script takes the document as root.
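
    The fallback to the document can be sketched on its own like this. The function name resolveRoot and the stand-in fakeDoc object are assumptions made for illustration; in a real page you would pass the document object itself.

```javascript
// Hypothetical helper mirroring the fallback described above.
// doc is anything with a getElementById method, normally the document object.
function resolveRoot(doc, id)
{
	var found = id ? doc.getElementById(id) : null;
	return found || doc;
}

// A stand-in document with one known ID, so the sketch runs outside a browser.
var fakeDoc = {
	elements: {firsttest: {nodeName: 'P', id: 'firsttest'}},
	getElementById: function (id)
	{
		return this.elements.hasOwnProperty(id) ? this.elements[id] : null;
	}
};
```

    An empty string or an unknown ID both give back fakeDoc itself; only 'firsttest' gives back the P.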

    Readroot ID
    Show texts Show attributes Hide empty text nodes

    Especially in Explorer 5, generating the node map may take a while.

    The nodemap

    The script

    Copy three things to your page:

    1. The script below
    2. The form below that. These are your controls.
    3. The empty DIV below that. The nodemap appears inside this DIV.

    The script:

    // PPK's DOMparse
    
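    var readroot,writeroot,formroot;
    var lvl = 1;	// current nesting depth
    // State saved per level before recursing and restored afterwards
    var xtemp = new Array();	// writeroot
    var ytemp = new Array();	// readroot
    var ztemp = new Array();	// loop index
    var atemp = new Array();	// the OL under construction at each level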
    
    function clearIt()
    {
    	if (!writeroot) return;
    	while(writeroot.hasChildNodes())
    	{
    		writeroot.removeChild(writeroot.childNodes[0]);
    	}
    
    }
    
    function init()
    {
    	if (!document.getElementById)
    	{
    		alert('This script doesn\'t work in your browser');
    		return;
    	}
    	formroot = document.forms['nodeform'];
    	var read = formroot.write.value;	// the field is named 'write' but holds the readroot ID
    	if (read && document.getElementById(read)) readroot = document.getElementById(read);
    	else readroot = document;
    	writeroot = document.getElementById('nodemap');
    	clearIt();
    	var tmp1 = document.createElement('P');
    	// the document itself has no ID, so avoid printing 'undefined'
    	var tmp2 = document.createTextNode('Content of ' + readroot.nodeName + ' with ID = ' + (readroot.id || '(none)'));
    	tmp1.appendChild(tmp2);
    	writeroot.appendChild(tmp1);
    	level();
    }
    
    function level()
    {
    	atemp[lvl] = document.createElement('OL');
    	for (var i=0;i<readroot.childNodes.length;i++)
    	{
    		var x = readroot.childNodes[i];	// the node being mapped
    		if (x.nodeType == 3 && formroot.hideempty.checked)
    		{
    			var hide = true;
    			for (var j=0;j<x.nodeValue.length;j++)
    			{
    				var c = x.nodeValue.charAt(j);
    				// tabs and carriage returns count as whitespace too
    				if (c != '\n' && c != '\r' && c != '\t' && c != ' ')
    				{
    					hide = false;
    					break;
    				}
    			}
    			if (hide) continue;
    		}
    		a1 = document.createElement('LI');
    		a2 = document.createElement('SPAN');
    		if (x.nodeType == 3) a2.className="text";
    		a3 = document.createTextNode(x.nodeName);
    		a2.appendChild(a3);
    		a1.appendChild(a2);
    		atemp[lvl].appendChild(a1);
    		if (x.nodeType == 3 && formroot.showtext.checked)
    		{
    			a6 = document.createElement('BR');
    			a5 = document.createTextNode(x.nodeValue);
    			a2.appendChild(a6);
    			a2.appendChild(a5);
    		}
    		if (x.attributes && formroot.showattr.checked)
    		{
    			a3 = document.createElement('SPAN');
    			a3.className="attr";
    			for (var j=0;j<x.attributes.length;j++)
    			{
    				if (x.attributes[j].specified)
    				{
    					a5 = document.createElement('BR');
    					a6 = document.createTextNode(x.attributes[j].nodeName + ' = ' + x.attributes[j].nodeValue);
    					a3.appendChild(a5);
    					a3.appendChild(a6);
    				}
    			}
    			a2.appendChild(a3);
    		}
    		if (x.hasChildNodes())
    		{
    			lvl++;
    			xtemp[lvl] = writeroot;
    			ytemp[lvl] = readroot;
    			ztemp[lvl] = i;
    			readroot = readroot.childNodes[i];
    			writeroot = atemp[lvl-1];
    			level();
    			i = ztemp[lvl];
    			writeroot = xtemp[lvl];
    			readroot = ytemp[lvl];
    			lvl--;
    		}
    	}
    	writeroot.appendChild(atemp[lvl]);
    }
    
    // End PPK's DOMparse
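
    The script above keeps its place during recursion by saving writeroot, readroot and the loop index in global arrays. As a point of comparison, here is a minimal sketch of the same walk with all state passed as arguments. It is not a replacement for the script: the function name outline is my own assumption, it returns indented lines of text instead of building nested OLs, and the sample tree is made of plain objects so it runs without a browser.

```javascript
// Hypothetical alternative to level(): state lives in arguments, not in global arrays.
function outline(node, hideEmpty, depth)
{
	var lines = [];
	depth = depth || 0;
	var children = node.childNodes || [];
	for (var i = 0; i < children.length; i++)
	{
		var child = children[i];
		// Skip whitespace-only text nodes when asked to
		if (hideEmpty && child.nodeType == 3 && /^\s*$/.test(child.nodeValue)) continue;
		// Two spaces of indentation per level
		var indent = new Array(depth + 1).join('  ');
		lines.push(indent + child.nodeName);
		// Each call has its own i, so nothing needs saving and restoring
		lines = lines.concat(outline(child, hideEmpty, depth + 1));
	}
	return lines;
}

// Stand-in tree: a DIV holding an H2, a whitespace text node and a P
var tree = {nodeName: 'DIV', nodeType: 1, childNodes: [
	{nodeName: 'H2', nodeType: 1, childNodes: [
		{nodeName: '#text', nodeType: 3, nodeValue: 'The title', childNodes: []}
	]},
	{nodeName: '#text', nodeType: 3, nodeValue: '\n', childNodes: []},
	{nodeName: 'P', nodeType: 1, childNodes: [
		{nodeName: '#text', nodeType: 3, nodeValue: 'The first paragraph', childNodes: []}
	]}
]};
```

    Calling outline(tree, true) leaves out the whitespace node between the H2 and the P; outline(tree, false) lists it as an extra #text line.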
    
    

    The form

    <FORM NAME=nodeform>
    <INPUT STYLE="width: 150px" NAME=write VALUE=secondtest>Readroot ID<BR>
    <INPUT TYPE=checkbox NAME=showtext>Show texts
    <INPUT TYPE=checkbox NAME=showattr>Show attributes
    <INPUT TYPE=checkbox NAME=hideempty CHECKED>Hide empty text nodes<BR>
    <INPUT TYPE=button VALUE="Make nodemap" onClick="init()">
    <INPUT TYPE=button VALUE="Clear nodemap" onClick="clearIt()">
    </FORM>
    

    The DIV

    <DIV ID=nodemap>
    </DIV>
    

    How to use the script

    Copy the script into the head of the page. Copy the FORM and the DIV to wherever you want. Then use the form to generate the nodemap inside the DIV.

    First, assign a readroot. This is an element with an ID of your choice. Fill in the ID in the text field and press the 'Make nodemap' button. Now you get a map of the node and its descendants and you can (hopefully) find out what goes wrong where.

    Each node is inside a <SPAN>. Text nodes get CLASS="text" and attributes get CLASS="attr" so you can improve the output by writing a style sheet for the two (or copying mine).
