Extending Prototype - getInnerText()

written by Tobie on October 18th, 2006 @ 01:21 AM

It is sometimes useful to be able to access the pure textual content of an HTML element (i.e. with tags removed). For example, when sorting data in a table via JavaScript (something I'm currently working on), getting rid of the tags is crucial. Consider the following:

<table>
  <tr>
    <th>Animal</th>               <th>Color</th>
  </tr>
  <tr>
    <td>Snake</td>                <td>green</td>
  </tr>
  <tr>
    <td>Zebra</td>                <td>black and white</td>
  </tr>
  <tr>
    <td><em>Lumpa Lumpa</em></td> <td>pink</td>
  </tr>
</table>

Because of the <em> tags, Lumpa Lumpa would be sorted on its leading "<" rather than on its name when sorted alphabetically... which is plain wrong.

One W3C compatible solution would be to loop through the element's child nodes, find the text nodes and concatenate them, but that would be very, very slow.
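
For reference, that node-walking approach might look something like the sketch below. collectText is a made-up name (it is not part of Prototype or of the DOM), and the function relies only on the standard nodeType, nodeValue and childNodes properties.

```javascript
// Hypothetical helper (not part of Prototype): recursively gathers
// the content of every text node found below `node`.
function collectText(node) {
  var TEXT_NODE = 3; // value of Node.TEXT_NODE in the W3C DOM
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  var text = '', children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}
```

In a browser, collectText(document.getElementById('myDiv')) would return the concatenated text. Note that this sketch leaves white-space untouched and also picks up the text nodes found inside <script> elements.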

innerText to the Rescue

innerText is a proprietary property which does just what we need... way faster. It works just like innerHTML, except that it strips all tags, line breaks and extra spaces from the element's content. Unfortunately, it is only partially supported by most modern browsers and does not exist in Firefox. Quirksmode has a pretty good rundown of which browsers support innerText and which don't, but it unfortunately leaves out a couple of peculiarities without which DOM scripting wouldn't be any fun.

  • If an element is hidden (i.e. has the following CSS applied to it: visibility: hidden; or display: none;), innerText will return an empty string in Safari, whatever the content of the element may be.
  • Not only does Opera leave line breaks and extra white-space intact, it also includes the content of scripts contained within <script> tags:
<div id="myDiv">
  Some text here.
  <script>
    reallyCoolFunction();
  </script>
  More text there.
</div>

$('myDiv').innerText
// -> 'Some text here.\n    reallyCoolFunction();\n  More text there.'

UPDATE: As a side note, Firefox's textContent property has the same issues.

Other than that, the code is pretty straightforward (note that I'm using Prototype version 1.5.0_rc1):

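Element.addMethods({
  getInnerText: function(element) {
    element = $(element);
    // Use the native innerText when it returns something, except in Opera.
    return element.innerText && !window.opera ? element.innerText
      // Otherwise fall back on innerHTML: strip scripts and tags, decode
      // entities, then collapse white-space.
      : element.innerHTML.stripScripts().unescapeHTML().replace(/[\n\r\s]+/g, ' ');
  }
});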
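
Coming back to the table sorting example, here is a rough sketch of how the method might be plugged into a sort. compareByText is a hypothetical helper, not part of Prototype; it accepts whichever function extracts the text, so it is not tied to getInnerText().

```javascript
// Hypothetical helper: builds a comparator for Array.prototype.sort that
// compares two items by the text `getText` extracts from them.
function compareByText(getText) {
  return function(a, b) {
    var x = getText(a).toLowerCase(), y = getText(b).toLowerCase();
    return x < y ? -1 : (x > y ? 1 : 0);
  };
}

// Assumed setup: Prototype and getInnerText() are loaded, and `cells`
// is an array of <td> elements.
// cells.sort(compareByText(function(cell) { return $(cell).getInnerText(); }));
```

Lowercasing both sides is a design choice made for this sketch, so that "green" and "Green" sort together; drop it if case matters.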

Step by Step Rundown

First, we attach our getInnerText() method to every element using Element.addMethods() (you can find more about Element.addMethods() here).

Element.addMethods({ 
  getInnerText: function(element) {
    element = $(element);

We then check that innerText is available and that it does not return an empty string (in which case we'd rather double-check, because of the Safari issue mentioned above). If so, we use it... unless the browser is Opera (also see above).

    return element.innerText && !window.opera ? element.innerText

If innerText is not available (or if it returned an empty string), we use innerHTML instead, stripping scripts and tags, replacing HTML entities (&amp;, &ldquo; and the like), and collapsing line breaks and extra white-space into single spaces.

      : element.innerHTML.stripScripts().unescapeHTML().replace(/[\n\r\s]+/g, ' ');
  }
});
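
To see what that fallback chain does to a string without loading Prototype, here is a rough, regex-only approximation. htmlToText is a hypothetical function, the regular expressions are simplifications rather than Prototype's actual implementation, and only a handful of entities are decoded.

```javascript
// Hypothetical, simplified stand-in for
// innerHTML.stripScripts().unescapeHTML().replace(/[\n\r\s]+/g, ' ').
function htmlToText(html) {
  return html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') // drop scripts and their content
    .replace(/<\/?[^>]+>/g, '')                        // drop the remaining tags
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>')       // decode a few entities...
    .replace(/&quot;/g, '"').replace(/&amp;/g, '&')    // ...&amp; last, to avoid double decoding
    .replace(/\s+/g, ' ');                             // collapse white-space runs
}

var html = '<div>\n  Some text here.\n  <script>\n    reallyCoolFunction();\n  </script>\n  More text there.\n</div>';
htmlToText(html);
// -> ' Some text here. More text there. '
```

As with the original method, leading and trailing white-space is collapsed to a single space rather than trimmed.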

You can download the source code as part of my very small add-on library to Prototype. I'll post a patch along with tests on trac as soon as I find the time to do so.

UPDATE: I finally posted this as a patch. One of the comments (by Mislav) suggested another name for this method. I tend to agree that getInnerText is not the most brilliant name; however, I'm not too sure that what he suggested (i.e. getContent()) is descriptive enough. Any suggestions?
