Tidy accepts a wide variety of options that let you customise how your tidied output should look. As you saw in the previous script, you set them by creating an array where the keys are the option names and the values are the settings for those options, then passing that array as the second parameter when creating a Tidy object.
The official list of Tidy options is available online in the Tidy manual (see http://tidy.sourceforge.net/docs/quickref.html), but here are a few to get you started:
logical-emphasis: set to "true" to have Tidy change <I> tags to <EM>, and <B> to <STRONG>.
replace-color: set to "true" to have Tidy change numeric HTML colour values to their string equivalents, wherever possible. That is, #FFFFFF becomes "white".
show-body-only: set to "true" to have Tidy only output the contents of the <BODY> tag - no headers, no titles, not even the body tag itself. This is useful for grabbing the content (and only the content!) of a web page.
word-2000: my favourite. Set to "true" to have Tidy turn Word 2000's mangled attempt at HTML into proper HTML.
vertical-space: set to "true" to have Tidy insert blank lines in the output to make it more readable.
fix-backslash: set to "true" if someone in your company likes writing URLs with a \ rather than a / - this corrects it.
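To see a couple of these options working together, here is a minimal sketch. It assumes the tidy extension is installed, and the sample markup and variable names are made up for illustration. It uses tidy_parse_string(), which also takes the options array as its second parameter.

```php
<?php
// Hypothetical sample markup using presentational tags.
$html = '<p><b>Bold</b> and <i>italic</i> text</p>';

// Keys are option names, values are the settings for those options.
$options = array(
    'logical-emphasis' => true,  // change <b> to <strong> and <i> to <em>
    'show-body-only'   => true,  // output only the contents of <body>
);

// Parse the string with our options, then clean up the markup.
$tidy = tidy_parse_string($html, $options, 'utf8');
$tidy->cleanRepair();

// Fetch the tidied markup as a string and print it.
$output = tidy_get_output($tidy);
echo $output;
```

With both options set, the output should contain just the paragraph with its tags swapped, and no surrounding html, head or body tags. Add further options to the array in the same way to try them out.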