Clean HTML from MS Word

By default, Microsoft Word can save any Word document in HTML format. However, the word processor is notorious for inserting extra information into files saved with the ‘Save As’ command.

The HTML file it creates is bloated and full of unnecessary code. For example, two lines of text in Word resulted in more than 1,000 lines of HTML code.

[Image: Word document saved as HTML]

[Image: HTML source of the saved document]

There is no easy way to get rid of this extra code, especially when the Word file contains more than a few lines of text.

I found a macro in this discussion forum. A short tutorial is all you need to use it, and yes, it worked in Microsoft Word 2007.
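
If you would rather clean the saved file outside of Word, a small script can do a similar job. The Python sketch below is not the macro from the forum; it is a rough illustration that assumes the usual Word leftovers (conditional comments, style and xml blocks, o:p tags, and class, style, lang and xmlns attributes) are what you want gone. The function name clean_word_html is made up for this example, and since it relies on regular expressions rather than a real HTML parser, it may also touch look-alike text in the body, so treat it as a starting point only.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip common Word-specific markup from an HTML string (regex-based sketch)."""
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Any remaining ordinary HTML comments
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Whole <style> and <xml> blocks, including their content
    html = re.sub(r"<(style|xml)\b.*?</\1>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Standalone <meta> and <link> tags
    html = re.sub(r"<(meta|link)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Office namespace tags such as <o:p> and </o:p> (the text inside is kept)
    html = re.sub(r"</?\w+:\w+[^>]*>", "", html)
    # Bare conditional markers such as <![if !supportLists]> and <![endif]>
    html = re.sub(r"<!\[(?:if|endif)[^>]*>", "", html, flags=re.IGNORECASE)
    # class, style and lang attributes (quoted or unquoted values)
    html = re.sub(r"""\s+(class|style|lang)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
                  "", html, flags=re.IGNORECASE)
    # xmlns declarations on the <html> tag
    html = re.sub(r"""\s+xmlns(:\w+)?\s*=\s*("[^"]*"|'[^']*')""",
                  "", html, flags=re.IGNORECASE)
    # Collapse the blank lines left behind by the removals
    html = re.sub(r"\n\s*\n+", "\n", html)
    return html.strip()
```

To use it, read the file produced by ‘Save As’ into a string, pass it to clean_word_html, and write the result to a new file. Word may not save the file as UTF-8, so you may need to pass a different encoding when opening it.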

