Recover a damaged MS-Word file

Recently I was asked to repair a Word file that showed the following message when opened:

“The name in the end tag of the element must match the element type in the start tag”

image

The file was full of mathematical formulas, and attempts to restore it or to fall back to an earlier version failed. So where do you start?

Starting with the 2007 version, Microsoft adopted a standardized format for Word, Excel and PowerPoint documents, called the Open XML format. In fact, every Word file is built from a collection of XML files, compressed together as a zip archive and given a “docx” extension. Suppose we have a file named MyDoc.docx: by renaming it to MyDoc.docx.zip, we can extract it with any simple zip software and get a folder with all the XML files. To turn it back into a Word file, simply compress the files again and remove the .zip extension from the file name.

Step 1: Extracting to XML Files

First we’ll make a copy of the corrupted file, so as not to damage it any more than it already is. For example, let’s call the copy ErrorFile.docx. We add the .zip extension to the file name, and double-clicking it opens the archive and shows the extracted files. At this point we can see the structure of the XML.

image
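
For readers who would rather script this step than rename the file by hand, here is a minimal sketch using Python's standard zipfile module. The function name extract_docx and the "_copy" and "extracted" names are made up for this illustration; they are not part of Word or of the procedure above.

```python
import shutil
import zipfile
from pathlib import Path


def extract_docx(docx_path, work_dir):
    """Copy the .docx, unpack the copy into work_dir/extracted, return the part names.

    The "_copy" suffix and the "extracted" folder are arbitrary choices
    made for this sketch.
    """
    docx_path = Path(docx_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Work on a copy so the damaged original is never touched.
    backup = work_dir / (docx_path.stem + "_copy.docx")
    shutil.copy2(docx_path, backup)

    # A .docx is a zip archive, so zipfile can open it without renaming.
    with zipfile.ZipFile(backup) as archive:
        archive.extractall(work_dir / "extracted")
        return archive.namelist()
```

Calling extract_docx("ErrorFile.docx", "work") should leave word/document.xml under work/extracted, next to the other XML parts.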

Step 2: Finding the damaged line

Pay attention to the error message we got: its last line refers us to word/document.xml, line 2, column 93,496. Unfortunately, Word does not bother with the layout of the XML file, so almost the whole file sits on one line, which is why the column number is so large. The document.xml file holds the entire structure of the Word document and is the main file of any Word document. To work on it properly, we’ll copy it to another folder.
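
If Word's message is no longer on screen, a generic XML parser can report a similar position. The sketch below uses Python's xml.etree.ElementTree; the function name locate_xml_error and the context parameter are invented for this example, and the column the parser reports may differ slightly from Word's (the parser counts from zero, and its count may not be in characters for non-ASCII text), so treat it as an approximate pointer.

```python
import xml.etree.ElementTree as ET


def locate_xml_error(xml_bytes, context=60):
    """Return None if xml_bytes is well-formed, else (line, column, message, snippet).

    snippet is the text around the reported position, which helps when
    almost the whole file sits on a single very long line.
    """
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position  # line counts from 1, column from 0
        lines = xml_bytes.split(b"\n")
        bad_line = lines[line - 1] if 0 < line <= len(lines) else b""
        start = max(0, column - context)
        snippet = bad_line[start:column + context].decode("utf-8", errors="replace")
        return line, column, str(err), snippet
    return None
```

A typical use would be to read the copied document.xml in binary mode and pass its content to locate_xml_error; a result of None means the parser found nothing wrong.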

To find the specific error, go to the exact column given in the error description and work out why the file is invalid around that point. In practice, since these are large files, we may have to scan quite a lot of text before we understand where the mistake is. To save time, you can simply create a new XML file in Visual Studio 2010 and paste the content into the editor, and it will be formatted automatically. Reformatting changes the line layout, so the line number in the error message is no longer relevant, but do not worry: the red marks on the scroll bar indicate the location of the error.

image

A quick check, collapsing the unnecessary elements, gave the following picture, which explains it all:

image

The elements are not nested in the correct order: the opening tag of <AlternateContent> comes before <oMath>, but so does its closing tag. The position of the <Choice> element is not clear either.
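
The same kind of crossed nesting can also be found without an IDE, by walking the tags with a stack. This is only a rough sketch: the function name first_mismatch is invented here, and the regular expression is a simplification that assumes attribute values contain no raw ">" character and that the file has no comments or CDATA sections containing tag-like text.

```python
import re

# Matches <name ...>, </name> and <name .../>; skips <?xml ...?> and <!-- -->
# because the tag name must start with a letter or underscore.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")


def first_mismatch(xml_text):
    """Return details of the first closing tag that does not match the
    innermost open element, or None if no such tag is found.

    Elements that are never closed at all are not reported by this sketch.
    """
    stack = []  # names of the currently open elements, outermost first
    for match in TAG.finditer(xml_text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <name/> opens and closes in one tag
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            return {
                "found": name,
                "expected": stack[-1] if stack else None,
                "offset": match.start(),
                "open_elements": list(stack),
            }
    return None
```

On a made-up fragment such as <mc:AlternateContent><mc:Choice><m:oMath></mc:Choice></mc:AlternateContent></m:oMath> (an illustration, not the content of the damaged file), it reports that </mc:Choice> was found while m:oMath was still open.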

Step 3: The solution

If you are unfamiliar with this format, you can work out the correct relationship between these elements by looking at the rest of the file, which shows how it is done properly. You will find many places where AlternateContent wraps Choice, so all that is left is to move oMath to the right position.

Step 4: Running the new file

After fixing document.xml, save it and use it to replace the original document.xml inside the ErrorFile.docx.zip we opened. There is no need to compress it again (on the contrary, compressing with zip software usually will not work). Just go back to the parent folder, remove the .zip extension, and open the file as usual.
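
If the zip folder refuses to accept the replacement file, one possible alternative is to write a new archive that copies every part of the original except document.xml, which is taken from the repaired version. This is a sketch with Python's zipfile module, not the method used in this post, and it has not been tested against Word here; the function name replace_part is made up for the example.

```python
import zipfile


def replace_part(src_docx, dst_docx, part_name, new_bytes):
    """Write dst_docx as a copy of src_docx in which part_name holds new_bytes.

    src_docx and dst_docx must be different files. Every other part is
    copied unchanged, with its original name and compression settings.
    """
    with zipfile.ZipFile(src_docx) as src, zipfile.ZipFile(dst_docx, "w") as dst:
        for item in src.infolist():
            if item.filename == part_name:
                data = new_bytes
            else:
                data = src.read(item.filename)
            dst.writestr(item, data)
```

For example, replace_part("ErrorFile.docx", "Fixed.docx", "word/document.xml", fixed_bytes), where fixed_bytes is the content of the repaired document.xml read in binary mode.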

Next time, please keep versions :)

 

  1. sebeni
    16/08/2013 at 15:42

    oh is so complicate. I need help reparing a word document with such an error.
    Can you help me, please?

  3. Joey
    27/08/2014 at 15:19

    Another way is to open the damaged document with wordpad. You will lose some formatting etc.

  4. gardon
    09/10/2014 at 01:58

    mate, you just saved my @ass, thanks

  5. Madhu
    06/02/2015 at 08:25

    Thank you very much for the detailed post it helped me a lot

  6. Chayanan Nawal
    30/01/2016 at 17:40

    can you please help me recover it?

  7. Jamie P
    18/04/2016 at 16:44

    Fixed it, cheers!

  8. Chris W
    30/08/2016 at 17:15

    I managed to fix the document file and correct the code but it wont let me replaced the new one with the old one from the zip file? Any ideas?

  9. David M
    13/10/2016 at 18:59

    can you help me recover my file, I have the same error message but I don’t have the knowledge to follow your instructions.

  10. Peter Alberts
    06/05/2017 at 11:34

    fantastic article, this saved my bacon. Key is opening the XML file in Visual studio to pinpoint which tag is then offending, then used Notepad++ to fix the offending tag as the XML plugin you can show the file in tab view and then easy to fix
    Thanks Asaf
