Sunday 31 May 2015

Fixing Files

Not all that is corrupt is lost...

As technology marches onwards, we become more and more reliant on the hope that the data we store long-term will remain in pristine condition. These days, it's entirely possible that everything from your music collection to your holiday snaps to your complete archive of personal correspondence could exist only digitally, with no hard copies available anywhere.

While having data on a computer makes it easy to search, archive and reproduce, it also makes it easy to destroy. Data in digital form is vulnerable, in some ways more vulnerable than when it's stored on paper and other 'analogue' mediums. After all, in what other manner could a single particle of smoke leave your half-finished novel wiped out or inaccessible? Where else can a single scratch mean that all your schoolwork or research material becomes instantly lost to you?


The threat of data corruption hangs over all computer users in some form or another. Most of the time, we're powerless to stop it, but that doesn't mean that you can't give yourself a better chance to survive it, and indeed, even repair some of the damage. Over the next few pages, we'll examine exactly how that's possible.

What Is File Corruption?


File corruption can occur for many different reasons, but the main symptom of it is always the same: your operating system's inability to find the data it expects to find. What we mean by 'expects' is a little woolly, but if you imagine the file as a book, it could either have an incorrect cover or incorrect contents.
If the 'cover' is wrong, it means there's a problem in the file system. When you try to open a file, your file system (a collection of data about the files on your hard drive) tells the operating system what it should find - a file of a certain size or type, for example. Corruption in the file system means that this information is wrong. The file doesn't look like it should, and so the OS can't read it.

Yet even if that process occurs without difficulty - which is to say, the file system isn't corrupt - then the file itself could still be. The data in all files is encoded and structured in a specific way so that programs that want to open them know what to do with the information inside. In most files, you have a header at the start (in the book analogy, this would be the contents page) that tells applications how to understand the contents. Errors anywhere in the file or the header could leave it unreadable, because the two don't match one another - either the contents page is missing or wrong, or the pages it points to are.

To give a simple example, if you open a JPG file in Notepad it'll mostly look like gibberish, but you'll see the letters 'JFIF' near the start. This is part of the file header and stands for 'JPEG File Interchange Format'. It essentially tells programs how to turn the 'gibberish' that follows into image data by making it clear what format the data is stored in. If that header is damaged, the program won't know what algorithms to apply to turn the data back into an image, and if the data is damaged, the application will try to convert it back into an image using the standard method for JFIF files but may get little or no usable image in return.
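As a rough illustration of what a program looks for, here is a minimal Python sketch that checks whether some bytes begin like a JFIF file. The function name and the hand-built 20-byte sample are made up for this example, and a real image viewer checks far more than this.

```python
def looks_like_jfif(data: bytes) -> bool:
    """Return True if the bytes begin like a JFIF-style JPEG file."""
    # Every JPEG starts with the two-byte Start Of Image marker, FF D8.
    if data[:2] != b"\xff\xd8":
        return False
    # In a JFIF file the next segment is APP0 (FF E0), followed by a
    # two-byte length and then the identifier "JFIF" plus a zero byte.
    return data[2:4] == b"\xff\xe0" and data[6:11] == b"JFIF\x00"


# A hand-built 20-byte JFIF opening, standing in for a real photo.
sample = bytes.fromhex("ffd8ffe000104a46494600010100000100010000")

print(looks_like_jfif(sample))        # True
print(looks_like_jfif(b"\x00" * 20))  # False: the header has been wiped
```

JPEGs straight from a camera often carry an 'Exif' segment here instead of 'JFIF', so a False result doesn't by itself mean a file is damaged.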

Of the three parts involved (file system, file header and file data), most corruption affects the largest: the file data, or the 'pages' of the book. There are several ways data can become damaged, but sheer probability means it usually hits the biggest part. File data can be overwritten due to a driver or software mistake, lost if the hard drive sector it's stored on becomes physically damaged, or accidentally discarded if your computer loses power halfway through moving or copying the data. A large number of problems occur during saving, when the file is being accessed and rewritten (making it highly vulnerable to accidents), but everything from magnetic interference to virus activity can cause corruption.

In some situations, even a small amount of damage can render an entire file unusable. Archives especially rely on having the whole file make sense in order to reconstruct compressed data when it gets opened. Other files, such as JPEGs, can simply interpret junk data as valid, meaning you can still access the rest of the file and even get a clear idea of how and where in the file data has been lost.
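To see how unforgiving compressed data can be, here is a small Python sketch using the standard zlib module. It compresses some text, cuts the result in half (as an interrupted copy might), and tries to open both versions. The helper name and sample text are invented for the example, and a raw zlib stream is only a stand-in for a full archive format such as ZIP.

```python
import zlib


def can_decompress(blob: bytes) -> bool:
    """Return True if zlib can fully decompress the given bytes."""
    try:
        zlib.decompress(blob)
        return True
    except zlib.error:
        # Raised for truncated streams, bad checksums and invalid data.
        return False


original = b"Chapter one of a half-finished novel. " * 200
packed = zlib.compress(original)

# Simulate a copy that was interrupted halfway through.
truncated = packed[: len(packed) // 2]

print(can_decompress(packed))     # True
print(can_decompress(truncated))  # False
```

The truncated copy is rejected outright rather than yielding half of the text, which is the all-or-nothing behaviour described above.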

Finally, in some rare cases the file system itself might become corrupt, meaning that the file is intact but inaccessible to normal methods. In these cases the problem can normally be solved by repairing the file system's errors, recovering all data, but only if you act quickly enough.

So if the worst does happen, is there anything you can do?

Diagnosis And Prevention


Most people might instantly delete a corrupted file once they've realised that its program isn't going to open it any more. That, though, isn't a solution to anything except recovering hard drive space. If the file is corrupt, it's worth trying to figure out why, and most importantly it's worth trying to see how much of it you can recover.

If you discover a corrupt file and there's no obvious cause for it (e.g. you didn't experience a power cut or recently remove a virus) then the most likely cause is damage to the storage medium, and possibly even impending hardware failure. CDs, DVDs and other types of disc can be scratched, but hard drives and SSDs can fail spontaneously, and that's the thing to look out for.

Hardware failure is serious for a number of reasons, not least because bad drive sectors rarely come along on their own. If one file has become corrupt, others will surely follow as the hardware begins to degrade. In all probability, the first corrupt file you notice isn't the first one to emerge on your system, so you may already have lost data without realising.

For this reason, the first thing you need to do when you find a corrupt file is check the integrity of your hardware. The easiest way to do this is to run a standard Windows disk scan (the Check Disk tool, chkdsk). As well as checking the file system for errors, which will hopefully restore any files that are inaccessible because of damaged metadata rather than actual corruption, a full surface scan of the hardware will reveal any physically damaged ('bad') sectors.

If you don't find bad sectors, then it's good news: the corruption is most likely the result of a software problem, and therefore not indicative of any larger problem. If you do, however, then your hardware is beginning to fail.

At this point the best thing you could do is stop using your system entirely. In all likelihood this would be very impractical, so the next best step is to back up any essential files and order replacement storage ready to transfer the data to new, intact hardware. Until that happens, the more you use your disk the more likely you are to corrupt more files and cause more physical damage, and both of these things increase the likelihood that your data will become unavailable in a complete failure.

Only when you've successfully verified that your hardware is safe to use and/or shored up the data that might be at risk should you attempt to fix any damaged data that you've discovered.

Bypassing/Fixing Corruption


As we've noted, on a standard Windows-based PC, file corruption can occur in one of two places: the file itself, or the MFT (Master File Table, the NTFS counterpart of the File Allocation Table used by older FAT file systems), which tells the operating system where files are stored. In both cases, files may become lost or inaccessible, but in the case of MFT corruption the vast majority of the information will still be there.

Because of the way data is stored, it's likely that a partial form of stored data will always be available with the right tools at hand. In the case of physical damage to something like a hard drive, specialist platter readers can be used to retrieve the undamaged disk sectors from physically damaged disk platters. A disk editor - software that directly reads the information on the hard drive, bypassing the file system - can do the same for data that has been logically corrupted without physical damage.

That's specialist-level stuff, however, and more than likely to be out of the price range of all but the most desperate user, or a business that's reliant on the data it has lost. Home users might prefer to use a file-reconstruction program, which can scan corrupt data and attempt to extract the valid information by overwriting the corrupt data with something less confusing.
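The idea of keeping what can be read and replacing the rest with harmless filler can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the salvage function, the read_block callback and the simulated 'disk' are all hypothetical, and real recovery software works at a much lower level.

```python
from typing import Callable, List, Tuple


def salvage(read_block: Callable[[int, int], bytes],
            total_size: int,
            block_size: int = 512) -> Tuple[bytes, List[int]]:
    """Copy data block by block, zero-filling any block that can't be read."""
    recovered = bytearray()
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        length = min(block_size, total_size - offset)
        try:
            chunk = read_block(offset, length)
        except OSError:
            # Unreadable block: note where it was and substitute zeros so
            # everything after it stays at its original position.
            bad_offsets.append(offset)
            chunk = bytes(length)
        recovered += chunk
    return bytes(recovered), bad_offsets


# Hypothetical stand-in for a failing drive: the block at offset 4 is bad.
DISK = b"ABCDEFGHIJKL"


def fake_reader(offset: int, length: int) -> bytes:
    if offset == 4:
        raise OSError("simulated bad sector")
    return DISK[offset:offset + length]


data, bad = salvage(fake_reader, len(DISK), block_size=4)
print(data)  # b'ABCD\x00\x00\x00\x00IJKL'
print(bad)   # [4]
```

Padding with zeros rather than skipping the bad block keeps the surviving data in the right place, which matters for formats that refer to positions within the file.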

The software may, for example, attempt to reconstruct a file's checksum. When corruption happens on a small scale - that is, individual bits and bytes being altered - a CRC (cyclic redundancy check) will quickly notice this and declare the file invalid. However, it's then possible for software to read the file, use the CRC to track down very small errors and then correct them. On a wider scale, the damaged data might be replaced with valid but non-original data, which makes the file valid to be opened and edited once more even if it doesn't bring back the lost data.
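For the smallest possible error, a single flipped bit, the find-and-correct step can be sketched in Python with the standard zlib.crc32 function. The approach here (try flipping each bit until the CRC matches again) is an assumption chosen for illustration, as is the function name; real repair tools are more sophisticated and often rely on dedicated error-correcting codes.

```python
import zlib
from typing import Optional


def repair_single_bit(data: bytes, expected_crc: int) -> Optional[bytes]:
    """Try to undo one flipped bit so the data matches its stored CRC-32."""
    if zlib.crc32(data) == expected_crc:
        return data  # nothing to repair
    candidate = bytearray(data)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            candidate[byte_index] ^= 1 << bit  # flip one bit
            if zlib.crc32(candidate) == expected_crc:
                return bytes(candidate)
            candidate[byte_index] ^= 1 << bit  # flip it back
    return None  # more than one bit is wrong; this method can't help


original = b"Dear diary, today the hard drive made a clicking noise."
stored_crc = zlib.crc32(original)  # saved while the file was still intact

damaged = bytearray(original)
damaged[5] ^= 0x10  # simulate a single corrupted bit

repaired = repair_single_bit(bytes(damaged), stored_crc)
print(repaired == original)  # True
```

This brute-force search is only practical for tiny errors in small files; the work grows quickly once more than one bit is wrong.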

If the damage is to the header, recovering the data is even easier: you just need a program which can construct a new header, which isn't difficult at all as long as you know what type of file has been corrupted. Again, to use the example of a JPEG - as long as you know that the data is supposed to represent a JPEG image, it's a trivial matter for a program to reconstruct the header so that you can view it once more. Some image editors can even do it on the fly, forcing themselves to interpret data as a JPEG even if the necessary header is currently missing.
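As a rough sketch of the idea, the Python below overwrites the first 20 bytes of a file with a standard JFIF opening. It makes two loud assumptions: that the original began with a plain 16-byte JFIF segment with no embedded thumbnail, and that the damage is confined to those first 20 bytes. The function name and the stand-in image data are invented, and the tables that follow in a real JPEG must still be intact for the picture to decode.

```python
# A standard 20-byte JFIF opening: Start Of Image marker, then an APP0
# segment (version 1.01, no density units, 1:1 aspect, no thumbnail).
STANDARD_JFIF_START = bytes.fromhex("ffd8ffe000104a46494600010100000100010000")


def rebuild_jfif_start(damaged: bytes) -> bytes:
    """Replace the first 20 bytes with a standard JFIF opening."""
    if len(damaged) < len(STANDARD_JFIF_START):
        raise ValueError("file is too short to hold a JFIF header")
    # Assumption: everything after byte 20 is undamaged and is kept as-is.
    return STANDARD_JFIF_START + damaged[len(STANDARD_JFIF_START):]


# Stand-in for a photo whose opening bytes have been overwritten with zeros.
rest_of_image = b"(rest of the image data)"
damaged_file = bytes(20) + rest_of_image

fixed = rebuild_jfif_start(damaged_file)
print(fixed[6:10])                    # b'JFIF'
print(fixed.endswith(rest_of_image))  # True
```

The pixel density written here may not match the original file's, which can affect the size at which an image prints but not the picture itself.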

It's also possible to rewrite the header to take account of corrupted data elsewhere in a file. Microsoft's WMV format famously took the step of including significant data at the end of a file, which meant that incomplete downloads lacked a full header and would not support the seek function of video players. For this reason, many video editors include a 'repair WMV' function which can write a new header using only the downloaded material as a guide. The video file then becomes seekable, so data that could've been considered lost is now accessible again.

In the event of physical damage to a medium it becomes harder to recover lost data, but there is often the possibility of repair. A scratched DVD might seem unreadable and its data lost, but there are ways in which the disc can be repaired. As long as the damage is purely to the plastic surface of the disc rather than to its data storage layer, a simple home-repair kit can be used to take a thin layer of plastic off the disc, eliminating (or at least reducing the severity of) any scratches. The data is then recoverable despite the initial appearance of 'corruption'.

Useful Programs


Repairing corrupt files is virtually impossible to do manually, but a number of software programs exist that can do their best to repair them. In many cases data that has been damaged or overwritten can't be brought back, but if the rest of the file can be recovered you might only lose a small amount by comparison. It could mean the difference between rewriting all of your university dissertation and five pages of it - not to be sniffed at!

File Repair (www.filerepair1.com) is a freeware application which can reconstruct and salvage a wide variety of files, including Office documents, archive files, media, images and PDFs. In most cases the corrupt data can't be reconstructed, but the file header can be rewritten so that any valid data that remains will become accessible once again. It's a relatively simple program - if it can't recognise the format there's no recovery you can do, and likewise if the file is unreadable, inaccessible or too large to analyse then you're out of luck too. But as a free piece of software, it's worth trying out first.

The software Office Recovery (www.officerecovery.com) covers similar ground, but it's worth noting that it also offers an online version of its toolset, which allows you to upload corrupt data and subject it to meta-analysis to see if your file can be restored. The catch is that recovering your file in such a way is only free if you are willing and able to wait two weeks for the results. A paid version of the service charges $59 for 48 hours of access to your recovered file, so you've got to weigh up the cost against the benefit.

Finally, if the problem is with a corrupt storage medium, you can use any number of undelete programs, but Piriform's Recuva (www.piriform.com/recuva) is a strong choice. Again, it's freeware, it's able to scan even damaged media in the hope of recovering file fragments, and it can look on both your local drive and removable media to extract damaged files ready for processing by a more sophisticated recovery tool later down the line. Unfortunately, reconstructing corrupt data is all but impossible in the vast majority of cases. All you can hope to do is salvage what's there and make the best of it.

As ever, the only way to be completely immune from this type of failure is to create backups regularly and often. But if the worst happens and you're caught unaware, hopefully we've given you the tools and knowledge you need to get things back in working order!