The big data revolution has come to many industries, but the rise of analytics creates an operational climate in which companies must become much more aware of how their information moves through networks and where it ends up when hardware reaches its end-of-service-life date. A report from The News International puts this issue in focus by exploring the growing role that big data is playing in the oil and gas sector.

According to the news source, the oil and gas industry has begun to recognize the importance of analytics in light of shrinking profit margins, and companies are therefore deploying sensors, monitoring devices and similar solutions throughout their locations to increase operational transparency. There is significant hope that these investments will pay off down the line, but industry expert Kate Richard explained that companies getting involved in big data need to be prepared to manage all of that information.

“There is a huge amount of data prep, data sanitization and data extraction needed for big data to be totally disruptive,” Richard told The News International.

Data sanitization and destruction are particularly critical, as the process is often an afterthought in big data plans. A Government Computing News report warned that businesses becoming more reliant on information to inform operations are running into the problem of data spills – situations in which data handled through otherwise authorized user activity is mishandled in such a way that it is lost or exposed.

Poor sanitization processes are a prime example of a data spill. A company can perform a software erase and ship a storage device off to be shredded, only to have that drive get lost in transit, be discovered by somebody and have the information compromised. Data sanitization best practices must be followed to avoid these sorts of problems, and following them is particularly important in light of the big data movement.

The big data problem when it comes to sanitization
Big data is all about bringing in information from both structured and unstructured sources and applying it to analytics efforts. In one breath, a company may be capturing transaction reports and customer survey results. In the next, it could be poring over social media posts and word processing documents. Analytics systems must capture all of this data, make sense of what is in it regardless of whether it is structured or unstructured, and deliver it to end users in an actionable way. This ends up involving business intelligence technologies and a variety of other specialized tools.

“With big data, organizations must contend with highly varied life cycles.”

The problem is, big data isn’t just about immediate analysis. Sure, a manager at a retail store may get a notification that a customer just checked in on social media and use that personal data to send out a coupon. An executive months later may end up looking at all records related to customers stating they are at a retail location, how often managers offer them promotions and how those coupons impact immediate sales and long-term brand loyalty. In both cases, unstructured and structured data is coming into play. The difference is that the executives are taking a top-down look and may end up holding on to consumer information for months, if not years, to glean strategic value.

With big data, organizations must contend with highly varied life cycles: information moves between diverse hardware systems and is handled by varied user groups. What do you do, for example, if an executive shares a slideshow with sensitive data with a trusted partner, but that individual is not authorized to take control of that information because it is regulated in some way? Where is unused big data going for archival purposes? What is happening to decommissioned hard disks that may have held regulated data at any point in their service life? Businesses must be able to answer these questions if they are to navigate the data sanitization climate successfully. Once these questions have been dealt with, it is time to start thinking about hardware.

Establishing a flexible data destruction setup
Properly sanitizing and destroying storage devices begins with the chain of custody. When it comes to a big data environment, the chain starts the first time somebody writes data onto a device. From that point on, all databases or file directories on that hard disk should be logged so users know precisely what data they are destroying. From there, it is generally best to have at least two workers around when pulling hard disks from storage arrays and setting them aside for destruction. The second worker reduces the likelihood of fraud and errors. A software wipe can be a useful first step, but software erasure alone isn't full sanitization; organizations can also use a degaussing wand to quickly erase data.
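As a concrete illustration, the chain-of-custody logging and two-person rule described above might be modeled like this (a minimal sketch; the record fields, names and schema are assumptions for illustration, not a prescribed standard):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CustodyEvent:
    """One logged step in a drive's chain of custody."""
    action: str       # e.g. "pulled from array", "software wipe"
    handlers: tuple   # workers present at this step
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self):
        # Enforce the two-person rule from the text.
        if len(self.handlers) < 2:
            raise ValueError("at least two workers must witness each step")

@dataclass
class DriveRecord:
    """Tracks what a drive held and every custody event."""
    serial: str
    known_contents: list   # databases / directories logged on the drive
    events: list = field(default_factory=list)

    def log(self, action, handlers):
        self.events.append(CustodyEvent(action, tuple(handlers)))

# Hypothetical usage: log the drive's contents, then each handling step.
drive = DriveRecord("WD-XYZ123", ["sales_db", "/archive/reports"])
drive.log("pulled from storage array", ["A. Lee", "B. Ortiz"])
drive.log("software wipe completed", ["A. Lee", "B. Ortiz"])
```

The point of the structure is that every phase of handling produces a timestamped, witnessed entry, which supports the documentation requirement discussed later in the destruction process.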

“When combined with degaussers, shredding represents a complete data destruction solution.”

Having a powerful degausser becomes useful once an organization has a batch of drives being decommissioned. Degaussing is an easy way to purge data from a disk, and it completely destroys information if the magnetic field of the degausser being used is greater than the coercivity of the hard disk's media – its resistance to demagnetization. Once again, it is important to have multiple employees present throughout this process. Furthermore, each phase of destruction should be fully documented.
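That field-strength condition can be expressed as a simple check (a sketch only; the safety margin and the example coercivity figures are illustrative assumptions – actual ratings vary by drive model and should come from the manufacturer and the degausser vendor):

```python
def degauss_is_effective(degausser_oersteds: float,
                         media_coercivity_oersteds: float,
                         safety_margin: float = 2.0) -> bool:
    """Return True if the degausser's field comfortably exceeds
    the media's coercivity (its resistance to demagnetization).
    The 2x margin is an assumed conservative default."""
    return degausser_oersteds >= media_coercivity_oersteds * safety_margin

# Illustrative values only: modern disks commonly have
# coercivities of several thousand oersteds.
print(degauss_is_effective(20000, 5000))  # ample field strength
print(degauss_is_effective(6000, 5000))   # too close to coercivity
```

A check like this is worth folding into the documented destruction procedure, so a drive is never marked as purged by a degausser too weak for its media.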

Now that the data is gone, it is time to physically destroy the disk to add another layer of protection and allow for easier disposal. Hard disk shredder solutions are a popular option here, as they can tear hard drives into tiny pieces. When combined with degaussers, shredding represents a complete data destruction solution. Be careful not to skip degaussing, however, as shredding alone can leave information recoverable.

This process is key in preventing any data from slipping through the cracks in a company's security strategy, but it is also important to be flexible based on the type of information being handled. Different industries will have various requirements, and some data isn't going to be sensitive enough to require complete sanitization. In many cases, it is easiest to have a set procedure in place and follow it consistently to avoid error, but don't neglect device and data type as you put that process into place.