How DLP works. How to get a DLP system to work. A solution for protecting printed, graphic and electronic copies of documents

Information can leave a company's information system through several kinds of leakage channels: network channels (for example, e-mail or ICQ), local channels (external USB drives) and stored data (databases). The loss of media (flash memory, laptops) can be singled out separately. A system can be classified as DLP if it meets the following criteria: it is multichannel (it monitors several possible leakage channels); it has unified management (the same management tools for all monitored channels); it provides active protection (it enforces the security policy); and it takes into account both content and context.

The competitive advantage of most systems is the analysis module. Manufacturers emphasize this module so heavily that they often name their products after it, for example, "a tag-based DLP solution". As a result, users often choose a solution not for performance, scalability or other criteria traditional for the corporate information security market, but for the type of document analysis it uses.

Since each method has its own advantages and disadvantages, relying on a single method of document analysis makes a solution technologically dependent on it. Most manufacturers use several methods, although one is usually the "flagship". This article attempts to classify the methods used in document analysis and assesses their strengths and weaknesses based on practical experience with several types of products. The article deliberately does not consider specific products, since the user's main task when choosing one is to filter out marketing slogans such as "we will protect everything from everything" or "unique patented technology" and to understand what he will be left with when the sellers leave.

Container analysis

This method analyzes the properties of a file or other container (archive, cryptodisk, etc.) that holds the information. The colloquial name for such methods is "tagged solutions", which reflects their essence quite fully. Each container carries a label that uniquely identifies the type of content inside it. These methods require practically no computational resources to analyze the information being moved, since the label fully describes the user's rights to move the content along any route. In simplified form, the algorithm is: "if there is a label, block; if there is no label, let it through."
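The rule quoted above can be sketched in a few lines. This is a minimal illustration rather than any vendor's implementation; the `Container` class, the `CONFIDENTIAL` label value and the `decide` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical label value; real products define their own tag formats.
CONFIDENTIAL = "confidential"


@dataclass
class Container:
    """A file, archive or other container together with its labels."""
    name: str
    labels: set = field(default_factory=set)


def decide(container: Container) -> str:
    """Block a container that carries the label, allow everything else.

    The content itself is never inspected, which is why the check is cheap.
    """
    return "block" if CONFIDENTIAL in container.labels else "allow"


tagged = Container("budget.xlsx", {CONFIDENTIAL})
untagged = Container("budget_copy.xlsx")  # same content, but the label was lost

print(decide(tagged))    # block
print(decide(untagged))  # allow
```

Note that the unlabeled copy passes even though its content is identical: the decision depends only on the label.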

The advantages of this approach are obvious: the speed of analysis and the complete absence of false positives (cases where the system mistakenly identifies an open document as confidential). Some sources call such methods "deterministic".

The disadvantages are also obvious: the system protects only tagged information, so if the tag is not set, the content is not protected. A procedure is needed for labeling new and incoming documents, as well as a mechanism that prevents information from being moved from a labeled container to an unlabeled one through clipboard operations, file operations, copying from temporary files, and so on.

The weakness of such systems also shows in how label placement is organized. If labels are set by the author of a document, a malicious author can simply leave unlabeled the information he intends to steal. Even without malice, negligence or carelessness will show up sooner or later. If labeling is assigned to a particular employee, for example an information security officer or a system administrator, he will not always be able to tell confidential content from open content, because he does not know all the company's processes in depth. For example, the "white" balance sheet must be posted on the company's website, while the "gray" or "black" one must not leave the information system. But only the chief accountant, i.e. one of the authors, can tell one from the other.

Labels are usually classified as attribute, format, or external. As the names imply, the first are placed in the file attributes, the second in fields of the file itself, and the third are attached to the file (associated with it) by external programs.

Container structures in cybersecurity

Sometimes the low performance requirements for interceptors are also considered an advantage of label-based solutions, because the interceptors only check labels, i.e. they act like turnstiles in the metro: "if you have a ticket, go in." However, miracles do not happen: in this case the computing load is shifted to the workstations.

The natural niche for label-based solutions of any kind is the protection of document repositories. When a company has a repository that is replenished fairly rarely and in which the category and confidentiality level of each document are precisely known, labels are the easiest way to protect it. Labeling of documents entering the repository can be handled by an organizational procedure. For example, before a document is added, the employee responsible for the repository can ask the author and a subject-matter specialist what confidentiality level to assign. This task is solved especially well with format labels: each incoming document is saved in a protected format and then issued at an employee's request, with that employee recorded as authorized to read it. Modern solutions allow access rights to be granted for a limited time; when the key expires, the document simply can no longer be read. This is how, for example, documentation for public procurement tenders is issued in the United States: the procurement management system generates a document that only the bidders listed in it can read, without the ability to change or copy the content. The access key is valid only until the deadline for submitting bids, after which the document can no longer be read.

Label-based solutions are also used to organize document flow in closed network segments that handle intellectual property and state secrets. The requirements of the Federal Law "On Personal Data" will probably lead to similar document flow being set up in the personnel departments of large companies.

Content analysis

For the technologies described in this section, unlike those described above, it makes no difference what container the content is stored in. Their purpose is to extract the meaningful content from a container, or to intercept it in transit over a communication channel, and analyze it for prohibited content.

The main technologies for identifying prohibited content are signature control, hash control and linguistic methods.

Signatures

The simplest control method is to search the data stream for a sequence of characters. Sometimes a forbidden sequence of characters is called a "stop word", but in general it need not be a word: it can be an arbitrary set of characters, for example, the same label. Strictly speaking, not every implementation of this method can be classified as content analysis. For example, most UTM-class devices search the data stream for prohibited signatures without extracting text from the container, analyzing the stream "as is". And if the system is configured for a single word, its result is an exact, 100% match, so the method can be classified as deterministic.

More often, however, the search for a specific sequence of characters is applied to extracted text. In the overwhelming majority of cases, signature systems are configured to search for multiple words and to take the frequency of terms into account, so we will still classify them as content analysis systems.

The advantages of this method include independence from language and the ease of extending the dictionary of forbidden terms: to search a data stream for a word in Pashto, you do not need to know the language, only how the word is spelled. It is just as easy to add, for example, transliterated Russian text or "Olbanian" internet slang, which matters when analyzing SMS texts, ICQ messages or blog posts.

The disadvantages become apparent with languages other than English. Unfortunately, most manufacturers of text analysis systems work for the American market, and English is very "signature-friendly": word forms are most often built with prepositions, without changing the word itself. In Russian, everything is much more complicated. Take, for example, the word "secret", so dear to an information security officer. In English, the same form serves as a noun, an adjective and a verb. In Russian, several dozen different words can be formed from the root "secret". That is, where an information security officer in an English-speaking organization needs to enter one word, in a Russian-speaking organization he has to enter a couple of dozen words and then repeat them in six different encodings.

Moreover, such methods are not robust against primitive encoding. Almost all of them fall for the favorite trick of novice spammers: replacing characters with similar-looking ones. The author has repeatedly demonstrated to security officers an elementary trick for getting confidential text past signature filters. Take a text containing, for example, the phrase "top secret" and a mail interceptor tuned to this phrase. If the text is opened in MS Word, a two-second operation (Ctrl+F, find "o" in the Russian layout, replace with "o" in the English layout, "replace all", send the document) makes the document completely invisible to this filter. It is all the more frustrating that the replacement is done with the standard tools of MS Word or any other text editor, i.e. it is available to the user even if he has no local administrator rights and cannot run encryption programs.
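The substitution trick, and one possible countermeasure, can be reproduced in a short sketch. The lookalike table below is a small, incomplete sample, and `naive_match` and `folded_match` are hypothetical helper names; folding lookalike letters to a single form before comparison is an assumption about how a filter could defend itself, not a description of any particular product.

```python
# Latin letters that look identical to Cyrillic ones (small, incomplete sample).
LATIN_TO_CYRILLIC = str.maketrans({
    "a": "\u0430", "c": "\u0441", "e": "\u0435",
    "o": "\u043e", "p": "\u0440", "x": "\u0445",
})


def naive_match(signature: str, text: str) -> bool:
    """Plain substring search, as a simple signature filter would do."""
    return signature in text.lower()


def fold(s: str) -> str:
    """Lowercase and map Latin lookalikes to their Cyrillic twins."""
    return s.lower().translate(LATIN_TO_CYRILLIC)


def folded_match(signature: str, text: str) -> bool:
    """Substring search after folding both sides to the same form."""
    return fold(signature) in fold(text)


# The Russian word for "secret" (adverb form), written entirely in Cyrillic.
signature = "\u0441\u0435\u043a\u0440\u0435\u0442\u043d\u043e"
original = "\u0421\u043e\u0432\u0435\u0440\u0448\u0435\u043d\u043d\u043e " + signature

# The "replace all" step: Cyrillic "o" (U+043E) becomes Latin "o".
disguised = original.replace("\u043e", "o")

print(naive_match(signature, original))    # True
print(naive_match(signature, disguised))   # False: the filter is bypassed
print(folded_match(signature, disguised))  # True: folding restores the match
```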

Signature-based traffic control is most often included in the functionality of UTM devices, i.e. solutions that clean traffic of viruses, spam, intrusions and any other threats detected by signatures. Since this feature comes "for free", users often feel that it is enough. Such solutions really do protect against accidental leaks, i.e. cases where the sender does not alter the outgoing text to bypass the filter, but they are powerless against malicious users.

Masks

The search for stop-word signatures is extended by the search for masks. A mask describes content that cannot be listed exactly in the stop-word database but whose elements or structure can be specified. This includes any codes that identify a person or a company: TIN, account numbers, document numbers, etc. Signatures cannot find them.

It makes no sense to set the number of one specific bank card as a search object, but you do want to find any credit card number, however it is written, with spaces or without. This is not just a wish but a requirement of the PCI DSS standard: sending unencrypted card numbers by e-mail is forbidden, so it is the user's responsibility to find such numbers in e-mail and drop the offending messages.
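A card-number mask of this kind might look as follows. This is a minimal sketch that assumes 16-digit numbers written either together or in groups of four; `CARD_MASK`, `luhn_ok` and `find_card_numbers` are hypothetical names. The Luhn checksum step is an addition not mentioned in the text, included only because it is a common way to discard random 16-digit strings.

```python
import re

# 16 digits, optionally separated by a single space or hyphen into groups of four.
CARD_MASK = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return the digit strings that match the mask and pass the checksum."""
    hits = []
    for match in CARD_MASK.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits


message = "Pay to 4111 1111 1111 1111 or 4111111111111111, ref 1234 5678 9012 3456"
print(find_card_numbers(message))
```

The number `4111111111111111` is a widely used test value that passes the checksum; the "ref" value matches the mask but fails the checksum and is not reported.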

For example, a mask can define a stop word such as the name of a confidential or secret order whose number starts with zero. Such a mask can take into account not only an arbitrary number, but also any letter case and even the replacement of Russian letters with Latin ones. Masks are usually written in standard regular expression ("REGEXP") notation, although different DLP systems may have their own, more flexible notations. The situation is even more difficult with phone numbers. This information is classified as personal data, and it can be written in dozens of ways, using various combinations of spaces, different types of brackets, plus and minus signs, etc. A single mask is not enough here. For example, anti-spam systems, which have to solve a similar problem, use several dozen masks at once to detect a phone number.
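The article's own mask is not reproduced in this text, so the following is only a hypothetical illustration of the idea, using English wording. The phrase "secret order", the number-sign variants and the helper names `letter_class` and `phrase_mask` are assumptions; the lookalike table is a small sample.

```python
import re

# Latin letter -> visually identical Cyrillic letter (small, incomplete sample).
LOOKALIKES = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435",
    "o": "\u043e", "p": "\u0440", "x": "\u0445",
}


def letter_class(ch: str) -> str:
    """Accept either the Latin letter or its Cyrillic twin, if it has one."""
    twin = LOOKALIKES.get(ch)
    return f"[{ch}{twin}]" if twin else re.escape(ch)


def phrase_mask(word: str) -> str:
    """Build a regex fragment for one lowercase word, tolerant of lookalikes."""
    return "".join(letter_class(ch) for ch in word)


KIND = f"(?:{phrase_mask('confidential')}|{phrase_mask('secret')})"
ORDER = phrase_mask("order")

# "<confidential|secret> order <number sign> 0..." in any letter case.
ORDER_MASK = re.compile(
    KIND + r"\s+" + ORDER + r"\s*(?:\u2116|#|no\.?)\s*0\d*",
    re.IGNORECASE,
)

print(bool(ORDER_MASK.search("SECRET Order \u2116 0457")))   # True
print(bool(ORDER_MASK.search("secret order no. 1457")))      # False: no leading zero
```

Building the character classes from a table keeps the mask readable; written out by hand, the same expression quickly becomes hard to maintain.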

Many of the codes that appear in the activities of a company and its employees are protected by law and constitute commercial secrets, banking secrets, personal data or other legally protected information, so the ability to detect them in traffic is a prerequisite for any solution.

Hash functions

Various kinds of hash functions computed over samples of confidential documents were at one time considered the latest word in the leakage protection market, although the technology itself has been around since the 1970s. In the West, this method is sometimes called "digital fingerprints", or "shingles" in scientific slang.

The essence of all these methods is the same, although vendors' specific algorithms may differ significantly. Some algorithms are even patented, which confirms the uniqueness of the implementation. The general scenario is as follows. A database of sample confidential documents is collected. A "fingerprint" is taken from each of them: the meaningful content is extracted from the document and reduced to some normal form, for example (but not necessarily) text; then hashes are computed of the whole content and of its parts, for example paragraphs, sentences, or groups of five words. The level of detail depends on the specific implementation. These fingerprints are stored in a special database.

An intercepted document is cleared of service information and brought to normal form in the same way, and shingle fingerprints are taken from it using the same algorithm. The resulting fingerprints are looked up in the database of confidential documents, and if they are found, the document is considered confidential. Since this method is used to find direct quotes from a sample document, the technology is sometimes called "anti-plagiarism".
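The scenario above can be sketched as follows. This is a simplified illustration under stated assumptions, not a vendor algorithm: normalization is reduced to lowercasing and splitting into words, the parts being hashed are overlapping five-word groups, and SHA-1 is an arbitrary choice of hash. The names `normalize`, `fingerprint` and `match_ratio` are hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> list:
    """Reduce text to a normal form: lowercase words, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def fingerprint(text: str, k: int = 5) -> set:
    """Hash every run of k consecutive words (a 'shingle')."""
    words = normalize(text)
    return {
        hashlib.sha1(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(max(len(words) - k + 1, 0))
    }


def match_ratio(intercepted: str, sample_prints: set, k: int = 5) -> float:
    """Share of the intercepted document's shingles found in the sample database."""
    prints = fingerprint(intercepted, k)
    return len(prints & sample_prints) / len(prints) if prints else 0.0


sample = "The merger with Acme Corp will be announced on the first of March at noon"
database = fingerprint(sample)

quote = "FYI: the merger with Acme Corp will be announced on the first of March, keep quiet"
unrelated = "Completely unrelated lunch menu for the cafeteria on Friday"

print(match_ratio(quote, database))      # 0.75
print(match_ratio(unrelated, database))  # 0.0
```

A real system would compare the ratio against a tuned threshold; the value at which a document counts as confidential is a policy decision and is not specified here.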

Most of the advantages of this method are at the same time its disadvantages. First of all, there is the requirement to use sample documents. On the one hand, the user does not have to think about stop words, significant terms and other matters that are entirely outside a security officer's field. On the other hand, "no sample, no protection", which raises the same problems with new and incoming documents as tag-based technologies do. A very important advantage of this technology is that it works with arbitrary sequences of characters. This means, first of all, independence from the language of the text, whether hieroglyphs or Pashto. One of the main consequences of this property is the ability to fingerprint non-textual information: databases, drawings, media files. These technologies are used by Hollywood studios and major recording studios to protect media content in their digital storage.

Unfortunately, low-level hash functions are not robust against the primitive encoding discussed in the signature example. They cope easily with reordered words, rearranged paragraphs and other "plagiarist" tricks, but replacing a letter throughout the document, for example, destroys the hash pattern, and the document becomes invisible to the interceptor.

Using this method alone makes it difficult to work with forms. An empty loan application form is a freely distributed document, while a completed one is confidential, since it contains personal data. If you simply fingerprint the blank form, an intercepted completed document will contain all the information from the blank form, i.e. the fingerprints will largely match. So the system will either let confidential information through or block the free distribution of empty forms.

Despite these disadvantages, the method is widespread, especially in businesses that cannot afford qualified staff and operate on the principle of "put all confidential information in this folder and sleep well." In this sense, requiring specific documents in order to protect them is somewhat similar to label-based solutions, except that the fingerprints are stored separately from the samples and survive a change of file format, copying of part of the file, etc. However, large businesses with hundreds of thousands of documents in circulation are often simply unable to provide samples of confidential documents, since the company's business processes do not require this. The only thing that exists (or, more honestly, should exist) in every enterprise is the "List of information constituting a commercial secret." Making samples out of it is not a trivial task.

The ease of adding samples to the database of controlled content often plays a trick on users. It leads to gradual growth of the fingerprint database, which significantly affects system performance: the more samples there are, the more comparisons each intercepted message requires. Since each fingerprint takes up 5 to 20% of the size of the original, the database grows steadily. Users notice a sharp drop in performance when the database begins to exceed the amount of RAM on the filter server. The problem is typically solved by regularly auditing the sample documents and removing obsolete or duplicate samples, i.e. what users save on implementation they lose on operation.
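The 5 to 20% figure allows a rough capacity estimate. The corpus sizes and the 64 GB of RAM in this sketch are hypothetical numbers chosen for illustration, and `fingerprint_db_gb` is a hypothetical helper.

```python
def fingerprint_db_gb(corpus_gb: float, percent: int) -> float:
    """Estimated fingerprint database size for a sample corpus of the given size."""
    return corpus_gb * percent / 100


RAM_GB = 64  # hypothetical filter server

for corpus_gb in (100, 500, 2000):
    low = fingerprint_db_gb(corpus_gb, 5)
    high = fingerprint_db_gb(corpus_gb, 20)
    fits = "fits in RAM" if high <= RAM_GB else "may exceed RAM"
    print(f"{corpus_gb} GB of samples -> {low:.0f}-{high:.0f} GB of fingerprints ({fits})")
```

Under these assumptions, a 500 GB sample corpus already produces up to 100 GB of fingerprints, more than the server's memory.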

Linguistic methods

The most common method of analysis today is linguistic analysis of text. It is so popular that it is often colloquially called "content filtering", i.e. it bears the name of the entire class of content analysis methods. In terms of classification, hash analysis, signature analysis and mask analysis are all "content filtering", i.e. filtering of traffic based on content analysis.

As the name implies, the method works only with texts. You will not use it to protect a database consisting only of numbers and dates, much less blueprints, drawings and a collection of your favorite songs. But with texts, this method works wonders.

Linguistics as a science consists of many disciplines, from morphology to semantics, so linguistic methods of analysis also differ from one another. Some methods use only stop words, entered at the root level, from which the system itself builds a complete dictionary; others assign weights to terms found in the text. Linguistic methods also have their own fingerprints based on statistics: for example, a document is taken, its fifty most used words are counted, and then the 10 most used words are selected in each paragraph. Such a "dictionary" is an almost unique characteristic of the text and makes it possible to find meaningful quotations in "clones".
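One possible reading of the statistical "dictionary" is sketched below: take the document's most frequent words, then, within each paragraph, keep the most frequent of those. The text does not specify the exact procedure, so the paragraph separator, the restriction to the document-level vocabulary and the Jaccard-style `overlap` measure are assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def words(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def paragraph_dictionaries(document: str, top_doc: int = 50, top_par: int = 10) -> list:
    """For each paragraph, the top_par most used words among the document's top_doc."""
    vocabulary = {w for w, _ in Counter(words(document)).most_common(top_doc)}
    result = []
    for paragraph in document.split("\n\n"):  # assumes blank-line paragraph breaks
        counts = Counter(w for w in words(paragraph) if w in vocabulary)
        result.append({w for w, _ in counts.most_common(top_par)})
    return result


def overlap(a: set, b: set) -> float:
    """Share of common words between two paragraph dictionaries."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc = "alpha beta beta gamma\n\ndelta delta epsilon"
dictionaries = paragraph_dictionaries(doc)
print(dictionaries)
print(overlap({"merger", "acme"}, {"acme", "lunch"}))
```

A practical implementation would probably also drop function words such as articles and prepositions before counting; that step is omitted here for brevity.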

The analysis of all the subtleties of linguistic analysis is beyond the scope of this article, so we will focus on the advantages and disadvantages.

The advantage of the method is its complete insensitivity to the number of documents, i.e. scalability that is rare in corporate information security. The content filtering base (a set of keyword classes and rules) does not grow as new documents or processes appear in the company.

In addition, users note that this method resembles "stop words" in that, when a document is held up, it is immediately clear why. If a fingerprint-based system reports that a document is similar to another, the security officer has to compare the two documents himself, whereas with linguistic analysis he receives content that is already marked up. Linguistic systems, along with signature filtering, are so common because they allow work to start immediately after installation, without changes in the company. There is no need to deal with tagging and fingerprinting, document inventories and other work that is outside a security officer's field.

The disadvantages are just as obvious, and the first is language dependence. Within any one country whose language the manufacturer supports, this is not a problem; but for global companies that, in addition to a single corporate language (for example, English), have many documents in the local languages of each country, it is a clear disadvantage.

Another drawback is the high percentage of false positives, which takes linguistic expertise to reduce (by fine-tuning the filtering base). Standard industry bases typically give 80-85% filtering accuracy. This means that every fifth or sixth message is intercepted by mistake. Tuning the base to an acceptable 95-97% accuracy usually requires the intervention of a specially trained linguist. And although two days of free time and a command of the language at the level of a high school graduate are enough to learn how to adjust the filtering base, there is no one to do this work except the security officer, and he usually regards it as outside his core duties. Bringing in someone from outside is always risky, since that person will have to work with confidential information. The usual way out is to purchase an additional module, a self-learning "autolinguist" that is "fed" false positives and automatically adapts the standard industry base.
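The "every fifth or sixth" figure follows directly from the accuracy percentages, if accuracy is taken to mean the share of messages handled correctly. That reading, and the helper name `one_in`, are assumptions made for this small calculation.

```python
def one_in(accuracy_percent: int) -> float:
    """With the given accuracy, one message in this many is handled wrongly."""
    return 100 / (100 - accuracy_percent)


for accuracy in (80, 85, 95, 97):
    print(f"{accuracy}% accuracy -> one error in about {one_in(accuracy):.1f} messages")
```

At 80-85% this gives one error in roughly 5 to 7 messages; at 95-97% it drops to one in roughly 20 to 33.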

Linguistic methods are chosen when the goal is to minimize interference with the business, and when the information security service lacks the administrative leverage to change the existing processes for creating and storing documents. They work anytime and anywhere, albeit with the disadvantages mentioned.

Popular channels of accidental leaks: mobile storage media

InfoWatch analysts believe that mobile storage media (laptops, flash drives, mobile communicators, etc.) remain the most popular channel for accidental leaks, since users of such devices often neglect data encryption tools.

Another frequent cause of accidental leaks is paper: it is harder to control than electronic media, since once a sheet leaves the printer it can only be tracked "manually". Many leak protection tools (they cannot be called full-fledged DLP systems) do not control the printing channel at all, so confidential data can easily leave the organization this way.

This problem can be solved by multifunctional DLP systems that block unauthorized information from being sent to print and check that the mailing address matches the addressee.

In addition, the growing popularity of mobile devices makes leak protection much harder, as there are no corresponding DLP clients yet. It is also very difficult to detect a leak when cryptography or steganography is used. An insider can always turn to the Internet for "best practices". That is, DLP tools protect rather poorly against organized, deliberate leaks.

The effectiveness of DLP tools is limited by an obvious flaw: modern leak protection solutions cannot monitor and block all available information channels. DLP systems control corporate mail, web use, instant messaging, external media, document printing and the contents of hard drives. But Skype remains outside the control of DLP systems. Only Trend Micro has so far claimed to control this communications program. The other developers promise that the corresponding functionality will appear in the next version of their security software.

But while Skype promises to open its protocols to DLP developers, other solutions, such as Microsoft Collaboration Tools for collaborative work, remain closed to third-party programmers. How can the transmission of information through this channel be controlled? Meanwhile, in the modern world it is increasingly common for specialists to form remote teams to work on a common project and to disband once it is completed.

In the first half of 2010, the main sources of confidential information leaks were still commercial (73.8%) and government (16%) organizations. About 8% of leaks originated from educational institutions. The leaked confidential information was mostly personal data (almost 90% of all leaks).

The world leaders in leaks are traditionally the United States and the United Kingdom (Canada, Russia and Germany are also among the top five countries by number of leaks, with significantly lower figures). This is explained by the legislation of these countries, which requires all incidents of confidential data leakage to be reported. InfoWatch analysts predict a decrease in the share of accidental leaks and an increase in the share of intentional leaks next year.

Difficulties of implementation

In addition to the obvious difficulties, DLP implementation is hampered by the complexity of choosing an appropriate solution, since different DLP vendors take their own approaches to organizing protection. Some have patented algorithms for analyzing content by keywords, while others offer digital fingerprinting. How do you choose the optimal product under these conditions? Which is more effective? These questions are very hard to answer, since there are still very few DLP implementations and even fewer real operating practices to rely on. But the projects that have been carried out showed that consulting accounts for more than half of the work and the budget, and this usually causes great skepticism among management. In addition, DLP requirements usually make it necessary to rebuild the enterprise's existing business processes, which companies find difficult.

How does DLP implementation help you comply with current regulatory requirements? In the West, the implementation of DLP systems is motivated by laws, standards, industry requirements and other regulations. According to experts, the clear legal requirements and methodological guidelines available abroad are the real engine of the DLP market, since deploying specialized solutions rules out claims from regulators. Our situation in this area is completely different, and the introduction of DLP systems does not help to comply with the legislation.

The need to protect the commercial secrets of companies and comply with the requirements of the federal law "On Commercial Secrets" may become a kind of incentive for the introduction and use of DLP in a corporate environment.

Almost every enterprise has adopted documents such as the "Regulation on trade secrets" and the "List of information constituting a trade secret", and their requirements should be followed. There is an opinion that the law "On Commercial Secrets" (98-FZ) does not work; nevertheless, company managers are well aware that protecting their trade secrets is important and necessary. Moreover, this awareness is much stronger than the understanding of the importance of the law "On Personal Data" (152-FZ), and it is much easier to explain to any manager the need for confidential document flow than to talk about the protection of personal data.

What prevents the use of DLP in automating the protection of trade secrets? According to the Civil Code of the Russian Federation, to introduce a regime for protecting commercial secrets it is only necessary that the information have some value and be included in the corresponding list. In that case, the owner of such information is legally obliged to take measures to protect it.

At the same time, DLP obviously cannot resolve every issue; in particular, it cannot block third parties' access to confidential information. But other technologies exist for this, and many modern DLP solutions can integrate with them. By building such a technological chain, a working system for protecting trade secrets can be obtained. Such a system will be easier for the business to understand, and it is the business that can then act as the customer for the leak protection system.

Russia and the West

According to analysts, Russia has a different attitude towards security and a different level of maturity among the companies that supply DLP solutions. The Russian market focuses on security professionals and highly specialized issues. The people involved in data loss prevention do not always understand which data is valuable. Russia takes a "militaristic" approach to organizing security systems: a strong perimeter with firewalls, with every effort made to prevent penetration from outside.

But what if an employee of the company has access to more information than he needs to perform his duties? By contrast, the approach that has formed in the West over the past 10-15 years pays more attention to the value of information. Resources are directed to where the valuable information is, not to all information indiscriminately. This is perhaps the biggest cultural difference between the West and Russia. However, analysts say the situation is changing. Information is beginning to be perceived as a business asset, though this evolution will take some time.

There is no comprehensive solution

No manufacturer has yet developed 100% leak protection. Some experts describe the problems with using DLP products roughly as follows: effective use of a DLP system requires understanding that much of the work of ensuring leak protection must be done on the customer's side, since no one knows the customer's information flows better than the customer.

Others believe that it is impossible to prevent information leaks altogether: since the information is valuable to someone, it will be obtained sooner or later. Software can make obtaining this information more costly and time-consuming, which can significantly reduce the benefit of possessing the information and its relevance. This means that the effectiveness of DLP systems should be monitored.

28.01.2014 Sergey Korablev

Choosing any enterprise-grade product is not a trivial task for technical specialists and decision-makers. Choosing a Data Leak Protection (DLP) system is even trickier. The lack of a common conceptual framework and of regular independent comparative studies, together with the complexity of the products themselves, forces consumers to order pilot projects from manufacturers and to conduct numerous tests on their own, determining the range of their needs and matching them against the capabilities of the systems under test.

This approach is certainly correct. A balanced, and in some cases even hard-won, decision simplifies further implementation and avoids disappointment when the product is in use. However, the decision-making process in this case can drag on, if not for years, then for many months. In addition, the constant expansion of the market and the emergence of new solutions and manufacturers further complicate not only the choice of a product for implementation, but also the creation of a preliminary shortlist of suitable DLP systems. Under these circumstances, up-to-date reviews of DLP systems are of undoubted practical value to technical specialists. Should a particular solution be included in the test list, or would it be too complex to implement in a small organization? Can the solution be scaled to a company of 10 thousand employees? Can the DLP system handle business-critical CAD files? An open comparison will not replace rigorous testing, but it will help answer the basic questions that arise at the initial stage of DLP selection.

Participants

The participants were the DLP systems most popular on the Russian information security market (according to the Anti-Malware.ru analytical center as of mid-2013): those from InfoWatch, McAfee, Symantec, Websense, Zecurion and Jet Infosystems.

For the analysis, we used the commercially available versions of DLP systems at the time of the review, as well as documentation and open product reviews.

The criteria for comparing DLP systems were selected based on the needs of companies of various sizes and different industries. The main task of DLP systems is to prevent leakage of confidential information through various channels.

Examples of products from these companies are shown in Figures 1-6.


Figure 3. Symantec product

Figure 4. InfoWatch product

Figure 5. Websense product

Figure 6. McAfee product

Operating modes

There are two main modes of DLP system operation: active and passive. Active mode is usually the main mode of operation; in it, actions that violate security policies are blocked, for example, sending confidential information to an external mailbox. Passive mode is most often used during the system configuration phase to check and adjust settings while the rate of false positives is still high. In this mode, policy violations are recorded, but no restrictions are imposed on the movement of information (Table 1).


In this aspect, all the considered systems turned out to be equivalent. Each of the DLPs can work in both active and passive modes, which gives the customer a certain freedom. Not all companies are ready to start operating DLP immediately in blocking mode - this is fraught with disruption of business processes, discontent from employees of controlled departments and claims (including justified ones) from management.
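The difference between the two modes can be illustrated with a minimal Python sketch. The names `Mode`, `Decision` and `handle_transfer` are hypothetical and do not come from any of the reviewed products; the sketch only assumes that a violation is always recorded and that blocking happens in active mode alone.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"    # violations are blocked and recorded
    PASSIVE = "passive"  # violations are only recorded


@dataclass
class Decision:
    allowed: bool  # may the transfer proceed?
    logged: bool   # was a policy violation recorded?


def handle_transfer(violates_policy: bool, mode: Mode) -> Decision:
    """Decide what happens to a single outgoing transfer."""
    if not violates_policy:
        return Decision(allowed=True, logged=False)
    # A violation is recorded in both modes; only active mode blocks it.
    return Decision(allowed=(mode is Mode.PASSIVE), logged=True)
```

Switching a deployment from passive to active mode then changes only whether a violating transfer is allowed through, not what gets recorded.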

Technology

Detection technologies make it possible to classify information that is transmitted through electronic channels and to reveal confidential information. Today there are several basic technologies and their varieties, which are similar in essence, but different in implementation. Each of the technologies has both advantages and disadvantages. In addition, different types of technologies are suitable for analyzing information of different classes. Therefore, manufacturers of DLP solutions are trying to integrate the maximum number of technologies into their products (see table 2).

In general, the products offer a large number of technologies that, if properly configured, ensure a high rate of confidential information recognition. The DLP products from McAfee, Symantec and Websense are rather poorly adapted to the Russian market and cannot offer users support for "language" technologies: morphology, transliteration and masked text analysis.

Controlled channels

Each data transmission channel is a potential leakage channel. Even one open channel can negate all efforts of the information security service that controls information flows. That is why it is so important to block channels that are not used by employees for work, and control the remaining ones using leak prevention systems.

Despite the fact that the best modern DLP systems are able to control a large number of network channels (see Table 3), it is advisable to block unnecessary channels. For example, if an employee works on a computer only with an internal database, it makes sense to disable his Internet access altogether.

Similar conclusions are valid for local leakage channels. In this case, however, it can be more difficult to block individual channels, since the same ports are often used to connect peripherals, I/O devices, etc.

Encryption plays a special role in preventing leaks through local ports, mobile drives and devices. Encryption tools are simple enough to operate, and their use can be transparent to the user. But at the same time, encryption eliminates a whole class of leaks associated with unauthorized access to information and loss of mobile drives.

The situation with the control of local channels is generally worse than with network channels (see Table 4). Only USB devices and local printers are successfully monitored by all products. Also, despite the importance of encryption noted above, this feature is present only in certain products, and forced encryption based on content analysis is available only in Zecurion DLP.

To prevent leaks, it is important not only to recognize confidential data during transmission, but also to limit the dissemination of information in the corporate environment. To do this, manufacturers include tools in DLP systems that can identify and classify information stored on servers and workstations in the network (see Table 5). Data that violates information security policies must be deleted or moved to a secure storage.

To identify confidential information at the nodes of the corporate network, the same technologies are used as for monitoring leaks via electronic channels. The main difference is architectural. If network traffic or file operations are analyzed to prevent leakage, then stored information is examined to detect unauthorized copies of confidential data - the contents of workstations and network servers.

Of the DLP systems under consideration, only InfoWatch and Dozor-Jet lack tools for identifying where information is stored. This is not a critical feature for preventing electronic leakage, but its absence severely limits the ability of a DLP system to prevent leaks proactively. For example, a confidential document located within the corporate network is not in itself an information leak. However, if the storage location of this document is not regulated, and the owners of the information and the security officers do not know where it is, this can lead to a leak: unauthorized access to the information becomes possible, or the corresponding security rules are not applied to the document.

Ease of management

Features such as usability and manageability can be just as important as the technical capabilities of a solution. After all, a really complex product will be difficult to implement, and the project will take more time, effort and, accordingly, money. An already implemented DLP system also requires attention from technical specialists. Without proper maintenance, regular audits and adjustments to the settings, the quality of confidential information recognition will decrease dramatically over time.

A control interface in the security officer's native language is the first step toward simplifying work with a DLP system. It not only makes it easier to understand what a given setting is responsible for, but also significantly speeds up the configuration of the large number of parameters that must be set for the system to work correctly. An English interface can be useful even for Russian-speaking administrators for an unambiguous interpretation of specific technical concepts (see Table 6).

Most solutions provide quite convenient management from a single console (for all components) with a web interface (see Table 7). The only exceptions are the Russian products InfoWatch (no single console) and Zecurion (no web interface). At the same time, both manufacturers have already announced a web console for their future products. The lack of a single console at InfoWatch is due to the different technological bases of its products. Development of the company's own agent solution was suspended for several years, and the current EndPoint Security is the successor to the third-party product EgoSecure (formerly known as cynapspro), acquired by the company in 2012.

Another point that can be counted among the disadvantages of the InfoWatch solution is that configuring and managing the flagship DLP product, InfoWatch TrafficMonitor, requires knowledge of the Lua scripting language, which complicates operation of the system. Nevertheless, most technical specialists should see the prospect of raising their professional level and learning an additional, albeit not very popular, language as a positive.

Separation of system administrator roles is necessary to minimize the risk of a superuser with unlimited rights emerging and of other fraud involving the DLP system.

Logging and reporting

The DLP archive is a database that accumulates and stores events and objects (files, letters, http-requests, etc.) recorded by the system's sensors during its operation. The information collected in the database can be used for various purposes, including for analyzing user actions, for saving copies of critical documents, as a basis for investigating information security incidents. In addition, the database of all events is extremely useful at the stage of implementing a DLP system, since it helps to analyze the behavior of the components of a DLP system (for example, to find out why certain operations are blocked) and to adjust security settings (see Table 8).


Here we see a fundamental architectural difference between Russian and Western DLP systems. The latter do not keep an archive at all. This makes the DLP system itself simpler to maintain (there is no need to maintain, store, back up and study a huge amount of data), but not simpler to operate. After all, the event archive helps to configure the system: it helps to understand why an information transfer was blocked, to check whether a rule worked correctly, and to make the necessary corrections to the system settings. It should also be noted that DLP systems need not only initial setup during implementation, but also regular "tuning" during operation. A system that is not properly maintained and not fine-tuned by technical specialists will lose a great deal in the quality of information recognition. As a result, both the number of incidents and the number of false positives will increase.
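As an illustration of how an event archive supports tuning, the following Python sketch stores events in SQLite and counts blocked transfers per rule. The table layout, the column names and the helper functions are assumptions made for this example, not the schema of any reviewed product.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a hypothetical DLP event archive."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY,"
        " ts TEXT NOT NULL,"       # ISO 8601 timestamp
        " sender TEXT NOT NULL,"   # who initiated the transfer
        " channel TEXT NOT NULL,"  # e.g. 'email', 'usb', 'print'
        " rule TEXT,"              # policy rule that fired, if any
        " action TEXT NOT NULL)"   # 'allowed' or 'blocked'
    )
    return conn


def record_event(conn, ts, sender, channel, rule, action):
    """Store one intercepted event."""
    conn.execute(
        "INSERT INTO events (ts, sender, channel, rule, action)"
        " VALUES (?, ?, ?, ?, ?)",
        (ts, sender, channel, rule, action),
    )
    conn.commit()


def blocked_by_rule(conn):
    """Return (rule, count) pairs for blocked events, most frequent first."""
    return conn.execute(
        "SELECT rule, COUNT(*) FROM events WHERE action = 'blocked'"
        " GROUP BY rule ORDER BY COUNT(*) DESC, rule"
    ).fetchall()
```

A rule that accounts for an unexpectedly large share of blocked events is a natural first candidate for review as a source of false positives.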

Reporting is an important part of any activity, and information security is no exception. Reports in DLP systems perform several functions at once. First, concise and understandable reports allow managers of information security services to monitor the state of information security quickly, without going into details. Second, detailed reports help security officers adjust security policies and system settings. Third, visual reports can always be shown to the company's top managers to demonstrate the results of the DLP system and of the information security specialists themselves (see Table 9).

Almost all of the competing solutions covered in this review offer both graphical reports, which are convenient for top managers and information security managers, and tabular reports, which are more suitable for technical specialists. Graphical reports are missing only in InfoWatch DLP, for which its rating was lowered.

Certification

The question of the need for certification for information security tools and DLP in particular is open, and experts often argue on this topic within professional communities. Summarizing the views of the parties, it should be recognized that certification itself does not provide serious competitive advantages. At the same time, there are a number of customers, primarily government organizations, for which the presence of one or another certificate is mandatory.

In addition, the current certification process does not fit well with the software development cycle. As a result, consumers are faced with a choice: buy an outdated, but certified version of the product, or an up-to-date, but not certified. The standard way out in this situation is to purchase a certified product “on the shelf” and use the new product in a real environment (see Table 10).

Comparison results

Let's summarize the impressions of the considered DLP solutions. In general, all participants made a favorable impression and can be used to prevent information leaks. The product differences allow us to specify the area of their application.

The InfoWatch DLP system can be recommended to organizations for which having an FSTEC certificate is essential. However, the last certified version of InfoWatch Traffic Monitor was tested at the end of 2010, and the certificate expires at the end of 2013. Agent solutions based on InfoWatch EndPoint Security (also known as EgoSecure) are more suitable for small businesses and can be used separately from Traffic Monitor. Using Traffic Monitor and EndPoint Security together can cause scaling problems in large companies.

Products of Western manufacturers (McAfee, Symantec, Websense) are, according to independent analytical agencies, much less popular than Russian ones. The reason is the low level of localization, and it is not even a matter of interface complexity or the lack of documentation in Russian. The technologies for recognizing confidential information and the pre-configured templates and rules are tailored to the use of DLP in Western countries and are aimed at meeting Western regulatory requirements. As a result, the quality of information recognition in Russia turns out to be noticeably worse, and compliance with foreign standards is often irrelevant. The products themselves are not bad at all, but the specifics of using DLP systems in the Russian market will hardly allow them to become more popular than domestic developments in the foreseeable future.

Zecurion DLP is distinguished by good scalability (it is the only Russian DLP system with a confirmed implementation covering more than 10 thousand workplaces) and high technological maturity. What is surprising, however, is the lack of a web console, which would simplify the management of an enterprise solution that targets different market segments. Zecurion DLP's strengths include high-quality confidential information recognition and a complete line of leak prevention products, including gateway, workstation and server protection, storage location identification, and data encryption tools.

The Dozor-Jet DLP system, one of the pioneers of the domestic DLP market, is widespread among Russian companies and continues to grow its client base thanks to the extensive connections of Jet Infosystems, which is both a system integrator and the DLP developer. Although technologically this DLP system lags behind its more powerful counterparts, its use can be justified in many companies. In addition, unlike foreign solutions, Dozor-Jet allows keeping an archive of all events and files.


The rapid development of information technology contributes to the global informatization of modern companies and enterprises. The amount of information transmitted through the corporate networks of large corporations and small companies grows rapidly every day. There is no doubt that as information flows grow, so do the threats that can lead to the loss, distortion or theft of important information. It turns out that losing information is much easier than losing any material thing. Nobody even has to take deliberate action to obtain the data: sometimes sloppy behavior when working with information systems, or the inexperience of users, is enough.

A natural question arises: how can you protect yourself so as to exclude the loss and leakage of important information? It turns out that this problem can be solved, and solved at a high professional level. Special DLP systems exist for this purpose.

Definition of DLP systems

DLP is a system for preventing data leaks in the information environment. It is a special tool with which administrators of corporate networks can monitor and block attempts at unauthorized transmission of information. Besides preventing the illegal acquisition of information, such a system also makes it possible to track the actions of all network users related to the use of social networks, chats, e-mail, etc. The main goal of DLP systems is to support and enforce all the requirements of the confidentiality and information security policy that exists in a particular organization, company or enterprise.

Application area

Practical application of DLP systems is most relevant for those organizations where leakage of confidential data can entail huge financial losses, significant damage to reputation, as well as loss of customer base and personal information. The presence of such systems is mandatory for those companies and organizations that set high requirements for the "information hygiene" of their employees.

DLP systems are the best tool for protecting data such as customers' bank card numbers, their bank accounts, information on the terms of tenders, and orders for works and services; the economic efficiency of such a security solution is quite obvious.

Types of DLP systems

The tools used to prevent information leaks can be divided into several key categories:

  1. standard security tools;
  2. intelligent data protection measures;
  3. data encryption and access control;
  4. specialized DLP security systems.

The standard security suite that every company should use consists of antivirus programs, built-in firewalls and intrusion detection systems.

Intelligent information security tools rely on special services and modern algorithms that make it possible to detect illegal access to data, incorrect use of e-mail, etc. In addition, such tools can analyze requests to the information system that come from outside, from various programs and services that may act as a kind of spyware. Intelligent security tools allow a deeper and more detailed check of the information system for possible leakage of information in various ways.

Encrypting sensitive information and restricting access to certain data is another effective step toward minimizing the likelihood of losing confidential information.

A specialized DLP system is a complex multifunctional tool that can identify and prevent unauthorized copying and transmission of important information outside the corporate environment. These solutions make it possible to detect access to information either without permission or through misuse of the credentials of persons who do have such permission.

Specialized systems use for their work such tools as:

  • mechanisms for determining the exact match of data;
  • various statistical methods of analysis;
  • use of techniques of code phrases and words;
  • structured fingerprinting, etc.

Comparison of these systems by functionality

Let's compare two types of DLP systems: Network DLP and Endpoint DLP.

Network DLP is a hardware or software solution deployed at those points in the network structure that are located near the "perimeter of the information environment". This set of tools thoroughly analyzes confidential information that someone attempts to send outside the corporate information environment in violation of the established information security rules.

Endpoint DLP systems are installed on end users' workstations, as well as on the servers of small organizations. The endpoint can be used to control both the internal and the external side of the "perimeter of the information environment". The system analyzes the traffic through which data is exchanged both between individual users and between groups of users. Protection of this type is focused on a comprehensive check of the data exchange process, including e-mail, communication in social networks and other information activity.

Should these systems be implemented in enterprises?

Implementation of DLP systems is a must for all companies that value their information and try to do everything possible to prevent its leakage and loss. Such security tools allow companies to prevent the spread of sensitive data outside the corporate information environment across all available data exchange channels. By installing a DLP system, the company will be able to control:

  • sending messages using corporate Web-mail;
  • using FTP connections;
  • local connections using wireless technologies such as Wi-Fi, Bluetooth, GPRS;
  • instant messaging using clients such as MSN, ICQ, AOL, etc.;
  • the use of external drives: USB, SSD, CD/DVD, etc.;
  • documentation that is sent to print using corporate printing devices.

Unlike with standard security solutions, a company that has installed the SecureTower DLP system or a similar one will be able to:

  • control all types of channels for the exchange of important information;
  • detect the transfer of confidential information, regardless of how and in what format it is transferred outside the corporate network;
  • block information leakage at any time;
  • automate the process of data processing in accordance with the security policy adopted at the enterprise.

The use of DLP systems will guarantee enterprises effective development and preservation of their trade secrets from competitors and ill-wishers.

How is implementation going?

To install a DLP system in your company in 2017, you must go through several stages, after the implementation of which the company will receive effective protection of its information environment from external and internal threats.

At the first stage of implementation, a survey of the information environment of the enterprise is carried out, which includes the following actions:

  • study of organizational and administrative documentation that regulates the information policy at the enterprise;
  • study of the information resources that are used by the company and its employees;
  • agreeing on the list of information that may be classified as restricted data;
  • survey of existing methods and channels for transmitting and receiving data.

Based on the results of the survey, a technical task is drawn up, which will describe the security policies that will need to be implemented using the DLP system.

The next step is to settle the legal side of using DLP systems in the enterprise. It is important to eliminate all ambiguous points so that employees cannot later file lawsuits claiming that the company is monitoring them.

Having settled all the legal formalities, you can start choosing an information security product. This can be, for example, the InfoWatch DLP system or any other with similar functionality.

After choosing a suitable system, you can start installing and configuring it for productive work. The system should be configured so as to fulfill all the security tasks stipulated in the terms of reference.

Conclusion

Implementation of DLP systems is a rather complicated and painstaking task that requires a lot of time and resources. But do not stop halfway - it is important to go through all the stages to the fullest and get a highly effective and multifunctional system for protecting your confidential information. After all, the loss of data can result in enormous damage to an enterprise or company, both financially and in terms of its image and reputation in the consumer environment.

A DLP system is used when it is necessary to protect confidential data from internal threats. While information security specialists have largely mastered and apply tools for protection against external intruders, the situation with internal ones is not so smooth.

The use of a DLP system in the information security structure assumes that the information security specialist understands:

  • how company employees can organize confidential data leakage;
  • what information should be protected from the threat of a breach of confidentiality.

Comprehensive knowledge will help a specialist to better understand the principles of DLP technology and configure protection against leaks in the correct way.

The DLP system must be able to distinguish confidential information from non-confidential information. If you analyze all the data within the organization's information system, there is a problem of excessive load on IT resources and personnel. DLP works mainly in conjunction with a responsible specialist who not only teaches the system to work correctly, introduces new and removes irrelevant rules, but also monitors current, blocked or suspicious events in the information system.

SearchInform KIB is configured using security policies: rules for responding to information security incidents. The system has 250 pre-installed policies that can be adjusted to suit the company's objectives.

The functionality of a DLP system is built around a "core" - a software algorithm that is responsible for detecting and categorizing information that needs to be protected from leaks. At the core of most DLP solutions are two technologies: linguistic analysis and technology based on statistical methods. Also, the kernel can use less common techniques such as labeling or formal analysis methods.

Leak mitigation systems developers complement the unique software algorithm with system agents, incident management mechanisms, parsers, protocol analyzers, interceptors and other tools.

Early DLP systems relied on a single method at the core: either linguistic or statistical analysis. In practice, the weaknesses of each technology were offset by the strengths of the other, and the evolution of DLP has led to systems that are universal in terms of the "core".

Linguistic analysis method works directly with the content of the file and document. This allows you to ignore such parameters as the file name, the presence or absence of the stamp in the document, who created the document and when. Linguistic analytics technology includes:

  • morphological analysis - search for all possible word forms of information that needs to be protected from leakage;
  • semantic analysis - search for occurrences of important (key) information in the content of the file, the impact of occurrences on the quality characteristics of the file, assessment of the context of use.
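The idea behind searching for word forms can be hinted at with a deliberately crude Python sketch. It uses naive suffix stripping as a stand-in for real morphological analysis, which in practice relies on dictionaries and language-specific rules; the suffix list and the function names are illustrative assumptions only.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # toy, English-only suffix list


def stem(word):
    """Reduce a word to a rough stem by stripping one common suffix."""
    w = word.lower().strip(".,;:!?\"'()")
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[:-len(suffix)]
    return w


def mentions(text, protected_terms):
    """Return the stems of protected terms that occur in the text."""
    protected = {stem(term) for term in protected_terms}
    return sorted({stem(word) for word in text.split()} & protected)
```

With this sketch, the term "contract" is found in a text that contains only the form "contracts", which is the effect that morphological analysis is meant to achieve with far greater accuracy.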

Linguistic analysis shows the high quality of work with a large amount of information. For voluminous text, a DLP system with a linguistic analysis algorithm will more accurately select the correct class, assign it to the desired category, and launch the configured rule. For small documents, it is better to use the stop word technique, which has proven effective in the fight against spam.

Learnability in systems with a linguistic analysis algorithm is implemented at a high level. Early DLP complexes had difficulties with assigning categories and with other stages of "learning", but modern systems have well-established self-learning algorithms: they identify the features of categories and can form and change response rules on their own. Configuring such data protection software in an information system no longer requires involving linguists.

The disadvantages of linguistic analysis include its binding to a specific language: a DLP system with an "English" core cannot be used to analyze Russian-language information flows, and vice versa. Another drawback is the difficulty of clear categorization with a probabilistic approach, which keeps the response accuracy at around 95%, while the leakage of any amount of confidential information can be critical for a company.

Statistical analysis methods, by contrast, demonstrate accuracy close to 100 percent. The drawback of a statistical core stems from the analysis algorithm itself.

At the first stage, the document (text) is divided into fragments of an acceptable size (not character by character, but small enough to ensure the accuracy of the response). A hash is computed from each fragment (in DLP systems this is known as a Digital Fingerprint). The hash is then compared with the hash of a reference fragment taken from the protected document. If there is a match, the system marks the document as confidential and acts in accordance with the security policies.
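The fingerprinting steps described above can be sketched in Python as follows. The choice of overlapping five-word fragments, the SHA-256 hash and the function names are assumptions made for this illustration; real products choose their own fragment size and hash function.

```python
import hashlib


def fingerprints(text, size=5):
    """Hash every run of `size` consecutive words in the text."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


def is_confidential(candidate, reference, size=5, threshold=1):
    """Flag the candidate if enough of its fragment hashes match the reference set."""
    return len(fingerprints(candidate, size) & reference) >= threshold
```

A reference set is built once from the protected document with `fingerprints(...)`, and every outgoing text is then checked with `is_confidential(...)`. A smaller fragment size catches shorter quotations but produces more accidental matches.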

The disadvantage of the statistical method is that the algorithm cannot learn on its own or form categories and types. As a result, it depends on the competence of the specialist, and there is a risk of setting the hash to a size at which the analysis produces an excessive number of false positives. The flaw is not difficult to eliminate by following the developer's recommendations for configuring the system.

There is another drawback associated with the formation of hashes. In developed IT systems that generate large amounts of data, the database of fingerprints can reach such a size that checking traffic for matches with a reference will seriously slow down the entire information system.

The advantage of such solutions is that the performance of statistical analysis does not depend on the language or on the presence of non-textual information in the document. A hash can be computed equally well from an English phrase, an image or a video clip.

Linguistic and statistical methods are not suitable for detecting data of a certain format for any document, for example, account numbers or passport numbers. To identify such typical structures in the array of information, technologies for analyzing formal structures are introduced into the core of a DLP system.
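Formal structure analysis can be illustrated with a short Python sketch that looks for 16-digit payment card numbers. The regular expression and the function names are assumptions for this example; the Luhn checksum is the standard check used to discard random digit sequences that only look like card numbers.

```python
import re

# Sixteen digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return the card-like numbers in the text that pass the Luhn check."""
    found = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_ok(digits):
            found.append(digits)
    return found
```

The same pattern-plus-checksum approach applies to other fixed-format identifiers, provided the format has a documented validation rule.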

A quality DLP solution uses all of these analysis tools, which work in sequence and complement each other.

It is possible to determine what technologies are present in the kernel.

Just as important as the functionality of the core are the control levels at which the DLP system operates. There are two of them: the network level and the host level.

Developers of modern DLP products have abandoned the separate implementation of layer protection, since both end devices and the network must be protected from leakage.

At the same time, the network control layer should provide the maximum possible coverage of network protocols and services. This covers not only "traditional" channels (such as FTP), but also newer network exchange systems (such as instant messengers). Unfortunately, encrypted traffic cannot be controlled at the network level, but in DLP systems this problem is solved at the host level.

Host-level control makes it possible to solve more monitoring and analysis tasks. In fact, the information security service receives a tool for complete control over user actions on the workstation. A DLP system with a host architecture makes it possible to track which documents a user works with, what is typed on the keyboard, and audio recordings. At the level of the end workstation, encrypted traffic is intercepted, and both the data being processed at the moment and the data stored on the user's PC for a long time are open for inspection.

Besides solving routine tasks, DLP systems with host-level control provide additional information security measures: control of software installation and changes, blocking of I/O ports, etc.

The disadvantages of the host implementation are that systems with an extensive set of functions are more difficult to administer and are more demanding on the resources of the workstation itself. The management server regularly contacts the "agent" module on the end device to check its availability and whether its settings are up to date. In addition, some of the resources of the user's workstation will inevitably be "eaten up" by the DLP module. Therefore, even at the stage of selecting a leak prevention solution, it is important to pay attention to the hardware requirements.

The principle of separating technologies in DLP systems is a thing of the past. Modern leak prevention software uses methods that compensate for each other's shortcomings. An integrated approach makes sensitive data within the information security perimeter more resilient to threats.

The choice of a specific DLP system depends on the required level of data security and is always made individually.

What is a DLP system

A DLP system (Data Leak Prevention) is a set of technologies and technical devices that prevent the leakage of confidential information from information systems.

DLP systems analyze data streams and control their movement within the protected perimeter of an information system. These streams can be FTP connections, corporate mail and webmail, local connections, instant messages, and data sent to a printer. When confidential information is detected in a stream, a system component is activated that blocks the transmission of that stream.
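As a rough illustration of content inspection on a data stream, the sketch below scans a text payload against a small set of patterns and returns a verdict. The two patterns (the word "confidential" and a 16-digit number with optional separators) are assumptions chosen for the example; production systems rely on far richer analysis than regular expressions alone.

```python
import re

# Hypothetical detection rules, for illustration only.
SENSITIVE_PATTERNS = [
    re.compile(r"\bconfidential\b", re.IGNORECASE),
    re.compile(r"\b(?:\d[ -]?){15}\d\b"),  # 16 digits, optional separators
]


def inspect(payload: str) -> str:
    """Return "block" if any sensitive pattern occurs in the payload."""
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(payload):
            return "block"
    return "allow"


if __name__ == "__main__":
    for text in ("Lunch at noon?", "Draft marked CONFIDENTIAL"):
        print(repr(text), "->", inspect(text))
```

The same function could be called for each monitored channel, since the verdict depends only on the content and not on how it is being sent.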

In other words, DLP systems guard confidential and strategically important documents whose leakage from information systems can cause irreparable damage to the company and also violate Federal Laws No. 98-FZ "On commercial secrets" and No. 152-FZ "On personal data". Protection of information against leakage is also mentioned in GOST R ISO/IEC 17799-2005, "Information technology. Practical rules for information security management".

As a rule, confidential information leaks after hacking and penetration, as a result of carelessness or negligence by company employees, or through the efforts of insiders, that is, the deliberate transfer of confidential information by employees. DLP systems are therefore among the most reliable technologies for protection against such leaks: they detect protected information by content, regardless of the document's language, signature, transmission channel, or format.

A DLP system also controls all channels used daily to transmit information in electronic form. Information streams are processed automatically according to the established security policy. If an action involving confidential information conflicts with the company's security policy, the data transfer is blocked. At the same time, the company's authorized representative responsible for information security receives an instant message warning of the attempt to transfer confidential information.
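The block-and-notify behavior can be sketched as follows. This is a simplified model under stated assumptions: the `Policy` and `Enforcer` classes, the channel names, and the in-memory `alerts` list (standing in for the instant message to the security officer) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Policy:
    """Channels over which confidential data may not leave (assumed model)."""
    forbidden_channels: frozenset


@dataclass
class Enforcer:
    policy: Policy
    # Stand-in for instant messages sent to the security officer.
    alerts: List[str] = field(default_factory=list)

    def handle(self, user: str, channel: str, is_confidential: bool) -> bool:
        """Return True if the transfer may proceed, False if it is blocked."""
        if is_confidential and channel in self.policy.forbidden_channels:
            self.alerts.append(f"blocked: {user} via {channel}")
            return False
        return True


if __name__ == "__main__":
    enforcer = Enforcer(Policy(frozenset({"webmail", "usb"})))
    print(enforcer.handle("alice", "webmail", is_confidential=True))
    print(enforcer.alerts)
```

Keeping the alert alongside the blocking decision means that every blocked transfer leaves a record for later review.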

Implementing a DLP system, first of all, helps ensure compliance with a number of PCI DSS requirements concerning the enterprise's level of information security. DLP systems also automatically audit protected information by its location and provide automated control in accordance with the company's rules for moving confidential information, processing and preventing incidents of unlawful disclosure of classified information. Based on incident reports, the data loss prevention system monitors the overall level of risk and controls information leakage in both retrospective analysis and immediate response modes.
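One simple way to turn incident reports into an overall risk figure is a weighted count by severity, sketched below. The severity labels and weights are assumptions made for the example and are not taken from any standard or product.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical weights: one high-severity incident outweighs several low ones.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 10}


def risk_score(incidents: Iterable[Tuple[str, str]]) -> int:
    """Sum severity weights over (channel, severity) incident records."""
    counts = Counter(severity for _channel, severity in incidents)
    return sum(SEVERITY_WEIGHT[sev] * n for sev, n in counts.items())


if __name__ == "__main__":
    log = [("email", "low"), ("usb", "high"), ("email", "low")]
    print(risk_score(log))
```

Recomputing the score over a past time window corresponds to retrospective analysis, while updating it as each incident arrives corresponds to immediate response.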

DLP systems are installed in both small and large enterprises. By preventing information leakage, they protect the company from the financial and legal risks that arise from the loss or transfer of important corporate or confidential information.