How storage systems work. Data storage systems (DSS). Comparison of storage connection protocols

As you know, the amount of accumulated information and data has been growing rapidly in recent years. Research by IDC Digital Universe projects that the global volume of digital information will grow from 4.4 zettabytes to 44 zettabytes by 2020. According to experts, the volume of digital information doubles every two years. Today, therefore, not only processing information but also storing it is an extremely urgent problem.

To address this problem, storage systems and storage networks are currently an area of very active development. Let's try to figure out what exactly the modern IT industry means by the term "data storage system".

A data storage system is an integrated software and hardware solution aimed at organizing reliable, high-quality storage of various information resources, as well as providing uninterrupted access to those resources.

Building such a complex helps solve a variety of problems that modern businesses face in the course of constructing an integral information system.

The main components of the storage system:

Storage devices (tape library, internal or external disk array);

Monitoring and control system;

Subsystem of data backup / archiving;

Storage management software;

Infrastructure for accessing all storage devices.

Main tasks

Let's consider the most typical tasks:

Decentralization of information. Some organizations have a developed branch structure, and each separate unit of such an organization should have free access to all the information it needs for its work. Modern storage systems can interact with users located at a great distance from the center where data processing is performed, and are therefore able to solve this problem.

Difficulty in foreseeing the final required resources. When planning a project, it can be extremely difficult to determine how much information the system will have to handle during operation. In addition, the mass of accumulated data is constantly increasing. Most modern storage systems support scalability (the ability to increase capacity and performance by adding resources), so the capacity of the system can be increased in proportion to growing loads (an upgrade).

Security of all stored information. It can be quite difficult to control and restrict access to the information resources of an enterprise. Unskilled actions by service personnel and users, as well as deliberate attempts at sabotage, can cause significant harm to stored data. Modern storage systems use various fault tolerance schemes to resist both deliberate sabotage and the inept actions of unqualified employees, thereby preserving the system's operability.

The complexity of managing distributed information flows. Any action aimed at changing distributed data in one of the branches inevitably creates a number of problems, from the difficulty of synchronizing different databases and versions of developer files to unnecessary duplication of information. The management software that ships with a storage system helps reduce this complexity and improve the efficiency of managing stored information.

High costs. According to a study by IDC Perspectives, storage accounts for about twenty-three percent of all IT costs. These costs include the software and hardware parts of the complex, payments to service personnel, and so on. Using storage systems allows savings on system administration and also reduces personnel costs.


The main types of storage systems

All data storage systems are divided into two types: tape and disk storage systems. Each of these two types is divided, in turn, into several subtypes.

Disk storage systems

Such data storage systems are used to create intermediate backup copies, as well as for operational work with various data.

Disk storage systems are divided into the following subtypes:

Backup devices (various disk libraries);

Work data devices (high performance equipment);

Devices used for long-term storage of archives.


Tape storage

Used to create archives as well as backups.

Tape storage systems are divided into the following subtypes:

Tape libraries (two or more drives, many tape slots);

Autoloaders (1 drive, multiple tape slots);

Separate drives.

Main connection interfaces

Above, we examined the main types of systems; now let's take a closer look at the structure of the storage systems themselves. Modern storage systems are categorized according to the type of host interface they use. Consider the two most common external connection interfaces: SCSI and Fibre Channel. SCSI resembles the widely used IDE: it is a parallel interface that can accommodate up to sixteen devices on one bus (IDE, as you know, allows two devices per channel). The maximum speed of the SCSI protocol today is 320 megabytes per second (a version providing 640 megabytes per second was in development). The disadvantages of SCSI are its inconvenient, thick cables with poor noise immunity, whose maximum length does not exceed twenty-five meters. The SCSI protocol itself also imposes restrictions: as a rule, one initiator on the bus plus target devices (tape drives, disks, etc.).

Fibre Channel is used less commonly than SCSI because the hardware for this interface is more expensive. It is used to deploy large SAN storage networks, so it is found mainly in large companies. Distances can be practically anything: from the standard three hundred meters with ordinary equipment to two thousand kilometers with powerful switches ("directors"). The main advantage of the Fibre Channel interface is the ability to combine multiple storage devices and hosts (servers) into a common SAN storage area network. Less important advantages are greater distances than with SCSI, the possibility of link aggregation and redundant access paths, the ability to hot-plug equipment, and higher noise immunity. Two-core single-mode and multimode optical cables (with SC or LC connectors) are used, along with SFP optical transceivers based on laser or LED emitters (these components determine the maximum distance between devices as well as the transmission speed).

Storage topology options

Traditionally, storage devices are attached directly to servers (DAS, Direct Attached Storage). In addition to DAS, there are also NAS devices (storage attached to the network) and SAN components of dedicated storage networks. SAN and NAS systems were created as alternatives to the DAS architecture. Each of these solutions was developed in response to the ever-increasing requirements for data storage systems and was based on the technologies available at the time.

The first networked storage architectures were developed in the 1990s to address the most tangible shortcomings of DAS systems. Storage networking solutions were designed to meet the objectives above: reduce the cost and complexity of data management, reduce LAN traffic, and improve overall performance and data availability. That said, the SAN and NAS architectures address different aspects of one common problem, so the two network architectures came to exist simultaneously, each with its own functionality and benefits.

DAS


DAS (Direct Attached Storage) is an architectural solution in which a storage device is connected directly to a server or workstation over an interface such as SAS.


The main advantages of DAS systems: low cost compared to other storage solutions, ease of deployment and administration, high-speed data exchange between the server and the storage system.

The above advantages allowed DAS systems to become extremely popular in the segment of small corporate networks, hosting providers and small offices. At the same time, DAS systems have their drawbacks, such as suboptimal resource utilization: each DAS system requires a dedicated server connection, and each system allows no more than two servers to be connected to a disk shelf, and only in certain configurations.

Benefits:

Affordable cost. The storage system is essentially a disk enclosure with hard drives, installed outside the server.

Providing high-speed exchange between the server and the disk array.


Disadvantages:

Insufficient reliability: in the event of an accident or network problems, the servers, and the data attached to them, cease to be available to a number of users.

High latency due to the fact that all requests are processed by one server.

Poor manageability - The availability of all capacity to a single server reduces the flexibility of data distribution.

Low resource utilization - the amount of data required is difficult to predict: some DAS devices in an organization may experience excess capacity, while others may lack it, since reallocation of capacity is usually too time-consuming or even impossible.

NAS


NAS (Network Attached Storage) is an integrated, free-standing disk system that includes a NAS server with its own specialized operating system and a set of user-friendly functions providing quick system startup and access to files. The system connects to an ordinary computer network, allowing users of that network to solve the problem of insufficient disk space.

NAS is a storage device that connects to the network like a regular network device, providing file access to digital data. Any NAS device is a combination of the storage system and the server to which the system is connected. The simplest NAS device is a network server that provides file shares.

NAS devices consist of a head unit that performs data processing and connects a chain of disks into a single network. NAS provides storage over Ethernet, with shared file access organized using the TCP/IP protocol stack. Such devices enable file sharing even among clients running different operating systems. Unlike the DAS architecture, NAS systems do not require servers to be taken offline to increase overall capacity; you can add drives to a NAS structure by simply plugging the device into the network.

NAS technology is developing today as an alternative to universal servers that carry a large number of different functions (e-mail, fax server, applications, printing, etc.). NAS devices, unlike universal servers, perform only one function - a file server, trying to do this as quickly, simply and efficiently as possible.

Connecting a NAS to a LAN provides access to digital information for an unlimited number of heterogeneous clients (that is, clients with different operating systems) or other servers. Almost all NAS devices today are used on Ethernet networks based on TCP/IP. NAS devices are accessed using special file access protocols; the most common are DAFS, NFS and CIFS. Specialized operating systems are installed inside such servers.

A NAS device can look like a simple box with one Ethernet port and a couple of hard drives, or it can be a huge system equipped with several dedicated servers, a large number of drives, and external Ethernet ports. Sometimes NAS devices are part of a SAN network; in that case they have no drives of their own and only provide access to data located on block devices. The NAS then acts as a powerful dedicated server, and the SAN as a storage device; together the SAN and NAS components form a single storage topology.

Benefits

Low cost; resources are available both to individual servers and to any computer in the organization.

Versatility (one server is capable of serving Unix, Novell, MS, Mac clients).

Ease of deployment as well as administration.

Simplicity of resource sharing.


Disadvantages

Accessing information using network file system protocols is often slower than accessing a local disk.

Most affordable NAS servers fail to provide the flexible, high-speed access that modern SAN systems provide (block, not file).

SAN


SAN (Storage Area Network) is an architectural solution that connects external storage devices (tape libraries, disk arrays, optical drives, etc.) to servers in such a way that the operating system recognizes the external devices as local. Using a SAN reduces the total cost of maintaining a storage system and allows modern organizations to store their information reliably.

The simplest SAN option is storage systems, servers and switches united by optical communication channels. In addition to disk storage systems, disk libraries, tape drives (tape libraries), devices used to store information on optical disks, etc. can be connected to the SAN.

Benefits

Reliability of access to the data that is located on external systems.

The independence of the SAN topology from the servers and storage systems used.

Centralized data storage security and reliability.

Convenience of centralized data and switching management.

Ability to move I / O traffic to a separate network to offload LAN.

Low latency and high performance.

SAN logical structure flexibility and scalability.

Practically unlimited geographic size of the SAN.

The ability to quickly distribute resources between servers.

The simplicity of the backup scheme, provided that all data is located in one place.

Ability to create failover clustering solutions based on an existing SAN at no additional cost.

Availability of additional services and capabilities, such as remote replication, snapshots, etc.

High SAN security.


The only drawback of such solutions is their high cost. In general, the domestic market for data storage systems lags behind the market of developed Western countries, which is characterized by widespread use of storage systems. The high cost and shortage of high-speed communication channels are the main reasons hindering the development of the Russian storage market.

RAID

Speaking about data storage systems, it is imperative to consider one of the main technologies that underlie the operation of such systems and are ubiquitous in the modern IT industry. We mean RAID arrays.

A RAID array consists of several disks that are controlled by a controller and are interconnected through high-speed data transmission channels. The external system perceives such disks (storage devices) as a single whole. The type of array used has a direct impact on the degree of performance and fault tolerance. RAID arrays are used to increase the reliability of data storage, as well as to increase the read / write speed.

There are several RAID levels used when building storage area networks. The most commonly used levels are:

1. RAID 0. A striped disk array of increased performance, without fault tolerance.
The information is split into separate data blocks, which are written simultaneously to two or more disks.

Pros:

The amount of memory is summed up.

A significant increase in performance (performance scales roughly with the number of disks).


Minuses:

The reliability of RAID 0 is lower than the reliability of even the most unreliable disk, because if any of the disks fail, the entire array becomes unusable.
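The striping just described can be sketched in a few lines of Python; this is a toy model with in-memory byte lists standing in for disks, not a driver:

```python
STRIPE = 4  # stripe unit size in bytes (real controllers use 64 KiB and up)

def raid0_write(data, n_disks):
    """Split data into STRIPE-sized blocks and deal them round-robin across disks."""
    disks = [[] for _ in range(n_disks)]
    blocks = [data[i:i + STRIPE] for i in range(0, len(data), STRIPE)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

def raid0_read(disks):
    """Reassemble the original byte stream by walking the disks in write order."""
    out = []
    for row in range(max(len(d) for d in disks)):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)

disks = raid0_write(b"ABCDEFGHIJKLMNOP", 2)
print(disks[0])                                   # [b'ABCD', b'IJKL']
assert raid0_read(disks) == b"ABCDEFGHIJKLMNOP"   # round trip is lossless
```

Because consecutive blocks land on different spindles, reads and writes of a large file can proceed on all disks at once, which is exactly where the performance gain comes from.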


2. RAID 1. A mirrored disk array, consisting of a pair of disks that completely copy each other.

Pros:

An acceptable write speed, and a gain in read speed when requests are parallelized.

High reliability: a disk array of this type functions as long as at least one disk in it is working. The probability of two disks failing simultaneously is equal to the product of the probabilities of each one failing, and is therefore much lower than the probability of a single disk failing. In practice, when one disk fails, action must be taken immediately to restore redundancy; for this, hot spare disks are recommended with RAID of any level (except level zero).


Minuses:

The only drawback of RAID 1 is that the user gets one disk's worth of capacity for the price of two drives.
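The reliability argument above is easy to check numerically; the failure rate used here is an assumed illustrative value, not a vendor figure:

```python
p = 0.03                    # assumed annual failure probability of one disk

single = p                  # one disk alone: any failure loses the data
mirror = p * p              # RAID 1 pair: data lost only if BOTH disks fail
stripe = 1 - (1 - p) ** 2   # RAID 0 pair: EITHER failure loses everything

print(f"single disk: {single}")   # ~0.03
print(f"RAID 1 pair: {mirror}")   # ~0.0009, far more reliable than one disk
print(f"RAID 0 pair: {stripe}")   # ~0.0591, worse than the worst single disk
```

The same arithmetic also shows why RAID 0's reliability is below that of any single member disk: adding disks multiplies the chances that at least one fails.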



3. RAID 10. A RAID 0 array built from RAID 1 arrays.

4. RAID 2. Used for arrays employing Hamming code.

Arrays of this type are based on Hamming code. The disks are divided into two groups: one for data and one for the codes used for error correction. Data on the disks used for storing information is distributed as in RAID 0, that is, divided into small blocks according to the number of disks. The remaining disks store error correction codes, which help restore information if one of the hard drives fails. The Hamming method, also used in ECC memory, makes it possible to correct single errors on the fly and to detect double errors.
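As a rough illustration of the Hamming approach (not of any particular RAID 2 controller), here is a toy Hamming(7,4) codec: four data bits receive three parity bits, and any single flipped bit can be located and repaired:

```python
def hamming_encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7, parity at 1,2,4)."""
    c = [0] * 8                   # index 0 unused so positions match the math
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]     # parity over positions 1,3,5,7
    c[2] = c[3] ^ c[6] ^ c[7]     # parity over positions 2,3,6,7
    c[4] = c[5] ^ c[6] ^ c[7]     # parity over positions 4,5,6,7
    return c[1:]

def hamming_correct(code):
    """Recompute the parity checks; the syndrome is the 1-based error position."""
    c = [0] + list(code)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    pos = s1 + 2 * s2 + 4 * s4    # 0 means the codeword is clean
    if pos:
        c[pos] ^= 1               # flip the single bad bit back
    return c[1:]

word = hamming_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                   # simulate a single-bit fault (position 5)
assert hamming_correct(damaged) == word
```

In a RAID 2 array the same idea is applied across disks rather than within a word: the parity group lives on the dedicated check disks, so the loss of one data disk plays the role of the flipped bit.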

5. RAID 3, RAID 4. These are striped disk arrays with a dedicated parity disk. In RAID 3, data from n disks is split into pieces smaller than a sector (blocks or bytes) and distributed across n-1 disks, with the parity blocks stored on a single disk. In RAID 2, n-1 disks were used for this purpose, but most of the information on the check disks served to correct errors on the fly, whereas for most users simple data recovery after a disk failure is sufficient, and for that the information that fits on one hard disk is enough.

A RAID 4 array is similar to RAID 3, except that the data is divided into blocks rather than bytes. This partly solved the problem of low data transfer rates for small amounts of data. Writing, however, is slow, because each write must also generate parity for the block, written to a single dedicated disk.
Unlike RAID 2, RAID 3 cannot correct errors on the fly and also has less redundancy.

A common drawback of RAID levels 2 through 4 is the inability to perform parallel write operations, because a separate dedicated disk is used to store parity information. RAID 5 lacks this drawback: checksums and data blocks are written across all disks, and there is no asymmetric disk configuration. Here a checksum means the result of an XOR operation. A property of XOR is that any operand can be replaced by the result, and applying the XOR algorithm again yields the missing operand. Storing the XOR result requires only one disk's worth of space (identical in size to any single disk in the array).
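The XOR property just described can be demonstrated in miniature; the byte strings below are arbitrary stand-ins for data blocks on member disks:

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

stripe = [b"DATA", b"MORE", b"BITS"]      # data blocks on three member disks
parity = b"\x00" * 4
for block in stripe:
    parity = xor_blocks(parity, block)    # what the parity block would hold

# Disk 1 (holding b"MORE") fails; XOR parity with the survivors to rebuild it:
rebuilt = parity
for i, block in enumerate(stripe):
    if i != 1:
        rebuilt = xor_blocks(rebuilt, block)
assert rebuilt == b"MORE"
```

Any one missing block, data or parity, can be regenerated this way, which is exactly the single-disk fault tolerance RAID 5 provides.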

Pros:

The popularity of RAID 5 is primarily due to its cost effectiveness. Writing to a RAID 5 volume consumes additional resources, and performance drops because extra computation and writes are required. On the other hand, reads gain somewhat compared to a separate hard disk, because data streams coming from several disks can be processed in parallel.


Minuses:

RAID 5 has much lower performance on random writes, dropping to 10-25 percent of the performance of RAID 10 or RAID 0, because each write requires more disk operations (each server write on the RAID controller is replaced by four operations: two reads and two writes). The drawbacks of RAID 5 show when a disk fails: the entire volume goes into critical mode, and all reads and writes are accompanied by additional manipulations, leading to a sharp drop in performance. The reliability level drops to that of RAID 0 with the corresponding number of disks, i.e. n times less than the reliability of a single disk. If, before the array is restored, one more disk fails or an unrecoverable error occurs on it, the array is destroyed, and the data on it cannot be restored by conventional methods. Note also that the process of rebuilding RAID redundancy after a disk failure (RAID Reconstruction) creates an intense continuous read load on all disks that persists for hours. As a result, one of the remaining drives may fail, and previously undetected read failures in cold data areas (data not accessed during normal operation of the array, inactive and archived) may surface, increasing the risk of failure during data recovery.
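The parity-update arithmetic behind that write penalty can be shown directly: the new parity is derived from the old parity, the old data and the new data, which is why the old blocks must be read before anything can be written:

```python
def xor4(a, b):
    """Byte-wise XOR of two 4-byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

old_data, untouched = b"AAAA", b"ZZZZ"    # two data blocks of one stripe
old_parity = xor4(old_data, untouched)    # parity currently on the parity disk

# Overwrite one block: read old data + old parity (2 reads), then write
# new data + new parity (2 writes) - four disk operations per server write.
new_data = b"BBBB"
new_parity = xor4(xor4(old_parity, old_data), new_data)

# The shortcut agrees with recomputing parity over the whole stripe:
assert new_parity == xor4(new_data, untouched)
```

The shortcut matters on wide stripes: without it, a single-block write would have to read every other data block in the stripe just to recompute parity.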



6. RAID 50. A RAID 0 array built from RAID 5 arrays.

7. RAID 6. A striped disk array that uses two checksums, calculated in two independent ways.

RAID 6 is in many ways similar to RAID 5, but differs in a higher degree of reliability: it allocates the capacity of two disks to checksums, and the two sums are calculated using different algorithms. A more powerful RAID controller is required. RAID 6 protects against multiple failures, ensuring uptime after two drives fail simultaneously, and requires a minimum of four drives. Using RAID 6 typically costs about 10-15 percent of disk group performance, due to the larger amount of information the controller has to process (it must calculate the second checksum and read and write more disk blocks when writing each block).

8. RAID 60. A RAID 0 array built from RAID 6 arrays.

9. Hybrid RAID. Another RAID configuration that has become quite popular lately: ordinary RAID levels used in conjunction with additional software and SSDs used as a read cache. This improves system performance, because SSDs have much better speed characteristics than HDDs. There are several implementations today, for example Crucial Adrenaline and several budget Adaptec controllers. Hybrid RAID is currently not recommended for heavy use, due to the limited write endurance of SSDs.


Hybrid RAID reads from the faster solid state drive, and writes to both solid state drives and hard drives (for redundancy purposes).
Hybrid RAID is great for lower-tier data applications (virtual machine, file server, or Internet gateway).
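The read/write policy just described can be sketched as a toy Python class; the dicts standing in for the SSD and HDD tiers are purely illustrative, not a model of any real controller:

```python
class HybridVolume:
    """Toy hybrid volume: SSD read cache in front of an authoritative HDD copy."""

    def __init__(self):
        self.hdd = {}          # authoritative copy (redundant tier)
        self.ssd = {}          # fast read cache

    def write(self, key, data):
        self.hdd[key] = data   # every write lands on the HDD for redundancy...
        self.ssd[key] = data   # ...and on the SSD so later reads are fast

    def read(self, key):
        if key in self.ssd:    # fast path: solid-state cache hit
            return self.ssd[key]
        data = self.hdd[key]   # slow path: fall back to the hard drive
        self.ssd[key] = data   # populate the cache for next time
        return data

vol = HybridVolume()
vol.write("report.doc", b"payload")
assert vol.read("report.doc") == b"payload"
```

Because the HDD always holds a complete copy, the SSD can be lost or evicted at any time without data loss; only read latency suffers until the cache warms up again.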

Features of the modern storage market

In the summer of 2013, the analytical firm IDC released its latest forecast for the storage market, extending to 2017. Analysts calculate that over the next four years, enterprises worldwide will purchase storage systems with a total capacity of one hundred and thirty-eight exabytes, with aggregate sold storage capacity growing by about thirty percent annually.

However, compared to previous years, when there was a rapid growth in data storage consumption, the rate of this growth will slow down somewhat, as today most companies use cloud solutions, preferring technologies that optimize data storage. Storage space savings are achieved through tools such as virtualization, data compression, data deduplication, and more. All of the above tools provide space savings, allowing companies to avoid spontaneous purchases and resort to purchasing new storage systems only when they are really needed.

Of the 138 exabytes expected to be sold in 2017, 102 exabytes will be external storage and 36 internal. In 2012, twenty exabytes of external storage and eight of internal were sold. Spending on industrial storage will increase by approximately 4.1 percent annually and will reach about forty-two and a half billion dollars by 2017.

We have already noted that the global storage market, which has recently experienced a real boom, has gradually started to decline. In 2005, the growth in storage consumption at the industrial level was sixty-five percent, and in 2006 and 2007 - fifty-nine percent each. In subsequent years, the growth in storage consumption decreased even more due to the negative impact of the global economic crisis.

Analysts predict that increased use of cloud storage will lead to less consumption of storage solutions at the enterprise level. Cloud providers are also actively purchasing for their storage needs, for example, Facebook and Google build their own servers from ready-made components, but these servers are not counted in the IDC report.

IDC also expects that emerging markets will soon overtake developed markets in storage consumption, as they experience higher economic growth. For example, the region of Eastern and Central Europe, Africa and the Middle East will surpass Japan in terms of storage costs in 2014. By 2015, Asia-Pacific, excluding Japan, will surpass Western Europe in terms of storage consumption.

Prompt sale of storage systems

The sale of data storage systems carried out by our company Navigator gives everyone the opportunity to get a reliable and durable basis for storing their multimedia data. A wide selection of Raid arrays, network storages and other systems makes it possible to individually select for each customer the complex that suits him best.

Broad technical capabilities and the competence and experience of the company's personnel guarantee quick and comprehensive implementation of the task. At the same time, we are not limited to selling data storage systems: we also handle their configuration, commissioning, and subsequent service and maintenance.

It is information that is the driving force of modern business and is currently considered the most valuable strategic asset of any enterprise. The amount of information grows exponentially with the growth of global networks and the development of e-commerce. Success in this information war requires an effective strategy for storing, protecting, sharing and managing the most important digital asset, data, both today and in the near future.

Storage resource management has become one of the most pressing strategic challenges facing IT staff. Due to the development of the Internet and fundamental changes in business processes, information is accumulating at an unprecedented rate. In addition to the urgent problem of ensuring the possibility of a constant increase in the volume of stored information, the problem of ensuring the reliability of data storage and constant access to information is no less acute on the agenda. For many companies, the “24 hours a day, 7 days a week, 365 days a year” data access formula has become the norm.

In the case of a separate PC, a data storage system (DSS) can be understood as a separate internal hard drive or a disk system. In the case of corporate storage, there are traditionally three technologies for organizing data storage: Direct Attached Storage (DAS), Network Attached Storage (NAS) and Storage Area Network (SAN).

Direct Attached Storage (DAS)

DAS technology implies a direct connection of drives to a server or PC. The drives (hard drives, tape drives) can be either internal or external. The simplest case of a DAS system is a single disk inside a server or PC. An internal RAID array of disks organized with a RAID controller can also be classified as a DAS system.

It should be noted that, despite the formal possibility of using the term DAS system in relation to a single disk or to an internal array of disks, a DAS system is usually understood as an external rack or cage with disks, which can be considered as a stand-alone storage system (Fig. 1). In addition to independent power supply, such autonomous DAS systems have a specialized controller (processor) for managing the storage array. For example, a RAID controller with the ability to organize RAID arrays of various levels can act as such a controller.

Figure 1. An example of a DAS storage system

It should be noted that autonomous DAS systems can have several external I / O channels, which makes it possible to connect several computers to the DAS system at the same time.

SCSI (Small Computer Systems Interface), SATA, PATA and Fiber Channel interfaces can be used as interfaces for connecting drives (internal or external) in DAS technology. If SCSI, SATA and PATA are used primarily for connecting internal storage, then the Fiber Channel interface is used exclusively for connecting external drives and stand-alone storage systems. The advantage of the Fiber Channel interface in this case is that it does not have a hard limit on the length and can be used when the server or PC connected to the DAS system is at a considerable distance from it. SCSI and SATA interfaces can also be used to connect external storage systems (in this case, the SATA interface is called eSATA), however, these interfaces have a strict limitation on the maximum length of the cable connecting the DAS system and the connected server.

The main advantages of DAS systems include their low cost (in comparison with other storage solutions), ease of deployment and administration, and high speed of data exchange between the storage system and the server. Actually, it is thanks to this that they have gained great popularity in the segment of small offices and small corporate networks. At the same time, DAS systems have their drawbacks, which include poor manageability and suboptimal utilization of resources, since each DAS system requires a dedicated server.

Currently, DAS systems occupy a leading position, but the share of sales of these systems is constantly decreasing. DAS systems are gradually being replaced either by universal solutions capable of seamless migration to NAS, or by systems that can be used as DAS, NAS, and even SAN systems.

DAS systems should be used when you need to increase the disk space of one server and take it out of the chassis. Also, DAS systems can be recommended for use for workstations that process large amounts of information (for example, for nonlinear video editing stations).

Network Attached Storage (NAS)

NAS systems are storage systems attached directly to a network, just like a network print server, router, or any other network device (Figure 2). In fact, NAS systems are an evolution of file servers: the difference between a traditional file server and a NAS device is about the same as between a hardware network router and a software router on a dedicated server.

Figure 2. An example of a NAS storage system

To understand the difference between a traditional file server and a NAS device, recall that a traditional file server is a dedicated computer (server) that stores information available to network users. To store information, hard disks installed in the server can be used (as a rule, mounted in special cages), or DAS devices can be connected to the server. The file server is administered through the server operating system. This approach to organizing data storage is currently the most popular in the segment of small local area networks, but it has one significant drawback: a universal server (especially combined with a server operating system) is by no means a cheap solution, and most of the functionality inherent in a universal server is simply not used by a file server. The idea, then, is to create an optimized file server with an optimized operating system and a balanced configuration. This is exactly the concept the NAS device embodies. In this sense, NAS devices can be thought of as "thin" file servers, otherwise known as filers.

In addition to an optimized OS that is free of all functions not related to file system maintenance and data I/O, NAS systems have a speed-optimized file system. NAS systems are designed in such a way that all their computing power is focused solely on serving and storing files. The operating system itself is located in flash memory and is pre-installed by the manufacturer. Naturally, when a new version of the OS is released, the user can independently "reflash" the system. Connecting NAS devices to the network and configuring them is a fairly straightforward task that can be done by any power user, let alone a system administrator.

Thus, NAS devices are more efficient and less expensive than traditional file servers. Nowadays, almost all NAS devices are oriented to use in Ethernet networks (Fast Ethernet, Gigabit Ethernet) based on TCP/IP protocols. NAS devices are accessed using special file access protocols. The most common file access protocols are CIFS, NFS, and DAFS.

CIFS (Common Internet File System) is a protocol that provides access to files and services on remote computers (including over the Internet) and uses a client-server interaction model. The client sends a request to the server to access files, and the server fulfills the request and returns the result. The CIFS protocol is traditionally used for file access in local area networks running Windows. CIFS uses TCP/IP to transport data. CIFS provides functionality similar to FTP (File Transfer Protocol) but gives clients improved control over files. It also allows file access to be shared between clients, using locking and automatic restoration of the connection to the server in the event of a network failure.

The NFS (Network File System) protocol is traditionally used on UNIX platforms and is a combination of a distributed file system and a network protocol. NFS also uses a client-server communication model. The NFS protocol provides access to files on a remote host (server) as if they were on the user's own computer. NFS uses TCP/IP to transport data. The WebNFS protocol was developed to let NFS work over the Internet.

The DAFS (Direct Access File System) protocol is a standard file access protocol based on NFS. It allows applications to transfer data directly to transport resources, bypassing the operating system and its buffer space. DAFS provides high file I/O speeds and lowers CPU utilization by dramatically reducing the number of operations and interrupts typically required when processing network protocols.

DAFS has been designed with a cluster and server environment in mind for databases and a variety of end-to-end Internet applications. It provides the lowest latency in file and data access and intelligent recovery mechanisms for system and data health, making it attractive for NAS applications.

Summarizing the above, NAS systems can be recommended for use in multi-platform networks when network access to files is required and ease of installation and administration of the storage system are important factors. A great example is the use of a NAS as a file server in a small company office.

Storage Area Network (SAN)

Actually, a SAN is no longer a separate device but a comprehensive solution: a specialized network infrastructure for data storage. SANs are integrated as separate specialized subnets within a local area (LAN) or wide area (WAN) network.

Basically, SANs link one or more servers (SAN servers) to one or more storage devices. SANs allow any SAN server to access any storage device without overloading other servers or the local area network. In addition, data exchange between storage devices is possible without the participation of servers. In fact, SANs allow very large numbers of users to store information in one place (with fast, centralized access) and share it. RAID arrays, various libraries (tape, magneto-optical, etc.), and JBOD systems (disk arrays not combined into a RAID) can all be used as storage devices.

Storage area networks only began to develop intensively and see deployment in 1999.

Just as local area networks can in principle be built on the basis of different technologies and standards, various technologies can also be used to build SANs. But just as the Ethernet (Fast Ethernet, Gigabit Ethernet) standard has become the de facto standard for local area networks, the Fiber Channel (FC) standard dominates in storage networks. Actually, it was the development of the Fiber Channel standard that led to the development of the SAN concept itself. At the same time, it should be noted that the iSCSI standard is becoming more and more popular, on the basis of which it is also possible to build SAN networks.

Along with the speed parameters, one of the most important advantages of Fiber Channel is its long distance capability and topology flexibility. The concept of building a storage network topology is based on the same principles as traditional local area networks based on switches and routers, which greatly simplifies the construction of multi-node system configurations.

It is worth noting that for data transmission in the Fiber Channel standard, both fiber-optic and copper cables are used. When organizing access to geographically remote sites at a distance of up to 10 km, standard equipment and single-mode fiber are used for signal transmission. If the nodes are separated by a greater distance (tens or even hundreds of kilometers), special amplifiers are used.

SAN topology

A typical Fiber Channel SAN is shown in Fig. 3. The infrastructure of such a SAN consists of Fiber Channel storage devices, SAN servers (servers connected both to a local network via Ethernet and to the SAN via Fiber Channel), and a switching fabric (Fiber Channel Fabric), which is built on Fiber Channel switches (hubs) and is optimized for transferring large blocks of data. Network users access the storage system through the SAN servers. Importantly, the traffic inside the SAN is separated from the IP traffic of the local network, which of course reduces the load on the local network.

Figure: 3. Typical SAN network layout

Benefits of SANs

The main advantages of SAN technology include high performance, high data availability, excellent scalability and manageability, the ability to consolidate and virtualize data.

Fiber Channel fabrics with non-blocking architecture allow multiple SAN servers to access storage devices concurrently.

In a SAN architecture, data can be easily moved from one storage device to another to optimize data placement. This is especially important when multiple SAN servers require simultaneous access to the same storage devices. Note that the process of data consolidation is not possible in the case of using other technologies, such as when using DAS devices, that is, storage devices that are directly connected to servers.

Another opportunity provided by the SAN architecture is data virtualization. The idea behind virtualization is to give SAN servers access to resources rather than individual storage devices. That is, servers should not "see" storage devices, but virtual resources. For practical implementation of virtualization, a special virtualization device can be placed between SAN servers and disk devices, to which storage devices are connected on one side, and SAN servers on the other. In addition, many modern FC switches and HBAs provide virtualization capabilities.
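As a rough illustration of the virtualization idea, the sketch below (all names hypothetical, purely in-memory) maps one virtual volume onto extents of two physical disks, so a server addresses only the volume and never the disks themselves:

```python
# Minimal sketch of SAN block virtualization: a virtual volume is mapped
# onto extents of several physical disks; servers see only the volume.

class PhysicalDisk:
    def __init__(self, name, blocks):
        self.name = name
        self.data = [None] * blocks

class VirtualVolume:
    """Concatenates extents (disk, start, length) into one linear block space."""
    def __init__(self, extents):
        self.extents = extents  # list of (PhysicalDisk, start, length)

    def _locate(self, block):
        offset = block
        for disk, start, length in self.extents:
            if offset < length:
                return disk, start + offset
            offset -= length
        raise IndexError("block out of range")

    def write(self, block, value):
        disk, phys = self._locate(block)
        disk.data[phys] = value

    def read(self, block):
        disk, phys = self._locate(block)
        return disk.data[phys]

# Two 100-block disks presented to servers as one 200-block volume.
d1, d2 = PhysicalDisk("disk1", 100), PhysicalDisk("disk2", 100)
vol = VirtualVolume([(d1, 0, 100), (d2, 0, 100)])
vol.write(150, b"payload")           # transparently lands on disk2, block 50
print(vol.read(150) == d2.data[50])  # -> True
```

A real virtualization appliance adds remapping, migration, and snapshots on top of exactly this kind of indirection.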

The next capability provided by SANs is the implementation of remote data mirroring. The principle of data mirroring is to duplicate information on several media, which increases the reliability of information storage. An example of the simplest case of data mirroring is combining two disks into a RAID 1 array. In this case, the same information is written simultaneously to two disks. The disadvantage of this method is that both drives are located locally (as a rule, drives are in the same cage or rack). SANs overcome this drawback and provide the ability to mirror not just individual storage devices, but the SANs themselves, which can be hundreds of kilometers apart from each other.
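The RAID 1 case mentioned above can be sketched in a few lines (hypothetical in-memory "disks"): every write is duplicated to both members, so a read succeeds even if one member fails.

```python
# Minimal sketch of RAID 1 mirroring: writes go to both members,
# reads can be served by any surviving member.

class Raid1:
    def __init__(self):
        self.members = [dict(), dict()]  # two mirrored "disks"

    def write(self, block, value):
        for member in self.members:      # duplicate to both members
            member[block] = value

    def read(self, block, failed=None):
        for i, member in enumerate(self.members):
            if i != failed:              # skip a failed member, if any
                return member[block]
        raise RuntimeError("all members failed")

mirror = Raid1()
mirror.write(7, "data")
print(mirror.read(7, failed=0))  # member 0 lost, data survives -> data
```

Remote SAN mirroring applies the same write-duplication principle, only the second "member" may be hundreds of kilometers away.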

Another advantage of SANs is the ease of organizing data backups. Traditional backup technology, which is used on most LANs, requires a dedicated backup server and, most importantly, dedicated network bandwidth. In fact, during a backup operation, the server itself becomes inaccessible to users on the local network. Actually, this is why, as a rule, backups are made at night.

SAN architecture allows a fundamentally different approach to the problem of backup. In this case, the Backup server is a part of the SAN network and connects directly to the switching fabric. In this case, the Backup traffic is isolated from the LAN traffic.

Equipment used to create SAN networks

As noted, a SAN deployment requires storage devices, SAN servers, and switching fabric hardware. Switching fabrics include physical-layer components (cables, connectors), interconnect devices that link SAN nodes to one another, and translation devices that convert the Fiber Channel (FC) protocol to other protocols, for example SCSI, FCP, FICON, Ethernet, ATM, or SONET.

Cables

As noted, Fiber Channel allows both fiber and copper cables to connect SAN devices, and different cable types can be used within one SAN. Copper cable is used for short distances (up to 30 m), while fiber-optic cable is used both for short distances and for distances up to 10 km or more. Both multimode and single-mode fiber-optic cables are used, with multimode fiber serving distances up to 2 km and single-mode fiber serving longer distances.

The coexistence of various types of cables within the same SAN network is ensured by means of special interface converters GBIC (Gigabit Interface Converter) and MIA (Media Interface Adapter).

The Fiber Channel standard provides several possible transfer rates (see table). Currently the most common FC devices are those of the 1, 2 and 4 GFC standards. Backward compatibility of higher-speed devices with lower-speed ones is provided, so a 4 GFC device automatically supports connecting 1 and 2 GFC devices.
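Since the transfer-rate table itself is not reproduced here, the usable throughput of the FC generations can be estimated from their line rates. FC uses 8b/10b encoding, so each data byte costs ten line bits (the line rates below are the standard baud rates; treat the derived figures as approximate):

```python
# Effective data rates of Fiber Channel generations. With 8b/10b
# encoding, each data byte takes 10 line bits, so usable throughput
# is roughly line_rate / 10 bytes per second, per direction.

line_rates_gbaud = {"1GFC": 1.0625, "2GFC": 2.125, "4GFC": 4.25, "8GFC": 8.5}

for name, gbaud in line_rates_gbaud.items():
    mb_per_s = gbaud * 1e9 / 10 / 1e6   # line bits -> data bytes -> MB/s
    print(f"{name}: ~{mb_per_s:.0f} MB/s per direction")
# 1GFC: ~106 MB/s, 2GFC: ~212 MB/s, 4GFC: ~425 MB/s, 8GFC: ~850 MB/s
```

This is why 1 GFC is usually quoted as "100 MB/s" despite the 1.0625 Gbaud line rate.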

Interconnect Device

The Fiber Channel standard allows for a variety of device networking topologies such as Point-to-Point, Arbitrated Loop (FC-AL), and switched fabric.

Point-to-point topology can be used to connect a server to dedicated storage. In this case, the storage is not shared among SAN servers. In fact, this topology is a variant of the DAS system.

At a minimum, a point-to-point topology requires a server with a Fiber Channel adapter and a Fiber Channel storage device.

An arbitrated loop (FC-AL) topology refers to a device connection scheme in which data is transmitted around a logical closed loop. In an FC-AL ring topology, the connectivity devices can be Fiber Channel hubs or switches. With hubs, the bandwidth is shared among all nodes in the ring, while a switch provides the full protocol bandwidth to each node.

Figure 4 shows an example of a shared Fiber Channel ring.

Figure: 4. Example of a Fiber Channel Shared Ring

The configuration is similar to the physical star and logical ring used in Token Ring LANs. In addition, as in Token Ring networks, data travels around the ring in one direction, but unlike Token Ring, a device can request the right to transmit data rather than waiting for a free token. Shared Fiber Channel rings can address up to 127 ports; however, as practice shows, typical FC-AL rings contain up to 12 nodes, and performance drops dramatically once about 50 nodes are connected.
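The practical difference between a hub-based shared ring and a switch can be put in numbers (an idealized estimate that ignores arbitration overhead):

```python
# Per-node bandwidth in an FC-AL ring: with a hub the loop bandwidth is
# shared by all nodes; with a switch every port gets the full rate.

def per_node_bandwidth_mb(total_mb, nodes, device):
    if device == "hub":
        return total_mb / nodes      # shared loop bandwidth
    return total_mb                  # switch: full rate per port

# A 2 Gb FC loop (~200 MB/s) with 12 nodes:
print(round(per_node_bandwidth_mb(200, 12, "hub"), 1))   # ~16.7 MB/s each
print(per_node_bandwidth_mb(200, 12, "switch"))          # 200 MB/s each
```

This sharing is one reason FC-AL performance degrades so sharply as node count grows.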

The Fiber Channel switched-fabric topology is implemented using Fiber Channel switches. In this topology, each device has a logical connection to any other device. In fact, Fiber Channel fabric switches perform the same functions as traditional Ethernet switches. Recall that, unlike a hub, a switch is a high-speed device that provides a "one-to-one" connection and handles multiple simultaneous connections. Any node connected to a Fiber Channel switch gets the full protocol bandwidth.

In most cases, large SANs are built using a mixed topology. At the lower level, FC-AL rings are used, connected to low-performance switches, which, in turn, are connected to high-speed switches that provide the highest possible bandwidth. Several switches can be connected to each other.

Translation devices

Translation devices are intermediate devices that convert the Fiber Channel protocol to higher-layer protocols. They are designed to connect a Fiber Channel network to an external WAN or local network, and to attach various devices and servers to the Fiber Channel network. These devices include bridges, Fiber Channel host bus adapters (HBAs), routers, gateways, and network adapters. The classification of translation devices is shown in Fig. 5.

Figure: 5. Classification of translation devices

The most common translation devices are PCI HBAs, which are used to connect servers to a Fiber Channel network. Network adapters allow Ethernet LANs to be connected to Fiber Channel networks. Bridges are used to connect SCSI storage devices to a Fiber Channel network. It should be noted that in recent years, almost all storage devices that are intended for use in the SAN have built-in Fiber Channel and do not require bridging.

Storage devices

Both hard disks and tape drives can be used as storage devices in SAN networks. If we talk about possible configurations of using hard drives as storage devices in SAN networks, then these can be both JBOD arrays and RAID arrays of disks. Traditionally, storage devices for SAN networks come in the form of external racks or baskets equipped with a dedicated RAID controller. Unlike NAS or DAS devices, SAN devices are equipped with a Fiber Channel interface. Moreover, the disks themselves can have both SCSI and SATA interfaces.

In addition to hard disk storage devices, tape drives and libraries are widely used in SANs.

SAN servers

SAN servers differ from conventional application servers in only one detail: in addition to the Ethernet network adapter used for interaction with the local network, they are equipped with an HBA adapter that allows them to be connected to Fiber Channel-based SAN networks.

Intel storage systems

Next, we'll look at a few specific examples of Intel storage devices. Strictly speaking, Intel does not release complete solutions; instead, it develops and produces platforms and individual components for building data storage systems. Based on these platforms, many companies (including a number of Russian ones) produce complete solutions and sell them under their own brands.

Intel Entry Storage System SS4000-E

The Intel Entry Storage System SS4000-E is a NAS device designed for use in small to medium-sized offices and multi-platform LANs. With the Intel Entry Storage System SS4000-E, Windows, Linux and Macintosh clients can access shared data. In addition, the Intel Entry Storage System SS4000-E can act as both a DHCP server and a DHCP client.

The Intel Entry Storage System SS4000-E is a compact external rack that supports up to four SATA drives (Figure 6). Thus, the maximum system capacity can be 2TB using 500GB drives.

Figure: 6. Intel Entry Storage System SS4000-E

The Intel Entry Storage System SS4000-E uses a SATA RAID controller with support for RAID levels 1, 5 and 10. Since this system is a NAS device, that is, in fact a "thin" file server, it must have its own specialized processor, memory, and preinstalled operating system. The Intel Entry Storage System SS4000-E uses an Intel 80219 processor clocked at 400 MHz. In addition, the system is equipped with 256 MB of DDR memory and 32 MB of flash memory for storing the operating system. The operating system is Linux Kernel 2.6.
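The usable capacity of the four 500 GB drives depends on the RAID level chosen. A simplified calculation (ignoring formatting overhead) for the levels the SS4000-E supports:

```python
# Usable capacity of four 500 GB drives under RAID 1, 5 and 10
# (simplified: ignores filesystem and formatting overhead).

def usable_gb(drives, size_gb, level):
    if level == 1:                 # mirrored pairs: half the raw capacity
        return drives // 2 * size_gb
    if level == 5:                 # one drive's worth of capacity goes to parity
        return (drives - 1) * size_gb
    if level == 10:                # striped mirrors: half the raw capacity
        return drives // 2 * size_gb
    raise ValueError(f"unsupported RAID level: {level}")

for level in (1, 5, 10):
    print(f"RAID {level}: {usable_gb(4, 500, level)} GB usable of 2000 GB raw")
# RAID 1: 1000 GB, RAID 5: 1500 GB, RAID 10: 1000 GB
```

So the quoted 2 TB maximum is raw capacity; a redundant configuration yields 1 to 1.5 TB of usable space.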

To connect to a local network, the system provides a two-channel gigabit network controller. In addition, there are also two USB ports.

The Intel Entry Storage System SS4000-E supports CIFS/SMB, NFS, and FTP, and is configured using a web interface.

With Windows clients (Windows 2000/2003/XP are supported), data backup and recovery can additionally be implemented.

Intel Storage System SSR212CC

The Intel Storage System SSR212CC is a versatile storage platform for DAS, NAS and SAN storage. This system is housed in a 2U chassis and is designed to be mounted in a standard 19-inch rack (Figure 7). The Intel Storage System SSR212CC supports up to 12 hot-swappable SATA or SATA II drives for up to 6 TB of storage capacity with 500 GB drives.

Figure: 7. Intel Storage System SSR212CC

In fact, the Intel Storage System SSR212CC is a full-fledged high-performance server running under the operating systems Red Hat Enterprise Linux 4.0, Microsoft Windows Storage Server 2003, Microsoft Windows Server 2003 Enterprise Edition and Microsoft Windows Server 2003 Standard Edition.

The server is based on an Intel Xeon 2.8 GHz processor (800 MHz FSB, 1 MB L2 cache). The system supports DDR2-400 SDRAM with ECC up to a maximum of 12 GB (six DIMM slots are provided for memory modules).

The Intel Storage System SSR212CC has two Intel RAID Controller SRCS28Xs with the ability to create RAID levels 0, 1, 10, 5, and 50. In addition, the Intel Storage System SSR212CC has a dual channel Gigabit LAN controller.

Intel Storage System SSR212MA

The Intel Storage System SSR212MA is a platform for building iSCSI-based IP SAN storage systems.

The system is housed in a 2U chassis and is designed to be mounted in a standard 19-inch rack. The Intel Storage System SSR212MA supports up to 12 hot-swappable SATA drives, allowing for up to 6 TB of storage capacity with 500 GB drives.

The hardware configuration of the Intel Storage System SSR212MA does not differ from the Intel Storage System SSR212CC.

Next, we will consider what types of data storage systems (DSS) exist today, as well as one of the main components of storage systems - the external connection interfaces (interaction protocols) - and the drives that store the data. We will also make a general comparison of them in terms of the capabilities they provide. For examples, we will refer to the range of storage systems offered by DELL.

  • DAS Model Examples
  • Examples of NAS models
  • SAN Model Examples
  • Media types and storage protocol Fiber Channel
  • ISCSI protocol
  • SAS protocol
  • Comparison of storage connection protocols

Existing types of storage systems

In the case of a separate PC, the storage system can be understood as an internal hard disk or a disk system (RAID array). When it comes to data storage systems of different levels of enterprises, then traditionally three technologies for organizing data storage can be distinguished:

  • Direct Attached Storage (DAS);
  • Network Attached Storage (NAS);
  • Storage Area Network (SAN).

DAS (Direct Attached Storage) devices are a solution when a storage device is connected directly to a server or to a workstation, usually via a SAS interface.

NAS (Network Attached Storage) devices are a stand-alone integrated disk system, in fact, a NAS server, with its own specialized OS and a set of useful functions to quickly start the system and provide access to files. The system is connected to a regular computer network (LAN), and is a quick solution to the problem of lack of free disk space available to users of this network.

A Storage Area Network (SAN) is a dedicated network that connects storage devices to application servers, usually based on Fiber Channel or iSCSI.

Now let's take a closer look at each of the above storage types, their positive and negative sides.

DAS (Direct Attached Storage) storage architecture

The main advantages of DAS systems include their low cost (in comparison with other storage solutions), ease of deployment and administration, and high speed of data exchange between the storage system and the server. It is thanks to this that they have gained great popularity in the segment of small offices, hosting providers and small corporate networks. At the same time, DAS systems have their disadvantages, chief among them suboptimal resource utilization: each DAS system requires a dedicated server connection, and at most two servers (in certain configurations) can be connected to one disk shelf.

Figure 1: Direct Attached Storage Architecture

Advantages:

  • Quite low cost. In fact, this storage system is a disk basket with hard drives outside the server.
  • Ease of deployment and administration.
  • High speed of exchange between the disk array and the server.

Disadvantages:

  • Low reliability. If the server to which this storage is connected fails, the data will no longer be available.
  • Low degree of resource consolidation - all capacity is available to one or two servers, which reduces the flexibility of distributing data between servers. As a result, it is necessary to purchase either more internal hard drives or additional disk shelves for other server systems.
  • Low resource utilization.

DAS Model Examples

Of the interesting models of this type of device, the DELL PowerVault MD series is worth mentioning. The entry-level JBOD models MD1000 and MD1120 allow you to create disk arrays with up to 144 drives. This is achieved thanks to the modular architecture: up to six enclosures can be connected to the array, three disk shelves per channel of the RAID controller. For example, using a rack of six DELL PowerVault MD1120 units, we can implement an array with an effective data volume of 43.2 TB. These disk enclosures are connected with one or two SAS cables to the external ports of the RAID controllers installed in Dell PowerEdge servers and are managed from the server's management console.
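As a quick sanity check on the quoted numbers (drive size is not stated in the text; 300 GB per drive is an assumption implied by the 43.2 TB figure and the 24-drive MD1120 enclosure):

```python
# Checking the quoted MD1120 array capacity: six enclosures of
# 24 drives give 144 spindles; the stated 43.2 TB implies 300 GB
# per drive (an assumption -- the drive size is not given in the text).

enclosures = 6
drives_per_enclosure = 24          # assumed MD1120 slot count
drive_gb = 300                     # assumed drive size

total_drives = enclosures * drives_per_enclosure
total_tb = total_drives * drive_gb / 1000
print(total_drives, total_tb)      # -> 144 43.2
```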

If there is a need for an architecture with high fault tolerance, for example a failover cluster for MS Exchange or SQL Server, then the DELL PowerVault MD3000 is suitable. This system has active logic inside the disk enclosure and is fully redundant thanks to two on-board active-active RAID controllers with mirrored data caches.

Both controllers process read and write streams in parallel, and if one of them fails, the second "picks up" the data from the neighboring controller. At the same time, the connection to the low-level SAS controllers inside the two servers of a cluster can be made over multiple paths (MPIO), which provides redundancy and load balancing in Microsoft environments. To increase disk space, two additional MD1000 disk enclosures can be connected to the PowerVault MD3000.
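The MPIO behavior described above can be sketched as follows (a minimal illustration, all names hypothetical): I/O is balanced round-robin across the paths to both controllers, and when a path fails, requests fail over to the surviving one.

```python
# Minimal sketch of MPIO-style path selection: round-robin load
# balancing across healthy paths, with failover when a path dies.

import itertools

class MultiPath:
    def __init__(self, paths):
        self.healthy = {p: True for p in paths}   # path -> healthy?
        self._rr = itertools.cycle(paths)

    def fail(self, path):
        self.healthy[path] = False

    def pick(self):
        for _ in range(len(self.healthy)):
            p = next(self._rr)                    # round-robin over all paths
            if self.healthy[p]:                   # skip failed paths
                return p
        raise RuntimeError("no healthy path to storage")

mpio = MultiPath(["controller-A", "controller-B"])
print(mpio.pick(), mpio.pick())   # alternates: controller-A controller-B
mpio.fail("controller-A")
print(mpio.pick())                # fails over -> controller-B
```

Real MPIO stacks add path health probing and queue-depth-aware policies, but the failover principle is the same.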

NAS (Network Attached Storage) storage architecture

NAS (Network Attached Storage) technology developed as an alternative to universal servers carrying many functions (printing, applications, fax server, e-mail, etc.). In contrast, NAS devices perform only one function - file serving - and try to do it as well, as simply, and as fast as possible.

A NAS connects to the LAN and provides access to data for an unlimited number of heterogeneous clients (clients with different operating systems) or other servers. Nowadays, almost all NAS devices are oriented to use in Ethernet networks (Fast Ethernet, Gigabit Ethernet) based on TCP/IP protocols. NAS devices are accessed using special file access protocols; the most common are CIFS, NFS, and DAFS. Such servers run specialized operating systems such as MS Windows Storage Server.

Figure 2: Network Attached Storage Architecture

Advantages:

  • Low cost and availability of its resources not only to individual servers but to any computers in the organization.
  • Ease of sharing resources.
  • Ease of deployment and administration.
  • Versatility for clients (one server can serve MS, Novell, Mac, and Unix clients).

Disadvantages:

  • Accessing information through "network file system" protocols is often slower than accessing a local disk.
  • Most low-cost NAS servers provide access to data only at the file level and lack the fast, flexible block-level access inherent in SAN systems.

Examples of NAS models

Classic NAS solutions are currently represented by models such as the PowerVault NF100/500/600. These are systems based on Dell's mass-market one- and two-socket servers, optimized for rapid deployment of NAS services. They allow you to create file storage of up to 10 TB (PowerVault NF600) using SATA or SAS disks and connect the server to a LAN. Higher-performance integrated solutions are also available, such as the PowerVault NX1950, which accommodates 15 drives and expands to 45 with additional MD1000 drive enclosures.

A serious advantage of the NX1950 is the ability to work not only with files but also with data blocks via the iSCSI protocol. A variant of the NX1950 can also work as a "gateway", providing file access to iSCSI-based (block-access) storage systems, for example the MD3000i or the Dell EqualLogic PS5x00.

Storage Area Network (SAN) storage architecture

A Storage Area Network (SAN) is a dedicated network that connects storage devices with application servers, usually based on Fiber Channel or the increasingly popular iSCSI protocol. Unlike NAS, a SAN has no concept of files: file operations are performed on the SAN-attached servers, and the SAN itself operates in blocks, like one large hard drive. The ideal result of SAN operation is the ability of any server under any operating system to access any part of the disk capacity located in the SAN. The leaf nodes of a SAN are application servers and storage systems (disk arrays, tape libraries, etc.); between them, as in an ordinary network, sit adapters, switches, bridges, and hubs. iSCSI is a friendlier protocol because it is based on standard Ethernet infrastructure - network cards, switches, cables. Moreover, iSCSI-based storage systems are the most popular choice for virtualized servers because the protocol is simple to configure.

Figure 3: Storage Area Network Architecture

Advantages:

  • High reliability of access to data located on external storage systems. Independence of the SAN topology from the storage systems and servers used.
  • Centralized data storage (reliability, security).
  • Convenient centralized switching and data management.
  • Offloading of heavy I/O traffic to a separate network, relieving the LAN.
  • High performance and low latency.
  • Scalability and flexibility of the logical SAN structure.
  • The ability to organize backup and remote storage systems, including a remote backup and data recovery system.
  • The ability to build fault-tolerant cluster solutions at no additional cost on top of the existing SAN.

Disadvantages:

  • Higher cost.
  • Difficulty of setting up FC systems.
  • The need to certify specialists in FC networks (iSCSI is a simpler protocol).
  • More stringent requirements for component compatibility and validation.
  • The emergence, because of the high cost, of DAS "islands" alongside FC-based networks: when the budget runs short, enterprises end up with single servers with internal disk space, NAS servers, or DAS systems.

SAN Model Examples

At the moment, there is a fairly large selection of disk arrays for building a SAN, ranging from models for small and medium-sized businesses, such as the DELL AX series, which allows you to create storage of up to 60 TB, to disk arrays for large corporations, the DELL/EMC CX4 series, which allow storage capacities of up to 950 TB. An inexpensive iSCSI-based option is the PowerVault MD3000i: this solution allows you to connect 16 to 32 servers, install up to 15 disks in one enclosure, and expand the system with two MD1000 shelves, creating an array of up to 45 TB.

The Dell EqualLogic iSCSI system deserves special mention. It is positioned as enterprise-scale storage and is comparable in price to the Dell|EMC CX4 systems with their modular port architecture supporting both the FC and iSCSI protocols. EqualLogic is a peer-to-peer system: each disk enclosure has active RAID controllers. When these arrays are combined into a single system, the performance of the disk pool grows smoothly with the growth of the available storage volume. The system allows you to create arrays of over 500 TB, is configured in less than an hour, and does not require specialized administrator knowledge.

The licensing model also differs from the others: the initial price already includes all available options for snapshots, replication, and integration tools for various operating systems and applications. In MS Exchange tests (ESRP), this system is rated one of the fastest.

Types of storage media and the protocol for interacting with storage systems

Having decided on the type of storage system that best suits your tasks, you need to move on to choosing the protocol for interacting with the storage system and the drives that will be used in it.

At the moment, SATA and SAS drives are used to store data in disk arrays. Which disks to choose in storage depends on specific tasks. Several facts are worth noting.

SATA II drives:

  • Available in single drive sizes up to 1 TB
  • Rotational speed 5400-7200 RPM
  • I/O speeds up to 2.4 Gbps
  • MTBF is about half that of SAS drives.
  • Less reliable than SAS drives.
  • About 1.5 times cheaper than SAS drives.

SAS drives:

  • Available in single drive sizes up to 450 GB
  • Rotational speed 7200 (NearLine), 10000 and 15000 RPM
  • I/O speed up to 3.0 Gbps
  • MTBF is twice that of SATA II drives.
  • More reliable drives.
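The price-capacity trade-off in the lists above can be made concrete. The sketch below uses the figures from the text (1 TB SATA II vs 450 GB SAS, SATA about 1.5 times cheaper per drive); the absolute SAS price is hypothetical, only the ratio comes from the text:

```python
# Rough cost-per-terabyte comparison of SATA II vs SAS using the
# figures above. The absolute price is hypothetical; only the
# "about 1.5 times cheaper" ratio and capacities come from the text.

sas_price = 300.0                  # hypothetical price per SAS drive
sata_price = sas_price / 1.5       # SATA is ~1.5x cheaper per drive

sata_per_tb = sata_price / 1.0     # 1 TB SATA II drive
sas_per_tb = sas_price / 0.45      # 450 GB SAS drive

print(f"SATA II: ${sata_per_tb:.0f}/TB, SAS: ${sas_per_tb:.0f}/TB")
```

Capacity per dollar strongly favours SATA, while reliability and rotational speed favour SAS, which is exactly why the near-line SAS drives discussed next are interesting.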

Important! Last year, industrial production began of SAS drives with a reduced rotational speed of 7,200 rpm (near-line SAS drives). This made it possible to increase the amount of data stored on one disk to 1 TB and to reduce the power consumption of drives with a high-speed interface. The cost of such drives is comparable to that of SATA II drives, while reliability and I/O speed remain at the level of SAS disks.

Thus, at the moment, it is worth really seriously thinking about the storage protocols that you are going to use in the framework of corporate storage.

Until recently, the main protocols for interacting with storage systems were Fibre Channel and SCSI. Now, replacing SCSI and expanding its functionality, come the iSCSI and SAS protocols. Let's look at the pros and cons of each protocol and of the corresponding interfaces for connecting to the storage system.

Fiber Channel Protocol

In practice, modern Fiber Channel (FC) operates at 2 Gbps (Fiber Channel 2 Gb), 4 Gbps (Fiber Channel 4 Gb) or 8 Gbps, full duplex - that is, the rated speed is provided simultaneously in both directions. At such speeds, connection distances are practically unlimited: from the standard 300 meters on the most "ordinary" equipment to several hundred or even thousands of kilometers on specialized equipment. The main advantage of the FC protocol is the ability to combine many storage devices and hosts (servers) into a single storage area network (SAN). Other strengths include distribution of devices over long distances, channel aggregation, redundant access paths, hot-plugging of equipment, and high noise immunity. On the other hand, FC-based disk arrays are expensive and labor-intensive to install and maintain.

Important! Two terms should be distinguished: the Fiber Channel protocol and fiber optics. The Fiber Channel protocol can run over different physical media - over fiber-optic links with different modulations as well as over copper links.

Pros of the FC protocol:

  • Flexible storage scalability;
  • The ability to build storage systems over significant distances (although smaller than with the iSCSI protocol, where, in theory, the entire global IP network can act as a carrier);
  • Great redundancy options.

Cons:

  • High cost of the solution;
  • Even higher cost when organizing an FC network over hundreds or thousands of kilometers;
  • High labor intensity of implementation and maintenance.

Important! In addition to the appearance of the 8 Gb/s FC protocol, the FCoE (Fiber Channel over Ethernet) protocol has emerged, which will allow standard Ethernet networks to be used for exchanging FC packets.

iSCSI protocol

iSCSI (SCSI encapsulated over IP) allows users to create IP-based SANs using existing Ethernet infrastructure and RJ-45 ports. Thus, iSCSI circumvents the limitations of direct-attached storage, including the inability to share resources across servers and the inability to expand capacity without shutting down applications. The transfer speed is currently limited to 1 Gb/s (Gigabit Ethernet), but this is sufficient for most business applications at medium-sized enterprises, as numerous tests confirm. Interestingly, what matters is not so much the transfer speed of a single channel as the algorithms of the RAID controllers and the ability to aggregate arrays into a single pool, as in the case of Dell EqualLogic, where three 1 Gb ports are used on each array and the load is balanced among the arrays of one group.
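The encapsulation idea behind iSCSI can be sketched in a few lines. This is a deliberately simplified toy framing to illustrate "SCSI over IP", not the real iSCSI PDU layout defined in RFC 3720; the header fields and values below are illustrative only.

```python
import struct

def encapsulate_scsi_command(cdb: bytes, lun: int, tag: int) -> bytes:
    # Toy header: opcode (0x01 = command), LUN, CDB length, task tag.
    # Illustrative framing only; the real iSCSI PDU (RFC 3720) differs.
    header = struct.pack(">BBHI", 0x01, lun, len(cdb), tag)
    return header + cdb

def decapsulate(packet: bytes):
    opcode, lun, cdb_len, tag = struct.unpack(">BBHI", packet[:8])
    return opcode, lun, tag, packet[8:8 + cdb_len]

# A READ(10) CDB (opcode 0x28) asking for one block at LBA 0.
cdb = bytes([0x28, 0, 0, 0, 0, 0, 0, 0, 1, 0])
packet = encapsulate_scsi_command(cdb, lun=0, tag=42)
assert decapsulate(packet) == (0x01, 0, 42, cdb)
```

The payload produced this way would then travel over an ordinary TCP/IP connection, which is exactly what lets iSCSI reuse commodity Ethernet infrastructure.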

It is important to note that iSCSI SANs provide the same benefits as Fiber Channel SANs, while simplifying network deployment and management and significantly reducing the cost of the storage system.

Pros of the iSCSI protocol:

  • High availability;
  • Scalability;
  • Ease of administration, since Ethernet technology is used;
  • Lower cost of building a SAN on iSCSI than on FC;
  • Easy integration into virtualization environments.

Cons:

  • Certain restrictions apply to using iSCSI storage with some OLAP and OLTP applications, with real-time systems, and when handling a large number of HD video streams;
  • High-end iSCSI storage systems, like FC-based ones, require fast, expensive Ethernet switches;
  • It is recommended to use either dedicated Ethernet switches or VLANs to separate data streams; network design is no less important a part of the project than it is for FC networks.

Important! Manufacturers promise to soon launch mass production of iSCSI SANs supporting data transfer rates of up to 10 Gb/s. The final version of the DCE (Data Center Ethernet) protocol is also in preparation, and devices supporting DCE are expected to appear in volume by 2011.

In terms of interfaces, the iSCSI protocol uses 1 Gb/s Ethernet ports, which can be either copper or fiber-optic for work over long distances.

SAS protocol

The SAS protocol and the interface of the same name are designed to replace parallel SCSI and to achieve higher throughput than SCSI. Although SAS uses a serial interface, as opposed to the parallel interface of traditional SCSI, SCSI commands are still used to control SAS devices. SAS enables a physical connection between a disk array and several servers over short distances.

Pros of the SAS protocol:

  • Acceptable price;
  • Ease of storage consolidation: although SAS-based storage cannot connect to as many hosts (servers) as SAN configurations using the FC or iSCSI protocols, no additional equipment is needed to organize shared storage for several servers;
  • Higher bandwidth through 4-lane connections within a single interface: each lane provides 3 Gb/s, giving a data transfer rate of 12 Gb/s (currently the highest for storage systems).

Cons:

  • Limited reach: the cable length cannot exceed 8 meters, so SAS-attached storage is optimal only when servers and arrays share one rack or one server room;
  • The number of connected hosts (servers) is usually limited to a few nodes.
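The 4-lane arithmetic behind the quoted 12 Gb/s figure is simple enough to spell out:

```python
# First-generation SAS: 3 Gb/s per lane; a "wide" port bundles 4 lanes.
LANE_GBPS = 3
LANES = 4

aggregate_gbps = LANE_GBPS * LANES
assert aggregate_gbps == 12  # the 12 Gb/s figure quoted in the text
```

With the 6 Gb/s lanes promised for 2009, the same 4-lane bundle would double to 24 Gb/s.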

Important! In 2009, SAS technology with a data transfer rate of 6 Gb/s per lane is expected to appear, which will significantly increase the attractiveness of this protocol.

Comparison of storage connection protocols

Below is a summary comparison of the capabilities of the various protocols for interacting with storage systems.

Architecture:

  • iSCSI: SCSI commands are encapsulated in IP packets and transmitted over Ethernet;
  • SAS: serial transmission of SCSI commands;
  • FC: switched.

Distance between disk array and node (server or switch):

  • iSCSI: limited only by the reach of the IP network;
  • SAS: no more than 8 meters between devices;
  • FC: up to 50,000 meters without specialized repeaters.

Scalability:

  • iSCSI: millions of devices (using IPv6);
  • SAS: 32 devices;
  • FC: 256 devices; up to 16 million devices using the FC-SW (fabric switch) architecture.

Performance:

  • iSCSI: 1 Gb/s (development up to 10 Gb/s is planned);
  • SAS: 3 Gb/s per lane, up to 12 Gb/s over 4 lanes (6 Gb/s per lane expected in 2009);
  • FC: up to 8 Gb/s.

Investment level (implementation costs):

  • iSCSI: minor (uses Ethernet);
  • SAS: middle;
  • FC: significant.

Thus, at first glance the presented solutions divide quite clearly according to customer requirements. In practice, however, things are not so unambiguous: additional factors come into play, such as budget restrictions, the dynamics of the organization's growth (and of the growth in stored data), industry specifics, and so on.

This article focuses on entry-level and mid-range storage systems and the trends that are emerging in the industry today. For convenience, we will call data storage systems drives.

First, we will dwell a little on the terminology and technological foundations of autonomous storage, and then we will move on to new products and discussion of modern achievements in various technology and marketing groups. We will also be sure to tell you why you need systems of one type or another and how effective their use is in different situations.

Standalone disk subsystems

In order to better understand the features of autonomous drives, let's dwell a little on one of the simpler technologies for building data storage systems - bus-oriented technology. It provides for the use of a disk enclosure and a PCI RAID controller.

Figure 1. Bus-oriented storage technology

Thus, between the disks and the host PCI bus (host: in this case an autonomous computer, such as a server or workstation) there is only one controller, which largely determines the speed of the system. Drives built on this principle are the most productive, but due to the peculiarities of the architecture their practical use, with rare exceptions, is limited to single-host configurations.

The disadvantages of the bus-oriented drive architecture include:

  • effective use only in single host configurations;
  • dependence on the operating system and platform;
  • limited scalability;
  • limited opportunities for organizing fault-tolerant systems.

Naturally, none of this matters if the data is needed by one server or workstation; on the contrary, in such a configuration you get maximum performance for minimum money. But if you need a storage system for a large data center, or even for two servers that need the same data, a bus-oriented architecture is completely inappropriate. Stand-alone disk subsystems avoid these disadvantages. The basic principle of their construction is simple: the controller that manages the system is moved from the host computer into the drive enclosure, making its operation host-independent. Such a system can also have a large number of external input/output channels, which allows several, or even many, computers to connect to it.


Figure 2. Standalone storage system

Any intelligent storage system consists of hardware and software code. In an autonomous system there is always memory, which stores the program of algorithms for the operation of the system itself and the processing elements that process this code. Such a system functions regardless of which host systems it is associated with. Thanks to their intelligence, standalone drives often independently implement many of the functions to ensure the safety and management of data. One of the most important basic and almost ubiquitous functions is RAID (Redundant Array of Independent Disks). Another, already belonging to mid- and high-end systems, is virtualization. It provides such features as instant copy or remote backup, as well as other rather sophisticated algorithms.
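The RAID function mentioned above can be sketched with the XOR parity scheme used by RAID 5. This is a toy in-memory model of the redundancy idea, not a real array implementation:

```python
from functools import reduce

def parity(blocks):
    """XOR parity across equal-length data blocks, as in RAID 5."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Reconstruct a lost block from the survivors plus the parity block."""
    return parity(surviving_blocks + [parity_block])

d0, d1, d2 = b"\x0f\xf0", b"\xaa\x55", b"\x12\x34"
p = parity([d0, d1, d2])
assert rebuild([d0, d2], p) == d1  # the failed "disk" d1 is recovered
```

A real controller does this per stripe in firmware, but the invariant is the same: XOR of all data blocks plus parity is zero, so any one lost block can be recomputed.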

Briefly about SAS, NAS, SAN

As part of the consideration of autonomous data storage systems, it is imperative to dwell on how host systems access drives. This largely determines the scope of their use and internal architecture.

There are three main options for organizing access to drives:

  • SAS (Server Attached Storage) - a drive connected to the server [the second name is DAS (Direct Attached Storage) - a directly attached drive];
  • NAS (Network Attached Storage) - a storage device connected to a network;
  • SAN (Storage Area Network) is a storage area network.

We have already written about SAS/DAS, NAS, and SAN technologies in an article dedicated to SANs; anyone interested in this information can refer to the iXBT pages. Still, let us refresh the material a little, with an emphasis on practical use.

SAS / DAS is a fairly simple, traditional connection method that implies a direct (hence DAS) connection of the storage system to one or more host systems through a high-speed channel interface. Often such systems use the same interface to connect the drive to the host that is used to access the host's internal disks, which generally provides high performance and easy connection.

The SAS system can be recommended for use if there is a need for high-speed processing of large data on one or several host systems. This, for example, can be a file server, graphics station, or a failover cluster system consisting of two nodes.



Figure 3. Clustered system with shared storage

NAS is a drive connected to the network that provides file-level (note: file, not block) access to data for host systems on the LAN/WAN. Clients working with a NAS usually use the NFS (Network File System) or CIFS (Common Internet File System) protocols to access data. The NAS interprets the commands of the file protocols and issues requests to the disk drives in accordance with the channel protocol it uses. In fact, the NAS architecture is an evolution of the file server. The main advantage of such a solution is the speed of deployment and the quality of file access, owing to its specialization and narrow focus.

Based on the foregoing, NAS can be recommended when you need network access to files and the important factors are simplicity of the solution (usually a kind of quality guarantee) and ease of maintenance and installation. A great example is a NAS used as a file server in a small company office, where ease of installation and administration matters. At the same time, if files must be accessed from a large number of host systems, a powerful NAS drive, thanks to its specialized design, can sustain intensive traffic with a huge pool of servers and workstations at a fairly low cost for the communication infrastructure (for example, Gigabit Ethernet switches and copper twisted pair).

SAN - data storage network. SANs typically use block data access, although storage networks can be connected to devices that provide file services, such as NAS. In modern implementations of storage networks, the Fiber Channel protocol is most often used, but in general this is not required, and therefore, it is customary to allocate a separate class of Fiber Channel SANs (storage area networks based on Fiber Channel).

The SAN is based on a network separate from the LAN / WAN, which serves to organize access to data from servers and workstations directly involved in processing. This structure makes it relatively easy to build high availability, high demand systems. While SANs remain expensive today, the TCO (Total Cost of Ownership) for medium to large systems built using SAN technology is quite low. For a description of ways to reduce the TCO of enterprise storage with SANs, see the techTarget resource pages: http://searchstorage.techtarget.com.

Today, the cost of disk drives supporting Fiber Channel, as the most common interface for building SANs, is close to the cost of systems with traditional low-cost channel interfaces (such as parallel SCSI). The main cost components in the SAN remain the communication infrastructure, as well as the cost of its deployment and maintenance. In this connection, within the framework of SNIA and many commercial organizations, active work is underway on IP Storage technologies, which allows using much more inexpensive equipment and infrastructure for IP networks, as well as the colossal experience of specialists in this area.

There are many examples of effective use of SAN. A SAN can be used almost everywhere where there is a need to use multiple servers with shared storage. For example, for organizing teamwork on video data or pre-processing of printed products. In such a network, each participant in the digital content processing process gets the opportunity to work almost simultaneously on Terabytes of data. Or, for example, organizing backups of large amounts of data that are used by many servers. When building a SAN and using a LAN / WAN-independent data backup algorithm and “snapshot” technologies, you can back up almost any amount of information without compromising the functionality and performance of the entire information complex.

Fiber Channel in SAN

It is an undeniable fact that today it is FC (Fiber Channel) that dominates storage networks. And it was the development of this interface that led to the development of the SAN concept itself.

Experts with significant experience in developing both channel and network interfaces took part in designing FC, and they managed to combine all the important positive features of both directions. Along with its speed parameters (which, incidentally, are not always the main ones for SAN users and can be achieved with other technologies), one of the most important advantages of Fiber Channel is the ability to work over long distances and its topological flexibility, which came to the new standard from network technologies. Thus, the concept of building a storage network topology rests on the same principles as traditional local area networks, based on hubs, switches, and routers, which greatly simplifies the construction of multi-node system configurations, including ones without a single point of failure.

It is also worth noting that Fiber Channel uses both fiber and copper media for data transmission. When organizing access to geographically remote sites at a distance of up to 10 kilometers, standard equipment and single-mode fiber are used for signal transmission. If the nodes are separated by 10 or even 100 kilometers, special amplifiers are used. When building such SANs, parameters that are rather unconventional for data storage systems are taken into account, for example, the speed of signal propagation in fiber.

Storage Trends

The storage world is extremely diverse. The capabilities of data storage systems and the cost of solutions are quite differentiated. There are solutions that combine the capabilities of serving hundreds of thousands of requests per second to tens and even hundreds of Terabytes of data, as well as solutions for one computer with inexpensive IDE disks.

IDE RAID

Recently, the maximum capacity of IDE disks has grown tremendously and now exceeds that of SCSI disks by about a factor of two, while in price per unit of capacity IDE disks lead by more than a factor of six. This, unfortunately, has not improved the reliability of IDE disks, but the scope of their use in stand-alone data storage systems is nevertheless growing inexorably. The main factor in this process is that the demand for large amounts of data is growing faster than the capacity of individual disks.

A few years ago, few manufacturers dared to release stand-alone subsystems built around IDE disks. Today they are produced by virtually every manufacturer focused on the entry-level market. IDE disks are most common among stand-alone subsystems in entry-level NAS systems: if a NAS serves as a file server with a Fast Ethernet or even Gigabit Ethernet interface, in most cases the performance of such disks is more than sufficient, and their lower reliability is compensated by RAID technology.

Where block access to data is needed at the lowest price per unit of stored information, systems with IDE disks inside and an external SCSI interface are actively used today. For example, with the JetStor IDE system produced by the American company AC&NC, in a fault-tolerant archive with 10 Terabytes of storage and fast block access to data, the cost of storing one megabyte comes to less than 0.3 cents.
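The quoted per-megabyte figure implies a rough total system cost. The calculation below is purely illustrative; the actual JetStor price is not given in the text, and 0.3 cents/MB is stated only as an upper bound:

```python
capacity_tb = 10                 # archive size quoted in the text
cost_per_mb_cents = 0.3          # quoted upper bound on cost per megabyte

capacity_mb = capacity_tb * 1024 * 1024        # 10 TB expressed in MB
total_usd = capacity_mb * cost_per_mb_cents / 100
# Roughly $31,457 for the whole 10 TB archive at that per-MB price.
```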

Another interesting and rather original technology that we encountered quite recently is the Raidsonic SR-2000 system with an external parallel IDE interface.


Figure 4. Entry-level standalone IDE RAID

It is a stand-alone disk system designed for two IDE disks and intended for mounting inside the host system's enclosure. It is completely independent of the operating system on the host machine. The system lets you organize a RAID 1 (mirror) or simply copy data from one disk to the other, with hot-swappable disks and without any damage or inconvenience to the computer user, which cannot be said of bus-oriented subsystems built on PCI IDE RAID controllers.

It should be noted that leading manufacturers of IDE drives have announced the release of mid-range drives with Serial ATA interface, which will use high-level technologies. This should positively affect their reliability and increase the share of ATA solutions in data storage systems.

What Serial ATA will bring us

The first and most pleasant thing you will find in Serial ATA is the cable. Because the ATA interface became serial, the cable became round and the connector narrow. If you have ever had to route parallel IDE cables across eight IDE channels in one system, I am sure you will love this feature. Of course, round IDE cables have existed for a long time, but their connectors remained wide and flat, and the maximum allowed length of a parallel ATA cable is not encouraging. When building systems with a large number of disks, a standard cable does not help much anyway, since the cables have to be made by hand, and laying them becomes almost the most time-consuming task of assembly.

Besides the cabling, Serial ATA has other innovations that cannot be retrofitted to the parallel version of the interface with a utility knife or other handy tool. Disks with the new interface should soon support the Native Command Queuing instruction set. With Native Command Queuing, the Serial ATA controller analyzes I/O requests and optimizes their order of execution to minimize seek time. The similarity between Serial ATA Native Command Queuing and the command queuing of SCSI is obvious; however, Serial ATA will support up to 32 commands rather than the 256 traditional for SCSI. Native support for hot-swapping devices has also appeared. Such a possibility existed before, of course, but its implementation lay outside the standard and so could not be used widely. As for the new speed capabilities of Serial ATA, there is no great joy in them yet; what matters is that there is a good roadmap for the future, which would be very difficult to realize within parallel ATA.
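The seek-time optimization that Native Command Queuing performs can be approximated in a few lines. The greedy nearest-first policy below is a simplified stand-in for the real reordering logic in drive firmware:

```python
def reorder_nearest_first(head, requested_lbas):
    """Greedy nearest-first reordering of queued requests: a simplified
    stand-in for the seek-minimizing logic NCQ performs in drive firmware."""
    pending, order = list(requested_lbas), []
    while pending:
        nxt = min(pending, key=lambda lba: abs(lba - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

# NCQ caps the queue at 32 commands; four are enough to show the idea.
assert reorder_nearest_first(100, [500, 120, 90, 480]) == [90, 120, 480, 500]
```

Arrival order would have swept the head back and forth across the platter; the reordered schedule visits the two nearby blocks first and then the two distant ones.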

Given the above, there is no doubt that the share of ATA solutions in entry-level storage systems should increase precisely due to the new Serial ATA drives and storage systems focused on the use of such devices.

Where Parallel SCSI Goes

Anyone who works with storage systems, even entry-level ones, can hardly say that they like systems with IDE disks. The main advantages of ATA drives are their low price compared to SCSI devices and, probably, a lower noise level. And this is for a simple reason: the SCSI interface is better suited for use in storage systems and, while still much cheaper than the even more functional Fiber Channel interface, delivers drives that are of better quality, more reliable, and faster than cheap IDE drives.

Many manufacturers today use Ultra 320 SCSI, the newest interface in the family, to design parallel SCSI storage systems. Many roadmaps once included plans to release devices with Ultra 640 and even Ultra 1280 SCSI interfaces, but it became clear that something in the interface had to change radically. Even at the Ultra 320 stage, parallel SCSI no longer suits many people, mainly because of the inconvenience of the classic cables.

Fortunately, a new Serial Attached SCSI (SAS) interface has recently been introduced. The new standard has interesting features: it combines some capabilities of Serial ATA and of Fiber Channel. Despite the apparent oddity, there is common sense in such an interweaving. The standard grew out of the physical and electrical specifications of Serial ATA, with improvements such as raised signal levels to allow correspondingly longer cables and increased maximum device addressability. Most interestingly, the technologists promise compatibility between Serial ATA and SAS devices, though only in the next versions of the standards.

The most important features of SAS include:

  • point-to-point interface;
  • two-channel interface;
  • support for 4096 devices in the domain;
  • standard set of SCSI commands;
  • cable up to 10 meters long;
  • 4-wire cable;
  • full duplex.

Because the new interface uses the same miniature connector as Serial ATA, developers have a new opportunity to build more compact, high-performance devices. The SAS standard also provides for expanders: each expander supports addressing of 64 devices, with the ability to cascade up to 4096 devices within a domain. This is, of course, far less than Fiber Channel allows, but for entry-level and mid-range storage with drives directly attached to the server it is sufficient.
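The expander arithmetic above is easy to check; the helper function here is our own illustration, not part of any standard:

```python
DEVICES_PER_EXPANDER = 64   # per-expander addressing quoted above
DOMAIN_LIMIT = 4096         # maximum devices in one SAS domain

# Two levels of cascaded expanders reach the domain limit exactly:
assert DEVICES_PER_EXPANDER ** 2 == DOMAIN_LIMIT

def expanders_needed(devices):
    """Rough number of edge expanders to attach a given device count
    (an illustrative helper, not part of any standard)."""
    return -(-devices // DEVICES_PER_EXPANDER)  # ceiling division

assert expanders_needed(200) == 4
```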

For all its delights, Serial Attached SCSI is unlikely to replace the conventional parallel interface quickly. In the enterprise world, development tends to be more rigorous and therefore takes longer than desktop development, and old technologies do not disappear quickly, since the period over which they pay for themselves is also rather long. Still, devices with the SAS interface should enter the market in 2004. Naturally, at first these will mainly be disks and PCI controllers, but in a year or so data storage systems will catch up.

For a better generalization of information, we suggest that you familiarize yourself with a comparison of modern and new interfaces for data storage systems in the form of a table.

1 - The standard regulates distances of up to 10 km for single-mode fiber; there are implementations of devices that transmit data over distances of more than 10^5 m (over 100 km).

2 - Hubs and some FC switches operate within the internal virtual ring topology, there are also many switch implementations that provide point-to-point connectivity to any devices connected to them.

3 - There are implementations of devices with the SCSI, FICON, ESCON, TCP/IP, HIPPI, and VI protocols.

4 - The devices are expected to be mutually compatible (manufacturers promise to deliver this in the near future): SATA controllers will support SAS drives, and SAS controllers will support SATA drives.

Mass NAS craze

Recently, there has been a massive enthusiasm for NAS drives abroad. With the growing relevance of a data-oriented approach to building information systems, the attractiveness of specializing the classic file server grew, and a new marketing unit formed: the NAS. At the same time, experience in building such systems was sufficient for a quick start for network-attached storage technology, and the cost of its hardware implementation was extremely low. Today NAS drives are produced by virtually all storage manufacturers: entry-level systems for very little money, mid-range systems, and systems responsible for storing tens of Terabytes of information and capable of processing a huge number of requests. Each class of NAS systems has its own interesting, original solutions.

PC based NAS in 30 minutes

We want to briefly describe one original entry-level solution. One can argue about the practical value of its implementation, but its originality cannot be denied.

Basically, an entry-level NAS (and not only an entry-level one) is a fairly simple personal computer with a certain number of disks and software that gives other members of the network file-level access to the data. Thus, to build a NAS device it is enough to take these components and connect them together. The whole point is how well you do it: the reliability and quality of access received by the workgroup depend entirely on that. It is with these factors in mind, along with deployment time and a bit of design research, that an entry-level NAS drive is built.

The difference between a good entry-level NAS solution and a self-assembled, hand-configured file server on the OS of your choice (again leaving design aside) comes down to:

  • how quickly you will do it;
  • how easy this system can be maintained by unqualified personnel;
  • how well this solution will work and be supported.

In other words, with a professional selection of components and a certain pre-configured software set, a good result can be achieved. The truth seems banal; the same can be said of any task solved from ready-made components: "hardware" plus "software".

What does Company X propose? A rather limited list of compatible components is defined: motherboards with everything integrated that an entry-level NAS server requires, plus hard drives. You buy a FLASH disk with the recorded software, install it in the IDE connector on the motherboard, and get a ready-made NAS drive. On boot, the operating system and utilities written to this disk configure the necessary modules appropriately. As a result, the user gets a device that can be managed both locally and remotely via an HTML interface and that provides access to the disk drives connected to it.

File protocols in modern NAS

CIFS (Common Internet File System) is a standard protocol that provides access to files and services on remote computers (including the Internet). The protocol uses a client-server interaction model. The client creates a request to the server to access files or send a message to a program that resides on the server. The server fulfills the client's request and returns the result of its work. CIFS is an open standard that arose on the basis of the Server Message Block Protocol (SMB) developed by Microsoft, but, unlike the latter, CIFS takes into account the possibility of long timeouts, as it is focused on use also in distributed networks. The SMB protocol has traditionally been used on Windows LANs for file access and printing. CIFS uses TCP / IP protocol to transport data. CIFS provides functionality similar to FTP (File Transfer Protocol), but provides clients with improved (direct-like) control over files. It also allows you to share access to files between clients, using blocking and automatic restoration of communication with the server in the event of a network failure.
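The client-server model that CIFS follows can be sketched in-process. The class and field names below are invented for illustration, and the direct method call stands in for the network transport; real CIFS exchanges SMB messages over TCP/IP:

```python
class FileServer:
    """Holds the files and answers requests, like the server side of CIFS."""
    def __init__(self, files):
        self.files = files

    def handle(self, request):
        if request["op"] == "read":
            data = self.files.get(request["path"])
            if data is None:
                return {"status": "not_found"}
            return {"status": "ok", "data": data}
        return {"status": "unsupported"}

class Client:
    """Builds a request and interprets the reply, like a CIFS client."""
    def __init__(self, server):
        self.server = server  # stands in for the TCP/IP transport

    def read(self, path):
        reply = self.server.handle({"op": "read", "path": path})
        if reply["status"] != "ok":
            raise FileNotFoundError(path)
        return reply["data"]

server = FileServer({"/share/report.txt": b"quarterly numbers"})
assert Client(server).read("/share/report.txt") == b"quarterly numbers"
```

The protocol's extra machinery (locking, reconnection after network failures, long timeouts) layers on top of exactly this request/response cycle.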

NFS (Network File System) is an IETF standard that includes a distributed file system and a network protocol. NFS was developed by Sun Microsystems. Originally it was used only on UNIX systems; later, client and server implementations became widespread on other systems as well.

NFS, like CIFS, uses a client-server communication model. It provides access to files on a remote computer (server) for writing and reading as if they were on the user's computer. Earlier versions of NFS used the UDP protocol to transport data, while modern versions use TCP / IP. For the operation of NFS on the Internet, Sun has developed the WebNFS protocol, which uses extensions to the functionality of NFS for its correct operation on the worldwide network.

DAFS (Direct Access File System) is a standard file access protocol based on NFSv4. It allows applications to transfer data directly to transport resources, bypassing the operating system and its buffers, while retaining the semantics inherent in file systems. DAFS takes advantage of the latest memory-to-memory data transfer technologies. Its use provides high file I/O speeds and minimal CPU and system load, thanks to a significant reduction in the number of operations and interrupts usually required when processing network protocols. The use of hardware support for VI (Virtual Interface) is especially effective.

DAFS has been designed for use in a clustered and server environment for databases and a variety of end-to-end Internet applications. It provides the lowest latency in accessing file shares and data, and also supports intelligent system and data recovery mechanisms, which makes it very attractive for use in high-end NAS-drives.

All roads lead to IP Storage

There are many exciting new technologies that have emerged in high- and mid-range storage systems over the past few years.

Fiber Channel SANs are a well-known and popular technology today. At the same time, their mass adoption is hampered by a number of features, including the high cost of implementation and the complexity of building geographically distributed systems. On the one hand, these are simply the features of an enterprise-level technology; on the other, if SANs become cheaper and distributed systems easier to build, it should give a colossal boost to the development of storage networks.

As part of the work on networked data storage technologies, the Internet Engineering Task Force (IETF) created a working group and an IP Storage (IPS) forum covering the following areas:

FCIP - Fiber Channel over TCP/IP, a TCP/IP-based tunneling protocol that connects geographically distant FC SANs without affecting the FC and IP protocols themselves.

iFCP - Internet Fiber Channel Protocol, a TCP/IP-based protocol for connecting FC storage systems or FC storage networks using an IP infrastructure together with, or instead of, FC switching and routing elements.

iSNS - Internet Storage Name Service, a protocol for registering and discovering the names of storage devices on IP networks.

iSCSI - Internet Small Computer Systems Interface, a TCP/IP-based protocol for communicating with and managing storage systems, servers and clients (SNIA definition, IP Storage Forum).

The most rapidly developing and most interesting of the listed areas is iSCSI.
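Since iSCSI runs over plain TCP/IP (targets listen on the IANA-registered port 3260), even a generic socket check can tell whether a target portal is reachable. A minimal sketch in Python; the helper name `target_reachable` is illustrative, not part of any iSCSI tooling:

```python
import socket

ISCSI_PORT = 3260  # IANA-registered port for iSCSI targets


def target_reachable(host: str, port: int = ISCSI_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This only verifies IP-level reachability; actual iSCSI login and discovery are handled by the initiator software (drivers) mentioned below.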

iSCSI is the new standard

On February 11, 2003, iSCSI became an official standard. Ratification is bound to broaden interest in a standard that is already developing quite actively. The rapid development of iSCSI should spur the spread of SANs in small and medium-sized businesses, since using standard equipment and services (including those common in ordinary Ethernet networks) makes SANs much cheaper. As for the use of iSCSI over the Internet, FCIP has already taken root there, and competing with it will be difficult.

Well-known IT companies have willingly supported the new standard. There are, of course, opponents, but almost all companies active in the entry- and mid-level systems market are already working on devices with iSCSI support. iSCSI drivers are already included in Windows and Linux; iSCSI storage systems are produced by IBM and adapters by Intel, while HP, Dell and EMC promise to join in mastering the new standard in the near future.

One of the most interesting features of iSCSI is that data can travel to an iSCSI drive not only over the media, switches and routers of existing LAN/WAN networks, but also through ordinary Fast Ethernet or Gigabit Ethernet network adapters on the client side. However, this places significant overhead on the processing power of the host that uses such an adapter. According to the developers, a software implementation of iSCSI can reach the speed of a Gigabit Ethernet link only at significant, up to 100%, load on modern CPUs. It is therefore recommended to use special network cards with mechanisms that offload TCP-stack processing from the CPU.
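The CPU cost of software protocol processing is easy to feel even without an iSCSI stack: iSCSI optionally protects headers and data with CRC32C digests, and computing any per-byte checksum at gigabit rates consumes a measurable share of a CPU core. A rough Python sketch of such a measurement (zlib's CRC32 stands in for CRC32C here, and the function name is illustrative):

```python
import time
import zlib


def checksum_throughput_mb_s(buf_mb: int = 8, rounds: int = 25) -> float:
    """Measure how many MB/s of data this CPU can checksum in software.

    iSCSI may protect each PDU with a CRC32C digest; zlib.crc32 is used
    here only as a stand-in to illustrate the per-byte CPU cost that
    TCP/iSCSI offload engines are meant to remove from the host.
    """
    buf = b"\x00" * (buf_mb * 1024 * 1024)
    start = time.perf_counter()
    for _ in range(rounds):
        zlib.crc32(buf)
    elapsed = time.perf_counter() - start
    return (buf_mb * rounds) / elapsed
```

Comparing the result against the ~119 MB/s payload rate of Gigabit Ethernet gives a feel for how much of one core a purely software data path can consume.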

SAN virtualization

Virtualization is another important technology in building modern storage systems and storage networks.

Storage virtualization is the presentation of physical resources in a logical, more convenient form. This technology allows resources to be allocated flexibly among users and managed efficiently. Within virtualization, remote copying, snapshots, distribution of I/O requests to the most suitable storage devices, and many other algorithms are successfully implemented. Virtualization algorithms can be implemented by the drive itself, by external virtualization devices, or by control servers running specialized software on standard operating systems.
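A snapshot, for instance, need not copy the whole volume: a copy-on-write layer preserves only the blocks that change after the snapshot is taken. A toy Python sketch of the idea (class and method names are illustrative, not any vendor's API):

```python
class SnapshotVolume:
    """Toy block volume with copy-on-write snapshots.

    Each snapshot stores only the old contents of blocks overwritten
    after it was taken, mimicking how storage virtualization layers
    provide point-in-time copies without duplicating the whole volume.
    """

    def __init__(self, nblocks: int):
        self.blocks = [b""] * nblocks
        self.snapshots = []  # each: dict of block index -> preserved content

    def take_snapshot(self) -> int:
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, idx: int, data: bytes) -> None:
        # Copy-on-write: preserve the current block in every snapshot
        # that has not captured it yet, then overwrite in place.
        for snap in self.snapshots:
            if idx not in snap:
                snap[idx] = self.blocks[idx]
        self.blocks[idx] = data

    def read_snapshot(self, snap_id: int, idx: int) -> bytes:
        # A block absent from the snapshot was never overwritten,
        # so its current contents are still the point-in-time value.
        snap = self.snapshots[snap_id]
        return snap[idx] if idx in snap else self.blocks[idx]
```

Real arrays apply the same principle at the level of physical extents, which is why taking a snapshot is nearly instant while writes after it pay a small copy penalty.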

This, of course, is a very small part of what can be said about virtualization. This topic is very interesting and extensive, so we decided to devote a separate publication to it.

If servers are universal devices that in most cases perform
- either the function of an application server (when special programs run on the server and intensive computation takes place),
- or the function of a file server (a place for centralized storage of data files),

then DSS (Data Storage Systems) are devices designed specifically for the server function of data storage.

The need to purchase storage systems
usually arises in fairly mature enterprises, i.e. those that think about how to
- store and manage information, the company's most valuable asset
- ensure business continuity and protection against data loss
- increase the adaptability of the IT infrastructure

Storage and virtualization
Competition forces SMEs to work more efficiently and without downtime. Production models, tariff plans and types of services change ever more often. The entire business of modern companies is tied to information technology: business needs change quickly and are instantly reflected in IT, so the requirements for the reliability and adaptability of the IT infrastructure keep growing. Virtualization provides these capabilities, but it requires low-cost, easy-to-maintain storage systems.

Storage classification by connection type

DAS. The first disk arrays were connected to servers via SCSI, and one server could work with only one disk array. This is the Direct Attached Storage (DAS) connection.

NAS. For a more flexible data-center structure - so that each user can use any storage system - the storage must be connected to the local network. This is NAS (Network Attached Storage). But data exchange between server and storage is many times more intensive than between client and server, so this option ran into objective difficulties with Ethernet bandwidth. And from a security standpoint, exposing storage systems to a shared network is not entirely correct.

SAN. Alternatively, a separate high-speed network can be created between servers and storage systems. This network is called a SAN (Storage Area Network). High speed is ensured by an optical physical transmission medium: special adapters (HBAs) and optical FC switches provide data transfer at 4 and 8 Gbit/s. The reliability of such a network is increased by redundancy (duplication) of channels, adapters and switches. The main disadvantage is the high price.
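The redundancy idea can be sketched in a few lines: keep a list of duplicated paths (HBA/switch pairs) and fail over to the next healthy one when a path goes down. A toy Python model (class and path names are illustrative, not a real multipath driver):

```python
class MultipathIO:
    """Toy model of SAN path redundancy.

    Duplicated paths (e.g. two HBAs through two switches) are tried in
    order; I/O fails over to the next healthy path when one goes down.
    """

    def __init__(self, paths):
        self.paths = list(paths)  # e.g. ["hba0->switchA", "hba1->switchB"]
        self.healthy = {p: True for p in self.paths}

    def fail(self, path):
        self.healthy[path] = False

    def restore(self, path):
        self.healthy[path] = True

    def pick_path(self):
        """Return the first healthy path; raise if every path is down."""
        for p in self.paths:
            if self.healthy[p]:
                return p
        raise RuntimeError("all paths failed")
```

Real multipath software (in the OS or HBA driver) additionally balances load across healthy paths, but the failover logic is essentially this.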

iSCSI. With the advent of inexpensive 1 Gbit/s and 10 Gbit/s Ethernet technologies, 4 Gbit/s optics no longer look so attractive, especially considering the price. Therefore the iSCSI (Internet Small Computer System Interface) protocol is increasingly used as the SAN transport. An iSCSI SAN can be built on any reasonably fast physical foundation that supports IP.

Classification of Data Storage Systems by application:

personal

Most often this is a regular 3.5", 2.5" or 1.8" hard drive placed in a special case and equipped with USB, FireWire 1394, Ethernet and/or eSATA interfaces.
The result is a portable device that can be connected to a computer or server and act as an external drive. Sometimes, for convenience, wireless access, printer ports and USB ports are added to the device.

small workgroup

Usually this is a stationary or portable device with an Ethernet interface, in which several (most often 2 to 5) SATA hard drives can be installed, hot-swappable or not. The disks can be organized into RAID arrays of various levels to achieve high storage reliability and access speed. The storage system runs a specialized OS, usually Linux-based, and allows access levels to be differentiated by username and password, disk-space quotas to be set, etc.
Such storage systems are suitable for small workgroups as a replacement for file servers.
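The reliability gain of RAID comes from redundancy. In RAID 5, for example, a parity block is the XOR of the data blocks in a stripe, so the contents of any single failed disk can be rebuilt from the survivors. A minimal Python sketch of that parity arithmetic:

```python
def xor_parity(blocks):
    """Compute a RAID-5-style parity block as the XOR of equal-length data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_lost_block(surviving_blocks, parity):
    """Recover one lost data block by XOR-ing the parity with the surviving blocks."""
    # XOR is its own inverse: data1 ^ data2 ^ ... ^ parity == lost block.
    return xor_parity(list(surviving_blocks) + [parity])
```

This is why a RAID 5 array of n disks offers the capacity of n-1 disks and survives the loss of any one of them.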

workgroup

Typically 19 "rack-mountable device that can accommodate 12-24 HotSwap SATA or SAS hard drives. Has external Ethernet, and / or iSCSI interface. Drives are organized in arrays - RAID to achieve high reliability of storage and speed of access.Storage system comes with specialized software that allows you to differentiate the level of access, organize quotas for disk space, organize BackUp (information backup), etc.
Such storage systems are suitable for medium and large enterprises, and are used in conjunction with one or more servers.

enterprise

A stationary or 19" rack-mount device that can hold up to hundreds of hard drives.
In addition to the capabilities of the previous class, these storage systems can offer expansion, upgrade and replacement of components without stopping the system, as well as a monitoring subsystem. The software can support snapshots and other advanced functions.
These storage systems are suitable for large enterprises and provide increased reliability, speed and protection of critical data.

high-end enterprise

In addition to the capabilities of the previous class, such storage systems can support thousands of hard drives.
They occupy several 19" cabinets, with a total weight reaching several tons.
These storage systems are designed for non-stop operation with the highest degree of reliability, storing strategically important data at the state or corporation level.

History of the issue.

The first servers combined all functions in one case: both computing (application server) and data storage (file server). But as applications' demand for computing power grew on the one hand, and the amount of processed data grew on the other, it became simply inconvenient to house everything in one case. It turned out to be more efficient to move the disk arrays into separate enclosures, which raised the question of connecting the disk array to the server.

The first disk arrays were connected to servers via SCSI, but in that case one server could work with only one disk array. People wanted a more flexible data-center structure, so that any server could use any storage system. Connecting all devices directly to the local network and exchanging data over Ethernet is, of course, a simple and universal solution; but data exchange between servers and storage systems is many times more intensive than between clients and servers, so this option (NAS, see above) ran into objective difficulties with Ethernet bandwidth.

The idea then arose of creating a separate high-speed network between servers and storage systems. This network was called a SAN (see above). It is similar to Ethernet, except that the physical transmission medium is optics; adapters (HBAs) are installed in the servers, connected through optical switches. The standard optical data-transfer rate is 4 Gbit/s. With the advent of 1 Gbit/s and 10 Gbit/s Ethernet technologies, as well as the iSCSI protocol, Ethernet is increasingly used as the SAN medium.