Failure rate of an element (reference). Reliability and survivability of onboard computing systems (BCVS). Analysis of the reliability of systems with multiple failures

1.1 Probability of failure-free operation

The probability of failure-free operation is the probability that under certain operating conditions, within a given operating time, not a single failure will occur.
The probability of failure-free operation is denoted P(l) and is determined by formula (1.1):

P(l) = (N0 - r(l)) / N0, (1.1)

where N0 is the number of elements at the beginning of the test and r(l) is the number of elements that have failed by operating time l. Note that the larger the value of N0, the more accurately the probability P(l) can be calculated.
At the beginning of operation of a serviceable locomotive P(0) = 1, since at mileage l = 0 the probability that none of the elements fails takes its maximum value of 1. As the mileage l increases, the probability P(l) decreases, and as the operating time tends to an infinitely large value the probability of failure-free operation tends to zero: P(l→∞) = 0. Thus, over the operating time the probability of failure-free operation varies from 1 to 0. The nature of this change as a function of mileage is shown in Fig. 1.1.

Figure 1.1. Graph of the probability of failure-free operation P(l) as a function of operating time

Two factors account for the main advantages of this indicator in calculations: first, the probability of failure-free operation covers all the factors affecting the reliability of the elements, making it easy to judge reliability, since the larger the value of P(l), the higher the reliability; second, the probability of failure-free operation can be used in calculating the reliability of complex systems consisting of more than one element.

1.2 Probability of failure

The probability of failure is the probability that under certain operating conditions, within the limits of a given operating time, at least one failure will occur.
The probability of failure is denoted Q(l) and is determined by formula (1.2):

Q(l) = r(l) / N0 = 1 - P(l). (1.2)

At the beginning of operation of a serviceable locomotive Q(0) = 0, since at mileage l = 0 the probability that at least one element fails takes its minimum value of 0. As the mileage l increases, the probability of failure Q(l) grows, and as the operating time tends to an infinitely large value it tends to unity: Q(l→∞) = 1. Thus, over the operating time the probability of failure varies from 0 to 1. The nature of this change as a function of mileage is shown in Fig. 1.2. The probability of failure-free operation and the probability of failure are opposite and incompatible events.

Figure 1.2. Graph of the probability of failure Q(l) as a function of operating time

1.3 Failure frequency

The failure frequency is the ratio of the number of elements that failed per unit of time or mileage to the initial number of elements under test. In other words, the failure frequency characterizes the rate at which the probability of failure and the probability of failure-free operation change as the operating time grows.
The failure frequency is denoted a(l) and is determined by formula (1.3):

a(l) = Δr(l) / (N0 · Δl), (1.3)

where Δr(l) is the number of elements that failed during the mileage interval Δl.
This indicator makes it possible to judge how many elements will fail over a given period of time or mileage, and from its value one can calculate the number of required spare parts.
The nature of the change in the failure frequency as a function of mileage is shown in Fig. 1.3.


Figure 1.3. Graph of the change in the failure frequency depending on the operating time

1.4 Failure rate

The failure rate is the conditional density of failure of an object, determined for the considered moment of time or operating time, provided that no failure has occurred before this moment. In other words, the failure rate is the ratio of the number of elements that failed per unit of time or mileage to the number of elements still operating correctly in the given interval.
The failure rate is denoted λ(l) and is determined by formula (1.4):

λ(l) = Δr(l) / (Nav · Δl), (1.4)

where Nav = (Ni + Ni+1) / 2 is the average number of operable elements in the interval Δl, and Ni and Ni+1 are the numbers of operable elements at the beginning and at the end of the mileage interval Δl.

Typically, the failure rate is a non-decreasing function of time. Failure rates are commonly used to assess the propensity of objects to fail at various points of their operation.
Fig. 1.4 shows the theoretical nature of the change in the failure rate as a function of mileage.

Figure 1.4. Graph of the change in the failure rate depending on the operating time

On the graph of the failure rate shown in Fig. 1.4, three main stages can be distinguished, reflecting the life of an element or of the object as a whole.
The first stage, also called the running-in stage, is characterized by an increase in the failure rate during the initial period of operation. The cause of the increased failure rate at this stage is latent manufacturing defects.
The second stage, or the period of normal operation, is characterized by the failure rate tending to a constant value. During this period random failures may occur, caused by sudden load concentrations exceeding the ultimate strength of the element.
The third stage is the so-called period of forced aging. It is characterized by wear-out failures. Further operation of the element without replacement becomes economically unreasonable.

1.5 Mean time to failure

Mean time to failure is the mathematical expectation of an element's operating time to its first failure.
Mean time to failure is denoted L1 and is determined by formula (1.5):

L1 = (Σ li) / r, (1.5)

where li is the time to failure of the i-th element and r is the number of failures.
Mean time to failure can be used to plan in advance when an element should be repaired or replaced.

1.6 Average value of the failure flow parameter

The average value of the failure flow parameter characterizes the average probability density of an object's failure, determined for the moment in time under consideration.
The average value of the failure flow parameter is denoted Wav and is determined by formula (1.6):

Wav = r(l) / (N0 · l). (1.6)
1.7 An example of calculating reliability indicators

Initial data.
During the run from 0 to 600 thousand km, information on traction electric motor (TED) failures was collected at the locomotive depot. The number of serviceable traction motors at the beginning of the observation period was N0 = 180. The total number of failed traction motors over the analyzed period was Σr(600000) = 60. The mileage interval Δl is taken equal to 100 thousand km. The numbers of failed TEDs per interval were: 2, 12, 16, 10, 14, 6.

Required.
It is necessary to calculate the reliability indicators and plot their dependence on operating time.

First, fill in the source data table as shown in Table 1.1.

Table 1.1. Initial data for calculation

Δl, thousand km    0-100   100-200   200-300   300-400   400-500   500-600
Δr(l), pcs         2       12        16        10        14        6
r(l), pcs          2       14        30        40        54        60

First, using equation (1.1), we determine the probability of failure-free operation for each mileage section. For the sections from 0 to 100 and from 100 to 200 thousand km:

P(100000) = (180 - 2) / 180 = 0.989; P(200000) = (180 - 14) / 180 = 0.922.
Next we calculate the failure frequency according to equation (1.3). The failure frequency in the section 0-100 thousand km is:

a(0-100) = 2 / (180 · 100000) = 1.111 · 10^-7 1/km.

In a similar way, the failure frequency for the interval 100-200 thousand km is:

a(100-200) = 12 / (180 · 100000) = 6.667 · 10^-7 1/km.
Using equations (1.5 and 1.6), we determine the mean time to failure and the mean value of the failure flow parameter.

Let us systematize the calculation results in Table 1.2.

Table 1.2. Results of calculating the reliability indicators

l, thousand km        0-100   100-200   200-300   300-400   400-500   500-600
Δr(l), pcs            2       12        16        10        14        6
r(l), pcs             2       14        30        40        54        60
P(l)                  0.989   0.922     0.833     0.778     0.700     0.667
Q(l)                  0.011   0.078     0.167     0.222     0.300     0.333
a(l) · 10^-7, 1/km    1.111   6.667     8.889     5.556     7.778     3.333
λ(l) · 10^-7, 1/km    1.117   6.977     10.127    6.897     10.526    4.878
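Table 1.2 can be reproduced with the short Python sketch below. It follows formulas (1.1)-(1.4) directly; the mean failure flow parameter uses the form Wav = Σr/(N0·l) of (1.6), and the mean time to failure is estimated from interval midpoints, an assumption made here because the individual failure mileages are not given in the source data.

```python
# Reliability indicators for the traction motor example (Section 1.7).
N0 = 180                      # serviceable motors at the start
dl = 100_000                  # mileage interval, km
dr = [2, 12, 16, 10, 14, 6]   # failures in each interval

survivors = [N0]              # operable motors at interval boundaries
for d in dr:
    survivors.append(survivors[-1] - d)

for i, d in enumerate(dr):
    P = survivors[i + 1] / N0                     # formula (1.1)
    Q = 1 - P                                     # formula (1.2)
    a = d / (N0 * dl)                             # failure frequency (1.3)
    n_av = (survivors[i] + survivors[i + 1]) / 2  # average operable count
    lam = d / (n_av * dl)                         # failure rate (1.4)
    print(f"{i * 100}-{(i + 1) * 100} th. km: P={P:.3f} Q={Q:.3f} "
          f"a={a * 1e7:.3f}e-7 lam={lam * 1e7:.3f}e-7")

# Assumed estimates for (1.5) and (1.6): midpoint mean mileage to failure
# and the average failure flow parameter over the whole run.
L1 = sum((i + 0.5) * dl * d for i, d in enumerate(dr)) / sum(dr)
W_av = sum(dr) / (N0 * len(dr) * dl)
print(f"L1 = {L1:.0f} km, W_av = {W_av:.3e} 1/km")
```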

Fig. 1.5 shows the nature of the change in the probability of failure-free operation of the traction motor as a function of mileage. Note that at the first point of the graph, i.e. at zero mileage, the probability of failure-free operation takes its maximum value of 1.

Figure 1.5. Graph of the change in the probability of failure-free operation depending on the operating time

Fig. 1.6 shows the nature of the change in the probability of failure of the traction motor as a function of mileage. Note that at the first point of the graph, i.e. at zero mileage, the probability of failure takes its minimum value of 0.

Figure 1.6. Graph of the change in the probability of failure depending on the operating time

Fig. 1.7 shows the nature of the change in the failure frequency of the traction motor as a function of mileage.

Figure 1.7. Graph of the change in the failure frequency depending on the operating time

Fig. 1.8 shows the dependence of the failure rate on the operating time.

Figure 1.8. Graph of the change in the failure rate depending on the operating time

2.1 Exponential law of distribution of random variables

The exponential law describes quite accurately the reliability of units subject to sudden failures of a random nature. Attempts to apply it to other types of failures, especially gradual failures caused by wear and by changes in the physicochemical properties of elements, have shown its limited applicability.

Initial data.
As a result of testing ten high-pressure fuel pumps, the following operating times to failure were obtained: 400, 440, 500, 600, 670, 700, 800, 1200, 1600, 1800 hours. Assume that the operating time to failure of the fuel pumps obeys the exponential distribution law.

Required.
Estimate the value of the failure rate, as well as calculate the probability of failure-free operation for the first 500 hours and the probability of failure in the interval between 800 and 900 hours of diesel operation.

First, let us determine the mean operating time of the fuel pumps to failure:

T1 = (400 + 440 + 500 + 600 + 670 + 700 + 800 + 1200 + 1600 + 1800) / 10 = 871 h.

Then the failure rate is:

λ = 1 / T1 = 1 / 871 ≈ 1.15 · 10^-3 1/h.

The probability of failure-free operation of the fuel pumps over an operating time of 500 hours is:

P(500) = e^(-λ·500) = e^(-0.574) ≈ 0.563.

The probability of failure between 800 and 900 hours of pump operation is:

Q(800, 900) = P(800) - P(900) = e^(-0.918) - e^(-1.033) ≈ 0.399 - 0.356 = 0.043.
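The same numbers can be checked with a few lines of Python; this is just the arithmetic above under the exponential-law assumption.

```python
import math

# Section 2.1: fuel pump example under the exponential law.
times = [400, 440, 500, 600, 670, 700, 800, 1200, 1600, 1800]  # hours

T1 = sum(times) / len(times)   # mean time to failure: 871 h
lam = 1 / T1                   # failure rate: ~1.15e-3 1/h

P_500 = math.exp(-lam * 500)                             # ~0.563
Q_800_900 = math.exp(-lam * 800) - math.exp(-lam * 900)  # ~0.043
print(T1, lam, P_500, Q_800_900)
```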

2.2 Weibull-Gnedenko distribution law

The Weibull-Gnedenko distribution law is widely used for systems consisting of series-connected elements from the standpoint of system reliability, for example the systems serving a diesel generator set: lubrication, cooling, fuel supply, air supply, etc.

Initial data.
The idle time of diesel locomotives in unscheduled repair due to failures of auxiliary equipment obeys the Weibull-Gnedenko distribution law with parameters b = 2 and a = 46.

Required.
It is necessary to determine the probability that a diesel locomotive leaves unscheduled repair after 24 hours of downtime, and the downtime during which operability is restored with a probability of 0.95.

Let us find the probability of restoring the operability of the locomotive after one day (24 h) of idle time in the depot using the corresponding equation.

To determine the recovery time of the locomotive for a given confidence level, we use the same distribution, solved for time.
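The expressions themselves are not reproduced in the source, so the sketch below assumes the common scale-shape parametrization F(t) = 1 - exp(-(t/a)^b) with a = 46 h and b = 2; if the original used a different parametrization, the numeric answers would differ.

```python
import math

# Section 2.2: Weibull-Gnedenko downtime example (assumed parametrization).
a, b = 46.0, 2.0   # scale (hours) and shape parameters from the problem

# Probability that repair is completed within 24 hours of downtime.
F_24 = 1 - math.exp(-((24 / a) ** b))

# Downtime after which operability is restored with probability 0.95:
# solve F(t) = 0.95 for t.
t_95 = a * (-math.log(1 - 0.95)) ** (1 / b)
print(F_24, t_95)
```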

2.3 Rayleigh distribution law

The Rayleigh distribution law is mainly used to analyze the operation of elements with a pronounced aging effect (electrical equipment, various types of seals, washers, gaskets made of rubber or synthetic materials).

Initial data.
It is known that the time to failure of contactors due to aging of the coil insulation can be described by a Rayleigh distribution with parameter S = 260 thousand km.

Required.
For an operating time of 120 thousand km it is necessary to determine the probability of failure-free operation, the failure rate, and the mean time to the first failure of the electromagnetic contactor coil.
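No worked numbers survive in this example, so here is a sketch under the standard Rayleigh parametrization P(l) = exp(-l²/(2S²)), for which the failure rate is λ(l) = l/S² and the mean time to the first failure is S·√(π/2); this parametrization is an assumption.

```python
import math

# Section 2.3: Rayleigh law for the contactor coil, S = 260 thousand km.
S = 260.0   # distribution parameter, thousand km
l = 120.0   # operating time of interest, thousand km

P = math.exp(-l ** 2 / (2 * S ** 2))  # probability of no failure, ~0.90
lam = l / S ** 2                      # failure rate at l, 1/thousand km
L1 = S * math.sqrt(math.pi / 2)       # mean time to first failure, ~326
print(P, lam, L1)
```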

3.1 Basic connection of elements

A system of several independent elements connected functionally so that the failure of any one of them causes a system failure is represented by a reliability block diagram with series-connected events of failure-free operation of the elements.

Initial data.
A non-redundant system consists of 5 elements. Their failure rates are respectively 0.00007, 0.00005, 0.00004, 0.00006 and 0.00004 1/h.

Required.
It is necessary to determine the system reliability indicators: the failure rate, the mean time to failure, the probability of failure-free operation, and the failure frequency. The indicators P(l) and a(l) should be obtained in the range from 0 to 1000 hours with a step of 100 hours.

Let us calculate the system failure rate and the mean time to failure using the following equations:

λs = Σ λi = 0.00007 + 0.00005 + 0.00004 + 0.00006 + 0.00004 = 0.00026 1/h;
T1 = 1 / λs = 1 / 0.00026 ≈ 3846 h.

The probability of failure-free operation and the failure frequency are obtained from the equations reduced to the form:

P(l) = e^(-λs·l); a(l) = λs · e^(-λs·l).
The calculation results for P(l) and a(l) in the interval from 0 to 1000 hours of operation are presented in Table 3.1.

Table 3.1. Results of calculating the probability of failure-free operation and the failure frequency of the system in the time interval from 0 to 1000 h

l, hour    P(l)       a(l), 1/h
0          1          0.00026
100        0.974355   0.000253
200        0.949329   0.000247
300        0.924964   0.00024
400        0.901225   0.000234
500        0.878095   0.000228
600        0.855559   0.000222
700        0.833601   0.000217
800        0.812207   0.000211
900        0.791362   0.000206
1000       0.771052   0.0002
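Table 3.1 can be reproduced with the sketch below; it implements the series-system equations of this section.

```python
import math

# Section 3.1: non-redundant (series) system of five elements.
lams = [0.00007, 0.00005, 0.00004, 0.00006, 0.00004]  # element IOs, 1/h

lam_s = sum(lams)   # system failure rate: 0.00026 1/h
T1 = 1 / lam_s      # mean time to failure: ~3846 h

for t in range(0, 1001, 100):
    P = math.exp(-lam_s * t)  # probability of failure-free operation
    a = lam_s * P             # failure frequency a(t)
    print(t, round(P, 6), round(a, 6))
```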

Graphs of P(l) and a(l) over the interval up to the mean time to failure are shown in Figs. 3.1 and 3.2.

Figure 3.1. Probability of failure-free operation of the system.

Figure 3.2. Failure frequency of the system.

3.2 Redundant connection of elements

Initial data.
Figs. 3.3 and 3.4 show two structural diagrams of element connection: general redundancy (Fig. 3.3) and element-by-element redundancy (Fig. 3.4). The probabilities of failure-free operation of the elements are respectively P1(l) = P'1(l) = 0.95; P2(l) = P'2(l) = 0.9; P3(l) = P'3(l) = 0.85.

Figure 3.3. Diagram of a system with general redundancy.

Figure 3.4. Diagram of a system with element-by-element redundancy.

The probability of failure-free operation of a block of three elements without redundancy is:

P(l) = P1 · P2 · P3 = 0.95 · 0.9 · 0.85 = 0.727.

The probability of failure-free operation of the same system with general redundancy (Fig. 3.3) is:

Pgen(l) = 1 - (1 - 0.727)^2 = 0.925.

The probabilities of failure-free operation of each of the three units with element-by-element redundancy (Fig. 3.4) are:

1 - (1 - 0.95)^2 = 0.9975; 1 - (1 - 0.9)^2 = 0.99; 1 - (1 - 0.85)^2 = 0.9775.

The probability of failure-free operation of the system with element-by-element redundancy is:

Pel(l) = 0.9975 · 0.99 · 0.9775 = 0.965.
Thus, element-by-element redundancy gives a more significant increase in reliability: the probability of failure-free operation rises from 0.925 to 0.965, i.e. by 4%.
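A minimal sketch comparing the two redundancy schemes; dup() models duplication of a unit (two parallel copies, one of which is sufficient).

```python
# Section 3.2: general vs element-by-element redundancy (duplication).
P = [0.95, 0.90, 0.85]   # element probabilities of failure-free operation

def series(ps):
    """Series connection: all elements must work."""
    out = 1.0
    for p in ps:
        out *= p
    return out

def dup(p):
    """Duplicated unit: fails only if both copies fail."""
    return 1 - (1 - p) ** 2

P_block = series(P)                          # 0.727
P_general = dup(P_block)                     # general redundancy: ~0.925
P_elementwise = series([dup(p) for p in P])  # element-by-element: ~0.965
print(P_block, P_general, P_elementwise)
```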

Initial data.
Fig. 3.5 shows a system with a combined connection of elements. The probabilities of failure-free operation of the elements are: P1 = 0.8; P2 = 0.9; P3 = 0.95; P4 = 0.97.

Required.
It is necessary to determine the reliability of the system, and also the reliability of the same system under the condition that there are no redundant elements.

Figure 3.5. Diagram of a system with a combined connection of elements.

For the calculation, the main blocks of the original system must be identified; in the presented system there are three (Fig. 3.6). We first calculate the reliability of each block separately and then find the reliability of the entire system.

Figure 3.6. Block-by-block diagram of the system.

The system reliability without redundancy will be:

Thus, the non-redundant system is 28% less reliable than the redundant one.

Availability

LECTURE No. 14. Ensuring availability

An information system provides its users with a certain set of services. The required level of availability of these services is said to be provided if the following indicators are within specified limits:

  • Service efficiency. Service efficiency is defined in terms of the maximum request service time, the number of supported users, and so on. It is required that efficiency not fall below a predetermined threshold.
  • Unavailability time. If the efficiency of an information service does not meet the imposed restrictions, the service is considered unavailable. It is required that the maximum duration of a period of unavailability and the total unavailability time over a given period (month, year) not exceed predetermined limits.

In essence, the information system is required to operate with the desired efficiency almost always. For some critical systems (e.g., control systems) the unavailability time must be strictly zero, without any "almost". In such cases one speaks of the probability of an unavailability situation and requires that this probability not exceed a given value. Special fault-tolerant systems have been and are being created to solve this problem, and their cost is, as a rule, very high.

Less stringent requirements apply to the vast majority of commercial systems, but modern business life imposes rather severe restrictions here as well: the number of served users can run into the thousands, the response time must not exceed a few seconds, and the unavailability time must not exceed a few hours per year.

The problem of ensuring high availability must be solved for modern configurations built on client/server technology. This means that the entire chain needs protection, from the users (possibly remote) to the critical servers (including the security servers).

The main threats to availability were discussed earlier.

In accordance with GOST 27.002, a failure is understood as an event consisting in the loss of the product's operability. In the context of this work, the product is an information system or a component of it.

In the simplest case we can assume that a failure of any component of a composite product leads to failure of the whole, and that the distribution of failures in time is a simple Poisson stream of events. In this case the concepts of the failure rate λi and the mean time between failures Ti of component i are introduced; they are related by

λi = 1 / Ti,

where i is the component number.

The failure rates of independent components add up:

λ = λ1 + λ2 + … + λn,

and the mean time between failures of the composite product is given by the relation

T = 1 / λ = 1 / (λ1 + λ2 + … + λn).
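A quick numeric illustration; the component MTBF values here are hypothetical.

```python
# Composite MTBF from independent component MTBFs (hypothetical values).
T = [2000.0, 5000.0, 10000.0]   # component MTBFs, hours
lam = sum(1 / t for t in T)     # failure rates of components add up
T_total = 1 / lam               # 1250 h, dominated by the 2000 h component
print(lam, T_total)
```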

These simple calculations already show that if there is a component whose failure rate is much higher than the others', it is this component that determines the mean time between failures of the entire information system. This is the theoretical basis for the principle of strengthening the weakest link first.

The Poisson model also makes it possible to substantiate another very important proposition: an empirical approach to building high-availability systems cannot be carried out in acceptable time. In a traditional software test/debug cycle, optimistically, each error correction yields an exponential decrease (by about half a decimal order of magnitude) in the failure rate. It follows that, regardless of the testing and debugging technology used, verifying by experiment that the required level of availability has been reached takes time roughly equal to the mean time between failures. For example, reaching an MTBF of 10^5 hours would require more than 10^4.5 hours, which is more than three years. This means that other methods of constructing high-availability systems are needed, methods whose effectiveness has been proven analytically or practically over more than fifty years of development of computer technology and programming.

The Poisson model is applicable when the information system contains single points of failure, that is, components whose failure brings down the entire system. A different formalism is used to study redundant systems.

In accordance with the formulation of the problem, we assume that there is a quantitative measure of the efficiency of the information services provided by the product. In this case the concepts of the efficiency indicators of individual elements and of the efficiency of the whole complex system are introduced.

As a measure of availability we can take the probability that the efficiency of the services provided by the information system remains acceptable throughout the period of time under consideration. The greater the efficiency margin of the system, the higher its availability.

If the system configuration contains redundancy, the probability that the efficiency of the information services stays above the acceptable limit during the period under consideration depends not only on the probability of component failures but also on the time during which components remain inoperative, since the overall efficiency drops and each subsequent failure can be fatal. To maximize system availability, the downtime of each component must be minimized. It should also be borne in mind that repair work may, generally speaking, require reducing the efficiency or even temporarily shutting down operable components; this kind of impact also needs to be minimized.

A few terminological notes. The reliability literature usually speaks of readiness (including high readiness) rather than availability. We have chosen the term "availability" to emphasize that an information service must not merely be "ready" in itself but available to its users even under conditions where unavailability is caused by factors that at first glance have no direct relation to the service (an example is the lack of consulting support).

Further, instead of the unavailability time one usually speaks of the availability factor. We wanted to draw attention to two indicators, the duration of a single outage and the total duration of outages, so we preferred the term "unavailability time" as more capacious.

Methodology for assessing the failure rate of functional units of integrated circuits

Baryshnikov A.V.

(FSUE NII "Avtomatiki")

1. Introduction

The problem of predicting the reliability of radio-electronic equipment (REA) is relevant for almost all modern technical systems. Since REA includes electronic components, the task arises of developing methods for assessing the failure rates (IO) of these components. The reliability requirements in the terms of reference (TOR) for the development of electronic equipment often conflict with its weight and size requirements, which rules out meeting the TOR by means of, for example, duplication.

For a number of types of electronic equipment, increased reliability requirements are imposed on control devices located on the same die as the main functional units of the equipment, for example on the modulo-2 adder that checks the operation of the main and backup nodes of equipment blocks. Increased reliability requirements can also be imposed on the memory areas storing information necessary for executing the equipment's operating algorithm.

The proposed methodology makes it possible to evaluate the IO of different functional areas of microcircuits. In memory chips (random access memory (RAM), read-only memory (ROM), reprogrammable memory (EPROM)) these are the failure rates of the storage arrays, decoders and control circuits. In microcontroller and microprocessor circuits the technique allows determining the IO of memory areas, the arithmetic logic unit, analog-to-digital and digital-to-analog converters, etc. In programmable logic integrated circuits (FPGA), it gives the IO of the main functional units making up the FPGA: the configurable logic block, the I/O block, memory areas, JTAG, etc. The technique also allows determining the IO of a single microcircuit pin, of a single memory cell and, in some cases, of individual transistors.

2. Purpose and scope of the technique

The technique is designed to assess the operational IO λe of different functional units of microcircuits: microprocessors, microcontrollers, memory chips and programmable logic integrated circuits; in particular, of on-chip memory areas and of the storage cells of foreign-made microcircuits, including microprocessors and FPGAs. Unfortunately, the lack of information on package IO does not allow applying the technique to domestic microcircuits.

The IO values determined by this method are the initial data for calculating reliability characteristics in engineering analyses of equipment.

The methodology contains an algorithm for calculating the IO, an algorithm for checking the calculation results, and examples of calculating the IO of functional units of a microprocessor, memory chips and programmable logic circuits.

3. Method assumptions

The methodology is based on the following assumptions:

Element failures are independent;

The IO of a microcircuit is constant in time.

In addition to these assumptions, it will be shown that the IO of a microcircuit can be split into the IO of the package and the IO of the die.

4. Initial data

1. The functional purpose of the microcircuit: microprocessor, microcontroller, memory, FPGA, etc.

2. Microcircuit manufacturing technology: bipolar, CMOS.

3. The value of the failure rate of the microcircuit.

4. Block diagram of the microcircuit.

5. Type and capacity of the memory storage arrays.

6. Number of package leads.

5.1. From the known IO value of the microcircuit, the IO of the package and of the die are determined.

5.2. From the die IO found, for a memory chip, based on its type and manufacturing technology, the IO of the storage array, decoder circuits and control circuits are calculated. The calculation relies on the standard electrical circuit designs serving the storage array.

5.3. For a microprocessor or microcontroller, the IO of the memory areas is determined using the results of the previous step. The difference between the die IO and the found IO values of the memory areas gives the IO of the rest of the microcircuit.

5.4. Based on the known die IO values for an FPGA family, its functional composition and the numbers of nodes of each type, a system of linear equations is drawn up. Each equation of the system is written for one type from the FPGA family. The right side of each equation is the sum of the products of the IO values of the functional units of each type by their number; the left side is the die IO of the particular FPGA type from the family.

The maximum number of equations in the system equals the number of FPGA types in the family.

Solving the system of equations yields the IO values of the FPGA functional units.

5.5. Using the results obtained in the previous steps, the IO of an individual memory cell, a microcircuit pin, or a transistor of a specific block of the block diagram can be found if the electrical circuit of the node is known.

5.6. The calculation results for a memory chip are checked by comparing the IO value of another memory chip obtained by the standard method with the IO value of that chip calculated using the data obtained in clause 5.2 of this section.

5.7. The calculation results for FPGAs are checked by calculating the die IO of one of the standard types of the FPGA family under consideration that was not included in the system of equations. The calculation uses the IO values of the functional units obtained in clause 5.4 of this section, and the resulting FPGA IO value is compared with the IO value calculated by standard methods.

6. Analysis of the failure rate prediction model from the standpoint of splitting the microcircuit failure rate into the sum of the die and package failure rates

The IO of the die, the package and the external pins of a microcircuit are determined from the mathematical model for predicting the IO of foreign integrated circuits for each IC type.

Let us analyze the terms of the mathematical model for calculating the operational IO λe of digital and analog integrated circuits of foreign production:

λe = (C1·πT + C2·πE)·πQ·πL, (1)

where: C1 is the component of the IC's IO that depends on the degree of integration;

πT is a coefficient accounting for overheating of the die relative to the environment;

C2 is the component of the IC's IO that depends on the package type;

πE is a coefficient accounting for the severity of the operating conditions of the electronic equipment (equipment operation group);

πQ is a coefficient accounting for the manufacturing quality level of the component;

πL is a coefficient accounting for the maturity of the component's manufacturing process.

This expression is valid for microcircuits manufactured in both bipolar and MOS technologies; it covers digital and analog circuits, programmable logic arrays and FPGAs, memory chips, and microprocessors.

The mathematical model of the predicted IO of integrated circuits, whose primary source is the US Department of Defense standard, is the sum of two terms. The first term characterizes failures determined by the degree of integration of the die and the electrical operating mode of the microcircuit (coefficients C1, πT); the second characterizes failures associated with the package type, the number of package leads and the operating conditions (coefficients C2, πE).

This division is explained by the possibility of producing the same microcircuit in different package types, which differ significantly in reliability (vibration resistance, hermeticity, hygroscopicity, etc.). Let us denote the first term as the IO determined by the die (λcr) and the second as the IO determined by the package (λcorp).

From (1) we get:

λcr = C1·πT·πQ·πL, λcorp = C2·πE·πQ·πL. (2)

Then the IO of one microcircuit pin is:

λ1pin = λcorp / Npin = C2·πE·πQ·πL / Npin,

where Npin is the number of leads in the integrated circuit package.

Let us find the ratio of the package IO to the operational IO of the microcircuit:

λcorp / λe = C2·πE·πQ·πL / ((C1·πT + C2·πE)·πQ·πL) = C2·πE / (C1·πT + C2·πE). (3)

Let us analyze this expression with respect to the influence of the package type, the number of leads, die overheating due to dissipated power, and the severity of the operating conditions.

6.1. Influence of severity of operating conditions

Dividing the numerator and denominator of expression (3) by the coefficient πE, we get:

λcorp / λe = C2 / (C1·πT/πE + C2). (4)

Analysis of expression (4) shows that the ratio of the package IO to the operational IO of the microcircuit depends on the operation group: the more severe the operating conditions of the equipment (the greater the coefficient πE), the greater the share of failures attributable to the package (the denominator in equation 4 decreases) and the closer the ratio λcorp/λe gets to 1.

6.2. Influence of package type and number of package pins

Dividing the numerator and denominator of expression (3) by the coefficient C2, we get:

λcorp / λe = πE / (C1·πT/C2 + πE). (5)

Analysis of expression (5) shows that the ratio of the package IO to the operational IO depends on the ratio of C1 to C2, i.e. on the relation between the degree of integration of the microcircuit and the package parameters: the greater the number of elements in the microcircuit (the greater C1), the smaller the share of package failures (the ratio λcorp/λe tends to zero); and the greater the number of package leads (the greater C2), the more weight package failures acquire (the ratio λcorp/λe tends to 1).

6.3. Effect of power dissipated in the crystal

Expression (3) shows that as πT grows (the coefficient reflecting die overheating due to the power dissipated in the die), the denominator of the equation increases; consequently the share of failures attributable to the package decreases, and die failures acquire greater relative weight.
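The split of equation (1) into die and package parts is easy to tabulate; the coefficient values in the sketch below are illustrative placeholders, not ASRN data.

```python
# Section 6: die/package split of the operational IO (equations 1-3).
# All coefficient values are illustrative placeholders.
C1, piT = 0.0034, 0.231   # die term: integration level and overheating
C2, piE = 0.04, 2.0       # package term: package type and environment
piQ, piL = 1.0, 1.0       # quality and process maturity factors

lam_cr = C1 * piT * piQ * piL     # die IO, equation (2)
lam_corp = C2 * piE * piQ * piL   # package IO, equation (2)
lam_e = lam_cr + lam_corp         # operational IO, equation (1)
ratio = lam_corp / lam_e          # share of package failures, equation (3)
print(lam_e, ratio)
```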

Conclusion:

Analysis of how the ratio λcorp/λe (equation 3) changes with the package type, the number of leads, die overheating due to dissipated power, and the severity of the operating conditions has shown that the first term in equation (1) characterizes the operational IO of the die and the second the operational IO of the package. Equations (2) can therefore be used to evaluate the operational IO of the semiconductor die itself, of the package, and of the package leads. The operational IO of the die can serve as the starting point for evaluating the IO of the functional units of a microcircuit.

7. Calculation of the failure rate of a memory cell of the storage devices included in memory chips, microprocessors and microcontrollers

To determine the IO per bit of information of a semiconductor memory, consider its composition. Any type of semiconductor memory includes:

1) a storage array;

2) framing circuits:

o the address part (row and column decoders);

o the numerical part (write and read amplifiers);

o the local control unit, which coordinates the operation of all nodes in the storage, write, regeneration (dynamic memory) and erase (EPROM) modes.

7.1. Estimation of the number of transistors in various areas of the memory.

Let us consider each component of the memory IO. The total IO value of memory chips of different types and storage capacities can be determined using... The IO of the package and of the die are calculated in accordance with Section 5 of this work.

Unfortunately, the technical materials for foreign memory chips do not give the total number of elements in the microcircuit; only the information capacity of the storage array is stated. Since each memory type consists of standard blocks, we estimate the number of elements in a memory chip from the capacity of its storage array. To do this, consider the circuit design of each memory unit.

7.1.1. RAM storage

Consider the electrical diagrams of RAM memory cells implemented in Schottky TTL, ECL, MOS and CMOS technologies. Table 1 shows the number of transistors from which one memory cell (1 bit of RAM) is built.

Table 1. The number of transistors in one memory cell

RAM type    Manufacturing technology: Schottky TTL, ECL, MOS, CMOS
Static      4, 5 or 6 elements per cell, depending on the circuit
Dynamic

7.1.2. ROM and PROM storage arrays

In bipolar ROMs and PROMs the storage element is implemented on diode and transistor structures: emitter followers on n-p-n and p-n-p transistors, collector-base and emitter-base junctions, and Schottky diodes. In circuits manufactured in MOS and CMOS technologies, p- and n-channel transistors are used as storage elements. A storage element consists of one transistor or diode, so the total number of transistors in a ROM or PROM storage array is equal to the information capacity of the memory LSI.

7.1.3. EPROM storage arrays

Information recorded in an EPROM is stored from several years to tens of years, which is why EPROMs are often called non-volatile memory. The mechanism of writing and storing information is based on the accumulation of charge during writing and its retention during reading and with the power off in special MOS transistors. EPROM memory elements are usually built on two transistors.

Thus, the number of transistors in an EPROM storage array is equal to the information capacity of the EPROM multiplied by 2.

7.1.4. Address part

The address part of a memory is built on decoders. They decode an N-bit input binary number by producing a single active binary value on one of the device outputs. Integrated circuits customarily use linear decoders or combinations of linear and rectangular decoders. A linear decoder has N inputs and 2^N AND gates. Let us find the number of transistors needed to build such decoders in the CMOS basis (the one most often used for LSIs). Table 2 shows the number of transistors required to build decoders with different numbers of inputs.

Table 2. The number of transistors required to build decoders

Number of   Address inverters      AND gates              Total transistors
inputs, N   Qty     Transistors    Qty      Transistors   in the decoder
            N       2·N            2^N      2·N·2^N       2·N·2^N + 2·N
2           2       4              4        16            20
3           3       6              8        48            54
4           4       8              16       128           136
5           5       10             32       320           330
6           6       12             64       768           780
7           7       14             128      1792          1806
8           8       16             256      4096          4112
9           9       18             512      9216          9234
10          10      20             1024     20480         20500

For linear decoders the word length of the decoded number does not exceed 8-10 bits. Therefore, when the number of words in the memory exceeds 1K, the modular principle of memory construction is used.
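The counts in Table 2 follow from the two formulas in its header; a one-line check in Python:

```python
# Section 7.1.4: transistor count of a CMOS linear decoder with N inputs:
# N address inverters (2 transistors each) plus 2**N AND gates
# (2*N transistors each).
def decoder_transistors(n: int) -> int:
    return 2 * n * 2 ** n + 2 * n

for n in range(2, 11):
    print(n, decoder_transistors(n))  # reproduces Table 2: 20, 54, 136, ...
```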

7.1.5. Numerical part (write and read amplifiers)

These circuits convert the levels of the read signals to the output signal levels of logic elements of a particular type and increase the load capacity. They typically have an open-collector (bipolar) or three-state (CMOS) output. Each output circuit may consist of several (two or three) inverters. With a maximum word length of 32 bits, the number of transistors in these circuits does not exceed 200.

7.1.6. Local control unit

The local control unit, depending on the memory type, may include row and column buffer registers, address multiplexers, regeneration control units (in dynamic memory), and information erase circuits.

7.1.7. Estimation of the number of transistors in various areas of the memory

The quantitative ratio of RAM transistors in the storage array, the decoder and the local control unit is approximately 100:10:1, i.e. 89%, 10% and 1% respectively. The number of transistors in a storage cell of RAM, ROM, PROM and EPROM is given in Table 1. Using these data and the percentage ratios of the elements in the different RAM areas, and assuming that the number of elements in the decoder and the local control unit stays approximately the same for the same storage capacity across memory types, one can estimate the ratio of transistors in the storage array, decoder and local control unit of different memory types. Table 3 shows the results of this estimate.

Table 3. The quantitative ratio of transistors in different functional areas of the memory

Memory type   Storage array   Decoder   Local control unit
RAM           89%             10%       1%
ROM, PROM     60.7%
EPROM         75%
Thus, knowing the capacity of the storage array and the IO of the memory die, one can find the IO of the storage array, the address part, the numerical part and the local control unit, as well as the IO of a memory cell and of the transistors in the framing circuits.

8. Calculation of the failure rate of functional units of microprocessors and microcontrollers

This section gives an algorithm for calculating the IO of functional units of microprocessor and microcontroller microcircuits. The technique is applicable to microprocessors and microcontrollers with a word length of not more than 32 bits.

8.1. Initial data for calculating the failure rate

Below are the initial data required for calculating the IO of microprocessors, microcontrollers and parts of their electrical circuits. By a part of an electrical circuit we mean both functionally complete units of a microprocessor (microcontroller), namely the different types of memory (RAM, ROM, PROM, EPROM), ADCs, DACs, etc., and individual gates or even transistors.

Initial data

Word length of the microprocessor or microcontroller;

Microcircuit manufacturing technology;

Type and organization of the on-chip memory;

Information capacity of the memory;

Power consumption;

Thermal resistance die-to-package or die-to-environment;

Package type;

Number of package pins;

Elevated operating ambient temperature;

Manufacturing quality level.

8.2. Algorithm for calculating the failure rate of a microprocessor (microcontroller) and functional units of a microprocessor (microcontroller)

1. Determine the operational IO of the microprocessor or microcontroller (λe mp) from the initial data, using one of the automated calculation programs (ASRN, Asonika-K) or the Military Handbook 217F standard.

Note: all calculations and comments below are given in terms of ASRN, since the methodology and content of Asonika-K and of the Military Handbook 217F standard have much in common with it.

2. Determine the IO values of the memories in the microprocessor (λE RAM, λE ROM,PROM, λE EPROM), assuming that each memory is a separate microcircuit in its own package:

λE RAM = λRAM + λcorp,

λE ROM,PROM = λROM,PROM + λcorp,

λE EPROM = λEPROM + λcorp,

where λE are the operational IO values of the different memory types; λcorp is the package IO for each memory type; λRAM, λROM,PROM and λEPROM are the IO of the RAM, ROM (PROM) and EPROM without the package, respectively.

The initial data for calculating the operational IO values of the different memory types are sought in technical information (data sheets) and IC catalogs. In this literature one must find memory devices whose type (RAM, ROM, PROM, EPROM), storage capacity, organization and manufacturing technology are the same as or close to those of the memory of the microprocessor (microcontroller). The technical characteristics found are used in ASRN to calculate the operational IO of the memory chips. The power consumed by the memory is chosen according to the electrical operating mode of the microprocessor (microcontroller).

3. Determine the IO values of the on-chip areas of the microprocessor (microcontroller), the memories and the ALU, without the package: λcr mp, λRAM, λROM,PROM, λEPROM, λALU.

The IO of the on-chip areas of the microprocessor, RAM, ROM, PROM and EPROM are determined from the relation λcr = C1·πT·πQ·πL.

The IO of the ALU and of the part of the die without the memory circuits is determined from the expression:

λALU = λcr mp - λRAM - λROM,PROM - λEPROM.

The IO values of other functionally complete parts of the microprocessor (microcontroller) are found in a similar way.

4. Determine the IO of the storage arrays of the on-chip memories: λH RAM, λH ROM,PROM, λH EPROM.

Based on the data of Table 3, the percentage of transistors in the different functional areas of a memory can be expressed by taking the total number of memory transistors as 100%. Table 4 gives this percentage distribution of the transistors in on-chip memories of different types.

Based on the percentage of transistors in the different functional areas of the memory and the IO value found for the on-chip part of the memory, the IO of the functional units is determined.

Table 4. Percentage ratio of transistors in the functional areas of the memory (%)

Memory type   Storage array   Decoder   Local control unit
RAM           89              10        1
ROM, PROM     60.7
EPROM         75

λH RAM = 0.89 · λRAM;

λH ROM,PROM = 0.607 · λROM,PROM;

λH EPROM = 0.75 · λEPROM,

where λH RAM, λH ROM,PROM and λH EPROM are the IO of the storage arrays of the RAM, ROM (PROM) and EPROM, respectively.

8.3. Calculation of the failure rate of the functional units of the memory: decoders, address part, control circuits

Using the data on the ratio of the number of transistors in each part of the memory (Table 4), one can find the failure rates of the decoders, the address part and the memory control circuits. Knowing the number of transistors in each part of the memory, one can find the failure rate of a group of transistors or of an individual transistor of the memory.
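A compact sketch of steps 2-4 of the algorithm follows; all numeric inputs are hypothetical placeholders standing in for ASRN results.

```python
# Section 8 sketch: splitting a microcontroller die IO into memory areas
# and the ALU. Inputs are hypothetical placeholders, 1/h.
lam_cr_mp = 0.0020e-6   # die IO of the whole microprocessor, eq. (2)
lam_ram = 0.0004e-6     # die IO of an equivalent stand-alone RAM
lam_rom = 0.0003e-6     # die IO of an equivalent stand-alone ROM/PROM
lam_eprom = 0.0002e-6   # die IO of an equivalent stand-alone EPROM

# Whatever is not memory is attributed to the ALU and remaining logic.
lam_alu = lam_cr_mp - lam_ram - lam_rom - lam_eprom

# Storage-array share inside each memory, per the Table 4 fractions.
lam_h_ram = 0.89 * lam_ram
lam_h_rom = 0.607 * lam_rom
lam_h_eprom = 0.75 * lam_eprom
print(lam_alu, lam_h_ram, lam_h_rom, lam_h_eprom)
```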

9. Calculation of the failure rate of functionally complete units of memory microcircuits

This section gives an algorithm for calculating the IO of functionally complete nodes of memory chips. The technique is applicable to the memory chips listed in the ASRN.

9.1. Initial data for calculating the failure rate

Below are the initial data required for calculating the IO of functionally complete units of memory chips. By functionally complete units of a memory chip we mean the storage array, the address part and the control circuits. The technique also allows calculating the IO of parts of functional units, of individual gates, and of transistors.

Initial data

Memory type: RAM, ROM, PROM, EPROM;

Information capacity of the memory;

Memory organization;

Manufacturing technology;

Power consumption;

Package type;

Number of package pins;

Thermal resistance die-to-package or die-to-environment;

Equipment operation group;

Elevated operating ambient temperature;

Manufacturing quality level.

9.2. Algorithm for calculating the failure rate of memory circuits and functionally complete nodes of memory circuits

1. Determine the operational IO of the memory chip (λe mem) from the initial data, using one of the automated calculation programs (ASRN, Asonika-K) or the Military Handbook 217F standard.

2. Determine the IO of the memory die without the package, λcr mem:

λcr mem = C1·πT·πQ·πL.

3. The IO of the storage array of the on-chip memory and the IO of the functional units are calculated in accordance with Section 8.2.

10. Calculation of the failure rate of functionally complete nodes of programmable logic integrated circuits and basic matrix crystals

Each FPGA family consists of a set of microcircuits with the same architecture, based on the use of identical functional units of several types. Microcircuits of different sizes within a family differ from each other in package type and in the number of functional units of each type: configurable logic blocks, I/O blocks, memory, JTAG, and the like.

It should be noted that, in addition to configurable logic blocks and I/O blocks, every FPGA contains a matrix of switches forming the connections between FPGA elements. Since these resources are distributed evenly over the die, except for the I/O blocks located on the periphery, we can consider the switch matrix to be part of the configurable logic blocks and the I/O blocks.

To calculate the failure rates of the functional units, a system of linear equations must be drawn up. The system of equations is compiled for each FPGA family.

Each equation of the system is an equality whose left side is the die IO of a particular microcircuit type from the chosen family, and whose right side is the sum of the products of the number ni of functional nodes of type i by the IO λi of such nodes.

The general form of such a system of equations is:

λe a = a1·λ1 + a2·λ2 + … + an·λn

λe b = b1·λ1 + b2·λ2 + … + bn·λn

…

λe k = k1·λ1 + k2·λ2 + … + kn·λn

where:

λe a, λe b, …, λe k are the operational IO of the microcircuits of the FPGA family (microcircuits a, b, …, k respectively);

a1, a2, …, an are the numbers of functional nodes of types 1, 2, …, n in microcircuit a;

b1, b2, …, bn are the numbers of functional nodes of types 1, 2, …, n in microcircuit b;

k1, k2, …, kn are the numbers of functional nodes of types 1, 2, …, n in microcircuit k;

λ1, λ2, …, λn are the IO of the functional nodes of types 1, 2, …, n, respectively.

The operational IO values λe a, λe b, …, λe k are calculated with the ASRN; the number and types of functional units are given in the technical documentation for the FPGA (data sheets or domestic periodicals).

The IO values λ1, λ2, …, λn of the functional nodes of the FPGA family are found by solving the system of equations.

11. Checking the calculation results

The calculation results for a memory chip are checked by calculating the die IO of another memory chip using the obtained IO value of the memory cell, and comparing the result with the IO value calculated by standard methods (ASRN, Asonika-K, etc.).

The calculation results for FPGAs are checked by calculating the die IO of an FPGA of another type from the same family using the found IO values of the functional units, and comparing the result with the IO value calculated by standard methods (ASRN, Asonika-K, etc.).

12. An example of calculating the failure rates of functional units of FPGA and checking the calculation results

12.1. Calculation of the IO of the functional units and package pins of FPGAs

The IO calculation is illustrated with the Spartan FPGA family developed by Xilinx.

The Spartan family consists of five FPGA types, each containing a matrix of configurable logic blocks, I/O blocks, and boundary-scan logic (JTAG).

FPGAs of the Spartan family differ in the number of logic gates, the number of configurable logic blocks, the number of I/O blocks, the package types, and the number of package pins.

Below, the IO of the configurable logic blocks, the I/O blocks and the JTAG is calculated for the FPGAs XCS05XL, XCS10XL and XCS20XL.

To verify the results, the operational IO of the XCS30XL FPGA is calculated using the IO values of the functional units found for the XCS05XL, XCS10XL and XCS20XL. The resulting IO value of the XCS30XL is compared with the IO value calculated with the ASRN. The IO values per package pin are also compared across the different FPGA packages.

12.1.1. Calculation of the failure rates of the functional units of the FPGAs XCS05XL, XCS10XL, XCS20XL

In accordance with the calculation algorithm given above, it is necessary to:

compile the list and values of the initial data for the FPGAs XCS05XL, XCS10XL, XCS20XL, XCS30XL;

calculate the operational IO of the FPGAs XCS05XL, XCS10XL, XCS20XL, XCS30XL from the initial data;

draw up a system of linear equations for the dies of the FPGAs XCS05XL, XCS10XL, XCS20XL;

solve the system of linear equations (the unknowns are the IO of the functional units: the configurable logic blocks, the I/O blocks, the boundary-scan logic);

compare the die IO value of the XCS30XL obtained from this solution with the die IO value obtained with the ASRN;

compare the per-pin IO values for the different packages;

draw a conclusion on the validity of the calculations;

if the failure rates agree satisfactorily (within 10-20%), stop the calculations;

if the discrepancy is large, correct the initial data.

The initial data for calculating the operational IO of an FPGA are: the manufacturing technology, the number of gates, the power consumption, the overheating temperature of the die relative to the environment, the package type, the number of package pins, the die-to-package thermal resistance, the quality level, and the operation group of the equipment in which the FPGA is used.

All initial data except the power consumption, the die overheating temperature and the equipment operation group are given in... The power consumption can be found in the technical literature, by calculation, or by measurement on the board. The die overheating temperature relative to the environment is found as the product of the power consumption and the die-to-package thermal resistance. The equipment operation group is given in the technical specifications for the equipment.

The initial data for calculating the operational failure rates of the FPGAs XCS05XL, XCS10XL, XCS20XL and XCS30XL are given in Table 5.

Table 5. Initial data

Initial data                               XCS05XL   XCS10XL   XCS20XL   XCS30XL
Manufacturing technology                   CMOS      CMOS      CMOS      CMOS
Maximum number of logic gates              5000      10000     20000     30000
Number of configurable logic blocks, Nclb  100       196       400       576
Number of used I/Os, Nio                   77        112       160       192
Package type                               VQFP      TQFP      PQFP      PQFP
Number of package pins                     100       144       208       240
Thermal resistance die-package, °C/W
Manufacturing quality level                Commercial (all types)
Equipment operation group

To determine the overheating temperature of the die relative to the ambient temperature, the power consumption of each microcircuit must be found.

In most CMOS integrated circuits almost all the dissipated power is dynamic and is determined by the charging and discharging of internal and external load capacitances. Each pin of the microcircuit dissipates power according to its own capacitance, which is constant for each output type, and the switching frequency of each pin may differ from the clock frequency of the microcircuit. The total dynamic power is the sum of the powers dissipated at each pin, so calculating the power requires knowing the number of elements used in the FPGA. For the Spartan family, the documentation gives the current consumption of an I/O block (12 mA) at a load of 50 pF, a supply voltage of 3.3 V and the maximum FPGA operating frequency of 80 MHz. Assuming that the power consumption of an FPGA is determined by the number of switching I/O blocks (as the most power-hungry consumers), and for lack of experimental power-consumption data, we estimate the power consumed by each FPGA on the assumption that 50% of the I/O blocks switch simultaneously at some fixed frequency (taken in the calculation to be 5 times below the maximum).
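Under the stated assumptions (50% of I/O blocks switching, frequency 5 times below the 80 MHz maximum, 12 mA per I/O at 3.3 V), the power estimate reduces to a one-line formula; the sketch below is only that rough estimate, with the I/O counts of Table 5.

```python
# Section 12.1.1: rough dynamic power estimate for Spartan FPGAs.
# Assumptions from the text: 12 mA per switching I/O at 3.3 V and 80 MHz,
# 50% of I/O blocks switching, clock 5 times below the maximum.
def fpga_power(n_io, i_io=0.012, v=3.3, duty=0.5, freq_ratio=0.2):
    return n_io * duty * (i_io * v) * freq_ratio   # watts

for name, n_io in [("XCS05XL", 77), ("XCS10XL", 112),
                   ("XCS20XL", 160), ("XCS30XL", 192)]:
    print(name, round(fpga_power(n_io), 3), "W")
```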

Table 6 shows the power consumed by each FPGA and the overheating temperature of the die relative to the package.

Table 6. Power consumed by the FPGAs

Parameter                          XCS05XL   XCS10XL   XCS20XL   XCS30XL
Power consumption, W
Die overheating temperature, °C

Let us calculate the values of the coefficients in equation (1):

λe = (C1·πT + C2·πE)·πQ·πL.

The coefficients πT, C2, πE, πQ, πL are calculated according to the ASRN. The coefficients C1 are found by approximating the C1 values given in the ASRN for FPGAs of different degrees of integration.

The C1 values for FPGAs are shown in Table 7.

Table 7. Values of coefficient C1

Number of gates in the FPGA   C1
Up to 500                     0.00085
501 to 1000                   0.0017
2001 to 5000                  0.0034
5001 to 20000                 0.0068

Then, for the maximum gate counts of the FPGAs XCS05XL, XCS10XL, XCS20XL, XCS30XL, we obtain C1 values of 0.0034, 0.0048, 0.0068 and 0.0078 respectively.

The values of the coefficients πT, C2, πE, πQ, πL, the IO values of the dies and packages, and the operational IO values of the XCS05XL, XCS10XL, XCS20XL and XCS30XL microcircuits are shown in Table 8.

Table 8. Operational failure rates of the FPGAs

Designation and name of coefficient                      | XCS05XL   | XCS10XL | XCS20XL | XCS30XL
π_T                                                      | 0.231     | 0.225   | 0.231   | 0.222
C2                                                       | 0.04      | 0.06    | 0.089   | 0.104
π_E                                                      | -         | -       | -       | -
π_Q                                                      | -         | -       | -       | -
π_L                                                      | -         | -       | -       | -
Crystal failure rate, λ_cr = C1·π_T·π_Q·π_L, ×10⁻⁶ 1/h   | 0.0007854 | 0.0011  | 0.00157 | 0.0018
Package failure rate, λ_case = C2·π_E·π_Q·π_L, ×10⁻⁶ 1/h | 0.2       | 0.3     | 0.445   | 0.52
Operational failure rate of the FPGA, λ_e, ×10⁻⁶ 1/h     | 0.2007854 | 0.3011  | 0.44657 | 0.5218

Let us find the failure rates of the configurable logic blocks λ_clb, the input/output blocks λ_io and the boundary-scan logic λ_JTAG for the XCS05XL, XCS10XL and XCS20XL FPGAs. For this we compose a system of linear equations:

λ_cr,XCS05XL = N_clb,XCS05XL·λ_clb + N_io,XCS05XL·λ_io + λ_JTAG
λ_cr,XCS10XL = N_clb,XCS10XL·λ_clb + N_io,XCS10XL·λ_io + λ_JTAG
λ_cr,XCS20XL = N_clb,XCS20XL·λ_clb + N_io,XCS20XL·λ_io + λ_JTAG

where λ_cr, N_clb and N_io are the crystal failure rate, the number of configurable logic blocks and the number of input/output blocks of the corresponding FPGA.

Substituting the crystal failure rates, the numbers of configurable logic blocks and the numbers of input/output blocks into the system of equations, we get:

0.0007854·10⁻⁶ = 100·λ_clb + 77·λ_io + λ_JTAG
0.0011·10⁻⁶ = 196·λ_clb + 112·λ_io + λ_JTAG
0.00157·10⁻⁶ = 400·λ_clb + 160·λ_io + λ_JTAG

The system of three linear equations with three unknowns has a unique solution:

λ_clb = 5.16·10⁻¹³ 1/h; λ_io = 7.58·10⁻¹² 1/h; λ_JTAG = 1.498·10⁻¹⁰ 1/h.
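A sketch of the solution with numpy. The XCS05XL and XCS10XL coefficients (100 CLBs / 77 I/O and 196 CLBs / 112 I/O) are taken from the reconstructed Table 5; the computed rates reproduce the quoted solution to within the rounding of the λ_cr inputs:

```python
import numpy as np

# Solve the 3x3 system: rows are XCS05XL, XCS10XL, XCS20XL,
# columns are [N_clb, N_io, 1] acting on (lambda_clb, lambda_io, lambda_jtag).
A = np.array([
    [100, 77,  1.0],   # XCS05XL
    [196, 112, 1.0],   # XCS10XL
    [400, 160, 1.0],   # XCS20XL
])
b = np.array([0.0007854e-6, 0.0011e-6, 0.00157e-6])  # crystal rates, 1/h

lam_clb, lam_io, lam_jtag = np.linalg.solve(A, b)
print(f"lambda_clb  = {lam_clb:.2e} 1/h")   # ~5.3e-13 (document: 5.16e-13)
print(f"lambda_io   = {lam_io:.2e} 1/h")    # ~7.5e-12 (document: 7.58e-12)
print(f"lambda_jtag = {lam_jtag:.2e} 1/h")  # ~1.5e-10 (document: 1.498e-10)

# Check against the fourth device (XCS30XL: 576 CLBs, 192 I/O blocks).
lam_cr_30 = 576 * lam_clb + 192 * lam_io + lam_jtag
err = (lam_cr_30 - 0.0018e-6) / lam_cr_30 * 100
print(f"XCS30XL predicted: {lam_cr_30:.2e} 1/h, deviation {err:.0f}%")  # ~5%
```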

12.1.2. Checking the calculation results

To check the obtained solution, we calculate the crystal failure rate of the XCS30XL FPGA, λ_cr,XCS30XL, using the found values of λ_clb, λ_io and λ_JTAG.

By analogy with the equations of the system, λ_cr,XCS30XL,1 is equal to:

λ_cr,XCS30XL,1 = λ_clb·N_clb,XCS30XL + λ_io·N_io,XCS30XL + λ_JTAG =

576·5.16·10⁻¹³ + 192·7.58·10⁻¹² + 1.498·10⁻¹⁰ = 0.0019·10⁻⁶ 1/h.

The crystal failure rate obtained using the ASRN is (Table 8) 0.0018·10⁻⁶. The relative difference between these values is (λ_cr,XCS30XL,1 − λ_cr,XCS30XL)·100% / λ_cr,XCS30XL,1 ≈ 5%.

The failure rate per pin, obtained by dividing the operational failure rate by the number of package pins, equals 0.002·10⁻⁶, 0.00208·10⁻⁶, 0.0021·10⁻⁶ and 0.0021·10⁻⁶ for the XCS05XL, XCS10XL, XCS20XL and XCS30XL FPGAs, respectively, i.e. the values differ by no more than 5%.

The difference of about 5% in the failure rate values is probably explained by the approximate dissipation powers adopted in the calculation and, as a consequence, by inaccurate values of the coefficient π_T, as well as by the presence of unaccounted FPGA elements for which no information is given in the documentation.

The appendix contains a block diagram for calculating and checking the failure rates of the functional units of the FPGA.

13. Conclusions

1. A method for evaluating the failure rates of functional units of integrated circuits is proposed.

2. The method allows one to calculate:

a) for memory circuits, the failure rates of storage arrays, memory cells, decoders and control circuits;

b) for microprocessors and microcontrollers, the failure rates of memory devices, registers, ADCs, DACs and functional blocks built on their basis;

c) for programmable logic integrated circuits, the failure rates of the blocks of various functional purpose included in them (configurable logic blocks, input/output blocks, memory cells, JTAG logic) and of functional blocks built on their basis.

3. A method for checking the calculated failure rates of functional units is proposed.

4. Application of the checking method to the functional units of integrated circuits has shown the adequacy of the proposed approach to failure rate evaluation.

Appendix

Block diagram for calculating the failure rate of functional units of FPGA

Literature

1. Porter D.C., Finke W.A. Reliability Characterization and Prediction of ICs. PADS-TR-70, p. 232.

2. Military Handbook 217F. Reliability Prediction of Electronic Equipment. Department of Defense, Washington, DC 20301.

3. Automated System for Reliability Calculation (ASRN), developed by 22 TsNII of the RF Ministry of Defense with the participation of RNII Electronstandard and JSC StandartElectro, 2006.

4. Andreev V.P., Baranov V.V., Bekin N.V. et al. Semiconductor Memory Devices and Their Application / ed. by Gordonov. Moscow: Radio i Svyaz, 1981. 344 p.

5. Prospects for the Development of Computer Technology: in 11 books: reference manual / ed. by Yu.M. Smirnov. Book 7: Akinfiev A.B., Mirontsev V.I., Sofiyskiy G.D., Tsyrkin V.V. Semiconductor Memory Devices. Moscow: Vysshaya Shkola, 1989. 160 p.

6. Petrosyan O., Kozyr I.Ya., Koledov L.A., Shchetinin Yu.I. Circuitry of LSI Read-Only Memory Devices. Moscow: Radio i Svyaz, 1987. 304 p.

7. Reliability of Random-Access Memory Devices. Leningrad: Energoizdat, 1987. 168 p.

8. Proceedings of the IEEE (Russian edition: TIIER), vol. 75, no. 9, 1987.

9. Xilinx. The Programmable Logic Data Book, 2008. http://www.xilinx.com.

10. Sector of Electronic Components, Russia-2002. Moscow: Dodeka-XXI, 2002.

11. DS00049R, p. 61. Microchip Technology Inc., 2001.

12. TMS320VC5416 Fixed-Point Digital Signal Processor. Data Manual, Literature Number SPRS095K.

13. CD-ROM, Integrated Device Technology.

14. CD-ROM, Holtec Semiconductor.

When considering the laws of distribution of failures, it was found that the failure rates of elements can either be constant or vary with operating time. For long-term systems, which include all transportation systems, preventive maintenance is envisaged, which practically eliminates wear failures, so only sudden failures occur in them.

This greatly simplifies the reliability calculation. However, complex systems consist of many elements connected in different ways. When the system operates, some of its elements work continuously, others only at certain intervals, and still others perform only short switching or connection operations. Consequently, during a given period only some of the elements accumulate the same operating time as the system, while the rest operate for a shorter time.

In this case, only the time during which the element is switched on is counted toward its operating time; this approach is valid if we assume that during the periods when an element is not involved in the operation of the system its failure rate is zero.

From the reliability standpoint, the most common scheme is the series connection of elements. In this case the calculation uses the reliability product rule:

P(t) = R₁(t₁)·R₂(t₂)· … ·Rₙ(tₙ)

where Rᵢ(tᵢ) is the reliability of the i-th element, which is switched on for tᵢ hours of the total system uptime of t hours.


For calculations, the so-called employment coefficient is used, equal to

kᵢ = tᵢ / t,

that is, the ratio of the operating time of the element to the operating time of the system. The practical meaning of this coefficient is that for an element with a known failure rate λᵢ, the failure rate in the system, taking into account its operating time, equals

λᵢ' = kᵢ·λᵢ.

The same approach can be applied to individual system nodes.
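A minimal sketch of this calculation; the element data are illustrative placeholders, not values from the source:

```python
import math

# Series-system reliability with employment coefficients.
t_system = 1000.0  # system operating time, hours

elements = [
    # (failure rate lambda, 1/h;  element on-time t_i, h)
    (2e-6, 1000.0),   # works whenever the system works
    (5e-6,  250.0),   # works a quarter of the time
    (1e-5,   10.0),   # short switching operations only
]

r_system = 1.0
for lam, t_i in elements:
    k = t_i / t_system              # employment coefficient k_i = t_i / t
    lam_in_system = k * lam         # effective failure rate in the system
    r_i = math.exp(-lam * t_i)      # exponential law: R_i = exp(-lam * t_i)
    r_system *= r_i                 # product rule for series connection
    print(f"k = {k:.3f}, lambda_sys = {lam_in_system:.2e} 1/h, R_i = {r_i:.6f}")

print(f"System reliability over {t_system:.0f} h: {r_system:.6f}")
```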

Another factor to consider in system reliability analysis is the workload level at which the elements operate in the system, since it largely determines the expected failure rate.

The failure rate of elements changes significantly even with small changes in the workload acting on them.

The main difficulty in the calculation here is the variety of factors underlying both the concept of element strength and the concept of load.

The strength of an element combines its resistance to mechanical stress, vibration, pressure, acceleration, and so on. The category of strength also includes resistance to thermal stress, electrical strength, moisture resistance, corrosion resistance and a number of other properties. Therefore, strength cannot be expressed as a single numerical value, and there are no units of strength that account for all these factors. The manifestations of load are just as varied. Hence statistical methods are used to assess strength and load; they determine the observed failure behavior of elements in time under the action of a number of loads or of a predominant load.

Elements are designed to withstand rated loads. When elements operate under rated loads, a certain regularity is observed in the intensity of their sudden failures. This intensity is called the nominal sudden failure rate of the elements, and it is the starting point for determining the actual sudden failure rate of a real element (taking into account operating time and workload).

For a real element or system, three main environmental influences are currently considered: mechanical, thermal and workloads.

The influence of mechanical effects is taken into account by a coefficient whose value is determined by the place of installation of the equipment, and can be taken equal to:

1 for laboratories and comfortable rooms;
10 for stationary ground installations;
30 for railway rolling stock.

The nominal sudden failure rate selected from Table 3 should be increased by this factor depending on where the device is installed in service.

The curves in Fig. 7 illustrate the general character of the change in the sudden failure rate of electrical and electronic components as a function of heating temperature and workload.

As the curves show, the sudden failure rate grows with increasing workload according to a logarithmic law. The same curves show how the sudden failure rate of elements can be reduced even below its nominal value: a significant reduction is achieved when the elements operate at loads below their rated values.


Figure 7. General character of the sudden failure rate of elements as a function of heating temperature and workload

Fig. 7 can be used for approximate (educational) reliability calculations of any electrical and electronic elements. The nominal mode in this case corresponds to a temperature of 80 °C and 100% of the working load.

If the calculated parameters of an element differ from the nominal values, the curves in Fig. 7 give, for the selected parameters, the factor by which the failure rate of the element in question should be multiplied.

High reliability can be built in at the design stage of elements and systems. To do this, one should strive to reduce the temperature of the elements during operation and to use elements with increased rated parameters, which is equivalent to reducing the working loads.

The increase in the cost of manufacturing a product in any case pays off by reducing operating costs.


The failure rate of electrical circuit elements as a function of load can likewise be determined by empirical formulas. In particular, as a function of operating voltage and temperature:

λ₂ = λ₁·(U₂/U₁)^p·K^(t₂ − t₁)

where λ₁ is the tabulated failure rate at rated voltage U₁ and temperature t₁, and λ₂ is the failure rate at operating voltage U₂ and temperature t₂.

It is assumed that mechanical stresses remain at the same level. Depending on the kind and type of element, the value of p varies from 4 to 10, and the value of K within 1.02 to 1.15.
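A small sketch of this recalculation; the mid-range defaults p = 6 and K = 1.07 are assumptions for illustration, not values from the source:

```python
def failure_rate_vt(lam1: float, u1: float, u2: float,
                    t1: float, t2: float,
                    p: float = 6.0, k: float = 1.07) -> float:
    """Empirical recalculation of failure rate for voltage and temperature.

    lambda_2 = lambda_1 * (U2/U1)**p * K**(t2 - t1)
    p is typically 4..10 and K is 1.02..1.15 depending on the element type;
    the mid-range defaults here are illustrative assumptions.
    """
    return lam1 * (u2 / u1) ** p * k ** (t2 - t1)

# Example: an element derated to 80% of rated voltage and run 20 °C cooler.
lam_nominal = 1e-6  # 1/h, tabulated at rated voltage and temperature
lam_derated = failure_rate_vt(lam_nominal, u1=1.0, u2=0.8, t1=80.0, t2=60.0)
print(f"{lam_derated:.2e} 1/h")  # well below the nominal 1e-6 1/h
```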

When determining the real failure rate of elements, one must clearly understand the expected load levels at which the elements will operate, and calculate the electrical and thermal parameters with transient modes taken into account. Correct identification of the loads acting on individual elements significantly increases the accuracy of reliability calculations.

When calculating reliability with wear failures taken into account, the operating conditions must also be considered. The durability values M given in Table 3 likewise refer to the nominal load and laboratory conditions. All elements operating under other conditions have a durability that differs from the nominal by a factor K. The value of K can be taken equal to:

1.0 for laboratory conditions;
0.3 for ground installations;
0.17 for railway rolling stock.

Small variations of the coefficient K are possible for equipment of various purposes.

To determine the expected durability M, the average (nominal) durability taken from the table is multiplied by the coefficient K.

In the absence of the materials needed to determine the failure rate as a function of load levels, the coefficient method of failure rate calculation can be used.

The essence of the coefficient method is that the reliability calculation of equipment uses coefficients relating the failure rates of elements of various types to the failure rate of an element whose reliability characteristics are reliably known.

It is assumed that the exponential reliability law is valid and that the failure rates of elements of all types vary to the same degree with operating conditions. The latter assumption means that under different operating conditions the ratio

Kᵢ = λᵢ / λ₀

remains constant, where λ₀ is the failure rate of an element whose quantitative characteristics are reliably known, and Kᵢ is the reliability coefficient of the i-th element. The element with failure rate λ₀ is called the main element of the system calculation. When calculating the coefficients Kᵢ, the main calculation element of the system is taken to be wirewound resistance. In this case, calculating the reliability of the system does not require knowing the failure rates of elements of all types: it is enough to know the reliability coefficients Kᵢ, the number of elements in the circuit, and the failure rate λ₀ of the main calculation element. Since Kᵢ has a scatter of values, reliability is checked both for K_min and for K_max. The values of Kᵢ, determined from an analysis of failure-rate data for equipment of various purposes, are given in Table 5.

Table 5

The failure rate of the main calculation element (in this case, resistance) should be determined as the weighted average of the failure rates of the resistances used in the designed system, i.e.

λ₀ = (Σ λᵢ·Nᵢ) / (Σ Nᵢ), i = 1, …, m,

where λᵢ and Nᵢ are the failure rate and the number of resistances of the i-th type and rating, and m is the number of types and ratings of resistances.

It is desirable to construct the resulting dependence of system reliability on operating time both for K_min and for K_max.
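A sketch of the coefficient method under these formulas; all numbers are illustrative placeholders (in a real calculation the Kᵢ would come from Table 5 of the source):

```python
import math

# Weighted-average failure rate of the main calculation element
# (wirewound resistance): lambda_0 = sum(lambda_i * N_i) / sum(N_i).
resistors = [
    # (failure rate, 1/h; count) per resistor type/rating
    (0.5e-6, 40),
    (0.8e-6, 25),
    (1.2e-6, 10),
]
lam0 = sum(lam * n for lam, n in resistors) / sum(n for _, n in resistors)

# Reliability coefficients K_i (min and max, to cover the scatter) and
# element counts for the circuit; the K_i values are hypothetical.
groups = [
    # (K_min, K_max, number of elements)
    (1.0, 1.0, 75),   # resistors (the main element, K = 1 by definition)
    (0.6, 1.2, 30),   # hypothetical element type
    (4.0, 8.0, 12),   # hypothetical element type
]

t = 1000.0  # operating time, h
for label, idx in (("K_min", 0), ("K_max", 1)):
    lam_system = sum(g[idx] * lam0 * g[2] for g in groups)  # 1/h
    print(f"{label}: lambda = {lam_system:.2e} 1/h, "
          f"P({t:.0f} h) = {math.exp(-lam_system * t):.4f}")
```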

Having information about the reliability of the individual elements of a system, one can give an overall assessment of the system's reliability and identify the blocks and units that require further refinement. For this purpose, the system under study is divided into nodes by a structural or functional criterion (a block diagram is drawn up). Reliability is determined for each selected node; nodes of lower reliability require revision and improvement first.

When comparing the reliability of nodes, and all the more of different system variants, it should be remembered that the absolute value of reliability does not reflect the behavior of the system in operation or its efficiency. The same value of system reliability can be achieved in one case at the expense of major elements whose repair and replacement require considerable time and large material costs (for an electric locomotive, removal from train service), and in another case at the expense of small elements that are replaced by the maintenance personnel without removing the machine from service. Therefore, for a comparative analysis of designed systems, it is recommended to compare the reliability of elements that are similar in their significance and in the consequences of their failures.

For approximate reliability calculations, data from the operating experience of similar systems can be used, which to some extent takes the operating conditions into account. The calculation can then be carried out in two ways: from the average reliability level of equipment of the same type, or through a conversion factor to real operating conditions.

Calculation from the average reliability level is based on the assumption that the designed equipment and the operated sample are equivalent. This is admissible for identical elements, similar systems and the same ratio of elements in the system.

The essence of the method is that

N₁·T₁ = N₂·T₂,

where N₁ and T₁ are the number of elements and the MTBF of the sample equipment, and N₂ and T₂ are the same for the designed equipment. From this ratio the MTBF of the designed equipment is easily determined:

T₂ = N₁·T₁ / N₂.

The advantage of the method is its simplicity. Its disadvantage is the absence, as a rule, of an operated sample suitable for comparison with the designed device.
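A one-liner in code form, with placeholder numbers:

```python
# Sketch: MTBF of designed equipment from an operated sample of the same type,
# based on N1 * T1 = N2 * T2 (equal average element reliability).
n_sample, t_sample = 2500, 800.0  # elements and MTBF (h) of the sample (placeholders)
n_designed = 4000                 # elements in the designed equipment

t_designed = n_sample * t_sample / n_designed
print(f"Expected MTBF of the designed equipment: {t_designed:.0f} h")  # 500 h
```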

The calculation by the second method is based on determining a conversion factor that takes into account the operating conditions of similar equipment. To determine it, a similar system operated under the specified conditions is selected; the other requirements need not be met. For the selected operated system, reliability indicators are calculated using the data of Table 3, and the same indicators are determined separately from operating data.

The conversion factor is defined as the ratio

K_e = T_oe / T_oz,

where T_oe is the MTBF according to operating data, and T_oz is the MTBF obtained by calculation.

For the designed equipment, the reliability indicators are calculated using the same tabular data as for the operated system, and the results are then multiplied by K_e.

The coefficient K_e takes into account the real operating conditions: preventive repairs and their quality, replacement of parts between repairs, the qualification of the maintenance personnel, the condition of the depot equipment and so on, which cannot be allowed for by other calculation methods. The values of K_e may be greater than one.
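A short sketch with placeholder numbers (here K_e > 1, matching the remark above):

```python
# Conversion factor to real operating conditions (placeholder numbers).
t_operation = 1100.0  # MTBF of the similar system from operating data, h
t_calculated = 900.0  # MTBF of the same system calculated from Table 3 data, h

k_e = t_operation / t_calculated          # conversion factor K_e
t_designed_calc = 1200.0                  # calculated MTBF of the new design, h
t_designed_real = t_designed_calc * k_e   # expected MTBF in real operation
print(f"K_e = {k_e:.2f}; expected MTBF in service: {t_designed_real:.0f} h")
```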

Any of the calculation methods considered can also be performed for a given reliability, that is, in the reverse direction: from the required system reliability and MTBF to the choice of indicators for the constituent elements.

Annotation: Two types of means of maintaining high availability are considered: ensuring fault tolerance (neutralization of failures, survivability) and ensuring safe and fast recovery after failures (serviceability).

Availability

Basic concepts

An information system provides its users with a certain set of services. The required level of availability of these services is said to be provided if the following indicators are within the specified limits:

  • Service efficiency. Service efficiency is defined in terms of the maximum request service time, the number of supported users, and the like. It is required that the efficiency not fall below a predetermined threshold.
  • Time of unavailability. If the efficiency of an information service does not meet the imposed restrictions, the service is considered unavailable. It is required that the maximum duration of a period of unavailability and the total time of unavailability over a certain period (a month, a year) not exceed predetermined limits.

In essence, the information system is required to operate with the desired efficiency almost always. For some critical systems (for example, control systems) the time of unavailability must be exactly zero, without any "almost". In that case one speaks of the probability of an unavailability situation arising, and requires that this probability not exceed a given value. Special fault-tolerant systems, whose cost is usually very high, are used to solve this problem.

Less stringent requirements are imposed on the vast majority of commercial systems, but modern business life imposes rather severe restrictions here too: the number of served users can run into the thousands, the response time must not exceed a few seconds, and the time of unavailability must not exceed a few hours per year.

The task of providing high availability must be addressed for modern configurations built on client/server technology. This means that the entire chain needs protection, from the users (possibly remote) to the critical servers (including the security servers).

The main threats to availability were discussed earlier.

In accordance with GOST 27.002, a failure is understood as an event that consists in the disruption of a product's operability. In the context of this work, the product is an information system or a component of it.

In the simplest case, we may assume that a failure of any component of a composite product leads to a failure of the whole, and that the distribution of failures in time is a simple Poisson stream of events. In this case the concepts of failure rate λᵢ and mean time between failures Tᵢ are introduced, related by

λᵢ = 1 / Tᵢ,

where i is the component number.

The failure rates of independent components add up:

λ = λ₁ + λ₂ + … + λₙ,

and the mean time between failures of the composite product is given by

T = 1 / λ = 1 / (λ₁ + λ₂ + … + λₙ).
Even these simple calculations show that if there is a component whose failure rate is much greater than that of the others, it is this component that determines the mean time between failures of the entire information system. This is the theoretical justification for the principle of strengthening the weakest link first.
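As a sketch of this weakest-link arithmetic (component names and rates are illustrative assumptions):

```python
# Failure rates of independent components add; the composite MTBF is
# dominated by the weakest component. All numbers are illustrative.
components = {
    "server":   1e-5,   # failure rate, 1/h
    "network":  2e-6,
    "software": 2e-4,   # the weak link
}

lam_total = sum(components.values())
mtbf = 1.0 / lam_total
print(f"Composite failure rate: {lam_total:.2e} 1/h, MTBF ~ {mtbf:.0f} h")

# Halving the weakest component's failure rate nearly doubles the MTBF:
components["software"] /= 2
print(f"After strengthening the weak link: MTBF ~ {1 / sum(components.values()):.0f} h")
```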

The Poisson model also makes it possible to substantiate another very important proposition: that an empirical approach to building high-availability systems cannot be carried through in reasonable time. In a traditional testing/debugging cycle of a software system, by optimistic estimates, each error correction leads to an exponential decrease (by roughly half a decimal order) in the failure rate. It follows that, to verify by experiment that the required availability level has been reached, regardless of the testing and debugging technology used, one will have to spend time almost equal to the mean time between failures. For example, reaching a mean time between failures of 10⁵ hours would take more than 10^4.5 hours, which is more than three years. This means that other methods of building high-availability systems are needed, methods whose effectiveness has been proven analytically or practically over more than fifty years of development of computing technology and programming.

The Poisson model is applicable in cases where the information system contains single points of failure, that is, components whose failure leads to the failure of the entire system. A different formalism is used to study redundant systems.

In accordance with the formulation of the problem, we will assume that there is a quantitative measure of the efficiency of the information services provided by the product. In this case, the concepts of the performance indicators of individual elements and of the efficiency of the entire complex system are introduced.

As a measure of availability we may take the probability that the efficiency of the services provided by the information system remains acceptable throughout the period of time under consideration. The greater the efficiency margin the system has, the higher its availability.

If the system configuration contains redundancy, the probability that within the considered time interval the efficiency of the information services does not fall below the permissible limit depends not only on the probability of component failures, but also on the time during which failed components remain inoperative, since in this case the total efficiency falls and each subsequent failure may prove fatal. To maximize system availability, the downtime of each component must be minimized. It should also be borne in mind that, generally speaking, repair work may require a reduction in efficiency or even a temporary shutdown of operable components; this kind of influence must likewise be minimized.

A few terminological notes. In the literature on reliability theory one usually speaks of readiness rather than availability (including high readiness). We have chosen the term "availability" to emphasize that an information service must not merely be "ready" in itself, but be available to its users under conditions where unavailability may be caused by factors that at first glance have no direct relation to the service (for example, the lack of consulting support).

Further, instead of the time of unavailability one usually speaks of an availability factor. We wished to draw attention to two indicators, the duration of a single downtime and the total duration of downtime, and therefore preferred the term "time of unavailability" as more capacious.

High Availability Basics

Availability measures rest on a structured approach embodied in object-oriented methodology. Structuring is necessary with respect to all aspects and constituent parts of an information system, from the architecture to the administrative databases, at all stages of its life cycle, from initiation to decommissioning. Structuring, important in itself, is also a prerequisite for the practicability of the other availability measures. Only small systems can be built and operated arbitrarily; large systems obey their own laws, which, as we have already noted, programmers first recognized more than 30 years ago.

When developing measures to ensure high availability