MMAPs to Model Complex Multi-State Systems with Vacation Policies in the Repair Facility

Juan Eloy Ruiz-Castro* and Christian Acal

Department of Statistics and Operations Research and Mathematics Institute IMAG. University of Granada, Granada, Spain
E-mail: jeloy@ugr.es; chracal@ugr.es
*Corresponding Author

Received 05 February 2022; Accepted 01 May 2022; Publication 22 July 2022

Abstract

Two complex multi-state systems subject to multiple events are built in an algorithmic and computational way by considering phase-type distributions and Markovian arrival processes with marked arrivals. The internal performance of the system is composed of different degradation levels, and internal repairable and non-repairable failures can occur. The system is also subject to external shocks that may provoke repairable or non-repairable failure. A multiple vacation policy is introduced in the system for the repairperson. Preventive maintenance is included in the system to improve its behaviour. Two types of task may be performed by the repairperson: corrective repair and preventive maintenance. The systems are modelled, the transient and stationary distributions are built and different performance measures are calculated in a matrix-algorithmic form. Costs and rewards are included in the model in a vector-matrix way. Several economic measures are worked out and the net reward per unit of time is used to optimize the system. A numerical example shows that the system can be optimized according to the existence of preventive maintenance and the distribution of vacation time. The results have been implemented computationally with Matlab and R (packages: expm, optim).

Keywords: Phase-type distribution (PH), Marked Markovian arrival process (MMAP), vacation policy, preventive maintenance.

1 Introduction

The occurrence of repairable and non-repairable failures in a reliability system can provoke severe damage and major financial costs. To avoid such an outcome, several methodologies are considered, such as redundant systems and preventive maintenance.

Preventive maintenance (PM) is a maintenance methodology that keeps systems running and prevents costly unplanned downtime. A successful maintenance policy requires planning and scheduling the maintenance of the system before a failure takes place. In this respect, several preventive maintenance policies have been proposed in the reliability field. Barlow and Hunter (1960) considered two types of preventive maintenance policies to optimize a system depending on the failure distribution. Multiple preventive maintenance policies were given in detail in Nakagawa (1977, 2005), and Finkelstein et al. (2020) developed a new model for the hybrid preventive maintenance of systems with partially observable degradation. Recently, other strategies to optimize a reliability system were given in Shi et al. (2022), where an advanced estimation strategy is proposed in which only one surrogate model is built and used to estimate the failure probabilities of different performance functions.

Nowadays binary systems have been extended by multi-state systems (MSS). Complex systems that have a finite number of performance levels and various failure modes, each producing different effects on system performance, are termed multi-state systems. Murchland (1975) discussed this concept, which has since been developed extensively. One of the main problems when complex multi-state systems are modelled is that intractable expressions appear in the modelling and in the performance functions. This makes the algorithmization and interpretation of results difficult. One possible solution is based on two elements, phase-type distributions (PH) and Markovian arrival processes (MAP), which make it possible to express complex systems in an algorithmic and computational way. PH distributions were introduced by Neuts (1975) and studied in detail in Neuts (1981). They have been considered in multiple fields such as queuing theory, physics, reliability and survival. In the physics field, PH distributions have been considered to model the behaviour of resistive memories (RRAM) in Acal et al. (2019). They have also been considered in survival analysis to study the evolution of several illnesses such as cancer (Pérez-Ocón et al., 1998; Ruiz-Castro and Zenga, 2020). The modelling with PH distributions in reliability is extensive. A transient analysis of a multi-state system was modelled by using PH distributions in Pérez-Ocón et al. (2006). One of the main properties of the PH class is that it is dense in the set of non-negative probability distributions. Therefore, any non-negative probability distribution can be approximated as closely as desired by a PH distribution.

The MAP is a counting process in which PH distributions play an important role. This process was given by Neuts (1979) and reviewed by Artalejo et al. (2010) and He (2014). A special case is that of the MAP with marked arrivals (MMAP), which enables us to count different types of arrival. MMAPs are developed in a comprehensive form in He (2014). This Markovian structure, analogously to PH distributions, makes it possible to count events in an algorithmic way. Multiple examples in several fields are proposed in He and Neuts (1998), and it has also been considered in the modelling of discrete reliability systems (Ruiz-Castro, 2018, 2020).

In complex multi-state systems it is usual to consider either immediate repair after a repairable failure or immediate replacement when a non-repairable failure occurs. However, this might not be the case in a real scenario. For example, a failed unit might not be repaired immediately in a small or medium-sized firm that cannot afford to employ a full-time repairperson. A permanent service facility may increase costs and idleness and lead to a deterioration in quality. To reduce the wastage of valuable resources such as time, money and quality, vacation is a prominent idea for the service facility. Instead of remaining idle, the repairperson may take a ‘vacation’ and/or use the time to do other work, thus optimising resources and reducing costs. A repairperson is on vacation when absent from the repair facility, whether or not it is empty. The economic implications of this situation should be considered, taking into account that the vacation policy applied might impact both on performance and on economic rewards/costs.

Multiple vacation policies have been considered in queuing theory and reliability. A comparative study of the effect of different vacation policies on the reliability characteristics is presented in Shekhar et al. (2020). A Markovian queuing model with a vacation policy in the repair facility, where the vacation period follows a negative exponential distribution, is developed in Kalyanaraman and Sundaramoorthy (2019).

Multi-state Markov systems with vacation periods have also been considered. Zhang et al. (2017) model a k-out-of-n system with a single repairperson, assuming a phase-type distribution for the vacation time and an exponential distribution for the lifetime of the units.

In this paper, two complex multi-state unit systems subject to multiple events, such as internal and external repairable and non-repairable failures are modelled. The internal performance of the system is partitioned into several levels of degradation. A vacation policy is introduced by considering the internal degradation levels. The first system is extended to include preventive maintenance and the corresponding vacation policy. The repairperson performs two different tasks, corrective repair and preventive maintenance. The corrective and preventive time distributions can be different for both cases. The system is modelled by using PHs and MMAPs in an algorithmic and computational way. Multiple measures such as availability, reliability function, rate of occurrence of failure (ROCOF) and mean number of events are worked out. The transient and stationary distributions are calculated in a matrix-algorithmic form. Costs and rewards, depending on the internal degradation levels, are included. Everything is algorithmically and computationally modelled and has been applied to compare and optimize two similar systems with and without preventive maintenance. The results have been implemented computationally with Matlab and R-cran.

The paper is organized as follows. Both systems are described in Section 2. These systems are modelled in Section 3. Measures and costs/profits are developed in Sections 4 and 5, respectively. In Section 6 a numerical application is given where optimization and comparison are shown. Conclusions are given in Section 7. Finally two Appendices show the algorithm of the models in detail.

2 The Systems

Two different multi-state one-unit systems, with and without preventive maintenance, are modelled by considering Markovian Arrival Processes with marked arrivals. Both systems are subject to internal degradation and external shocks. Repairable and non-repairable failures, depending on the internal degradation state, can occur in both cases. The systems can be observed only by the repairperson, who is not always at the workplace. A policy of multiple vacation periods for the repairperson is given.

2.1 The System Without Preventive Maintenance

The internal behaviour of the unit consists of two different levels, minor degradation and moderate degradation. The numbers of states for the first and second levels are $n_1$ and $n_2$, respectively. The unit can suffer repairable and non-repairable failures from both degradation levels. The plug to which the unit is connected can undergo external shocks that can also provoke repairable or non-repairable failures. The system has one repairperson, who can take multiple vacation periods depending on the degradation level of the unit. The repairperson is on vacation initially. On returning, if the system is in the minor degradation level, the repairperson starts a new random vacation period. Otherwise, the repairperson stays at the workplace waiting for a possible repairable failure. If the repairperson is on vacation and a repairable failure occurs, the unit remains in the repairable-failure macro-state until the repairperson returns and begins the repair. Analogously, if the repairperson is on vacation and a non-repairable failure occurs, the unit is replaced by an identical one when the repairperson returns. On the contrary, if the repairperson is at the workplace and a repairable or non-repairable failure occurs, then the repair starts immediately or the unit is replaced in a negligible time, respectively. All random times embedded in the model pass through different phases until the corresponding event occurs.

These times embedded in the system verify the following assumptions.

Assumption 1. The internal operational time follows a PH distribution with representation $(\alpha,T)$ of order $n$. The $n$ phases are partitioned into two macro-states: minor degradation (first $n_1$ phases) and moderate degradation (remaining $n_2$ phases). The PH representation is composed of matrix blocks according to the levels. Then, $\alpha_1$ is a row vector composed of the first $n_1$ elements of $\alpha$. The matrix $T$ is given by

$$T=\begin{pmatrix} T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix}.$$

The orders of $T_{11}$ and $T_{22}$ are $n_1$ and $n_2$, respectively.

The column vector $T^0=-Te$ contains the failure rates from the different operational phases. Throughout this paper $e$ is a column vector of ones of appropriate order and $A^0=-Ae$ for any matrix $A$.

The column vector is expressed as $T^0=T_r^0+T_{nr}^0$, where $T_r^0$ and $T_{nr}^0$ are column vectors which contain the repairable and non-repairable failure rates from the operational phases, respectively. These vectors are partitioned according to the degradation levels as $T_{i,r}^0$ and $T_{i,nr}^0$ for $i=1,2$.

Assumption 2. The external shock is modelled through a PH renewal process where the time between two consecutive shocks is PH distributed with representation $(\gamma,L)$. The order of this representation is $p$. The vector $L^0$ contains the external shock rates from the phases of the external shock time. This vector is partitioned as $L^0=L_r^0+L_{nr}^0$, where $L_r^0$ and $L_{nr}^0$ are column vectors which contain the repairable and non-repairable external shock rates, respectively.

Assumption 3. The vacation time follows a PH distribution with representation $(\upsilon,V)$, with $V$ being a matrix of order $v$.

Assumption 4. The corrective repair time follows a PH distribution with representation $(\beta_1,S_1)$, with $S_1$ being a matrix of order $m_1$.

Therefore, the behaviour of the system can be partitioned into six macro-states of the state-space $S$,

$$S=\{E_1=O_1,\ E_2=O_2^{WR},\ E_3=O_2^{R},\ E_4=RF^{WR},\ E_5=NRF^{WR},\ E_6=CR\}.$$

These macro-states contain the phases with the following situations:

$E_1=O_1$: The unit is working in minor internal degradation.

$E_1=O_1=\{(i,j,k);\ i=1,\ldots,n_1,\ j=1,\ldots,p,\ k=1,\ldots,v\}$

i: phase of the minor internal degradation level

j: phase of the external shock time

k: phase of the vacation time

$E_2=O_2^{WR}$: The unit is working in middle internal degradation with the repairperson on vacation. The superscript WR indicates “without repairperson” in the repair facility,

$E_2=O_2^{WR}=\{(i,j,k);\ i=1,\ldots,n_2,\ j=1,\ldots,p,\ k=1,\ldots,v\}$

i: phase of the middle internal degradation level

j: phase of the external shock time

k: phase of the vacation time

$E_3=O_2^{R}$: The unit is working in middle internal degradation with the repairperson at the workplace. The superscript R indicates that the “repairperson” is in the repair facility,

$E_3=O_2^{R}=\{(i,j);\ i=1,\ldots,n_2,\ j=1,\ldots,p\}$

i: phase of the middle internal degradation level

j: phase of the external shock time

$E_4=RF^{WR}$: The unit is broken with repairable failure and the repairperson is on vacation. RF indicates that the system is in “repairable failure” and the superscript indicates “without repairperson” in the repair facility,

$E_4=RF^{WR}=\{(j,k);\ j=1,\ldots,p,\ k=1,\ldots,v\}$

j: phase of the external shock time

k: phase of the vacation time

$E_5=NRF^{WR}$: The unit is broken with non-repairable failure and the repairperson is on vacation. NRF indicates that the unit is in “non-repairable failure” and the superscript indicates “without repairperson” in the repair facility,

$E_5=NRF^{WR}=\{(j,k);\ j=1,\ldots,p,\ k=1,\ldots,v\}$

j: phase of the external shock time

k: phase of the vacation time

$E_6=CR$: The unit is in “corrective repair” with the repairperson,

$E_6=CR=\{(j,l);\ j=1,\ldots,p,\ l=1,\ldots,m_1\}$

j: phase of the external shock time

l: phase of the corrective repair

The system is operational while it occupies a state of the macro-state $W=\{E_1=O_1,\ E_2=O_2^{WR},\ E_3=O_2^{R}\}$ and it is non-operational when it is in some macro-state of $F=\{E_4=RF^{WR},\ E_5=NRF^{WR},\ E_6=CR\}$.

2.2 The System with Preventive Maintenance

The system described in Section 2.1 is extended by including preventive maintenance. In this case we assume that the internal behaviour is composed of three different levels, i.e., minor, middle and major degradation. The numbers of states are $n_1$, $n_2$ and $n_3$ for these levels, respectively. External shocks with similar consequences are also included in this model. The vacation policy is different for this system with preventive maintenance. The repairperson is also on vacation initially. On returning, the repairperson can observe five different situations instead of four.

• Minor internal degradation level. The repairperson begins a new random vacation period.

• Middle internal degradation. The repairperson stays at the workplace waiting for a possible repairable failure.

• Major internal damage. The repairperson starts the preventive maintenance.

• Repairable failure. The repairperson begins the corrective repair.

• Non-repairable failure. The repairperson replaces the unit by a new and identical one in a negligible time.

Another possibility is that a repairable failure or a non-repairable failure occurs while the repairperson is at the workplace without working. In this case, the corrective repair begins or the unit is replaced immediately after the event occurs, respectively.

When preventive maintenance is considered, the times embedded in the system verify the following assumptions.

Assumption 1. The internal operational time follows a PH distribution with representation $(\alpha,T)$ of order $n$. The $n$ phases are partitioned into three macro-states: minor degradation level (first $n_1$ phases), middle degradation level (the following $n_2$ phases) and major degradation level (last $n_3$ phases). The PH representation is composed of matrix blocks according to the levels. Then, $\alpha_1$ is a row vector composed of the first $n_1$ elements of $\alpha$. The matrix $T$ is given by

$$T=\begin{pmatrix} T_{11} & T_{12} & T_{13}\\ 0 & T_{22} & T_{23}\\ 0 & 0 & T_{33}\end{pmatrix}.$$

The orders of $T_{11}$, $T_{22}$ and $T_{33}$ are $n_1$, $n_2$ and $n_3$, respectively.

The column vector $T^0$ is again expressed as $T^0=T_r^0+T_{nr}^0$. These vectors are partitioned according to the degradation levels as $T_{i,r}^0$ and $T_{i,nr}^0$ for $i=1,2,3$.

Assumption 2. The same as Assumption 2 for the system without preventive maintenance.

Assumption 3. The same as Assumption 3 for the system without preventive maintenance.

Assumption 4. The same as Assumption 4 for the system without preventive maintenance.

Assumption 5. The preventive maintenance time follows a PH distribution with representation $(\beta_2,S_2)$, with $S_2$ being a matrix of order $m_2$.

When preventive maintenance is included, eight macro-states are possible. The macro-state space for this layout is the following,

$$S=\{E_1=O_1,\ E_2=O_2^{WR},\ E_3=O_2^{R},\ E_4=O_3^{WR},\ E_5=RF^{WR},\ E_6=NRF^{WR},\ E_7=PM,\ E_8=CR\}.$$

The macro-states $E_1$, $E_2$ and $E_3$ are the same as for the case without preventive maintenance. The macro-states $E_5$, $E_6$ and $E_8$ of the current system are the macro-states $E_4$, $E_5$ and $E_6$ of the system without preventive maintenance, respectively. The new macro-states for this system are therefore $E_4=O_3^{WR}$ and $E_7=PM$.

These new macro-states contain the phases with the following situations:

$E_4=O_3^{WR}$: The unit is working in major internal degradation. The superscript indicates “without repairperson” in the repair facility,

$E_4=O_3^{WR}=\{(i,j,k);\ i=1,\ldots,n_3,\ j=1,\ldots,p,\ k=1,\ldots,v\}$

i: phase of the major internal degradation level

j: phase of the external shock time

k: phase of the vacation time

$E_7=PM$: The unit is in “preventive maintenance”.

$E_7=PM=\{(j,l);\ j=1,\ldots,p,\ l=1,\ldots,m_2\}$

j: phase of the external shock time

l: phase of the preventive maintenance time

For the system with preventive maintenance, the operational macro-state is $W=\{E_1=O_1,\ E_2=O_2^{WR},\ E_3=O_2^{R},\ E_4=O_3^{WR}\}$ and the non-operational one is given by $F=\{E_5=RF^{WR},\ E_6=NRF^{WR},\ E_7=PM,\ E_8=CR\}$.
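As a rough illustration of how the phase sets above translate into the dimension of the model, the following sketch counts the phases of each macro-state and the offset at which each block would start in a stacked state vector. The orders used ($n_1=n_2=2$, $n_3=3$, $p=v=m_1=m_2=2$) are those of the numerical example in Section 6. The dictionary keys and the variable names are illustrative only.

```python
from itertools import accumulate

# Orders taken from the numerical example of Section 6.
n1, n2, n3 = 2, 2, 3          # phases per internal degradation level
p, v, m1, m2 = 2, 2, 2, 2     # shock, vacation, corrective repair, PM orders

# Number of phases in each macro-state, in the order E1,...,E8.
sizes = {
    "O1": n1 * p * v,
    "O2_WR": n2 * p * v,
    "O2_R": n2 * p,
    "O3_WR": n3 * p * v,
    "RF_WR": p * v,
    "NRF_WR": p * v,
    "PM": p * m2,
    "CR": p * m1,
}

# Index of the first phase of each macro-state in the stacked state vector.
starts = [0] + list(accumulate(sizes.values()))[:-1]
offsets = dict(zip(sizes, starts))
total = sum(sizes.values())

print(sizes)
print(offsets)
print("total number of phases:", total)
```

Under these orders the generator of the process would be a square matrix of order 48, which is small enough for dense matrix computations.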

3 Modelling the Systems Through Marked Markovian Arrival Processes

The systems described in Section 2 are modelled through Markovian Arrival Processes with marked arrivals. These models enable us not only to analyse the system evolution, but also to work out the number of different events over time. The model for the system with preventive maintenance is developed in this section. The case without preventive maintenance is given in Appendix A.

The multi-state unit may undergo the following types of events which are denoted as,

O: No events (no PM, no failure, no end of vacation)

RF+CR: Repairable failure and start of corrective repair (the repairperson was at the workplace)

RF: Only repairable failure (the repairperson continues on vacation)

PM: Preventive maintenance (the repairperson continues on vacation)

NRF+NU: Non-repairable failure and immediate replacement (immediate new unit because the repairperson was at the workplace)

NRF: Only non-repairable failure (the repairperson continues on vacation)

I: Only return from vacation

I+PM: Return from vacation and start of preventive maintenance (the unit is at major degradation level)

I+CR: Return from vacation and start of corrective repair (the unit was in RF)

I+NU: Return from vacation and immediate replacement (the unit was in NRF)

3.1 The MMAP

The MMAP associated with the system has been built according to the different events mentioned above. The representation is given by

$$(D^{0},\ D^{RF+CR},\ D^{RF},\ D^{NRF+NU},\ D^{NRF},\ D^{I},\ D^{I+CR},\ D^{I+NU}),$$

where $D^{Y}$ contains the transition intensities for the event $Y$. These matrices are composed of matrix blocks. Each matrix block contains the transition intensities for the event $Y$ by considering the macro-states of the state-space $S$.

The block matrices for the events RF and RF+CR are described next. The remainder are given in Appendix B.

Matrix Block $D^{RF}$

The matrix block $D^{RF}$ contains the transition intensities from an operational state to a repairable failure (without any other event). Therefore, it is only possible for the transitions between the macro-states $O_1\to RF^{WR}$, $O_2^{WR}\to RF^{WR}$ or $O_3^{WR}\to RF^{WR}$. The block $D^{RF}$ is given by

$$D^{RF}=\begin{pmatrix}
0&0&0&0&C_{O_1,RF^{WR}}&0&0&0\\
0&0&0&0&C_{O_2^{WR},RF^{WR}}&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&C_{O_3^{WR},RF^{WR}}&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0
\end{pmatrix}.$$

The matrix $C_{O_1,RF^{WR}}$ contains the transitions between the macro-states $O_1\to RF^{WR}$. Such a transition occurs when an internal repairable failure takes place and the external shock and vacation times do not change ($T_{1,r}^0\otimes I\otimes I$), or because an external shock occurs and provokes a repairable failure ($e\otimes L_r^0\gamma\otimes I$). In the latter case, the internal operational time ends and the vacation time is not altered. Then,

$$C_{O_1,RF^{WR}}=T_{1,r}^0\otimes I\otimes I+e\otimes L_r^0\gamma\otimes I.$$

The matrix $C_{O_2^{WR},RF^{WR}}$ contains the transitions between the macro-states $O_2^{WR}\to RF^{WR}$. Such a transition occurs when an internal repairable failure takes place from the middle degradation level without repairperson and the external shock and vacation times remain unchanged ($T_{2,r}^0\otimes I\otimes I$), or because an external shock occurs and provokes a repairable failure ($e\otimes L_r^0\gamma\otimes I$). Then,

$$C_{O_2^{WR},RF^{WR}}=T_{2,r}^0\otimes I\otimes I+e\otimes L_r^0\gamma\otimes I.$$

Finally, the matrix $C_{O_3^{WR},RF^{WR}}$ contains the transitions between the macro-states $O_3^{WR}\to RF^{WR}$. The reasoning is similar to the case above, but from the macro-state in major degradation level without repairperson. It is given by

$$C_{O_3^{WR},RF^{WR}}=T_{3,r}^0\otimes I\otimes I+e\otimes L_r^0\gamma\otimes I.$$
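The Kronecker structure of these blocks can be sketched numerically. The example below builds the block for the transition from middle degradation without repairperson to repairable failure, using `numpy.kron` and the values of the numerical example in Section 6 (middle-level repairable failure rates 0.24 and 0.27, $L_r^0=(0.08,0.08)'$, $\gamma=(1,0)$, $p=v=2$). It is a minimal sketch rather than the authors' implementation, and the variable names are assumptions.

```python
import numpy as np

# Values from the numerical example of Section 6.
T2r0 = np.array([[0.24], [0.27]])   # repairable failure rates, middle level (n2 x 1)
Lr0 = np.array([[0.08], [0.08]])    # repairable external shock rates (p x 1)
gamma = np.array([[1.0, 0.0]])      # initial vector of the shock time (1 x p)
p, v = 2, 2                         # orders of shock time and vacation time
I_p, I_v = np.eye(p), np.eye(v)
e_n2 = np.ones((2, 1))              # column of ones for the middle level

# Internal repairable failure: shock phase and vacation phase are kept.
internal = np.kron(np.kron(T2r0, I_p), I_v)
# External shock causing a repairable failure: the shock time restarts
# according to gamma, and the vacation phase is kept.
external = np.kron(np.kron(e_n2, Lr0 @ gamma), I_v)

C_O2WR_RF = internal + external

print(C_O2WR_RF.shape)        # rows: phases of O2^WR, columns: phases of RF^WR
print(C_O2WR_RF.sum(axis=1))  # total repairable failure rate from each phase
```

Each row sum is the total rate of repairable failure from the corresponding phase, that is, the internal rate of that phase plus the external rate 0.08.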

Matrix Block $D^{RF+CR}$

The matrix block $D^{RF+CR}$ contains the transition intensities from an operational state to repairable failure with immediate corrective repair. Therefore, it is only possible for the transition between the macro-states $O_2^{R}\to CR$, because the repairperson must be at the workplace. This matrix block is

$$D^{RF+CR}=\begin{pmatrix}
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&C_{O_2^{R},CR}\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0
\end{pmatrix}.$$

The matrix $C_{O_2^{R},CR}$ contains the transitions between the macro-states $O_2^{R}\to CR$. The repairperson is at the workplace and a repairable failure (internal or external) occurs from the middle degradation level. After the repairable failure occurs, a corrective repair begins, given that the repairperson is ready for it. The remainder does not change. Then,

$$C_{O_2^{R},CR}=T_{2,r}^0\otimes I\otimes\beta_1+e\otimes L_r^0\gamma\otimes\beta_1.$$

3.2 Transient Distribution

The system is modelled by the MMAP given in Section 3.1. Therefore, the Q-matrix associated with the Markov process that governs the system is

$$D=D^{0}+D^{RF+CR}+D^{RF}+D^{NRF+NU}+D^{NRF}+D^{I}+D^{I+CR}+D^{I+NU}.$$

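We assume that the system is new and that the repairperson is on vacation initially. Therefore, the initial distribution for the system with preventive maintenance is given by $\theta=(\alpha\otimes\omega\otimes\upsilon,\ 0)$, with $\omega$ being the stationary distribution of the phases of the external shock process. This is assumed because external shocks happen in a continuous way. Therefore, $\omega=(0,1)\left[(L+L^0\gamma)^*\,|\,e\right]^{-1}$, where $(0,1)$ is a row vector of zeros with a one in the last position and $M^*$ denotes the matrix $M$ without its first column.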
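The expression for $\omega$ can be checked numerically with the shock representation of the numerical example in Section 6, $\gamma=(1,0)$ and $L$ with rows $(-3, 2.9)$ and $(2.9, -3)$. The sketch below drops the first column of $L+L^0\gamma$, appends the normalisation column $e$ and inverts the result. Variable names are illustrative.

```python
import numpy as np

# PH representation of the time between shocks (Section 6).
gamma = np.array([[1.0, 0.0]])
L = np.array([[-3.0, 2.9],
              [2.9, -3.0]])
e = np.ones((2, 1))

L0 = -L @ e                      # shock rates from each phase
Q = L + L0 @ gamma               # generator of the phase process of the renewal process

# (L + L0*gamma)^* | e : remove the first column, append a column of ones.
M = np.hstack([Q[:, 1:], e])
rhs = np.array([[0.0, 1.0]])     # the row vector (0, 1)
omega = rhs @ np.linalg.inv(M)

print(omega)                     # stationary phase distribution of the shock process
print(omega @ Q)                 # numerically zero
```

For this representation the balance equation reduces to $2.9\,\omega_1=3\,\omega_2$, so $\omega=(3,\,2.9)/5.9$.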

The transient distribution is worked out from $P(t)=\exp(Dt)$, and the probability of being in each phase of the macro-state $E_i$, that is, $p_{E_i}(t)$, is given by the vector $p(t)=\theta\exp(Dt)$ restricted to the elements of the macro-state $E_i$.
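To illustrate the computation of $p(t)=\theta\exp(Dt)$, the sketch below evaluates the matrix exponential of a small hypothetical generator by uniformization, a standard technique for Markov chains. The three-state generator `Q` and the helper `expm_uniformization` are assumptions made for illustration and are not the generator $D$ of the paper, whose remaining blocks are given in Appendix B. In practice a library routine (the authors report using Matlab and the R package expm, and `scipy.linalg.expm` plays the same role in Python) would normally be used instead.

```python
import numpy as np

def expm_uniformization(Q, t, tol=1e-12, max_terms=100_000):
    """Approximate exp(Q t) for a generator (or sub-generator) Q."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    rate = -Q.diagonal().min()        # uniformization rate, >= every |q_ii|
    P = np.eye(n) + Q / rate          # (sub)stochastic matrix of the uniformized chain
    weight = np.exp(-rate * t)        # Poisson(rate * t) probability of 0 jumps
    power = np.eye(n)                 # P^k, starting with k = 0
    result = weight * power
    mass = weight                     # accumulated Poisson probability
    for k in range(1, max_terms):
        if 1.0 - mass <= tol:
            break
        weight *= rate * t / k
        power = power @ P
        result += weight * power
        mass += weight
    return result

# Hypothetical 3-state generator (rows sum to zero) and initial vector.
Q = np.array([[-2.0, 1.5, 0.5],
              [0.0, -1.0, 1.0],
              [3.0, 0.0, -3.0]])
theta = np.array([1.0, 0.0, 0.0])

p_t = theta @ expm_uniformization(Q, 2.0)   # transient distribution at t = 2
print(p_t, p_t.sum())
```

The same call applied to the full generator $D$ and initial vector $\theta$ would give $p(t)$, and slicing the result by macro-state would give each $p_{E_i}(t)$.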

3.3 The Stationary Distribution

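The stationary distribution is denoted as $\pi$ and it is partitioned according to the macro-state space $S$, with $\pi_i=\lim_{t\to\infty}p_{E_i}(t)$ being the sub-vector associated with the macro-state $E_i$. To ease the development, the generator of the process is denoted as

$$D=\begin{pmatrix}
D_{11}&D_{12}&0&D_{14}&D_{15}&D_{16}&0&0\\
0&D_{22}&D_{23}&D_{24}&D_{25}&D_{26}&0&0\\
D_{31}&0&D_{33}&0&0&0&D_{37}&D_{38}\\
0&0&0&D_{44}&D_{45}&D_{46}&D_{47}&0\\
0&0&0&0&D_{55}&0&0&D_{58}\\
D_{61}&0&0&0&0&D_{66}&0&0\\
D_{71}&0&0&0&0&0&D_{77}&0\\
D_{81}&0&0&0&0&0&0&D_{88}
\end{pmatrix}.$$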

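As is well known, the stationary distribution is the solution of the matrix balance equation $\pi D=0$ with the normalization condition $\pi e=1$.

In matrix form, the balance equations are given by

$$\begin{aligned}
\pi_1D_{11}+\pi_3D_{31}+\pi_6D_{61}+\pi_7D_{71}+\pi_8D_{81}&=0\\
\pi_1D_{12}+\pi_2D_{22}&=0\\
\pi_2D_{23}+\pi_3D_{33}&=0\\
\pi_1D_{14}+\pi_2D_{24}+\pi_4D_{44}&=0\\
\pi_1D_{15}+\pi_2D_{25}+\pi_4D_{45}+\pi_5D_{55}&=0\\
\pi_1D_{16}+\pi_2D_{26}+\pi_4D_{46}+\pi_6D_{66}&=0\\
\pi_3D_{37}+\pi_4D_{47}+\pi_7D_{77}&=0\\
\pi_3D_{38}+\pi_5D_{58}+\pi_8D_{88}&=0\\
\pi e&=1
\end{aligned}$$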

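The solution of this matrix system is given by

$$\pi_i=\pi_1R_i;\quad i=2,\ldots,8,$$

with

$$\begin{aligned}
R_2&=G_{12}\\
R_3&=R_2G_{23}\\
R_4&=G_{14}+R_2G_{24}\\
R_5&=G_{15}+R_2G_{25}+R_4G_{45}\\
R_6&=G_{16}+R_2G_{26}+R_4G_{46}\\
R_7&=R_3G_{37}+R_4G_{47}\\
R_8&=R_3G_{38}+R_5G_{58},
\end{aligned}$$

where $G_{jk}=-D_{jk}D_{kk}^{-1}$ for the corresponding case.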

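The vector $\pi_1$ is obtained from the first and last matrix equations,

$$\begin{aligned}
\pi_1D_{11}+\pi_3D_{31}+\pi_6D_{61}+\pi_7D_{71}+\pi_8D_{81}&=0\\
\pi e&=1
\end{aligned}$$

Then,

$$\pi_1=(0,1)\left[A^*\,\Big|\,e+\sum_{a=2}^{8}R_ae\right]^{-1},$$

with $A^*$ being the matrix $A$ without the first column and $A=D_{11}+R_3D_{31}+R_6D_{61}+R_7D_{71}+R_8D_{81}$.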
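The block elimination behind these expressions can be sketched on a much smaller case. The example below uses a hypothetical generator with only two macro-states (the blocks `D11`, `D12`, `D21`, `D22` are invented for illustration and are not those of the paper), computes $R_2=G_{12}=-D_{12}D_{22}^{-1}$, builds $A=D_{11}+R_2D_{21}$ and recovers $\pi_1$ from the matrix formed by $A^*$ and the normalisation column.

```python
import numpy as np

# Hypothetical generator partitioned into two macro-states of two phases each.
D11 = np.array([[-3.0, 1.0], [2.0, -4.0]])
D12 = np.array([[2.0, 0.0], [1.0, 1.0]])
D21 = np.array([[1.0, 0.0], [0.0, 2.0]])
D22 = np.array([[-2.0, 1.0], [1.0, -3.0]])
D = np.block([[D11, D12], [D21, D22]])

# Second balance equation: pi1 D12 + pi2 D22 = 0  =>  pi2 = pi1 R2.
R2 = -D12 @ np.linalg.inv(D22)

# First balance equation in terms of pi1 only: pi1 A = 0.
A = D11 + R2 @ D21

# Normalisation: pi1 e + pi2 e = pi1 (e + R2 e) = 1.
e = np.ones((2, 1))
norm_col = e + R2 @ e

# Replace the (redundant) first column of A by the normalisation column.
M = np.hstack([A[:, 1:], norm_col])
pi1 = np.array([[0.0, 1.0]]) @ np.linalg.inv(M)
pi2 = pi1 @ R2
pi = np.hstack([pi1, pi2])

print(pi)
print(pi @ D)   # numerically zero
```

The same pattern, with the seven matrices $R_2,\ldots,R_8$, only requires inverting the diagonal blocks $D_{kk}$ and one matrix of the order of the first macro-state, rather than the whole generator.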

4 Measures

Several measures of interest in the reliability field, such as the ROCOF, the availability, the reliability and the mean number of events of several types, are worked out in this section.

4.1 Availability

The availability is the probability that the system is operational at a certain time. It is given by

$$A(t)=\sum_{i=1}^{4}p_{E_i}(t)e.$$

For the stationary case it is $A=\sum_{i=1}^{4}\pi_ie$.

4.2 Reliability

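Several reliability functions may be defined for this system (time up to repairable failure, time up to non-repairable failure, or time up to the first instant at which the system is not operational). We define it through the first time that the unit is not operational. The probability distribution of this time is PH with representation $(\theta_W,D_W)$, where the subscript $W$ is used here only to distinguish these quantities, restricted to the operational macro-states, from the initial vector and generator of the full process:

$$\theta_W=(\alpha\otimes\omega\otimes\upsilon,\ 0);$$

$$D_W=\begin{pmatrix}
C^{1}_{O_1,O_1}+C^{2}_{O_1,O_1}&C_{O_1,O_2^{WR}}&0&C_{O_1,O_3^{WR}}\\
0&C_{O_2^{WR},O_2^{WR}}&C_{O_2^{WR},O_2^{R}}&C_{O_2^{WR},O_3^{WR}}\\
0&0&C_{O_2^{R},O_2^{R}}&0\\
0&0&0&C_{O_3^{WR},O_3^{WR}}
\end{pmatrix}.$$

The reliability function is given by $R(t)=\theta_W\exp(D_Wt)e$.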

4.3 $ROCOF_{RF}$ and $ROCOF_{NRF}$ (Rate of Occurrence of Repairable and Non-repairable Failure)

The rate of occurrence of repairable failure is the rate of undergoing a repairable failure at a certain time $t$. It is given by

$$ROCOF_{RF}(t)=p_{E_1}(t)C_{E_1E_5}e+p_{E_2}(t)C_{E_2E_5}e+p_{E_3}(t)C_{E_3E_8}e+p_{E_4}(t)C_{E_4E_5}e.$$

In stationary regime it is

$$ROCOF_{RF}=\pi_1C_{E_1E_5}e+\pi_2C_{E_2E_5}e+\pi_3C_{E_3E_8}e+\pi_4C_{E_4E_5}e.$$

Analogously, the rate of occurrence of non-repairable failure is defined as the rate of undergoing a non-repairable failure at a certain time $t$. It is given by

$$ROCOF_{NRF}(t)=p_{E_1}(t)C_{E_1E_6}e+p_{E_2}(t)C_{E_2E_6}e+p_{E_3}(t)C_{E_3E_1}e+p_{E_4}(t)C_{E_4E_6}e.$$

This measure in steady state is

$$ROCOF_{NRF}=\pi_1C_{E_1E_6}e+\pi_2C_{E_2E_6}e+\pi_3C_{E_3E_1}e+\pi_4C_{E_4E_6}e.$$

4.4 Mean Number of Events

The mean number of events of the types described in Section 3 is worked out from the MMAP. Given an event $Y$, the mean number of events up to time $t$ is given by

$$MN^{Y}(t)=\theta\int_0^t\exp(Du)\,du\,D^{Y}e=\theta\left(\exp(Dt)-I-te\pi\right)(D-e\pi)^{-1}D^{Y}e.$$

From this expression the mean number of events per unit of time in stationary regime is

$$MN^{Y}=\lim_{t\to\infty}\frac{MN^{Y}(t)}{t}=\lim_{t\to\infty}\frac{\theta\int_0^t\exp(Du)\,du\,D^{Y}e}{t}=\pi D^{Y}e.$$
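The closed form for $MN^{Y}(t)$ and its stationary limit can be justified in a few lines. The following sketch assumes that $D$ is an irreducible generator, so that $De=0$, $\pi D=0$, $\pi e=1$ and $D-e\pi$ is non-singular.

```latex
\begin{aligned}
\left(\int_0^t e^{Du}\,du\right)(D-e\pi)
  &= \int_0^t e^{Du}D\,du-\int_0^t e^{Du}e\,\pi\,du
   = \left(e^{Dt}-I\right)-t\,e\pi
   \qquad (\text{since } e^{Du}e=e),\\[4pt]
\int_0^t e^{Du}\,du
  &= \left(e^{Dt}-I-t\,e\pi\right)(D-e\pi)^{-1},\\[4pt]
e\pi\,(D-e\pi) &= -e\pi
  \;\Longrightarrow\; e\pi\,(D-e\pi)^{-1}=-e\pi,\\[4pt]
\lim_{t\to\infty}\frac{MN^{Y}(t)}{t}
  &= -\theta\,e\pi\,(D-e\pi)^{-1}D^{Y}e
   = \theta e\,\pi D^{Y}e
   = \pi D^{Y}e .
\end{aligned}
```

The limit uses the fact that $\exp(Dt)-I$ remains bounded, so only the term $-t\,e\pi$ contributes after division by $t$, together with $\theta e=1$.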

Therefore, depending on $D^{Y}$, the following measures are calculated.

Mean number of repairable failures: $D^{Y}=D^{RF+CR}+D^{RF}$

Mean number of non-repairable failures: $D^{Y}=D^{NRF}+D^{NRF+NU}$

Mean number of preventive maintenance actions: $D^{Y}=D^{PM}+D^{I+PM}$

Mean number of corrective repairs: $D^{Y}=D^{RF+CR}+D^{I+CR}$

Mean number of incorporations: $D^{Y}=D^{I+PM}+D^{I+NU}+D^{I}+D^{I+CR}$

Mean number of new units: $D^{Y}=D^{I+NU}+D^{NRF+NU}$

5 Cost and Rewards

The system described is subject to different events that can produce costs and rewards according to the macro-states defined. While the system is operational, a reward equal to $B$ per unit of time is obtained and, analogously, while the system is not operational, a cost equal to $A$ per unit of time is incurred. A cost is also incurred while the system is operational. This cost depends on the phases of the internal degradation level. It is given by the column vectors $c_1$, $c_2$ and $c_3$ for the minor, middle and major levels, respectively.

If the system is in macro-state PM or CR, the repairperson produces a cost per unit of time depending on the corresponding repair phases. This cost is given by the vectors $c_{PM}$ and $c_{CR}$, respectively. Also, if the repairperson is at the workplace but idle, a cost equal to $r_S$ per unit of time is incurred.

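The net reward vectors according to the macro-states are given as follows,

$$\begin{aligned}
nr_{O_1}&=Be_{n_1pv}-c_1\otimes e_{pv}, & nr_{O_2^{WR}}&=Be_{n_2pv}-c_2\otimes e_{pv},\\
nr_{O_2^{R}}&=(B-r_S)e_{n_2p}-c_2\otimes e_{p}, & nr_{O_3^{WR}}&=Be_{n_3pv}-c_3\otimes e_{pv},\\
nr_{RF}&=-Ae_{pv}, & nr_{NRF}&=-Ae_{pv},\\
nr_{PM}&=-Ae_{pm_2}-e_{p}\otimes c_{PM}, & nr_{CR}&=-Ae_{pm_1}-e_{p}\otimes c_{CR}.
\end{aligned}$$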

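The net reward vector by considering the phases of the system is the column vector

$$nr=\begin{pmatrix}nr_{O_1}\\ nr_{O_2^{WR}}\\ nr_{O_2^{R}}\\ nr_{O_3^{WR}}\\ nr_{RF^{WR}}\\ nr_{NRF^{WR}}\\ nr_{PM}\\ nr_{CR}\end{pmatrix}.$$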
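The stacked net reward vector can be assembled directly with Kronecker products. The sketch below uses the orders and the values $B=A=100$, $r_S=0.5$ and level costs 0.1, 0.5 and 1 from the numerical example in Section 6, assuming that the operational cost is the same for every phase of a level. The repair cost vectors `c_PM` and `c_CR` are set to 1.5 and 9.5 in every phase, which is only one possible reading of the example and should be treated as an assumption.

```python
import numpy as np

# Orders and rewards/costs from the numerical example (Section 6).
n1, n2, n3, p, v, m1, m2 = 2, 2, 3, 2, 2, 2, 2
B, A, r_S = 100.0, 100.0, 0.5
c1 = np.full(n1, 0.1)     # cost per unit of time, minor level (assumed constant per phase)
c2 = np.full(n2, 0.5)     # middle level
c3 = np.full(n3, 1.0)     # major level
c_PM = np.full(m2, 1.5)   # assumption: PM cost per unit of time in each phase
c_CR = np.full(m1, 9.5)   # assumption: CR cost per unit of time in each phase

ones = np.ones
nr = np.concatenate([
    B * ones(n1 * p * v) - np.kron(c1, ones(p * v)),        # O1
    B * ones(n2 * p * v) - np.kron(c2, ones(p * v)),        # O2^WR
    (B - r_S) * ones(n2 * p) - np.kron(c2, ones(p)),        # O2^R (idle repairperson)
    B * ones(n3 * p * v) - np.kron(c3, ones(p * v)),        # O3^WR
    -A * ones(p * v),                                       # RF^WR
    -A * ones(p * v),                                       # NRF^WR
    -A * ones(p * m2) - np.kron(ones(p), c_PM),             # PM
    -A * ones(p * m1) - np.kron(ones(p), c_CR),             # CR
])

print(nr.shape)
print(nr)
```

The order of the factors in each Kronecker product follows the order of the indices in the corresponding macro-state: the internal phase comes first in the operational macro-states, and the repair phase comes last in PM and CR.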

Once the net reward vector has been defined according to the state space, the expected net reward up to a certain time $t$ is given by

$$\Phi(t)=\theta\int_0^tP(u)\,du\,nr.$$

This measure per unit of time is $\Phi(t)/t$, and its value in stationary regime is given by $\Phi=\pi\,nr$. It can be interpreted as the net reward per unit of time when the system is in equilibrium.

Other costs associated with different events are added to the model. These are,

$f_{NU}$: fixed cost per new unit

$f_{CR}$: fixed cost per corrective repair

$f_{PM}$: fixed cost per preventive maintenance

$f_{I}$: fixed cost per incorporation from vacation

Therefore, the total expected net reward up to a certain time $t$ is

$$\Psi(t)=\Phi(t)-\left(1+MN^{NU}(t)\right)f_{NU}-MN^{CR}(t)f_{CR}-MN^{PM}(t)f_{PM}-MN^{I}(t)f_{I}.$$

This measure per unit of time up to a certain time $t$ is $\Gamma(t)=\Psi(t)/t$, and its value in stationary regime is

$$\Gamma=\lim_{t\to\infty}\frac{\Psi(t)}{t}=\Phi-MN^{NU}f_{NU}-MN^{CR}f_{CR}-MN^{PM}f_{PM}-MN^{I}f_{I}.$$

6 Numerical Example: An Optimization Problem

One interesting problem in reliability is the optimization of systems. In this section, two similar systems, with and without preventive maintenance, are optimised according to the vacation time distribution. Both cases are developed and the optimum systems are compared. The general system consists of multiple internal stages and they are partitioned into minor, middle and major degradation level depending on the damage. In particular, there are seven states partitioned as follows: 1-2, minor degradation level; 3-4, middle degradation level and 5-6-7, major degradation level (if it is observed, the repairperson sends it to preventive maintenance). The repair facility is composed of one sole repairperson. This repairperson can take vacations and the vacation time is random for the general case.

The internal operational time is PH distributed with representation $(\alpha,T)$, where $\alpha=(1,0,0,0,0,0,0)$ and

$$T=\begin{pmatrix}
-1&0.51&0.24&0.25&0&0&0\\
1.2&-2&0.5&0.3&0&0&0\\
0&0&-0.8&0.2&0&0.16&0.16\\
0&0&0.225&-0.9&0.11&0.11&0.14\\
0&0&0&0&-0.4&0.03&0.07\\
0&0&0&0&0.1&-0.9&0.125\\
0&0&0&0&0.07&0.03&-0.4
\end{pmatrix}.$$

The internal repairable and non-repairable failures are governed by the following column vectors, respectively,

$$T_r^0=(0,\ 0,\ 0.24,\ 0.27,\ 0.28,\ 0.63,\ 0.28)'\quad\text{and}\quad T_{nr}^0=(0,\ 0,\ 0.04,\ 0.045,\ 0.02,\ 0.045,\ 0.02)'.$$

The system is exposed to random external shocks with PH representation $(\gamma,L)$, with $\gamma=(1,0)$ and

$$L=\begin{pmatrix}-3&2.9\\ 2.9&-3\end{pmatrix}.$$

The shock can provoke repairable or non-repairable failure according to these transition rates,

$$L_r^0=(0.08,\ 0.08)'\quad\text{and}\quad L_{nr}^0=(0.02,\ 0.02)'.$$
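The data above can be checked for internal consistency: by Assumption 1 and Assumption 2, the failure vectors must satisfy $T^0=-Te=T_r^0+T_{nr}^0$ and $L^0=-Le=L_r^0+L_{nr}^0$, and $T$ must be block upper triangular with respect to the levels $\{1,2\}$, $\{3,4\}$ and $\{5,6,7\}$. The following sketch performs these checks; variable names are illustrative.

```python
import numpy as np

T = np.array([
    [-1.0, 0.51, 0.24, 0.25, 0.0, 0.0, 0.0],
    [1.2, -2.0, 0.5, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, -0.8, 0.2, 0.0, 0.16, 0.16],
    [0.0, 0.0, 0.225, -0.9, 0.11, 0.11, 0.14],
    [0.0, 0.0, 0.0, 0.0, -0.4, 0.03, 0.07],
    [0.0, 0.0, 0.0, 0.0, 0.1, -0.9, 0.125],
    [0.0, 0.0, 0.0, 0.0, 0.07, 0.03, -0.4],
])
Tr0 = np.array([0.0, 0.0, 0.24, 0.27, 0.28, 0.63, 0.28])
Tnr0 = np.array([0.0, 0.0, 0.04, 0.045, 0.02, 0.045, 0.02])

L = np.array([[-3.0, 2.9], [2.9, -3.0]])
Lr0 = np.array([0.08, 0.08])
Lnr0 = np.array([0.02, 0.02])

# Exit vectors T0 = -T e and L0 = -L e.
T0 = -T @ np.ones(7)
L0 = -L @ np.ones(2)

internal_ok = np.allclose(T0, Tr0 + Tnr0)
external_ok = np.allclose(L0, Lr0 + Lnr0)

# Block structure: no transitions back to a lower degradation level.
T21, T31, T32 = T[2:4, 0:2], T[4:7, 0:2], T[4:7, 2:4]
triangular_ok = not (T21.any() or T31.any() or T32.any())

print(internal_ok, external_ok, triangular_ok)
```

All three checks hold for the values given in this section. In particular, no failure can occur directly from the minor degradation level by internal causes, since the first two entries of $T^0$ are zero.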

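The corrective repair and preventive maintenance times are phase-type distributed with representations $(\beta_1,S_1)$ and $(\beta_2,S_2)$, respectively, where

$$\beta_1=(1,0),\quad \beta_2=(1,0),\quad S_1=\begin{pmatrix}-1&0.5\\ 0.5&-1\end{pmatrix},\quad S_2=\begin{pmatrix}-2&0.005\\ 0.005&-2\end{pmatrix}.$$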

Rewards and costs are introduced in the problem. A profit of $B=100$ monetary units (m.u.) per unit of time is obtained while the system is operational, and a cost of the same amount, $A=100$, is incurred per unit of time while it is not operational. While the system is operational, a cost is incurred depending on the degradation level: 0.1 m.u., 0.5 m.u. and 1 m.u. for the minor, middle and major levels, respectively. For each unit of time that the repairperson is idle at the workplace, a cost equal to $r_S=0.5$ m.u. is incurred. This amount increases when the repairperson is working. If the repairperson is engaged in preventive maintenance, the cost increases by 1.5 m.u., and by 9.5 m.u. for corrective repair.

In the following, we examine how the repairperson’s vacation time should be distributed to optimise net rewards. To do so, it is assumed that the distribution of the vacation time is phase-type (gamma distribution) with representation

$$\upsilon=(1,0);\quad V=\begin{pmatrix}-\lambda_1&\lambda_1\\ 0&-\lambda_2\end{pmatrix}.$$

To obtain the optimum model, the net reward per unit of time in stationary regime is maximised,

$$(\hat\lambda_1,\hat\lambda_2)\ \text{ s.t. }\ \Gamma(\hat\lambda_1,\hat\lambda_2)=\sup_{\lambda_1,\lambda_2}\Gamma(\lambda_1,\lambda_2).$$

These values are $\hat\lambda_1=5.8003$ and $\hat\lambda_2=5.8003$ for the case with preventive maintenance (maximum net profit in stationary regime of 2.0943 m.u. per unit of time), and $\hat\lambda_{1,smp}=5.4502$ and $\hat\lambda_{2,smp}=5.4502$ for the scenario without preventive maintenance.
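To give a feeling for what these rates mean, the mean vacation time implied by each optimum can be computed from the PH representation, using the standard formula $-\upsilon V^{-1}e$ for the mean of a PH distribution. For the two-phase structure above this equals $1/\lambda_1+1/\lambda_2$. The helper names below are illustrative, not part of the original implementation.

```python
import numpy as np

def ph_mean(alpha, S):
    """Mean of a PH distribution with representation (alpha, S): -alpha S^{-1} e."""
    e = np.ones(S.shape[0])
    return float(-alpha @ np.linalg.solve(S, e))

def vacation_representation(lam1, lam2):
    """Two-phase vacation time: phase 1 (rate lam1) followed by phase 2 (rate lam2)."""
    upsilon = np.array([1.0, 0.0])
    V = np.array([[-lam1, lam1],
                  [0.0, -lam2]])
    return upsilon, V

# Optimal rates reported above.
mean_with_pm = ph_mean(*vacation_representation(5.8003, 5.8003))
mean_without_pm = ph_mean(*vacation_representation(5.4502, 5.4502))

print(round(mean_with_pm, 4), round(mean_without_pm, 4))
```

Since both optimal rates coincide in each scenario, the optimal vacation time is Erlang distributed with two phases, and the optimal mean vacation is slightly shorter when preventive maintenance is included.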

For both optimum systems, the cumulative net profit per unit of time is compared in Figure 1. It is observed that the system with preventive maintenance is always better than the system without preventive maintenance from an economic point of view. The system without preventive maintenance is in deficit at any time, whereas the system with preventive maintenance is profitable from time 433.17 onwards.

Figure 1 Cumulative net profit per unit of time over time (with preventive maintenance, continuous line; without preventive maintenance, dashed line).

Several performance measures, such as the availability and the mean number of events over time, have been worked out. Figure 2 shows the availability for both cases.

Figure 2 Availability for the optimum systems (with preventive maintenance, continuous line; without preventive maintenance, dashed line).

Table 1 shows the stationary distribution for both models. These values can be interpreted as the proportion of time in each macro-state.

Table 1 Stationary distribution by macro-state (values without preventive maintenance in parentheses)

| $\pi_{O1}e$ | $\pi_{O2WR}e$ | $\pi_{O2R}e$ | $\pi_{O3WR}e$ | $\pi_{RF}e$ | $\pi_{NRF}e$ | $\pi_{PM}e$ | $\pi_{CR}e$ |
|---|---|---|---|---|---|---|---|
| 0.3851 (0.2909) | 0.0502 (0.0407) | 0.2387 (0.3304) | 0.0038 | 0.0133 (0.0100) | 0.0030 (0.0023) | 0.0479 | 0.2581 (0.3257) |
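Because the entries of Table 1 are proportions of time, the stationary availability can be read off by adding the operational macro-states. The short Python sketch below does this; it assumes, from the naming of the macro-states, that O1, O2WR, O2R and O3WR are the operational ones, and the dictionary keys are illustrative labels rather than notation from the paper.

```python
# Stationary macro-state probabilities from Table 1.
with_pm = {"O1": 0.3851, "O2WR": 0.0502, "O2R": 0.2387, "O3WR": 0.0038,
           "RF": 0.0133, "NRF": 0.0030, "PM": 0.0479, "CR": 0.2581}
without_pm = {"O1": 0.2909, "O2WR": 0.0407, "O2R": 0.3304,
              "RF": 0.0100, "NRF": 0.0023, "CR": 0.3257}

# Assumption: macro-states whose label starts with "O" are operational.
OPERATIONAL = ("O1", "O2WR", "O2R", "O3WR")

def availability(pi):
    """Stationary availability: total probability of the operational macro-states."""
    return sum(p for state, p in pi.items() if state in OPERATIONAL)

for label, pi in (("with PM", with_pm), ("without PM", without_pm)):
    print(f"{label}: total = {sum(pi.values()):.4f}, availability = {availability(pi):.4f}")
```

Under that assumption the tabulated values give a stationary availability of about 0.678 with preventive maintenance and about 0.662 without it; the totals differ from 1 only by rounding.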

Finally, several measures, such as the ROCOF and the mean number of events in the transient and stationary regimes, are compared in Table 2.

Table 2 ROCOF and mean number of events (values without preventive maintenance in parentheses)

| Measure | t=1 | t=5 | t=10 | t=50 | t=∞ (stationary) |
|---|---|---|---|---|---|
| $ROCOF_{RF}(t)$ | 0.1423 (0.1602) | 0.1315 (0.1688) | 0.1291 (0.1628) | 0.1290 (0.1629) | 0.1290 (0.1629) |
| $ROCOF_{NRF}(t)$ | 0.0292 (0.0308) | 0.0263 (0.0272) | 0.0259 (0.0263) | 0.0259 (0.0264) | 0.0259 (0.0264) |
| $MN_{RF}(t)$ | 0.1201 (0.1262) | 0.6764 (0.8332) | 1.3247 (1.6540) | 6.4860 (8.1686) | 0.1290 (0.1629) |
| $MN_{NRF}(t)$ | 0.0261 (0.0266) | 0.1376 (0.1458) | 0.2676 (0.2782) | 1.3027 (1.3326) | 0.0259 (0.0264) |
| $MN_{PM}(t)$ | 0.0487 | 0.4614 | 0.9429 | 4.7694 | 0.0957 |
| $MN_{CR}(t)$ | 0.0978 (0.1042) | 0.6631 (0.8235) | 1.3114 (1.6440) | 6.4727 (8.1586) | 0.1290 (0.1629) |
| $MN_{I}(t)$ | 2.1841 (2.1750) | 7.6818 (6.6759) | 13.8847 (11.3160) | 63.5153 (48.7966) | 1.2408 (0.9372) |
| $MN_{NU}(t)$ | 0.0210 (0.0217) | 0.1347 (0.1436) | 0.2646 (0.2760) | 1.2997 (1.3303) | 0.0210 (0.0264) |
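One way to read Table 2 is that, once the ROCOF has essentially stabilised (it has by t=10), the mean number of events should grow linearly at the stationary rate. The Python sketch below checks this for the system with preventive maintenance by comparing the slope between t=10 and t=50 with the stationary entry of each row. The interpretation of the stationary column as a rate per unit of time is an assumption, and the row labels are illustrative.

```python
# (mean number at t=10, mean number at t=50, stationary entry), with PM, from Table 2.
rows = {
    "RF":  (1.3247, 6.4860, 0.1290),
    "NRF": (0.2676, 1.3027, 0.0259),
    "PM":  (0.9429, 4.7694, 0.0957),
    "CR":  (1.3114, 6.4727, 0.1290),
    "I":   (13.8847, 63.5153, 1.2408),
    "NU":  (0.2646, 1.2997, 0.0210),
}

def late_slope(mn10, mn50):
    """Average number of events per unit of time between t=10 and t=50."""
    return (mn50 - mn10) / 40.0

relative_gap = {}
for name, (mn10, mn50, rate) in rows.items():
    slope = late_slope(mn10, mn50)
    relative_gap[name] = (slope - rate) / rate
    print(f"{name:>3}: slope = {slope:.4f}, stationary = {rate:.4f}, "
          f"relative gap = {relative_gap[name]:+.1%}")
```

For the tabulated values, every row agrees to well within one percent except the new-unit row: its slope (about 0.0259) matches the non-repairable failure rate rather than the printed stationary value 0.0210, so that entry may be worth checking against the original computation.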

7 Conclusions

This paper presents two complex multi-state systems subject to various types of failure, one of which includes preventive maintenance. These systems are composed of several internal degradation levels and are subject to internal failures and random external shocks. Both internal failures and external shocks may provoke a repairable or a non-repairable failure. A vacation policy for the repairperson is included in the model, which is optimized from a financial point of view. Both systems are modelled using a Markovian arrival process with marked arrivals (MMAP) in an algorithmic and computational form.

It is shown that PH distributions and MMAPs enable the model and its associated measures to be expressed in a well-structured way. Costs and rewards are included in the model and several associated measures are worked out. One measure of particular interest, the net reward per unit of time, is built and used to optimize the systems with respect to the vacation time distribution. A numerical example, in which similar systems with and without preventive maintenance are optimized and compared, illustrates the versatility of the model.

Acknowledgements

This paper is supported by the project FQM-307 of the Government of Andalusia (Spain), by the project PID2020-113961GB-I00 of the Spanish Ministry of Science and Innovation (also supported by the European Regional Development Fund program, ERDF) and the project A-FQM-66-UGR20 of the Ministry of Knowledge, Research and University, Junta de Andalucía (Spain). Also, the authors acknowledge financial support by the IMAG–María de Maeztu grant CEX2020-001105-M/AEI/10.13039/501100011033.

Appendix A

In this appendix, the matrix blocks and the stationary regime are given for the case without preventive maintenance addressed in Section 2.1. The state space is described in that section. The events associated with this system are

Events

0: No events (no PM, no failure, no end of vacation)

RF+CR: Repairable failure and corrective repair

RF: Repairable failure without immediate corrective repair

NRF+NU: Non-repairable failure and new unit

NRF: Non-repairable failure without immediate new unit

I: Return from a vacation period

I+CR: Return from a vacation period and corrective repair

I+NU: Return from a vacation period and new unit

Therefore, the MMAP has the following representation

$$(D^{0}, D^{RF+CR}, D^{RF}, D^{NRF+NU}, D^{NRF}, D^{I}, D^{I+CR}, D^{I+NU}).$$

The block matrices are

DO=(CO1O11CO1O2WR00000CO2WRO2WR000000CO2RO2R000000CRFWRRFWR000000CNRFWRNRFWR0CCRO10000CCRCR)
DRF+CR=(00000000000000000CO2RCR000000000000000000),
DRF=(000CO1RFWR00000CO2WRRFWR00000000000000000000000000)
DNRF+NU=(000000000000CO2RO100000000000000000000000)
DNRF=(0000CO1NRFWR00000CO2WRNRFWR0000000000000000000000000)
DI=(CO1O120000000CO2WRO2R000000000000000000000000000),
DI+CR=(00000000000000000000000CRFWRCR000000000000),
DI+NU=(000000000000000000000000CNRFO100000000000),

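where the matrix blocks are given below, with $\otimes$ and $\oplus$ denoting the Kronecker product and the Kronecker sum, respectively:

$$\begin{aligned}
C^{1}_{O1O1} &= (T_{11} \oplus L) \otimes I + I \otimes I \otimes V; \\
C^{2}_{O1O1} &= I \otimes I \otimes V^{0}\upsilon; \quad C_{O1O2WR} = T_{12} \otimes I \otimes I; \\
C_{O1RFWR} &= T^{0}_{1,r} \otimes I \otimes I + e \otimes L^{0}_{r}\gamma \otimes I; \\
C_{O1NRFWR} &= T^{0}_{1,nr} \otimes I \otimes I + e \otimes L^{0}_{nr}\gamma \otimes I; \\
C_{O2WRO2WR} &= (T_{22} \oplus L) \otimes I + I \otimes I \otimes V; \\
C_{O2WRO2R} &= I \otimes I \otimes V^{0}; \\
C_{O2WRRFWR} &= T^{0}_{2,r} \otimes I \otimes I + e \otimes L^{0}_{r}\gamma \otimes I; \\
C_{O2WRNRFWR} &= T^{0}_{2,nr} \otimes I \otimes I + e \otimes L^{0}_{nr}\gamma \otimes I; \\
C_{O2RO1} &= T^{0}_{2,nr}\alpha_{1} \otimes I \otimes \upsilon + e\alpha_{1} \otimes L^{0}_{nr}\gamma \otimes \upsilon; \\
C_{O2RO2R} &= T_{22} \otimes I + I \otimes L; \\
C_{O2RCR} &= T^{0}_{2,r} \otimes I \otimes \beta_{1} + e \otimes L^{0}_{r}\gamma \otimes \beta_{1}; \\
C_{RFWRRFWR} &= (L + L^{0}\gamma) \oplus V; \quad C_{RFWRCR} = I \otimes V^{0} \otimes \beta_{1}; \\
C_{NRFWRO1} &= \alpha_{1} \otimes I \otimes V^{0}\upsilon; \quad C_{NRFWRNRFWR} = (L + L^{0}\gamma) \oplus V; \\
C_{CRO1} &= \alpha_{1} \otimes I \otimes \upsilon \otimes S_{1}^{0}; \quad C_{CRCR} = (L + L^{0}\gamma) \oplus S_{1}.
\end{aligned}$$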
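The blocks leaving macro-state RFWR involve only quantities given in the numerical example, so they can be assembled directly. The sketch below (Python with NumPy; the helper `kron_sum` and the variable names are hypothetical) reads the two blocks as $(L + L^{0}\gamma) \oplus V$ and $I \otimes V^{0} \otimes \beta_{1}$, builds them with Kronecker operations, and checks that their rows jointly sum to zero, as required of a row of blocks in a generator. The vacation rates are set to the reported optimum 5.8003 purely for illustration.

```python
import numpy as np

def kron_sum(A, B):
    """Kronecker sum A (+) B = A (x) I + I (x) B for square A and B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)

# Shock process (gamma, L) and its exit-rate column vector L0 = -L e.
gamma = np.array([[1.0, 0.0]])
L = np.array([[-3.0, 2.9],
              [2.9, -3.0]])
L0 = -L @ np.ones((2, 1))

# Vacation time generator V and exit-rate column vector V0 = -V e.
lam1 = lam2 = 5.8003
V = np.array([[-lam1, lam1],
              [0.0, -lam2]])
V0 = -V @ np.ones((2, 1))

# Initial distribution of the corrective repair time.
beta1 = np.array([[1.0, 0.0]])

# Stay in RFWR: shocks keep arriving (and restart), vacation phases evolve.
C_RFWR_RFWR = kron_sum(L + L0 @ gamma, V)
# Vacation ends and corrective repair starts: RFWR -> CR.
C_RFWR_CR = np.kron(np.eye(2), np.kron(V0, beta1))

row_sums = (C_RFWR_RFWR + C_RFWR_CR).sum(axis=1)
print(C_RFWR_RFWR.shape, C_RFWR_CR.shape, row_sums)
```

The same pattern, `np.kron` for products and a small Kronecker-sum helper, extends to the remaining blocks once the internal matrices $T_{ij}$ and $\alpha_{1}$ from the earlier sections are supplied.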

Stationary Distribution

The stationary distribution for the case without preventive maintenance is also partitioned according to the macro-state space S, composed of six macro-states in this case. The process generator is denoted as

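$$D=\begin{pmatrix}
D_{11} & D_{12} & 0 & D_{14} & D_{15} & 0\\
0 & D_{22} & D_{23} & D_{24} & D_{25} & 0\\
D_{31} & 0 & D_{33} & 0 & 0 & D_{36}\\
0 & 0 & 0 & D_{44} & 0 & D_{46}\\
D_{51} & 0 & 0 & 0 & D_{55} & 0\\
D_{61} & 0 & 0 & 0 & 0 & D_{66}
\end{pmatrix}.$$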
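The zero pattern of D is the union of the block positions of the event matrices listed earlier in this appendix. The following Python sketch makes that bookkeeping explicit; the macro-state ordering (O1, O2WR, O2R, RFWR, NRFWR, CR) is an assumption inferred from the block subscripts, and the dictionary simply transcribes where each event matrix has a non-zero block.

```python
# Assumed ordering of the six macro-states (inferred from the block subscripts).
STATES = ["O1", "O2WR", "O2R", "RFWR", "NRFWR", "CR"]

# Non-zero block positions (from, to) of each event matrix in Appendix A.
EVENT_BLOCKS = {
    "0":      [("O1", "O1"), ("O1", "O2WR"), ("O2WR", "O2WR"), ("O2R", "O2R"),
               ("RFWR", "RFWR"), ("NRFWR", "NRFWR"), ("CR", "O1"), ("CR", "CR")],
    "RF+CR":  [("O2R", "CR")],
    "RF":     [("O1", "RFWR"), ("O2WR", "RFWR")],
    "NRF+NU": [("O2R", "O1")],
    "NRF":    [("O1", "NRFWR"), ("O2WR", "NRFWR")],
    "I":      [("O1", "O1"), ("O2WR", "O2R")],
    "I+CR":   [("RFWR", "CR")],
    "I+NU":   [("NRFWR", "O1")],
}

# Zero pattern of the generator D as printed (1 = non-zero block).
D_PATTERN = [
    [1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
]

def pattern_from_events(states, event_blocks):
    """Union of the non-zero block positions over all event matrices."""
    index = {state: i for i, state in enumerate(states)}
    pattern = [[0] * len(states) for _ in states]
    for blocks in event_blocks.values():
        for src, dst in blocks:
            pattern[index[src]][index[dst]] = 1
    return pattern

print(pattern_from_events(STATES, EVENT_BLOCKS) == D_PATTERN)
```

For instance, the block $D_{11}$ collects both the no-event block and the return-from-vacation block on O1, while $D_{36}$ comes only from the event RF+CR.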

The balance equations are given by

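$$\begin{aligned}
\pi_1 D_{11} + \pi_3 D_{31} + \pi_5 D_{51} + \pi_6 D_{61} &= 0 \\
\pi_1 D_{12} + \pi_2 D_{22} &= 0 \\
\pi_2 D_{23} + \pi_3 D_{33} &= 0 \\
\pi_1 D_{14} + \pi_2 D_{24} + \pi_4 D_{44} &= 0 \\
\pi_1 D_{15} + \pi_2 D_{25} + \pi_5 D_{55} &= 0 \\
\pi_3 D_{36} + \pi_4 D_{46} + \pi_6 D_{66} &= 0 \\
\pi e &= 1
\end{aligned}$$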

The solution of this matrix system is given by

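$$\pi_i = \pi_1 R_i; \quad i = 2, \ldots, 6$$

with

$$\begin{aligned}
R_2 &= G_{12}, \quad R_3 = R_2 G_{23}, \quad R_4 = G_{14} + R_2 G_{24}, \\
R_5 &= G_{15} + R_2 G_{25}, \quad R_6 = R_3 G_{36} + R_4 G_{46},
\end{aligned}$$

where $G_{jk} = -D_{jk} D_{kk}^{-1}$ for the corresponding case.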

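The vector $\pi_1$ is obtained from the first and last matrix equations,

$$\begin{aligned}
\pi_1 D_{11} + \pi_3 D_{31} + \pi_5 D_{51} + \pi_6 D_{61} &= 0 \\
\pi e &= 1
\end{aligned}$$

Then,

$$\pi_1 = (\mathbf{0}, 1)\left(A^{*} \,\Big|\, \Big(I + \sum_{a=2}^{6} R_a\Big) e\right)^{-1}$$

with $A^{*}$ being the matrix $A$ without the first column and $A = D_{11} + R_3 D_{31} + R_5 D_{51} + R_6 D_{61}$.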
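The block elimination above can be exercised on a toy generator with the same zero pattern. The sketch below (Python with NumPy) fills the allowed blocks with random positive rates, applies the recursion for $R_2, \ldots, R_6$ and the formula for $\pi_1$, and checks the result against the full balance equations. The block size `m`, the random rates and all variable names are assumptions made for illustration; they are not the matrices of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2            # hypothetical common block size (phases per macro-state)
n_states = 6
e = np.ones(m)

# Non-zero blocks of D (1-based macro-state indices), as in the printed pattern.
NONZERO = {(1, 1), (1, 2), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5),
           (3, 1), (3, 3), (3, 6), (4, 4), (4, 6), (5, 1), (5, 5), (6, 1), (6, 6)}

# Toy generator: random positive rates on allowed blocks, zero elsewhere.
D = {}
for j in range(1, n_states + 1):
    for k in range(1, n_states + 1):
        D[j, k] = rng.uniform(0.1, 1.0, (m, m)) if (j, k) in NONZERO else np.zeros((m, m))
for j in range(1, n_states + 1):
    np.fill_diagonal(D[j, j], 0.0)
    row_totals = sum(D[j, k].sum(axis=1) for k in range(1, n_states + 1))
    D[j, j] -= np.diag(row_totals)      # rows of the full generator sum to zero

def G(j, k):
    """G_jk = -D_jk D_kk^{-1}."""
    return -D[j, k] @ np.linalg.inv(D[k, k])

# Recursion for the R matrices (pi_i = pi_1 R_i).
R = {2: G(1, 2)}
R[3] = R[2] @ G(2, 3)
R[4] = G(1, 4) + R[2] @ G(2, 4)
R[5] = G(1, 5) + R[2] @ G(2, 5)
R[6] = R[3] @ G(3, 6) + R[4] @ G(4, 6)

# pi_1 from pi_1 (A* | (I + sum R_a) e) = (0, ..., 0, 1).
A = D[1, 1] + R[3] @ D[3, 1] + R[5] @ D[5, 1] + R[6] @ D[6, 1]
norm_col = e + sum(R[i] @ e for i in R)
M = np.column_stack([A[:, 1:], norm_col])
rhs = np.zeros(m)
rhs[-1] = 1.0
pi1 = np.linalg.solve(M.T, rhs)         # row-vector system pi1 M = rhs

# Full stationary vector and verification against pi D = 0, pi e = 1.
pi = np.concatenate([pi1] + [pi1 @ R[i] for i in range(2, n_states + 1)])
full = np.block([[D[j, k] for k in range(1, n_states + 1)]
                 for j in range(1, n_states + 1)])
residual = float(np.abs(pi @ full).max())
print(pi.sum(), residual)
```

The appeal of this scheme is that only the diagonal blocks $D_{kk}$ and one matrix of the size of the first macro-state are inverted, instead of the full generator.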

Appendix B

This appendix contains the rest of the block-matrices for the system with preventive maintenance described in Section 3.1.

DO=(CO1O11CO1O2WR0CO1O3WR00CO2WRO2WR0CO2WRO3WR000CO2RO2R00000CO3WRO3WR00000CRFWRRFWR00000CPMO10000CCRO10000
000000000000000CNRFWRNRFWR000CPMPM000CCRCR)
DPM=(0000000000000000000000CO2RPM00000000000000000000000000000000000000000)
DI+PM=(000000000000000000000000000000CO3WRPM000000000000000000000000000000000);
DI+NU=(0000000000000000000000000000000000000000CNRFWRO100000000000000000000000)
DNRF=(00000CO1NRFWR0000000CO2WRNRFWR000000000000000CO3WRNRFWR0000000000000000000000000000000000)
DNRF+NU=(0000000000000000CO2RO100000000000000000000000000000000000000000000000)
DI=(CO1O12000000000CO2WRO2R00000000000000000000000000000000000000000000000000000)
DI+CR=(000000000000000000000000000000000000000CRFWRCR000000000000000000000000).
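The matrix blocks are given below, with $\otimes$ and $\oplus$ denoting the Kronecker product and the Kronecker sum, respectively:

$$\begin{aligned}
C^{1}_{O1O1} &= (T_{11} \oplus L) \otimes I + I \otimes I \otimes V; \quad C^{2}_{O1O1} = I \otimes I \otimes V^{0}\upsilon; \\
C_{O1O2WR} &= T_{12} \otimes I \otimes I; \quad C_{O1O3WR} = T_{13} \otimes I \otimes I; \\
C_{O1NRFWR} &= T^{0}_{1,nr} \otimes I \otimes I + e \otimes L^{0}_{nr}\gamma \otimes I; \\
C_{O2WRO2WR} &= (T_{22} \oplus L) \otimes I + I \otimes I \otimes V; \quad C_{O2WRO2R} = I \otimes I \otimes V^{0}; \\
C_{O2WRO3WR} &= T_{23} \otimes I \otimes I; \\
C_{O2WRNRFWR} &= T^{0}_{2,nr} \otimes I \otimes I + e \otimes L^{0}_{nr}\gamma \otimes I; \\
C_{O2RO1} &= T^{0}_{2,nr}\alpha_{1} \otimes I \otimes \upsilon + e\alpha_{1} \otimes L^{0}_{nr}\gamma \otimes \upsilon; \\
C_{O2RO2R} &= T_{22} \otimes I + I \otimes L; \\
C_{O2RPM} &= T_{23}e \otimes I \otimes \beta_{2}; \\
C_{O3WRO3WR} &= (T_{33} \oplus L) \otimes I + I \otimes I \otimes V; \\
C_{O3WRRFWR} &= T^{0}_{3,r} \otimes I \otimes I + e \otimes L^{0}_{r}\gamma \otimes I; \\
C_{O3WRNRFWR} &= T^{0}_{3,nr} \otimes I \otimes I + e \otimes L^{0}_{nr}\gamma \otimes I; \\
C_{O3WRPM} &= e \otimes I \otimes V^{0} \otimes \beta_{2}; \\
C_{RFWRRFWR} &= (L + L^{0}\gamma) \oplus V; \\
C_{RFWRCR} &= I \otimes V^{0} \otimes \beta_{1}; \quad C_{NRFWRO1} = \alpha_{1} \otimes I \otimes V^{0}\upsilon; \\
C_{NRFWRNRFWR} &= (L + L^{0}\gamma) \oplus V; \quad C_{PMO1} = \alpha_{1} \otimes I \otimes \upsilon \otimes S_{2}^{0}; \\
C_{PMPM} &= (L + L^{0}\gamma) \oplus S_{2}; \quad C_{CRO1} = \alpha_{1} \otimes I \otimes \upsilon \otimes S_{1}^{0}; \\
C_{CRCR} &= (L + L^{0}\gamma) \oplus S_{1}.
\end{aligned}$$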

References

Acal, C., Ruiz-Castro, J. E., Aguilera, A. M., Jiménez-Molinos, F. and Roldán, J. B. (2019) ‘Phase-type distributions for studying variability in resistive memories’, Journal of Computational and Applied Mathematics, 345, pp. 23–32. doi: 10.1016/j.cam.2018.06.010.

Artalejo, J. R., Gómez-Corral, A. and He, Q.-M. (2010) ‘Markovian arrivals in stochastic modelling: a survey and some new results’, Sort: Statistics and Operations Research Transactions, 34(2), pp. 101–156. Available at: http://dialnet.unirioja.es/servlet/articulo?codigo=3352462.

Barlow, R. and Hunter, L. (1960) ‘Optimum Preventive Maintenance Policies’, Operations Research, 8(1), pp. 90–100. doi: 10.1287/opre.8.1.90.

Finkelstein, M., Cha, J. H. and Levitin, G. (2020) ‘A hybrid preventive maintenance model for systems with partially observable degradation’, IMA Journal of Management Mathematics, 31(3), pp. 345–365. doi: 10.1093/imaman/dpz018.

He, Q.-M. (2014) Fundamentals of Matrix-Analytic Methods. New York, NY: Springer New York. doi: 10.1007/978-1-4614-7330-5.

He, Q.-M. and Neuts, M. F. (1998) ‘Markov chains with marked transitions’, Stochastic Processes and their Applications, 74(1), pp. 37–52. doi: 10.1016/S0304-4149(97)00109-9.

Kalyanaraman, R. and Sundaramoorthy, A. (2019) ‘A Markovian single server working vacation queue with server state dependent arrival rate and with randomly varying environment’, in, p. 020033. doi: 10.1063/1.5135208.

Murchland, J. D. (1975) ‘Fundamental concepts and relations for reliability analysis of multi-state systems’. Available at: https://inis.iaea.org/search/search.aspx?orig_q=RN:8291134 (Accessed: 5 March 2021).

Nakagawa, T. (1977) ‘Optimum Preventive Maintenance Policies for Repairable Systems’, IEEE Transactions on Reliability, R-26(3), pp. 168–173. doi: 10.1109/TR.1977.5220105.

Nakagawa, T. (2005) Maintenance Theory of Reliability. London: Springer-Verlag (Springer Series in Reliability Engineering). doi: 10.1007/1-84628-221-7.

Neuts, M. F. (1975) ‘Probability distributions of phase type’, Probability Distributions of Phase Type, pp. 173–206.

Neuts, M. F. (1979) ‘A versatile Markovian point process’, Journal of Applied Probability, 16(4), pp. 764–779. doi: 10.2307/3213143.

Neuts, M. F. (1981) Matrix Geometric Solutions in Stochastic Models: An Algorithmic Approach. Mineola, New York: Dover Publications.

Pérez-Ocón, R., Montoro-Cazorla, D. and Ruiz-Castro, J. E. (2006) ‘Transient analysis of a multi-component system modeled by a general markov process’, Asia-Pacific Journal of Operational Research, 23(03), pp. 311–327. doi: 10.1142/S0217595906000954.

Pérez-Ocón, R., Ruiz-Castro, J. E. and Gámiz-Pérez, M. L. (1998) ‘A Multivariate Model to Measure the Effect of Treatments in Survival to Breast Cancer’, Biometrical Journal, 40(6), pp. 703–715. doi: 10.1002/(SICI)1521-4036(199810)40:6<703::AID-BIMJ703>3.0.CO;2-7.

Ruiz-Castro, J. E. (2018) ‘A D-MMAP to Model a Complex Multi-state System with Loss of Units’, in, pp. 39–58. doi: 10.1007/978-3-319-63423-4_3.

Ruiz-Castro, J. E. (2020) ‘A complex multi-state k-out-of-n: G system with preventive maintenance and loss of units’, Reliability Engineering & System Safety, 197, p. 106797. doi: 10.1016/j.ress.2020.106797.

Ruiz-Castro, J. E. and Zenga, M. (2020) ‘A general piecewise multi-state survival model: application to breast cancer’, Statistical Methods & Applications, 29(4), pp. 813–843. doi: 10.1007/s10260-019-00505-6.

Shekhar, C., Varshney, S. and Kumar, A. (2020) ‘Reliability and Vacation: The Critical Issue’, in, pp. 251–292. doi: 10.1007/978-3-030-31375-3_7.

Shi, Y. et al. (2022) ‘A new preventive maintenance strategy optimization model considering lifecycle safety’, Reliability Engineering & System Safety, 221, p. 108325. doi: 10.1016/j.ress.2022.108325.

Zhang, Y., Wu, W. and Tang, Y. (2017) ‘Analysis of an k-out-of-n:G system with repairman’s single vacation and shut off rule’, Operations Research Perspectives, 4, pp. 29–38. doi: 10.1016/j.orp.2017.02.002.

Biographies

Juan Eloy Ruiz-Castro is a Full Professor in the Department of Statistics and Operations Research at the University of Granada (Spain). Since 1993, he has worked mainly on survival and reliability analysis, considering Markovian and semi-Markov models, phase-type distributions and Markovian arrival processes using matrix analysis methods. An interesting aspect of his research is the construction of theoretical models in medicine to analyse the behavior of various diseases. For example, he has applied these models to study the evolution of breast cancer subject to multiple treatments. His research in the field of reliability focuses on the analysis of repairable systems with and without loss of units. As a result of his research, he has more than thirty publications in high-impact scientific journals and has been invited to present his contributions at multiple conferences. His editorial activity is extensive, and he belongs to the editorial board of several journals recognized as prestigious in the JCR. Currently, he is interested in incorporating phase-type distributions and Markovian arrival processes in various fields, such as electronics and physics, to analyze the behavior of complex devices.

Christian Acal is a Substitute Teaching Tutor in the Department of Statistics and Operations Research at the University of Granada (Spain). He received his International Ph.D. degree in Mathematical and Applied Statistics from the University of Granada in 2021. His research interests focus on stochastic modelling and forecasting of high-dimensional data. In particular, his main line of research is functional data analysis and its applications in different areas of knowledge, although he also works in survival and reliability analysis. He has participated in numerous national and international congresses, many of them as an invited speaker, and he has multiple publications in high-impact scientific journals indexed in the Journal Citation Reports. Currently, he belongs to the work team of several research projects awarded by the Spanish Ministry of Science and Innovation and by the Government of Andalusia (Spain); previously, he was director of a project for young researchers promoted by the University of Granada. Finally, he is a researcher attached to the Institute of Mathematics at the University of Granada, which holds the ‘María de Maeztu’ Seal of National Excellence, and he is also a member of the Spanish Society of Statistics and Operations Research.
