Test of the Randomness of Residuals and Detection of Potential Outliers for the Morgan-Mercer-Flodin Used in the Fitting of the Prediction of Cumulative Death Cases in Nigeria Due to COVID-19
BULLETIN OF ENVIRONMENTAL SCIENCE & SUSTAINABLE MANAGEMENT

In this study, we use the Wald–Wolfowitz runs test as a statistical diagnostic tool to check the randomness of the residuals of the Morgan-Mercer-Flodin (MMF) model used to fit and predict cumulative death cases in Nigeria owing to COVID-19. The runs test revealed 13 total runs, whereas 26 runs were expected under the assumption of randomness. Because the p-value was less than 0.05, we reject the null hypothesis and conclude that the residuals are not truly random. Too many runs may indicate negative serial correlation; too few runs, as observed here, may indicate a clustering of residuals with the same sign or a systematic bias. Further analysis of the residuals using Grubbs' test indicates the existence of an outlier, which means that the data must be remodeled.


INTRODUCTION
A biological system may be thought of as a collection of many cellular compartments (such as different types of cells), each with a distinct function in the organism (e.g., white and red blood cells have very different roles). An object is an elemental unit that can be observed but whose internal structure is either unknown or does not exist. The choice of elemental unit determines the scale at which the system is described. The study of biological systems at various levels of organization, from molecules to organisms and even populations, is made possible by the availability of data representing many biological states and processes, as well as their time dependencies. A model is a representation of a system that researchers can interpret: a description of the system in terms of its constituent components and the interactions between those components.
In nonlinear regression fitted by least squares, as in linear regression, the residuals of the curve are expected to follow a normal distribution. More importantly, the residuals must be random and have the same variance (homoscedastic distribution). The Wald-Wolfowitz runs test is used to establish whether the residuals are random [1]. In addition, the residuals often need to be tested for the presence of outliers (at 95 or 99% confidence), which is normally done using Grubbs' test. The aim of this study is to test the randomness of the residuals of the Morgan-Mercer-Flodin (MMF) model used to fit and predict cumulative death cases in Nigeria owing to COVID-19.

METHODOLOGY
The accuracy of any model fitted to a curve by nonlinear regression can be assessed by evaluating the residuals (D'Agostino, 1986). In the statistical sense, a residual is the difference between an observed value and the corresponding predicted value, the latter obtained from a suitable model, usually fitted by nonlinear regression (Eq. 1):
$$e_i = y_i - f(x_i; \hat{\beta}) \tag{1}$$
where $e_i$ is the $i$-th residual, $y_i$ is the $i$-th response in a given data set, $x_i$ is the vector of explanatory variables for the $i$-th observation, and $f(x_i; \hat{\beta})$ is the value predicted by the fitted model with estimated parameters $\hat{\beta}$. The residuals of the Morgan-Mercer-Flodin (MMF) model used to fit and predict cumulative death cases in Nigeria due to COVID-19 were obtained from a previous work.

This work is licensed under the terms of the Creative Commons Attribution (CC BY) (http://creativecommons.org/licenses/by/4.0/).

Grubbs' Statistic
Grubbs' test is a statistical test used to detect outliers in a univariate data set that is assumed to follow a Gaussian (normal) distribution [2]. The test can be applied to the maximum or the minimum observed value, with reference to Student's t distribution (Eq. 2), or to both extremes simultaneously (Eq. 3).
The ROUT method can be employed when there is more than one outlier [3]. The approach is based on the False Discovery Rate (FDR). Q, the probability of (incorrectly) identifying one or more outliers, must be specified explicitly; it is the maximum desired FDR. In the absence of outliers, Q is comparable to alpha. The method requires the assumption that all data follow a Gaussian distribution.
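The single-outlier test described above can be sketched in a few lines. This is a minimal illustration of the two-sided Grubbs' test in its standard textbook form, assuming NumPy and SciPy are available; the function name `grubbs_test` and the residual values are hypothetical and are not taken from the study.

```python
import numpy as np
from scipy import stats


def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier (standard textbook form).

    Returns (g, g_crit, index, is_outlier), where index points at the value
    farthest from the sample mean.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("Grubbs' test needs at least three values")

    deviations = np.abs(x - x.mean())
    index = int(np.argmax(deviations))
    # Largest absolute deviation in units of the sample standard deviation.
    g = deviations[index] / x.std(ddof=1)

    # Critical value from Student's t with n - 2 degrees of freedom,
    # evaluated at significance level alpha / (2n) for the two-sided test.
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return float(g), float(g_crit), index, bool(g > g_crit)


# Hypothetical residuals (not the study's data) with one suspicious value.
example_residuals = [0.1, -0.2, 0.15, -0.1, 0.05, -0.05, 0.2, -0.15, 5.0, 0.0]
g, g_crit, index, is_outlier = grubbs_test(example_residuals)
print(f"G = {g:.3f}, critical value = {g_crit:.3f}, "
      f"suspect index = {index}, outlier = {is_outlier}")
```

The sketch flags at most one value per call, which mirrors the one-outlier-at-a-time nature of the test; repeated application after removing a flagged point is left to the caller.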

Runs test
The Wald-Wolfowitz runs test is named after its developers, Abraham Wald and Jacob Wolfowitz. It is a non-parametric statistical test of the hypothesis of randomness. When applied to a specific model, the runs test can reveal a systematic deviation in which parts of the model-fitted curve over- or underestimate the data. The runs test has been useful in verifying that the residuals of nonlinear regression models in biological systems are genuinely random and thus that the model is statistically adequate for use [4][5][6]. In this study, the test was applied to the regression residuals to check for non-randomness.
The number of sign runs is often stated as a percentage of the greatest number possible. The runs test examines the sequence of residuals, which is composed of positive and negative values. A satisfactory result is usually indicated by an alternating or adequately balanced sequence of positive and negative residuals. The runs test computes the probability that the residuals have too many or too few runs of sign (Eq. 4). Too few runs may suggest a clustering of residuals with the same sign or the existence of systematic bias, whereas too many runs may indicate negative serial correlation [1,7].
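The hypotheses are
H0: the sequence was produced randomly
Ha: the sequence was not produced randomly
and the test statistic is
$$Z = \frac{R - \bar{R}}{s_R} \tag{4}$$
where $Z$ is the test statistic, $\bar{R}$ is the expected number of runs, $s_R$ is the standard deviation of the number of runs and $R$ is the observed number of runs. The values of $\bar{R}$ and $s_R$ are calculated as follows (Eqs. 5 and 6), where $n_1$ is the number of positive signs and $n_2$ is the number of negative signs:
$$\bar{R} = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \tag{5}$$
$$s_R^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)} \tag{6}$$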
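The runs test on the signs of the residuals can be sketched as follows. This is a minimal illustration using the standard large-sample normal approximation with only the Python standard library; the function name `runs_test`, the decision to skip residuals that are exactly zero, and the example values are assumptions made for illustration and are not taken from the study.

```python
import math


def runs_test(residuals):
    """Wald-Wolfowitz runs test on the signs of a residual sequence.

    Residuals equal to zero are skipped (an assumption of this sketch).
    Returns the observed runs, expected runs, Z statistic and two-sided p-value.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n1 = sum(signs)           # number of positive residuals
    n2 = len(signs) - n1      # number of negative residuals
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one positive and one negative residual")

    # A new run starts whenever the sign changes between neighbours.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))

    n = n1 + n2
    expected_runs = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    z = (runs - expected_runs) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"runs": runs, "expected_runs": expected_runs, "z": z, "p_value": p_value}


# Hypothetical residuals (not the study's data): same-sign values are clustered.
clustered = [0.4, 0.3, 0.5, 0.2, 0.1, -0.2, -0.4, -0.3, -0.1, -0.5]
print(runs_test(clustered))
```

A negative Z with a small p-value corresponds to too few runs (clustering of signs), while a positive Z with a small p-value corresponds to too many runs.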

RESULTS
In the statistical analysis of nonlinear regression, the residuals, which are essentially the differences between observed and predicted data, play a key role. Statistical tests need to be carried out to evaluate whether the residuals are sufficiently random, free of outliers, normally distributed and free of autocorrelation. Residuals usually take both positive and negative values; a balance between the two can be checked visually before any tests are carried out. Tests on residuals are often neglected in nonlinear regression. As a general rule, the larger the discrepancy between the predicted and the observed values, the lower the quality of the model, because there is less correlation between the two sets of data [8]. The residuals for the MMF model are shown in Table 1.
For the data presented above, Grubbs' test indicated that there was an outlier (Table 2). This outlier must be removed and the modelling redone. When fitting a nonlinear curve, a significant amount of inaccuracy can arise when the mean is distorted by a single data point, or when a single data point of a triplicate is distorted. Checking for outliers is therefore an essential component of curve fitting. Grubbs' test identifies one outlier at a time: the data point determined to be an outlier is removed from the set, and the analysis is repeated until no more outliers are found [9][10][11][12][13][14][15]. However, multiple iterations can change the probability of detection, and the test should not be used for sample sizes of six or less because it consistently identifies the majority of the points as outliers. The Grubbs' test statistic is the greatest absolute deviation of a sample value from the sample mean, expressed in units of the sample standard deviation. If the resulting test statistic g is higher than the critical value, the value in question is regarded as an outlier, because the critical value is the largest value of the statistic considered acceptable [2].
A possible outlier is an extreme data point that the investigator labels as implausible because it does not satisfy a number of specific criteria. More specifically, an outlier in a sample is an extreme value that is unacceptably far from the rest of the data. For instance, the maximum is regarded as an outlier when it is statistically much larger than expected from the distribution of the maximum in the model of the population [16]. In engineering, Chauvenet's criterion and the 3-sigma criterion together with the Z-score are used to label potential outliers in measurements.
In chemometrics, the Z-score is used in conjunction with the 3-sigma criterion. Grubbs' test and the ROUT method [3] are recommended [17]. Of the two, the ROUT method, which combines robust regression and outlier removal, is increasingly being employed for the removal of multiple outliers [9][18][19][20][21].
The runs test revealed 13 total runs, whereas 26 runs were expected under the assumption of randomness (Table 3). This suggests that the series of residuals contained too few runs. The Z-value indicates how many standard errors the observed number of runs lies below the expected number of runs, and the accompanying p-value indicates how extreme this Z-value is; it is interpreted in the same way as any other p-value. Since the p-value was less than 0.05, the null hypothesis is rejected and the residuals are concluded not to be truly random. Too many runs may point to negative serial correlation; too few runs may point to a clustering of residuals with the same sign or to a systematic bias [7]. For a given model, the runs test can therefore detect a systematic deviation of the fitted curve, such as over- or underestimation of sections of the data, by comparing the observed values with those predicted by the model. Here, the runs test was applied to the regression residuals to look for evidence of non-randomness.
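As an illustration of how a result of this kind translates into a Z-value, the following worked calculation applies the standard runs-test formulas with hypothetical sign counts $n_1 = n_2 = 25$. These counts are an assumption chosen only because they give an expected number of runs of 26; the actual counts are those underlying Table 3 and may differ, so the numbers below are not the study's reported statistics.

```latex
% Hypothetical sign counts: n_1 = n_2 = 25 (assumed for illustration only)
\bar{R} = \frac{2 n_1 n_2}{n_1 + n_2} + 1
        = \frac{2 \cdot 25 \cdot 25}{50} + 1 = 26

s_R^2 = \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}
      = \frac{1250 \cdot 1200}{2500 \cdot 49} \approx 12.245,
\qquad s_R \approx 3.499

Z = \frac{R - \bar{R}}{s_R} = \frac{13 - 26}{3.499} \approx -3.72
```

Under this assumption, the observed 13 runs would lie about 3.7 standard deviations below the expected number, which corresponds to a two-sided p-value of roughly 0.0002 and is consistent with rejecting the null hypothesis at the 0.05 level.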
The runs test examines the sequence of negative and positive residuals. A satisfactory result is typically characterized by an alternation, or an adequately balanced mix, of negative and positive residual values [1]. The number of runs of sign is commonly expressed as a percentage of the maximum number possible. The runs test determines the probability of obtaining too many or too few runs of sign. Too many runs may indicate negative serial correlation, whereas too few runs may indicate that residuals with the same sign are clustered or that there is a systematic bias [7].
The runs test is used rather frequently to test for autocorrelation in time-series regression models. However, Monte Carlo simulation experiments have shown that the runs test yields strikingly asymmetrical error rates in the two tails. This indicates that the runs test may not be reliable for detecting autocorrelation and that the Durbin-Watson method is preferable for that purpose [22]. Previous studies that analysed the randomness of residuals in a comparable way provide justification for the strategy used in this study. Examples include the Baranyi-Roberts model used to fit an algal growth curve, which was found to be statistically adequate [23], the growth of Moraxella sp. B on monobromoacetic acid (MBA) [5] and the Buchanan three-phase model used to fit the growth of Paracoccus sp. SKG on acetonitrile [24]. For lead (II) absorption by alginate gel beads, the runs tests on the residuals for the Sips and Freundlich models were found to be satisfactory [25]. In a previous study, a runs test on the residuals from pseudo-first-order kinetic modelling of the adsorption of the brominated flame retardant 4-bromodiphenyl ether onto biochar-immobilized Sphingomonas sp. showed that the residual series had sufficient runs [26]. Other applications of the runs test of residuals for evaluating the validity of nonlinear regression can be found in the literature [27][28][29][30][31].
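For readers who want to compare the runs test with the Durbin-Watson approach mentioned above, the Durbin-Watson statistic can be computed directly from a residual sequence. The following is a minimal sketch assuming NumPy is available; the function name `durbin_watson` and the example values are hypothetical, and the sketch returns only the statistic, not the tabulated bounds needed for a formal test.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Hypothetical residual sequences (not the study's data).
clustered = [1.0, 1.0, -1.0, -1.0]    # same-sign residuals grouped together
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips at every step
print(durbin_watson(clustered), durbin_watson(alternating))
```

Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.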

CONCLUSION
In this investigation, the randomness of the residuals of the Morgan-Mercer-Flodin (MMF) model used to fit and predict cumulative death cases in Nigeria owing to COVID-19 was tested using the Wald-Wolfowitz runs test. The runs test revealed 13 total runs, whereas 26 runs were expected under the assumption of randomness. Because the p-value was less than 0.05, we reject the null hypothesis and conclude that the residuals are not truly random. Too many runs may indicate negative serial correlation; too few runs may indicate a clustering of residuals with the same sign or a systematic bias. This indicates that there is significant evidence of non-randomness of the residuals and that additional intervention, such as the detection of potential outliers, was required. Further analysis of the residuals using Grubbs' test indicates the existence of an outlier, which means that the data must be remodeled.