Errors in use of multivariable regression analysis
Issues Related to Multivariable Logistic Regression

Thirty-one subjects were included in the study. The rule of thumb derived from a simulation study for logistic regression calls for at least 10 events per variable (EPV), counted on the less frequent outcome.[2] The author instead calculated the sample size on the basis of 10 subjects per variable, which is not correct for logistic regression. In this study, 11 of the 31 subjects survived to the end of the study, so the less frequent outcome is survival. The survival outcome therefore had only 5.5 EPV, as the author included two risk factors in the model. In addition, when a risk factor is rare, that is, when few subjects are positive for it, even 10 EPV may be inadequate.[3]

An insufficient sample size in a regression model yields unstable risk estimates and wider confidence intervals (CIs), and it can suggest inaccurate associations. This is what the author found for the in vitro fertilizations (IVFs) variable: a high odds ratio (OR), a wide CI, and a P value close to 0.05. Furthermore, the author reported the log-odds with a 95% CI as 2.42 (1.06, 121.13); the interval given is the 95% CI of the OR, not of the log-odds. The correct log-odds and its 95% CI are 2.42 (0.06, 4.797). The OR is equal to exp(log-odds). In logistic regression, the log-odds and the OR have different meanings: the former gives an additive effect, while the latter gives a multiplicative effect. A wide CI for the OR usually indicates overfitting, and the results of such a model are not trustworthy.

The Hosmer-Lemeshow (HL) goodness-of-fit test is also sensitive to small sample size, because it is based on the chi-square test and the subjects are usually divided into deciles (10 groups). The chi-square test needs expected frequencies of 5 or more in at least 80% of cells. Thus, this test is not reliable in a small sample.

Issues Related to Multiple Linear Regression Analysis

For multiple linear regression, rules of thumb state that at least 20 subjects per eligible variable should be included in the model.
The author, however, applied multiple linear regression to only seven subjects, which is too few to yield reliable model results. The correct term in multiple regression is "regression coefficient," not "regression correlation" as reported by the author in the results. The author reported that seven subjects were included and gave the F-statistic as F(2,6) = 6.27, which is also wrong: the degrees of freedom should be F(p, n − p − 1), where p is the number of parameters (excluding the intercept) and n is the sample size, so it should be F(2,4). The basic assumptions of multiple linear regression, normality of residuals and homogeneity of variance, were not reported by the author.

The author reported a negative Pearson correlation between the volume of IVF used and the change in serum creatinine (r = −0.816), but in the multiple regression the corresponding regression coefficient was positive and very small (0.000), although significant. The zero coefficient appeared because the volume of IVF was measured in milliliters, and a 1 ml change in IVF volume has a very small influence on the change in serum creatinine level. In such situations, it is better to convert milliliters to liters so that the regression coefficient takes a meaningful, presentable value. Furthermore, Pearson correlation requires both variables to be on a metric scale, yet the author reported a Pearson correlation (r = 0.042) between sex and change in MAP, which also seems to be inaccurate. In addition, the male-to-female ratio was 30:1, which means there was only 1 female in this study.

Multivariable regression is an important tool in the medical literature for finding the association between an outcome and risk factors, or for predicting the outcome from a set of predictors. The reliability of a regression model depends on the fulfillment of its associated assumptions.
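The arithmetic behind three of the points raised in this letter (the events per variable, the relation between the CI of the log-odds and the CI of the OR, and the degrees of freedom of the F-statistic) can be reproduced with a short Python sketch. The function names are illustrative assumptions, not part of the original study or of any statistical package; the input values are the ones quoted in this letter, and the count of 20 non-survivors is derived here as 31 minus 11.

```python
import math


def events_per_variable(n_events, n_nonevents, n_predictors):
    """EPV counted on the less frequent of the two outcomes."""
    return min(n_events, n_nonevents) / n_predictors


def log_odds_to_odds_ratio(log_odds, ci_lower, ci_upper):
    """Exponentiate a log-odds estimate and its CI limits to the OR scale."""
    return math.exp(log_odds), math.exp(ci_lower), math.exp(ci_upper)


def f_test_degrees_of_freedom(n_subjects, n_predictors):
    """Degrees of freedom (p, n - p - 1) for the overall F test of a
    linear model with an intercept and p predictors."""
    return n_predictors, n_subjects - n_predictors - 1


# Values quoted in the letter: 11 survivors out of 31 subjects, two risk factors.
epv = events_per_variable(n_events=11, n_nonevents=31 - 11, n_predictors=2)

# Log-odds 2.42 with 95% CI (0.06, 4.797), as corrected in the letter.
odds_ratio, or_lower, or_upper = log_odds_to_odds_ratio(2.42, 0.06, 4.797)

# Seven subjects and two predictors in the linear model.
f_df = f_test_degrees_of_freedom(n_subjects=7, n_predictors=2)

print(f"EPV = {epv}")
print(f"OR = {odds_ratio:.2f}, 95% CI ({or_lower:.2f}, {or_upper:.2f})")
print(f"F degrees of freedom = {f_df}")
```

Under these assumptions the EPV comes out at 5.5, the exponentiated limits come out at about 1.06 and about 121, matching the interval the author attached to the log-odds, and the F-statistic has degrees of freedom (2, 4).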
When the quality of logistic regression was evaluated on the basis of well-established 10-point criteria, the quality in Indian medical journals was found to lag far behind that of articles published from Europe and the USA.[4] The author should consult a competent statistician or read the relevant literature to understand the associated assumptions and criteria before analyzing and reporting the results of these models.[5] The associated assumptions should not only be checked but also reported in the text. I appreciate that the author tried to explain the collinearity criterion of the multivariable regression model.

Financial Support and Sponsorship

Nil.

Conflicts of Interest

There are no conflicts of interest.

References


