Statistics is by far the most effective tool for both processing and interpreting data and hypotheses in biomedicine, and it therefore contributes significantly to the quality of scientific evidence in medicine. A command of the basic and applied concepts of biostatistics, and the ability to use them when interpreting statistical results in professional communications, is part of the required core of a modern physician's education. This is especially true in an era that produces a huge number of publications containing highly technical and complex information that must be interpreted using the language of statistics.
The COVID-19 era, with its (un)intentional dissemination of erroneous or misleading information, has tested the statistical and epidemiological education of physicians and medical students and created room for strengthening optional teaching of biostatistics, for example in the form of an elective course. The course is recommended to all undergraduate students who intend to pursue part or all of their future professional career in scientific work. The expected audience therefore includes (but is not limited to) prospective PhD students and, possibly, interested postgraduate students. The course is designed as an introduction for beginners; no prior knowledge of statistics or statistical software is required. The mathematical apparatus is kept to the necessary minimum and relies only on high school mathematics.
Students will be introduced to descriptive statistics, to the descriptive indices and measures of association commonly used in publications, and to graphical data visualization, including the choice of appropriate graphical outputs for specific data inputs. A major area that will be taught theoretically and, above all, practiced is hypothesis testing, including knowledge of test assumptions, parametric and non-parametric (robust) approaches, and the ability to select the appropriate type of test for the input data and hypotheses. Linear and non-linear regression techniques, which model a continuous variable using explanatory variables, will also be practiced, as will logistic regression, which classifies observations into categories using predictors.
More complex models, including hierarchical models and survival analysis, will also be presented. The practical part, on which adequate emphasis will be placed, will be carried out in statistical software applied to real data: the open-source point-and-click tool jamovi, possibly others, and, if desired, the R language and environment. In addition to the actual work with data and statistical concepts and models, appropriate interpretation of the statistical conclusions obtained will also be discussed. During the course and in the final seminar paper (project), students will also develop an approach to reproducible and transparent project management and data analysis. Similar courses are offered in various forms, and in some cases are already well established, at other Prague faculties, including medical faculties. At our faculty, the course is already taught in a Czech-language version.
Aims
- Describe the roles biostatistics serves in the discipline of medicine.
- Describe basic concepts of probability, random variation, and commonly used statistical probability distributions.
- Describe preferred methodological alternatives to commonly used statistical methods when assumptions are not met.
- Distinguish among the different measurement scales and the implications for the selection of statistical methods to be used based on these distinctions.
- Apply descriptive techniques commonly used to summarize (bio)medical data.
- Apply common statistical methods for inference.
- Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question.
- Interpret results of statistical analyses found in (bio)medical studies.
- Perform univariate data analysis for continuous and categorical variables.
- Build an appropriate statistical model for real health data; estimate and compare the performance of models.
- Understand and apply fundamental models in (non)linear regression, time series, survival analysis, and other domains.
- Use statistical software to analyze health-related data.
Syllabus
- Descriptive statistics. Averages and variability. Measures of association. Graphical data representation.
- Introduction to probability. Random variable. Selected probabilistic distributions.
- Confidence intervals. Introduction to inferential statistics. Parametric and nonparametric tests of inference. Interpretation of results and appropriate graphical visualization.
- Analysis of variance (ANOVA). Univariate and multivariate linear regression. Interpretation of results and appropriate graphical visualization.
- Binary logistic regression. Multinomial logistic regression. Interpretation of results and appropriate graphical visualization.
- Mixed-effects model. Hierarchical models. Interpretation of results and appropriate graphical visualization.
- Introduction to time series. Introduction to survival analysis. Interpretation of results and appropriate graphical visualization.
- Selected advanced statistical methods in R, both linear and nonlinear. Cluster analysis. Discriminant analysis. Jackknife. Bootstrap. Interpretation of results and appropriate graphical visualization.
- Selected methods of machine learning in R. Naïve Bayes classifier. Support Vector Machine (SVM). Cross-Validation (CV). Principal Component Analysis (PCA). Decision trees. Random forests. Neural networks. Association rules. Interpretation of results and appropriate graphical visualization.
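To give a flavor of the resampling topics listed above, the following is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is illustrative only: the course itself works in jamovi and R, whereas this sketch assumes Python with only the standard library; the function name `bootstrap_ci`, its parameters, and the blood pressure readings are hypothetical and made up for the example.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of `sample`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)

    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )

    # Cut off alpha/2 of the resampled estimates in each tail.
    tail = round(n_resamples * alpha / 2)
    return estimates[tail], estimates[n_resamples - tail - 1]


# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
readings = [118, 122, 125, 130, 131, 135, 138, 140, 142, 150]

low, high = bootstrap_ci(readings)
print(f"sample mean = {statistics.mean(readings):.1f} mmHg")
print(f"approximate 95% bootstrap CI = ({low:.1f}, {high:.1f}) mmHg")
```

The percentile method is only one of several bootstrap interval constructions and is chosen here for its simplicity. Passing a different function as `stat` (for example `statistics.median`) gives an interval for that statistic without any change to the resampling logic.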