1st Faculty of Medicine, Charles University, Institute of Biophysics and Informatics
stuka 31.01.2020

An introduction to R scripting language

Dear students,

we are pleased to announce a new elective course dedicated to the R language and the R statistical environment, starting in the summer term of the academic year 2019/2020. You are welcome to join the course and enroll via SIS using the code B83128 or the name An introduction to R scripting language. The course concludes with a seminar project and is worth 3 ECTS credits.

Course annotation

The course is aimed at students interested in the R programming language and environment and in data science, a field in which R is widely used. R is not only a language designed for statistical computing and graphics but also a Turing-complete, general-purpose programming language suitable for solving complex tasks.

R has several advantages over commercial systems such as MATLAB: (i) it is open source, both free of charge (“free as in beer”) and free to study, modify, and use commercially under the terms of the GNU General Public License (“free as in speech”); (ii) a large online community has formed around R, ready to help and answer users’ questions; (iii) R web applications are easy to develop; and (iv) TeX documents can be typeset directly from R code. The syntax of R is simple, intuitive, and quite similar to that of MATLAB. According to recent kaggle.com worldwide statistics, R ranks among the most popular programming languages chosen for data analysis, data science, and machine learning; it is often described as a lingua franca of data science.

The class is practice-based and focused on problem-solving, number-crunching exercises, and real-data analyses carried out through hands-on R programming and scripting; assigned tasks progress from easy to difficult.

Course syllabus

1st Introduction, installation, and setup of the R environment. Overview of R data types and structures. Basic operations, numbers, vectors, and simple manipulations with them. Quick statistics and summaries for one-dimensional data.

2nd More on data types and data structures in R and their manipulation. Arrays, data frames, lists. Working effectively with data in common formats.

3rd Loading external data into R. Saving data from R to a file. Data (pre)processing. How to automate routine data cleaning using R.

4th Functions in R. Useful built-in functions. User-defined functions in R. Functionality far beyond MS Excel’s “data processing”.

5th R as a programming language. Loops, conditions, warnings, errors. Automation of data tasks.

6th Elements of statistics and data analysis in R. Gentle introduction to probability distributions. Measures of central tendency and variability. Introduction to hypothesis testing in R. Basic graphical visualization. Must-know essentials for data analysis.

7th Advanced statistics and data analysis in R. Linear models, including generalized ones (GLM). Linear regression. Analysis of variance (ANOVA), analysis of covariance (ANCOVA), and their multivariate counterparts (MANOVA, MANCOVA). Appropriate graphical visualization. The gold standard for scientific articles in (bio)medicine.

8th Logistic regression, its interpretation and visualization. A tool for classifying patients into disease classes.

9th Time series. Survival analysis. Appropriate graphical visualization. Introducing time as a variable that matters in an analysis.

10th Selected advanced statistical methods in R. Cluster analysis. Discriminant analysis. Factor analysis (FA), including an explanation of rotating the solution. Principal Component Analysis (PCA). Appropriate graphical visualization.

11th More on graphical outputs in R. Low-level and high-level graphical commands. Displaying multivariate data. Parameters of plots and diagrams. Overview of plots and diagrams in R and how to save a plot to a file. Choosing the most appropriate type of plot for a given analysis. How to polish a plot for use in a publication.

*12th Selected methods of machine learning in R. Naïve Bayes classifier. Support Vector Machine (SVM). Cross-Validation (CV). Decision trees. Random forests. Neural networks. Association rules. Jackknife. Bootstrap.

*13th Text processing in R. Handling and processing strings in R. Regular expressions in R. Tokenization, n-grams. TeX code included within R code. How to add R code, results of data analysis, and plots produced by R into TeX code and typeset a PDF.

*14th Building web applications with R and the Shiny package. Components of a web application built with R. Using HTML, CSS, and JavaScript to build an R web application.

15th Consultations and individual help with the seminar project.

The topics marked with an asterisk (*) are advanced and, therefore, optional.

