Programming Languages in Data Analysis

1. Comparison of R, SAS, Python, Julia, and Stata

R (Highly recommended!)
Pros:
- Extensive collection of packages for statistical analysis and data visualization
- Strong community support and comprehensive documentation
- Highly extensible and flexible
- Excellent for creating high-quality, publication-ready graphics
Cons:
- Steeper learning curve for beginners
- Performance can be slower with large datasets
- Memory management can be challenging with very large datasets

SAS
Pros:
- Widely used in industry
- Comprehensive suite of tools for advanced analytics
- Highly reliable and robust
- Excellent for handling large datasets efficiently
Cons:
- Expensive licensing costs
- Proprietary software with less flexibility
- Less intuitive for exploratory data analysis and visualization

Python
Pros:
- Versatile and widely used for various applications
- Rich ecosystem of libraries for data manipulation and analysis
- Great integration capabilities with other systems
- Strong community support and resources
Cons:
- Performance can be slower than compiled languages
- Visualization capabilities are less polished than R's
- Some statistical packages may not be as comprehensive as those in R or SAS

Julia
Pros:
- Designed for high-performance numerical and scientific computing
- Combines the ease of use of high-level languages with the performance of low-level languages
- Growing ecosystem of packages
- Excellent for parallel and distributed computing
Cons:
- Relatively new language with a smaller community
- Less mature in terms of available libraries
- May require more effort to find solutions to problems

Stata
Pros:
- User-friendly interface and easy to learn
- Well-suited for econometrics and social science research
- Comprehensive suite of built-in commands
- Excellent documentation and customer support
Cons:
- Proprietary software with licensing costs
- Less flexibility and extensibility
- Limited community-contributed resources

 

2. University of Arizona (UA) and Online Resources

R
UA Courses:
- STAT 675 - Statistical Consulting
- Data Science Institute Workshops
Online Resources:
- Coursera: "R Programming" by Johns Hopkins University
- DataCamp: "Introduction to R"
- Swirl

SAS
UA Courses:
- BIOS 576A - Biostatistics in Public Health
- BIOS 576D - Data Management and the SAS Programming Language
Online Resources:
- SAS Official Training
- Coursera: "Getting Started with SAS Programming" by SAS

Python
UA Courses:
- INFO 520 - Programming for Informatics Applications
- CSC 110 - Introduction to Computer Programming I
Online Resources:
- Coursera: "Python for Everybody" Specialization by the University of Michigan
- edX: "Introduction to Computer Science using Python" by MIT

Julia
UA Courses:
- Data Science Institute Workshops
Online Resources:
- JuliaAcademy
- DataCamp: "Introduction to Julia"

Stata
UA Courses:
- BIOS 576A - Biostatistics in Public Health
Online Resources:
- Stata Official Training
- UCLA Institute for Digital Research and Education

 

3. Certificate Information

Certifications by language:

R
- Data Science Specialization (Coursera, Johns Hopkins University)
- R Programming (Coursera, Johns Hopkins University)
- Professional Certificate in Data Science (edX, Harvard University)
- RStudio Certification (RStudio)

SAS
- SAS Certified Base Programmer for SAS 9 (SAS)
- SAS Certified Advanced Programmer for SAS 9 (SAS)
- SAS Certified Statistical Business Analyst (SAS)
- SAS Certified Data Scientist (SAS)

Python
- PCAP – Certified Associate in Python Programming (Python Institute)
- Python for Everybody Specialization (Coursera, University of Michigan)
- Introduction to Computer Science using Python (edX, MIT)
- Python Basics for Data Science (edX, IBM)

Julia
- Introduction to Julia (JuliaAcademy)
- Data Science with Julia (JuliaAcademy)