Let’s continue our journey from last time. In case you missed it:
Data Exploration in Python for SAS Programmers(Part 1)
The aim of this article is to introduce basic data exploration in python for SAS programmers. Every other job nowadays…
In this second part, we will focus on a few of the descriptive statistics procedures listed here:
SAS/STAT Descriptive Statistics Procedures
Below are highlights of the capabilities of the SAS/STAT procedures that compute descriptive statistics: BOXPLOT…
First, let’s look at PROC BOXPLOT. Box plots help us understand the distribution of the data and show outliers as well.
In Python, there are many ways to draw a box plot. We show two of the easiest: 1. using the pandas package and 2. using the seaborn package. A side note: you don’t strictly need any packages to achieve our goals, but we are exploring easy ways to jump from SAS to Python. Why packages? Packages (or libraries) contain pre-written functions and methods, much like SAS procedures. I wouldn’t recommend creating your own packages yet if you are new to Python.
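As a minimal sketch of both approaches, the snippet below draws the same box plot with pandas and with seaborn side by side. The DataFrame, its column names (`group`, `weight`), and the output file name are made up for illustration; it assumes pandas, matplotlib, and seaborn are installed.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: weights for two groups; 90 in group A is a deliberate outlier
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "weight": [50, 52, 53, 55, 56, 90, 60, 61, 63, 64, 66, 67],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# 1. pandas: one box per value of the "by" column
df.boxplot(column="weight", by="group", ax=ax1)
ax1.set_title("pandas")

# 2. seaborn: same plot, categories on x and values on y
sns.boxplot(data=df, x="group", y="weight", ax=ax2)
ax2.set_title("seaborn")

fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title pandas adds
fig.savefig("boxplots.png")  # in a notebook, the figure is displayed inline instead
```

Both libraries draw whiskers at 1.5 times the interquartile range by default, so the value 90 in group A should appear as an individual point beyond the upper whisker.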
Second, let’s look at PROC CORR. We will look at the Pearson, Spearman, and Kendall correlation coefficients. Let’s recall what they mean. Pearson: a parametric measure of linear association between two continuous variables; -1 indicates a perfect negative linear relationship and +1 indicates a perfect positive one. Spearman: a non-parametric measure of association based on the ranks assigned to the data values. Kendall: also a non-parametric measure of association, but based on the number of concordances and discordances in paired observations.¹
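One way to get all three coefficients is the `method` argument of pandas’ `DataFrame.corr`. The sketch below uses made-up data and column names; note that the Kendall option relies on SciPy being installed.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({
    "height": [150, 160, 165, 172, 180, 185],
    "weight": [52, 58, 63, 70, 79, 85],
    "age": [30, 25, 41, 35, 22, 50],
})

# One correlation matrix per method, similar in spirit to
# PROC CORR with the PEARSON, SPEARMAN, and KENDALL options
results = {}
for method in ("pearson", "spearman", "kendall"):
    results[method] = df.corr(method=method)
    print(f"\n{method.capitalize()} correlation")
    print(results[method].round(3))
```

Because weight increases every time height does in this toy data, the Spearman and Kendall coefficients for that pair are exactly 1, while the Pearson coefficient is slightly below 1 since the relationship is not perfectly linear.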
Lastly, let’s look at PROC UNIVARIATE.
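pandas has no single equivalent of PROC UNIVARIATE, but a few `Series` methods cover much of its default output (moments, basic statistics, quantiles, and extreme observations). The following is a rough sketch on made-up data, not a line-by-line replica of the SAS report.

```python
import pandas as pd

# Hypothetical variable, invented for this example
weight = pd.Series([52, 58, 63, 70, 79, 85], name="weight")

# Basic statistics: count, mean, std, min, quartiles, max
summary = weight.describe()
print(summary)

# Moments: sample variance, skewness, and excess kurtosis
moments = pd.Series({
    "variance": weight.var(),
    "skewness": weight.skew(),
    "kurtosis": weight.kurt(),
})
print(moments)

# Quantiles at a few commonly reported percentiles
quantiles = weight.quantile([0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99])
print(quantiles)

# Extreme observations: the three lowest and three highest values
lowest, highest = weight.nsmallest(3), weight.nlargest(3)
print(lowest, highest, sep="\n")
```

Quantile definitions differ between tools, so percentile values may not match SAS exactly; pandas interpolates linearly by default.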