Data Exploration in Python for SAS Programmers (Part 2)

Let’s continue our journey from last time, in case you missed it:

In this second part, we will focus on a few descriptive statistics procedures: PROC BOXPLOT, PROC CORR, and PROC UNIVARIATE.

First, let’s look at PROC BOXPLOT. Box plots help us understand the distribution of the data and reveal outliers.

SAS Code:

In Python, there are many ways to do this. We show two of the easiest: (1) the Pandas package and (2) the Seaborn package. A side note: you don’t strictly need any packages to achieve our goals, but we are exploring easy ways to move from SAS to Python. Why packages? Packages, or libraries, contain pre-written functions and methods, much like SAS procedures. If you are new to Python, I wouldn’t recommend creating your own packages yet.

Using Pandas:
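The snippet originally embedded here is not available, so what follows is only a minimal sketch of a grouped box plot with Pandas. The DataFrame and its column names (`group`, `value`) are hypothetical stand-ins, not the dataset used in this post.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data (an assumption, not the dataset from the post).
# The value 30 in group A is deliberately extreme so it is drawn as an outlier.
df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "value": [10, 12, 11, 30, 20, 22, 21, 23],
})

fig, ax = plt.subplots()

# One box per level of "group", drawn on the axes we created above.
df.boxplot(column="value", by="group", ax=ax)

# Pandas adds a "Boxplot grouped by ..." super-title; replace it with our own.
fig.suptitle("")
ax.set_title("value by group")
ax.set_ylabel("value")
```

In a notebook the figure is displayed automatically; in a plain script, call `plt.show()` or `fig.savefig(...)` afterwards.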

Using Seaborn:
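As the original snippet is not available here, this is a minimal sketch of the same plot with Seaborn's `boxplot` function. Again, the DataFrame and the column names `group` and `value` are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data (an assumption, not the dataset from the post).
df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "value": [10, 12, 11, 30, 20, 22, 21, 23],
})

fig, ax = plt.subplots()

# x is the grouping variable, y is the analysis variable.
sns.boxplot(data=df, x="group", y="value", ax=ax)
ax.set_title("value by group")
```

Seaborn labels the axes with the column names for us, which is one reason it is a comfortable starting point when coming from SAS.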

Second, let’s look at PROC CORR. We will look at the Pearson, Spearman, and Kendall coefficients. Let’s recall what they mean. Pearson: a parametric measure of linear association between two continuous variables; values near -1 indicate a strong negative relationship and values near +1 indicate a strong positive relationship. Spearman: a non-parametric measure of association based on the ranks assigned to the data values. Kendall: also a non-parametric measure of association, but based on the number of concordances and discordances in paired observations.¹

Python code:
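The original snippet is missing here, so this is a minimal sketch that uses the `method` argument of `DataFrame.corr` in Pandas. The columns `height`, `weight`, and `age` are hypothetical, and as far as I know Pandas needs SciPy installed for the Kendall option. Unlike PROC CORR, this returns only the coefficients, not p-values.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the dataset from the post).
# height and weight rise together, so the rank-based coefficients
# for that pair are exactly 1.
df = pd.DataFrame({
    "height": [150, 160, 165, 172, 180, 185],
    "weight": [52, 58, 63, 70, 79, 90],
    "age": [34, 25, 41, 29, 38, 22],
})

# Each call returns a square matrix of pairwise coefficients.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")
kendall = df.corr(method="kendall")

for name, matrix in [("Pearson", pearson), ("Spearman", spearman), ("Kendall", kendall)]:
    print(name)
    print(matrix.round(3))
    print()
```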

Lastly, let’s look at PROC UNIVARIATE.

Python code:
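The original snippet is missing here as well. There is no single Pandas call that reproduces the whole PROC UNIVARIATE report, so this sketch assembles the main pieces (moments, quantiles, extreme observations) from a few `Series` methods. The data is hypothetical, and the quantiles may differ slightly from SAS because the default percentile definitions are not the same.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the dataset from the post).
values = pd.Series([10, 12, 11, 30, 20, 22, 21, 23], name="value")

# count, mean, std, min, quartiles, max
summary = values.describe()

# Additional moments that PROC UNIVARIATE reports
moments = pd.Series({
    "variance": values.var(),
    "skewness": values.skew(),
    "kurtosis": values.kurt(),
})

# A quantile table similar to the one in the SAS output
quantiles = values.quantile([0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99])

# Extreme observations: the five lowest and five highest values
lowest = values.nsmallest(5)
highest = values.nlargest(5)

print(summary)
print(moments)
print(quantiles)
print(lowest)
print(highest)
```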

References:

  1. https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=procstat&docsetTarget=procstat_corr_examples01.htm&locale=en

I work as a Data Scientist at a bank in Canada and have been passionate about data science since 2014. I started with SQL, R, and SAS, and later picked up Python.
