Exploratory Data Analysis



Project Title: Exploratory Data Analysis on TCP Metrics (Web10G)



Bryan Learn, PSC Network Programmer, Group Leader
Chris Rapier, PSC Senior Research Programmer



Exploratory data analysis (EDA) is a crucial early step in any data science project. The main goal of EDA is to gain insight into the data, which then guides the direction of further research. Students will explore a large dataset of network traffic data, specifically TCP statistics. The statistics come from Web10G, an implementation of RFC 4898 (TCP Extended Statistics MIB). Web10G is a Linux kernel module that provides over 150 metrics for each individual TCP flow. Under the guidance of the mentors, the students will apply a variety of analysis techniques to a dataset of over 6 million TCP flows to gain insight into these 150+ metrics and their impact on TCP performance.


Required Background

Some coding experience in any language.


Recommended Background

Python experience
Some experience with machine learning frameworks (TensorFlow, Torch, etc.)





Learning Focus

Learning exploratory data analysis techniques and which questions to ask to gain insight from data.


Desired Results

Students gain experience cleaning a dataset for processing, forming questions about the data to gain insight, and applying various analysis techniques to answer some of those questions.


Desired Major

Computer Science



The student will receive a stipend or course credit for this work.


Please submit your resume and cover letter to Vivian Benton, benton@psc.edu