Project Title: Exploratory Data Analysis on TCP Metrics (Web10G)
Bryan Learn, PSC Network Programmer, Group Leader
Chris Rapier, PSC Senior Research Programmer
Exploratory data analysis (EDA) is a crucial early step in any data science project. The main goal of EDA is to gain insight into the data, which then guides the direction of further research. Students will explore a large dataset of network traffic data, specifically TCP statistics. The statistics come from Web10G, an implementation of RFC 4898. Web10G is a Linux kernel module that provides over 150 metrics about a single TCP flow. Under the guidance of the mentors, students will apply a variety of analysis techniques to a dataset of over 6 million TCP flows to gain insight into the 150+ metrics and their impact on TCP performance.
Some coding experience in any language.
Some experience with machine learning frameworks (TensorFlow, Torch, etc.)
Learning exploratory data analysis techniques and which questions to ask to gain insight from data.
Students will gain experience cleaning a dataset for processing, forming questions about the data, and applying various analysis techniques to answer some of those questions.
The student will receive a stipend or course credit for this work.
Please submit your resume and cover letter to Vivian Benton, firstname.lastname@example.org.