
Useful Resources

  • Project Workflow Guide: This document gives data science project groups guidelines for structuring their projects in a way that encourages consistent workflows across projects and promotes collaboration.

  • Project Outline Template: This document serves as a template for project outline documentation. It includes project goals, plans for implementation, proposed methodologies, and links to meeting notes documents. Teams can use this template to structure their project outlines, basing them on their meetings with stakeholders and the project documents provided to them.

  • Exploratory Data Analysis: This document gives an introduction to exploratory data analysis (EDA). It includes descriptions of popular EDA Python packages, different types of data and visualizations, and statistical measures commonly used in EDA.

  • EDA Example: This document provides an example of what a complete EDA looks like. See one way to clean, analyze, and explore a Boston 311 service request dataset in Python using pandas, NumPy, Matplotlib, seaborn, and a geocoder for spatial data.

  • Geographical EDA: This guide provides an introduction to geographical exploratory data analysis using GeoPandas. It works through an example involving the analysis of Boston’s census tracts and free WiFi locations. It covers topics like .shp files, spatial joins, choropleth maps, and finding spatial relationships.

  • Making Sense of the Census: This tutorial provides an in-depth overview of the U.S. Census and the American Community Survey (ACS), detailing their purposes, the data they collect, and how to access this information. It also explores census geography and race and ethnicity categorizations, and includes a tutorial on using the Census API.

  • Presenting Data Science Findings: This article outlines three fundamental principles for presenting data science findings to clients effectively, emphasizing the importance of data-driven storytelling, audience understanding, and ownership. It includes a case study from a Spark! data science project to illustrate how these principles can be applied in practice.

  • Effective Data Visualizations: This guide covers good and bad practices for creating data visualizations. It walks readers through data cleaning, choosing appropriate visualization techniques, and using design elements effectively to convey information in an accessible manner.

  • Topic Modeling: This guide serves as an introduction to topic modeling. It demonstrates how to automatically extract and identify themes or topics from a collection of text documents using techniques like BERTopic.


Table of contents