Research and documentation
This area contains the research findings and other documentation for the Sports Performance Analysis project for T3 2023.
-
At this stage, this document explores only the data in the “new Cyclist data.csv” file from the T2 GitHub repository, because we have not yet received Google BigQuery access and therefore cannot explore the entire available dataset.
This document explores the available cycling data, with a focus on how the data can be used to build predictive ML models.
-
Strava Bulk Export Data Description
This document explains the origin of the Strava data used in the Cycling analysis sub-project, which was obtained through a bulk export of a team member's workout data. The data contained details for multiple sports, but the cycling data can be separated for analysis as part of this project.
This document describes the available cycling data, with a focus on how the data can be used to build predictive ML models.
-
This document provides information on heart rate zones to support sports performance analysis.
Heart Rate Zones
Heart rate zones are ranges of heart rate, usually expressed as a percentage of your maximum heart rate, that correspond to different levels of exercise intensity. These ranges are used to guide training and exercise intensity for various fitness goals. Heart rate ranges are generally classified into the following zones.
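As a rough sketch of how zones could be applied in code, the Python snippet below maps a heart rate to a zone number from its percentage of maximum heart rate. The five boundaries used here (50%, 60%, 70%, 80% and 90% of maximum) are an assumption based on a commonly used five-zone scheme, not this project's definitions, and the name `heart_rate_zone` is illustrative only.

```python
# Assumed five-zone scheme: (zone number, lower bound as a fraction of max HR).
# These boundaries are illustrative and not taken from the project documents.
ZONE_LOWER_BOUNDS = [(5, 0.90), (4, 0.80), (3, 0.70), (2, 0.60), (1, 0.50)]


def heart_rate_zone(heart_rate_bpm, max_heart_rate_bpm):
    """Return the zone number (1 to 5), or 0 if below the lowest zone."""
    if max_heart_rate_bpm <= 0:
        raise ValueError("max_heart_rate_bpm must be positive")
    fraction = heart_rate_bpm / max_heart_rate_bpm
    # Bounds are checked from the highest zone down, so the first match wins.
    for zone, lower_bound in ZONE_LOWER_BOUNDS:
        if fraction >= lower_bound:
            return zone
    return 0
```

For example, `heart_rate_zone(150, 190)` evaluates a heart rate at roughly 79% of maximum. Swapping in a different scheme only requires changing `ZONE_LOWER_BOUNDS`.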
-
The purpose of this document is to provide a snapshot of all the Sports Performance analyses and capture the key objectives.
Overall Objective
The Sports Performance Analysis project aims to deliver comprehensive analysis in various sports, leveraging both real-time and historical data. This project encompasses two main areas: predictive analytics and data visualisation.
-
Developing ML Models for Football Prediction
Football Game Outcome Prediction Analysis
Introduction
Welcome to the Football Game Outcome Prediction Analysis Confluence page! This project aims to predict the outcomes of football games, specifically focusing on the Premier League games that occurred during the 2022-23 season. The predictive models are built using a dataset publicly available on Football-Data.co.uk.
-
Introduction
Web scraping is a powerful technique used to extract data from websites, providing valuable information. It involves using software tools to navigate and interact with web pages, download and parse HTML, and extract relevant information. Web scraping allows users to gather data from various online sources, transforming unstructured web data into a structured format that can be analyzed, stored, or used for various applications. However, it's important to note that web scraping should be performed ethically and in compliance with the terms of service of the websites being accessed.
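The "parse HTML and extract relevant information" step can be sketched with Python's standard-library `html.parser` module. This is a minimal illustration, not the project's scraper: the sample HTML, the `TableParser` class and the `parse_table` helper are all hypothetical, and no website is contacted.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up HTML standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Points</th></tr>
  <tr><td>Team A</td><td>3</td></tr>
  <tr><td>Team B</td><td>1</td></tr>
</table>
"""


def parse_table(html):
    """Turn unstructured HTML into a list of rows (lists of cell strings)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling `parse_table(SAMPLE_HTML)` yields one list per table row, which can then be written to CSV or loaded into a data frame. In a real scraper the HTML string would come from an HTTP request made in line with the target site's terms of service.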
-
Introduction
Power BI and GitHub are powerful tools in data analysis and software development, respectively. Integrating Power BI with GitHub allows users to visualise and analyse data hosted on GitHub repositories. The primary advantage of this integration is the seamless update process. Any modifications made to the data in GitHub are easily synchronised with the Power BI dashboard, eliminating the need to establish a new connection for each update.
-
Introduction
If you're interested in developing tools to assist cyclists in improving their performance, you've likely come across the term "FTP" or Functional Threshold Power. FTP is a pivotal metric within the cycling community that allows you to assess a cyclist's fitness level, establish accurate training zones, and create tools tailored to enhancing their strength and efficiency. Understanding FTP is crucial when building effective resources for cyclists looking to excel in their sport.
-
Cycling duration prediction models
A number of experiments were performed to test prediction models for the duration of a workout based on previous workout details. These experiments can be seen in the Python Notebook in the Project GitHub repository.
Data Loading and Preprocessing:
- The notebook starts with loading cycling data that has been exported from Strava and that contains numerous attributes like distance, speed, heart rate, power, etc.
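To give a feel for what a duration prediction baseline might look like, the sketch below fits a straight line from ride distance to ride duration using ordinary least squares. It is not the notebook's method, which is not reproduced here; the function names and the four data points are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_duration(distance_km, slope, intercept):
    """Predict workout duration in minutes from distance in kilometres."""
    return slope * distance_km + intercept


# Synthetic example values, not real Strava data.
distances_km = [10, 20, 30, 40]
durations_min = [25, 50, 75, 100]

slope, intercept = fit_line(distances_km, durations_min)
```

A single-feature line like this is only a reference point; the other attributes mentioned above (speed, heart rate, power) would normally be added as further features in a proper model.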
-
A number of experiments were performed to test prediction models for the duration of a workout based on previous workout details. These experiments can be seen in the Python Notebook in the Project GitHub repository. FTP is a critical performance metric in cycling, indicating the highest power a rider can sustain for an hour.
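Taking the definition above literally, one simple proxy for FTP is the best average power over any continuous one-hour window of a ride. The sketch below computes the best average over a sliding window. It assumes evenly spaced power samples (for example one per second, so a one-hour window is 3600 samples), and `best_average_power` is a hypothetical helper name, not something taken from the notebook.

```python
def best_average_power(power_watts, window_samples):
    """Return the highest mean power over any contiguous window.

    Returns None when the ride is shorter than the window.
    """
    if window_samples <= 0:
        raise ValueError("window_samples must be positive")
    if len(power_watts) < window_samples:
        return None
    # Running sum: add the newest sample and drop the oldest each step.
    window_sum = sum(power_watts[:window_samples])
    best_sum = window_sum
    for i in range(window_samples, len(power_watts)):
        window_sum += power_watts[i] - power_watts[i - window_samples]
        best_sum = max(best_sum, window_sum)
    return best_sum / window_samples
```

With one-second samples, `best_average_power(ride_power, 3600)` gives the one-hour figure, where `ride_power` stands for a list of power readings. Riders rarely hold a maximal effort for a full hour in ordinary training, so in practice FTP is often estimated from shorter efforts instead; that estimation step is outside the scope of this sketch.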
-
Introduction
This guide is designed to walk you through the process of integrating Python into Power BI, a synergy that unlocks a new realm of possibilities for data analysis and business intelligence.
Prerequisites