GGC-DSA/itskills

JobsMatch: Web/Mobile application used by IT students for matching jobs to GGC courses

Description

This tool helps IT students explore entry-level positions and see which GGC courses teach the entry-level skills each position requires. The scope of the project for Spring 2021 was to gather job listing datasets, clean and filter them, manually test the project's goal, and statistically analyze the job-skills data. The result is a proof of concept: the process works, and important job-skills data can be captured, filtered, displayed, and ultimately used in an application. Significant further development is required. Other IT students at GGC will have the opportunity to research additional entry-level positions and to build the actual Web/Mobile application, turning this work into a highly beneficial tool.

Spring 2025 Scope: Further develop the website tool by cleaning, analyzing, and visualizing data web-scraped from job posting sites such as LinkedIn, Glassdoor, and Indeed.com. The results will be used to identify the top 5-10 jobs for each major or field, along with their corresponding primary technical and soft skills. When a user selects a job or skill, the tool will display the courses that teach the necessary skills, so students can see which courses prepare them for the job they want. The team is also updating and revamping the current website to make it more interactive and dynamic for users.

Project Demo Video

Project Website

Spring '23: Grizzly Path

Notebook

Final Report

Spring '25 Team

  • Students: Michelle Webb - Data Analyzer/Project Manager, Nikhita Nikhita - Visualization/Project Documenter, Krishan Bhalsod - Data Modeler/Data Analyzer/Data Cleaning/Collection, Lucas Leon - Visualization/Client Liaison
  • Advisor: Dr. Anca Doloc-Mihu, Assistant Professor of Information Technology

Fall '23 Team

  • Students: Sam Downs
  • Advisor: Dr. Anca Doloc-Mihu

Spring '23 Team

  • Students: Anel Coralic, Sam Downs, Ashley Mendez
  • Advisor: Dr. Anca Doloc-Mihu, Assistant Professor of Information Technology

Spring '22 Team

  • Student: Michael Murillo Martinez
  • Advisor: Dr. Anca Doloc-Mihu, Assistant Professor of Information Technology

Summer '21 Team

  • Student: Hugh Smith
  • Advisor: Dr. Anca Doloc-Mihu, Assistant Professor of Information Technology

Publications

STaRS Symposium Poster

CREATE Symposium

Outreach Activities

  • ITEC 2140 Introduction to Java, Professor Xin Xu, April 27, 2021
  • ITEC 2140 Introduction to Java, Professor Xin Xu, April 28, 2021

Technology

Spring '25

Fall '23

Spring '23

Summer '21

I used kaggle.com to obtain the job listing datasets, Microsoft Excel for data cleansing, Google Drive for online file storage, and a Google Colab notebook for Python development, data analysis, and display.

  1. https://www.kaggle.com/
  2. https://www.microsoft.com/en-us/microsoft-365/excel
  3. https://drive.google.com/drive/my-drive
  4. https://colab.research.google.com/notebooks/intro.ipynb?utm_source=scs-index
  5. Technical Results: https://github.com/GGC-DSA/itskills/blob/main/media/Technical%20Results/SD%20Skills.png

Project Setup/Installation

  1. Consulted with advisor (and searched online) to find job listing dataset sources
  2. Researched online which job titles are considered entry-level within GGC IT concentrations (SD and DSA)
  3. Downloaded datasets which likely contained job titles in the concentration researched
  4. Created a master spreadsheet to summarize all datasets downloaded
  5. Used MS Excel to explore and filter down to entry-level job titles in each dataset. Made sure each dataset of job listings contained a column where employers specified details about required skills, abilities, and responsibilities
  6. Saved dataset copies on Google Drive
  7. Saved a specific subset file for each dataset containing enough of one type of job title:
    1. For example: I created a dataset called DS-8
    2. Using Excel, I filtered titles down to just "Junior Software Developer" and saved that file as "DS-8 Junior Software Developer"
  8. Signed up for access to Google Colab and created the Colab notebook for this project
  9. Created Python code that...
    1. Linked notebook to my Google drive
    2. Loaded single job title datasets (from step 7)
    3. Created dictionary to strip out common words during analysis
    4. Stripped meaningless characters out of data
    5. Created counter object to automatically rank top 1000 words
    6. Created dictionaries to home in on select words during analysis
      • Created MS Excel file "SD Key Skills Python Dictionary Builder"
      • Copied Top 1000 words/ranks from Python results to this file
      • Worked through list to create common word and skill set dictionaries
      • Formatted to put back into Python coding to build dictionaries
    7. Created analysis loop to filter out common words and filter down to skill words
    8. Created histogram plot to display skills words in a ranked order

Usage

  1. Open the Colab notebook for the project
  2. Click the run (arrow) button to the left of each step, starting from the top
    1. Run access to Google Drive at start of each session
    2. For each new dataset
      1. Run dataset load for specific subset on Google Drive
      2. Run definitions for excluding common words and garbage characters
      3. Run long list analysis of top 1000 words
      4. Run definitions for focusing in on skill-set words
      5. Run short list analysis of specific skill-sets
        1. Will print lists of skill words in different orders
        2. Will display histogram showing top skill words in chosen order
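The last step prints the skill words in different orders and then displays them as a histogram. The README does not say which plotting library the notebook uses, so the sketch below substitutes a plain-text bar chart to show the idea. The function names and the counts in `skill_counts` are made up for illustration.

```python
def order_skills(counts, by="count"):
    """Return (word, count) pairs sorted by descending count or alphabetically."""
    if by == "count":
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if by == "alpha":
        return sorted(counts.items())
    raise ValueError(f"unknown order: {by!r}")


def text_histogram(pairs, width=30):
    """Render one bar per skill, scaled so the largest count fills `width`."""
    if not pairs:
        return []
    longest = max(len(word) for word, _ in pairs)
    peak = max(n for _, n in pairs)
    lines = []
    for word, n in pairs:
        bar = "#" * max(1, round(n * width / peak))
        lines.append(f"{word.ljust(longest)} | {bar} {n}")
    return lines


# Made-up counts, standing in for the output of the skill-word analysis.
skill_counts = {"java": 12, "sql": 9, "python": 6, "git": 3}
for line in text_histogram(order_skills(skill_counts, by="count")):
    print(line)
```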

Project Status

  1. Datasets collected from Jan '23
  2. Cleaned and analyzed for common skills and job titles
  3. Grizzly Path Website up to date since April '23

Datasets

Spring 2025

Cleaned

Final

Fall 2023

Cleaned

Original

Main methods for Analysis, ML/AI

Spr '25:

  • Python - value counts
  • Prince module for MCA
  • MCA/K-Means for classification and clustering
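To show what the clustering step does, here is a from-scratch sketch of K-Means (Lloyd's algorithm). It is an illustration only and not the team's code, which may well use a library implementation. It assumes the categorical job data has already been projected to numeric coordinates (for example by MCA), and the sample points are made up.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: the first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Made-up 2-D coordinates standing in for MCA output: two well-separated groups.
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(coords, k=2)
print(labels)
```

Initializing from the first k points keeps the sketch deterministic. Library implementations typically use smarter seeding and several restarts.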

Fall '23:

  • Python - value_counts()
  • Predicting job titles - Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest
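As an illustration of predicting a job title from posting text, the sketch below implements a small multinomial Naive Bayes classifier from scratch. The README does not say which library or features the team used, so the class, the tokenizer, and the training snippets here are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters, digits, '+' and '#'."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per job title
        self.word_counts = defaultdict(Counter)   # word frequencies per job title
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training snippets, for illustration only.
train_texts = [
    "Build REST APIs in Java and SQL",
    "Develop web features with JavaScript and Java",
    "Monitor firewalls and respond to security incidents",
    "Audit network security and firewall rules",
]
train_labels = ["Software Developer", "Software Developer", "Security Analyst", "Security Analyst"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Java and JavaScript development"))
print(model.predict("Review firewall security logs"))
```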

2 Main Results

Spring '25:

Tree Map with Courses and Related Skills; Systems Security Jobs with Skills

Fall '23:

Systems and Security Common Job Titles Software Developer Common Skills for Web Developer

Spring 2025 Remaining Scope

  1. Have a continuous influx of data with live job posting data
  2. Create a survey for IT faculty to gather which specific skills are taught for each class - Dr. Anca can distribute
  3. Include full range of courses available for domains
  4. Incorporate more data from other popular job sites or elsewhere
  5. Incorporate more interactive visuals; we could not embed our Power BI dashboard in the website because of a paywall
  6. Improve chatbot algorithm for better matching courses with job skills

Fall 2023 Remaining Scope

  1. Create/Update GGC class survey and collect data from students using the survey.
  2. Collect more information about Enterprise Systems classes.
  3. Add a disclaimer page
  4. Refactor the TreeCreation.js file

Spring 2022 Remaining Scope of Project

  1. Create/Update GGC class survey. It needs to be user-friendly and make the data easier to extract.
  2. Ask GGC IT students to complete survey
  3. Associate skills from job titles to GGC courses.
  4. Update Grizzly Path website with GGC courses
