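Course project for "Big Data Mining Techniques" at Fudan University. It attempts to find frequent 2-itemsets of high-support keywords in the search records of the Sogou Labs user query log dataset (2008). On the implementation side, I set up a small Hadoop cluster of five servers and implemented the three MapReduce stages of the Parallel FP-Growth algorithm in Python.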
Updated Mar 29, 2021 - Python
A recommender system that suggests products to users, built on item-based CF and XGBRegressor.
Uses Hadoop to process data from an automobile tracking platform that records the history of important incidents after the initial sale of a new vehicle.
An AWS Lambda function that starts an EMR cluster and runs a MapReduce job.
A REST-based SQL-to-MapReduce-and-Spark translator: the service translates a SQL query into MapReduce and Spark jobs, runs them, and returns the result as a JSON object.
This repository contains code that extracts meaningful information from a news headline dataset.
A repository containing the source code for the assignments done as part of the Big Data course (UE18CS322) at PES University.
Hadoop MapReduce Python
Lightweight and extensible library to execute MapReduce-like jobs in Python
Market basket analysis of finding frequent itemsets using SON algorithm in Spark
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks…
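The model described above can be illustrated with a minimal, single-process sketch in Python. It is not taken from any repository listed here, and it does not use Hadoop; the names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical and chosen only for illustration. The sketch assumes the classic word-count task: the map function emits intermediate `(word, 1)` pairs, the pairs are grouped by key, and the reduce function merges all values that share a key.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a MapReduce-style job in memory (illustrative only, no cluster)."""
    # Map phase plus shuffle: collect every intermediate value under its key.
    groups = defaultdict(list)
    for key, value in records:
        for inter_key, inter_value in mapper(key, value):
            groups[inter_key].append(inter_value)
    # Reduce phase: merge all values associated with the same intermediate key.
    return {k: reducer(k, values) for k, values in groups.items()}


def word_count_mapper(_doc_id, text):
    # Emit one (word, 1) pair per word occurrence in the document.
    for word in text.lower().split():
        yield word, 1


def word_count_reducer(_word, counts):
    # Sum the per-occurrence counts for a single word.
    return sum(counts)


# Hypothetical input: (document id, document text) pairs.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = map_reduce(documents, word_count_mapper, word_count_reducer)
```

In a real framework the map and reduce calls would run on different machines and the grouping step would be a distributed shuffle; here everything runs in one process so the data flow is easy to follow.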
Uses MapReduce to compute PageRank on the WDC data.
Shows how MapReduce parses text data by processing subtasks in parallel with multithreading.
Modified from big-data-europe/docker-hadoop
Emulation-based System for Distributed File storage and Parallel Computation
K-means clustering algorithm using MapReduce.
Distributed Computing using Hadoop, Docker and Python (Map Reduce)
An alternative simple MapReduce example
A Hadoop-based MapReduce SQL engine
This project builds a data pipeline implementing the ETL process.