This project tries to clean data and check dataset having acceptable values and then shows implementation of some algorithm in frequent pattern, clustering and classification on Divar dataset.
This stage of the project was checked all columns and rows to have acceptable values. If it has Null or unacceptable values, it was filled by "unknown" or "mean of data in column".
also Completeness was checked in 3 way:
Measurement Function A/B
A: records with no missing attribute
B: Total records in a dataset
A: number of data required for the particular context in the data file
B: number of data in the specified particular context of intended use
A: attribute fields containing values
B: records × attributes
At the end, third method had better result because our missing values were broadcasted.
This stage of project extract pattern
make product category based on popularity rate and people's purchase rate. use agglomerative, k-means and DBSCAN algoritms.
predict price for products using LinearRegression algorithm.