Learning PySpark
I have written some code in PySpark.
wordcount.py calculates word counts in PySpark.
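
For readers new to the RDD API, here is a minimal sketch of how a word count is commonly written in PySpark. It is not the contents of wordcount.py; the function names (`tokenize`, `word_count`, `main`) and the tokenization rule are assumptions made for illustration.

```python
import re
from operator import add


def tokenize(line):
    """Split a line into lowercase words (letters, digits, apostrophes).

    This tokenization rule is an assumption, not taken from wordcount.py.
    """
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count(lines_rdd):
    """Return an RDD of (word, count) pairs from an RDD of text lines."""
    return (
        lines_rdd.flatMap(tokenize)       # one record per word
        .map(lambda word: (word, 1))      # pair each word with a count of 1
        .reduceByKey(add)                 # sum the counts per word
    )


def main(path):
    # pyspark is imported here so the helpers above can be used without Spark.
    from pyspark import SparkContext

    sc = SparkContext(appName="wordcount-sketch")
    try:
        for word, count in word_count(sc.textFile(path)).collect():
            print(word, count)
    finally:
        sc.stop()
```

Calling `main("input.txt")` (a hypothetical file name) would print one word and its count per line.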
#Project_3.pdf avrlength.py calculates the average outgoing length
reachable_nodes.py calculates the number of nodes that are reachable from the start nodes
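
One possible way to count reachable nodes with RDDs is a breadth-first expansion, sketched below. This is not the code in reachable_nodes.py. It assumes the graph is an RDD of `(src, dst)` edge pairs, that start nodes count as reachable from themselves, and that each frontier is small enough to hold on the driver; `reachable_from` is a hypothetical name.

```python
def reachable_from(edges_rdd, start_nodes):
    """Return the set of nodes reachable from start_nodes.

    edges_rdd: RDD of (src, dst) pairs (an assumed input format).
    The start nodes themselves are included in the result.
    """
    edges = edges_rdd.distinct().cache()   # reused on every iteration
    visited = set(start_nodes)
    frontier = set(start_nodes)
    while frontier:
        # Follow every edge that leaves the current frontier.
        neighbours = set(
            edges.filter(lambda e: e[0] in frontier)
            .map(lambda e: e[1])
            .distinct()
            .collect()
        )
        # Keep only nodes that have not been seen before.
        frontier = neighbours - visited
        visited |= frontier
    return visited
```

The number of reachable nodes is then `len(reachable_from(edges, starts))`. Collecting each frontier to the driver keeps the sketch short; for very large graphs a join-based expansion that stays inside Spark would scale better.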
#Set_Similarity_Join.pptx secondary_sort.py and setSimJoin.py find the similar pairs in a large data set
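
To illustrate what a set similarity join computes, here is a naive sketch that is not the algorithm in setSimJoin.py. It assumes Jaccard similarity as the measure, records shaped as `(record_id, tokens)`, and candidate generation through a plain inverted index with no prefix filtering; `jaccard` and `similar_pairs` are hypothetical names.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(records_rdd, threshold):
    """Return an RDD of ((id1, id2), similarity) with similarity >= threshold.

    records_rdd: RDD of (record_id, iterable_of_tokens), an assumed format.
    """
    records = records_rdd.mapValues(frozenset).cache()
    # Inverted index: records sharing a token become candidate pairs.
    candidates = (
        records.flatMap(lambda r: [(tok, r[0]) for tok in r[1]])
        .groupByKey()
        .flatMap(lambda kv: combinations(sorted(set(kv[1])), 2))
        .distinct()
    )
    # Attach both token sets to each candidate pair, then verify it.
    return (
        candidates.join(records)                          # (id1, (id2, set1))
        .map(lambda x: (x[1][0], (x[0], x[1][1])))        # (id2, (id1, set1))
        .join(records)                                    # (id2, ((id1, set1), set2))
        .map(lambda x: ((x[1][0][0], x[0]), jaccard(x[1][0][1], x[1][1])))
        .filter(lambda pair: pair[1] >= threshold)
    )
```

Pairs that share no token have similarity 0, so for any positive threshold the inverted index loses no results. It can still emit many candidates for frequent tokens, which is the cost that filtering techniques aim to reduce.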