LearnSpark

Learning PySpark

I have written some code in PySpark.

wordcount.py calculates the word count in PySpark.
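
For readers new to the pattern, here is a minimal sketch of a PySpark word count. It is not the contents of wordcount.py: the `tokenize` rule, the `count_words` and `main` names, and the `input.txt` path are all assumptions made for illustration.

```python
import re


def tokenize(line):
    """Split a line into lowercase words (assumed tokenization rule)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def count_words(lines_rdd):
    """Return an RDD of (word, count) pairs from an RDD of text lines."""
    return (lines_rdd
            .flatMap(tokenize)                  # one record per word
            .map(lambda word: (word, 1))        # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b))   # sum the counts per word


def main(path="input.txt"):  # hypothetical input path
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "wordcount-sketch")
    try:
        for word, n in count_words(sc.textFile(path)).collect():
            print(word, n)
    finally:
        sc.stop()
```

Calling `main()` from a script run with `spark-submit` would print one word and its count per line, in no guaranteed order.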

#Project_3.pdf avrlength.py calculates the average outgoing length

reachable_nodes.py calculates the number of nodes that are reachable from the start nodes
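
One way to count reachable nodes with RDDs is to expand a frontier one hop at a time until no new nodes appear. The sketch below is an illustration of that idea, not the code in reachable_nodes.py. It assumes an edge list with one `src dst` pair per line, and it counts the start nodes themselves as reachable; `parse_edge` and `count_reachable` are hypothetical names.

```python
def parse_edge(line):
    """Parse a 'src dst ...' line into a (src, dst) tuple (assumed input format)."""
    src, dst = line.split()[:2]
    return src, dst


def count_reachable(sc, edges_rdd, start_nodes):
    """Count nodes reachable from start_nodes, including the start nodes.

    sc is a SparkContext; edges_rdd is an RDD of (src, dst) pairs.
    """
    edges = edges_rdd.distinct().cache()
    visited = sc.parallelize(list(start_nodes)).distinct()
    frontier = visited
    while not frontier.isEmpty():
        # Follow one hop out of the frontier: join on the source node.
        neighbors = (frontier.map(lambda n: (n, None))
                     .join(edges)                 # (src, (None, dst))
                     .map(lambda kv: kv[1][1])    # keep only dst
                     .distinct())
        # Keep only nodes that have not been seen before.
        frontier = neighbors.subtract(visited).cache()
        visited = visited.union(frontier).cache()
    return visited.count()
```

A typical call would be `count_reachable(sc, sc.textFile(path).map(parse_edge), ["a"])`, where `path` points at the edge list. The loop runs once per hop, so the number of Spark jobs grows with the distance to the farthest reachable node.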

#Set_Similarity_Join.pptx secondary_sort.py and setSimJoin.py calculate the similar pairs in a large data set
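
To make the task concrete, here is a brute-force sketch of a set similarity join. It assumes Jaccard similarity and a fixed threshold, neither of which is stated above, and it compares every pair of records, so it only suits small inputs. It is not the approach in setSimJoin.py; `jaccard` and `similar_pairs` are hypothetical names.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(records_rdd, threshold):
    """Return an RDD of ((id1, id2), similarity) with similarity >= threshold.

    records_rdd is an RDD of (record_id, iterable_of_tokens) pairs.
    """
    return (records_rdd.cartesian(records_rdd)
            .filter(lambda p: p[0][0] < p[1][0])   # keep each unordered pair once
            .map(lambda p: ((p[0][0], p[1][0]), jaccard(p[0][1], p[1][1])))
            .filter(lambda kv: kv[1] >= threshold))
```

For example, `jaccard([1, 2, 3], [2, 3, 4])` is 0.5, because the two sets share 2 of 4 distinct elements. On a large data set the cartesian product is too expensive, which is why practical joins prune candidate pairs before computing similarities.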