I want to use Python libraries such as BeautifulSoup. How can I use external Python APIs together with PySpark?

http://omz-software.com/pythonista/docs/ios/beautifulsoup_guide.html
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-consider-HTML-files-in-Spark-td22017.html
https://pypi.python.org/pypi/beautifulsoup4