python – No module name pyspark error

You don't have pyspark installed in a location visible to the Python installation you're using. To confirm this, on your command-line terminal, with your virtualenv activated, enter your REPL (python) and type import pyspark:

$ python
Python 3.5.0 (default, Dec  3 2015, 09:58:14) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pyspark

If you see the "No module named pyspark" ImportError, you need to install that library. Quit the REPL and type:

pip install pyspark

Then re-enter the REPL to confirm the import works:

$ python
Python 3.5.0 (default, Dec  3 2015, 09:58:14) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>>
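
If you want to go a step beyond a bare import, a short sanity check such as the following (a minimal sketch; the app name and sample data here are just examples) confirms that the Spark runtime itself starts up:

from pyspark.sql import SparkSession

# Build a local SparkSession; "local[*]" uses all available cores.
spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()

# Create a tiny DataFrame and count it to prove the runtime works.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
print(df.count())  # expected output: 2

spark.stop()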

As a note, it is critical that your virtual environment is activated. When in the directory of your virtual environment:

$ source bin/activate

These instructions are for a Unix-based machine and will vary on Windows.
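
A quick way to confirm which interpreter (and therefore which environment) you are actually running, using nothing beyond the standard library:

import sys

print(sys.executable)  # path to the Python binary currently running
print(sys.prefix)      # points inside your virtualenv when it is active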

Just use:

import findspark
findspark.init()

import pyspark # only run after findspark.init()

If you don't have the findspark module, install it with:

python -m pip install findspark

You can use findspark to make Spark accessible at run time. Typically findspark will locate the directory where Spark is installed, but if it lives in a non-standard location you can point it at the correct directory. Once you have installed findspark, if Spark is installed at /path/to/spark_home, just put

import findspark
findspark.init("/path/to/spark_home")

at the very top of your script/notebook and you should now be able to access the pyspark module.
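
As a sketch, assuming a hypothetical Spark install at /opt/spark (substitute your own SPARK_HOME), a complete script would look like:

import findspark

# Point findspark at a non-standard Spark install; this path is only an example.
findspark.init("/opt/spark")

import pyspark  # importable now that findspark has added Spark to sys.path

sc = pyspark.SparkContext(master="local[*]", appName="findspark-check")
print(sc.parallelize(range(10)).sum())  # expected output: 45
sc.stop()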
