python – "No module named pyspark" error
You don't have pyspark installed in a place available to the Python installation you're using. To confirm this, open your command-line terminal, activate your virtualenv, start the REPL (python) and type import pyspark:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pyspark
If you see the "No module named pyspark" ImportError, you need to install that library. Quit the REPL and type:
pip install pyspark
Then re-enter the REPL to confirm it works:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>>
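If you would rather check from a script than from the REPL, a minimal sketch like the following reports which interpreter is running and whether it can find pyspark. The helper name is_importable is hypothetical (not part of pyspark); it relies on importlib.util.find_spec from the standard library, which returns None when a top-level module cannot be found.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be found by THIS interpreter.

    Hypothetical helper for illustration; find_spec returns None when the
    module is not on sys.path of the running interpreter.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable is the interpreter actually running this code; pyspark
    # has to be installed for this exact interpreter for the import to work.
    print("Interpreter:", sys.executable)
    print("pyspark importable:", is_importable("pyspark"))
```

If the printed interpreter path is not the one inside your virtualenv, that mismatch is the likely cause of the ImportError.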
Note that it is critical that your virtual environment is activated. From the directory of your virtual environment, run:
$ source bin/activate
These instructions are for a Unix-based machine and will vary on Windows.
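To verify that the interpreter you are running really belongs to a virtual environment, here is a best-effort sketch. It assumes the standard venv behaviour (Python 3.3+), where sys.prefix points at the environment and sys.base_prefix at the base installation; older virtualenv releases set sys.real_prefix instead. The function name in_virtualenv is hypothetical. Note that it inspects the interpreter itself rather than the shell, so running the environment's bin/python directly also counts.

```python
import sys


def in_virtualenv() -> bool:
    """Best-effort check that the running interpreter belongs to a virtual env.

    Hypothetical helper for illustration.
    """
    # venv: sys.prefix is the environment, sys.base_prefix the base install.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases expose sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```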
Just use:
import findspark
findspark.init()
import pyspark # only run after findspark.init()
If you don't have the findspark module, install it with:
python -m pip install findspark
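Running pip as python -m pip ties the install to that specific interpreter, which avoids installing into a different Python than the one that later runs import findspark. A minimal sketch of the same idea from inside Python, assuming you want the command bound to whatever interpreter is currently running (pip_install_command is a hypothetical helper name):

```python
import sys
from typing import List


def pip_install_command(package: str) -> List[str]:
    """Build a pip install command bound to the running interpreter.

    Hypothetical helper for illustration; it only builds the command.
    """
    # sys.executable + "-m pip" installs into the same environment that
    # will later execute "import <package>".
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    cmd = pip_install_command("findspark")
    # Shown rather than executed; subprocess.check_call(cmd) would run it.
    print(" ".join(cmd))
```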
You can use findspark to make Spark accessible at run time. Typically findspark will find the directory where you have installed Spark, but if it is installed in a non-standard location, you can point it to the correct directory. Once you have installed findspark, if Spark is installed at /path/to/spark_home, just put
import findspark
findspark.init('/path/to/spark_home')  # the path must be passed as a string
at the very top of your script/notebook, and you should then be able to import the pyspark module.
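As I understand it, findspark.init mainly prepends Spark's bundled python directory and its Py4J zip to sys.path and sets SPARK_HOME. The sketch below is a hypothetical, simplified stand-in (add_spark_to_path is not findspark's API) that illustrates this under those assumptions. Unlike the real findspark, it does not search default install locations and only uses the explicit path or the SPARK_HOME environment variable.

```python
import glob
import os
import sys
from typing import Optional


def add_spark_to_path(spark_home: Optional[str] = None) -> str:
    """Hypothetical, simplified illustration of what findspark.init() does."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("Pass spark_home or set the SPARK_HOME environment variable")
    spark_python = os.path.join(spark_home, "python")
    # Spark ships the Py4J bridge as python/lib/py4j-<version>-src.zip;
    # the version differs between Spark releases, hence the glob.
    py4j_zips = sorted(glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip")))
    if not py4j_zips:
        raise FileNotFoundError(f"No py4j-*.zip found under {spark_python}/lib")
    # Prepend so these entries take precedence over anything else on sys.path.
    sys.path[:0] = [spark_python, py4j_zips[0]]
    os.environ["SPARK_HOME"] = spark_home
    return spark_home
```

Usage would mirror the findspark call: add_spark_to_path('/path/to/spark_home') followed by import pyspark. In practice, prefer findspark itself, since it also handles cases this sketch ignores.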