spark shell not working
Those who are facing problems like 'spark-shell' is not recognized as an internal or external command
On the command prompt, write 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin' (use your own Spark file path, including bin)
and then write spark-shell or pyspark. (It finally worked for me, hope it works for you too)
If it worked, like this comment so that more people benefit from it
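For reference, that workaround as plain commands (the folder below is just an example from this thread; substitute your own Spark path):

    cd /d C:\Spark\spark-3.5.1-bin-hadoop3\bin
    spark-shell
    REM or, for the Python shell:
    pyspark

If this works from the bin folder but not from anywhere else, the missing piece is the Path entry.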
It worked .. Thank you
It worked, thanks :)
Thank you 😊 so much it worked
Thank you 😊 so much it worked
why did we get this error?
how is your spark shell running from your users directory?
its not running for me
did it work for you now? I'm facing the same issue here
'pyspark' is not recognized as an internal or external command,
operable program or batch file.
getting this error; tried it for the whole day, same issue.
On the command prompt, write 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin' (use your own Spark file path, including bin)
and then write spark-shell or pyspark. (It finally worked for me, hope it works for you too)
@@srishtimadaan03 hello....but we added SPARK_HOME in the environment variables, so what is the point of running it from the exact location? The environment variables should help the system find the command.
Did everything as per the video, still getting this error when using spark-shell: "The system cannot find the path specified."
On the command prompt, write 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin' (use your own Spark file path, including bin)
and then write spark-shell or pyspark. (It finally worked for me, hope it works for you too)
@@srishtimadaan03 Yes!
@@srishtimadaan03 it doesn't work for me... what do I do??
Thank you so much for this video. Unfortunately, I couldn't complete this - getting this error: C:\Users\Ismahil>spark-shell
'cmd' is not recognized as an internal or external command,
operable program or batch file. please help
execute as admin
@@JesusSevillanoZamarreno-cu5hk You are the bestest and sweetest in the world
I was having the same issue as you, and it turned out to work, simply doing what Jesus said (OMG!): "Run it as admin". Thanks everyone.. Indeed, Jesus saves us!!
What I was doing in 2 days, you narrowed to 30 mins!! Thank you!!
Thank you so much! Subscribe for more content 😊
The best video on installing PySpark, even in 2024. Many thanks to the author!
which spark version did you download?
@playtrip7528 I downloaded 3.5.3, pre-built for Hadoop 3.3, with the 3.0.0 winutils
@@playtrip7528 I downloaded the 3.5.3 version of pyspark and the 3.3 pre-built for Hadoop, with the 3.0.0 winutils
i did every step you have said, but still spark is not working
I followed all the steps but am getting an error
'spark-shell' is not recognized as an internal or external command,
operable program or batch file.
the same happens to me
Facing same error, Did you find any solution for it?
'spark-shell' is not recognized as an internal or external command,
operable program or batch file.
WHAT A PROCESS!! It worked for me just by run spark-shell in cmd as ADMIN. thank you for the video!
I am not able to find the package type: pre-build for Apache Hadoop 2.7 in the drop-down. FYI - my spark release versions that i can see in the spark releases are 3.4.3 and 3.5.1.
How did you solve this? I am running into the same issue
This worked so well for me :-) The pace is great and your explanations are clear. I am so glad i came across this, thanks a million! 😄 I have subscribed to your channel!!
Ok guys this is how to do it, in case you are having problems👇
1.) I used the latest version 3.5.0 (Pre-built for Apache Hadoop 3.3 and later) - downloaded it.
2.) Extracted the zip file just as done in the video; the first time it gave me a file, not a folder, but a .rar file which WinRAR could not unzip, so I used 7-Zip and it finally extracted to a folder that had the bins and all the other files.
3.) In the system variables he forgot to edit the Path variable and to add %SPARK_HOME%\bin.
4.) Downloaded winutils.exe for Hadoop 3.0.0 from the link provided in the video.
5.) Added it the same way, but C > Hadoop > bin > winutils.exe
6.) Then edit the user variables as done, and do the same for the Path: %HADOOP_HOME%\bin
Reply for any parts you might have failed to understand🙂 (a command-line sketch of the variable setup follows below)
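A rough command-line sketch of steps 3 and 6 (the paths are examples only; adjust them to wherever you extracted things, and note that setx writes user variables and only affects newly opened command prompts):

    REM persist the home variables for the current user (example paths)
    setx SPARK_HOME "C:\Spark\spark-3.5.0-bin-hadoop3"
    setx HADOOP_HOME "C:\Hadoop"
    setx JAVA_HOME "C:\Program Files\Java\jdk-11"
    REM then add %SPARK_HOME%\bin, %HADOOP_HOME%\bin and %JAVA_HOME%\bin to Path
    REM in the Environment Variables dialog, open a NEW command prompt and verify:
    where spark-shell
    spark-submit --version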
What do you mean for the 3rd step ?
Thanks
Thank you so much 😊
Hello, I had to use the latest version as well, but I'm not able to make it work, I followed the tutorial exactly :(
@@sarahq6497 me too... when i am running the spark-shell command from the exact spark location on cd, it works... but when i run it just after opening cmd, it doesn't it gives error like spark-shell is not found
Thanks bro fixed it after struggling for 2 days 2 nights 2hours 9mins.
Hello, I have been trying to install it for some days too. I keep getting an error when I try to run spark-shell: the command is not recognized. Any suggestions?
In cmd the command spark-shell is running only under the C:\Spark\spark-3.5.0-bin-hadoop3\bin directory, not globally
same for pyspark
yeah man, same for me.. did you find any fixes? if so, let me know :)
@@s_a_i5809 add your Environment variables under system variables not user variables.
100 % working solution
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg
I added C:\Program Files\spark\spark-3.5.1-bin-hadoop3\bin to the system variables and it worked
@@lucaswolff5504 yes
i have followed all your steps,still i'm facing an issue.
'spark2-shell' is not recognized as an internal or external command
Do everything that he said but not in User Variables but in System variables. I was facing the same problem but then I did the same in system variables and my spark started running.
Step by step spark + PySpark in pycharm solution video
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc
I don't have the option for Hadoop 2.7 what to choose now???
did you get any solution?
please let me know
@@LLM_np NO
Very helpful video. Just by following the steps you mentioned I could run the spark on my windows laptop. Thanks a lot for making this video!!
Thank you so much!😊
@@ampcode bro I followed every step you said, but in CMD when I gave "spark-shell", it displayed " 'spark-shell' is not recognized as an internal or external command,
operable program or batch file." Do you know how to solve this?
@@iniyaninba489 add same path in User Variables Path also, just like how u added in System Variables Path
Hi, i followed the exact steps (installed spark 3.2.4 as that is the only version available for hadoop 2.7). The spark-shell command is working but pyspark is throwing errors.
if anyone has fix to this please help me.
Thanks
Step by step solution
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc
I have followed the whole instruction, but when I run it, spark-shell is not recognised
same here
@@JustinLi-y6q did you get it? i was having the same issue, but i downgraded my java version to version 17 and it is now working fine. Java 23 is not compatible with Spark 3.x, i think. Did not work for me
I am getting a message of 'spark-version' is not recognized as an internal or external command,
operable program or batch file. This is after setting up the path in environment variables for PYSPARK_HOME.
try running as administrator.
I'm a little confused about how to set up the PYTHONHOME environment variable
Step by step
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc
Great. I followed the instructions and successfully installed spark. Thank you!
I got the below error while running spyder even though i have added the PYTHONPATH.
File ~\anaconda\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File c:\users\justa\.spyder-py3\temp.py:26
df = spark.createDataFrame(data = data, schema = columns)
File ~\anaconda\lib\site-packages\pyspark\sql\session.py:1276 in createDataFrame
return self._create_dataframe(
File ~\anaconda\lib\site-packages\pyspark\sql\session.py:1318 in _create_dataframe
rdd, struct = self._createFromLocal(map(prepare, data), schema)
File ~\anaconda\lib\site-packages\pyspark\sql\session.py:962 in _createFromLocal
struct = self._inferSchemaFromList(data, names=schema)
File ~\anaconda\lib\site-packages\pyspark\sql\session.py:834 in _inferSchemaFromList
infer_array_from_first_element = self._jconf.legacyInferArrayTypeFromFirstElement()
File ~\anaconda\lib\site-packages\py4j\java_gateway.py:1322 in __call__
return_value = get_return_value(
File ~\anaconda\lib\site-packages\pyspark\errors\exceptions\captured.py:169 in deco
return f(*a, **kw)
File ~\anaconda\lib\site-packages\py4j\protocol.py:330 in get_return_value
raise Py4JError(
Py4JError: An error occurred while calling o29.legacyInferArrayTypeFromFirstElement. Trace:
py4j.Py4JException: Method legacyInferArrayTypeFromFirstElement([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:1623)
Sorry for late response. Could you please check if you are able to run spark-submit using cmd?
100 % working solution
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg
I have done exactly you shown in tutorial but when I am running the spark-shell command in cmd getting "spark-shell
The system cannot find the path specified."
yes, same error.. did you find out the solution?
@@ganeshkalaivani6250 yes, spark doesn't support the latest java and python versions; try java 1.8, python 3.7 and spark 2.7
@@satishboddula4942 can you please share the java 1.8 download link? the JDK page is showing only versions 18, 19 and 20
@@satishboddula4942 still getting the "system cannot find the path specified" error
Add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin\ to the Path environment variable
hadoop is still not recognized; even with your installation it gives a warning message: "unable to load native-hadoop library"
spark-shell is working for me, pyspark is not working from home directory, getting error 'C:\Users\Sana>pyspark
'#' is not recognized as an internal or external command,
operable program or batch file.'
But when I go to python path and run the cmd pyspark is working. I have setup the SPARK_HOME and PYSPARK_HOME environment variables. Could you please help me. Thanks
Sorry for the late response. Could you please also set PYSPARK_HOME to your python.exe path? I hope this will solve the issue😅👍
@@ampcode nope. Same error
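If anyone else lands here: the variables Spark itself reads for the Python interpreter are PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, so pointing those at your python.exe is worth a try (the path below is only an example; use your own):

    setx PYSPARK_PYTHON "C:\Python310\python.exe"
    setx PYSPARK_DRIVER_PYTHON "C:\Python310\python.exe"
    REM open a new command prompt, then run pyspark again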
Very good and amazing content. You cannot find a better place than this video to set up pyspark (Y).
while launching the spark-shell getting the following error, any idea??
WARN jline: Failed to load history
java.nio.file.AccessDeniedException: C:\Users\sanch\.scala_history_jline3
did it get resolved?
I followed all the steps but I'm getting this error
'spark-shell' is not recognized as an internal or external command, operable program or batch file
Yeah I'm also facing this same issue
100 % working solution
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg
after 11:17 I am getting this error:
'spark-shell' is not recognized as an internal or external command, operable program or batch file.
I have checked the environment variables too.
Hello..sorry for late response...could you please navigate once to the spark bin folder and open the CMD there and kick off the spark-shell command? If the spark works fine in the bin directory then definitely it will be the issue with environment variables.
Please let me know if any difficulties. :)
@@ampcode now this is error I am getting after getting into the bin folder
C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin> spark-shell
The system cannot find the path specified.
My problem finally got solved tonight... I needed to add this C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin to the environment variable path
I'm very glad you solved your problem. Cheers!
@@abhinavtiwari6186 I tried this also but get the same issue, only "The system cannot find the path specified"
Hi, following all the steps given in video, I am still getting error as "cannot recognize spark-shell as internal or external command" @Ampcode
I was having this issue as well, when I added the %SPARK_HOME%\bin, %HADOOP_HOME%\bin and %JAVA_HOME%\bin to the User variables (top box, in the video he shows doing system, bottom box) it worked. Good luck.
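Whichever box you put them in, a quick way to check what a freshly opened prompt actually sees (just a sanity-check sketch, not something from the video):

    echo %SPARK_HOME%
    echo %HADOOP_HOME%
    echo %JAVA_HOME%
    where spark-shell
    REM if "where" cannot find spark-shell, this shell is not seeing the Path entry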
Step by step spark + PySpark in pycharm solution video
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc
can someone help , i have downloaded hadoop 3.3 which is the newest version but it is not showing in github . what to do ?
Hi, I completed the process step by step and everything else is working but when I run 'spark-shell' , it shows - 'spark-shell' is not recognized as an internal or external command,
operable program or batch file. Do you know what went wrong?
I'm having this same problem, the command only works if I run CMD as an administrator. Did you manage to solve it?
@@viniciusfigueiredo6740 same as you, run as administrator works
@@viniciusfigueiredo6740 same issue is happening with me
@@viniciusfigueiredo6740same issue for me did u fix it?
Anyone solved this?
I did everything until the environment variables setup, still while using cmd spark-shell it is giving me "'spark-shell' is not recognized as an internal or external command,
operable program or batch file."
versions I used -
For Java:
java version "11.0.24" 2024-07-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.24+7-LTS-271)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.24+7-LTS-271, mixed mode)
For Python:
Python 3.11.0rc2
For Spark:
spark-3.5.3-bin-hadoop3
For Hadoop: (file from below location)
winutils/hadoop-3.3.6/bin/winutils.exe
It worked, my friend. The instructions were concise and straightforward.
can we connect ?
Getting this error: WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped. People have mentioned using the python folder path, which I have set as you mentioned, but still.
I found a fix for this. Change your python path to that of anaconda(within the environment variable section of this video) and use your anaconda command prompt instead. No errors will pop up again.
Sorry for late response. Could you please let me know if you are still facing this issue and also confirm if you’re able to open spark-shell?
@@bukunmiadebanjo9684 Hi Adebanjo, my error got resolved with your solution. Thanks for your help!
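A minimal sketch of that Anaconda variant, assuming a default per-user Anaconda install (adjust the path to your own installation):

    REM run inside the Anaconda Prompt
    setx PYSPARK_PYTHON "%USERPROFILE%\anaconda3\python.exe"
    REM then open a new Anaconda Prompt and start pyspark from there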
I have followed all the steps and added all the system variables but at that time winutils file was not present in my system
Now I have that file how to make the changes plz let me know
thank you so much, very helpful! The only error I got was running spark-shell, but from other comments I figured out that you can either run the command prompt as admin or cd into the spark folder and then call it
in CMD while running spark-shell getting error like ERROR SparkContext: Error initializing SparkContext.
Excellent! Thank you for making this helpful lecture! You relieved my headache, and I did not give up.
Thank you so much!
hey , which version of hadoop did you install because the 2.7 wasn't available
everything is working fine until i run "pyspark" in my command prompt, which shows an error "ERROR: The process with PID 38016 (child process of PID 30404) could not be terminated.
Reason: There is no running instance of the task.
ERROR: The process with PID 30404 (child process of PID 7412) could not be terminated.
Reason: There is no running instance of the task."
me too
have you found a solution?
After entering pyspark in cmd it shows "The system cannot find the path specified. Files\Python310\python.exe was unexpected at this time" please help me resolve it
i face the same problem. is there any solution
installed successfully, but when i check the hadoop version i get an error like "hadoop is not recognized as an internal or external command"
Great ! got SPARK working on Windows 10 -- Good work !
Thank you so much! Subscribe for more content 😊
ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
I am getting above error while running spark or pyspark session.
I have ensured that winutils file is present in C:\hadoop\bin
Could you please let me know if your all the env variables are set properly?
I followed the steps & installed JDK 17, spark 3.5 and python 3.12. When I try to use the map function I get a Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. Please, someone help me
same problem 😢
This works as smooth as butter. Be patient that's it! Once set up done, no looking back.
Bro, which version of spark & winutils have you downloaded? I took 3.5.1 and hadoop-3.0.0/bin/winutils but it didn't work
@@SUDARSANCHAKRADHARAkula same for me!
Apache 2.7 option not available during spark download. Can we choose Apache Hadoop 3.3 and later ( scala2.13) as package type during download
sir, spark version is available with Hadoop 3.0 only. Spark-shell not recognized as internal or external command. Please do help.
Hi, I have installed Hadoop 3.3 (the lastest one) as 2.7 was not available. But while downloading winutils, we don't have for Hadoop 3.3 in repository. Where do i get it from?
Same here.Did u get it now?
@@sriram_L yes, you can directly get it from google by simply mentioning the Hadoop version for which you want winutils. I hope this helps.
@@sriram_L it still not working for me though
hadoop 2.7 tar file is not available in the link
100 % working solution
ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg
I'm getting "spark-shell is not recognised as an internal or external command, operable program or batch file"
Video is very helpful. Thanks for sharing
Thank you so much!
my Apache hadoop which i downloaded previously is version 3.3.4, even though i should choose pre-built for Apache Hadoop 2.7?
Same doubt bro.
Did u install now
Hello when I try to run the command spark_shell as a local user its not working (not recognized as an internal or external command) and it only works if I use it as an administratror. Can you please help me solve this? Thanks.
Sorry for late response. Could you please try once running the same command from the spark/bin directory and let me know. I guess there might be some issues with your environment vatiables🤔
@@ampcode followed each and every step of the video, still getting the "not recognised as an internal or external command" error
@@dishantgupta1489 open a fresh cmd prompt window and try after you save the environment variables
In Environment Variables, put the paths under the user variables for Admin, NOT in the System variables
thanks a lot pyspark is opening but when executing df.show() command on a dataframe i get below error
Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
is there any way to rectify it
Did you get the solution.. i am also facing the same issue.
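One common cause on Windows: Spark tries to launch an interpreter named python3, which the standard Windows Python installer does not create, so worker-side actions like df.show() fail. If that matches your setup, pointing PYSPARK_PYTHON at the real executable usually helps (a sketch; the full path is only an example):

    setx PYSPARK_PYTHON python
    REM or the full path, e.g. setx PYSPARK_PYTHON "C:\Python310\python.exe"
    REM open a new command prompt and retry df.show()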
I can't see Pre-Built for Apache Hadoop 2.7 on the spark website
same problem for me! I tried the "3.3 and later" version with the "winutils/hadoop-3.0.0/bin", but it didn't work
spark runs only on java 8 or 11; it doesn't work with the latest version, I've tried it
I have followed the same things you did while you explained; I observed and did the same, but both spark and pyspark are not working. Can you please help me resolve the issue: after opening cmd and typing spark-shell, it shows "spark-shell is not recognised as an internal or external command", and the same for spark too. Please help me overcome this 🙏🙏🙏🙏🙏🙏🙏
Do everything that he said but not in User Variables but in System variables. I was facing the same problem but then I did the same in system variables and my spark started running.
not getting the ui says: docker refused to connect
'spark' is not recognized as an internal or external command,
operable program or batch file. its not working for me i have follow all the steps but its still not working waiting for solution
I am getting errors continuously after doing the same procedure as well, please reply to me.
can anyone please help... for the last two days I have tried to install spark and set the correct variable path, but am still getting "system path not specified"
Sorry for late reply. Could you please check if your spark-shell is running properly from the bin folder. If yes I guess there are some issues with your env variables only. Please let me know.
I'm getting a "bad constant pool index" error. Please help
I followed it step by step, and when I run spark-shell at the command prompt I come across the message ('spark-shell' is not recognized as a built-in command or external, an operable program or a batch file). I installed windows on another HD and did everything right; there are more people with this problem, can you help us? I've been trying since January to use pyspark on windows
Need to edit the Path at the bottom ("add this to env var path") and add: C:\Spark\spark-3.3.1-bin-hadoop2\bin\
@@letsexplorewithzak3614 Thanks worked for me
Do everything that he said but not in User Variables but in System variables. I was facing the same problem but then I did the same in system variables and my spark started running.
Even I'm facing the same issue ,can you tell in more detail like what to do add in system variables??As we already added Java , Hadoop, Spark and Pyspark_Home in the user varaibles as said in the video.@@nayanagrawal9878
@@nayanagrawal9878 thank you!!! I did this and it solved my problem
Did not work for me. At last, when I typed pyspark in the command prompt, it did not work.
FileNotFoundError: [WinError 2] The system cannot find the file specified. Getting this error even though I have done all the required installation
Sorry for late reply. I hope your issue is resolved. If not we can have a connect and discuss further on it!
Hi, Thanks for the steps. I am unable to see Web UI after installing pyspark. It gives This URL can't be reached. Kindly help
winutil file is not downloading from that github link
Yes brother.Did u get it now from anywhere?
I'm still unable to get this to work. I've been trying to solve this problem for nearly 2 weeks
hi i installed but when I restarted my pc it is no longer running from cmd? what might be the issue?
not working for me; i set up everything, except the hadoop version that came was 3.0
I have an issue with pyspark: it's not working and it's related to a java class. I can't really understand what is wrong ???
Thank you! It is clear and much helpful!! from Ethiopia
hey, pyspark isn't working on my pc. I did everything as you asked. Can you help please
Sorry for late response. Could you please also set PYSPARK_HOME env variable to the python.exe path. I guess this’ll do the trick😅👍
and when downloading spark, a set of files came to download, not the tar file
If you tried all the steps mentioned above and it still does not work, try to add "C:\Windows\System32" to system variable "path". It fixed the error after 2 days of struggling
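Before adding it, you can check whether System32 is already on the PATH a fresh prompt sees (a quick sanity check only):

    echo %PATH% | findstr /i "System32"
    REM no output means System32 is missing from PATH for this shell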
I get a noModuleError saying pyspark does not contain numpy module. I followed all the steps. Can you please help??
Hello, Are you trying to use numpy in your code. If so, have you installed pandas package? Please let me know so we can solve this issue😃
@@ampcode how to install pandas in pyspark
@@nuzairmohamed5345 you can run command as below:
pip install pandas
Please let me know if any issues.
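Since the error above mentions numpy, it may be worth installing both packages; this is a guess from the error text, not something confirmed in the video:

    pip install numpy pandas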
'spark-shell' is not recognized as an internal or external command,
operable program or batch file.-- Getting this error
Do everything that he said but not in User Variables but in System variables. I was facing the same problem but then I did the same in system variables and my spark started running.
C:\Users\lavdeepk>spark-shell
'spark-shell' is not recognized as an internal or external command,
operable program or batch file.
Not working
which winutils file did you download? Is it the Hadoop 2.7 or a later version?
you haven't given a solution for that WARN ProcfsMetricsGetter exception; is there any solution for that?
Sorry for late response. This could happen in windows only and can be safely ignored. Could you please confirm if you’re able to kick off spark-shell and pyspark?
I have some issues in launching python & pyspark. I need some help. Can you pls help me?
same, did you fix it? it worked for scala for me but not spark
should java, python and spark be in the same directory?
i have followed all these steps and installed those 3 and created the paths too, but when i go to check in the command prompt... it's not working.. an error came... can anyone please help me correct this
Hi @AmpCode
Thanks for the great tutorial.
I followed each steps and spark is working fine.
But when I'm executing some of my pyspark script, I'm getting below Hadoop error:
ERROR SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
Can you please help me on this urgently..
I have set all paths as you showed in video but I'm not able to solve this error.
Please Help.
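A minimal sketch for that specific message, assuming winutils.exe was placed under C:\Hadoop\bin as in the video (adjust to your folder). Hadoop checks the hadoop.home.dir property first and then the HADOOP_HOME variable, so setting HADOOP_HOME is usually enough:

    setx HADOOP_HOME "C:\Hadoop"
    where /r C:\Hadoop winutils.exe
    REM open a new command prompt (or restart your IDE) so the variable is picked up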
Excellent tutorial! I followed along and nothing worked in the end :)
StackOverflow told me that "C:\Windows\System32" is also required in the PATH variable for spark to work. I added it and spark started working.
helped
@@Manojprapagar happy to hear it!
Thank you so much!
This was the game changer. Also check that the extracted spark folder isn't in a folder inside another folder (3 clicks to see the files). It should just be the spark folder you created, and inside it one folder with the extracted spark files (2 clicks to see the files).
Every now and then we receive alert from Oracle to upgrade JDK. Do we need to upgrade our JDK version? If we upgrade, will it impact running of spark.
on apache spark's installation page, under choose a package type, the 2.7 version seems to no longer be an option as of 04/28/2023. What to do?
I was able to get around this by manually copying the URL of the page you are taken to after selecting the 2.7 version from the dropdown. It seems they have archived it.
Sorry for late reply. I hope your issue is resolved. If not we can discuss further on it!
my spark shell command is working when opened from bin folder , but it's not working in normal cmd , please help
Sorry for late response. Then this might be the issues with your environment variables. Could you please verify if they are set correctly and let me know.
@@ampcode yup, i had to add 1 more environment variable, which i got to know from the other comments. Your video is great. Thank you so much.
@@chittardhar8861 Thank you so much😊
@@chittardhar8861 Hi, did you add it in the system variables or user variables ? (Speaking about C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin to the environment variable path right ?) So frustrating that it is not working @AmpCode
@@UManfromEarth i did it in the system variable path.
hello, which Hadoop Version should i install since the 2.7 is not available anymore ? thanks in advance
You can go ahead and install the latest one as well. no issues!
@@ampcode Will the winutils file still be the 2.7 version?
where is that git repository link? Its not there in the description box below
Extremely sorry for that. I have added it in the description as well as pasting it here.
GitHUB: github.com/steveloughran/winutils
Hope this is helpful! :)
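For anyone wiring this up from that repo, a rough sketch of the expected layout (the folder name is just an example; pick the winutils.exe matching your Hadoop line):

    mkdir C:\Hadoop\bin
    REM copy the downloaded winutils.exe to C:\Hadoop\bin\winutils.exe
    setx HADOOP_HOME "C:\Hadoop"
    REM then add %HADOOP_HOME%\bin to Path and open a new command prompt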
Man, i love you.
Thank you for this video!!!
Bhai, bro, Brother, Thank you so much for this video
Thank you so much!
Bro while executing spark-shell getting an error
ReplGlobal.abort: bad constant pool index: 0 at pos: 49180
[init] error:
bad constant pool index: 0 at pos: 49180
while compiling:
during phase: globalPhase=, enteringPhase=
library version: version 2.12.17
compiler version: version 2.12.17
reconstructed args: -classpath -Yrepl-class-based -Yrepl-outdir C:\Users\HP\AppData\Local\Temp\spark-f4a4c1ed-e79a-4179-9492-a41e66431c1b
epl-3fc51940-943d-416d-ab37-074575e4ad8d
last tree to typer: EmptyTree
tree position:
tree tpe:
symbol: null
call site: in
== Source file context for tree position ==
Exception in thread "main" scala.reflect.internal.FatalError:
bad constant pool index: 0 at pos: 49180
while compiling:
during phase: globalPhase=, enteringPhase=
library version: version 2.12.17
compiler version: version 2.12.17
reconstructed args: -classpath -Yrepl-class-based -Yrepl-outdir C:\Users\HP\AppData\Local\Temp\spark-f4a4c1ed-e79a-4179-9492-a41e66431c1b
epl-3fc51940-943d-416d-ab37-074575e4ad8d
Getting "spark-shell is not recognized as an internal or external command"
Add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin\ to the env var path
Sorry for late reply. I hope your issue is resolved. If not we can have a connect and discuss further on it!
how did you download apache spark as a zipped file? mine was downloaded as a tgz file
Sorry for late response. You’ll get both options on their official website. Could you please check if you are using the right link?
@@ampcode There is no way now to download the zip file, only tgz.
Spark-Shell is not running