Install Apache PySpark on Windows PC | Apache Spark Installation Guide

  • Published 27 Dec 2024

COMMENTS • 443

  • @nftmobilegameshindi8392
    @nftmobilegameshindi8392 9 months ago +10

    spark shell not working

  • @indianintrovert281
    @indianintrovert281 7 months ago +31

    For those facing errors like "'spark-shell' is not recognized as an internal or external command":
    in Command Prompt, run cd C:\Spark\spark-3.5.1-bin-hadoop3\bin (use your own Spark file path, and include bin),
    and then run spark-shell or pyspark. (It finally worked for me; hope it works for you too.)
    If it worked, like this so that more people benefit from it.
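
    For example, with the folder layout used in this thread (adjust the path to your own download):

      cd C:\Spark\spark-3.5.1-bin-hadoop3\bin
      spark-shell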

    • @SharinH
      @SharinH 7 months ago +1

      It worked .. Thank you

    • @jagjodhsingh2358
      @jagjodhsingh2358 7 months ago +1

      It worked, thanks :)

    • @Manishamkapse
      @Manishamkapse 7 months ago +1

      Thank you 😊 so much it worked

    • @vishaltanwar2238
      @vishaltanwar2238 7 months ago +1

      why did we get this error?

  • @joshizic6917
    @joshizic6917 1 year ago +10

    How is your spark-shell running from your user directory?
    It's not running for me.

    • @Sai_naga
      @Sai_naga 4 months ago +2

      Did it work for you now? Facing the same issue here.

  • @riptideking
    @riptideking 9 months ago +2

    'pyspark' is not recognized as an internal or external command,
    operable program or batch file.
    Getting this error; tried it for the whole day, same issue.

    • @srishtimadaan03
      @srishtimadaan03 7 months ago +1

      In Command Prompt, run cd C:\Spark\spark-3.5.1-bin-hadoop3\bin (use your own Spark file path, and include bin),
      and then run spark-shell or pyspark. (It finally worked for me; hope it works for you too.)

    • @Sai_naga
      @Sai_naga 4 months ago

      @@srishtimadaan03 Hello... but we added SPARK_HOME in the environment variables, so what is the point of running it from the exact location? The environment variables should let the system find the command.
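
      For anyone checking this: the variables only help if Path also contains %SPARK_HOME%\bin and the terminal was opened after saving them. A quick check, assuming the variables are already saved:

        echo %SPARK_HOME%
        where spark-shell

      If where cannot find spark-shell, the Path entry is missing or the terminal predates the change.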

  • @prateektripathi3834
    @prateektripathi3834 1 year ago +5

    Did everything as per the video, still getting this error on using spark-shell: "The system cannot find the path specified."

    • @srishtimadaan03
      @srishtimadaan03 7 months ago +1

      In Command Prompt, run cd C:\Spark\spark-3.5.1-bin-hadoop3\bin (use your own Spark file path, and include bin),
      and then run spark-shell or pyspark. (It finally worked for me; hope it works for you too.)

    • @jaymanhire
      @jaymanhire 1 month ago

      @@srishtimadaan03 Yes!

    • @youknowwhatlol6628
      @youknowwhatlol6628 1 month ago

      @@srishtimadaan03 it doesn't work for me... what do I doooo

  • @ismailcute1584
    @ismailcute1584 10 months ago +5

    Thank you so much for this video. Unfortunately, I couldn't complete this; I'm getting this error: C:\Users\Ismahil>spark-shell
    'cmd' is not recognized as an internal or external command,
    operable program or batch file. Please help.

    • @JesusSevillanoZamarreno-cu5hk
      @JesusSevillanoZamarreno-cu5hk 10 months ago +1

      execute as admin

    • @johnpaulmawa4808
      @johnpaulmawa4808 6 months ago +1

      @@JesusSevillanoZamarreno-cu5hk You are the bestest and sweetest in the world

    • @frankcastelo9987
      @frankcastelo9987 2 months ago +1

      I was having the same issue as you, and it turned out to work simply by doing what Jesus said (OMG!): "Run it as admin". Thanks everyone... Indeed, Jesus saves us!!

  • @sisterkeys
    @sisterkeys 1 year ago +3

    What took me 2 days, you narrowed down to 30 mins!! Thank you!!

    • @ampcode
      @ampcode  11 months ago

      Thank you so much! Subscribe for more content 😊

  • @donjuancapistrano2382
    @donjuancapistrano2382 1 month ago

    The best video on installing pyspark, even in 2024. Many thanks to the author!

    • @playtrip7528
      @playtrip7528 1 month ago +2

      Which Spark version did u download?

    • @donjuancapistrano2382
      @donjuancapistrano2382 1 month ago

      @playtrip7528 I downloaded 3.5.3, pre-built for Hadoop 3.3, with 3.0.0 winutils.

  • @rayudusunkavalli2318
    @rayudusunkavalli2318 10 months ago +5

    I did every step you said, but Spark is still not working.

  • @meditationmellowmelodies7901
    @meditationmellowmelodies7901 8 months ago +2

    I followed all the steps but am getting the error:
    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.

    • @Mralbersan
      @Mralbersan 7 months ago +1

      the same happens to me

    • @indianintrovert281
      @indianintrovert281 7 months ago +1

      Facing the same error. Did you find any solution for it?

  • @anandbagate2347
    @anandbagate2347 2 months ago +2

    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.

  • @ArtificialIntelligenceColombia
    @ArtificialIntelligenceColombia 4 months ago

    WHAT A PROCESS!! It worked for me just by running spark-shell in cmd as ADMIN. Thank you for the video!

  • @ramnisanthsimhadri3161
    @ramnisanthsimhadri3161 7 months ago +4

    I am not able to find the package type "Pre-built for Apache Hadoop 2.7" in the drop-down. FYI, the Spark release versions I can see are 3.4.3 and 3.5.1.

    • @mindcoder4823
      @mindcoder4823 2 months ago

      How did you solve this? I am running into the same issue

  • @ipheiman3658
    @ipheiman3658 1 year ago +3

    This worked so well for me :-) The pace is great and your explanations are clear. I am so glad i came across this, thanks a million! 😄 I have subscribed to your channel!!

  • @arnoldochris5082
    @arnoldochris5082 1 year ago +13

    Ok guys, this is how to do it, in case you are having problems👇
    1.) I used the latest version, 3.5.0 (pre-built for Apache Hadoop 3.3 and later), and downloaded it.
    2.) Extracted the zip file just as done; the first time it gave me a file, not a folder but a .rar file which WinRAR could not unzip, so I used 7-Zip and it finally extracted to a folder that had the bins and all the other files.
    3.) In the system variables he forgot to edit the Path variable and add %SPARK_HOME%\bin (see the sketch at the end of this comment).
    4.) Downloaded winutils.exe for Hadoop 3.0.0 from the link provided in the video.
    5.) Added it the same way, but under C:\Hadoop\bin\winutils.exe.
    6.) Then edit the user variables as done, then do the same for the Path: %HADOOP_HOME%\bin.
    Reply for any parts you might have failed to understand🙂
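
    A minimal sketch of steps 3 and 6 from a Command Prompt (the paths are examples; the Environment Variables dialog is safer, since setx can truncate a long Path):

      setx SPARK_HOME "C:\Spark\spark-3.5.0-bin-hadoop3"
      setx HADOOP_HOME "C:\Hadoop"
      rem Add %SPARK_HOME%\bin and %HADOOP_HOME%\bin to Path via the
      rem Environment Variables dialog, then open a new Command Prompt.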

    • @MANALROGUI
      @MANALROGUI 1 year ago

      What do you mean by the 3rd step?

    • @stay7485
      @stay7485 1 year ago

      Thanks

    • @ampcode
      @ampcode  11 months ago

      Thank you so much 😊

    • @sarahq6497
      @sarahq6497 7 months ago +3

      Hello, I had to use the latest version as well, but I'm not able to make it work, I followed the tutorial exactly :(

    • @Sai_naga
      @Sai_naga 4 months ago +1

      @@sarahq6497 Me too... when I run the spark-shell command from the exact Spark location after cd, it works... but when I run it right after opening cmd, it doesn't; it gives an error like "spark-shell is not found".

  • @prashanthnm3406
    @prashanthnm3406 6 months ago +1

    Thanks bro, fixed it after struggling for 2 days, 2 nights, 2 hours and 9 mins.

    • @nickcheruiyot9069
      @nickcheruiyot9069 6 months ago

      Hello, I have been trying to install it for some days too. When I run spark-shell I keep getting the "command is not recognized" error. Any suggestions?

  • @BOSS-AI-20
    @BOSS-AI-20 1 year ago +4

    In cmd the command spark-shell runs only under the C:\Spark\spark-3.5.0-bin-hadoop3\bin directory, not globally;
    same for pyspark.

    • @s_a_i5809
      @s_a_i5809 1 year ago +3

      Yeah man, same for me... did you find any fixes? If so, let me know :)

    • @BOSS-AI-20
      @BOSS-AI-20 1 year ago

      @@s_a_i5809 Add your environment variables under System variables, not User variables.

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100% working solution:
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg

    • @lucaswolff5504
      @lucaswolff5504 9 months ago

      I added C:\Program Files\spark\spark-3.5.1-bin-hadoop3\bin to the system variables and it worked

    • @BOSS-AI-20
      @BOSS-AI-20 9 months ago

      @@lucaswolff5504 yes

  • @laxman0457
    @laxman0457 1 year ago +3

    I have followed all your steps; still I'm facing an issue:
    'spark2-shell' is not recognized as an internal or external command

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables instead of User variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step spark + PySpark in pycharm solution video
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc

  • @harshithareddy5087
    @harshithareddy5087 11 months ago +3

    I don't have the option for Hadoop 2.7. What should I choose now???

    • @LLM_np
      @LLM_np 10 months ago

      did you get any solution?
      please let me know

    • @geetalimatta
      @geetalimatta 3 months ago

      @@LLM_np NO

  • @neeleshgaikwad6387
    @neeleshgaikwad6387 1 year ago +2

    Very helpful video. Just by following the steps you mentioned I could run the spark on my windows laptop. Thanks a lot for making this video!!

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!😊

    • @iniyaninba489
      @iniyaninba489 1 year ago

      @@ampcode bro I followed every step you said, but in CMD when I gave "spark-shell", it displayed " 'spark-shell' is not recognized as an internal or external command,
      operable program or batch file." Do you know how to solve this?

    • @sssssshreyas
      @sssssshreyas 7 months ago

      @@iniyaninba489 Add the same path to the User variables Path too, just like you added it to the System variables Path.

  • @rakesh.kandula
    @rakesh.kandula 1 year ago +3

    Hi, I followed the exact steps (installed Spark 3.2.4 as that is the only version available for Hadoop 2.7). The spark-shell command is working but pyspark is throwing errors.
    If anyone has a fix for this, please help me.
    Thanks

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step solution
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc

  • @amitkumarpatel7762
    @amitkumarpatel7762 9 months ago +5

    I have followed the whole instruction, but when I run it, spark-shell is not recognised.

    • @JustinLi-y6q
      @JustinLi-y6q 2 months ago +2

      same here

    • @mindcoder4823
      @mindcoder4823 2 months ago +1

      @@JustinLi-y6q Did you get it? I was having the same issue, but I downgraded my Java version to 17 and it is now working fine. Java 23 is not compatible with Spark 3.x, I think; it did not work for me.
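
      To check which Java the shell picks up (Spark 3.x is generally happy with Java 8, 11 or 17):

        java -version
        echo %JAVA_HOME%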

  • @sibrajbanerjee6297
    @sibrajbanerjee6297 6 months ago +1

    I am getting a message that 'spark-version' is not recognized as an internal or external command,
    operable program or batch file. This is after setting up the path in the environment variables for PYSPARK_HOME.

    • @Sai_naga
      @Sai_naga 4 months ago

      try running as administrator.

  • @saikrishnareddy3474
    @saikrishnareddy3474 1 year ago +3

    I'm a little confused about how to set up the PYTHONHOME environment variable.

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc

  • @alireza2295
    @alireza2295 3 months ago

    Great. I followed the instructions and successfully installed spark. Thank you!

  • @anthonyuwaifo8605
    @anthonyuwaifo8605 1 year ago +2

    I got the below error while running Spyder even though I have added the PYTHONPATH.
    File ~\anaconda\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)
    File c:\users\justa\.spyder-py3\temp.py:26
    df = spark.createDataFrame(data = data, schema = columns)
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:1276 in createDataFrame
    return self._create_dataframe(
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:1318 in _create_dataframe
    rdd, struct = self._createFromLocal(map(prepare, data), schema)
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:962 in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:834 in _inferSchemaFromList
    infer_array_from_first_element = self._jconf.legacyInferArrayTypeFromFirstElement()
    File ~\anaconda\lib\site-packages\py4j\java_gateway.py:1322 in __call__
    return_value = get_return_value(
    File ~\anaconda\lib\site-packages\pyspark\errors\exceptions\captured.py:169 in deco
    return f(*a, **kw)
    File ~\anaconda\lib\site-packages\py4j\protocol.py:330 in get_return_value
    raise Py4JError(
    Py4JError: An error occurred while calling o29.legacyInferArrayTypeFromFirstElement. Trace:
    py4j.Py4JException: Method legacyInferArrayTypeFromFirstElement([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1623)

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please check if you are able to run spark-submit using cmd?

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100% working solution:
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg

  • @satishboddula4942
    @satishboddula4942 1 year ago +3

    I have done exactly what you showed in the tutorial, but when I run the spark-shell command in cmd I get: "spark-shell
    The system cannot find the path specified."

    • @ganeshkalaivani6250
      @ganeshkalaivani6250 1 year ago +2

      Yes, same error... did you find out the solution?

    • @satishboddula4942
      @satishboddula4942 1 year ago +4

      @@ganeshkalaivani6250 Yes, Spark doesn't work with the latest Java and Python versions; try Java 1.8, Python 3.7 and a Spark build for Hadoop 2.7.

    • @ganeshkalaivani6250
      @ganeshkalaivani6250 1 year ago +1

      @@satishboddula4942 Can you please share the Java 1.8 download link? The JDK page shows only versions 18, 19 and 20.

    • @ganeshkalaivani6250
      @ganeshkalaivani6250 1 year ago +1

      @@satishboddula4942 Still getting the "system cannot find the path specified" error.

    • @shashankkkk
      @shashankkkk 1 year ago +3

      Add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin\ to the env var Path.

  • @rahmaesam2732
    @rahmaesam2732 1 month ago

    Hadoop is still not recognized; even with your installation it gives the warning message
    "unable to load native-hadoop library".

  • @nagarajgotur
    @nagarajgotur 1 year ago +2

    spark-shell is working for me, but pyspark is not working from the home directory; I get the error 'C:\Users\Sana>pyspark
    '#' is not recognized as an internal or external command,
    operable program or batch file.'
    But when I go to the Python path and run the cmd, pyspark works. I have set up the SPARK_HOME and PYSPARK_HOME environment variables. Could you please help me? Thanks.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please also set PYSPARK_HOME to your python.exe path? I hope this will solve the issue 😅👍
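
      For reference, the variable Spark's launch scripts themselves read is PYSPARK_PYTHON; a sketch, with an example path to replace with your own python.exe:

        setx PYSPARK_PYTHON "C:\Python310\python.exe"

      Open a new command prompt afterward so the value is picked up.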

    • @bintujose1981
      @bintujose1981 1 year ago +1

      @@ampcode nope. Same error

  • @priyankashekhawat6174
    @priyankashekhawat6174 2 months ago

    Very good and amazing content. You cannot find a better place than this video to set up pyspark (Y).

  • @sanchitabhattacharya353
    @sanchitabhattacharya353 10 months ago +1

    While launching spark-shell I'm getting the following error, any idea??
    WARN jline: Failed to load history
    java.nio.file.AccessDeniedException: C:\Users\sanch\.scala_history_jline3

  • @AkshayNagendra
    @AkshayNagendra 1 year ago +1

    I followed all the steps but I'm getting this error
    'spark-shell' is not recognized as an internal or external command, operable program or batch file

    • @Karansingh-xw2ss
      @Karansingh-xw2ss 1 year ago

      Yeah I'm also facing this same issue

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100% working solution:
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg

  • @abhinavtiwari6186
    @abhinavtiwari6186 1 year ago +2

    after 11:17 I am getting this error:
    'spark-shell' is not recognized as an internal or external command, operable program or batch file.
    I have checked the environment variables too.

    • @ampcode
      @ampcode  1 year ago +2

      Hello... sorry for the late response... could you please navigate to the Spark bin folder, open CMD there, and kick off the spark-shell command? If Spark works fine in the bin directory, then it is definitely an issue with the environment variables.
      Please let me know if there are any difficulties. :)

    • @abhinavtiwari6186
      @abhinavtiwari6186 1 year ago +3

      @@ampcode now this is the error I am getting after getting into the bin folder:
      C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin> spark-shell
      The system cannot find the path specified.

    • @abhinavtiwari6186
      @abhinavtiwari6186 1 year ago +5

      My problem finally got solved tonight... I needed to add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin to the environment variable Path.

    • @ampcode
      @ampcode  1 year ago +1

      I'm very glad you solved your problem. Cheers!

    • @aswinjoseph
      @aswinjoseph 1 year ago

      @@abhinavtiwari6186 I tried this too, but got the same issue: "The system cannot find the path specified".

  • @kchavan67
    @kchavan67 1 year ago +1

    Hi, following all the steps given in the video, I am still getting the error "cannot recognize spark-shell as internal or external command" @Ampcode

    • @psychoticgoldphish5797
      @psychoticgoldphish5797 1 year ago

      I was having this issue as well; when I added %SPARK_HOME%\bin, %HADOOP_HOME%\bin and %JAVA_HOME%\bin to the User variables (top box; in the video he shows doing System, the bottom box), it worked. Good luck.

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step spark + PySpark in pycharm solution video
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=aaITbbN7ggnczQTc

  • @Ohisthisyou
    @Ohisthisyou 9 days ago

    Can someone help? I have downloaded Hadoop 3.3, which is the newest version, but it is not showing in the GitHub repo. What to do?

  • @YohanTharakan
    @YohanTharakan 1 year ago +7

    Hi, I completed the process step by step and everything else is working, but when I run spark-shell it shows: 'spark-shell' is not recognized as an internal or external command,
    operable program or batch file. Do you know what went wrong?

    • @viniciusfigueiredo6740
      @viniciusfigueiredo6740 1 year ago +1

      I'm having this same problem, the command only works if I run CMD as an administrator. Did you manage to solve it?

    • @hulkbaiyo8512
      @hulkbaiyo8512 1 year ago

      @@viniciusfigueiredo6740 same as you, run as administrator works

    • @shivamsrivastava4337
      @shivamsrivastava4337 1 year ago +1

      @@viniciusfigueiredo6740 same issue is happening with me

    • @RohitRajKodimala
      @RohitRajKodimala 1 year ago

      @@viniciusfigueiredo6740 same issue for me, did u fix it?

    • @santaw
      @santaw 1 year ago +2

      Anyone solved this?

  • @AbhiShek-m6s
    @AbhiShek-m6s 29 days ago

    I did everything up to the environment variables setup; still, when using spark-shell in cmd, it gives me "'spark-shell' is not recognized as an internal or external command,
    operable program or batch file."
    Versions I used:
    For Java:
    java version "11.0.24" 2024-07-16 LTS
    Java(TM) SE Runtime Environment 18.9 (build 11.0.24+7-LTS-271)
    Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.24+7-LTS-271, mixed mode)
    For Python:
    Python 3.11.0rc2
    For Spark:
    spark-3.5.3-bin-hadoop3
    For Hadoop: (file from the below location)
    winutils/hadoop-3.3.6/bin/winutils.exe

  • @yashusachdeva
    @yashusachdeva 10 months ago

    It worked, my friend. The instructions were concise and straightforward.

  • @Jerriehomie
    @Jerriehomie 1 year ago +2

    Getting this error: "WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped." People have mentioned using the Python folder path, which I have set as you mentioned, but still no luck.

    • @bukunmiadebanjo9684
      @bukunmiadebanjo9684 1 year ago +1

      I found a fix for this. Change your Python path to that of Anaconda (within the environment variable section of this video) and use your Anaconda command prompt instead. No errors will pop up again.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please let me know if you are still facing this issue, and also confirm whether you're able to open spark-shell?

    • @shivalipurwar7205
      @shivalipurwar7205 1 year ago +1

      @@bukunmiadebanjo9684 Hi Adebanjo, my error got resolved with your solution. Thanks for your help!

  • @geetakavalad8983
    @geetakavalad8983 1 month ago

    I have followed all the steps and added all the system variables, but at that time the winutils file was not present on my system.

    • @geetakavalad8983
      @geetakavalad8983 1 month ago

      Now I have that file; how do I make the changes? Plz let me know.

  • @gbs7212
    @gbs7212 1 month ago

    Thank you so much, very helpful! The only error I got was running spark-shell, but from other comments I figured out that you can either run the command prompt as admin or cd into the Spark folder and then call it.

  • @nihalisahu3857
    @nihalisahu3857 3 months ago

    In CMD, while running spark-shell, I get an error like "ERROR SparkContext: Error initializing SparkContext."

  • @susmayonzon9198
    @susmayonzon9198 1 year ago +2

    Excellent! Thank you for making this helpful lecture! You relieved my headache, and I did not give up.

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

    • @moathmtour1798
      @moathmtour1798 1 year ago +1

      Hey, which version of Hadoop did you install? The 2.7 wasn't available.

  • @coclegend715
    @coclegend715 1 year ago +1

    Everything works fine until I run pyspark in my command prompt, which shows an error: "ERROR: The process with PID 38016 (child process of PID 30404) could not be terminated.
    Reason: There is no running instance of the task.
    ERROR: The process with PID 30404 (child process of PID 7412) could not be terminated.
    Reason: There is no running instance of the task."

  • @anastariq1310
    @anastariq1310 1 year ago +1

    After entering pyspark in cmd it shows "The system cannot find the path specified. Files\Python310\python.exe was unexpected at this time". Please help me resolve it.

    • @mahamudullah_yt
      @mahamudullah_yt 1 year ago

      I face the same problem. Is there any solution?

  • @Manoj-ed3lj
    @Manoj-ed3lj 6 months ago

    Installed successfully, but when I check the Hadoop version I get an error like "hadoop is not recognized as an internal or external command".

  • @cloudandsqlwithpython
    @cloudandsqlwithpython 1 year ago +1

    Great ! got SPARK working on Windows 10 -- Good work !

    • @ampcode
      @ampcode  11 months ago

      Thank you so much! Subscribe for more content 😊

  • @AnuragPatel-y9j
    @AnuragPatel-y9j 1 year ago +1

    ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
    I am getting the above error while running a spark or pyspark session.
    I have ensured that the winutils file is present in C:\hadoop\bin.

    • @ampcode
      @ampcode  1 year ago

      Could you please let me know if all your env variables are set properly?

  • @badnaambalak364
    @badnaambalak364 11 months ago +1

    I followed the steps and installed JDK 17, Spark 3.5 and Python 3.12. When I try to use the map function I get a Py4JJavaError: "An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe". Please, someone help me.

  • @somanathking4694
    @somanathking4694 8 months ago

    This works as smooth as butter. Be patient, that's it! Once the setup is done, there's no looking back.

    • @SUDARSANCHAKRADHARAkula
      @SUDARSANCHAKRADHARAkula 8 months ago

      Bro, which versions of Spark & winutils did you download? I took 3.5.1 and hadoop-3.0.0/bin/winutils but it didn't work.

    • @meriemmouzai2147
      @meriemmouzai2147 7 months ago

      @@SUDARSANCHAKRADHARAkula same for me!

  • @AmreenKhan-dd3lf
    @AmreenKhan-dd3lf 5 months ago

    The Apache Hadoop 2.7 option is not available during the Spark download. Can we choose "Apache Hadoop 3.3 and later (Scala 2.13)" as the package type?

  • @shankarikarunamoorthy4391
    @shankarikarunamoorthy4391 7 months ago

    Sir, the Spark version is available with Hadoop 3.0 only. spark-shell is not recognized as an internal or external command. Please do help.

  • @khushboojain3883
    @khushboojain3883 1 year ago +1

    Hi, I have installed Hadoop 3.3 (the latest one) as 2.7 was not available. But for winutils, there is none for Hadoop 3.3 in the repository. Where do I get it from?

    • @sriram_L
      @sriram_L 1 year ago

      Same here. Did u get it now?

    • @khushboojain3883
      @khushboojain3883 1 year ago

      @@sriram_L Yes, u can get it directly from Google by simply mentioning the Hadoop version for which u want winutils. I hope this helps.

    • @hritwikbhaumik5622
      @hritwikbhaumik5622 1 year ago

      @@sriram_L it's still not working for me though

  • @karthikeyinikarthikeyini380
    @karthikeyinikarthikeyini380 1 year ago +1

    The Hadoop 2.7 tar file is not available at the link.

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100% working solution:
      ua-cam.com/video/jO9wZGEsPRo/v-deo.htmlsi=lzXq4Ts7ywqG-vZg

  • @nagalakshmip8725
    @nagalakshmip8725 8 months ago

    I'm getting "spark-shell is not recognised as an internal or external command, operable program or batch file".

  • @Kartik-vy1rh
    @Kartik-vy1rh 1 year ago +1

    Video is very helpful. Thanks for sharing

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

  • @edu_tech7594
    @edu_tech7594 1 year ago +1

    The Apache Hadoop I downloaded previously is version 3.3.4, even though I should choose pre-built for Apache Hadoop 2.7?

    • @sriram_L
      @sriram_L 1 year ago

      Same doubt bro.
      Did u install it now?

  • @sriramsivaraman4100
    @sriramsivaraman4100 1 year ago +2

    Hello, when I try to run the spark-shell command as a local user it's not working (not recognized as an internal or external command), and it only works if I run it as an administrator. Can you please help me solve this? Thanks.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please try running the same command from the spark/bin directory and let me know? I guess there might be some issues with your environment variables 🤔

    • @dishantgupta1489
      @dishantgupta1489 1 year ago

      @@ampcode Followed each and every step of the video, still getting the "not recognised as an internal or external command" error.

    • @ayonbanerjee1969
      @ayonbanerjee1969 1 year ago

      @@dishantgupta1489 Open a fresh cmd prompt window and try after you save the environment variables.

    • @obulureddy7519
      @obulureddy7519 1 year ago

      In Environment Variables, put the paths in the User variables for Admin, NOT in the System variables.

  • @ashwinnair2325
    @ashwinnair2325 6 months ago

    Thanks a lot, pyspark is opening, but when executing the df.show() command on a dataframe I get the below error:
    Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    Is there any way to rectify it?
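
    One likely fix, as a sketch: on Windows the interpreter command is python rather than python3, and Spark's workers honour PYSPARK_PYTHON, so pointing Spark at it in the same session may help:

      set PYSPARK_PYTHON=python
      pyspark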

  • @Mralbersan
    @Mralbersan 7 months ago

    I can't see "Pre-built for Apache Hadoop 2.7" on the Spark website.

    • @meriemmouzai2147
      @meriemmouzai2147 7 months ago

      same problem for me! I tried the "3.3 and later" version with the "winutils/hadoop-3.0.0/bin", but it didn't work

  • @prasadbarla7215
    @prasadbarla7215 22 days ago

    Spark runs only on Java 8 or 11; it doesn't work with the latest version, I've tried it.

  • @chinmayapallai8452
    @chinmayapallai8452 1 year ago +1

    I have followed the same things you did while you explained; I observed and did the same, but both spark-shell and pyspark are not working. Can you please help me resolve the issue? After opening cmd and typing spark-shell, it shows "spark-shell is not recognised as an internal or external command"; the same happens for pyspark. Please help me overcome this 🙏🙏🙏🙏🙏🙏🙏

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables instead of User variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

  • @alpha_ak-p3h
    @alpha_ak-p3h 3 months ago

    Not getting the UI; it says "docker refused to connect".

  • @prajakta-dh7fc
    @prajakta-dh7fc 7 months ago

    'spark' is not recognized as an internal or external command,
    operable program or batch file. It's not working for me; I have followed all the steps but it's still not working. Waiting for a solution.

  • @SupravaMishra-e4d
    @SupravaMishra-e4d 1 month ago

    I am getting errors continuously after doing the same procedure as well, please reply to me.

  • @ganeshkalaivani6250
    @ganeshkalaivani6250 1 year ago +1

    Can anyone please help... for the last two days I've tried to install Spark and set the correct variable path, but I'm still getting "system path not specified".

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. Could you please check whether spark-shell runs properly from the bin folder? If yes, I guess there are some issues with your env variables only. Please let me know.

  • @touhidalam4825
    @touhidalam4825 3 months ago

    I'm getting a "bad constant pool index" error. Please help.

  • @viniciusfigueiredo6740
    @viniciusfigueiredo6740 1 year ago +1

    I followed it step by step, and when I run spark-shell at the command prompt I come across the message ('spark-shell' is not recognized as an internal or external command, operable program or batch file). I installed Windows on another HD and did everything right; there are more people with this problem, can you help us? I've been trying to use pyspark on Windows since January.

    • @letsexplorewithzak3614
      @letsexplorewithzak3614 1 year ago +1

      Need to add this to the env var Path:
      C:\Spark\spark-3.3.1-bin-hadoop2\bin\

    • @kiranmore29
      @kiranmore29 1 year ago

      @@letsexplorewithzak3614 Thanks worked for me

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables instead of User variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

    • @jayakrishnayashwanth7358
      @jayakrishnayashwanth7358 1 year ago

      @@nayanagrawal9878 Even I'm facing the same issue. Can you tell in more detail what to add in the System variables? We already added Java, Hadoop, Spark and PYSPARK_HOME in the User variables as said in the video.

    • @penninahgathu7956
      @penninahgathu7956 10 months ago

      @@nayanagrawal9878 thank you!!! I did this and it solved my problem

  • @ankushv2642
    @ankushv2642 1 year ago

    Did not work for me. At the end, when I typed pyspark in the command prompt, it did not work.

  • @ganeshkalaivani6250
    @ganeshkalaivani6250 1 year ago +1

    Getting "FileNotFoundError: [WinError 2] The system cannot find the file specified" even though I have installed everything required.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. I hope your issue is resolved. If not, we can connect and discuss it further!

  • @nikhilupmanyu8804
    @nikhilupmanyu8804 10 months ago

    Hi, thanks for the steps. I am unable to see the Web UI after installing pyspark. It gives "This URL can't be reached". Kindly help.

  • @chinmaymishra6381
    @chinmaymishra6381 1 year ago +1

    The winutils file is not downloading from that GitHub link.

    • @sriram_L
      @sriram_L 1 year ago

      Yes brother. Did u get it now from anywhere?

  • @infamousprince88
    @infamousprince88 5 months ago

    I'm still unable to get this to work. I've been trying to solve this problem for nearly 2 weeks

  • @pulkitdikshit9474
    @pulkitdikshit9474 8 months ago

    Hi, I installed it, but when I restarted my PC it is no longer running from cmd. What might be the issue?

  • @Bujdil-y8z
    @Bujdil-y8z 1 year ago

    Not working for me; I set up everything, except the Hadoop version came as 3.0.

  • @basanthaider3238
    @basanthaider3238 1 year ago

    I have an issue with pyspark; it's not working and it's related to a Java class. I can't really understand what is wrong???

  • @alulatafere6008
    @alulatafere6008 6 months ago

    Thank you! It is clear and very helpful!! From Ethiopia.

  • @itsshehri
    @itsshehri 1 year ago +1

    Hey, pyspark isn't working on my PC. I did everything the way you showed. Can you help please?

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please also set the PYSPARK_HOME env variable to the python.exe path? I guess this'll do the trick 😅👍

  • @theefullstackdev
    @theefullstackdev 1 year ago

    And when downloading Spark, a set of files came to download, not the tar file.

  • @user-zk4hm2cy8l
    @user-zk4hm2cy8l 3 months ago

    If you tried all the steps mentioned above and it still does not work, try adding "C:\Windows\System32" to the system variable Path. It fixed the error after 2 days of struggling.
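
    That fits: spark-shell and pyspark on Windows are .cmd scripts that spawn cmd.exe from System32, so a quick check that the entry took effect is:

      where cmd

    It should print C:\Windows\System32\cmd.exe.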

  • @nuzairmohamed5345
    @nuzairmohamed5345 1 year ago +1

    I get a ModuleNotFoundError saying pyspark cannot find the numpy module. I followed all the steps. Can you please help??

    • @ampcode
      @ampcode  1 year ago

      Hello, are you trying to use numpy in your code? If so, have you installed the pandas package? Please let me know so we can solve this issue 😃

    • @nuzairmohamed5345
      @nuzairmohamed5345 1 year ago +1

      ​@@ampcode how to install pandas in pyspark

    • @ampcode
      @ampcode  1 year ago

      @@nuzairmohamed5345 you can run the command below:
      pip install pandas
      Please let me know if there are any issues.
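
      Since the original error was about numpy, that package may need installing too (a guess based on the error text):

        pip install numpy pandas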

  • @nikhilchavan7741
    @nikhilchavan7741 1 year ago

    Getting this error:
    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables instead of User variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

  • @Analystmate
    @Analystmate 1 year ago

    C:\Users\lavdeepk>spark-shell
    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.
    Not working

    • @syamprasad8295
      @syamprasad8295 1 year ago

      Which winutils file did u download? Is it for Hadoop 2.7 or a later version?

  • @bramhanaskari3152
    @bramhanaskari3152 1 year ago +1

    You haven't given a solution for that WARN ProcfsMetricsGetter exception; is there any solution for it?

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. This happens on Windows only and can be safely ignored. Could you please confirm whether you're able to kick off spark-shell and pyspark?

  • @Nathisri
    @Nathisri 1 year ago +1

    I have some issues launching Python & pyspark. I need some help. Can you pls help me?

  • @manikantaperumalla2197
    @manikantaperumalla2197 6 months ago

    Should Java, Python and Spark be in the same directory?

  • @theefullstackdev
    @theefullstackdev 1 year ago

    I have followed all these steps, installed those 3, and created the paths too, but when I go to check in the command prompt... it's not working... an error came... can anyone please help me correct this?

  • @anuraggupta5665
    @anuraggupta5665 1 month ago

    Hi @AmpCode,
    thanks for the great tutorial.
    I followed each step and Spark is working fine.
    But when I execute some of my pyspark scripts, I get the below Hadoop error:
    ERROR SparkContext: Error initializing SparkContext.
    java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
    Can you please help me with this urgently?
    I have set all the paths as you showed in the video, but I'm not able to solve this error.
    Please help.
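
    One workaround to try, as a sketch (it assumes winutils.exe sits in C:\Hadoop\bin; my_script.py is a placeholder for your script): set the variable in the same session before launching, since saved variables only reach newly opened terminals:

      set HADOOP_HOME=C:\Hadoop
      spark-submit my_script.py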

  • @antonstsezhkin6578
    @antonstsezhkin6578 1 year ago +8

    Excellent tutorial! I followed along and nothing worked in the end :)
    StackOverflow told me that "C:\Windows\System32" is also required in the PATH variable for Spark to work. I added it and Spark started working.

    • @Manojprapagar
      @Manojprapagar 1 year ago +1

      helped

    • @antonstsezhkin6578
      @antonstsezhkin6578 1 year ago +2

      @@Manojprapagar happy to hear it!

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

    • @conroybless
      @conroybless 3 months ago

      This was the game changer. Also check that the extracted Spark folder isn't in a folder of another folder (3 clicks to see the files). It should just be the Spark folder you created and, inside that folder, another folder with the extracted Spark files (2 clicks to see the files).

  • @gosmart_always
    @gosmart_always 1 year ago

    Every now and then we receive an alert from Oracle to upgrade the JDK. Do we need to upgrade our JDK version? If we upgrade, will it impact the running of Spark?

  • @shahrahul5872
    @shahrahul5872 1 year ago +1

    On Apache Spark's installation page, under "Choose a package type", the 2.7 version seems to no longer be an option as of 04/28/2023. What to do?

    • @shahrahul5872
      @shahrahul5872 1 year ago +2

      I was able to get around this by manually copying the URL of the page that opened after selecting the 2.7 version from the dropdown. Seems like they have archived it.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. I hope your issue is resolved. If not, we can discuss it further!

  • @chittardhar8861
    @chittardhar8861 1 year ago +1

    My spark-shell command works when opened from the bin folder, but it's not working in a normal cmd. Please help.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Then this might be an issue with your environment variables. Could you please verify that they are set correctly and let me know?

    • @chittardhar8861
      @chittardhar8861 1 year ago +1

      @@ampcode Yup, I had to add 1 more environment variable, which I got to know about from other comments. Your video is great. Thank you so much.

    • @ampcode
      @ampcode  1 year ago

      @@chittardhar8861 Thank you so much😊

    • @UManfromEarth
      @UManfromEarth 1 year ago

      @@chittardhar8861 Hi, did you add it in the System variables or the User variables? (Speaking about adding C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin to the environment variable Path, right?) So frustrating that it is not working @AmpCode

    • @chittardhar8861
      @chittardhar8861 1 year ago

      @@UManfromEarth I did it in the System variable Path.

  • @moathmtour1798
    @moathmtour1798 1 year ago +1

    Hello, which Hadoop version should I install since 2.7 is not available anymore? Thanks in advance.

    • @ampcode
      @ampcode  1 year ago

      You can go ahead and install the latest one as well, no issues!

    • @venkatramnagarajan2302
      @venkatramnagarajan2302 1 year ago

      @@ampcode Will the winutils file still be the 2.7 version?

  • @abhinavtiwari6186
    @abhinavtiwari6186 1 year ago +1

    Where is that Git repository link? It's not there in the description box below.

    • @ampcode
      @ampcode  1 year ago +1

      Extremely sorry for that. I have added it to the description and am pasting it here as well.
      GitHub: github.com/steveloughran/winutils
      Hope this is helpful! :)

  • @ДаниилСидоров-ж3и
    @ДаниилСидоров-ж3и 2 months ago

    Man, I love you.
    Thank you for this video!!!

  • @Adhikash015
    @Adhikash015 1 year ago +1

    Bhai, bro, Brother, Thank you so much for this video

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

  • @richardalphonse2680
    @richardalphonse2680 9 months ago

    Bro, while executing spark-shell I'm getting an error:
    ReplGlobal.abort: bad constant pool index: 0 at pos: 49180
    [init] error:
    bad constant pool index: 0 at pos: 49180
    while compiling:
    during phase: globalPhase=, enteringPhase=
    library version: version 2.12.17
    compiler version: version 2.12.17
    reconstructed args: -classpath -Yrepl-class-based -Yrepl-outdir C:\Users\HP\AppData\Local\Temp\spark-f4a4c1ed-e79a-4179-9492-a41e66431c1b
    epl-3fc51940-943d-416d-ab37-074575e4ad8d
    last tree to typer: EmptyTree
    tree position:
    tree tpe:
    symbol: null
    call site: in
    == Source file context for tree position ==
    Exception in thread "main" scala.reflect.internal.FatalError:
    bad constant pool index: 0 at pos: 49180
    while compiling:
    during phase: globalPhase=, enteringPhase=
    library version: version 2.12.17
    compiler version: version 2.12.17
    reconstructed args: -classpath -Yrepl-class-based -Yrepl-outdir C:\Users\HP\AppData\Local\Temp\spark-f4a4c1ed-e79a-4179-9492-a41e66431c1b
    epl-3fc51940-943d-416d-ab37-074575e4ad8d

  • @ashwinkumar5223
    @ashwinkumar5223 1 year ago +1

    Getting "spark-shell is not recognized as an internal or external command".

    • @shashankkkk
      @shashankkkk 1 year ago +1

      Add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin\ to the env var Path.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. I hope your issue is resolved. If not, we can connect and discuss it further!

  • @syafiq3420
    @syafiq3420 1 year ago +1

    How did you download Apache Spark as a zipped file? Mine was downloaded as a tgz file.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. You'll get both options on the official website. Could you please check that you are using the right link?

    • @georgematies2521
      @georgematies2521 1 year ago

      @@ampcode There is no way now to download the zip file, only tgz.

  • @SupravaMishra-e4d
    @SupravaMishra-e4d 1 month ago

    spark-shell is not running.