Implementing PySpark Real-Time Application || End-to-End Project || Part-1
- Published 14 Jun 2023
- In this video we discuss implementing a PySpark application in PyCharm and reading files dynamically from their respective folders.
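The dynamic file discovery described above can be sketched with Python's standard `glob` module; the folder names and patterns here are illustrative assumptions, not the video's actual layout, and the discovered paths would then be handed to `spark.read`:

```python
import glob
import os

def list_files(folder: str, pattern: str = "*") -> list:
    """Return sorted full paths of files in `folder` matching `pattern`."""
    return sorted(glob.glob(os.path.join(folder, pattern)))

# Hypothetical usage with Spark (paths assumed, not from the video):
#   city_df = spark.read.csv(list_files("data/city", "*.csv"), header=True)
#   fact_df = spark.read.parquet(*list_files("data/fact", "*.parquet"))
```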
Prerequisites:
Spark and Hadoop installed, Python, PyCharm
Links to the datasets:
Download the City Dimension file at the link below:
prescpipeline1.blob.core.wind...
Download the Prescriber Fact file at the link below:
prescpipeline1.blob.core.wind...
#azuredatabricks
#dataengineering
#dataanalysis
#pyspark
#pythonprogramming
#python
#sql
Great explanation, very clear; this video was very helpful for me.
Glad to hear that!
Good explanation 😊, now I'm confident about how the folder structure in PySpark works.
Thanks
You're ahead of everyone in explanation.
This video helped me a lot. Hope we can expect more real-time scenarios like this.
good content
good explanation
👍👍
Hey, great explanation. Could you please reshare the CSV file that is used? I'm not able to extract the file mentioned in your description.
drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
@dataspark Could you please provide those data links again? Those links have expired.
Instead of get_env_variables.py, we could use a .env file, couldn't we?
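That would work — the python-dotenv package is the usual way to do it. For illustration only (this is not the video's code), a minimal hand-rolled equivalent that parses KEY=VALUE lines from a .env-style file:

```python
def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env-style file into a dict."""
    env = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env
```

With python-dotenv this collapses to `from dotenv import load_dotenv; load_dotenv()`, which pushes the values into `os.environ` instead of returning a dict.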
I think the code looks too verbose and needs some refactoring to simplify things. Overall, good content.
Sir, can we use Scala in the IntelliJ IDE for this project?
Yes, you can, brother.
I am not able to download the fact file; I am getting an error while extracting it.
drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
Can I use Databricks Community Edition?
Hi, you can use Databricks, but then you have to work with the dbutils.fs methods to get the file list / file paths, as we did in the get_env.py file. Thank you.
Hello. Does anyone know Hindi and can explain this project to me entirely in Hindi (not in much detail, just briefly) in 30 minutes or so? I'm a fresher and all this is going over my head, please help out 😢😢😢
I am getting an error with logging: Python\Python39\lib\configparser.py", line 1254, in __getitem__
raise KeyError(key)
KeyError: 'keys'
Can you share the code written in the video?
Sure, here is the link: drive.google.com/drive/folders/1QD8635pBSzDtxI-ykTx8yquop2i4Xghn?usp=sharing
Thanks@@DataSpark45
Did your problem get resolved?
How can I find this code? Is there a repo where you have uploaded it?
Sorry to say this, bro, but unfortunately we lost those files.
Sir, why have you not used Databricks for the transformations?
Hi, generally all application development is done in an IDE, and it's also easier to maintain a folder-based structure there. You can develop in Databricks, but it's mainly for the analysis part.
@@DataSpark45 But Databricks internally uses Spark, and it's used in DEV, QA, and PROD too? The current trend is also Databricks, right? Please correct me if my understanding is wrong!
@@nandesh783 Any answers?
Where is your parquet file located?
Hi, are you talking about the source parquet file? It's under the source folder.
I can't download the dataset 😭.
Take a look at this :
drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
Can you provide me the source data file?
Hi, I provided the link in the description, bro.
AuthenticationFailed
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:ea8e17b4-701e-004d-1db1-573f6a000000 Time:2024-02-04T21:31:20.0816196Z
Signature not valid in the specified time frame: Start [Tue, 22 Nov 2022 07:36:34 GMT] - Expiry [Wed, 22 Nov 2023 15:36:34 GMT] - Current [Sun, 04 Feb 2024 21:31:20 GMT]
Where did you get this error, bro?
@@DataSpark45 While downloading the data. But I got the data from Part 2.