- 533
- 481 031
learndataa
India
Joined 31 Mar 2020
Hello and Welcome,
I love to work with data. On this channel I upload coding tutorials in data science.
Please like, share and subscribe.
Thank you for your support.
Math | Ridge Regression | Gradient Descent
The video discusses the concept and math of ridge regression with gradient descent. (A short Python sketch of the worked iterations follows the outline below.)
00:00 - Ridge regression vs. linear regression
01:15 - Concept: Overfit (high variance; too flexible line), underfit (high bias; too rigid line)
04:53 - Equation: predict: ŷ_i=β0+β1*x_i
05:50 - Equation: error: (y_i - ŷ_i)
06:10 - Equation: sum of squares: (y_i - ŷ_i)^2
07:42 - Equation: mean squared error (MSE): (1/n)*∑(y_i - ŷ_i)^2
08:28 - Objective function: Linear regression: min_w ||Xw - y||₂² (L2 norm)
09:20 - Objective function: Ridge regression: min_w ||Xw - y||₂² + α||w||₂²
09:57 - Concept: Regularization term: α||w||₂²
13:39 - Equation: cost function or objective function
15:45 - Calculate gradient w.r.t. β0
21:55 - Calculate gradient w.r.t. β1
* 1st iteration ---
26:45 - Calculate: X=[1,2,3], y=[2.0,2.5,3.5],β0=0.5,β1=0.5,α=0.1,η=0.1
28:11 - Calculate: predicted value: ŷ=X.θ
30:51 - Calculate: error=y-ŷ
31:56 - Calculate: gradient=-1/n*X.T.(error)+[0, α*β1]
38:14 - Update parameters: θ=θ-η.gradient
* 2nd iteration ---
40:53 - Calculate: new predicted value
43:38 - Calculate: error
*
46:00 - Calculate: MSE for 1st and 2nd iteration
50:30 - Ending notes
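For reference, below is a minimal NumPy sketch (my own summary, not code shown in the video) of the two iterations worked above, using the same values X=[1,2,3], y=[2.0,2.5,3.5], β0=β1=0.5, α=0.1, η=0.1 and the gradient convention from 31:56, where the ridge penalty applies only to β1.

import numpy as np

# Data and starting values from the video
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # first column of ones for the intercept beta0
y = np.array([2.0, 2.5, 3.5])
theta = np.array([0.5, 0.5])                          # [beta0, beta1]
alpha, eta, n = 0.1, 0.1, len(y)

for i in range(2):                                    # 1st and 2nd iteration, as in the video
    y_hat = X @ theta                                 # predicted values: y_hat = X . theta
    error = y - y_hat                                 # error = y - y_hat
    grad = -(1.0 / n) * (X.T @ error) + np.array([0.0, alpha * theta[1]])  # gradient with ridge term on beta1
    theta = theta - eta * grad                        # update: theta = theta - eta * gradient
    print(f"iteration {i + 1}: theta = {theta}, MSE = {np.mean(error ** 2):.4f}")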
Views: 106
Videos
Math | Linear Regression | Gradient Descent
75 views · 2 months ago
The video discusses the concept and math of gradient descent in linear regression. 00:00 - Overview 00:50 - Concept: A line through a cloud of data 02:13 - Concept: Fitted straight line: A model 02:47 - Concept: Errors vs. Best fitted line 04:19 - Concept: Equation of a straight line: y=m*x+b 05:10 - When we say!: Fit a line or train a model 06:14 - Concept: Final model 06:35 - What's the big deal...
Linear Regression: Implementations in Python, TensorFlow, Kotlin, Keras and scikit-learn
126 views · 3 months ago
The video shows linear regression implementation using gradient descent in 5 different ways: Python, TensorFlow, Kotlin, Keras and scikit-learn. The code is available on Github at link below: github.com/learndataa/shared - Linear_Regression Keras.ipynb - Linear_Regression sklearn.ipynb - Linear_Regression TensorFlow.ipynb - linear_regression.kt - linear_regression.py Hope you enjoyed the video....
128: sampled softmax loss | TensorFlow | Tutorial
76 views · 4 months ago
The video discusses in TensorFlow: tf.nn.sampled_softmax_loss() 00:00 - Start 00:30 - tf.nn.sampled_softmax_loss() 00:55 - Ending notes # # TensorFlow Guide # sampled_softmax_loss: www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss
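If a quick reference helps, here is a small illustrative sketch (all shapes and values are made up, not from the video) of calling tf.nn.sampled_softmax_loss during training:

import tensorflow as tf

num_classes, dim, batch = 1000, 64, 8                                        # illustrative sizes

weights = tf.Variable(tf.random.normal([num_classes, dim]))                  # class embedding matrix
biases = tf.Variable(tf.zeros([num_classes]))                                # class biases
inputs = tf.random.normal([batch, dim])                                      # hidden activations for a batch
labels = tf.random.uniform([batch, 1], maxval=num_classes, dtype=tf.int64)   # true class ids

# Approximates the full softmax loss by scoring only a few sampled negative classes
loss = tf.nn.sampled_softmax_loss(
    weights=weights, biases=biases,
    labels=labels, inputs=inputs,
    num_sampled=20, num_classes=num_classes)
print(loss.shape)                                                             # one loss value per example: (8,)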
127: log poisson loss | TensorFlow | Tutorial
34 views · 4 months ago
The video discusses in TensorFlow: tf.nn.log_poisson_loss() 00:00 - Start 00:35 - tf.nn.log_poisson_loss() 01:04 - Ending notes # # TensorFlow Guide # log_poisson_loss: www.tensorflow.org/api_docs/python/tf/nn/log_poisson_loss
126: nce loss | TensorFlow | Tutorial
26 views · 4 months ago
The video discusses in TensorFlow: tf.nn.nce_loss() 00:00 - Start 01:40 - tf.nn.nce_loss() 02:45 - Ending notes # # TensorFlow Guide # nce_loss: www.tensorflow.org/api_docs/python/tf/nn/nce_loss
125: L2 loss | TensorFlow | Tutorial
42 views · 5 months ago
The video discusses in TensorFlow: tf.nn.l2_loss() 00:00 - Start 00:34 - tf.nn.l2_loss 02:36 - Ending notes # # TensorFlow Guide # l2_loss: www.tensorflow.org/api_docs/python/tf/nn/l2_loss
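As a quick illustration (values are made up, not from the video), tf.nn.l2_loss(t) computes sum(t**2) / 2 without taking a square root:

import tensorflow as tf

t = tf.constant([1.0, 2.0, 3.0])
print(tf.nn.l2_loss(t).numpy())              # (1 + 4 + 9) / 2 = 7.0
print((tf.reduce_sum(t ** 2) / 2).numpy())   # same value, computed manually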
124: ctc unique labels | TensorFlow | Tutorial
56 views · 7 months ago
The video discusses in TensorFlow: tf.nn.ctc_unique_labels() 00:00 - Start 00:38 - tf.nn.ctc_unique_labels() 00:55 - Ending notes # # TensorFlow Guide # ctc_unique_labels: www.tensorflow.org/api_docs/python/tf/nn/ctc_unique_labels
123: ctc loss | TensorFlow | Tutorial
169 views · 7 months ago
The video discusses in TensorFlow: tf.nn.ctc_loss() 00:00 - Start 00:46 - tf.nn.ctc_loss() 01:05 - Ending notes # # TensorFlow Guide # ctc_loss: www.tensorflow.org/api_docs/python/tf/nn/ctc_loss
122: ctc greedy decoder | TensorFlow | Tutorial
60 views · 7 months ago
The video discusses in TensorFlow: tf.nn.ctc_greedy_decoder() 00:00 - Start 00:55 - tf.nn.ctc_greedy_decoder() 01:24 - Ending notes # # TensorFlow Guide # ctc_greedy_decoder: www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder
121: ctc beam search decoder | TensorFlow | Tutorial
118 views · 7 months ago
The video discusses in TensorFlow: tf.nn.ctc_beam_search_decoder() 00:00 - Start 00:48 - Input: logits tensor 01:17 - Sequence length 01:27 - tf.nn.ctc_beam_search_decoder() 02:11 - Ending notes # # TensorFlow Guide # ctc_beam_search_decoder: www.tensorflow.org/api_docs/python/tf/nn/ctc_beam_search_decoder
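For orientation, a small illustrative sketch (random logits, made-up shapes, not from the video) of decoding with tf.nn.ctc_beam_search_decoder:

import tensorflow as tf

max_time, batch, num_classes = 5, 1, 4                  # illustrative; the last class index is the blank
logits = tf.random.normal([max_time, batch, num_classes])
seq_len = tf.constant([max_time], dtype=tf.int32)       # valid length of each sequence in the batch

decoded, log_probs = tf.nn.ctc_beam_search_decoder(
    logits, seq_len, beam_width=10, top_paths=1)
print(tf.sparse.to_dense(decoded[0]).numpy())           # best decoded label sequence
print(log_probs.numpy())                                # log probability of that path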
120: dropout | TensorFlow | Tutorial
39 views · 7 months ago
The video discusses in TensorFlow: tf.nn.experimental.stateless_dropout() 00:00 - Start 00:11 - tf.nn.dropout(): to be deprecated 01:34 - tf.nn.experimental.stateless_dropout() 03:29 - Ending notes # # TensorFlow Guide # stateless_dropout: www.tensorflow.org/api_docs/python/tf/nn/experimental/stateless_dropout
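A small illustrative sketch (shapes and seed are made up, not from the video) of the key property of tf.nn.experimental.stateless_dropout: with the same seed, it produces the same dropout mask every time:

import tensorflow as tf

x = tf.ones([2, 5])

a = tf.nn.experimental.stateless_dropout(x, rate=0.4, seed=[1, 2])
b = tf.nn.experimental.stateless_dropout(x, rate=0.4, seed=[1, 2])
print(tf.reduce_all(a == b).numpy())   # True: identical seeds give identical masks
print(a.numpy())                       # kept values are scaled by 1 / (1 - rate)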
119: silu | TensorFlow | Tutorial
39 views · 7 months ago
The video discusses in TensorFlow: tf.nn.silu() 00:00 - Start 00:49 - tf.nn.silu() 02:44 - Ending notes # # TensorFlow Guide # silu: www.tensorflow.org/api_docs/python/tf/nn/silu
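A tiny illustration (inputs are made up, not from the video) showing that tf.nn.silu(x) is x * sigmoid(x):

import tensorflow as tf

x = tf.constant([-1.0, 0.0, 1.0, 2.0])
print(tf.nn.silu(x).numpy())        # SiLU (swish) activation
print((x * tf.sigmoid(x)).numpy())  # same result, computed manually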
118: gelu | TensorFlow | Tutorial
31 views · 7 months ago
The video discusses in TensorFlow: tf.nn.gelu() 00:00 - Start 01:00 - tf.nn.gelu() 04:34 - Ending notes # # TensorFlow Guide # gelu: www.tensorflow.org/api_docs/python/tf/nn/gelu
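A tiny illustration (inputs are made up, not from the video) of tf.nn.gelu and its tanh approximation:

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tf.nn.gelu(x).numpy())                    # exact GELU
print(tf.nn.gelu(x, approximate=True).numpy())  # tanh-based approximation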
117: selu | TensorFlow | Tutorial
13 views · 7 months ago
The video discusses in TensorFlow: tf.nn.selu() 00:00 - Start 00:59 - tf.nn.selu() 03:17 - Ending notes # # TensorFlow Guide # selu: www.tensorflow.org/api_docs/python/tf/nn/selu
115: leaky relu | TensorFlow | Tutorial
37 views · 7 months ago
111: fractional avg pool | TensorFlow | Tutorial
20 views · 7 months ago
110: fractional max pool | TensorFlow | Tutorial
56 views · 7 months ago
109: max pool 3d | TensorFlow | Tutorial
50 views · 8 months ago
108: max pool 2d | TensorFlow | Tutorial
35 views · 8 months ago
107: max pool 1d | TensorFlow | Tutorial
62 views · 8 months ago
106: max pool | TensorFlow | Tutorial
31 views · 8 months ago
105: batch normalization | TensorFlow | Tutorial
87 views · 8 months ago
104: average pool 3d | TensorFlow | Tutorial
19 views · 8 months ago
103: average pool 2d | TensorFlow | Tutorial
59 views · 8 months ago
102: average pool 1d | TensorFlow | Tutorial
21 views · 8 months ago
Bro, can we get the project codes?
thanks for explaining in simple language. Looking for function transformer applied to each feature...
Thank you. Glad to hear it helped.
Great work sir
Thank you. Appreciate your support.
What a waste of time!!
I understand that the video just mentions what a PermissionError is. If you have a specific question, please feel free to ask.
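For context, a minimal illustration (the path is hypothetical, chosen only to trigger the error on most systems) of how a PermissionError is raised and caught in Python:

# Writing to a location the process typically cannot access raises PermissionError
try:
    with open("/root/protected.txt", "w") as f:   # hypothetical, usually unwritable path
        f.write("test")
except PermissionError as e:
    print("Caught PermissionError:", e)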
Could you please consider creating the LLM playlist, sir, since you are providing us with such valuable and helpful videos? Thanks a lot ❣❣
Thank you for your suggestion and interest! I'm currently working on a series called 'DL Math.' Stay tuned; I'll be covering topics from an introduction to basic algebra all the way up to LLMs. Excited to share this journey with you!
Sir, I like your content. Please make a Deep Learning series also.
Thank you. Your support means a lot. Sure I will.
Hi Nilesh, I am just starting this playlist. Many thanks for sharing all these contents on your channel.
You are welcome! I hope you find the series helpful. Feel free to post comments if you have any questions along the way. Thanks for watching and supporting the channel!
Could you give me the script? It would be of much advantage to us.
Sure. The code and examples are derived from BigQuery docs: cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax Hope it helps! Thanks for watching.
thank you for a valuable information
Glad it was helpful!
i usually dont comment under videos, but youre really good at teaching, helped me alot!! thank you sir :D
Thank you so much. I am happy to hear that the videos were helpful. Your support means a lot to me.
grt job sir
Thank you so much. Appreciate your support.
Great work, Thanks
Thank you for your support.
how can we assign the corresponding labels in the multiple csv files section?
Hoping the below helps. Thank you for watching!

#--------------------------------------------------------------
# Create multiple CSV files
#--------------------------------------------------------------
import os
import pandas as pd

# Create directory for CSV files
os.makedirs("csv_data", exist_ok=True)

# Create sample CSV files
for i in range(1, 4):
    df = pd.DataFrame({
        "feature1": [i*10, i*20, i*30],
        "feature2": [i*40, i*50, i*60],
    })
    df.to_csv(f"csv_data/file_{i}.csv", index=False)

#--------------------------------------------------------------
# Read multiple CSV files into a dataset and attach a label
#--------------------------------------------------------------
import tensorflow as tf
import pathlib

# Path to the CSV files
csv_path = pathlib.Path("csv_data")

# List all CSV files
csv_files = list(csv_path.glob("*.csv"))

def load_and_label(file_path):
    # Derive the label from the file name
    file_name = tf.strings.split(file_path, os.sep)[-1]
    label = tf.strings.regex_replace(file_name, ".csv", "")
    # Load CSV content
    content = tf.data.experimental.CsvDataset(
        file_path, [tf.float32, tf.float32], header=True
    )
    # Add the label to each row
    return content.map(lambda *row: (row, label))

# Create a dataset of file paths
file_dataset = tf.data.Dataset.from_tensor_slices([str(f) for f in csv_files])

# Interleave the datasets and assign labels
labeled_dataset = file_dataset.interleave(
    lambda file: load_and_label(file),
    cycle_length=len(csv_files),
    num_parallel_calls=tf.data.AUTOTUNE
)

# View the dataset content
for data, label in labeled_dataset:
    print("Data:", data)
    print("Label:", label.numpy().decode())

#--------------------------------------------------------------
# Output
#--------------------------------------------------------------
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=80.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=120.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=10.0>, <tf.Tensor: shape=(), dtype=float32, numpy=40.0>)
Label: file_1
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=40.0>, <tf.Tensor: shape=(), dtype=float32, numpy=100.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=150.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=50.0>)
Label: file_1
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=120.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=90.0>, <tf.Tensor: shape=(), dtype=float32, numpy=180.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>)
Label: file_1
Keep going thanks for the knowledge🎉
Thanks for the support!
thank, very clear, very important for strong foundation in deep learning
Appreciate your support. Thanks for watching.
Could you please share the code repository?
The code is available at the link in the description. github.com/learndataa/shared Thanks for watching!
where is the table name?
The 'FROM' statement usually has the table name. In the example in the video, the table is created directly in the FROM statement, hence no table name is needed. Thanks for watching!
thanks a lot! is the jupyter notebook available?
All the notebooks from the scikit-learn docs are available at the link below: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples/gaussian_process While these notebooks may not be exactly those in the video, they are well commented. Hope it helps!
@@learndataa great and thanks a lot!!!
@@learndataa however I get an error using your link 😕
@@user-ce3ip5lx9t I was able to recreate the error when not logged in to Github. Could you try logging in?
@@learndataa in github I see your scikit learn repository but it appears empty to me
result is not visible
Apologies. Unfortunately, the video frame got clipped. Hope the code is still helpful.
If you agree, answer me please.
I have posted a reply to the earlier comment. Thanks for watching!
Hello, Dear. I don't know your name. For about two weeks I have been learning basic AI. First I started to learn Python, numpy, pandas and data analysis, but I need teaching. If you have some free time two times a week, can you help me learn AI tools? 😢
Sorry for the delayed reply. First, welcome to the world of AI and Data Science. To answer your question in short, I think you are already on your way and on track. Maybe all that is needed now is practice, practice and some more practice. I see that you have already self-taught Python; the same would work for AI tools as well. Below are a few personal thoughts on getting up to speed with AI tools. Again, the route to get there may vary based on background, experience, learning style and time available for practicing code. Chances are that you may already know the steps below.

(Analysis)
Step-1: Learn Python basics
Step-2: Learn NumPy, Pandas (in detail), and Matplotlib
Step-3: Try analyzing open-source datasets available online: UCI ML Repository etc.
Step-4: Practice, practice and practice
[Note: If learning without any prior coding background, this may take 6 months. The Beginner series on this channel will cover all of the topics needed.]

(Machine Learning) - assumes a prior basic background in math, algebra and calculus
Step-5: Learn the theory of ML (Andrew Ng's course on machine learning, available for free on YouTube)
Step-6: Begin learning implementations in scikit-learn [Intermediate course on the channel]
Step-7: Practice, practice and practice
Step-8: Continue learning ML fundamentals
[Note: May take about 6 to 8 months without prior ML background.]

(Deep Learning)
Step-9: Learn the theory of DL (again, Andrew Ng's course is a good start; there are others as well)
Step-10: Learn a framework of your choice. On this channel TensorFlow and Keras are covered so far.
Step-11: Practice, practice and practice.
[Note: May take one or two semesters; 6+ months]

Overall, to answer your question, I think putting in more practice time may help. Trying to understand what each line of code does makes a huge difference. I believe learning to code and getting good at AI/ML is a marathon!!! Just keep going and do not give up!!! You will get there!!! If you have any questions or suggestions, please feel free to post them as comments on the videos, and I'll try to reply as best as I can. Hope it helps.
wow, this is very good informative video
Thanks for watching and your support. It means a lot.
Thank you. The Tutorial is helpful!
Appreciate your support. Thanks for watching!
Can we do this without using the UNPIVOT() clause?
Thanks for watching the video. Code may help:

WITH sales_data AS (
  SELECT 1 AS product_id, 10 AS Q1, 15 AS Q2, 20 AS Q3, 25 AS Q4
  UNION ALL
  SELECT 2, 5, 10, 15, 20
)

/*
# Option-1: UNPIVOT
SELECT product_id, quarter, sales
FROM sales_data
UNPIVOT (
  sales FOR quarter IN (Q1, Q2, Q3, Q4)
)
*/

/*
# Option-2: UNION ALL
SELECT product_id, 'Q1' AS quarter, Q1 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q2' AS quarter, Q2 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q3' AS quarter, Q3 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q4' AS quarter, Q4 AS sales FROM sales_data
*/

# Option-3: UNNEST and STRUCT
SELECT product_id, quarter, sales
FROM sales_data,
UNNEST([
  STRUCT('Q1' AS quarter, Q1 AS sales),
  STRUCT('Q2' AS quarter, Q2 AS sales),
  STRUCT('Q3' AS quarter, Q3 AS sales),
  STRUCT('Q4' AS quarter, Q4 AS sales)
]) AS t
This is incredibly helpful. Thank you!
Glad to hear it! Appreciate your support. Thanks for watching!
Thanks for the video. I have a question about scikit learn GP. I have multiple observations of the heart pressure traces. Can it be fitted to a single Gaussian Process to capture the uncertainty among multiple observations. I need multiple observations to be fitted to a single GP. But When I use scikit learn to fit, I am getting mean and covariance matrix for each pressure trace !! Thank you :)
Appreciate your support. It means a lot. Thanks for watching. To answer your question, I've tried to put together the code below. Hope it helps!

import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Create data
np.random.seed(42)
X1 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 1
X2 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 2
X3 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 3

# Pressure waves
y1 = np.sin(X1).ravel() + np.random.normal(0, 0.1, X1.shape[0])
y2 = np.sin(X2 - 1).ravel() + np.random.normal(0, 0.1, X2.shape[0])
y3 = np.sin(X3 + 1).ravel() + np.random.normal(0, 0.1, X3.shape[0])

# Put it together
y = np.concatenate([y1, y2, y3])
X_combined = np.vstack([X1, X2, X3])

# Create a kernel: Constant kernel * RBF kernel
kernel = C(1.0, (1e-4, 1e1)) * RBF(1.0, (1e-4, 1e1))

# Initialize and fit GP
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp.fit(X_combined, y)

# Predict
mean, covariance = gp.predict(X1, return_cov=True)

# SD from COV
std_dev = np.sqrt(np.diag(covariance))

# Plot
plt.figure(figsize=(10, 6))

# Original data
plt.plot(X1, y1, 'r.', markersize=10, label='Observation 1')
plt.plot(X2, y2, 'g.', markersize=10, label='Observation 2')
plt.plot(X3, y3, 'b.', markersize=10, label='Observation 3')

# Predicted GP mean
plt.plot(X1, mean, 'k-', label='GP Mean')

# CI of GP
plt.fill_between(X1.ravel(), mean - 1.96 * std_dev, mean + 1.96 * std_dev,
                 color='gray', alpha=0.2, label='95% Confidence Interval')

plt.title('Gaussian Process Regression on Multiple Observations')
plt.xlabel('Time')
plt.ylabel('Pressure')
plt.legend()
plt.show()
The result is showing number.x in the row and the value is 1. It's not showing the curly brackets in the result window.
Apologies for the delayed reply. Trying to understand the question. At 2:18 in the video, the output table (i.e. the result) is {"x":"1"} in the "number" column. Could you please elaborate? Thanks for watching!
I need to talk to you regarding this project how to connect with you
Apologies for delayed reply. Thanks for reaching out! If you have specific questions about the project please feel free to ask here, and I'll do my best to answer.
Help!! I replicated the code and it's not working :(
Could you share the Python version, the line that causes the error, and the error message?
i followed your series , its very amazing
Thank you for your support. Happy to hear that you enjoyed the series.
How to extract this data?
For the video, the data was tabulated manually page by page. You could try their APIs, such as: developer-docs.amazon.com/amazon-business/docs/product-search-api-v1-reference
Could i have the source code please, sir?
Code is available in the repository: github.com/learndataa/examples/tree/master/lite/examples Thanks for watching
From line 45 it's not working. Kindly help.
It is difficult to answer the question without more information, such as the error message. Line 45 at 22:18 in the video is: data['nutrient'].head(2). Try checking the shape of the DataFrame 'data' to make sure it is correct. Thanks for watching!
Yes, I have done it exactly like this but it is showing an error… it would be better if I could show you the error.
could you post the error traceback?
I am not able to get the explorer page
The Explorer panel should be accessible after you log in to BigQuery at the link below.
BigQuery: console.cloud.google.com/bigquery
Docs: cloud.google.com/bigquery/docs/sandbox
Hope it helps!
Hi, really nice video. Can you share the Jupyter notebook?
Thank you for your support. All the code is derived from the docs. Below are 300+ notebooks from the docs with original code and descriptions (not directly from the video). Hope it helps! scikit-learn code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
Where in this code would you split the data into training, testing, and evaluation sets? Would it be after concatenating the preprocessed inputs but before the line "titanic_preprocessing = tf.keras.Model(inputs, preprocessed_inputs_cat)"? Can you give an example of how this would be done?
The data should be split before any preprocessing begins, to avoid information from the test set leaking into the model-building process; that leakage could lead to misleading performance estimates for the final model. Depending on the data, I would do a stratified shuffle split to get train, validation (if needed) and test sets right after the line below:

titanic = pd.read_csv("storage.googleapis.com/tf-datasets/titanic/train.csv")

CSV data --> split (train, validation, test) --> write code to preprocess train data --> put the preprocessing code in a function or pipeline --> call the function to preprocess train/validation --> iterate/optimize to get the final model --> final model ready --> run the test data through the same function used to preprocess the train data --> use the test data function output as input to the final model for prediction

The split could be something like the code below using random indices, or much easier using sklearn.model_selection.train_test_split().

##########
# Code (not stratified)
##########
# Trying to split data into train, validation and test sets
titanic = pd.read_csv("storage.googleapis.com/tf-datasets/titanic/train.csv")
X = titanic.drop(columns=['survived'])
y = titanic['survived']

# Define the number of samples in the dataset
num_samples = len(X)

# Define the ratios for the train-validation-test split
train_ratio = 0.6
val_ratio = 0.2
test_ratio = 0.2

# Compute the number of samples for each set
train_size = int(num_samples * train_ratio)
val_size = int(num_samples * val_ratio)
test_size = num_samples - train_size - val_size

# Shuffle the indices
shuffled_indices = np.arange(0, num_samples)  # <-- not yet shuffled
np.random.shuffle(shuffled_indices)           # <-- in-place shuffle

# Split the shuffled indices into train, validation, and test sets
train_indices = shuffled_indices[:train_size]
val_indices = shuffled_indices[train_size:train_size+val_size]
test_indices = shuffled_indices[train_size+val_size:]

# Split
X_train = X.iloc[train_indices, :]
y_train = y.iloc[train_indices]
X_val = X.iloc[val_indices, :]
y_val = y.iloc[val_indices]
X_test = X.iloc[test_indices, :]
y_test = y.iloc[test_indices]

print("X_train:", X_train.shape)
print("y_train:", y_train.shape)
print("X_val:", X_val.shape)
print("y_val:", y_val.shape)
print("X_test:", X_test.shape)
print("y_test:", y_test.shape)

# Begin preprocessing ...
Thank you so much.... Can you send the full code to me please?
Thanks for watching. All the code in this series is derived from the examples in the docs. I have created a new repository at the link below that has a compilation of ~302 notebooks from scikit-learn. Although these are not directly from the videos, they are better commented and more descriptive. Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
Can you further clarify the purpose of using tf.data.Dataset.from_tensor_slices? Why is it used and how does it change the dataset into a useful format?
From what I understand:

"from_tensor_slices":
- especially for larger datasets
- creates a separate tensor for each row of the input to make it easier to iterate and batch process
- so if the input has 3 rows and 4 columns (features), it would create 3 different tensors in the dataset, one per row, each with 4 columns

"from_tensors":
- especially for smaller datasets
- creates just one tensor
- so if the input has 3 rows and 4 columns (features), it would create 1 tensor with all 3 rows and 4 columns

########
# Code
########
# Import libraries
import tensorflow as tf
import numpy as np

# --------------------------------
# from_tensor_slices: input array
# creates 3 tensors in the dataset
# --------------------------------
data = np.random.randn(3, 4)
dataset = tf.data.Dataset.from_tensor_slices(data)
for element in dataset:
    print(element)

# Output
# tf.Tensor([ 1.6346394   1.13362992  0.42821694 -0.15339032], shape=(4,), dtype=float64)
# tf.Tensor([ 0.90122249  0.27264101  0.26286328 -1.14954752], shape=(4,), dtype=float64)
# tf.Tensor([-0.27845238 -0.78464886 -0.11236994 -0.18858366], shape=(4,), dtype=float64)

# --------------------------------
# from_tensor_slices: input tensor
# creates 3 tensors in the dataset
# --------------------------------
data = tf.random.uniform(shape=[3, 4])
dataset = tf.data.Dataset.from_tensor_slices(data)
for element in dataset:
    print(element)

# Output
# tf.Tensor([0.38493872 0.44316375 0.14045477 0.8924254 ], shape=(4,), dtype=float32)
# tf.Tensor([0.7913748  0.9827099  0.8950583  0.36067998], shape=(4,), dtype=float32)
# tf.Tensor([0.65940714 0.5389466  0.7395221  0.8307824 ], shape=(4,), dtype=float32)

# --------------------------------
# from_tensors: input array
# creates 1 tensor in the dataset
# --------------------------------
data = np.random.randn(3, 4)
dataset = tf.data.Dataset.from_tensors(data)
for element in dataset:
    print(element)

# Output
# tf.Tensor(
# [[-1.41672221  0.81045198 -0.3883847  -0.86726604]
#  [ 0.69639162 -1.14857263  0.37013669  0.56729552]
#  [-0.1541059   0.09261183 -0.00200572 -0.12433269]], shape=(3, 4), dtype=float64)

# --------------------------------
# from_tensors: input tensor
# creates 1 tensor in the dataset
# --------------------------------
data = tf.random.uniform(shape=[3, 4])
dataset = tf.data.Dataset.from_tensors(data)
for element in dataset:
    print(element)

# Output
# tf.Tensor(
# [[0.30948663 0.27289176 0.6494436  0.7968806 ]
#  [0.10863554 0.36693168 0.18443334 0.07225335]
#  [0.2699784  0.26086116 0.88859296 0.03361833]], shape=(3, 4), dtype=float32)

Thanks for watching!
Hi Learndataa! Quick question, why do you code in Google Colab vs Jupyter or another IDE?
It depends!
Google Colab: feels easier for deep learning, to later use the free GPU (TensorFlow, Keras, GPU) or larger datasets
Jupyter: easiest to learn analytics on a local computer (Python, numpy, pandas, matplotlib, scikit-learn)
PyCharm: steeper learning curve for teaching/learning analytics
Thanks for watching!
Helloo! I had a doubt. Is there a way to augment data after loading it using 'image_dataset_from_directory'? I am getting input shape errors and all other stuff. Can you please help? Thank you!
In the code below images are retrieved from a directory and augmented. The directory needs to have a structure such as:

image_data
|___ train
|    |__ class_1
|    |__ class_2
|    |__ class_3
|___ validation
|    |__ class_1
|    |__ class_2
|    |__ class_3
|___ test
     |__ class_1
     |__ class_2
     |__ class_3

Check the link below for further details:
www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory

##################
# sample code
##################
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing
import matplotlib.pyplot as plt

# Define the directory containing your images
image_dir = "/content/data2/"

# Create an image dataset from the directory
image_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    image_dir,
    batch_size=32,          # You can adjust the batch size as needed
    image_size=(224, 224),  # Adjust the image size as needed
    shuffle=True,
)

# Define your data augmentation pipeline
data_augmentation = tf.keras.Sequential([
    preprocessing.Rescaling(1./255),         # Rescale pixel values to [0, 1]
    preprocessing.RandomFlip("horizontal"),  # Random horizontal flip
    preprocessing.RandomRotation(0.2),       # Random rotation with 20% angle
    preprocessing.RandomZoom(0.2),           # Random zoom with 20% zoom range
    # Add more preprocessing layers as needed
])

# Apply data augmentation to the image dataset
augmented_dataset = image_dataset.map(lambda x, y: (data_augmentation(x), y))

# Define a function to visualize augmented images
def visualize_augmented_images(dataset):
    plt.figure(figsize=(10, 10))
    for images, labels in dataset.take(1):
        for i in range(9):  # Visualize the first 9 images
            ax = plt.subplot(3, 3, i + 1)
            plt.imshow(images[i].numpy().astype("float32"))
            plt.title(f"Class: {labels[i].numpy()}")
            plt.axis("off")
    plt.show()

# Visualize the augmented images
visualize_augmented_images(augmented_dataset)

# Now you can use augmented_dataset for training
# ....

Hope it helps.
@@learndataa Thank you so much! This helps a lot!❤️
How would the training loop be different if you had a large dataset and wanted to load it in small batches?
Does the below help?
00:30:52 - Train and evaluate the model: set batch size
ua-cam.com/video/avWkSfSZwhc/v-deo.html
Thanks for watching.
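As a supplement, a minimal Keras sketch (model, sizes and data are illustrative, not from the video) of feeding a large dataset in small batches through tf.data instead of passing whole arrays:

import numpy as np
import tensorflow as tf

# Illustrative arrays; a truly large dataset would be streamed from disk instead
X = np.random.randn(10000, 4).astype("float32")
y = np.random.randn(10000, 1).astype("float32")

# Wrap the data in a tf.data pipeline and batch it
dataset = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(buffer_size=1024)
    .batch(32)                     # small batches instead of the full array
    .prefetch(tf.data.AUTOTUNE)
)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# fit() iterates over the batches; no batch_size argument is needed here
model.fit(dataset, epochs=2)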
When triggering a rolling regression (or for that matter, a simple mean) with a window size of 10, I wonder if you could show us how I can start the rolling process at a particular time date, let's say, '2024-01-02 9:31:00' .
How about getting a subset from '2024-01-02 09:31:00' (code below)? Thanks for watching!

#####
# Code
#####
import pandas as pd

# Create data
date_range = pd.date_range(start='2024-01-01', end='2024-01-10', freq='H')
data = {'value': range(len(date_range))}
df = pd.DataFrame(data, index=date_range)

# Specify the start time
start_time = pd.Timestamp('2024-01-02 09:31:00')

# Select the subset of data starting from start_time
subset = df.loc[start_time:]

# Calculate a rolling mean (or regression) on the subset
window_size = 10
rolling_result = subset['value'].rolling(window=window_size, min_periods=1).mean()

print(rolling_result)
Thank you guys!!!
In your answer the date_range was only for 10 days. My project deals with large data, unfortunately. How would you handle situations where the date range is longer than 1 year with a period of 1 minute? Can you help me? I look forward to getting an answer from a pro. Thanks again.
I'll look into it. If a window has half a million records that would be memory intensive.
With a window size of half a million it may need a GPU. In the code below, I have tried to use the cudf and cupy libraries. Note that the code ran out of the free GPU memory of 15 GB. That would be expected because we are looking at an array of size (365*24*60) x (365*24*60), i.e. about 276 billion values! I may be wrong here, but that could mean over 2,000 GB of RAM! I am sorry I do not have an answer at this moment. But GPU would be the way to go, together with writing custom code and/or exploring pre-built libraries such as "cudf".

#############
# example code
#############
### Install cudf (if not installed)
#!pip install cudf-cu12 --extra-index-url=pypi.nvidia.com

### Import libraries
import pandas as pd
import numpy as np
import math
import cudf
import cupy as cp

### Create date range
# Large dataset
t = pd.date_range(start='2000-01-01', end='2030-12-31', freq='1 min')
# Small dataset
#t = pd.date_range(start='2030-01-01', end='2030-01-02', freq='1 min')
print(t.shape)

# Create dataframe
df = cudf.DataFrame({
    'x': np.random.randn(len(t)),
    'y1': np.random.randn(len(t)),
    'y2': np.random.randn(len(t)),
    'y3': np.random.randn(len(t)),
}, index=t)
print(df.shape)
df.head(3)

### Calculate covariance by window
window_size = 365*24*60  # 1-year rolling window size in minutes
rolling_covariance = []

for i in range(window_size, len(df)):
    window_df = df.iloc[i - window_size:i]     # Get the rolling window
    window_values = window_df.values.get()     # Convert cuDF DataFrame to NumPy array
    window_values = window_values.T            # Transpose to the proper shape for covariance calculation
    covariance_matrix = cp.cov(window_values)  # Calculate covariance matrix using cuPy
    rolling_covariance.append(
        cudf.DataFrame(covariance_matrix, index=window_df.index[-len(covariance_matrix):])
    )

rolling_covariance_df = cudf.concat(rolling_covariance)
print(rolling_covariance_df.shape)
Thank you for clear n crisp explanation, exactly what I was stuck upon..
Thank you. Glad it helped.
Hi, thanks for the tutorials. If possible, can you also provide the Jupyter notebook links or a Git link where we can find the notebooks for all tutorials? These are really helpful; thanks for your efforts.
Thank you for your support. All the code is derived from the docs. Below are 300+ notebooks from docs with original code and description. Hope it helps! Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
PLS discriminant analysis please
Something like this:

from sklearn.pipeline import Pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression

plsda = Pipeline([
    ('pls', PLSRegression(n_components=n_components)),
    ('classifier', LogisticRegression())
])

where:
- PLSRegression: for multicollinearity and high dimensionality
- a classifier (logistic, SVC, RF): for classification

Thanks for watching!
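If useful, here is a small step-by-step sketch of the same idea (synthetic data and an illustrative number of components), fitting the PLS reduction and the classifier explicitly; doing the two steps separately also makes it easy to inspect the latent scores:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression

# Synthetic, illustrative data: 100 samples, 20 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# Step 1: PLS reduces the features to a few latent components
pls = PLSRegression(n_components=2).fit(X, y)
scores = pls.transform(X)

# Step 2: a classifier separates the classes in the reduced space
clf = LogisticRegression().fit(scores, y)
print("train accuracy:", clf.score(scores, y))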
Hi Nilesh, I really appreciate what you did here. It's an enormous effort to put together such vast material. It's a shame the videos have such a low number of views; your channel is hugely underrated. If by any chance you could share the scripts in any way, that would be great. I wish you all the best!
Thanks a bunch for your awesome feedback and support. All the code in this series is derived from the examples in the docs. I have created a new repository at the link below that has a compilation of ~302 notebooks from scikit-learn. Although these are not directly from the videos, they are better commented and more descriptive. Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples If you have any questions regarding any specific video, please feel free to post a comment. Thanks again for the support, and hope you enjoy diving into the series!
@@learndataa Thanks a lot and all the best!!!
thanks. clear!
You're welcome!
Thank you! Your videos are very helpful!
You are welcome. Glad you found the videos helpful.
Hello, thank you for this series. I have a question. Why do we have 3 different functions we sample our priors from? Is the idea that we're sampling each point 3 times and we're using our functions to generate the 3 values at each point? In practice, do we expect that sampling the same point multiple times will result in different values due to normally distributed noise? Are we capturing the mean of those 3 output values at the sample point and determining the mean of those points?
Thanks for watching!

(1) Why do we have 3 different functions we sample our priors from?
- Because we want to explore different possibilities or hypotheses about the underlying data-generating process. Each sampled function represents a possible function that could describe the data.

(2) Is the idea that we're sampling each point 3 times and using our functions to generate the 3 values at each point?
- Yes, exactly. When we sample three functions from the Gaussian Process prior, we are effectively generating three sets of function values at each point in the input space. These values represent different possible outcomes or predictions for the target variable.

(3) In practice, do we expect that sampling the same point multiple times will result in different values due to normally distributed noise?
- Yes, that's correct. In Gaussian Process regression, the function values are distributed according to a multivariate Gaussian distribution. This implies that for the same input point, we would expect different function values in general due to the randomness introduced by the Gaussian noise.

(4) Are we capturing the mean of those 3 output values at the sample point and determining the mean of those points?
- Yes, in practice, we can capture the mean of the sampled function values at each point to estimate the mean function of the Gaussian Process. This mean function represents the expected value or average behavior of the target variable at each point in the input space. Additionally, we can also compute other statistics such as the variance to assess uncertainty in the predictions.

Thus, sampling multiple functions from the Gaussian Process prior allows us to explore different hypotheses about the data, and by observing the variations in function values across samples, we can estimate the mean function and assess uncertainty in our predictions. Hope it helps!
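As a supplement, a minimal scikit-learn sketch (kernel and grid are illustrative) of drawing three sample functions from a GP prior and summarizing them with a mean and standard deviation at each point:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Input grid and an unfitted GP, i.e. the prior
X = np.linspace(0, 10, 50).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))

# Draw three sample functions from the prior
samples = gp.sample_y(X, n_samples=3, random_state=0)   # shape (50, 3)

# Summarize the three sampled values at each input point
print("mean at first points:", samples.mean(axis=1)[:5])
print("std at first points: ", samples.std(axis=1)[:5])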