💻 Build this Flu Shots Dashboard with SQL and Tableau!
Вставка
- Опубліковано 21 лип 2024
- In this tutorial we build a SQL extract to plug into Tableau so that we can build a flu shots dasboard! This is a good example of how I might approach a project in my day to day work.
To follow the SQL tutorial, check this video out if you haven't already, as you will need it to install PostgreSQL, pgAdmin, and the data files for this lesson: • 👉 SQL Basics with Heal...
If you would like to skip ahead to the Tableau tutorial, click on 31:14. If you do skip the SQL tutorial, just remember to download the csv spreadsheet; you can get it at my website: datawizardry.academy/flu-shot...
Update Sept 2023: My formula in the video for age is setup incorrectly (kudos to @JoelTorres1237 for finding the bug). Try using this formula instead for the age calculation
EXTRACT(EPOCH FROM age('2022-12-31',pat.birthdate)) / 2592000
Update Oct 2023: I realize that I around 14:00, I forgot to pull in the age calculation as one of the columns I'm selecting. Which should be something like this:
extract(YEAR FROM age('12-31-2022', birthdate)) as age
-------------------------------
HTML codes I used:
#57ecb2
#5c5a74
#005f9c
#2b3048
31:14 Skip to Tableau Lesson
------------------------------------------------
00:00 Intro
01:13 SQL Setup
01:55 SQL Demo
28:18 Tableau Setup
33:00 Tableau Demo
My GitHub SQL code:
github.com/Data-Wizardry/Yout...
My Tableau file is downloadable here:
public.tableau.com/app/profil...
Note: all of this data is synthetically generated (i.e. fake), and comes from a project called Synthea. No real patient data was used in this lesson.
// ABOUT DATA WIZARDRY:
Hey I'm Josh! I've been a data analyst in the healthcare industry since 2015. Join me as I teach you how to get a job as a data analyst. I do SQL, Python, Tableau, and Networking tutorials.
// BECOME A CLINICAL DATA ANALYST!
My course teaches you everything you need to know in just 10-12 weeks:
datawizardry.academy/clinical...
🔴 UA-cam: / @datawizardry
🟢 TikTok: / data.wizardry
🔵 LinkedIn: / joshuamatlock
🔗 Website: datawizardry.academy
#sql #tableau #dataanalytics #dataanalyst #healthcaretechnology #healthcare #career - Наука та технологія
👋 when using extract ( ) function on the age ( ) are you only pulling the months value of the age and not the total number of months of patient. For example if age( ) results in 21 years 2 months 11 days by extracting your result will be 2 and not the actual age in months.
You are correct! Good catch! and thanks for pointing out my error. I should have used something like this instead:
EXTRACT(EPOCH FROM age('2022-12-31',pat.birthdate)) / 2592000
This should give you the age in seconds, the 2592000 will divide by seconds in a month to give you the age in months.
However, the tutorial is going to result in a something a little different once folks correct that part of the code because my tutorial is based off of the version with the incorrect code.
My apologies for the inconvenience
No worries, it's the same process just different data. I only caught it because I'm using SQL Server and I had to use datediff( ) to calculate age in months. Anyhow, looking forward to your next courses and/or academy.
Dead patients can't get flu, that got me, hahaha. But I definitely appreciate how in-depth you are and the time you take to break things down! I've been following along with you vids and hope to break into Healthcare Data Analytics soon, thanks Josh!
If anyone else is having an issue where the map is not being filled with the table entries at 47:37, click on the grey 14 or 8 unknown button at the bottom right (that's why in the video it felt a little bit empty).
From there, you must set the Country as United States and the state as Massachusetts. It will auto find the correct counties.
After that, click the Pin looking icon to fix the map to focus on the correct location.
Thank you so much Josh! This is an amazing tutorial
Thank you for the informative video! Appreciate your teaching skills.
I’ve been looking for something just like this! Thanks for the great video
This video helped me understand CTE's/Subqueries a bit better. I find this the most challenging aspect of learning SQL - most tutorials do not include the one-to-many relationship explanation. Cannot wait until the spring clinical analytics session.
this is awesome! I have been looking for tutorials how to send the data from sql to tableau!
Such a great video Josh!! I've been interested in getting into data analytics but your channel has made me so much more excited about doing my own projects and breaking into the field. Looking forward to more videos and courses in the future! :)
Glad you're liking the channel!
Thank you so much!
My pleasure!
Another Project added to my portofolio
Is this what a real world dashboard would look like when you build it out for your stakeholders? Great video as always!
Glad you like it! Yes, I tried to make the dashboard as applicable to real life as possible, and what stakeholders might be looking for.
This is a simpler example though, often times my dashboards at work will have more pages than this.
Now I can see the tables panel after I clicked tableau icon on the right top menu. I just don't know how disappeared/appeared the tables and the filter options to do calculation. Baby steps here.
@27:20 your output shows 'Dorthea Reichel' as the first output from the query. I am getting 'Hedy Von' as my first output from the main query. PS- I am using MYSQL. I checked manually and I believe that there should be three more entries before 'Dorthea Reichel'.
I got stuck in the percentage. I don't see the data table showing the columns fields on the left side and when you click on the filter top right to do calculation but I don't see on my end to follow your calculations. Why I don't see the table data panel just like yours
The Map does not show any shading when I give Avg flu shots by county. Can you help me with the issue?
I finished the project following you, and it's almost 100% like yours. Here is the problem I had: the figures percent does not centralized on the bars race and age, I did over and over following your step, it does not center like yours. Another problem. On the Area map, when I hoover the edge of the mountain, the blue part from jan to december, the tool tip does not show the increase number of cases, it keeps showing always the same number which is 4,927, from jan to dec it keeps showing 4,927 instead of showing increasing number of cases according to the months. What did I miss?
Thank you for this, Josh. Your video was incredibly helpful. I have a couple of questions as I work on my project.
Do you recommend including a data profiling step to identify null values or cleaning the data while documenting the project?
I'm curious about the tools you use for this process in real-world SQL or Pandas.
Lastly, when working with datasets from Synthea and similar sources, does the data consistently produce accurate and reliable results?
Data profiling can be a great first step in exploring how clean or dirty your data is, as well as getting some information about the data in the columns that you're working with. Pandas can be a good resource for this.
That said, I don't usually do data profiling because I tend to work with the same data a lot, and am already quite familiar with which columns will be null, and such.
Regarding Synthea, it is just a fake-patient generator; it is definitely consistent, because you can regerate the same results with a given seed value. As far as "accuracy", it is accurate in the sense that it emulates disease prevalence of real world diseases. And it also does a great job of emulating what an EMR would look like. Though there are still lots of improvements that need to be implemented into the tool to make more realistic looking data.
are you pressured to think very thoroughly b4 running statements? as a starter I run my statement with carefreeness and trials and errors.
do you see any link between SQL and PHP? PHP have my interest in it so I wonder if any data roles collide with PHP..
Typically, there is very low pressure in writing SQL queries if they're of the SELECT FROM variety, which is what I almost always write.
I get more nervous if I have to run UPDATE, or INSERT, which deals with maintaining raw tables that could be utilized by other people, and for me, is very, very rare, and usually handled by the data engineers of my organization. And even then, there are usually safeguards, like GIT versioning control to rollback any deterimental changes.
Regarding PHP, I don't know much about it, other than its use in web development. I suppose you could be using it a lot if you were somehow involved in digital analytics in healthcare.
I’m at the very last part @ 1:17:43 .
My Filters box is empty so I can’t right click on anything to add to context. Do you know what could be the issue ?
Yes, @1:17:01, make sure you follow the steps to click on that funnel icon. That should create what is callled an action filter. Then click on a county. After that, you should be able to follow the steps @1:17:43
This missing Age column people are mentioning seems to point to a bigger inconsistency. The Tableau tables on Sheet 1 listed after the text upload from part one @29:47 don't match the Tableau tables on Sheet 1 listed after the data link @34:12? Birthdate and Patient are gone. Gender and Age have been added. Trying to figure the discrepancy out now.
Yes, I think there are a few errors in my tutorial. Was having an off day when I made this!
My errors:
1. I Forgot to pull in the age calculation into the final query before plugging into Tableau. That should be something like: extract(YEAR FROM age('12-31-2022', birthdate))
2. I didn't use the proper age calculation for the filter that kicks out patients less than 6 months of age. Should be something like this which tracks months old: EXTRACT(EPOCH FROM age('2022-12-31',pat.birthdate)) / 2592000
3. Forgot to show the pulling of gender (though this part isn't needed because I decided later not to use gender in the final dashboard, since all the patients in this dataset are female).
My apologies for the inconvenience! Thanks for pointing this out. I'll add some notes to the video description.
Also the column "patient" is simply the identification number for the patient. Which I don't think I ended up using in the Tableau tutorial, so its absence shouldn't affect the tutorial. (I think I did the same thing with Ethnicity too).
Hello Josh, thanks for an amazing informative video. I have a question, it would be great if you could help me answer that, let's say the data is in real-time and we need to create a dashboard based on a real data set.
Every time the data is updated (say we need flu shots information on the dashboard for 2023 and information about patients will keep adding for the existing year), I need my SQL to refresh it and the tableau shows the updated information in the dashboard. Is there any way to do it? If yes, how to do that? Thanks!
Good question.
In reality you'll probably be working at a company that has the sql tables update with new data everyday. So if you run a query against that, as long as you aren't filtering down to a fixed time range, you should see new data come in each day when the sql query is run.
What that means for tableau is, everytime Tableau refreshes, it would pull in those new cases. There's a few ways where you could pull in this data (using Tableau Desktop, and Tableau Server which is probably what your company would have, and this is going to have additional options we don't see in the tutorial):
1. Just click the refresh button. The underlying SQL is run in Tableau, and new cases trickle in.
2. Publish to Tableau server with the "extract" button enabled in data source; the data would then be updated on a specified basis (daily, weekly, monthly, etc).
3. Publish to server with the "live" button enabled on data source. This would make it continuously look for, and implement the new cases when detected.
@@DataWizardry Got it!! Thank you so much!
I HAD PROBLEM AGE RANGE IN CALUCALTION IN TABLEAU , ig the sql querry doesnt create age am i wrong?
same... I thought I was going crazy ...it's saying the string is not defined...
Hi, great video, yet again. May I know when you would be ready with your course ?
Hi, I have two things I'm working on:
1. A free intro to Tableau UA-cam Video
2. My clinical data analytics accelerator course
#1 should be complete either later this month or early next,
#2 will take longer, probably ready sometime in the fall.
I was asking about the 2nd one. Are there any further details that you can provide now?
Asking because I'm planning to shift my career towards healthcare DA.@@DataWizardry
@abhishekrajeev1028 I hope to have a beta version ready by sometime in the fall.
It will cover SQL, Tableau, and networking. There will be lots of practice problems, and guided tutorials like this (but more in depth) that you can put in your portfolio.
It will be designed to help you get a entry level job as a data analyst in a hospital or clinic.
@@DataWizardry looking forward !!!!
Are those requirements always provided on a paper or you are just told to do so aurally ?
Usually I will get requests verbally where me and a service line manager or clinician meet virtually to discuss the thing they need. I'll ask clarifying questions, and figure out what metrics need to be built.
Sometimes though, if the request is simple, it will just be emailed to me.
Is running SELECT statements on large database time-consuming?
Yes, it definitely can be. Some tables have millions of records. If you find yourself working with large tables, you can use LIMIT 100 at the end of a query to return a sample of 100 records to make it load more quickly.
Can I add this project to my portfolio? Would I get flagged? Or should I find another data set to use to create my own. If so where can I find other healthcare data sets?
You have my permission to use this for your portfolio!
Josh, God gives you all the blessings in double. Can you make a video about covid: example: how many people got, how many got cured, how many got covid even taking the " ", country, county, etc, how many times got covid, blah, blah. Something like your flu shot and more.
do you think I could intern as a 'part-time' data analyst while I have a busy college schedule (1st year)? What advice you can give me?
given the current economic transitioning, and with the IT industry prospect from your perspective...
Possibly, but what I would recommend instead is, see if you can volunteer your time in a lab at your college that utilizes stats, and data analysis. This is a great way to build experience early. Preferably one that uses excel, sql, python, R or maybe a stats tool like SPSS.
why i am not getting the age column. I tried putting this in my sql code- extract(YEAR FROM age('12-31-2022', pat.birthdate)) as age, but im getting a error. i dont know where I should plug it in my code
What is the full block of code you're trying to run? In the tutorial, I only use:
extract(month from age('2022-12-31',pat.birthdate)) >= 6
In the where clause. Since I only used it in the where clause, it doesn't show up as a column. But if you want to see it as a column, make sure that your formula is in the select clause. Also make sure that you're not missing a comma, (or there isn't a comma where there shouldn't be a comma).
I can better troubleshoot if you post the full code you're using though.
did you find out where to plug in ?
My total rows is exactly double of yours. I’m guess I imported in the data twice but how do I remove this now.
You can use a truncate command which will keep the table, but remove all the rows within it. Something like this:
TRUNCATE TABLE table1
Just be sure to do that with each of the tables (and replace 'table1' with the actual name of your tables').
Once you erase the data in the tables, you will just need to reimport the same data again.
My wife and I work as cancer registrars (collect and report cancer statistics), aspiring to become healthcare data analysts. I have an associate degree in cancer information management and am considering a bachelor's in Health Information Management. Is this the right bachelor's degree, or do you have other suggestions? Also, which specific certifications would boost our qualifications?
If you could build the perfect resume to become a healthcare data analyst, what would be on it? Specific degrees, certifications, portfolio, etc? We both are highly motivated and have the time and money to get any degree or certification.
Also, what do you think about the Certified Health Data Analyst (CHDA®)? Is this something worth pursuing? Or a waste of time?
What about a Master of Science in Health Informatics and Information Management or a Data Analytics Graduate Certificate or a Master of Science in Data Analytics? We are willing to go all the way! Please help! What do we choose???
Thank you for your insights.
Wow, lots of good questions here!
I'd recommend checking out a couple videos to get you started.
In this video, I talk about how I became a data analyst in healthcare, and what I would have done differently: ua-cam.com/video/s15X30BAfe8/v-deo.html
And in this video I provide a roadmap for becoming a data analyst in healthcare:
ua-cam.com/video/ime7OBlERiM/v-deo.html
In general, there's lots of majors that are well suited for a career in clinical analytics. LIke: Econ, Stats, Physics, Public Health, Business, Health Information Management. My top recommendation would be computer science, but Health Information Management is a solid choice too.
I wouldn't worry too much about a master's degree unless you really want to. They're really pricy though.
Regarding certifications, the only I can think of that would be valuable is Epic certification, which makes you more desireable as a data analyst, but you have to work at a hospital first that has Epic and sponsors that training to get certified. Otherwise, certifications are only useful for the trainings that they provide. They can't hurt to have, but I don't think they make a huge difference.
I'm working on a clinical analytics accelerator that might be the training you're looking for (hoping to enroll my first cohort in the Fall), so check this out if you haven't already: datawizardry.academy/accelerator/
Neither I nor my colleagues have CHDA. So I don't think it's worth persuing.
But there are patients who were active during '2021-01-01 00:00' and '2022-12-31 23:59' in the conditions table and they doesnt appear in Encounters table
so my question is when we are finding % of patients who took flu shot we have to consider patients from conditions table who were active right?
Hi @YDMCA, I want to make sure I understand the question correctly. Are you asking why we aren't also including patients with conditions between 2020-2022 (that did not have an encounter?)
1. The first thing to consider is, my code where I check the age > 6 Months if actually off. So that's one thing that could be throwing things off a little bit: the updated code is: (EXTRACT(EPOCH FROM age('2022-12-31',pat.birthdate)) / 2592000) > 6. Once I do that, the only patients I see that are not in my active patients query are patients that are either dead, or no more than 6 months old at the end of the time period, or didn't have any encounters past the year 2019. Patients that I don't consider active.
2. My definition for 'active' is, there must have been evidence of an encounter taking place within the time period of 2020-01-01 and 2022-12-31 (and not dead). If there wasn't an encounter, I don't consider them actively involved with the hospital.
#2 is the most important point. I'm not considering patients that are active to be patients that had some sort of active condition. I'm only interested in patients relevant to the hospital in some way where there is some evidence that they are still getting services from the hospital.
Think of it this way, let's say someone had a diagnosis of diabetes at our hospital in 2019, and their problem has no recorded resolution date. Let's also say that their last encounter with the hospital was 2019-05-01. That patient might have moved away (they might be living in the bahamas for the past few years!) in which case it would have been impossible to get them in for their flu shot.
Thanks so much for this very helpful tutorial.
I inserted the new code into my query and it returned the age column. I used this code:
SELECT patient, (EXTRACT(EPOCH FROM age('2022-12-31', pat.birthdate)) / 2592000) AS age
FROM encounters AS e
JOIN patients AS pat
ON e.patient = pat.id
WHERE start BETWEEN '2020-01-01 00:00' AND '2022-12-31 23:59'
AND pat.deathdate IS NULL
AND (EXTRACT(EPOCH FROM age('2022-12-31',pat.birthdate)) / 2592000) >= 6
However, when I run the whole query the age column doesn't still appear in the final table. What could be wrong? I'm looking forward to your reply. Thanks!@@DataWizardry
Never mind. I figured it out. The age column from the 'active_patients' CTE was not included in the final statement. The correct code is below:
SELECT pat.birthdate, pat.race, pat.county, pat.id, pat.first, pat.last,
flu.earliest_flu_shot_2022, flu.patient, active.age,
CASE WHEN flu.patient IS NOT NULL THEN 1
ELSE 0
END AS flu_shot_2022
FROM public.patients AS pat
LEFT JOIN flu_shot_2022 AS flu
ON pat.id = flu.patient
LEFT JOIN active_patients AS active
ON pat.id = active.patient
--CASE statement to create binary field to help calculate count
--and percentage of those who didn't get the flu shot
WHERE 1=1
AND pat.id IN (SELECT patient FROM active_patients);
Hi..am doing my masters in healthcare informatics.. and am dentist from india..im curious to know if u are giving any courses this month like Sql or tableau or power bi.
am in my 3rd sem now...and one sem left to complete my masters so i want to build my resume with some projects if u can help...thank u...will be waiting for ur reply ....have a beautiful day ahead.
I plan to record a intro to Tableau session probably sometime in the next 3-4 weeks.
But other than that, I'll be building a clinical analytics accelerator course. I hope to have that completed by sometime in the Fall.
Hey Josh, really noob question here - what's the difference between the query shown in 15:17 vs a query like this:
SELECT p.birthdate, p.race, p.county, p.id, p.first, p.last, MIN(i.date)
FROM patients p LEFT JOIN immunizations i ON p.id = i.patient
WHERE i.code = "5302" AND i.date BETWEEN "2022-01-01 00:00" AND "2022-12-31 23:59"
GROUP BY i.patient --edit: p.id doesn't work either
I'm asking because my query returned 8147 rows instead, and I suspect it's something to do with the GROUP BY statement being used in the temporary table first in your video, but a more in-depth explanation would really help me understand better! Thanks!
I have been trying to source the dataset used for this project but yet to find it. Please, can you help me with the link to it.
Hello, here is the link: datawizardry.academy/flu-shot-dashboard/
Just have to provide your email, and the lesson file will get sent to you.
That's for the Tableau lesson files only. If you want to get the actual data for SQL, that's in another tutorial here: datawizardry.academy/sql-basics-healthcare/
@@DataWizardry
Thank you. I did that earlier but the dataset I got was the one obtained from an already written SQL code. I need the one I will be able to execute the SQL query on. Thank you!
@@sullivandepoet7530Do you mean the code that is written in github? github.com/Data-Wizardry/UA-cam_Lessons/blob/main/Flu_shots.txt
@@DataWizardry
I mean the initial dataset on which the SQL query was executed.
in sql demo you haven't made age column and how can you come up with zipped data
In the SQL demo, we only had birthdate to work with. When you're working against a SQL table, it's unlikely that it's going to come with the age pre-calculated.
This is because there's different ways to calculate the age. You could calculate the age at the time that a patient received a vaccination, the age of a patient at the time that they had an encounter at a hospital, the age that a patient is now. etc.
Because there's different ways we can calculate age, usually only the birthdate is included in the raw tables; which means it's up to the person writing the query to plug in the start date (their birthdate) and an end date (for example: today's date, encounter date, vaccine date, etc.)
Does that make sense?
@@DataWizardry yes sir ive done it using age and extract function and sir is it allowed for me to include this to my own project?
Sure!
@@clayherz_ I found it extract(YEAR FROM age('12-31-2022', pat.birthdate)) as age,
@@thamoru extract(month from age('2022-12-31', pts.birthdate)) >= 6 here's mine
Hello, is SQL a requirement for healthcare data analyst
Most places will require this, yes. For a further breakdown of the skills you need, you can checkout: ua-cam.com/video/ime7OBlERiM/v-deo.html
missing the age part in the demo I was wondering why the if in tableau was acting funny
I got the same problem when doing the first calculated field. Followed the SQL portion and imported Immunization Demo. I saw you commented extract(YEAR FROM age('12-31-2022', pat.birthdate)) as age,
Should I just run this query in SQL and then import the new file to tableau?
too many errors , not enough structure back to back CTE's ..... no foundational sql to build you up. thumbs down but i still like your channel.
hey Josh where did you plug in the code "extract(YEAR FROM age('12-31-2022', pat.birthdate)) as age"
Nevermind I found it extract(YEAR FROM age('12-31-2022', pat.birthdate)) as age,
Cna you show me how you implemented this into your code ?
Where*?
@@twntysevn727 /*==============================================================================================================
Notes:
================================================================================================================
The code uses Common Table Expressions (CTEs) to create two temporary tables (active_patients and flu_shot_2022)
that are used in the final SELECT statement. The active_patients CTE selects patients who had encounters within
a specified time range, are not marked as deceased, and whose age is greater than or equal to 6 months old at
the end of 2022.
The flu_shot_2022 CTE selects the earliest flu shot date in 2022 for each patient with the specified
immunization code. The main SELECT statement combines patient information from the patients table with flu shot
information from the flu_shot_2022 CTE. It calculates the patient's age, and if the patient has received a flu
shot in 2022, it assigns a value of 1; otherwise, it assigns 0.
The LEFT JOIN ensures that all patients are included in the result, even if they did not receive a flu shot in
2022. The IN clause filters the patients to those who are present in the active_patients CTE.
Overall, the query retrieves patient demographics, calculates their age, and indicates whether they received a
flu shot in 2022, based on the provided criteria.
==============================================================================================================*/
-- This part creates a temporary table (CTE) named "active_patients"
WITH active_patients AS
(
-- It selects distinct patient IDs for patients who:
SELECT distinct patient
FROM encounters AS enc
JOIN patients AS pat
ON enc.patient = pat.id
WHERE start BETWEEN '2020-01-01 00:00' AND '2022-12-31 23:59'
AND pat.deathdate IS null
AND EXTRACT(MONTH FROM age('2022-12-31',pat.birthdate)) >= 6
),
-- This part creates another temporary table (CTE) named "flu_shot_2022"
flu_shot_2022 AS
(
-- It selects the earliest flu shot date in 2022 for each patient
SELECT
patient
,MIN(date) AS earliest_flu_shot_2022
FROM immunizations
WHERE code = 5302
AND date BETWEEN '2022-01-01 00:00' AND '2022-12-31 23:59'
GROUP BY patient
)
-- The final SELECT statement retrieves patient information along with flu shot details
SELECT DISTINCT
EXTRACT(YEAR FROM age('12-31-2022', pat.birthdate)) AS age
,pat.id
,pat.first
,pat.last
,pat.birthdate
,pat.race
,pat.county
,flu.earliest_flu_shot_2022
,CASE WHEN flu.patient IS NOT null THEN 1 ELSE 0 END AS flu_shot_2022
FROM patients AS pat
LEFT JOIN flu_shot_2022 AS flu
ON pat.id = flu.patient
WHERE 1=1
AND pat.id IN (SELECT patient FROM active_patients);
where did you find it? i cant figure out where to put it that in my code
where did you plug it in ?