*Timestamps*
0:00 Intro (Keith Adams)
1:26 Agenda
2:37 Scale and Style
6:18 Architecture Overview
8:19 Logging in
14:11 MySQL Shards
18:18 RTM.start payload
19:51 Message server
24:32 Deferring Work
27:39 Putting it all together
31:41 Challenges
32:59 Challenge 1: Mains Failure
34:14 Challenge 2: RTM.start for large teams
35:02 Challenge 3: Mass reconnects
Plan of attack
35:50 Scale-out mains
37:04 RTM.start for large teams
38:33 Mass reconnects
41:45 Stuff left out
43:19 Wrapping up
44:20 Questions
🐐
I love how honest this guy is
which part
Very interesting talk. Usually you hear people from huge established companies discussing massive arch migrations, or brand new startups showing off their flashy new stack.
This was a very humble, grounded talk on what it’s like to work in a real system architecture: what works for them, and what could be improved as they scale.
I love how articulate he is. He comes across as a smart guy. I'd like to work with him.
Handling great scale with simplicity in design. Excellent !!
Slack in 2016 was a pretty new app, and I completely respect the devs' decision to favor faster deployment. I have experience with a product which is 10+ years old, and we are still scrambling over DB alerts.
I feel for you. We work on a large SharePoint farm solution that is ~10 years old.
I realised after 4 years of working in the software field that I still don't have basic knowledge of how to build a highly scalable app. Great video, thanks for sharing this knowledge.
SUPER insightful and detailed deep dive. thanks for sharing!
At 18:00, can someone help me understand what he means by the left and right heads, writing left and writing right?
Active-active can run into collisions when the same data is updated on both masters in parallel. Think of your profile being updated from your phone and your laptop simultaneously, with one update landing on Master 1 and the other on Master 2. Which one is correct? Timestamps could resolve these collisions, but manual intervention is sometimes still needed. To solve this, I think they have partitioned the queries (based on team ID, Keith says, but I think it would be more complex) so that a certain group goes to Master 1 only and never to Master 2, and vice versa, avoiding conflicts.
@Karan A Yeah, it made no sense if the partition was just based on team ID, because in that profile example the team IDs would be the same, so both requests go to the same master without introducing any conflict...
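A minimal sketch of the routing idea discussed in this thread, not Slack's actual code. It assumes each shard is a pair of masters ("left" and "right" heads) and that a numeric key such as the team ID picks a preferred head. The even/odd rule, the `MasterPair` class, and its method names are all hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MasterPair:
    """One shard made of two masters that replicate to each other (hypothetical)."""
    left_up: bool = True
    right_up: bool = True

    def _is_up(self, side: str) -> bool:
        return self.left_up if side == "left" else self.right_up

    def write_target(self, key: int) -> str:
        # Assumed rule: even keys prefer the left head, odd keys the right head.
        if key % 2 == 0:
            preferred, fallback = "left", "right"
        else:
            preferred, fallback = "right", "left"
        if self._is_up(preferred):
            return preferred
        # Only when the preferred head is down do writes move to the other head.
        if self._is_up(fallback):
            return fallback
        raise RuntimeError("both heads are down")


pair = MasterPair()
print(pair.write_target(42))  # team 42 writes to its preferred head
pair.left_up = False
print(pair.write_target(42))  # the same team fails over to the other head
```

Under this assumption, two simultaneous profile updates for the same team go to the same head while both heads are healthy, so they are simply ordered by that one database. Conflicting writes on both heads become possible mainly around a failover, when some writes have already moved to the fallback head.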
@12:19 What does he mean when he says the team-to-shard assignment is arbitrary? Does it just randomly pick a shard and record it? What's the benefit of doing it this way vs. consistent hashing?
Maybe due to the thundering herd effect.
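A rough sketch of one way to read the "arbitrary" assignment asked about above, assuming the shard for each team is chosen once and recorded in a lookup table. The `ShardDirectory` class, the least-loaded placement rule, and `hashed_shard` are hypothetical names for illustration, not details confirmed by the talk. `hashed_shard` is plain hash-mod-N, shown only for contrast. It is not consistent hashing.

```python
import hashlib


class ShardDirectory:
    """Recorded team -> shard mapping (hypothetical lookup-table approach)."""

    def __init__(self, shard_ids):
        self.load = {s: 0 for s in shard_ids}   # teams per shard
        self.team_to_shard = {}                 # the recorded assignment

    def add_shard(self, shard_id):
        # New capacity does not move any existing team.
        self.load.setdefault(shard_id, 0)

    def assign(self, team_id):
        # Placement is "arbitrary": any policy works because the result is stored.
        if team_id in self.team_to_shard:
            return self.team_to_shard[team_id]
        shard = min(self.load, key=lambda s: (self.load[s], s))  # least loaded
        self.team_to_shard[team_id] = shard
        self.load[shard] += 1
        return shard

    def lookup(self, team_id):
        return self.team_to_shard[team_id]

    def move(self, team_id, new_shard):
        # A single hot team can be relocated by updating one row.
        old = self.team_to_shard[team_id]
        self.load[old] -= 1
        self.load[new_shard] += 1
        self.team_to_shard[team_id] = new_shard


def hashed_shard(team_id, num_shards):
    # Contrast: the shard is computed from the key, so nothing is stored,
    # but changing num_shards remaps most teams.
    digest = hashlib.sha256(str(team_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


directory = ShardDirectory([1, 2])
for team in ("T1", "T2", "T3"):
    directory.assign(team)
directory.add_shard(3)
print(directory.lookup("T1"), directory.assign("T4"))
```

The trade-off in this sketch: the directory costs an extra lookup on every request, but it lets an operator place or move individual teams freely. A computed mapping has no lookup, but placement is fixed by the hash function.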
I just wish the "Microsoft Teams" would watch this and learn to not suck
There is nothing great about this design btw.
@arunsatyarth9097 No need for a fancy architecture if the classic one gets the job done and is still easier to maintain.
@lattelover7186 I didn't say the word "fancy". I just said there is nothing in particular here for the Teams folks to look at and learn from, as OP suggests.
@lattelover7186 "easier to maintain" on a 1M LOC monolithic PHP app...
Do you even code?
this may be a dumb question but why is there a message server in the first place? why a separate service?
search "control and user plane separation" on google. it has many benefits. efficiency, security, isolation, capacity, scaling, easy managing, maintenance, failure-handling, etc.
I wouldn't be surprised if it wasn't easy to support WebSockets/two-way communication in LAMP at that time, and this was the easiest/fastest solution.
Also likely to handle live websocket connections.
You can only support 65k connections per server, so scaling 80 full-blown web apps just to handle dumb websocket connections may be overkill.
Nice
Code pushed 40 times a day 😮
too honest.
a PHP Monolith?
Slack is a really great piece of Software but those design decisions though..
Their team was experienced with PHP. Time to market is key. Everything else is overrated for a start up. Now they can do whatever they want since they are successful.
Discord is best team app
Would prefer to use Discord, but beggars can't always be choosers. Oh well.
ahh GDRP
GRPC or GDPR?
Slack is slow as sh*t, great job Keith
Very poor design...
What's poor about it?
It's not a poor design. It's a design that allowed Slack to scale from a small start-up to one of the largest and best communication platforms. It's a very successful design.
Good architecture and good design isn't one that contains all the hip buzzwords. Good designs are often very boring, and that's ok. That means an adult is in charge of the operations, not the hippies who sprinkled the latest immature fad and then leave the mess to someone else.
The result is what matters, Slack is a very slick communication platform. Its architecture design may not contain sexy buzzwords, but it is what allows it to outshine all the other competitors who have buzzwords-driven design. That's not a "poor design".
It’s a very real design. You wouldn’t pass a system design interview with this, but it is what you’ll likely find on Day 1 of your new job. It’s nice to know the design decisions of real systems and what could be done better.
Poor presentation. He shows a slide and talks about something else.
Sweet lord, this is the most SV-esque mumble job ever.
It works for them and it's obviously a successful product, but God, what a boring architecture! I wouldn't want to work there.
Flock is better
Lol, is Slack still even a thing or in business? Pretty sure the market's owned by Microsoft Teams now.
Too boring, nothing much to learn for people preparing for system design interviews. I wasted 20 minutes, don't make that mistake.
...learning about the architecture of famous, successful companies making bags of money isn't applicable to system design interviews? Boring is *good*. I give system design interviews. I give massive points for boring, easy to operate, and easy to understand.
@zbbentley And cheap total cost of ownership, i.e. cheap to build and maintain.
Who do we have here? Someone who's too smart for our entire era. The title clearly says how Slack works or at least worked at the time this video was made.
Now, since you found this video boring, presumably after coming here of your own accord, I wouldn't ever want to hire you, on top of the fact that you watched this video to prepare for a job interview. For system design? The whole idea of system design is that it may be different for every product, and you need to build one depending on what resources you have, what your timeline is and, most importantly, what you really know for sure, say the fundamentals. If there were a standard, fixed path, I'm sure this video wouldn't even exist.
You wasted 20 minutes, but do everyone a favor and don't take any interviews. You wouldn't want to waste their time too.
The point is, if you see this design, you should have enough knowledge to figure out the flaws and improve it on your own.
He didn't go in very deep, I'll give you that, but if you really found it boring, you'd not be trolling YouTube trying to win SD interviews. Let's chalk it up to the fact that you didn't hear some fancy words you might have expected, like Kafka, Cassandra, leaderless replication, and consistent hashing.