Optimize is a perfomance optimization technique available in delta lake table. Whenever we perform any kind of DML operations in Delta table each and every time it generates a new records. Over a period of time it generates a huge number of small files and it is kind of overhead for the delta engine to effectively perform the execution of our query as it eventually increases our resource usage such as i/o read/write and the computaion. Optimize command helps us to combine these small files to a larger file which eventually improves the performance of the delta table. As after this optimize operation, delta table refers to this latest snapshot file in order to retrieve results whever we query our table. And we can delete the obselete small files and free up the space with the help of vaccum command.
Check this. create table oldest_youngest(person varchar(10),type varchar(20),age int); insert into oldest_youngest values ('A1','ADULT',54), ('A2','ADULT',53), ('A3','ADULT',52), ('A4','ADULT',58), ('A5','ADULT',54), ('C1','CHILD',20), ('C2','CHILD',19), ('C3','CHILD',22), ('C4','CHILD',15); WITH ranked_adult AS ( SELECT person as adult, ROW_NUMBER() OVER(ORDER BY age desc) as r_a FROM oldest_youngest where type = 'ADULT' ), ranked_child as ( SELECT person as child, ROW_NUMBER() OVER(ORDER BY age asc) as r_c FROM oldest_youngest where type = 'child' ) SELECT adult,child FROM ranked_adult a left join ranked_child c on a.r_a = c.r_c
create table oldest_youngest(person varchar(10),type varchar(20),age int); insert into oldest_youngest values ('A1','ADULT',54), ('A2','ADULT',53), ('A3','ADULT',52), ('A4','ADULT',58), ('A5','ADULT',54), ('C1','CHILD',20), ('C2','CHILD',19), ('C3','CHILD',22), ('C4','CHILD',15); WITH ranked_adult AS ( SELECT person as adult, ROW_NUMBER() OVER(ORDER BY age desc) as r_a FROM oldest_youngest where type = 'ADULT' ), ranked_child as ( SELECT person as child, ROW_NUMBER() OVER(ORDER BY age asc) as r_c FROM oldest_youngest where type = 'child' ) SELECT adult,child FROM ranked_adult a left join ranked_child c on a.r_a = c.r_c
Optimize is a perfomance optimization technique available in delta lake table. Whenever we perform any kind of DML operations in Delta table each and every time it generates a new records. Over a period of time it generates a huge number of small files and it is kind of overhead for the delta engine to effectively perform the execution of our query as it eventually increases our resource usage such as i/o read/write and the computaion. Optimize command helps us to combine these small files to a larger file which eventually improves the performance of the delta table. As after this optimize operation, delta table refers to this latest snapshot file in order to retrieve results whever we query our table. And we can delete the obselete small files and free up the space with the help of vaccum command.
Thank you Sumit sir for arranging such a insightful session..
Always happy to help the community!
Thank u sumit sir
Hello sir, i have interview for big data/Etl developer at amazon please guide me for that.
Sir, need one video to know how bussiness requirement is, and how data engg gets the bussiness requirement and working strategy
Noted. Will have a session around this aspect!
I would like to add to this. If I am understanding correctly, @user_j% is talking about questions like Design Yelp but from a Database perspective.
Is it the join is correct or we can go for left join
Less questions from Azure Synapse....
Tough sql question, 😭
Check this.
create table oldest_youngest(person varchar(10),type varchar(20),age int);
insert into oldest_youngest values
('A1','ADULT',54),
('A2','ADULT',53),
('A3','ADULT',52),
('A4','ADULT',58),
('A5','ADULT',54),
('C1','CHILD',20),
('C2','CHILD',19),
('C3','CHILD',22),
('C4','CHILD',15);
WITH ranked_adult AS
(
SELECT person as adult, ROW_NUMBER() OVER(ORDER BY age desc) as r_a FROM oldest_youngest where type = 'ADULT'
),
ranked_child as
(
SELECT person as child, ROW_NUMBER() OVER(ORDER BY age asc) as r_c FROM oldest_youngest where type = 'child'
)
SELECT adult,child FROM
ranked_adult a
left join
ranked_child c
on a.r_a = c.r_c
create table oldest_youngest(person varchar(10),type varchar(20),age int);
insert into oldest_youngest values
('A1','ADULT',54),
('A2','ADULT',53),
('A3','ADULT',52),
('A4','ADULT',58),
('A5','ADULT',54),
('C1','CHILD',20),
('C2','CHILD',19),
('C3','CHILD',22),
('C4','CHILD',15);
WITH ranked_adult AS
(
SELECT person as adult, ROW_NUMBER() OVER(ORDER BY age desc) as r_a FROM oldest_youngest where type = 'ADULT'
),
ranked_child as
(
SELECT person as child, ROW_NUMBER() OVER(ORDER BY age asc) as r_c FROM oldest_youngest where type = 'child'
)
SELECT adult,child FROM
ranked_adult a
left join
ranked_child c
on a.r_a = c.r_c