I Tried Sorting Pixels
- Published 25 Mar 2023
- Pixel sorting has been a niche data moshing technique in the generative art community for many years, but an efficient real time pixel sorting shader for games has never been made for a few reasons. Can I solve the puzzle and invent one?
Topics covered include: GPU sorting algorithms (the parallel bitonic merge sort), basic shader optimizations and advanced shader optimizations regarding memory use.
Download my GShade shader pack!
github.com/GarrettGunnell/Ace...
Support me on Patreon!
/ acerola_t
Socials:
Twitter: / acerola_t
Twitch: / acerola_t
Discord: / discord
Code: github.com/GarrettGunnell/Pix...
Music:
During The Test - Persona 3 OST
Afternoon Break - Persona 3 OST
This Mysterious Feeling - Persona 3 OST
Fearful Experience - Persona 3 OST
Just Like This - Persona 3 OST
Iwatodai Dorm - Persona 3 OST
Soft Oversight - Sonny Boy OST
Judgement - Sonny Boy OST
GO!GO!STYLE - Paradise Killer OST
Midori Eyes - Paradise Killer OST
Dead End Chaos Theatre - Earthbound OST
Vehicle Handling - Persona 2 Innocent Sin OST
Police Station - Persona OST
Thanks for watching!
This video is dedicated to my friend, Alotryx.
#acerola #graphics #gamedev #unity3d #graphics #shaders - Science & Technology
the 30 second parallel bitonic merge segment took me 15 hours to edit
anyways what was your favorite cat clip
the cat one definitely
I actually wanted to listen to your optimization explanation cuz optimizing is something I struggle with immensely but your cat clips were too distracting :(
the one with the cat
gotta be cat clip #2
ill watch it again if it makes you feel better
Graphics Programmers: My program renders an image for 3 days, but I was able to shave off 4.3 hours, I'm so good at it
Real-Time Graphics Programmers: I just wasted a quarter of a millisecond, it was reserved for pathfinding calculations, my manager is going to kill me
“My optimized bogosort only takes one billion years to finish sorting now, down from 1.1 billion, it’s so much faster”
@@Numbabu to be fair that is 100 million years less than before
@@Numbabu the optimization? remembering not to choose the same random numbers as last time
@@Numbabu bogo sort lowkey underrated, literally the fastest sorting algorithm if you're lucky enough
@@senzmaki4890 that's like saying "casting fireball on dnd is underrated, it creates a portal to the hell realm if you're lucky"
In art class, I opened my photos in notepad++ and wrote some stories. My teacher was shockingly confused how I'd intentionally altered a pic to make it "glitched." They're just numbers in an order, and I changed it.
niice
did you just straight up type into it?
Like Counter: 40 (in case yt hides it)
I am wondering what that looks like. I'll have to try that out at some point.
I did that too, way to go. You can really get a variety of effects depending on the type of file: bmp, png, jpeg, and probably a bunch more interesting lossy and lossless compression algorithms. jpg is really good if you want lossy compression artifacts, and for lossless compression you can introduce major glitches into a png file. You can also mess with a pure uncompressed bitmap if you want more precise glitch manipulation. Try them all out, and remember that there are different types of .bmp to choose from.
@@Cyberfishofant Yep, you can open an image in a text editor and just start typing. Or even manually write an image, but most formats are compressed.
I read the title of the video as “I tried snorting pixels” - so you’re welcome for your next video idea.
They make me feel like... So digital... Like fingers, yo!
I almost read the channel name as areola
I would like to point out that the effect would be great for gameplay if it were only applied to certain objects in the game, via a two-color buffer that shows which areas are allowed to be affected by the effect before the contrast map is applied to those areas, allowing the effect to be limited to certain portions of the image.
the uses would be awesome.
you could have a character's skin glitching out, but their clothes stay stable, you could have a sword that leaves behind a glitchy trail in the air, and so on and so forth.
A game mechanic that allows you to interact with "glitched" objects in a specific way would be pretty damn cool, it'd definitely be an eye-catching indicator.
considering it has to sort less pixels too, if the screen isnt covered in areas to pixel sort im sure it'd get to 2 ms or less
I'm not gonna retain any of this information, but I always feel like acerola got to learn so much in making each video! This was really interesting to watch!
just yesterday Andreas Kling in his 1000th video talked about consumption patterns and how he found it hard to retain information from watching tech videos himself. he mentioned trying taking notes about the videos and how that led him to being more selective in what to watch. not sure I'd ever have the discipline but found it interesting anyway.
as a 3D graphics student, this comment is every day of my life lmao
Me neither, especially because I was 100% focused on the cats
the cat cams were the best parts
I would love to see a game made with this filter. something like psychological or surreal horror. I think this filter paired with good sound design and a decent art style would make for a really mind bending and fun game
I think he needs to blur the final stage a bit, but interesting..
cyberpunk isnt having it?
cyberpunk kind of implemented it when the relic effects affected the player.
Not a horror game but Splatter uses effects like this very often
Cyberpunk is your game if you like existential horror
Came for the programming knowledge, stayed for the reminders that the monogatari series live forever in my heart
i have never and likely will never need to create graphics shaders, yet these videos are so endlessly entertaining and informative that i can't stop watching
i love learning about other programming specialties through vids like this
But because of this vid, you were there to witness how someone else was! And it was *so* cool
hello fellow void pfp
I mean, cat videos, right?
i am no longer void pfp
Babe, wake up. New Acerola video just dropped
Ok honey
@@lolcat69 😘😘😘
Just 5 more minutes ok
Babe, wake up, this comment trend started years ago
Im so fucking sick of seeing this comment trend
I thought the title said snorting pixels
No Comments?
@@RivertedYT-1 comment? (start a chain dont break)
@@zoranradakovic2199 chain dont break
new coke flavor (i broke the chain, watchu gonna do about it?)
THATS WHY I CLICKED ON THE VIDEO, LIKE WTF IS PIXELS
The editing is very specifically cultured and I now want more.
In animation, PixelSorters are a great way to create scene transitions with this glitch aesthetic! It's awesome to see how they work
What is "black scene"'s importance
@@kindauncool I don't think that it is anything of importance.
@@kindauncool not sure if you still care, but it's a reference to the monogatari anime series
I'm pretty new to shader programming and have no idea of 90% of the shader optimizations you were talking about, but just so I didn't feel dumb, I covered the cat videos with my hands in order to avoid getting distracted!
Also, instead of having groups that have already finished a thread wait for the others to finish, can't you divide the spans in 2s (two spans a thread) so you can roughly get them 2 spans done by one group at the same time as others? It may sound weird because I'm probably mixing up some of these terms
That would make the worst case take twice as long as it does now (bc you have no way to prevent a group of 2 spans from being 2 worst-case ones). Right now we have as many active groups as there are spans in the image. Reducing the number of groups is not going to speed the whole thing up bc GPUs can easily do thousands of things in parallel. On a CPU implementation, this would be a better approach, bc CPUs can't run that many threads in parallel, so there'd be a queue of spans still to be processed.
@@sephdebusser So it essentially positively doubles the best case's performance and negatively doubles the worst case's performance?
@@dotdotmod no, bc best case scenario of the original is still the time of one group processing a single small span. In your case, the best case is one group processing two small spans. Still double the time
The trouble is knowing you need to do that, because it's the CPU that starts threads, but the GPU that knows how many threads are needed.
It is doable, I'm pretty sure, but I think a better approach would be to try to figure out how to apply quicksort, O(n log n) is a lot better than O(n²). I think you could maybe pull it off by coloring the entire span in the thread mask with a span id instead of the start index having the length? Damn, now I want to try and figure this out.
@@SimonBuchanNz "Damn, now I want to try and figure this out." This video is such a nerd snipe, yeah.
I didn't understand 95% of the video.. but i feel smarter somehow.
I'm a video editor who loves glitchy effects and learning about how pixel sorting actually works has been very entertaining. I love how the post-processing workflow is so similar to doing VFX and colour correction too lol
how would you implement this as a sort of Adobe or AE plugin? is it possible? Please get back to me
There’s another one called PixSort as well
Actually one of the most beginner friendly descriptions of Compute Shaders i have ever heard.
You should do more with them. I had a hard time when learning Compute Shader Concepts the first time.
I think there IS an improvement on the shader code to be made: If your sortvalue buffer is of a known data type (e.g. uint8), I think you can use a radix sort - which should be a lot faster than your current alg.
it would still be single threaded radix sort which would be a yikes but I should try it yeah
@@Acerola_t radix sort is like O(3n) space and O(7n) time, we won't even need to transfer the buffer to group memory as each value in the buffer is accessed only once
Its only problem is that it's slower than the current sorting alg for small spans.
@@Sloimay why not both then. The control mask already has the span lengths!
Each group can decide its preferred algo (AFAIK doing this means that you should not even try to do more than one thread per group, GPUs are SIMT machines, right?)
@@IgnacioLosiggio imo probably not worth the effort. As long as the short spans don't take longer than the long spans, it doesn't help to optimize them.
There exist efficient `stable_sort_by_key` algorithms for the GPU.
The solution here would be to sort the original pixels using a key.
We can calculate it by first running a prefix sum over the mask, giving us 'spans' each filled with a unique index.
Multiply the prefix_sum by the maximum value that will be used in the sort (ensures a pixel can't 'escape' its span).
Then, calculate the sort key using (original value * mask + prefix_sum).
All the masked-out values will have the exact same key within a span. All the masked-in values will retain the same local delta within a span. Each span's maximum value is guaranteed to be less than the following span's minimum value.
Using a stable sort ensures that although masked out values have the same key value, their order doesn't change.
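A rough Python sketch of the keyed stable sort described in the comment above. It is only an illustration, not the shader from the video: `segmented_sort` and `max_value` are made-up names, the prefix sum is assumed to run over mask transitions (so every masked-in or masked-out run gets its own index), the stride is assumed to be `max_value + 1`, and Python's built-in stable sort stands in for a GPU `stable_sort_by_key`.

```python
from itertools import accumulate

def segmented_sort(values, mask, max_value=255):
    """Sort values inside masked-in runs; leave masked-out runs untouched."""
    # Assumption: flag every position where the mask changes, then prefix-sum
    # the flags so each contiguous run receives a unique, increasing index.
    flags = [0] + [int(mask[i] != mask[i - 1]) for i in range(1, len(mask))]
    run_index = list(accumulate(flags))

    # Scale the run index so no value can "escape" into a neighbouring run.
    stride = max_value + 1
    keys = [r * stride + v * m for r, v, m in zip(run_index, values, mask)]

    # sorted() is stable, so masked-out pixels (equal keys) keep their order.
    order = sorted(range(len(values)), key=keys.__getitem__)
    return [values[i] for i in order]

values = [9, 3, 7, 1, 8, 2, 5, 4]
mask   = [0, 0, 1, 1, 1, 0, 1, 1]
print(segmented_sort(values, mask))  # [9, 3, 1, 7, 8, 2, 4, 5]
```

The two masked-in runs come out sorted while the masked-out pixels stay where they were.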
7:12 you can't keep teasing me with that tiger
13:42
Mr. Rola, as in, short for acerola.
I legit fucking died here, omg.
As an engineer with over a decade of experience in a completely different part of the field (distributed systems), graphics and shader programming has always felt like magic to me. Very cool to see it broken down like this.
Oh boy time for another sorting rabbit hole.
Also having Nekopara as the associated visual for a "video game" just killed me every time.
If the parallel bitonic merge sort was fast enough you can also use it in the span sorting case: Do a first pass to assign to each pixel the index of its corresponding span, and then sort the entire column lexicographically first by span index. Lexicographical sorting can be accomplished by putting span index into some bits higher than the highest data bits of the sort keys.
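To make the "span index in the high bits" idea concrete, here is a small Python sketch. It simulates the bitonic sorting network sequentially, so it says nothing about GPU performance. Assumptions for the example: sort values fit in 8 bits, every pixel in the column already has a non-decreasing segment index, the column length is a power of two, and the names `bitonic_sort` and `sort_column_by_segment` are hypothetical.

```python
def bitonic_sort(keys):
    """Sort a power-of-two-length list with the bitonic sorting network."""
    a = list(keys)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:              # distance between the compared elements
            for i in range(n):     # on a GPU, each i would be its own thread
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

def sort_column_by_segment(values, segment_ids):
    """Pack the segment index above the 8 value bits, sort, then unpack."""
    packed = [(seg << 8) | val for seg, val in zip(segment_ids, values)]
    return [key & 0xFF for key in bitonic_sort(packed)]

values      = [5, 2, 9, 1, 7, 3, 8, 0]
segment_ids = [0, 0, 0, 1, 1, 1, 1, 2]
print(sort_column_by_segment(values, segment_ids))  # [2, 5, 9, 1, 3, 7, 8, 0]
```

Because the segment index dominates the comparison, the whole column can go through one sort without pixels crossing segment boundaries. A real version would also need to carry the pixel colour along with each key.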
17:09 In the end, the real time pixel sorter is the friends we made along the way.
1:40 I wanted history tho ;-;
Same
You would get an effect that looks very similar if you simply set a hard number of samples per span. Because they're sorted they tend to look like gradients, so even a simple min and max of a span would look very similar.
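A quick sketch of the min/max shortcut suggested above, in Python: instead of sorting a span, fill it with a linear ramp from its smallest to its largest value. `gradient_span` is a hypothetical helper, and it assumes a non-empty span of scalar sort values.

```python
def gradient_span(values):
    """Approximate a sorted span with a linear ramp from min to max."""
    lo, hi = min(values), max(values)
    n = len(values)
    if n == 1:
        return [lo]
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

print(gradient_span([40, 0, 20, 10, 30]))  # [0.0, 10.0, 20.0, 30.0, 40.0]
print(gradient_span([0, 100, 90]))         # [0.0, 50.0, 100.0]
```

The first span happens to match a real sort exactly; the second shows where the approximation drifts (a true sort would give 0, 90, 100).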
you may be able to speed up the algorithm by using the parallel prefix sums algorithm to calculate the span of each row. If you have enough compute cores, it can drop the time for creating this mask from O(n) to O(log n). Also, once you build the span mask using prefix sums, there are some handy parallel sorting algorithms that should let you divide up the sorting of each sub section without having to ask the CPU for guidance. If you want I can see if I can find my pseudocode for this (I'm pretty sure we solved basically this problem in my parallel algorithms class), but I haven't worked much with shaders, so you're on your own translating it
heya, this is crazy interesting.
do you have the source?
@ian I think cs.wmich.edu/gupta/teaching/cs5260/5260Sp15web/lectureNotes/thm14%20-%20parallel%20prefix%20from%20Ottman.pdf covers prefix sums well and maybe www.dcc.fc.up.pt/~ricroc/aulas/1516/cp/apontamentos/slides_sorting.pdf for sorting?
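For anyone following this thread, one common parallel prefix sum is the Hillis-Steele scan. The Python below simulates it round by round: within a round every element could be updated by its own thread, and there are about log2(n) rounds. This is a generic textbook sketch, not the pseudocode the commenter mentions, and `hillis_steele_scan` is a name chosen for the example.

```python
def hillis_steele_scan(xs):
    """Inclusive prefix sum in ceil(log2(n)) data-parallel rounds."""
    out = list(xs)
    step = 1
    while step < len(out):
        prev = out
        # Each element adds the value `step` places to its left, if any.
        # All reads come from `prev`, so the updates are independent.
        out = [prev[i] + prev[i - step] if i >= step else prev[i]
               for i in range(len(prev))]
        step *= 2
    return out

span_starts = [1, 0, 0, 1, 0, 1, 1, 0]   # 1 marks the first pixel of a span
print(hillis_steele_scan(span_starts))    # [1, 1, 1, 2, 2, 3, 4, 4]
```

Scanning the span-start flags gives every pixel the index of the span it belongs to, which is the kind of per-pixel labelling several comments here rely on.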
Hey! I like the fact that you spend time explaining us the history behind it all!!!
I agree. I also thought it was interesting that a woman was an early pioneer of sorting algorithms. The world was a lot more sexist in the 50s, so it's pretty impressive that she made the contributions that she did.
@@strongmungus Except that it really isn't that surprising when you know women ended up working in the early programming profession a lot, as back then it was kind of an extension of a data entry job or similar. Then it transitioned into an engineering job, which in turn was more popular among men.
13:10 where tf did my cats go >:(
FINALLY THERE'S A VIDEO ON THIS!!!!!! LET'S GOOOOOOOO!!!!!!!!!
this effect is such a banger. they should implement it into every competitive game ever made.
14:43 waittt i was literally thinking "hey this looks like a serial experiments lain edit" this ENTIRE time
10:23 no way💀 bro really pulled the subway surfers and family guy combo on us
Thank you for providing a full, in-depth explanation of how everything works. Graphics programming is a niche field and basic theory is hard to come by when searching the internet.
I think they used an effect like this in cyberpunk. It would actually be useful for small segments of play in a sci-fi game where the matrix is glitching.
Your videos are amazing. You’re good enough at coding to talk about it in an easy to understand way, you structure your videos in an easy to digest and entertaining way, and your editing complements it all perfectly.
Thank you! This was so informative and easy to understand. Can’t wait to see what else you have planned.
they did sorting on a pixel
indeed
you're so professional for displaying the other videos for each technique you applied toward the end of the video
I read the title as “I tried snorting pixels” and thought this was going to be a trip report on some new compound
You won me at the moon album cover. Great video and amazing explanation. I like a lot the way you explain aspects in an academic manner while still keeping it funny and entertaining. Great video and great skills!
Idea for making it possible to parallelize the pixel sort algorithm (keep in mind I have no idea what I'm talking about):
Instead of generating a texture with the start of each span encoded by the position of pixels, generate a texture where the value of each pixel in a column represents what segment of the column that pixel is in. Say there are two spans in a particular column, then the values of pixels in the texture from top to bottom would be a block of pixels with value 1, then for the extent of the first span, pixels would have a value of 2, then a value of 3 between the spans, 4 within the bounds of the second span, and 5 to the end. Also, if the first pixel is part of a span, then the first block of pixels should be even, so it can start at 0, to maintain the relationship that even numbers represent pixels within a span, and odd numbers represent pixels outside a span.
Then during the sort phase, instead of using a thread for every pixel, use a group of n/2 threads for every column of pixels, where n is the height of the columns. Then sort the entire column using the parallel bitonic merge sort algorithm, except make sure to first multiply the pixel's sort value by 255 times the value from the corresponding texture location, or use a second comparison between the two texture values. Either way, the sorting algorithm will sort the entire column, and the increasing index will prevent mixing of spans and inter-span regions by making each region's values all greater than the previous region's values, and all smaller than the next region's values, or else achieve the same by some other logic. Then, either within the sort logic, or on a separate pass, take the original (unsorted, or just don't swap in the sort algorithm if both tex values are odd) value for pixels with an odd numbered texture value.
So, in the end, the CPU can dispatch the same number of threads to every column of the image regardless of the number of spans in each column, and you can do the parallel sort algorithm instead of single threading them.
Would love to see Acerola try this. I saw someone else also suggest this same idea after reading all 300-something comments as of now.
For someone who has no idea what they're talking about, this is pretty spot-on to what I was going to suggest as a graphics programmer.
You would probably keep the ranges that don't want sorting by simply modifying the swap logic, so that either even or odd spans (depending on code design and/or artistic choice) evaluate as their pixel position instead of value (easiest way to preserve position). This would be preferable to a second access since you're already loading the texture and writing anyway, so copying the original values in would be less efficient by adding more accesses.
is parallel bitonic an in-place sort algorithm? If so, i'd not worry about spans per-se but just play around with functions you AND with the comparison logic to see what happens. Maybe only swap two values if their luminance has the same first 3 bits or if their original positions are within n pixels of each other.
I think they may have used pixel sorting shaders to do some of the effects in Cyberpunk 2077. The effect at 16:15 looks really similar to the effect when you're on the Net or while viewing the edges of braindances
Looks like the tarot too
I like how simple yet in depth the explanations were. I'm at university wanting to do this and thankfully this was easy enough for me to understand but not so simple to know without any context. Thanks!
genius, i used to work a lot with pixel sorting processing scripts back in the day but this is kind of a game changer, to run it on the GPU in realtime is quite a challenge you overcame, love the content and thanks for sharing the code!
Never did graphics programming, but what about the following solution?
Calculate for each pixel whether it is inside a span and if so, what span number.
Then use the bitonic sort with the following comparator:
If pixels are both in a span and have the same span number then compare them by value. Otherwise, compare them by index.
That was my thought as well! We could even combine the process of computing the simplified sorting value into span marking to avoid excess data writing.
@@donaldhobson8873 right
I haven't seen the video yet but can't you just get the RGB average of every pixel and then just sort that
Would love to see a follow up with this approach. Someone else suggested this in another comment too after reading through all 300-something
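The comparator proposed at the top of this thread can be tried out in a few lines of Python. This is only a sketch: each pixel is modelled as a hypothetical `(original_index, span_id, value)` tuple with `span_id = None` outside spans, and Python's built-in sort stands in for the bitonic network, so it shows that the comparator gives the intended ordering, not how it would perform on a GPU.

```python
from functools import cmp_to_key

def span_compare(a, b):
    """Same span: order by value. Otherwise: order by original index."""
    index_a, span_a, value_a = a
    index_b, span_b, value_b = b
    if span_a is not None and span_a == span_b:
        return (value_a > value_b) - (value_a < value_b)
    return (index_a > index_b) - (index_a < index_b)

def sort_with_spans(values, span_ids):
    pixels = [(i, s, v) for i, (s, v) in enumerate(zip(span_ids, values))]
    ordered = sorted(pixels, key=cmp_to_key(span_compare))
    return [value for _, _, value in ordered]

values   = [9, 3, 7, 1, 8, 2, 5, 4]
span_ids = [None, None, 0, 0, 0, None, 1, 1]
print(sort_with_spans(values, span_ids))  # [9, 3, 1, 7, 8, 2, 4, 5]
```

Since spans are contiguous, comparing by original index across spans and by value inside a span gives a consistent ordering, which is what a sorting network needs. Note that the original index has to travel with each pixel as it gets swapped around.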
Incorporating this into a game itself instead of as a filter would be interesting. Imagine if you're playing as a character who's had a memory-altering implant for something like schizophrenia, and slowly start to see phantoms appear from a malfunction of the memory device.
The character could slowly begin to realize the device was hiding the true reality all along, and fight to discern what the world is really like beyond the deep-fried digital veil. There could be quests exploring previously invisible areas, and a neverending struggle to balance the mental illness and the glitchy device.
a scanner darkly?
13:36 I was already enjoying the video, and then the MGMT reference made it even better
The effect on still images was super cool, nice work!
Love your videos! One of the few channels I have notifications for, consistently interesting and entertaining. ❤
I really dig this aesthetic that you created from this. It's a really awesome vibe. The amount of creativity that this could unlock is huge, especially with Analog Horror and other avant-garde art styles.
I'm sorry, but this kind of algorithmic manipulation of discrete image pixels is exactly the opposite of analog horror. It literally could not be more digital. Like I know this is a nitpick, but not everything glitchy is analog horror!
Awesome explanation and visualization of bitonic merge sort!
you're always talking about so many topics that i never would have thought of, i used some of those ideas for my own projects
Great video. All it needs is a quick historical recap of all sorting algorithms
I absolutely love the video and the idea. The one thing I have to ask for the next one is a short explanation at the start with example images. I wasn't sure what you were going for with the pixel sorting, since I thought you would do so over the entire image, thus rendering it entirely unrecognizable.
It wasn't until 4:42 that I finally understood what the goal was.
Honestly blows my mind seeing how effects like these work behind the scenes. Loved playing around in After Effects using datamosh 2 and ae pixel sorter 2, but to see a pixel sorter in real time in a game engine while also being open source is mind blowing.
Recently I've been trying to create a little suite of shaders in unity covering the majority of effects that I use in After Effects, the main ones being datamoshing, pixel sorting and ae's colorama effect (like a custom heat map effect, annoying me atm 💀). This pixel sorting video and the rest of your series of creating these custom shaders for FF14 has given me a hell of a lot of motivation and i cant thank you enough for that. 🙏
I think it would be really interesting to separate the chroma and luminance channels, and sort only one of them before recombining.
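A small Python sketch of what that could look like for a single span, using the standard library's `colorsys` YIQ conversion as a stand-in for whatever luma/chroma split a shader would use. The function name `sort_luma_only` and the choice of YIQ are assumptions for the example; pixels are RGB floats in the range 0 to 1.

```python
import colorsys

def sort_luma_only(rgb_pixels):
    """Sort the luminance along a span while each pixel keeps its own chroma."""
    yiq = [colorsys.rgb_to_yiq(*pixel) for pixel in rgb_pixels]
    sorted_luma = sorted(y for y, _, _ in yiq)
    # Recombine: luminance comes from the sorted list, chroma stays in place.
    recombined = [(y, i, q) for y, (_, i, q) in zip(sorted_luma, yiq)]
    # yiq_to_rgb clamps each channel to [0, 1], so out-of-gamut mixes are clipped.
    return [colorsys.yiq_to_rgb(*pixel) for pixel in recombined]

span = [(0.9, 0.1, 0.1), (0.1, 0.1, 0.6), (0.2, 0.8, 0.2)]
for pixel in sort_luma_only(span):
    print(tuple(round(channel, 3) for channel in pixel))
```

Sorting the chroma and keeping the luminance would be the same code with the roles swapped.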
I've seen similar looking effects in some glitch art communities, both in image and video, but never in a live rendering shader! Very cool video!
I was so damn happy watching this, I never thought I'd see an in depth exploration of my niche interest like this! I do Glitch Photography and I'm so passionate; I almost cried when you mentioned Kim Asendorf!
i love the editing and the way you explained this topic...
Your content is incredible - subbed!
As much as I love your cat, it's getting in the way of all the interesting graphics programming optimization visuals.
I crack up completely from these videos. Love how you manage to keep it entertaining while still being so informative.
Considering you set a cap on the span length anyway, would it make sense to make an indirect dispatch with one threadgroup per span to bitonic sort them instead of just running one thread per span?
In the absolute worst case yes, since you know the whole image is being sorted, but you could just do a full bitonic merge sort if you want to sort the whole image, whereas I wanted to sort only the spans in the mask, which the cpu can't optimally dispatch groups for.
I may have missed the point, but my idea wasn’t that you would dispatch the groups from the cpu. You would do an indirect dispatch based on what you get from the pass that generates the span-mask you had. Instead of writing that mask, you could count the number of spans (using some atomic counter) and store that in an indirect-buffer. For each unique span you also store the start pixel and the number of pixels that span covers. Then you can just dispatch the number of groups (one per span) indirectly using the indirect buffer. So the cpu does not need to know anything about spans, and that data is kept on the GPU :)
Are there reasons that would not work?
I was looking for the comments if anyone had mentioned this. IndirectDispatch would work, you can even bucket a few fixed span lengths and dispatch shader variations. 1 threadgroup per span so you could do the bitonic sort in parallel. The problem is not dissimilar to tiled classification deferred shading.
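The bookkeeping this thread describes (a list of spans plus a span count that drives the dispatch) can be sketched on the CPU in Python. On a GPU the list would presumably be appended to with an atomic counter and the count written to an indirect-arguments buffer; here a plain loop stands in for that pass, and `collect_spans` is a hypothetical name.

```python
def collect_spans(mask):
    """Return (start, length) for every run of masked-in pixels in a column."""
    spans = []
    start = None
    for i, inside in enumerate(mask):
        if inside and start is None:
            start = i                          # a new span begins here
        elif not inside and start is not None:
            spans.append((start, i - start))   # the span ended at i - 1
            start = None
    if start is not None:                      # span runs to the end of the column
        spans.append((start, len(mask) - start))
    return spans

mask = [0, 1, 1, 0, 1, 1, 1, 0]
spans = collect_spans(mask)
print(spans)        # [(1, 2), (4, 3)]
print(len(spans))   # 2 -> number of thread groups to dispatch, one per span
```

Each group would then read its own `(start, length)` entry and sort just that range.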
First video I've seen from this channel, but lots of fun :)
Very glad I discovered this channel a week before starting my GPU programming class
i love the monogatari style editing you have for some scenes
When I saw some of the performance near the beginning, I thought there was no way that it would be even partially salvageable. That was some super neat optimization!
That effect looks amazing! I can already think of usecases for it!
I haven't thought of sorting in that way.. great video!
this is sick cuh, keep da good work up
12:06 sus
Did... Did you make a Monogatari reference throughout the video with the different colored scenes and chapters?
next you'll wonder where my name comes from!
@@Acerola_t ohhh. Didn't even think of that. Lol
I love your editing style
I do QA for a rendering software and this is genuinely so much more interesting than I imagined. It’s fascinating to take a look under the hood even if it’s not something the devs I work with would ever do.
0:29 ... bro what's up with that tony the tiger cropped yiff?
howd you know what it was? 🤨🤨🤨🤨🤨🤨🤨
9:27 what if i wanna watch both the explanation and the cat!! they're BOTH cool!!
10:23 OH MY GOD NOOOO
@@vintage08 too many cats!
11:35 T O O M A N Y
I was only able to pay attention thanks to the cat clips. Truly the innovation in tutorials and teaching material we needed. Please keep using them.
I'd actually love to see someone play a game with this shader. Amazing video!
I think you could use simple quicksort or mergesort (no thread division) to make the sorting of the spans significantly faster, maybe not 2ms fast but way faster
The sort "keys" are small one- or two-byte integers, so you could also try a radix sort. That's O(N), whereas the quicksort or mergesort algorithms are O(N log N).
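For one-byte keys, the simplest member of the radix family is a single-pass counting sort, sketched below in Python. It is a sequential illustration of the O(N) idea, not a GPU implementation; `counting_sort_by_key`, `num_keys`, and the average-of-RGB key are assumptions for the example.

```python
def counting_sort_by_key(pixels, key, num_keys=256):
    """Stable O(N + num_keys) sort of pixels by a small integer key."""
    counts = [0] * num_keys
    for pixel in pixels:
        counts[key(pixel)] += 1

    # Exclusive prefix sum: offsets[k] is where the first pixel with key k goes.
    offsets = [0] * num_keys
    total = 0
    for k in range(num_keys):
        offsets[k] = total
        total += counts[k]

    out = [None] * len(pixels)
    for pixel in pixels:            # input order is preserved within equal keys
        k = key(pixel)
        out[offsets[k]] = pixel
        offsets[k] += 1
    return out

def brightness(pixel):
    r, g, b = pixel
    return (r + g + b) // 3         # 0..255 for 8-bit channels

pixels = [(200, 10, 10), (5, 5, 5), (90, 200, 30), (5, 6, 4)]
print(counting_sort_by_key(pixels, brightness))
# [(5, 5, 5), (5, 6, 4), (200, 10, 10), (90, 200, 30)]
```

Two-byte keys would need either a 65536-entry table or two 8-bit passes, which is where a full radix sort comes in.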
Acerola, this channel is so damn cool. I feel like you're pioneering the "white paper" of the modern age.
I don’t know about pixel sorting but subbed for the Memento reference. Great movie. Tbh been using some pixel sorting and displacement map in node video and liking some of the results. Gonna check if u have a vid on datamoshing, been messing with that too.
here at exactly 100k, congrats :D
I think a sorting algorithm for image data could be an interesting way to compress its size, even if it's not as efficient as other methods. Go through the image, list the colors as 24bit values, and use the numbers they correspond to and how many of each are along the lines.
I'm pretty sure PNG already does something similar with its lossless compression.
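Storing each colour together with how many times it repeats along a line is run-length encoding. A minimal Python sketch, assuming colours are already packed as 24-bit integers and using hypothetical helper names:

```python
from itertools import groupby

def run_length_encode(colors):
    """Collapse consecutive equal colours into (color, count) pairs."""
    return [(color, len(list(run))) for color, run in groupby(colors)]

def run_length_decode(pairs):
    """Expand (color, count) pairs back into the original sequence."""
    return [color for color, count in pairs for _ in range(count)]

row = [0xFF0000, 0xFF0000, 0xFF0000, 0x00FF00, 0x0000FF, 0x0000FF]
encoded = run_length_encode(row)
print([(hex(color), count) for color, count in encoded])
# [('0xff0000', 3), ('0xff00', 1), ('0xff', 2)]
print(run_length_decode(encoded) == row)  # True
```

Sorting a row first would make the runs longer, but the sort throws away where each pixel came from, so the original image could only be rebuilt if the positions were stored as well.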
0:06 man of culture I see
First video ive seen from this channel and I have to say I love the almost exclusively Persona bgm
This video is both hilarious and extremely informative. Definitely subscribing
The furry Frosted Flakes tiger drawing made me want to sever my ocular nerves
11:37 he he he cat-egorised
What if you use the mask's X (or Y) coordinate as a high significance value added to the value you're sorting. You'd use the start of the span as the number, so on black areas it would keep counting up but on white areas it would stall and create a span that all has the same high sig value added to it. Then you can use the Parallel Bitonic Merge Sort again.
This shifting perspective animation made my day :D new subscriber :) greetings from Poland :D
thanks for the added tony. it's much appreciated.
That tony the tiger picture is 100% certified cursed.
Your videos really helped my five year old learn about real-time image rendering, it's way better than the cocomelon version. she always asks for "the long hair man" (: thank you so much
my true target audience
The first time I heard and saw pixel sorting was on a community post from Max0r and it just so happens the beginning has a song he used in one of his videos, so this gives me sort of flashback feeling to it.
Also damn interesting and well explained. Good job
For cel shading, we typically take the color's projected shadow line and instead of being a linear gradient we make it a unit ramp. this is basically our contrast map, and if we add more steps then we have our staircase, where each step has its own layer of contiguous data...
“Snorting Pixels”
I watched this video and I will never be the same ! Wow !
I forgot about pixel sort! Great video, you just got yourself a fan.
first time on the channel, love the jokes and the content and good information
10:44 What toy is that? (in the bottom left. the rod or whatever sticking out of the blue ball rolly toy) i think my cats would like it 🥺🙌
lol I just stuck one of the rod toys into the hole that was in the blue ball toy, so it's not an official thing you can buy.
10:55 help, I'm trying to pay attention but the cat cams are distracting me
I actually used this shader before finding this video. I had no idea what it was supposed to be for.
Now, after watching this video, I have even less of an idea about what it's supposed to be for.
Great job!
I don't have any experience with shader coding, and I don't know if anyone else has presented this same idea (there are a lot of comments), but I had an idea for a way to use the bitonic sort with the spans in place.
The process would start with creating a luminance (or any other sorting metric) buffer image, with pixel values being singular floats. Then, one thread per row/column would be dispatched to comb through the rows/columns of the luminance buffer in reverse (so if higher luminance values are getting sorted to the top, the program works from bottom-to-top here). Pixels with luminance values outside our threshold get set to some special value (say, -1). Every time we reach the end of a span, a counter (which started at 0) increases by 1. The current value of this counter is then added to pixel values we come across that fall within the threshold. This ensures that the bitonic sort doesn't swap pixels from different spans, as pixels from higher spans will definitely have higher values than pixels from lower spans. The comparison check for the bitonic sort will also check if either pixel value is the special value (-1), and ignores the pair if so (I don't know if float comparisons with NaNs work the same on the GPU as on the CPU, where comparisons involving them always return false, but if they do, picking NaN as the special value and skipping the manual check for it might help). The bitonic sort will then be run on the entire image (both the original and the luminance buffer), using the values in the luminance buffer for its comparisons.
There's probably ways to optimize this that I can't think of, and I haven't tested it myself, but I hope me writing this out here could be of some use.
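On the NaN question in the comment above: on the CPU side, IEEE 754 ordered comparisons involving NaN are false, which the Python snippet below shows. Whether a given shader compiler and GPU keep that behaviour (for example under fast-math style optimizations) is not something this snippet can confirm, so treat the idea of relying on NaN in a shader as untested. `should_swap` is a hypothetical stand-in for the compare step of a sorting network.

```python
import math

def should_swap(a, b):
    """Ascending compare-and-swap test, as a sorting network might use it."""
    return a > b

nan_results = [
    should_swap(math.nan, 0.5),   # NaN > 0.5
    should_swap(0.5, math.nan),   # 0.5 > NaN
    math.nan == math.nan,         # NaN is not even equal to itself
]
print(nan_results)            # [False, False, False]
print(should_swap(0.7, 0.5))  # True: ordinary values still swap
```

With this behaviour a NaN-marked pixel never triggers a swap in either direction, so it stays where it is without an explicit check.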