Reverse engineering a drone's IP camera stream

I've bought a cheap drone that has two cameras in it. The drone's camera connects to an app called WIFI UAV. The drone creates its own WiFi AP, and whenever anyone connects to it and sends a specific header, the drone sends a video stream over UDP port 8800. I've written a simple Python script to get the data from the drone.

So, the drone sends data packets that are exactly 1080 bytes long. I've also decompiled the WIFI UAV APK, and from it I found that the drone sends an H.264 stream.
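
For anyone who wants to reproduce this, a minimal receiver sketch could look like the following. The drone IP, the trigger header bytes, and the assumption that the same port is used on both sides are placeholders/guesses, not the actual values:

```python
import socket

# Placeholder values -- the real trigger header and drone IP are not shown here.
DRONE_IP = "192.168.4.1"       # assumption: a typical default AP address
VIDEO_PORT = 8800              # the drone streams video over this UDP port
TRIGGER_HEADER = b"\x00" * 8   # placeholder: replace with the real header bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", VIDEO_PORT))    # assumption: same port on the receiving side
sock.settimeout(5.0)

# Ask the drone to start streaming by sending the header once.
sock.sendto(TRIGGER_HEADER, (DRONE_IP, VIDEO_PORT))

# Dump whatever comes back (packets are 1080 bytes or shorter).
with open("capture.bin", "wb") as f:
    while True:
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            break
        f.write(data)
```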

But after analyzing the packets I got from my script and from Wireshark, I can't decode the data as a file. So, here is a single 1080-byte packet from the stream the drone sends.

http://s000.tinyupload.com/index.php?file_id=04212327903416803839

PS: I have very little knowledge about the different video codecs.

Hi,

I am also working on reverse engineering the stream. I believe the code snippet in the picture is from the lxHwEncoder.java class: WiFi UAV_v2021.01.05.apk/lxHwEncoder.java - Decompiler.com. Could I also ask you to share your Python script?

Thanks a lot!

Hello!
Have you found anything else? I stopped working on the project and I don't have the script right now.

Have you found out how to decode it? I'm approaching the same problem.

No. Have you found anything interesting?

Other than controlling the drone via keyboard, not much. I’ve been stuck on decoding the video for weeks haha.

Seems like the app is using FFmpeg, AVPlayer, EglView, OpenCV, a CNN, etc. I don't know if that helps.

I've read many articles about "hacking" a bunch of drones, and their way of accessing the video stream is so simple. I don't know why it's hard in this case haha. I thought the drone was encapsulating the stream using something like RTP, but after testing, packets sent using RTP don't match what the drone sends. Now I'm stuck. Still researching sending video over UDP though; I'll reply when I've found the solution.

Wow, great! I would like to learn more about how you are controlling the drone via keyboard. It's been almost 2 years since I stopped working on this.

So basically the WIFI UAV app sends 124 bytes of data. The 13th, 14th, 89th, 90th, 109th, and 110th bytes are time counters, where bytes 13 and 14 start with 00, bytes 89 and 90 start with 01, and bytes 109 and 110 start with 02.

For controlling, the 21st byte is left/right, from 00 to FF, where the middle is 80 (128 decimal).
The 22nd byte is forward/backward, where 00 is backward and FF is forward.
The 23rd byte is throttle, from 00 to FF.
The 24th byte is yaw, from 00 to FF.
The 25th byte is a command byte: 01 = takeoff, 02 = emergency stop/killswitch, 03 = landing, 04 = calibrate gyro.
The 26th byte is also a command byte, used to toggle between headless mode and non-headless mode.

There must be more commands, but I haven't found them yet; a rough sketch of packing these fields is below.
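
This is only a sketch under assumptions: every byte I haven't mapped is left at 0x00, and the way I pair the counter value with its 00/01/02 prefix is my guess at how those fields relate.

```python
# Rough sketch of assembling the 124-byte control packet described above.
# Byte positions in the description are 1-based, so index = position - 1.
# Unmapped bytes are left at 0x00 (an assumption, not confirmed).

def build_control_packet(roll=0x80, pitch=0x80, throttle=0x80, yaw=0x80,
                         command=0x00, headless=0x00, counter=0):
    pkt = bytearray(124)

    # Time counter fields: bytes 13/14, 89/90 and 109/110 (1-based),
    # prefixed with 00, 01 and 02 respectively.
    pkt[12] = 0x00; pkt[13] = counter & 0xFF
    pkt[88] = 0x01; pkt[89] = counter & 0xFF
    pkt[108] = 0x02; pkt[109] = counter & 0xFF

    pkt[20] = roll       # 21st byte: left/right, 0x00..0xFF, centre 0x80
    pkt[21] = pitch      # 22nd byte: backward (0x00) .. forward (0xFF)
    pkt[22] = throttle   # 23rd byte: throttle
    pkt[23] = yaw        # 24th byte: yaw
    pkt[24] = command    # 25th byte: 01 takeoff, 02 emergency stop, 03 land, 04 calibrate gyro
    pkt[25] = headless   # 26th byte: headless mode toggle
    return bytes(pkt)

# Example: a "takeoff" packet with sticks centred.
packet = build_control_packet(command=0x01, counter=0)
```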

Hi,
Is there any update regarding the video stream from the WIFI UAV APK? I am also trying to find a way to decode the video stream and process it further.

Unfortunately I didn't find anything. I'm resorting to making my own IP cam using an ESP32-CAM. If you want to know about the control stuff, you can visit my documentation here. Or let me know if you find anything interesting.

Hi! I found a solution to this problem. It's actually a JPEG bitstream. Technically, you need to provide your own headers, quantization tables, and Huffman tables, but it's a 4:4:4 YCbCr data stream. In Wireshark you'll see a group of 1080-byte packets (they carry counters, with 00 marking the start of a streamed frame) followed by one packet shorter than 1080 bytes (the final chunk of the frame). The image data starts at offset 0x62 in each packet, and all you have to do is clump the payloads together, attach the right JPEG headers to the top, and terminate the stream with the JPEG end-of-image marker FF D9. Voila! Your system will recognize the JPEG stream correctly!
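
Put as a minimal sketch (assuming you already have the frame's packets in order; the header blob is a placeholder, since the actual tables aren't included here):

```python
# Sketch of reassembling one frame from captured UDP payloads.
# `packets` is a list of raw packet payloads (bytes) for a single frame, in order.
# JPEG_HEADER is a placeholder: it must contain SOI, the quantization tables,
# the Huffman tables, SOF0 and SOS for the camera's resolution (not shown here).

PAYLOAD_OFFSET = 0x62          # image data starts at this offset in each packet
JPEG_EOI = b"\xff\xd9"         # JPEG end-of-image marker

def frame_to_jpeg(packets, jpeg_header):
    scan = b"".join(p[PAYLOAD_OFFSET:] for p in packets)
    return jpeg_header + scan + JPEG_EOI

# Usage (assuming `packets` and `JPEG_HEADER` exist):
# with open("frame.jpg", "wb") as f:
#     f.write(frame_to_jpeg(packets, JPEG_HEADER))
```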

Check it out by running this example on GitHub. The .json files are example packets I intercepted with Wireshark from the drone. There's both a still-image example and a video example.
(GitHub - JadanPoll/DroneUAVHack: Decoding the stream of a UAV Drone)

Hello, I am a university student studying software engineering. I am using the same drone and application, and I have started a project on this topic. I would like to learn more and get in touch with you via email or social media. First of all, I am curious about the type of encoding because, in its current form, I cannot see the JPEG file's start and end markers after decoding it. I would appreciate it if we could get in touch and you could help me out.

Hello, after seeing all these comments I think I should restart this project. If you want to get in touch with me, you can find my GitHub and Twitter under the same username.

Hi Emirhan, thanks for reaching out. Wherever you're studying from, I'm glad to hear you've taken an interest in this project. It was a delightful haunt for me for days when I came across it. How much do you know about the JPEG specification and how JPEG files work? Sending the headers, quantization tables, and Huffman tables is highly redundant: the data need only contain the SOS (start of scan, i.e. the actual image data) if the source and the destination are both already fully aware of the nature and format of the data being received. So you won't find the typical FF D8 and FF D9 markers in it, but that is not necessary to decode JPEG.
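
If you want to check that on your own capture, a quick heuristic is to scan the payload for the usual markers; a small sketch (the filename is just an example):

```python
# Quick heuristic check for JPEG markers in a captured payload.
# The filename below is just an example; point it at your own dump.

MARKERS = {
    b"\xff\xd8": "SOI (start of image)",
    b"\xff\xd9": "EOI (end of image)",
    b"\xff\xdb": "DQT (quantization table)",
    b"\xff\xc4": "DHT (Huffman table)",
    b"\xff\xc0": "SOF0 (baseline frame header)",
    b"\xff\xda": "SOS (start of scan)",
}

with open("capture.bin", "rb") as f:
    data = f.read()

for marker, name in MARKERS.items():
    pos = data.find(marker)
    if pos == -1:
        print(f"{name}: not found")
    else:
        print(f"{name}: found at offset {pos}")
```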

I think you should start over again

Hi Zeroday00, I can't say I'm very advanced at the moment; I'm still in the learning phase. I want to learn and be able to achieve things alongside my project. I captured packets from my drone using Kali Linux and a Wi-Fi card, resulting in .cap, .csv, and .netxml files. The goal of my project is to manipulate the drone while it's flying and capture images. I opened the .cap file with Wireshark, filtered for UDP, and converted the data to .json. The .json files provided with the code, such as 0x30c4, displayed videos or photos without issues. However, my own .json file showed lines in the video when it was played. Additionally, I want to learn how to identify the type of encoding used in the UDP hexadecimal data fragments within the .json file and decode it so that I can see the JPEG start (FFD8) and end (FFD9) markers directly. Understanding this will help me better comprehend the content of the communication and achieve my goal.

This is a great project to learn from, and you are off to a good start. In case anyone finds it helpful, I'll share a bit about my learning process. One way reverse-engineering enthusiasts decode commercial NES and GBA games is by writing bits into memory locations and observing the results. That is easier than simply staring at the image stream data, given the sheer complexity and entropy of states the stream bits could be in (it was very, very fun). Initially, while attempting to decode, I applied the same approach to single complete frames of the byte stream and observed how the image changed in the WiFi UAV app. Then I collected my observations (a rough sketch of the flip-and-observe loop follows the list):
1. Changing certain bits alters brightness or darkness (sometimes completely dark, other times completely white).
2. Changing other bits alters the intensity of the green or pink component (although that is a lot rarer, few and far between).
3. Changing other bits seems to delete a 'pixel' and cause a tearing shift in the subsequently rendered parts of the image.
4. If the 'pixel' didn't get deleted, the 'pixel' itself appears distorted.
5. These 'pixels' are not in fact pixels but blocks.
6. An FF byte would break the WiFi UAV app on iOS and force the app to exit.
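
I did this against the app, but you can run a similar experiment offline: flip a bit in a frame's reassembled scan data, wrap it with the same headers, and open the result. A sketch, not my exact code (`frame.bin` and the header blob are placeholders, not provided here):

```python
# Offline flip-and-observe sketch: flip one bit in a frame's entropy-coded data,
# rebuild the .jpg and look at how the image changes.
# `frame.bin` (raw scan data of one frame) and `JPEG_HEADER.bin` (prebuilt header
# with the right tables) are placeholders.

JPEG_EOI = b"\xff\xd9"

def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    out = bytearray(data)
    out[byte_index] ^= (1 << bit)
    return bytes(out)

with open("frame.bin", "rb") as f:
    scan = f.read()

with open("JPEG_HEADER.bin", "rb") as f:   # placeholder header blob
    JPEG_HEADER = f.read()

# Flip bit 3 of byte 500 (an arbitrary example) and write both versions out.
for name, payload in [("original.jpg", scan), ("flipped.jpg", flip_bit(scan, 500, 3))]:
    with open(name, "wb") as f:
        f.write(JPEG_HEADER + payload + JPEG_EOI)
```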

Eventually, you discover this behaves statistically like JPEG image data. The usual Huffman, quantization, and chrominance tables are not transmitted in the UDP stream, so to read the data on your desktop you have to create your own headers. That is why I had to write the program in my GitHub repo. It simply puts the stream into a .jpg-readable format, not by manipulating the raw data but by prepending the appropriate headers and table values to the top. As you observed, "it works", but it still glitches with lines on some frames when playing back video. Whether those are bad frames that should be dropped, or my algorithm needs some tweaking of the tables it uses, or some odd combination of both, the final solution could still be improved.
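
If you want to build such a header from scratch rather than copying mine, this is the rough skeleton of the segments it needs; the quantization and Huffman table payloads and the image size are placeholders, not the drone's real values:

```python
import struct

# Skeleton of a baseline JPEG header for a 4:4:4 YCbCr image.
# The DQT payload and the Huffman table payloads are placeholders; the real
# values have to match what the drone's encoder assumes.

def segment(marker: int, payload: bytes) -> bytes:
    # Every JPEG segment (except SOI/EOI) is: FF xx, 2-byte length incl. itself, payload.
    return struct.pack(">HH", marker, len(payload) + 2) + payload

def build_header(width: int, height: int,
                 dqt_payload: bytes, dht_payloads: list[bytes]) -> bytes:
    soi = b"\xff\xd8"
    dqt = segment(0xFFDB, dqt_payload)                        # quantization tables
    dht = b"".join(segment(0xFFC4, p) for p in dht_payloads)  # Huffman tables
    # SOF0: 8-bit precision, height, width, 3 components, all 1x1 sampling (4:4:4).
    sof0 = segment(0xFFC0, bytes([8]) + struct.pack(">HH", height, width) + bytes([
        3,
        1, 0x11, 0,   # Y:  component 1, 1x1 sampling, quant table 0
        2, 0x11, 1,   # Cb: component 2, 1x1 sampling, quant table 1
        3, 0x11, 1,   # Cr: component 3, 1x1 sampling, quant table 1
    ]))
    # SOS: 3 components with their Huffman table selectors, then spectral selection 0..63.
    sos = segment(0xFFDA, bytes([3, 1, 0x00, 2, 0x11, 3, 0x11, 0, 63, 0]))
    return soi + dqt + dht + sof0 + sos

# Usage (placeholder table payloads, so the result won't decode real data as-is):
# header = build_header(640, 480, QUANT_TABLES_PAYLOAD, [DC0, AC0, DC1, AC1])
```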

The flying and drone-manipulation part is orders of magnitude easier, however; you may have already realized that.

I am reviewing the code and trying to understand what it does step by step. I have learned the Huffman algorithm and have an intermediate understanding of the JPEG algorithm, but I couldn’t make sense of this code. Could you help me understand the code?