A few quick things out of the way, as I know these types of questions often have a malicious background.
- I am not trying to execute something in a file format (jpg)
- I am not trying to simply achieve the end goal (i.e. make something look like a jpg but execute)
- I'm simply trying to learn the difference between how a computer parses information
So, from my understanding, a computer dealing with a data file format such as JPG or PDF will only parse the information in what is essentially a container that adheres, I imagine, to a specific layout or specification. You commonly hear that the difference between a data file format and an executable is that the computer won't "run" (execute) the data file. My question is: how does the computer know what to read and what to execute? What is the difference? My mental image of a computer is something that reads instructions step by step. If, in the middle of a JPG it was parsing, it saw shellcode for popping up a message box, why doesn't that shellcode run when it is read?
I'm aware that things could be executed via exploits for the software that trick the parser, perhaps via buffer overflows and the like. Again, I'm not so much interested in the end goal of how that can be achieved. I'm more interested in how the computer can tell what is meant to be read and what is meant to be executed, and how reading can be achieved without execution.
2 Answers
Well, data files are opened by programs. The program, which is itself an executable file, opens the data file and interprets its contents.
Executable files are opened and run by the OS for the CPU. They contain instructions, which are data for the CPU.
If a data file contained instructions for a CPU, they wouldn't normally go to the CPU as instructions, because it's just a file your program is reading; at most it could hold instructions for your program. But if something goes wrong in a program, for example a data file causes a buffer overflow in it, then what's in the data file could, I suppose, end up being executed by the CPU.
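To make the "just a file your program is reading" point concrete, here is a minimal sketch in Python. The sample bytes and the helper names `looks_like_jpeg` and `count_byte` are made up for illustration; the blob starts with the JPEG start-of-image marker and then happens to contain byte values that are also valid x86 opcodes. The program only compares and counts them, so they never act as instructions.

```python
# Made-up sample data: a JPEG start-of-image marker (0xFF 0xD8) followed by
# bytes that are also x86 opcodes (0x90 = NOP, 0xC3 = RET).
blob = bytes([0xFF, 0xD8, 0x90, 0x90, 0xC3])


def looks_like_jpeg(data: bytes) -> bool:
    """Check only the two-byte JPEG start-of-image marker."""
    return data[:2] == b"\xff\xd8"


def count_byte(data: bytes, value: int) -> int:
    """Count occurrences of a byte value; bytes are compared, never run."""
    return sum(1 for b in data if b == value)


print(looks_like_jpeg(blob))
print(count_byte(blob, 0x90))
```

The CPU is executing the parser's own instructions the whole time; the blob is only ever an operand to them.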
One could rename a jpg file to exe and try to execute it; the OS will attempt to run it and give an error, unless it really has machine code (CPU instructions) in it, in which case it is really an executable file that had the wrong extension and now has the right one.
On Windows, it's mainly the extension that decides whether a file is treated as executable - to be specific, EXEs will be executed and some other formats will be interpreted by cmd or PowerShell.
On Unix-like systems, there's an execution bit. If you're familiar with the concept of Windows file attributes, you can think of it as a kind of attribute. Any file can have the execution bit set, which makes it possible to execute it (as a program, script etc.). Otherwise the operating system will always treat it as a regular file that cannot be executed.
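As a rough illustration of the execution bit, the following Python sketch creates a temporary file, clears and then sets the bit with `os.chmod`, and reads it back with `os.stat`. It assumes a Unix-like system; the helper name `is_executable_bit_set` and the file contents are made up for the example.

```python
import os
import stat
import tempfile


def is_executable_bit_set(path: str) -> bool:
    """Return True if any of the user/group/other execute bits is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


# Create a throwaway file holding a tiny shell script.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"#!/bin/sh\necho hello\n")
    demo_path = tmp.name

os.chmod(demo_path, 0o644)  # rw-r--r--: readable, but not executable
before = is_executable_bit_set(demo_path)

os.chmod(demo_path, 0o755)  # rwxr-xr-x: execute bits now set
after = is_executable_bit_set(demo_path)

os.remove(demo_path)
print(before, after)
```

The file's contents are identical in both cases; only the permission metadata changes.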
Unix-like OSes don't rely on the concept of file extensions, but rather try to identify files by their content. Usually the first few bytes of a file make up a magic number - a unique file-type identifier (see Wikipedia article.)
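A minimal sketch of content-based identification in Python follows. The signatures listed are well-known ones, but the table is deliberately tiny and the names `MAGIC_NUMBERS` and `identify` are made up for this example; real tools consult far larger databases.

```python
# A few well-known leading byte sequences ("magic numbers").
MAGIC_NUMBERS = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"\x7fELF": "ELF binary",
    b"MZ": "DOS/Windows executable",
}


def identify(data: bytes) -> str:
    """Guess a file type from its leading bytes; 'unknown' if nothing matches."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


print(identify(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(identify(b"\x7fELF\x02\x01\x01"))
print(identify(b"plain text"))
```

To check a real file, one would read its first few bytes in binary mode, e.g. `open(path, "rb").read(16)`, and pass them to `identify`.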