Please correct me if I'm wrong, I had this thought this morning and wanted to share it. Haven't tested it myself yet.

A typical input script is 106 bytes and a typical output script is 25 bytes. Here's the usual script that decides whether or not someone is allowed to spend an output:

<input signature> <input public key> OP_DUP OP_HASH160 QQQQQQQQQQQQQQQQQQQQ OP_EQUALVERIFY OP_CHECKSIG

Here's what I'm thinking you could do:

<input signature> <input public key> OP_PUSHDATA2 XXYYYYYZZZZZZZZZZZZZZZZZZZ...Z OP_DROP OP_DUP OP_HASH160 QQQQQQQQQQQQQQQQQQQQ OP_EQUALVERIFY OP_CHECKSIG

Note: There's a "hidden" opcode after OP_HASH160 which is usually not printed out when a script is displayed like this. It's opcode 20 (displayed as "14" in hex), which means that the next 20 bytes are pushed onto the stack. This is why when you "count the bytes" above in the normal script you get 24 instead of 25. There are 75 of these "hidden" opcodes that lack words for them. Sidenote: Opcode numbers shouldn't be confused with opcode words, for example opcode 2 is not OP_2. Opcode 82 is OP_2 (displayed as "52" in hex).
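To make the byte counting concrete, here's a minimal Python sketch that builds the normal 25-byte output script, including the "hidden" push opcode. The function name is just something I made up for illustration; the opcode values are the standard ones.

```python
# Standard opcode values (hex).
OP_DUP = 0x76
OP_HASH160 = 0xA9
OP_EQUALVERIFY = 0x88
OP_CHECKSIG = 0xAC


def p2pkh_script(pubkey_hash: bytes) -> bytes:
    """Build the normal 25-byte "pubkeyhash" output script."""
    if len(pubkey_hash) != 20:
        raise ValueError("expected a 20-byte hash")
    push_20 = len(pubkey_hash)  # opcode 20 (0x14): push the next 20 bytes
    return (
        bytes([OP_DUP, OP_HASH160, push_20])
        + pubkey_hash
        + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
    )


script = p2pkh_script(bytes(20))  # all-zero hash, placeholder only
print(len(script))        # 25
print(script[:3].hex())   # 76a914
```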

<input signature> = (Typically 71-72 bytes, including the sighash byte.) Pushed onto the stack by the input script, not part of the output script.

<input public key> = (33 bytes if compressed, 65 if uncompressed.) Pushed onto the stack by the input script, not part of the output script.

Each opcode is 1 byte.

X = (Two bytes.) Length of total data used by OP_PUSHDATA2 (both Y and Z). Two bytes allows for a number up to 65535.

Y = (Five bytes.) Data block header. Four bytes are a magic number that parsers can scan for in order to know that this is interesting data. The fifth byte is the block sequence number, 0-255.

Z = (X-5 bytes.) Data block. Part of the JPG image or ZIP file or whatever else.

Q = (20 bytes.) Standard RIPEMD-160 hash of the SHA-256 hash of the public key. This is always part of a standard output script.
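Here's a rough Python sketch of how the extended output script could be assembled from the fields above. The magic number b"FILE" and the function name are placeholders I picked for illustration, nothing is decided. The two-byte length is written little-endian because that is how OP_PUSHDATA2 encodes it.

```python
import struct

OP_PUSHDATA2 = 0x4D
OP_DROP = 0x75
OP_DUP = 0x76
OP_HASH160 = 0xA9
OP_EQUALVERIFY = 0x88
OP_CHECKSIG = 0xAC

MAGIC = b"FILE"  # hypothetical four-byte magic number


def data_output_script(seq: int, block: bytes, pubkey_hash: bytes) -> bytes:
    """OP_PUSHDATA2 XX YYYYY Z...Z OP_DROP followed by the normal script."""
    if not 0 <= seq <= 255:
        raise ValueError("sequence number must fit in one byte")
    if len(pubkey_hash) != 20:
        raise ValueError("expected a 20-byte hash")
    payload = MAGIC + bytes([seq]) + block       # YYYYY + Z...Z
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for OP_PUSHDATA2")
    length = struct.pack("<H", len(payload))     # XX, little-endian
    normal = (
        bytes([OP_DUP, OP_HASH160, 20])
        + pubkey_hash
        + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
    )
    return bytes([OP_PUSHDATA2]) + length + payload + bytes([OP_DROP]) + normal


example = data_output_script(3, b"hello", bytes(20))
print(len(example))  # 39 = 1 + 2 + 5 + 5 + 1 + 25
```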

* Later thought concerning Y: The header bytes should probably be extended to include a UTF8 path/filename as well.

** Later thought concerning Z: Always-compressed data would be best (meaning JPGs are packed before being stored and must be unpacked by the viewer, all done automatically of course).

Even later thoughts about the two thoughts above (sorry about the mess of this presentation): It's probably better to bake path data into the data blocks themselves so that several files would be one continuous flow of bytes. So the viewer would first parse all data blocks and combine them using the sequence numbers. Then it would unpack this compressed data. Then it would check the unpacked data for headers that hold the path/filename info as well as the length of the file. If bytes still remain after that, it means there are more files; just repeat until the bytes run out. So the bottom line is that the header in the output script wouldn't need to be changed from XXYYYYY after all: the path/filename info is found inside the Z data after decompression.

With these changes, every Bitcoin client parses the output script as if it were a normal 25-byte script (a normal "pubkeyhash" type of script).

The file data is pushed onto the stack by OP_PUSHDATA2. But it is then immediately removed by OP_DROP and the script continues as if it was never there.
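A toy Python stack machine shows the idea. It only understands the plain push opcodes, OP_PUSHDATA2 and OP_DROP, and it ignores every other rule and limit a real node enforces, so treat it as an illustration of the push-then-drop step and nothing more. The function name and the demo bytes are made up.

```python
def run_prefix(script: bytes, stack: list) -> list:
    """Run pushes, OP_PUSHDATA2 and OP_DROP; stop at the first other opcode."""
    stack = list(stack)
    i = 0
    while i < len(script):
        op = script[i]
        if 1 <= op <= 75:                      # the "hidden" push opcodes
            stack.append(script[i + 1:i + 1 + op])
            i += 1 + op
        elif op == 0x4D:                       # OP_PUSHDATA2
            n = int.from_bytes(script[i + 1:i + 3], "little")
            stack.append(script[i + 3:i + 3 + n])
            i += 3 + n
        elif op == 0x75:                       # OP_DROP
            stack.pop()
            i += 1
        else:
            break                              # OP_DUP etc. not simulated
    return stack


before = [b"<sig>", b"<pubkey>"]               # what the input script left
payload = b"FILE\x00data"                      # 9 bytes: header + data
demo_script = (
    bytes([0x4D]) + len(payload).to_bytes(2, "little") + payload
    + bytes([0x75, 0x76, 0xA9])                # OP_DROP OP_DUP OP_HASH160 ...
)
after = run_prefix(demo_script, before)
print(after == before)  # True
```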

However, parsers that look at the transaction as a whole will check all output scripts for the four-byte magic number.

If the magic number is detected, the parser checks three bytes earlier to confirm that there's indeed an OP_PUSHDATA2 there. If so, it reads the total data length from the next two bytes, reduces that number by 5 (the header length), skips ahead 4 bytes (past the magic number), reads 1 byte (the sequence number) and finally reads the remaining bytes, as many as the reduced number says. Once all output scripts have been checked, the data blocks are ordered by their sequence numbers and put together into the final file, which can be whatever.

(Later thought.) After the file blocks are put together you don't have a file, you have compressed data. It is now decompressed and scanned for actual files. The first 2 bytes are the path length in bytes. Then follow that many bytes, which decode to a UTF8 string that gives the file path and file name (including the file name extension). Then there are 2 bytes that give the length of the actual file, followed by the file itself. After that, if there are still bytes left, it means that there are more files. Repeat checking 2 bytes for the path length etc.

A transaction could contain one or several files. Images, music tracks, zip archives (which can be password-protected), movie files, text files... Even a root html file that then uses the rest of the files to put together a little website. You can also store private encrypted file containers (created by for example TrueCrypt 7.1a) on the blockchain, though you'd probably need to split them over several transactions.

(Even later thought.) At the very beginning of the data (after it has been decompressed) there could be a 22-byte master header (before the first file header). This can be used to chain together several transactions into one huge flow of data. The first byte would be the sequence number, 0-255. The second byte would be the highest sequence number needed in order to complete the data (so if the first two bytes are both 0 it means that this single transaction contains all of the data). After that there's a 20-byte RIPEMD-160 hash of the finished data (minus the first 22 bytes, which are the master header).

So after a parser decompresses the bytes it checks the master header's first two bytes. If they are not 00 it checks all of the transactions sent to this address for master headers with the same RIPEMD-160 hash. After all are found, the data is put together in the order determined by the first byte of each master header, then the finished data is hashed to see if it has the same RIPEMD-160 hash. Upon success it can finally extract all the files from the flow of data.

That way you could store much larger files on the blockchain, those spanning several transactions. Something like maybe 256*90k = 21.97 MiB using lazy math. If four bytes are used instead of two to determine the total transactions in the master header, that number grows to 65536*90k = 5.49 GiB, although that would be super expensive to store at 1 sat/byte.
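The parsing steps described above could look something like this in Python. It's a sketch under my own assumptions: the magic number b"FILE" is a placeholder, all two-byte lengths are read as little-endian (I haven't specified a byte order for the file headers), the compression is plain zlib/deflate, and every function name is invented. The RIPEMD-160 check of the master header is left out.

```python
import struct
import zlib

MAGIC = b"FILE"        # hypothetical four-byte magic number
OP_PUSHDATA2 = 0x4D


def scan_script(script: bytes):
    """Return (sequence number, data block) or None if nothing is found."""
    pos = script.find(MAGIC)
    if pos < 3 or script[pos - 3] != OP_PUSHDATA2:
        return None
    (total,) = struct.unpack_from("<H", script, pos - 2)  # XX
    data_len = total - 5                                  # minus header YYYYY
    start = pos + 5
    if data_len < 0 or start + data_len > len(script):
        return None
    return script[pos + 4], script[start:start + data_len]


def reassemble(scripts) -> bytes:
    """Order the blocks by sequence number, join them and decompress."""
    hits = [hit for hit in map(scan_script, scripts) if hit is not None]
    hits.sort(key=lambda hit: hit[0])
    return zlib.decompress(b"".join(block for _, block in hits))


def split_master_header(data: bytes):
    """Split off the 22-byte master header: (seq, highest seq, hash, rest)."""
    if len(data) < 22:
        raise ValueError("data shorter than the master header")
    return data[0], data[1], data[2:22], data[22:]


def extract_files(data: bytes) -> dict:
    """Read path length, path, file length, file; repeat until bytes run out."""
    files = {}
    i = 0
    while i < len(data):
        (path_len,) = struct.unpack_from("<H", data, i)
        i += 2
        path = data[i:i + path_len].decode("utf-8")
        i += path_len
        (file_len,) = struct.unpack_from("<H", data, i)
        i += 2
        files[path] = data[i:i + file_len]
        i += file_len
    return files
```

A viewer would call reassemble() on all output scripts of a transaction, optionally strip the master header with split_master_header(), and hand the rest to extract_files().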

*The header could (should) be expanded to contain a UTF8 encoded filename so that scanners will know how to best present its discovered file to humans. If we allow 2 bytes for the filename length we could put entire paths in there, meaning a small multi-file homepage could exist inside a single transaction.

The max script size is 10k, so with 256 sequence numbers (0-255) you could theoretically store 2560k file bytes in a transaction (minus headers and the rest of the output script), although the max size for a single transaction is 100k and a bunch of other bytes need to accompany every output script (the actual transaction). However, you'd probably still be able to easily store a file over 75 KiB in every transaction. ** Especially if you compress it; even a simple deflate could save a lot of bytes. Make it part of the protocol, so that whichever program creates the transaction compresses the file before cutting it into data blocks, and scanners always decompress the data after putting all the data blocks together.
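On the creating side, the compress-then-cut step might look like this in Python. I'm assuming zlib's deflate and the 10k script limit mentioned above, and the per-script overhead is just what I get from counting the bytes of the extended script; the names are made up.

```python
import zlib

MAX_SCRIPT = 10_000                 # script size limit quoted above
OVERHEAD = 1 + 2 + 5 + 1 + 25       # OP_PUSHDATA2, XX, YYYYY, OP_DROP, normal script
MAX_BLOCK = MAX_SCRIPT - OVERHEAD   # data bytes (Z) that fit in one output script


def split_into_blocks(raw: bytes, max_block: int = MAX_BLOCK) -> list:
    """Compress the data, then cut it into (sequence number, block) pairs."""
    packed = zlib.compress(raw, 9)
    blocks = [packed[i:i + max_block] for i in range(0, len(packed), max_block)]
    if len(blocks) > 256:
        raise ValueError("more blocks than one-byte sequence numbers allow")
    return list(enumerate(blocks))


sample = bytes(range(256)) * 200    # 51200 bytes of very repetitive data
blocks = split_into_blocks(sample, max_block=100)
restored = zlib.decompress(b"".join(block for _, block in blocks))
print(restored == sample)  # True
```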

Now you'd need to send some satoshis at the dust value (currently 546) for every data block you need (1 output = 1 output script), so putting a JPG that is 55k after compression on the blockchain will require 6 outputs, which means about 3276 satoshis plus the ~56k satoshi miner fee due to the size of the transaction (it still costs 1 sat/byte). However, the good news is that since the data is stored in the scripts you can just send the 3276 satoshis back to yourself, so the cost is just the miner fee.
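The arithmetic for that example, as a small Python sketch. I'm rounding to 10k of data per output as in the text, and the ~56k fee is my estimate from above rather than something the code works out.

```python
import math

DUST_SAT = 546          # dust value quoted above
BLOCK_BYTES = 10_000    # assumed data per output (rounded, ignores headers)


def dust_outlay(compressed_bytes: int) -> tuple:
    """Return (number of outputs, total dust in satoshis)."""
    outputs = math.ceil(compressed_bytes / BLOCK_BYTES)
    return outputs, outputs * DUST_SAT


outputs, dust = dust_outlay(55_000)
fee_estimate = 56_000   # ~56k satoshis at 1 sat/byte, estimate from the text
print(outputs, dust)    # 6 3276
# The dust comes back if the outputs pay to your own address,
# so the net cost is just the miner fee.
net_cost = fee_estimate
```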

Let's compare this to methods where you use the 20-byte output addresses to store data. When you do that, all satoshis sent will be burned since nobody owns these addresses. If you want to put a 40k image on the blockchain using that method you'd need to have a whopping 2000 outputs, all burning 546 satoshis each. It would be a very costly storage at 1092000 satoshis (0.01092 BCH) + maybe 80k sat in miner fees. Compared to that method, my method sounds like a bargain.
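The same numbers in Python, for anyone who wants to plug in other file sizes. The 80k fee is the rough guess from the paragraph above and the function name is invented.

```python
import math


def address_method(file_bytes: int, fee_sat: int = 80_000,
                   dust_sat: int = 546, bytes_per_output: int = 20) -> tuple:
    """Return (outputs, satoshis burned, total cost) for the address method."""
    outputs = math.ceil(file_bytes / bytes_per_output)
    burned = outputs * dust_sat
    return outputs, burned, burned + fee_sat


addr_outputs, burned, total = address_method(40_000)
print(addr_outputs, burned, total)  # 2000 1092000 1172000
```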

So, unless I'm missing something it looks like you can store whatever data you want in the output script(s) while still making it work like a regular transaction in all actual bitcoin clients. As far as they are concerned the script is just a normal output script. Only special parsers in for example JavaScript viewers will see the files.

The immediate use case I see for this is avatars on memo.cash. You could even use this to store much longer text messages; instead of putting your message in OP_RETURN you put something that tells the scanner that the actual message is in the first text file found inside all of the data blocks. As a bonus it would then still be possible to store images (and zip files etc) inside the data blocks so not only should messages well beyond 10k characters be possible on memo.cash but they could even be decorated with several small images. All for the cost of just 1 sat/byte (after data compression).