SPM 2.0 writes to self - Advanced AVR 3.0 Core Bootloader
Having the application section write to the application section looks almost impossible, because the SPM instruction that writes to flash memory must itself reside in the bootloader section. It can be done easily, though, by defining a function and placing it in the bootloader section with the linker switch --section-start. This way, the SPM instruction is happily living in the bootloader section, while the application code can still write to itself.
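As a rough sketch of the placement trick, assuming avr-gcc, an ATmega8 with the smallest boot section (128 words, starting at byte address 0x1F00), and the section name .bootloader that avr-libc's BOOTLOADER_SECTION macro uses. The macro IN_BOOT_SECTION, the constant BOOT_START and the helper isApplicationPage() are made-up names for illustration, not part of the original code:

```c
#include <stdint.h>

/* Assumption: ATmega8 with the BOOTSZ fuses set for the smallest boot
   section (128 words = 256 bytes), which begins at byte address 0x1F00. */
#define BOOT_START 0x1F00u

#ifdef __AVR__
/* Same section name that avr-libc's BOOTLOADER_SECTION macro uses. */
#define IN_BOOT_SECTION __attribute__((section(".bootloader")))
#else
/* On a PC build the macro expands to nothing, so the file still compiles. */
#define IN_BOOT_SECTION
#endif

/* Assumed link command; --section-start pins the section to BOOT_START:
     avr-gcc -mmcu=atmega8 -Os main.c -Wl,--section-start=.bootloader=0x1F00 */

/* Hypothetical helper placed in the boot section: returns 1 when a byte
   address belongs to the application section, 0 when it is boot code. */
IN_BOOT_SECTION uint8_t isApplicationPage(uint16_t byteAddr)
{
    return byteAddr < BOOT_START;
}
```

Any function tagged this way ends up above BOOT_START after linking, so code inside it is allowed to execute SPM, while the application calls it like any ordinary C function.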
One simple function is created to do one page write:
1. SRAM is filled with one page worth of data (64 bytes for the ATmega8).
2. The writeFlash() function is called:
   A. it fills the flash page buffer from the SRAM buffer;
   B. it erases the whole page;
   C. it writes the page with the data from the flash page buffer.
3. Done!
Preparing a whole page in SRAM before calling writeFlash() is the more efficient way to do it.
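The steps above can be sketched with the avr-libc boot.h macros roughly as follows. This is not the actual SPM 2.0 source: the signature of writeFlash(), the interrupt handling and the IN_BOOT_SECTION name are assumptions. The #else branch is only a PC-side stand-in (simFlash, simPageBuf) so the fill/erase/write order can be tried without hardware:

```c
#include <stdint.h>
#include <string.h>

#ifdef __AVR__
#include <avr/boot.h>
#include <avr/eeprom.h>
#include <avr/interrupt.h>
#define IN_BOOT_SECTION BOOTLOADER_SECTION
#else
/* PC-side stand-ins (assumptions, for testing only): an in-memory image
   of the ATmega8's 8 KB flash and its one-page temporary buffer. */
#include <assert.h>
#define SPM_PAGESIZE 64
#define IN_BOOT_SECTION
static uint8_t simFlash[8192];
static uint8_t simPageBuf[SPM_PAGESIZE];
static void boot_page_fill(uint16_t addr, uint16_t word)
{
    assert(addr % 2 == 0);                 /* the buffer is filled word by word */
    simPageBuf[addr % SPM_PAGESIZE]     = (uint8_t)(word & 0xFF);
    simPageBuf[addr % SPM_PAGESIZE + 1] = (uint8_t)(word >> 8);
}
static void boot_page_erase(uint16_t addr)
{
    memset(&simFlash[addr - addr % SPM_PAGESIZE], 0xFF, SPM_PAGESIZE);
}
static void boot_page_write(uint16_t addr)
{
    uint16_t base = (uint16_t)(addr - addr % SPM_PAGESIZE);
    for (uint16_t i = 0; i < SPM_PAGESIZE; i++)
        simFlash[base + i] &= simPageBuf[i];   /* programming only clears bits */
    memset(simPageBuf, 0xFF, SPM_PAGESIZE);    /* buffer is cleared after a write */
}
static void boot_spm_busy_wait(void) {}
static void boot_rww_enable(void) {}
static void eeprom_busy_wait(void) {}
#endif

/* Writes one page from SRAM to flash. pageAddr is the byte address of the
   first byte of the page; sram must hold SPM_PAGESIZE bytes. */
IN_BOOT_SECTION void writeFlash(uint16_t pageAddr, const uint8_t *sram)
{
#ifdef __AVR__
    uint8_t sreg = SREG;                   /* remember the interrupt state */
    cli();                                 /* keep the timed SPM sequence intact */
#endif
    eeprom_busy_wait();                    /* no SPM while EEPROM is writing */

    /* A: fill the flash page buffer from the SRAM buffer, low byte first */
    for (uint16_t i = 0; i < SPM_PAGESIZE; i += 2) {
        uint16_t word = (uint16_t)(sram[i] | ((uint16_t)sram[i + 1] << 8));
        boot_page_fill(pageAddr + i, word);
    }

    /* B: erase the whole page */
    boot_page_erase(pageAddr);
    boot_spm_busy_wait();

    /* C: write the page from the flash page buffer */
    boot_page_write(pageAddr);
    boot_spm_busy_wait();
    boot_rww_enable();                     /* make the RWW section readable again */
#ifdef __AVR__
    SREG = sreg;                           /* restore the interrupt state */
#endif
}
```

From the application side the call would look something like writeFlash(0x0400, buffer), with buffer being the 64-byte SRAM page prepared in step 1.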
Note: While an EEPROM write is still in progress, it's not valid to write to the page buffer or to the flash; wait until the EEPROM write has finished first.
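At register level the check is small. The sketch below assumes the ATmega8, where the EEPROM write-in-progress flag is the EEWE bit (bit 1) of EECR; newer parts name that bit EEPE. The names eepromBusy(), waitForEeprom(), EECR_NOW() and simEECR are invented for this example, and the #else branch is only a PC-side stand-in for the register:

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#define EECR_NOW() EECR
#else
/* PC-side stand-in (assumption, for testing only) for the EECR register. */
#define EEWE 1
static volatile uint8_t simEECR;
#define EECR_NOW() simEECR
#endif

/* Nonzero while an EEPROM write is still in progress. */
static uint8_t eepromBusy(void)
{
    return (uint8_t)(EECR_NOW() & (1u << EEWE));
}

/* Spin until the EEPROM write has finished; only then is it valid to
   touch the flash page buffer or start an erase or write with SPM. */
static void waitForEeprom(void)
{
    while (eepromBusy()) {
        /* busy wait */
    }
}
```

avr-libc's eeprom_busy_wait() from avr/eeprom.h does the same kind of polling, so calling that at the top of the page-write function is usually enough.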
Note2: SPM 2.0 is the finalized C code using boot.h; it's good enough. SPM 3.0 had already been written with custom inline asm SPM code, but it is only slightly better and will not work on MCUs with larger flash memory, so SPM 3.0 is discarded. The SPM 2.0 writeFlash() code compiles to 94 bytes, less than 2 pages; good enough is good enough.
Note3: Although how SPM writes to flash is not fully described in words here, the compiled assembly code for writeFlash() is fully commented, line by line, and this is better. For those who can't read assembly, the comments serve as pseudocode for understanding how to write with SPM.
Note4: Actually, bad is bad, so to go the extra mile a simplified routine was tested with simple asm and C. Writing with SPM is actually very simple, and there is no need to write a complicated hard-coded routine entirely in inline asm; that is less portable and hard for a beginner to understand. This is less about code optimization and more out of distaste for badly written code that relies on macros. Finally, it can't be called advanced if it can't be done in multiple ways, and C code is more readable than anything else and easier for the compiler to optimize. And so it's done, using 86 bytes instead of 94 bytes.