Index #
- The case of a few bytes
- Results
- Why is this happening?
- A better way to do things
- Bitmap File header
- Device-Independent Bitmap
- Too Long didn't read
- Footnotes
Original Date of writing: 17 February 2023
The case of a few bytes #
I've been working on a small hobby project/program of mine. It implements image rotation by some angle (in degrees). It's an exercise in Igor Zhirkov's book where the reader is to use bitmap format to write back the rotated image, the input is also expected to be a bitmap image.
Anyhow, the bitmap file header begins with a 2 byte
number that identifies the BMP and DIB header. It is commonly 0x424d
(BM
in ascii, 0x4d42
in little endian,
how it shall appear on my machine and probably yours as well.)
Now, the first approach I decided on was to read the first two bytes and compare those two bytes with 0x42
and 0x4d
and then parse the header fully if we have the matching subset, else we don't parse.
During this, we need to return back to the start of the stream so we can collect the file header in one full fread()
call rather than manually setting our structure's type field1 and then fread()
another few bytes
into our header
structure2
Here are some examples of the three cases. A few details and error checking have been omitted for brevity.
ungetc()
1 short int byte1, byte2;
2 byte1 = fgetc(bitmap);
3 byte2 = fgetc(bitmap);
4
5 if(byte1 != '0x42' && byte2 != '0x4d') { /* do not parse further */}
6
7 /* what we care about */
8 ungetc(byte2, bitmap);
9 ungetc(byte1, bitmap); // probably check for eof because only 1 push back is guaranteed
Do note that with this approach only one push back is guaranteed as specified in the standard. Although, the cppreference's notes section indicates that on popular operating systems the pushback buffer is large enough to support our operations.
The size of the pushback buffer varies in practice from 4k (Linux, MacOS) to as little as 4 (Solaris) or the guaranteed minimum 1 (HPUX, AIX).
The apparent size of the pushback buffer may be larger if the character that is pushed back equals the character existing at that location in the external character sequence (the implementation may simply decrement the read file position indicator and avoid maintaining a pushback buffer).
fseek()
to stream start
1 short int byte1, byte2;
2 byte1 = fgetc(bitmap);
3 byte2 = fgetc(bitmap);
4
5 if(byte1 != '0x42' && byte2 != '0x4d') { /* do not parse further */}
6
7 /* what we care about */
8 fseek(bitmap, 0, SEEK_SET);
rewind()
1 short int byte1, byte2;
2 byte1 = fgetc(bitmap);
3 byte2 = fgetc(bitmap);
4
5 if(byte1 != '0x42' && byte2 != '0x4d') { /* do not parse further */}
6
7 /* what we care about */
8 rewind(bitmap)
Now, the question that arises, which is faster? Well the obvious thing is to benchmark this, so I
rolled up my little get_time_nsec()
3 function and measured the time it took for both cases and calculated the delta.
I ran the benchmark 10 times and then took the average, the code was compiled with these compiler options.4
Before I reveal the results, make a guess. Which is faster? I asked the same question to two of my friends and both agreed
that it should be fseek()
, (at the time, the choice for rewind()
was not presented to them).
For the sake of keeping this article short, I shall not discuss the logic behind their arguments.
Results #
ungetc()
1ungetc_avg = (791+1793+1002+1012+1393+1002+1303+1583+1022+972)/10
2ungetc_avg
31187.3
fseek()
1fseek_avg = (3396+2936+3326+3356+1834+3576+2855+1783+3056+2285)/10
2fseek_avg
32840.3
rewind()
1rewind_avg = (4749+2374+5657+5727+4819+4889+6216+6146+5657+6984)/10
2rewind_avg
35321.8
The units for these numbers is
ns
ornanosecond
.
This indicates that in this particular case, ungetc()
is about 58.19% faster than fseek()
5 and 77.68% faster than rewind()
5.
From the presented options, it seems that going a little off-standard is worth this micro-optimization but I do not recommend it, as there is better alternative to doing the above shenanigans. Although, this benchmark does have its issues, e.g includes time taken for function calls but it does showcase a pretty large difference for a specific case.
Why is this happening? #
fseek()
Let us take a look at glib source for fseek()
. Upon inspection, it is defined as follows:
1int
2fseek (FILE *fp, long int offset, int whence)
3{
4 int result;
5 CHECK_FILE (fp, -1);
6 _IO_acquire_lock (fp);
7 result = _IO_fseek (fp, offset, whence);
8 _IO_release_lock (fp);
9 return result;
10}
Looking for _IO_fseek()
results in finding out that it is a macro that actually calls _IO_seekoff_unlocked()
that further calls _IO_seekoff()
.
1#define _IO_fseek(__fp, __offset, __whence) \
2 (_IO_seekoff_unlocked (__fp, __offset, __whence, _IOS_INPUT|_IOS_OUTPUT) \
3 == _IO_pos_BAD ? EOF : 0)
1off64_t
2_IO_seekoff_unlocked (FILE *fp, off64_t offset, int dir, int mode)
3{
4 if (dir != _IO_seek_cur && dir != _IO_seek_set && dir != _IO_seek_end)
5 {
6 __set_errno (EINVAL);
7 return EOF;
8 }
9
10 /* If we have a backup buffer, get rid of it, since the __seekoff
11 callback may not know to do the right thing about it.
12 This may be over-kill, but it'll do for now. TODO */
13 if (mode != 0 && ((_IO_fwide (fp, 0) < 0 && _IO_have_backup (fp))
14 || (_IO_fwide (fp, 0) > 0 && _IO_have_wbackup (fp))))
15 {
16 if (dir == _IO_seek_cur && _IO_in_backup (fp))
17 {
18 if (_IO_vtable_offset (fp) != 0 || fp->_mode <= 0)
19 offset -= fp->_IO_read_end - fp->_IO_read_ptr;
20 else
21 abort ();
22 }
23 if (_IO_fwide (fp, 0) < 0)
24 _IO_free_backup_area (fp);
25 else
26 _IO_free_wbackup_area (fp);
27 }
28
29 return _IO_SEEKOFF (fp, offset, dir, mode);
30}
Now, it seems that fseek()
acquires a lock and calls _IO_seekoff_unlocked()
is calculating
a bunch of stuff regarding the backup buffer
it seems. And then finally calling _IO_SEEKOFF()
.
This fseek()
call seems to involve calculations and management of a buffer, and locks in order
to go back to the beginning of the stream. This is a possible reason as to why it might be slow.
Thanks to cortexauth#6190 for pointing me here.
rewind()
After going through the source similarly at glibc, it is following the same pattern asfseek()
.
1void
2rewind (FILE *fp)
3{
4 CHECK_FILE (fp, );
5 _IO_acquire_lock (fp);
6 _IO_rewind (fp);
7 _IO_clearerr (fp);
8 _IO_release_lock (fp);
9}
1#define _IO_rewind(FILE) \
2 (void) _IO_seekoff_unlocked (FILE, 0, 0, _IOS_INPUT|_IOS_OUTPUT)
Where rewind()
is also calling _IO_seekoff_unlocked()
but with the offset and the stream position indicators being 0
and 0
respectively.
However, rewind()
was actually slower than fseek()
and that might be explained by what _IO_clearerr()
might be doing which causes it to be
additionally slower than fseek()
. But wait! _IO_clearerr()
is actually a macro that modifies the FILE
structure. It seems to be masking off
the error bits.
1#define _IO_clearerr(FP) ((FP)->_flags &= ~(_IO_ERR_SEEN|_IO_EOF_SEEN))
So what is the actual reason behind this slow down with rewind()
? It could likely be the backup buffer mentioned in the source code, which is
being free'd. Take note that in the ungetc()
example written above, there was a note from cppreference which indicated a pushback buffer
for the stream which might actually be coming into play with _IO_seekoff_unlocked()
's backup buffer here. It could likely be the same buffer(verification needed).
But then it should have been in effect when using fseek()
as well. rewind()
and fseek()
should perform similar if not equally here in my opinion.
Perhaps a more experienced reader could email me about this if they know more.
ungetc()
Now, for our fastest contender ungetc()
seems to be having a rather simple implementation. There is some minimal error checking and for
"putting back the character on the stream" it is just performing checks if the previous character in stream is the same character passed
as argument and then just decrements the FILE
internal read counter/stream indicator.
1int
2ungetc (int c, FILE *fp)
3{
4 int result;
5 CHECK_FILE (fp, EOF);
6 if (c == EOF)
7 return EOF;
8 if (!_IO_need_lock (fp))
9 return _IO_sputbackc (fp, (unsigned char) c);
10 _IO_acquire_lock (fp);
11 result = _IO_sputbackc (fp, (unsigned char) c);
12 _IO_release_lock (fp);
13 return result;
14}
1int
2_IO_sputbackc (FILE *fp, int c)
3{
4 int result;
5
6 if (fp->_IO_read_ptr > fp->_IO_read_base
7 && (unsigned char)fp->_IO_read_ptr[-1] == (unsigned char)c)
8 {
9 fp->_IO_read_ptr--;
10 result = (unsigned char) c;
11 }
12 else
13 result = _IO_PBACKFAIL (fp, c);
14
15 if (result != EOF)
16 fp->_flags &= ~_IO_EOF_SEEN;
17
18 return result;
19}
Above 8 code snippets (indicated by different code blocks) are directly taken from glibc source, their licensing applies these snippets respectively.
And there you have it! We finally have some grasp over why rewind()
and fseek()
are slower than ungetc()
in our specific case.
A better way to do things #
Bitmap File header #
1struct __attribute((packed)) BitmapFileHeader
2{
3 u16 bf_type;
4 u32 bf_fsize;
5 u32 bf_reserved;
6 u32 bf_pixels_offset;
7} // 14-bytes
Device-Independent Bitmap (BITMAPINFOHEADER) #
1struct __attribute((packed)) BitmapInfoHeader_DIB
2{
3 u32 bi_size;
4 i32 bi_width;
5 i32 bi_height;
6 u16 bi_panes;
7 u16 bi_bitcount;
8 u32 bi_compression;
9 u32 bi_sizeimage;
10 u32 bi_xpixels_permeter;
11 u32 bi_ypixels_permeter;
12 u32 bi_clrused;
13 u32 bi_clrimportant;
14} // 54-bytes
The previously described niche case is unrealistic and should not be used in real code.
Since we know that the bitmap file header i.e the the first 14-bytes are all the same, we could perform a read
of 14-bytes and fill out the file header, and then perform the check on bf_type
and then parse the rest of DIB (Device-Independent Bitmap) header.
But there is another problem here, the DIB header has many versions with variying sizes, eg.
BITMAPCOREHEADER
(the oldest) is 12 bytes but BITMAPINFOHEADER
(most common) is 54 bytes, and OS22XBITMAPHEADER
adds
additional 24 bytes to the BITMAPINFOHEADER
.
So we know the kind of mess we've got ourselves into. The subset that I have chosen to implement in my program is BM
for file header type
and BITMAPINFOHEADER
structure for the in-memory DIB. The goal is to only allow BITMAPINFOHEADER
to be parsed (and optionally support others
in the future).
The initial 12-bytes of all DIB headers is the same (which is actually the BITMAPCOREHEADER
), thus it is reasonable to perform another read
of 12-bytes and then conditionally parse the rest of the header based (similar to how it was done earlier2) on bi_size
field of the header.
This way we minimize fiddling with the stream and be done in about 3 or fewer reads.
Minimizing the number of reads is important so what we should ideally do is to have a packed struct with the file header and DIB header combined
and read 14 + 12 = 26 bytes in one read, perform the checks on bf_type
and bi_size
and report any errors if unsuccessful otherwise continue
parsing the rest of BITMAPINFOHEADER
6. Ofcourse, this advice is limited to the subset that was defined earlier. You may want to do things differently.
Too Long didn't read #
Considering The case of a few bytes first, and then after the previous analysis, the following conclusions can be drawn.
ungetc()
is about 58.19% faster thanfseek()
5 and 77.68% faster thanrewind()
5.ungetc()
only modifies the internal structure ofFILE
and performs some checks.fseek()
andrewind()
are a lot more involved, calling many internal functions for acquiring locks, performing checks on pointers, handling a buffer etc.
If you're parsing bitmap images and want to only support the most common sub-format in BMP format, then
- Perform a read of 26 bytes, check for file type and dib header size.6
- parse further eg. the pixel array or the color profiles or the extra bit masks when there is specific case of compression of pixel data.
- Be done with it.
and remember, always minimize the number of reads or poking with the file stream.
The case described in the beginning was a naive approach.
Please email me if you wish to discuss these results with me, I would be delighted for any kind of feedback. This is mostly a study based on my specific case with a better approach highlighted above.
Footnotes #
-
header->bf_type = (byte2 << 8) | byte1
or if you read into ashort int
then simplyheader->bf_type = id
perhaps ↩︎ -
Let us define an exemplary header
1struct __attribute((packed)) BITMAPHEADER { 2 u16 bf_type; 3 u32 bf_fsize; 4 u32 bf_reserved; 5 u32 bf_pixels_offset; 6 7 /* rest of the fields in the header */ 8 ... 9};
Since we've manually set bf_type field already, we need to read into the structure from
bf_fsize
field.
↩︎ ↩︎1fread (&header->bf_fsize, 1, sizeof(header) - sizeof(header->bf_type), bitmap);
-
Benchmark
↩︎1long 2get_time_nsec (void) 3{ 4#ifndef ICE_DONT_BENCHMARK_CODE 5struct timespec time; 6if (timespec_get (&time, TIME_UTC) == 0) 7 fprintf (stdout,"Error: Filling timespec structure failed with timespec_get()\n"); 8return time.tv_nsec; 9#endif 10}
-
-Wall -Wextra -Werror -O3
↩︎ -
((original_value - new_value) / original_value) * 100 = X%
. Increaese/Decrease depends on whether theoriginal_value
was larger/smaller respectively. ↩︎ ↩︎ ↩︎ ↩︎ -
Parsing the bitmap header in two reads. Error checking and some details have been omitted for the sake of brevity.
↩︎ ↩︎1struct __attribute ((packed)) BITMAP_HEADER 2{ 3 u16 bf_id; 4 u32 bf_fsize; 5 u32 bf_reserved; 6 u32 bf_pixels_offset; 7 8 u32 bi_size; 9 i32 bi_width; 10 i32 bi_height; 11 u16 bi_panes; 12 u16 bi_bitcount; 13 u32 bi_compression; 14 u32 bi_sizeimage; 15 u32 bi_xpixels_permeter; 16 u32 bi_ypixels_permeter; 17 u32 bi_clrused; 18 u32 bi_clrimportant; 19}; // 54-bytes 20 21struct BITMAP_HEADER* 22read(FILE* bitmap) 23{ 24 struct BITMAP_HEADER *metadata = calloc(1, sizeof(*metadata)); 25 fread(data, 1, 26, bitmap); 26 if(metadata->bf_id != 0x4d42) // little-endian machine, use switch if you wish to support multiple formats or something 27 ... // or perhaps do some macro magic, dealer's choice 28 if(metadata->bi_size != 40) 29 ... 30 return metadata; 31}