A look at ungetc fseek rewind

Index #

The case of a few bytes
Results
Why is this happening?
A better way to do things
Bitmap File header
Device-Independent Bitmap
Too Long didn't read
Footnotes

Original Date of writing: 17 February 2023

The case of a few bytes #

I've been working on a small hobby project/program of mine. It implements image rotation by some angle (in degrees). It's an exercise in Igor Zhirkov's book where the reader is to use bitmap format to write back the rotated image, the input is also expected to be a bitmap image.

Anyhow, the bitmap file header begins with a 2 byte number that identifies the BMP and DIB header. It is commonly 0x424d (BM in ascii, 0x4d42 in little endian, how it shall appear on my machine and probably yours as well.)

Now, the first approach I decided on was to read the first two bytes and compare those two bytes with 0x42 and 0x4d and then parse the header fully if we have the matching subset, else we don't parse. During this, we need to return back to the start of the stream so we can collect the file header in one full fread() call rather than manually setting our structure's type field¹ and then fread() another few bytes into our header structure²

Here are some examples of the three cases. A few details and error checking have been omitted for brevity.

ungetc()

1    short int byte1, byte2;
2    byte1 = fgetc(bitmap);
3    byte2 = fgetc(bitmap);
4
5    if(byte1 != '0x42' && byte2 != '0x4d') { /* do not parse further */}
6
7    /* what we care about */
8    ungetc(byte2, bitmap);
9    ungetc(byte1, bitmap); // probably check for eof because only 1 push back is guaranteed

Do note that with this approach only one push back is guaranteed as specified in the standard. Although, the cppreference's notes section indicates that on popular operating systems the pushback buffer is large enough to support our operations.

The size of the pushback buffer varies in practice from 4k (Linux, MacOS) to as little as 4 (Solaris) or the guaranteed minimum 1 (HPUX, AIX).

The apparent size of the pushback buffer may be larger if the character that is pushed back equals the character existing at that location in the external character sequence (the implementation may simply decrement the read file position indicator and avoid maintaining a pushback buffer).

fseek() to stream start

1    short int byte1, byte2;
2    byte1 = fgetc(bitmap);
3    byte2 = fgetc(bitmap);
4
5    if(byte1 != '0x42' && byte2 != '0x4d') { /* do not parse further */}
6
7    /* what we care about */
8    fseek(bitmap, 0, SEEK_SET);

rewind()

1    short int byte1, byte2;
2    byte1 = fgetc(bitmap);
3    byte2 = fgetc(bitmap);
4
5    if(byte1 != '0x42' && byte2 != '0x4d') { /* do not parse further */}
6
7    /* what we care about */
8    rewind(bitmap)

Now, the question that arises, which is faster? Well the obvious thing is to benchmark this, so I rolled up my little get_time_nsec()³ function and measured the time it took for both cases and calculated the delta. I ran the benchmark 10 times and then took the average, the code was compiled with these compiler options.⁴

Before I reveal the results, make a guess. Which is faster? I asked the same question to two of my friends and both agreed that it should be fseek(), (at the time, the choice for rewind() was not presented to them). For the sake of keeping this article short, I shall not discuss the logic behind their arguments.

Results #

ungetc()

1ungetc_avg = (791+1793+1002+1012+1393+1002+1303+1583+1022+972)/10
2ungetc_avg
31187.3

fseek()

1fseek_avg = (3396+2936+3326+3356+1834+3576+2855+1783+3056+2285)/10
2fseek_avg
32840.3

rewind()

1rewind_avg = (4749+2374+5657+5727+4819+4889+6216+6146+5657+6984)/10
2rewind_avg
35321.8

The units for these numbers is ns or nanosecond.

This indicates that in this particular case, ungetc() is about 58.19% faster than fseek()⁵ and 77.68% faster than rewind()⁵.

From the presented options, it seems that going a little off-standard is worth this micro-optimization but I do not recommend it, as there is better alternative to doing the above shenanigans. Although, this benchmark does have its issues, e.g includes time taken for function calls but it does showcase a pretty large difference for a specific case.

Why is this happening? #

fseek()

Let us take a look at glib source for fseek(). Upon inspection, it is defined as follows:

 1int
 2fseek (FILE *fp, long int offset, int whence)
 3{
 4  int result;
 5  CHECK_FILE (fp, -1);
 6  _IO_acquire_lock (fp);
 7  result = _IO_fseek (fp, offset, whence);
 8  _IO_release_lock (fp);
 9  return result;
10}

Looking for _IO_fseek() results in finding out that it is a macro that actually calls _IO_seekoff_unlocked() that further calls _IO_seekoff().

1#define _IO_fseek(__fp, __offset, __whence) \
2  (_IO_seekoff_unlocked (__fp, __offset, __whence, _IOS_INPUT|_IOS_OUTPUT) \
3   == _IO_pos_BAD ? EOF : 0)

 1off64_t
 2_IO_seekoff_unlocked (FILE *fp, off64_t offset, int dir, int mode)
 3{
 4  if (dir != _IO_seek_cur && dir != _IO_seek_set && dir != _IO_seek_end)
 5    {
 6      __set_errno (EINVAL);
 7      return EOF;
 8    }
 9
10  /* If we have a backup buffer, get rid of it, since the __seekoff
11     callback may not know to do the right thing about it.
12     This may be over-kill, but it'll do for now. TODO */
13  if (mode != 0 && ((_IO_fwide (fp, 0) < 0 && _IO_have_backup (fp))
14		    || (_IO_fwide (fp, 0) > 0 && _IO_have_wbackup (fp))))
15    {
16      if (dir == _IO_seek_cur && _IO_in_backup (fp))
17	{
18	  if (_IO_vtable_offset (fp) != 0 || fp->_mode <= 0)
19	    offset -= fp->_IO_read_end - fp->_IO_read_ptr;
20	  else
21	    abort ();
22	}
23      if (_IO_fwide (fp, 0) < 0)
24	_IO_free_backup_area (fp);
25      else
26	_IO_free_wbackup_area (fp);
27    }
28
29  return _IO_SEEKOFF (fp, offset, dir, mode);
30}

Now, it seems that fseek() acquires a lock and calls _IO_seekoff_unlocked() is calculating a bunch of stuff regarding the backup buffer it seems. And then finally calling _IO_SEEKOFF(). This fseek() call seems to involve calculations and management of a buffer, and locks in order to go back to the beginning of the stream. This is a possible reason as to why it might be slow.

Thanks to cortexauth#6190 for pointing me here.

rewind() After going through the source similarly at glibc, it is following the same pattern as fseek().

1void
2rewind (FILE *fp)
3{
4  CHECK_FILE (fp, );
5  _IO_acquire_lock (fp);
6  _IO_rewind (fp);
7  _IO_clearerr (fp);
8  _IO_release_lock (fp);
9}

1#define _IO_rewind(FILE) \
2  (void) _IO_seekoff_unlocked (FILE, 0, 0, _IOS_INPUT|_IOS_OUTPUT)

Where rewind() is also calling _IO_seekoff_unlocked() but with the offset and the stream position indicators being 0 and 0 respectively. However, rewind() was actually slower than fseek() and that might be explained by what _IO_clearerr() might be doing which causes it to be additionally slower than fseek(). But wait! _IO_clearerr() is actually a macro that modifies the FILE structure. It seems to be masking off the error bits.

1#define _IO_clearerr(FP) ((FP)->_flags &= ~(_IO_ERR_SEEN|_IO_EOF_SEEN))

So what is the actual reason behind this slow down with rewind()? It could likely be the backup buffer mentioned in the source code, which is being free'd. Take note that in the ungetc() example written above, there was a note from cppreference which indicated a pushback buffer for the stream which might actually be coming into play with _IO_seekoff_unlocked()'s backup buffer here. It could likely be the same buffer(verification needed). But then it should have been in effect when using fseek() as well. rewind() and fseek() should perform similar if not equally here in my opinion.

Perhaps a more experienced reader could email me about this if they know more.

ungetc()

Now, for our fastest contender ungetc() seems to be having a rather simple implementation. There is some minimal error checking and for "putting back the character on the stream" it is just performing checks if the previous character in stream is the same character passed as argument and then just decrements the FILE internal read counter/stream indicator.

 1int
 2ungetc (int c, FILE *fp)
 3{
 4  int result;
 5  CHECK_FILE (fp, EOF);
 6  if (c == EOF)
 7    return EOF;
 8  if (!_IO_need_lock (fp))
 9    return _IO_sputbackc (fp, (unsigned char) c);
10  _IO_acquire_lock (fp);
11  result = _IO_sputbackc (fp, (unsigned char) c);
12  _IO_release_lock (fp);
13  return result;
14}

 1int
 2_IO_sputbackc (FILE *fp, int c)
 3{
 4  int result;
 5
 6  if (fp->_IO_read_ptr > fp->_IO_read_base
 7      && (unsigned char)fp->_IO_read_ptr[-1] == (unsigned char)c)
 8    {
 9      fp->_IO_read_ptr--;
10      result = (unsigned char) c;
11    }
12  else
13    result = _IO_PBACKFAIL (fp, c);
14
15  if (result != EOF)
16    fp->_flags &= ~_IO_EOF_SEEN;
17
18  return result;
19}

Above 8 code snippets (indicated by different code blocks) are directly taken from glibc source, their licensing applies these snippets respectively.

And there you have it! We finally have some grasp over why rewind() and fseek() are slower than ungetc() in our specific case.

A better way to do things #

Bitmap File header #

1struct __attribute((packed)) BitmapFileHeader
2{
3    u16 bf_type;
4    u32 bf_fsize;
5    u32 bf_reserved;
6    u32 bf_pixels_offset;
7} // 14-bytes

Device-Independent Bitmap (BITMAPINFOHEADER) #

 1struct __attribute((packed)) BitmapInfoHeader_DIB
 2{
 3    u32 bi_size;
 4    i32 bi_width;
 5    i32 bi_height;
 6    u16 bi_panes;
 7    u16 bi_bitcount;
 8    u32 bi_compression;
 9    u32 bi_sizeimage;
10    u32 bi_xpixels_permeter;
11    u32 bi_ypixels_permeter;
12    u32 bi_clrused;
13    u32 bi_clrimportant;
14} // 54-bytes

The previously described niche case is unrealistic and should not be used in real code. Since we know that the bitmap file header i.e the the first 14-bytes are all the same, we could perform a read of 14-bytes and fill out the file header, and then perform the check on bf_type and then parse the rest of DIB (Device-Independent Bitmap) header.

But there is another problem here, the DIB header has many versions with variying sizes, eg. BITMAPCOREHEADER (the oldest) is 12 bytes but BITMAPINFOHEADER (most common) is 54 bytes, and OS22XBITMAPHEADER adds additional 24 bytes to the BITMAPINFOHEADER.

So we know the kind of mess we've got ourselves into. The subset that I have chosen to implement in my program is BM for file header type and BITMAPINFOHEADER structure for the in-memory DIB. The goal is to only allow BITMAPINFOHEADER to be parsed (and optionally support others in the future).

The initial 12-bytes of all DIB headers is the same (which is actually the BITMAPCOREHEADER), thus it is reasonable to perform another read of 12-bytes and then conditionally parse the rest of the header based (similar to how it was done earlier²) on bi_size field of the header. This way we minimize fiddling with the stream and be done in about 3 or fewer reads. Minimizing the number of reads is important so what we should ideally do is to have a packed struct with the file header and DIB header combined and read 14 + 12 = 26 bytes in one read, perform the checks on bf_type and bi_size and report any errors if unsuccessful otherwise continue parsing the rest of BITMAPINFOHEADER⁶. Ofcourse, this advice is limited to the subset that was defined earlier. You may want to do things differently.

Too Long didn't read #

Considering The case of a few bytes first, and then after the previous analysis, the following conclusions can be drawn.

ungetc() is about 58.19% faster than fseek()⁵ and 77.68% faster than rewind()⁵.
ungetc() only modifies the internal structure of FILE and performs some checks.
fseek() and rewind() are a lot more involved, calling many internal functions for acquiring locks, performing checks on pointers, handling a buffer etc.

If you're parsing bitmap images and want to only support the most common sub-format in BMP format, then

Perform a read of 26 bytes, check for file type and dib header size.⁶
parse further eg. the pixel array or the color profiles or the extra bit masks when there is specific case of compression of pixel data.
Be done with it.

and remember, always minimize the number of reads or poking with the file stream.

The case described in the beginning was a naive approach.

Please email me if you wish to discuss these results with me, I would be delighted for any kind of feedback. This is mostly a study based on my specific case with a better approach highlighted above.

Footnotes #

header->bf_type = (byte2 << 8) | byte1 or if you read into a short int then simply header->bf_type = id perhaps ↩︎

Let us define an exemplary header

1struct __attribute((packed)) BITMAPHEADER {
2    u16 bf_type;
3    u32 bf_fsize;
4    u32 bf_reserved;
5    u32 bf_pixels_offset;
6
7    /* rest of the fields in the header */
8    ...
9};

Since we've manually set bf_type field already, we need to read into the structure from bf_fsize field.

1fread (&header->bf_fsize, 1, sizeof(header) - sizeof(header->bf_type), bitmap);

↩︎ ↩︎

Benchmark

 1long
 2get_time_nsec (void)
 3{
 4#ifndef ICE_DONT_BENCHMARK_CODE
 5struct timespec time;
 6if (timespec_get (&time, TIME_UTC) == 0)
 7  fprintf (stdout,"Error: Filling timespec structure failed with timespec_get()\n");
 8return time.tv_nsec;
 9#endif
10}

↩︎

-Wall -Wextra -Werror -O3 ↩︎
((original_value - new_value) / original_value) * 100 = X%. Increaese/Decrease depends on whether the original_value was larger/smaller respectively. ↩︎ ↩︎ ↩︎ ↩︎

Parsing the bitmap header in two reads. Error checking and some details have been omitted for the sake of brevity.

 1struct __attribute ((packed)) BITMAP_HEADER
 2{
 3    u16 bf_id;
 4    u32 bf_fsize;
 5    u32 bf_reserved;
 6    u32 bf_pixels_offset;
 7
 8    u32 bi_size;
 9    i32 bi_width;
10    i32 bi_height;
11    u16 bi_panes;
12    u16 bi_bitcount;
13    u32 bi_compression;
14    u32 bi_sizeimage;
15    u32 bi_xpixels_permeter;
16    u32 bi_ypixels_permeter;
17    u32 bi_clrused;
18    u32 bi_clrimportant;
19}; // 54-bytes
20
21struct BITMAP_HEADER*
22read(FILE* bitmap)
23{
24    struct BITMAP_HEADER *metadata = calloc(1, sizeof(*metadata));
25    fread(data, 1, 26, bitmap);
26    if(metadata->bf_id != 0x4d42) // little-endian machine, use switch if you wish to support multiple formats or something
27        ...             // or perhaps do some macro magic, dealer's choice
28    if(metadata->bi_size != 40)
29        ...
30    return metadata;
31}

↩︎ ↩︎