Lab 6: File I/O

external data representation; fopen(), fclose(), fread() and fwrite(); struct practice

Goals

After this lab you will be able to

  1. Explain what XDR is, and how it relates to files.
  2. Open files with fopen() using the correct mode, and fclose() them.
  3. Handle error return codes from the relevant library functions.
  4. Read from and write to files with fread() and fwrite(), using simple arrays and structs.

Setup

  1. Add the new directory '6' to the root directory of your working copy and repository. This will be the working directory for all the instructions below.

External Data Representation using Files

A program's working memory of stack and heap data structures only exists while the program is running. To store data between runs, and to capture output, we can use the filesystem. The filesystem is a service provided by the OS that gives your programs access to files. A file is like a named array of bytes, and once created a file will persist until deleted, even when the computer is turned off.

You are familiar with files: text files like C source code; sound files like MP3s; executable files like your compiled programs. At the filesystem abstraction level, these are all the same thing: just a contiguous sequence of bytes. The interpretation of these bytes is up to your program.

Files are a common special case of the general problem of storing data outside a running program. In general this is called External Data Representation (XDR). Other examples of XDR occur when using databases or networking.

Files are identified by a path, which is a generalization of a filename and can be any of:

The programmer's interface to the filesystem is quite basic, with most of the work done with four abstract operations: OPEN(), READ(), WRITE() and CLOSE(). A less-used fifth operation, SEEK(), allows you to set the read/write position directly without reading or writing.

Almost every programming language supports a version of this interface. You may recognize it from Python. For the C programmer, this interface is provided by these four standard library functions declared in stdio.h:

FILE * fopen( const char * filename, 
              const char * mode);

size_t fwrite( const void * ptr, 
               size_t size, 
               size_t nitems, 
               FILE * stream);

size_t fread( void * ptr,
              size_t size, 
              size_t nitems, 
              FILE * stream);

int fclose( FILE *stream);

These calls closely match their abstract versions, except that fread() and fwrite() take separate size and nitems arguments, a convenient extension that makes it easy to work with arrays and structs (see the example code in "Files by example"). The following links give the specifications of each of these functions according to the Open Group standard:

Documentation is also available as man pages on your local computer. The advantages of the Open Group specifications are that they are sometimes better written, cover only the functionality supported by all standard implementations and often contain examples. The man pages will contain details that are specific to your local OS.

You should get used to reading documentation in these forms.

Unless you have a good reason, stick to the standard interfaces. This will make it easier (i) to port your code to another OS; and (ii) to find another programmer who can understand it. Also, new versions of an OS are more likely to implement the standard than to retain their previous quirks.

A note on interface design

These functions are a masterpiece of interface design. fopen() has the most complex functionality, but a very simple interface. fwrite() and fread() have the same interface to opposite functionality. Your calls to read and write look exactly the same, which makes it easy to write them correctly.

Useful extras

You may find these useful:

Files by example

Examples of using the file API as demonstrated in class, and beyond. Background on files and the interface specifications is provided in the sections above.

Write a simple array to a file

#include <stdio.h>

int main( int argc, char* argv[] )
{
  const size_t len = 100;
  int arr[len];

  // put data in the array
  // ...

  // write the array into a file (error checks omitted)
  FILE* f = fopen( "myfile", "w" ); 
  fwrite( arr, sizeof(int), len, f );
  fclose( f );

  return 0;
}

Read a simple array from a file

#include <stdio.h>

int main( int argc, char* argv[] )
{
  const size_t len = 100;
  int arr[len];

  // read the array from a file (error checks omitted)
  // note mode "r": opening with "w" would truncate the file to zero length
  FILE* f = fopen( "myfile", "r" ); 
  fread( arr, sizeof(int), len, f );
  fclose( f );

  // use the array
  // ...

  return 0;
}

Write an array of structs to a file, then read it back

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef  struct 
{
  int x,y,z;
} point3d_t;

int main( int argc, char* argv[] )
{
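  // expect two arguments: the number of points and a filename
  if( argc < 3 )
    {
      puts( "usage: <number of points> <filename>" );
      return 1;
    }

  const size_t len = atoi(argv[1]);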
  
  // array of points to write out
  point3d_t wpts[len];
  
  // fill with random points
  for( size_t i=0; i<len; i++ )
    {
      wpts[i].x = rand() % 100;
      wpts[i].y = rand() % 100;
      wpts[i].z = rand() % 100;
    }
  
  // write the array of structs to a file (error checks omitted)
  FILE* f1 = fopen( argv[2], "w" ); 
  fwrite( wpts, sizeof(point3d_t), len, f1 );
  fclose( f1 );
  
  // array of points to read in from the same file
  point3d_t rpts[len];
  
  // read the array from a file (error checks omitted)
  FILE* f2 = fopen( argv[2], "r" ); 
  fread( rpts, sizeof(point3d_t), len, f2 );
  fclose( f2 );
  
  if( memcmp( wpts, rpts, len * sizeof(rpts[0]) ) != 0 )
    puts( "Arrays differ" );
  else
    puts( "Arrays match" );
	 
  return 0;
}

Saving and loading an image structure, with error checking

This example shows the use of a simple file format that uses a short "header" to describe the file contents, so that an object of unknown size can be loaded.

Make sure you understand this example in detail. It combines elements from the examples above into a simple but realistic implementation of a file format.

/* saves an image to the filesystem using the file format:
   [ cols | rows | pixels ]
   where:
     cols is a uint32_t indicating image width
     rows is a uint32_t indicating image height
     pixels is cols * rows of uint8_ts indicating pixel grey levels
*/
int img_save( const img_t* img, const char* filename )
{
  assert( img );
  assert( img->data );

  FILE* f = fopen( filename, "w" ); 
  if( f == NULL )
    {
      puts( "Failed to open image file for writing" );
      return 1;
    }

  // write the image dimensions header
  uint32_t hdr[2];
  hdr[0] = img->cols;
  hdr[1] = img->rows;

  if( fwrite( hdr, sizeof(uint32_t), 2, f ) != 2 )
    {
      puts( "Failed to write image header" );
      fclose( f ); // do not leak the open file on error
      return 2;
    }    
  
  const size_t len = img->cols * img->rows;
  
  if( fwrite( img->data, sizeof(uint8_t), len, f ) != len )
    {
      puts( "Failed to write image pixels" );
      fclose( f ); // do not leak the open file on error
      return 3;
    }    

  fclose( f );
  return 0;
}

/* loads an img_t from the filesystem using the same 
   format as img_save().

   Warning: any existing pixel data in img->data is not free()d.
*/
int img_load( img_t* img, const char* filename )
{
  assert( img );

  FILE* f = fopen( filename, "r" ); 
  if( f == NULL )
    {
      puts( "Failed to open image file for reading" );
      return 1;
    }
    
  // read the image dimensions header:
  uint32_t hdr[2];
  
  if( fread( hdr, sizeof(uint32_t), 2, f ) != 2 )
    {
      puts( "Failed to read image header" );
      fclose( f ); // do not leak the open file on error
      return 2;
    }    
  
  img->cols = hdr[0];
  img->rows = hdr[1];

  // helpful debug:
  // printf( "read header: %u cols %u rows\n", 
  //	  img->cols, img->rows );
  
  // allocate array for pixels now we know the size
  const size_t len = img->cols * img->rows; 
  img->data = malloc( len * sizeof(uint8_t) );
  assert( img->data );

  // read pixel data into the pixel array
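  if( fread( img->data, sizeof(uint8_t), len, f ) != len ) 
     { 
       puts( "Failed to read image pixels" ); 
       free( img->data ); // discard the partly-filled pixel array
       img->data = NULL;
       fclose( f );       // do not leak the open file on error
       return 3;
     }    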

  fclose( f );
  return 0;
}

Usage:

  img_t img;
  img_load( &img, "before.img" );

  image_frobinate( img ); // manipulate the image somehow

  img_save( &img, "after.img" );

Task 1: Serialize an array of integers to a binary-format file

Extend the functionality of your integer array from Lab 5 to support saving and loading arrays from the filesystem in a binary format.

Fetch the header file "intarr.h". It contains these new function declarations:

/* LAB 6 TASK 1 */

/*
  Save the entire array ia into a file called 'filename' in a binary
  file format that can be loaded by intarr_load_binary(). Returns
  zero on success, or a non-zero error code on failure. Arrays of
  length 0 should produce an output file containing an empty array.
*/
int intarr_save_binary( intarr_t* ia, const char* filename );

/*
  Load a new array from the file called 'filename', that was
  previously saved using intarr_save_binary(). Returns a pointer to a
  newly-allocated intarr_t on success, or NULL on failure.
*/
intarr_t* intarr_load_binary( const char* filename );

Requirements

  1. Add and commit a single C source file called "t1.c" containing implementations of these two functions.
  2. The file must include the "intarr.h" header file.
  3. The code may call any other functions declared in "intarr.h". Your code will be linked against our reference implementation for testing, so make sure the submitted file does not contain functions with the same names.
  4. Use your own implementation of the intarr_t functions for your local testing. Note that you do not need to have completed all of Lab 5 to do this, but seek help right away if you have not completed intarr_create() at least.
  5. Performance hint: calls to fwrite() are relatively expensive. Try to use as few as you can.

Submission

Commit the single file "t1.c" to your repo in the lab 6 directory.

Task 2: Serialize an array of integers to a JSON text-format file

Extend the functionality of your integer array from Lab 5 to support saving and loading arrays from the filesystem in JSON, a common human- and machine-readable text format.

Sometimes it is useful for humans to be able to read your stored data, or to import your data into another program that does not understand your binary format. The most readable, portable XDR format is plain text. A popular syntax for text files is JSON (JavaScript Object Notation), which, as the name suggests, was originally an XDR format for web programs. It is easier to use and less verbose than the also-popular Extensible Markup Language (XML) and more expressive than the bare-bones Comma-Separated Values (CSV) formats you may have seen.

The down side of text formats is that they are:

  1. inefficient in space, since e.g. a four-byte integer (int32_t) could require up to 12 bytes to represent its minimum value of -2147483648 as a decimal string (11 characters plus a separator or terminator);
  2. inefficient in time, since parsing the text file to convert it back into a binary format is much more expensive than loading a binary file.

The header file "intarr.h" also contains new function declarations for this task; they are listed at the end of this section, before the requirements.

The standard library has two functions that can be very helpful for rendering text into files: fprintf() and fscanf().

They work just like the familiar printf() and scanf() but write to and read from FILE* objects instead of standard output and standard input. You should probably use these to solve this task.

Notice from their man pages that another pair of functions, snprintf() and sscanf(), is available to print to and scan from C strings. (sprintf() exists, but its lack of array length checking means it is not safe or secure to use. Always use snprintf().)

/* LAB 6 TASK 2 */

/*
  Save the entire array ia into a file called 'filename' in a JSON
  array text format that can be loaded by
  intarr_load_json(). Returns zero on success, or a non-zero error
  code on failure. Arrays of length 0 should produce an output file
  containing an empty array.
  
  The JSON output should be human-readable.

  Examples:

  The following line is a valid JSON array:
  [ 100, 200, 300 ]
  
  The following lines are a valid JSON array:
  [ 
   100, 
   200, 
   300 
  ]
*/
int intarr_save_json( intarr_t* ia, const char* filename );

/*
  Load a new array from the file called 'filename', that was
  previously saved using intarr_save_json(). The file may contain an array
  of length 0. Returns a pointer to a newly-allocated intarr_t on
  success (even if that array has length 0), or NULL on failure.
*/
intarr_t* intarr_load_json( const char* filename );

Requirements

  1. Add and commit a single C source file called "t2.c" containing implementations of these two functions.
  2. The other requirements of Task 1 apply.
  3. Hint: you should NOT create a single huge string in memory and write it out in one call to fwrite(). The string could require a huge amount of memory when your array is large. Since you chose an inefficient text format, you're not optimizing for speed, so don't worry about using many calls to fprintf().

Submission

Commit the single file "t2.c" to your repo in the lab 6 directory.

Lab complete.