
UNIX Programming

"Chapter Three - Working with Files"

Chapter Outline

Lecture Notes

Working with Files

In this chapter we learn how to create, open, read, write, and close files.

UNIX File Structure

In UNIX, everything is a file.

Programs can use disk files, serial ports, printers and other devices in exactly the same way as they would use a file.

Directories, too, are special sorts of files.

Directories

As well as its contents, a file has a name and 'administrative information', i.e. the file's creation/modification date and its permissions.

The permissions are stored in the inode, which also contains the length of the file and where on the disk it's stored.

A directory is a file that holds the inode numbers and names of other files.

Files are arranged in directories, which also contain subdirectories.

A user, neil, usually has his files stored in a 'home' directory, perhaps /home/neil.

Files and Devices

Even hardware devices are represented (mapped) by files in UNIX. For example, as root, you mount a CD-ROM drive through the device file that represents it. Some important device files are:

/dev/console - This device represents the system console.
/dev/tty - This special file is an alias (logical device) for the controlling terminal (keyboard and screen, or window) of a process.
/dev/null - This is the null device. All output written to this device is discarded, and reading from it returns end of file immediately.

System Calls and Device Drivers

System calls are provided by UNIX to access and control files and devices.

A number of device drivers are part of the kernel.

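The system calls to access the device drivers include:

open - Open a file or device.
read - Read from an open file or device.
write - Write to a file or device.
close - Close the file or device.
ioctl - Pass control information to a device driver.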

Library Functions

To provide a higher-level interface to device and disk files, UNIX provides a number of standard libraries.

Low-level File Access

Each running program, called a process, has associated with it a number of file descriptors.

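When a program starts, it usually has three of these descriptors already opened. These are:

0 - Standard input
1 - Standard output
2 - Standard error

write

The write system call arranges for the first nbytes bytes from buf to be written to the file associated with the file descriptor fildes. It returns the number of bytes actually written, which may be less than nbytes, or -1 if an error occurred.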

With this knowledge, let's write our first program, simple_write.c:

Here is how to run the program and its output.

read

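The read system call reads up to nbytes bytes of data from the file associated with the file descriptor fildes and places them in the data area buf. It returns the number of bytes actually read, which may be less than nbytes; a return of 0 means the end of file was reached, and -1 indicates an error.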

This program, simple_read.c, copies the first 128 bytes of the standard input to the standard output.

If you run the program, you should see:

open

To create a new file descriptor we need to use the open system call.

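open establishes an access path to a file or device. If successful, it returns a new file descriptor (the lowest-numbered one not already in use) that can be used with read, write and the other system calls; if it fails, it returns -1.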

The name of the file or device to be opened is passed as a parameter, path, and the oflags parameter is used to specify actions to be taken on opening the file.

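The oflags are specified as a bitwise OR of a mandatory file access mode and other optional modes. The open call must specify one of the following file access modes:

O_RDONLY - Open for reading only.
O_WRONLY - Open for writing only.
O_RDWR - Open for reading and writing.

The call may also include a combination (bitwise OR) of the following optional modes in the oflags parameter:

O_APPEND - Place written data at the end of the file.
O_TRUNC - Set the length of the file to zero, discarding existing contents.
O_CREAT - Create the file, if necessary, with permissions given in mode.
O_EXCL - Used with O_CREAT, ensures that the caller creates the file. If the file already exists, open fails.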

Initial Permissions

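When we create a file using the O_CREAT flag with open, we must use the three-parameter form. mode, the third parameter, is made from a bitwise OR of the flags defined in the header file sys/stat.h. These are:

S_IRUSR - Read permission, owner.
S_IWUSR - Write permission, owner.
S_IXUSR - Execute permission, owner.
S_IRGRP - Read permission, group.
S_IWGRP - Write permission, group.
S_IXGRP - Execute permission, group.
S_IROTH - Read permission, others.
S_IWOTH - Write permission, others.
S_IXOTH - Execute permission, others.

For example, the call

    open("myfile", O_WRONLY | O_CREAT, S_IRUSR | S_IXOTH);

has the effect of creating a file called myfile, with read permission for the owner and execute permission for others, and only those permissions. (The umask, described next, can remove permissions from this set but never adds any.)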

umask

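The umask is a per-process value that encodes a mask of file permissions to be removed when a file is created.

You can change it by executing the umask command to supply a new value; a program can change its own umask with the umask system call.

The value is a three-digit octal value. The first digit applies to the user (owner), the second to the group and the third to others. Each digit is the result of ORing together values from 4 (block read permission), 2 (block write permission) and 1 (block execute permission); 0 blocks nothing.

For example, to block 'group' write and execute, and 'other' write, the umask would be:

Digit 1 (user): 0 - nothing blocked
Digit 2 (group): 2 | 1 = 3 - write and execute blocked
Digit 3 (other): 2 - write blocked

Values for each digit are ORed together; so digit 2 will have 2 | 1, giving 3. The resulting umask is 032.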

close

We use close to terminate the association between a file descriptor, fildes, and its file.

ioctl

ioctl is a bit of a rag-bag of things. It provides an interface for controlling the behavior of devices and their descriptors, and for configuring underlying services.

ioctl performs the function indicated by cmd on the object referenced by the descriptor fildes. The commands available depend on the device, and many are specific to a particular system.

Try It Out - A File Copy Program

We now know enough about the open, read and write system calls to write a low-level program, copy_system.c, to copy one file to another, character by character.



Running the program will give the following:

We used the UNIX time command to measure how long the program takes to run. It took two and a half minutes to copy the 1 MB file, because the program makes one read and one write system call for every byte.

We can improve on this by copying in larger blocks, which greatly reduces the number of system calls. Here is the improved copy_block.c program.

Now try the program, first removing the old output file:

The revised program took under two seconds to do the copy.

Other System Calls for Managing Files

Here are some system calls that operate on these low-level file descriptors.

lseek

The lseek system call sets the read/write pointer of a file descriptor, fildes. You use it to set where in the file the next read or write will occur.

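The offset parameter is used to specify the position and the whence parameter specifies how the offset is used. lseek returns the resulting offset, measured in bytes from the beginning of the file, or -1 on failure.

whence can be one of the following:

SEEK_SET - offset is an absolute position.
SEEK_CUR - offset is relative to the current position.
SEEK_END - offset is relative to the end of the file.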

fstat, stat and lstat

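The fstat system call returns status information about the file associated with an open file descriptor. The stat and lstat calls return the same information for a named file; they differ only when the name refers to a symbolic link, in which case lstat reports on the link itself and stat on the file it points to. The information is written to a structure of type struct stat.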

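The members of struct stat may vary between UNIX systems, but will include:

st_mode - File permissions and file-type information.
st_ino - The inode associated with the file.
st_dev - The device the file resides on.
st_uid - The user identity of the file owner.
st_gid - The group identity of the file owner.
st_atime - The time of last access.
st_ctime - The time of last change to permissions, owner, group or contents.
st_mtime - The time of last modification to contents.
st_nlink - The number of hard links to the file.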

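The permission flags are the same as for the open system call above. File-type flags include:

S_IFBLK - Block special device.
S_IFDIR - Directory.
S_IFCHR - Character special device.
S_IFIFO - FIFO (named pipe).
S_IFREG - Regular file.
S_IFLNK - Symbolic link.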

Other mode flags include:

S_ISUID - Set user ID on execution.
S_ISGID - Set group ID on execution.

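Masks to interpret the st_mode flags include:

S_IFMT - File type.
S_IRWXU - User read/write/execute permissions.
S_IRWXG - Group read/write/execute permissions.
S_IRWXO - Others' read/write/execute permissions.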

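There are some macros defined to help with determining file types. These include:

S_ISBLK - Test for block special file.
S_ISCHR - Test for character special file.
S_ISDIR - Test for directory.
S_ISFIFO - Test for FIFO.
S_ISREG - Test for regular file.
S_ISLNK - Test for symbolic link.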

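To test that a file doesn't represent a directory and has execute permission set for the owner and no other permissions, we can apply the following test to modes, the st_mode member filled in by stat:

    !S_ISDIR(modes) && (modes & (S_IRWXU | S_IRWXG | S_IRWXO)) == S_IXUSR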

dup and dup2

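The dup system calls provide a way of duplicating a file descriptor, giving two or more different descriptors that access the same file. dup returns a new descriptor, the lowest-numbered one available; dup2 makes a descriptor number you choose (closing it first if necessary) refer to the same file. Duplicated descriptors share a single file offset.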

The Standard I/O Library

The standard I/O library and its header file, stdio.h, provide a versatile interface to low-level I/O system calls.

Three file streams are automatically opened when a program is started. They are stdin, stdout, and stderr.

Now, let's look at the most commonly used stdio functions in turn.


fopen

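The fopen library function is the analog of the low-level open system call.

fopen opens the file named by the filename parameter and associates a stream with it. The mode parameter specifies how the file is to be opened. It's one of the following strings:

"r" or "rb" - Open for reading only.
"w" or "wb" - Open for writing, truncating the file to zero length.
"a" or "ab" - Open for writing, appending to the end of the file.
"r+" or "rb+" or "r+b" - Open for update (reading and writing).
"w+" or "wb+" or "w+b" - Open for update, truncating the file to zero length.
"a+" or "ab+" or "a+b" - Open for update, appending to the end of the file.

The b indicates a binary file; on UNIX systems it makes no difference.

If successful, fopen returns a non-null FILE * pointer. If it fails, it returns NULL and sets errno.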

fread

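The fread library function is used to read data from a file stream. Data is read into a data buffer given by ptr from the stream, stream. fread reads nitems records, each size bytes long, and returns the number of records (not bytes) successfully read.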

fwrite

The fwrite library call has a similar interface to fread. It takes data records from the specified data buffer and writes them to the output stream.

fclose

The fclose library function closes the specified stream, causing any unwritten data to be written.

fflush

The fflush library function causes all outstanding data on a file stream to be written immediately.

fseek

The fseek function is the file stream equivalent of the lseek system call. It sets the position in the stream for the next read or write on that stream.

fgetc, getc, getchar

The fgetc function returns the next byte, as a character, from a file stream. When it reaches the end of file, it returns EOF.

The getc function is equivalent to fgetc, except that it may be implemented as a macro, so its stream argument should not be an expression with side effects.

The getchar function is equivalent to getc(stdin) and reads the next character from the standard input.

fputc, putc, putchar

The fputc function writes a character to an output file stream. It returns the value it has written, or EOF on failure.

The function putc is equivalent to fputc, but it may be implemented as a macro.

The putchar function is equivalent to putc(c,stdout), writing a single character to the standard output.

fgets, gets

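The fgets function reads a string from an input file stream. It writes characters to the string pointed to by s until a newline is encountered, n-1 characters have been transferred, or the end of file is reached. Any newline read is kept in the string, and a terminating null byte is added. fgets returns s on success and a null pointer at end of file or on error.

The gets function is similar, except that it reads from the standard input and discards the newline. Because it has no way of knowing the size of the buffer, it can overflow it; use fgets instead.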

Formatted Input and Output

There are library functions for producing output in a controlled fashion.

printf, fprintf and sprintf

The printf family of functions format and output a variable number of arguments of different types.

Ordinary characters are passed unchanged into the output. Conversion specifiers cause printf to fetch and format additional arguments passed as parameters. They start with a %.

For example

which produces, on the standard output:

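Here are some of the most commonly used conversion specifiers:

%d, %i - Print an integer in decimal.
%o, %x - Print an integer in octal, hexadecimal.
%c - Print a character.
%s - Print a string.
%f - Print a floating-point number in fixed-point notation.
%e - Print a floating-point number in exponential notation.
%g - Print a floating-point number in %f or %e style, whichever suits the value.
%% - Print a literal % character.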

Here's another example:

This produces:

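Field specifiers are given as numbers immediately after the % character in a conversion specifier. They set a minimum field width, which makes it easy to line output up in columns: %10s right-justifies a string in ten columns, %-10s left-justifies it, and %8.2f prints a floating-point number in eight columns with two digits after the decimal point.

The printf function returns an integer, the number of characters written, or a negative value if an error occurs.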

scanf, fscanf and sscanf

The scanf family of functions work in a similar way to the printf group, except that they read items from a stream and place values into variables.

The format string for scanf and friends contains both ordinary characters and conversion specifiers.

Here is a simple example:

The call to scanf will succeed and place 1234 into the variable num given either of the following inputs.

Other conversion specifiers are:

Given the input line,

this call to scanf will correctly scan four items:

In general, scanf and friends are not highly regarded, for three reasons:

Other Stream Functions

Other library functions use either stream parameters or the standard streams stdin, stdout, stderr:

You can use the file stream functions to re-implement the file copy program, by using library functions.

Try It Out - Another File Copy Program

In this program, the character-by-character copy is accomplished using calls to the functions declared in stdio.h.

Running this program as before, we get:

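This time, the program runs in 3.7 seconds. That is much faster than the low-level character-by-character version, because the stdio library buffers the data and makes a system call only when a buffer fills or empties.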

Stream Errors

To indicate an error, many of the stdio library functions return out of range values, such as null pointers or the constant EOF.

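In these cases, the error is indicated in the external variable errno, which is declared in the header file errno.h. Because later calls, even successful ones, may change errno, check it immediately after the call that failed.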


You can also interrogate the state of a file stream to determine whether an error has occurred, or the end of file has been reached.

The ferror function tests the error indicator for a stream and returns non-zero if it is set, zero otherwise.

The feof function tests the end-of-file indicator within a stream and returns non-zero if it is set, zero otherwise.

You use it like this:

The clearerr function clears the end-of-file and error indicators for the stream to which stream points.

Streams and File Descriptors

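Each file stream is associated with a low-level file descriptor. The fileno function returns the descriptor a stream uses, and fdopen creates a new stream from an existing descriptor.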

You can mix low-level input and output operations with higher level stream operations, but this is generally unwise.

The effects of buffering can be difficult to predict.

File and Directory Maintenance

The standard libraries and system calls provide complete control over the creation and maintenance of files and directories.

chmod

You can change the permissions on a file or directory using the chmod system call. This forms the basis of the chmod shell program.

chown

A superuser can change the owner of a file using the chown system call.


unlink, link, symlink

We can remove a file using unlink.

The unlink system call decrements the link count on a file. When the count reaches zero and no process has the file open, the file is deleted.

The link system call creates a new link to an existing file.

The symlink system call creates a symbolic link to an existing file.

mkdir, rmdir

We can create and remove directories using the mkdir and rmdir system calls.

The mkdir system call makes a new directory with path as its name.

The rmdir system call removes an empty directory.

chdir, getcwd

A program can navigate directories using the chdir system call.

A program can determine its current working directory by calling the getcwd library function.

The getcwd function writes the name of the current directory into the given buffer, buf.

Scanning Directories

The directory functions are declared in a header file, dirent.h. They use a structure, DIR, as a basis for directory manipulation.

Here are these functions:

opendir

The opendir function opens a directory and establishes a directory stream.

readdir

The readdir function returns a pointer to a structure detailing the next directory entry in the directory stream dirp.

The dirent structure containing directory entry details includes the following entries:

d_ino - The inode of the file.
d_name - The name of the file.

telldir

The telldir function returns a value that records the current position in a directory stream.

seekdir

The seekdir function sets the directory entry pointer in the directory stream given by dirp.

closedir

The closedir function closes a directory stream and frees up the resources associated with it.

Try It Out - A Directory Scanning Program

1. The printdir function prints out the contents of a directory, indenting each entry, and recurses into subdirectories.


2. Now we move onto the main function:

The program produces output like this (edited for brevity):

How It Works

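After some initial error checking, using opendir to see that the directory exists, printdir makes a call to chdir to the directory specified. While the entries returned by readdir aren't null, the program checks to see whether the entry is a directory. If it is (and it isn't . or ..), printdir prints its name and calls itself recursively with a greater indentation depth. If it isn't, it prints the file entry indented by depth. When the directory has been scanned, printdir moves back up with chdir("..") and closes the directory stream with closedir.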

Here is one way to make the program more general.

You can run it using the command:

Errors

System calls and functions can fail. When they do, they indicate the reason for their failure by setting the value of the external variable errno.

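The values and meanings of the errors are listed in the header file errno.h. They include:

EPERM - Operation not permitted.
ENOENT - No such file or directory.
EINTR - Interrupted system call.
EIO - I/O error.
EBUSY - Device or resource busy.
EEXIST - File exists.
EINVAL - Invalid argument.
EMFILE - Too many open files.
ENODEV - No such device.
EISDIR - Is a directory.
ENOTDIR - Not a directory.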

There are a couple of useful functions for reporting errors when they occur: strerror and perror.

The strerror function maps an error number into a string describing the type of error that has occurred.

The perror function also maps the current error, as reported in errno, into a string and prints it on the standard error stream.

It's preceded by the message given in the string s (if not null), followed by a colon and a space. For example:

might give the following on the standard error output:

Advanced Topics

fcntl

The fcntl system call provides further ways to manipulate low-level file descriptors.

It can perform miscellaneous operations on open file descriptors.

The call

    fcntl(fildes, F_DUPFD, newfd)

returns a new file descriptor with a numerical value equal to or greater than the integer newfd.

The call

    fcntl(fildes, F_GETFD)

returns the file descriptor flags as defined in fcntl.h.

The call

    fcntl(fildes, F_SETFD, flags)

is used to set the file descriptor flags, usually just FD_CLOEXEC.

The calls

    fcntl(fildes, F_GETFL)
    fcntl(fildes, F_SETFL, flags)

respectively get and set the file status flags and access modes.

mmap

The mmap function creates a pointer to a region of memory associated with the contents of the file accessed through an open file descriptor.

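You can use the addr parameter to request a particular memory address. Passing 0 (a null pointer) lets the system choose the address, which is the usual practice.

The prot parameter is used to set access permissions for the memory segment. This is a bitwise OR of the following constant values:

PROT_READ - The segment can be read.
PROT_WRITE - The segment can be written.
PROT_EXEC - The segment can be executed.
PROT_NONE - The segment can't be accessed.

The flags parameter controls how changes made to the segment by the program are reflected elsewhere. The most common values are MAP_SHARED, which carries changes through to the file and makes them visible to other processes mapping it, and MAP_PRIVATE, which keeps changes private to this process. MAP_FIXED asks for the segment to be placed exactly at addr.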

The msync function causes the changes in part or all of the memory segment to be written back to (or read from) the mapped file.

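The part of the segment to be updated is given by the passed start address, addr, and length, len. The flags parameter controls how the update should be performed:

MS_ASYNC - Perform asynchronous writes.
MS_SYNC - Perform synchronous writes.
MS_INVALIDATE - Invalidate other mappings of the file so that they see the updated data.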

The munmap function releases the memory segment.

Try It Out - Using mmap

1. The following program, mmap_eg.c shows a file of structures being updated using mmap and array-style accesses.

Here is the definition of the RECORD structure. We then create NRECORDS versions of it, each recording its own number.

2. We now change the integer value of record 43 to 143, and write this to the 43rd record's string:

3. We now map the records into memory and access the 43rd record in order to change the integer to 243 (and update the record string), again using memory mapping:

Summary

This chapter showed how UNIX provides direct access to files and devices.





Copyright © 2001 by James L. Fuller, all rights reserved.