Systems Notes

00. C Quickstart

01. printf

02. Primitive Types

03. Operator Precedence

04. Variables

05. Declaring and Defining Functions

06. Compiling and Linking

07. Make

08. Memory

09. Memory Addresses

10. More printf

11. Pointers

12. Endianness

13. Arrays

14. Array Variables

15. Functions

16. Strings

17. Structs

18. Stack and Heap Memory

19. File Functions

20. Input

21. Signals

22. Exec

23. Fork & Wait

24. Redirection

25. System V IPC

26. Shared Memory

27. Semaphores

28. Pipes

29. Pipe Networking

30. Sockets

31. Select


C Quickstart

  • By convention, C source files have a .c file extension (e.g. dylan.c).
  • The C compiler we will be using is gcc (originally the GNU C Compiler, now the GNU Compiler Collection).
  • usage: $ gcc dylan.c
    • This will create a standalone executable file.
    • The default name for the output file is a.out
    • There is no preferred extension for C executable files. Think about programs like ls, ssh and chmod: these are all C programs, and none of them has a file extension.
    • You can provide your own output file name with the -o flag.
      • usage: $ gcc -o dj dylan.c
  • Compiled C programs are natively executable. To run them, just type ./program (e.g. $ ./a.out or $ ./dj).
  • The ./ is only needed because you probably compiled the file in a folder outside your PATH environment variable (if that is unclear, you can forget the previous sentence entirely for now).
  • Before moving on, you should write, compile and run the example program provided above. I promise it is 100% syntactically correct. (Yes, you may get a compiler warning message; this is the only time you’re allowed to ignore it.)

printf

  • The thinking person’s System.out.println
  • printf is the function normally used in C to print to standard out.
  • usage: printf( string, arg0, arg1, ...)
    • Sends string to standard out.
    • The first argument must be a literal string enclosed by ", as in the example program above.
    • string can contain special placeholder characters that are used to insert other values into the output.
    • %d is the placeholder to display a value as an int.
    • %lf is the placeholder to display a value as a double.
    • There are other placeholder characters that we will see later on.
    • If placeholder characters are used, then they will be replaced by the arguments following the string when printf is executed.
    • The value arguments can be either variables or literal values.
    • example: printf("these are numbers: %d %lf\n", 3, 845.273); would display: these are numbers: 3 845.273000
  • Once they get used to it, most people prefer printf’s value replacement system to Java’s string concatenation (+) inside System.out.println. In fact, Java has a System.out.printf because of it.
  • There is one difference between System.out.println and printf that you will find annoying. I leave it for you to discover…
  • As an exercise, add some printf statements to your existing example program. Try declaring variables and printing their values. Once you get the hang of it, try using the wrong formatting characters, see what happens…
  • Type      Placeholder
    int       %d
    long      %ld
    float     %f*
    double    %lf*
    char      %c
    string    %s
    pointer   %p
  • * %.xf or %.xlf will print x digits after the decimal point (e.g. %.2f prints 845.273 as 845.27).

Primitive Types

  • All C primitives are numeric. The only differences are floating point vs. integer and size of variable in memory.
  • Size can be platform dependent
  • sizeof(type) can be used to find the size in bytes (sizeof is an operator built into the language, so no header is needed).
Type     Size (bytes)   Range
char     1              -128 to 127
short    2              -32,768 to 32,767
int      4              -2^31 to 2^31 - 1
long     8              -2^63 to 2^63 - 1
float    4              about 7 digits of precision
double   8              about 15 digits of precision
  • char is an integer type, but can be used to refer to character literals as well.
    • char c = 97; and char c = 'a'; are both equally valid statements.
    • This also means you can perform arithmetic operations on chars natively.
  • Variables can be declared as unsigned. Unsigned variables do not use a bit to store the sign of the number, making the lower bound 0 and increasing the upper bound.
  • Note there is no boolean type in traditional C (C99 added bool through stdbool.h, but it is still just an integer type). In C, any number is a boolean value:
    • 0 is false
    • All other numeric values are true

Operator Precedence

  1. () [] -> .
  2. ! ~ ++ -- +(unary) -(unary) *(de-reference) &(address of) (type cast) sizeof
  3. * / %
  4. + -
  5. << >>
  6. < <= > >=
  7. == !=
  8. & (bitwise)
  9. ^
  10. |
  11. &&
  12. ||
  13. ?:
  14. = += -= *= /= %= ^= |= &= <<= >>=


Variables

  • C is statically typed, meaning every variable must be given a type.
  • Variables must be declared before they are used.
    • You can assign a variable a value at declaration (e.g. int x = 10;)
  • THERE IS NO DEFAULT VALUE FOR VARIABLES
    • In Java, everything got initialized to 0. That’s a thing of the past.
    • Remember that declaring a variable means requesting a piece of memory of the corresponding size to be used by your program. (int means you are asking for 4 bytes of memory.)
    • If you do not initialize (provide a value for) a variable, its initial value will be whatever happens to be in the piece of memory that was assigned to your variable. Sometimes, that’s 0, sometimes it’s 2167354. Who knows?
    • This is the cause of one of the most frustrating kinds of programming errors in C. Normally, if you run the same program twice, you will get the same result. If you don’t initialize a variable, you could run the same program twice and get two different results, because you’re not guaranteed that the variable will have the same value both times. (Common occurrence: you run your program and get lucky, and a variable starts out as 0. Then I run your work and the program crashes or does not give the required result because the variable starts out with some junk value.)
  • Variables can be declared as unsigned (e.g. unsigned int u;).
    • unsigned variables have a lower bound of 0 and a higher upper bound than their signed counterparts.
    • unsigned char uc; declares a 1 byte integer type that can hold any number in the range [0, 255] otherwise known as [0, 2^8 - 1], as in there are 8 bits used for the number.
    • unsigned variables don’t need to set aside a bit for the sign of the value, hence the larger upper bound.

Declaring and Defining Functions

  • Function and variable names are both examples of identifiers.
  • All identifiers must be declared before they can be used.
  • A function declaration provides its return type, name and parameters. This is also known as a function header.
  • double dylan(int jack);

Compiling and Linking

  • Compilers are more complex than straightforward source code –> executable code translators, and have multiple components.
  • To start, we’ll look at three major pieces of gcc: the preprocessor, the compiler and the linker.
  • Preprocessor
    • The preprocessor is, at its simplest interpretation, a text replacement system.
    • It modifies the source code file as text and produces text, as opposed to binary data.
    • All preprocessor commands start with # (e.g. #include <stdio.h>)
    • Note that preprocessor directives do not end in ;
    • A few basic preprocessor directives
      • #include
        • Adds the entire text of the included file.
      • #define
        • Usage: #define TEXT REPLACEMENT
        • Will replace every instance of TEXT with the provided REPLACEMENT.
        • Examples:
          • #define PI 3.14159
          • #define MESSAGE "Hello!"
          • printf("%s, %f\n", MESSAGE, PI); would turn into printf("%s, %f\n", "Hello!", 3.14159);.
        • Note that #define does not use =, because this is not an assignment.
        • You can also use #define to declare function-like macros.
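          • #define MAX(a, b) ((a) > (b) ? (a) : (b))
          • MAX(x, y) would turn into ((x) > (y) ? (x) : (y)). The parentheses keep the replaced text from mixing with surrounding operators; remember that this is text replacement, not a function call.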
      • #ifndef IDENTIFIER ... #endif
        • Conditional preprocessor statement.
        • If IDENTIFIER is not defined (for the preprocessor), then include all the lines of code up to the #endif.
        • If IDENTIFIER is defined, skip everything up to the #endif.
        • Example
          #ifndef PI
          #define PI 3.14159
          #endif
          
  • Compiler
    • Turns C source code into binary code.
    • The result is not an executable program.
    • Only one C file is compiled at a time.
    • The compiler checks called functions against their declared headers, but if a function is defined in a separate file, its code is not added at this step.
    • $ gcc -c <FILE> will run the preprocessor and compile stages only, creating a non executable binary object file. The resulting file will have an extension of .o.
    • Since an executable is not created, you can successfully compile, via $ gcc -c, a C file that does not have a main function.
  • Linker
    • Combines compiled binary code from multiple files into a single executable program.
    • Will automatically look for the compiled code of the standard library (some libraries whose headers are included using <>, such as the math library, need an extra flag like -lm).
    • If multiple definitions for any identifier are found, the linker will fail.
    • Must find exactly one definition for main.
    • If you provide gcc multiple .c files, it will compile each one individually and then run the linker on them together.
    • If you provide gcc any .o files, it will skip the compilation step for those files and then use them during linking.
    • You can mix and match .c and .o files for gcc, but it is not encouraged.
    • For example these would be good ways to compile a program, if you had files foo.c goo.c boo.c:
      • $ gcc foo.c goo.c boo.c
      • or:
         $ gcc -c foo.c
         $ gcc -c goo.c
         $ gcc -c boo.c
         $ gcc -o program foo.o goo.o boo.o
        

Make

  • Command line tool to help automate building programs with multiple files and dependencies.
  • Only compiles files that have been modified, or that rely on modified files.
  • Compiling instructions and file dependencies are put into a makefile.
  • Running $ make will look for a file called makefile or Makefile (you can specify a different file with the -f flag).
  • The main parts of makefiles are:
    • Targets: Things to make (usually executables or .o files)
    • Dependencies: Files or other targets needed to create a target.
    • Rules: How to create the target.
  • By default, make will run the first target in the makefile.
  • Make recursively goes through dependencies.
  • Make will check the modification timestamps for targets and dependencies and will only run the rules if the target is older than one or more of its dependencies.
  • Makefile Syntax:
       target: dependency0 dependency1 dependency2 ...
       TABrule
    
  • The rule goes on the line after the dependency list and must begin with a TAB character (not spaces). There should not be any space between the TAB and the rule.

  • Here is a makefile for a program made from three .c files: main.c, foo.c and goo.c.
  • main.c calls functions from foo.c
  • foo.c calls functions from goo.c
    all: main.o foo.o goo.o
        gcc -o program main.o foo.o goo.o
    
    main.o: main.c foo.h
        gcc -c main.c
    
    foo.o: foo.c foo.h goo.h
        gcc -c foo.c
    
    goo.o: goo.c goo.h
        gcc -c goo.c
    
  • This makefile creates the executable file program.
  • Since all is not a file and it is the first target, it will always run.
    • If instead, the first target was called program, then make would check the modification timestamp of that file.
  • main.o is the first dependency, so make will go to that target.
  • main.o depends on main.c and foo.h
  • The rest of the dependencies will go through in a similar way. Running $ make the first time would do the following:
    gcc -c main.c
    gcc -c foo.c
    gcc -c goo.c
    gcc -o program main.o foo.o goo.o
    
  • Notice the order of compilation and trace it through the makefile.
  • If goo.h is modified, the following would happen:
    gcc -c foo.c
    gcc -c goo.c
    gcc -o program main.o foo.o goo.o
    

Memory

  • Bits & Bytes
    • All digital data is binary, broken up into series of 1s and 0s.
    • Physically, this data can take many forms, electronic (high voltage | low voltage), optical (light on | light off), magnetic (+ | - magnetic charge) and so on.
    • A single 1 or 0 is a bit, a unit of digital data.
    • 8 bits make a byte.
      • Why 8? - Some people thought that was a good number. It used to be 4.
    • If you have a music file that is 5 MB (megabytes) large, that means it takes 5 million bytes, or 40 million bits to be represented digitally. That’s 40 million individual 1s and 0s in whatever physical form it may be.
  • Computer Memory 101
    • In order to get all this pointer stuff down, we have to be cool with understanding computer memory. So let’s do that.
    • In general, memory is the term used to describe the computer part that contains any active data. Active data includes things like:
      • All open applications, even ones running in the background.
      • All open files.
      • Operating system.
      • Background processes.
    • Let’s say a computer has 4 gigabytes of memory, that means it can handle 4 billion bytes of data open at once.
    • The important distinction to be made is between memory and storage.
      • Storage refers to data stored on disk (Hard Drive, SSD, Flash Drive, Floppy Disk…).
      • Storage is where data gets saved for the long term. At any given time, most computers will have a lot more data in storage than in memory.
      • For example, you might have the entire Pearl Jam discography stored on your hard drive, but you can only listen to one song at a time, so the song you’re currently listening to would be the only data in memory out of all the other songs in storage.
    • Memory is much faster to access than storage, which is why it is used. The downside is that memory is volatile, meaning it requires power to retain data (imagine losing everything when you shut your computer off), and it’s much more expensive than storage.
    • The most common form of memory is RAM (Random Access Memory). The more RAM you have, the more data you can have open at once.
      • Computers can use a concept called virtual memory, which will allocate unused disk space for memory purposes in the event that your RAM is full.
    • When you open a program/file (in the *nix world, everything is a file), the file is copied from storage into memory. Saving a file reverses this process, taking the changes you’ve made from memory and copying them into storage.
  • Interacting with Memory in your Programs
    • Just like any other program, when a program you write is run, it takes up memory space.
    • Every variable and function you write gets turned into bits which takes up memory space when run.
    • In this class, when we talk about the memory usage of a program, we will mostly be talking about variables.
    • For example, when you declare an int, that means your program will request 4 bytes of memory to store a value. See Primitive Types for the chart of types and sizes.



Memory Addresses

  • In order to understand pointers, we need to take a closer look at what a variable is
  • There are three key features to any variable in your code:
    1. The identifier: Name you use in code to refer to the variable.
    2. The value: Data that you store.
    3. The address: The location of data in memory.
  • Let’s focus on the address (the following explanation is somewhat simplified, the nitty gritty details are not necessary to understand the concept)
  • Memory is addressed by byte, starting at the first byte (0) and going up until the last accessible byte.
    • For example, if your computer has 4GB of RAM, then in theory memory addresses would go from 0 to ~4,000,000,000.
  • The number of potential memory addresses is limited by the processor, since a processor must be able to read an entire memory address within a single cycle. For modern, 64-bit computers, this means you could theoretically have 2^64 bytes of memory, though this number is practically limited by hardware.
  • The address space of a program is determined by the operating system (OS) when the program is run. Therefore it can be different each time.
  • You can get the address of any variable using the address of operator: &.
  • The %p placeholder character will print out a memory address in hexadecimal format: printf("x: %d, address of x: %p\n", x, &x);.
  • If you would prefer to see the address in decimal, you can use the placeholder for an unsigned long. (This will most likely result in a warning from gcc): printf("x: %d, address of x: %lu\n", x, &x);.



More printf

  • printf provides different ways of printing out values.
  • Remember that no matter how you write a number in your code, it is stored in memory in binary, so as far as printf is concerned, even printing out a value in decimal (base 10), requires translating the stored data.
  • There is no such thing as a “native” hexadecimal or decimal number. You write them in your code in some way, and they turn into binary when the program is compiled.
  • So printf takes the binary data, and displays it in a particular way based on the formatting character(s) provided.
  • %d : print a value as a signed decimal int.
  • %u : print a value as a decimal unsigned int.
  • %o : print a value as an octal number.
  • %x : print a value as a hexadecimal number.
    • %o and %x will always treat the value as if it were unsigned.
    • You can print a value out with %u or %d regardless of how it is declared, that doesn’t mean it will make sense, just that printf will convert the value accordingly.
  • h : modify the printed value to look at 2 bytes instead of 4.
  • hh : modify the printed value to look at 1 byte.
  • h and hh can modify u, d, o or x.
  • The code snippet below displays these options:
    unsigned int q = 2151686160;
    printf("%%d: %d\n", q);
    printf("%%u: %u\n", q);
    printf("%%o: %o\n", q);
    printf("%%x: %x\n", q);
    printf("%%hhx: %hhx\n", q);
    printf("%%hhu: %hhu\n", q);
    
  • When run, this prints (on my computer):
    %d: -2143281136
    %u: 2151686160
    %o: 20020020020
    %x: 80402010
    %hhx: 10
    %hhu: 16
    
  • Things to notice:
    • %d does not show the actual value we used. This has to do with how negative numbers are represented.
    • %hhx and %hhu print the least significant byte of the value (0x10, or 16). printf converts the value itself rather than reading its bytes in memory order, so this result does not depend on the endianness of the system. To see endianness, you have to look at the individual bytes in memory through a char pointer.



Pointers

  • Regular variables are designed to store values.
  • Pointers are variables designed to store memory addresses.
  • Pointers are variables, meaning they are an identifier for a value stored in memory at a particular address (see above for detail), the only difference is that a pointer is designed to store an address.
  • Pointers must be able to store the value of any potential memory address. On 64 bit computers, this means pointers have to be able to represent 64 bits, or 8 bytes.
  • Pointers are designed for addresses, which means they are natively unsigned.
  • Even though all pointers are the same size, we declare them using the type of the value pointed to.
  • * is used to declare a pointer variable.
  • int x = 5; int *p = &x;
  • Here p is a pointer variable that stores the address of the variable x.
  • Notice that p is a normal variable, and has its own, different, memory address.
  • If you’re thinking, “hey, this looks familiar… like object variables in Java”, you’re right! Object variables, or references, are Java’s pointers. You just don’t have as much control over them as we do in C. In fact, think about the error you get when you try to use an uninitialized object variable in Java: a null pointer exception, meaning the reference stored is 0 (null), which is an invalid memory address.
  • * is also used as the de-reference operator. This will return the value stored at the memory address pointed to by the pointer.
  • Given the definitions of x and p above:
  • int y = *p + 10; would set y to the value 15.
  • *p = y; would set the value at the memory address stored in p, to whatever the value stored in y is.
  • Consider the following C snippet:
      unsigned int i = 2151686160;
      unsigned int *ip = &i;
      char *cp = (char *)&i;
    
  • ip and cp will store the address of the first byte used to store i. Depending on the endianness of the system, that byte will either be 10000000 (big) or 00010000 (little).
  • Let’s just say that the first byte is located at memory address 3000 (using small number for ease of discussion)
  • If you perform ip++ and cp++, each pointer will be incremented by 1, but due to pointer arithmetic, ip will increase to 3004 and cp will increase to 3001. In essence, ip would move one int forward in memory, while cp only moves one byte forward.

Endianness

  • Some more detail on endianness
    • Forgetting about computer data for a moment, think about normal decimal numbers. In the number 2,354 we would say 2 is the most significant digit, because that 2 represents 2 thousand, the largest value of any digit in that number.
    • Generally, we write decimal numbers left–>right from most–>least significant.
    • Endianness is a similar concept, except instead of thinking of the significance of digits, we look at the significance of bytes.
      • Consider the value 261. In binary, that would be: 100000101, which is a 9 bit number.
      • To store 261 in an int, C will use 4 bytes, so it would really look more like this:
        • 00000000 00000000 00000001 00000101
      • Think about the significance of bytes in the same way you think about the significance of digits. In the above representation, the most significant byte comes first. Since we only need 9 bits (which is spread over 2 bytes) to represent 261, the first two bytes are all 0. The third byte, 00000001, represents the number 256.
      • Systems that use this representation are called big endian.
      • Other systems use the reverse order, going from least significant to most significant byte. These are called little endian.
      • 261 in little endian format would be:
        • 00000101 00000001 00000000 00000000
        • Notice that the individual bytes are in most->least significant bit order, but the order of the bytes is reversed.
      • Another example, 2,151,686,160
        • Big endian: 10000000 01000000 00100000 00010000
        • Little endian: 00010000 00100000 01000000 10000000

Arrays

  • An array is an allocated block of memory meant to hold multiple pieces of data of the same type.
  • C arrays do not have a length attribute/function.
  • We will use [] to access array elements.
  • The size of an array must be set at declaration and cannot be changed.
  • The size of an array cannot be dynamic.
  • There is no boundary checking (much more on this later).
  • Array declaration/access syntax:
       float ray[5];
       ray[2] = 8.22;
    
  • The above code requests a block of memory large enough for 5 floats (20 bytes), which then can be accessed using 0-based [] notation.

Array Variables

  • Array variables (not the arrays themselves) are pointers to the allocated array block.
  • Unlike standard pointers, array variables are immutable, meaning that you can never change the memory address an array variable points to.
  • In the previous example, ray is a variable that points to the beginning of the 20 bytes allocated to that array of floats.
  • The sizeof operator can be used to find the size of a given type (like float or char *), or the amount of memory associated with a given variable.
    • sizeof(ray) would return 20.
    • sizeof(ray) / sizeof(float) would return 5. It is more standard in C not to use this, instead using other constants to keep track of array sizes. Since array sizes must be set at compile time, you’re more likely to see something like this:
       #define ARR_SIZE 10
       double trouble[ ARR_SIZE ];
      

Array Variables & Pointers

  • Since array variables are pointers, we can assign normal pointers to array variables.
      float ray[5];
      float *rp = ray;
    
  • ray is immutable, so we could not do something like ray++, but rp is a normal pointer, so we could do rp++. Due to pointer arithmetic, rp++ would actually add 4 to rp.
  • sizeof(ray) would return 20, while sizeof(rp) would return 8, since rp is a pointer and only holds an 8 byte (on most systems) memory address.
  • This is commonly done, and because of pointer arithmetic, you can iterate through an array by using a pointer and incrementing it.

Array Indexing and [] Notation

  • The following two pieces of code perform the same task.
  • ray[3] and *(rp + 3)
  • In the second example, we add 3 to rp, which is the same address as the location for ray[3].
  • The de-reference operator (*), is then used to retrieve the value.
  • You can think of the standard [] notation in terms of specifying an offset from the beginning memory address of an array.
  • Arrays are 0-indexed because the first element is stored at the starting address, so you need not add to get to the correct memory address.
  • The a[i] notation is actually shorthand for:
    • *(a + i),
  • This means that you can use [] with pointer variables as well.
    • rp[3] is valid code.
  • You can write ray[-1] or rp[-1], which would go to the value 4 bytes (one float size) before the beginning of your array.
  • If you use an index past the end of an array allocation, you will be attempting to access the memory addresses past the end of the array.
  • In either case, going past an array allocation on either end is not advised. Your code will compile, but when run, at best you’ll access other variables within the program, at worst, you’ll crash.
  • WARNING: HORRIBLE SYNTAX AHEAD
    • Once again, *(a + i) is the same as a[i]
    • + is a commutative operation, meaning a + i == i + a
    • *(a + i) == *(i + a)
    • *(i + a) == i[a]
    • So ray[2] can also be written as 2[ray]. Try it once, then NEVER DO IT AGAIN!
    • Use this knowledge wisely.



C Functions

  • All functions in C are pass by value.
    • This means that the arguments are copied into new variables when the function is called.
    • As a result, normal values are not modified when passed into a function. Consider this function to swap two values:
      void swap(int a, int b) {
         int t = a;
         a = b;
         b = t;
      }
      
      //later on...
      int x = 10;
      int y = 5;
      swap(x, y);
      
    • In this example, when swap is run on x and y, a and b are created. The function swaps the values of a and b, but once the function finishes, it is popped off the call stack, and a and b are gone. x and y are left unchanged.
  • This is where pointers come in handy. If you pass a pointer to a memory address, then you can modify the value it points to. Look at this modified version of swap.
      void swap(int *a, int *b) {
         int t = *a;
         *a = *b;
         *b = t;
      }
    
      //later on...
      int x = 10;
      int y = 5;
      swap(&x, &y);
    
    • Now that swap takes pointers, we can de-reference the parameters to get at the values pointed to. When the function is called, a and b become copies of the addresses of x and y, so when swap finishes, the values will actually be swapped.
  • Passing pointers as arguments is actually what happens in Java when object variables are used. You may recall the phrase pass by reference; all that means is pass by value, where the value being passed is a memory address.
  • When you pass an array as an argument to a function, the entire array is not copied. Since array variables are pointers, all arrays are treated as regular pointers when passed into a function. The following function headers are equivalent:
    • void arr_func( int arr[]);
    • void point_func( int *arr);
    • It is generally preferred to use the second option, since it makes clear that the parameter is a normal pointer. It is possible to use the first option and think that something special is going on due to the array notation (but nothing is).



Strings in C

  • Strings are character arrays.
  • There is nothing special about the way character arrays work. Because strings are so useful, there are a few features of C that make working with them simpler.
  • By convention the last entry in a string character array is the NULL character (either 0, the number, or '\0', the character). This is not something that is guaranteed. If you want to create a string, you will need to make sure that there is a terminating NULL; if not, a number of string related functions will not work.
  • When you use "" to make a string literal:
    1. A character array large enough to store the string, including a terminating NULL, is created in memory.
    2. The characters of the string are stored in that array, and a terminating NULL is added.
  • String literals are immutable.
  • If a string literal is exactly repeated in code, the compiler will typically not create a new character array and will use the original array instead. This means references to the same immutable string literal usually refer to the same piece of memory.

  • Declaring Strings
    • There are 4 ways to declare strings in C (in each example, the numbers and strings used are randomly chosen, none have special meaning in C).
    • char s[256];
      • Declares a mutable array of 256 bytes.
      • No specific characters are saved to memory.
      • No guarantee of a NULL character at any position.
    • char s[256] = "Imagine";
      • Creates the immutable string literal "Imagine".
      • Declares a mutable array of 256 bytes.
      • Copies the string "Imagine", including a terminating NULL, into the first 8 bytes of the array s.
    • char s[] = "Tuesday";
      • Creates the immutable string literal "Tuesday".
      • Creates an 8 byte array, large enough for "Tuesday" and a terminating NULL, for the variable s.
      • Copies the string "Tuesday", including a terminating NULL, into the array s.
    • char *s = "Mankind";
      • Creates the immutable string literal "Mankind".
      • s becomes a pointer to that immutable string.
    • It is important to note that in the last example, an array is not created. In that case s is just a pointer to the memory location where the immutable string lives. If you want a mutable string, you cannot declare it this way.
  • Working With String Variables
    • Everything we’ve covered about pointers and arrays still holds true, string variables are pointers, either array pointers or normal pointers.
    • It is important to keep track of variables vs. values.
      • char s[10] = "Yankees";
        • In this example, s is an array variable that points to the 10 byte array allocation.
        • s is immutable, it cannot point to any other memory location.
          • s = "Mets"; is an error.
        • The values in the array s points to are not immutable. You can change the value of the string at any point.
          • s[0] = 'M'; is perfectly good.
      • char *s = "AL East Champions";
        • Here, s is a pointer.
        • As a pointer, you can change the value s points to.
          • s = "The Best"; is valid.
        • Since s points to an immutable string literal, you cannot change the value of the string.
          • s[0] = 'N'; is an error.
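
The difference between array strings and pointer strings can be sketched in code. This is only an illustration: the function names tuesday_array_size and edit_array_string are made up for this example, and the pointer is declared const char * (an assumption, the notes use char *) so the compiler catches writes to the literal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes of the array that
   char s[] = "Tuesday" creates, counting the terminating NULL. */
size_t tuesday_array_size(void) {
    char s[] = "Tuesday";
    return sizeof(s);            /* 7 characters + 1 terminating NULL */
}

/* Hypothetical helper: edits a mutable array string and copies the
   result into out, which must have room for at least 10 bytes. */
void edit_array_string(char *out) {
    char s[10] = "Yankees";               /* mutable array, filled from the literal */
    const char *p = "AL East Champions";  /* pointer to an immutable literal */

    s[0] = 'M';                  /* allowed: changes a value inside the array */
    p = "The Best";              /* allowed: p now points at a different literal */
    /* s = "Mets";  would not compile: an array variable cannot be reassigned */
    /* p[0] = 'N';  would not compile here because p is declared const */

    (void)p;                     /* p only illustrates reassignment */
    strcpy(out, s);              /* copies "Mankees" plus its terminating NULL */
}
```

Calling edit_array_string(buf) with a char buf[10] leaves the modified array contents in buf, while the literals themselves are never changed.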

struct

  • A struct is a custom data type that is a collection of values.
  • The following line creates a variable, s, whose type is an anonymous struct:
  • struct { int a; char x; } s;
  • struct { int a; char x; } is the full type of s, it is used syntactically the same way as int or float
  • We use the . operator to access a value inside a struct
  • s.a = 10;
  • s.x = '@';
  • Here is an example of creating and using a struct:

     #include <stdio.h>

     int main() {
         struct {int a; char x;} s0;   //variable of an anonymous struct type
    
         s0.a = 51;
         s0.x = '%';
    
         printf("s0: %d\t%c\n", s0.a, s0.x);
    
         return 0;
     }
    
  • It is preferable to prototype your structs, which will make it easier to create and work with multiple variables of the same struct type.
    • struct foo { int a; char x; };
    • Note that since we are not creating a variable, there is no name between the } and the ; at the end.
  • After creating a prototype for a struct, you can declare new variables of that type like so:
    • struct foo s;
    • You still must include the word struct.
  • It is typically better practice to prototype structs outside of any particular function.
    • Struct prototypes are most commonly found in .h files.
  • Here is an example of creating and using a struct with a prototype:
     #include <stdio.h>

     struct foo {int a; char x;};
    
     int main() {
    
         struct foo s0;
         struct foo s1;
    
         s0.a = 51;
         s0.x = '%';
    
         s1 = s0;   //copies every value in s0 into s1
         printf("s0: %d\t%c\n", s0.a, s0.x);
         printf("s1: %d\t%c\n", s1.a, s1.x);
    
         return 0;
     }
    
  • Pointers and Structs
    • You can make pointers to structs like pointers to primitives.
    • struct foo *p = &s;
    • One very important note, . takes precedence over *.
    • This means that *p.x is the same as *(p.x) which is almost certainly NOT what you want. (This will look for x inside p and de-reference that result).
    • To access a value in a struct via a pointer you need to do: (*p).x, that is, de-reference first, then get x.
    • In C, p->x is syntactic shorthand for (*p).x
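
A small sketch of the two pointer access styles described above. The helper names fill_foo and copy_foo are hypothetical and only serve the example.

```c
struct foo { int a; char x; };

/* Hypothetical helper: fills in a struct through a pointer. */
void fill_foo(struct foo *p, int a, char x) {
    (*p).a = a;   /* de-reference first, then get a */
    p->x = x;     /* shorthand for (*p).x = x */
}

/* Hypothetical helper: assignment copies the whole struct by value. */
struct foo copy_foo(const struct foo *p) {
    struct foo copy = *p;
    return copy;
}
```

Because assignment copies values, changing the original struct after calling copy_foo does not affect the copy.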

Stack and Heap Memory

  • Every program can have its own stack and heap.
  • Stack memory
    • Stores all normally declared variables (including pointers and structs), arrays and function calls.
    • Functions are pushed onto the stack in the order they are called, and popped off when completed.
    • When a function is popped off the stack, the stack memory associated with it is released.
  • Heap memory
    • Stores dynamically allocated memory.
    • Data will remain in the heap until it is manually released. (or the program terminates)
  • Dynamic memory allocation
    • malloc(size_t x)
      • Allocates x bytes of heap memory.
      • Returns the address at the beginning of the allocation
      • Returns a void *

          int *p;
          p = malloc( 5 * sizeof(int) );
        
    • free(void * p)
      • Releases dynamically allocated memory.
      • Has one parameter, a pointer to the beginning of a dynamically allocated block of memory.
      • Every call to malloc/calloc should have a corresponding call to free.
    • calloc(size_t n, size_t x)
      • Allocates n * x bytes of memory, ensuring every bit is 0.
      • Works like malloc in all other ways

          int *p;
          p = calloc( 5, sizeof(int) );
        
    • realloc(void *p, size_t x)
      • Changes the amount of memory allocated for a block to x bytes.
      • p must point to the beginning of a block.
      • Returns a pointer to the beginning of the block (this is not always the same as p)
      • If x is smaller than the original size of the allocation, the extra bytes will be released.
      • If x is larger than the original size then either:
        • If there is enough space at the end of the original allocation, the original allocation will be updated.
        • If there is not enough space, a new allocation will be created, containing all the original values. The original allocation will be freed.
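
The allocation functions above can be combined as in the following sketch. The function name sum_first_n and the starting capacity of 5 are assumptions made for illustration. Note that the result of realloc is stored in a temporary pointer first, since the block may move and a failed realloc returns NULL while leaving the original allocation in place.

```c
#include <stdlib.h>

/* Hypothetical helper: stores 0..n-1 in a heap array and returns the
   sum of the values, or -1 if an allocation fails. */
long sum_first_n(int n) {
    int capacity = 5;
    int *p = malloc(capacity * sizeof(int));   /* room for 5 ints */
    if (p == NULL) return -1;

    if (n > capacity) {
        /* Grow the block. The returned address is not always the same as p. */
        int *bigger = realloc(p, n * sizeof(int));
        if (bigger == NULL) {
            free(p);          /* the original block is still ours to free */
            return -1;
        }
        p = bigger;
    }

    long sum = 0;
    for (int i = 0; i < n; i++) {
        p[i] = i;
        sum += p[i];
    }

    free(p);                  /* one free for the one live allocation */
    return sum;
}
```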

File functions

  • open - <fcntl.h>
    • Adds a file to the file table and returns its file descriptor.
    • This will make the file accessible within a program via the returned file descriptor.
    • If open fails, -1 is returned, extra error information can be found in errno.
      • errno is an int variable that can be found in <errno.h>
      • Use strerror (in string.h) on errno to return a string description of the error
    • open( path, flags, mode )
      • mode
        • Only used when creating a file. Set the new file’s permissions using a 3 digit octal #
      • flags
        • Determine what you plan to do with the file, use the following constants and combine with |:
        • O_RDONLY
        • O_WRONLY
        • O_RDWR
        • O_APPEND
        • O_TRUNC
        • O_CREAT
        • O_EXCL: when combined with O_CREAT, will return an error if the file exists
      • examples:
        • open("foo.txt", O_RDONLY, 0)
        • open("goo.txt", O_WRONLY | O_APPEND | O_CREAT, 0644)
  • read - <unistd.h>
    • Read data from a file
    • read( fd, buff, n )
      • Read n bytes from fd’s file into buff
      • Returns the number of bytes actually read. Returns -1 and sets errno if unsuccessful.
      • buff must be a memory address (pointer or array), but can be to any type of data.
  • write - <unistd.h>
    • Write data to a file
    • write( fd, buff, n )
      • Write n bytes to the fd’s file from buff
      • Returns the number of bytes actually written. Returns -1 and sets errno if unsuccessful.
      • buff must be a memory address (pointer or array), but can be to any type of data.
  • lseek - <unistd.h>
    • Set the current position in an open file
    • lseek( file_descriptor, offset, whence )
      • offset
        • Number of bytes to move the position by. Can be negative.
      • whence
        • Where to measure the offset from
        • SEEK_SET: offset is evaluated from the beginning of the file
        • SEEK_CUR: offset is relative to the current position in the file
        • SEEK_END: offset is evaluated from the end of the file
    • Returns the number of bytes the current position is from the beginning of the file, or -1 (errno)
  • stat - <sys/stat.h>
    • Get information about a file (metadata)
    • stat( path, stat_buffer )
      • stat_buffer
        • Must be a pointer to a struct stat
        • All the file information gets put into the stat buffer.
        • Some of the fields in struct stat:
          • st_size: file size in bytes
          • st_uid, st_gid: user id, group id
          • st_mode: file permissions
          • st_atime, st_mtime: last access, last modification
            • These are time_t variables. We can use functions in time.h to make sense of them
            • ctime( time )
              • Returns the time as a string
              • time is type time_t *
  • opendir - <dirent.h>
    • Open a directory file
    • This will not change the current working directory (cwd), it only allows your program to read the contents of the directory file
    • opendir( path )
      • Returns a pointer to a directory stream (DIR *)
    • closedir - <dirent.h>
      • Closes the directory stream and frees the pointer associated with it.
      • closedir( dir_stream )
    • readdir - <dirent.h>
      • readdir( dir_stream )
      • Returns a pointer to the next entry in a directory stream, or NULL if all entries have already been returned.
    • struct dirent - <dirent.h>
      • Directory struct that contains the information stored in a directory file. Some of the data members:
      • d_name: Name of a file
      • d_type: File type as an int
      • Example usage:
          DIR * d;
          d = opendir( "somedir" );
          struct dirent *entry;
          entry = readdir( d );
          closedir(d);
        
    • rewinddir - <dirent.h>
      • rewinddir(d)
        • d must be a DIR * returned from opendir
        • Resets the directory stream of d to the beginning.
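
The file functions above can be put together as in this sketch. The function names write_then_read and count_entries are hypothetical, and close (from <unistd.h>) is used to release each descriptor even though it is not listed in this section.

```c
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes "hello" to path, moves back to the start,
   and reads the data into out. Returns the file size reported by stat,
   or -1 on any failure. */
long write_then_read(const char *path, char *out, size_t n) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        return -1;
    }

    const char msg[] = "hello";                 /* 6 bytes with the NULL */
    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg) ||
        lseek(fd, 0, SEEK_SET) == -1 ||         /* back to the beginning */
        read(fd, out, n) == -1) {
        close(fd);
        return -1;
    }
    close(fd);

    struct stat sb;                             /* metadata goes in here */
    if (stat(path, &sb) == -1) return -1;
    return (long)sb.st_size;
}

/* Hypothetical helper: counts the entries in a directory, or returns -1
   if the directory cannot be opened. */
int count_entries(const char *path) {
    DIR *d = opendir(path);
    if (d == NULL) return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        count++;                 /* entry->d_name is the name of this entry */
    }
    closedir(d);
    return count;
}
```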

Input

  • Command Line Arguments:
    • int main( int argc, char *argv[] )
      • Program name is considered the first command line argument
      • argc
        • number of command line arguments
      • argv
        • array of command line arguments as strings
  • stdin input
    • fgets - <stdio.h>
      • Read in data from a file stream and store it in a string.
      • fgets( char * s, int n, FILE * f );
        • Reads at most n-1 characters from file stream f and stores it in s, appends NULL to the end.
        • Stops at newline, end of file, or the byte limit.
        • File stream
          • FILE * type, more complex than a file descriptor, allows for buffered input.
          • stdin is a FILE * variable
    • fgets(s, 100, stdin)
  • Pulling data from strings
    • sscanf - <stdio.h>
      • Reads in data from a string using a format string to determine types.
      • sscanf( char *s, char * format, void * var0, void * var1, ... )
        • Copies the data into each variable.
        • example
          int x; float f; double d;
          sscanf(s, "%d %f %lf", &x, &f, &d);
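
A sketch of how fgets and sscanf are often paired: read a whole line first, then pull values out of it. The helper names read_line and parse_numbers are hypothetical. sscanf returns the number of values it successfully stored, which is a simple way to detect malformed input.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from stream f with fgets and
   strips the trailing newline. Returns 1 on success, 0 on end of file
   or error. */
int read_line(FILE *f, char *s, int n) {
    if (fgets(s, n, f) == NULL) return 0;
    for (int i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\n') {
            s[i] = '\0';
            break;
        }
    }
    return 1;
}

/* Hypothetical helper: pulls an int, a float and a double out of s.
   Returns how many of the three values were filled in. */
int parse_numbers(const char *s, int *x, float *f, double *d) {
    return sscanf(s, "%d %f %lf", x, f, d);
}
```

For example, read_line(stdin, line, sizeof(line)) followed by parse_numbers(line, &x, &f, &d) reads three numbers typed on one line.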
          

Signals

  • All these functions can be found in <signal.h>

  • kill(pid, signal)
    • Returns 0 on success or -1 (errno) on failure.
    • Works like the command line kill program
  • sighandler
    • To intercept signals in a c program you must create a signal handling function.
    • Some signals (like SIGKILL, SIGSTOP) cannot be caught.
    • static void sighandler( int signo )
      • Must be void and must take a single int parameter. The handlers in these notes are also declared static.
      • static
        • Static values in c exist outside the normal call stack, they can be accessed regardless of the function at the top.
        • For variables, this also means they retain their value even if the function they are declared in has ended.
        • Static values (variables and functions) can only be accessed from within the file they are declared.
  • signal
    • Attach a signal to a signal handling function
    • signal( SIGNUMBER, sighandler)
    • Note that you are passing the name of the signal handling function as a parameter.
  • signal/sighandler example:
    static void sighandler(int signo) {
      if ( signo == SIGUSR1 )
        printf("Who you talkin to?\n");
    }
    …
    signal(SIGUSR1, sighandler);
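
A complete sketch that combines kill, signal and a handler in one process. The function name signal_self is hypothetical, and the handler here records the signal in a flag instead of printing. The flag uses the type volatile sig_atomic_t, which is the standard type intended for values written inside a signal handler.

```c
#include <signal.h>
#include <unistd.h>

/* Set by the handler when SIGUSR1 arrives. */
static volatile sig_atomic_t got_usr1 = 0;

static void sighandler(int signo) {
    if (signo == SIGUSR1)
        got_usr1 = 1;
}

/* Hypothetical helper: attaches the handler, sends SIGUSR1 to this
   process, and reports whether the handler ran (1) or not (0).
   Returns -1 if kill fails. */
int signal_self(void) {
    got_usr1 = 0;
    signal(SIGUSR1, sighandler);          /* attach the signal to the handler */
    if (kill(getpid(), SIGUSR1) == -1)    /* -1 (errno) on failure */
        return -1;
    return got_usr1;
}
```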
    

Exec

  • <unistd.h>

  • A group of c functions that can be used to run other programs.
  • Replaces the current process with the new program.
  • execl
    • execl(path, command, arg0, arg1 … NULL)
    • path
      • The path to the program (ex: "/bin/ls" )
    • command
      • The name of the program (ex: "ls")
    • arg0
      • Each command line argument you wish to give the program. (ex: "-a", "-l")
      • The last argument must be NULL
  • execlp
    • execlp(path, command, arg0, arg1 … NULL)
    • Works like execl, except it uses the $PATH environment variable for commands.
    • For example, you can use "ls" as the path instead of "/bin/ls"
    • To check the $PATH environment variable, use: $ echo $PATH
  • execvp
    • execvp(path, argument_array)
    • argument_array
      • Array of strings containing the arguments to the command.
      • argument_array[0] must be the name of the program.
      • The last entry must be NULL
      • Like execlp, the path argument will use the $PATH environment variable.

String Parsing for execvp

  • strsep - <string.h>
    • Parse a string with a common delimiter
    • strsep( source, delimiters )
      • Locates the first occurrence of any of the specified delimiters in a string and replaces it with NULL
      • delimiters is a string, each character is interpreted as a distinct delimiter.
      • Returns the beginning of the original string, sets source to the string starting at 1 index past the location of the new NULL
      • Since source’s value is changed, it must be a pointer to a string (char **).
    • example
        char line[100] = "woah-this-is-cool";
        char *curr = line;
        char * token;
        token = strsep( &curr, "-" );
      • replaces the - after woah with NULL
      • returns a pointer to the w in "woah"
      • sets curr to point to the t in "this-is-cool"
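
Calling strsep in a loop is one way to build the argument array that execvp expects. The sketch below assumes arguments are separated by single spaces, and the function name split_args is hypothetical. The _DEFAULT_SOURCE definition is only there so glibc declares strsep when compiling with a strict -std option.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* lets glibc declare strsep under strict -std modes */
#endif
#include <string.h>

/* Hypothetical helper: splits line on spaces, storing a pointer to each
   piece in args and ending the array with NULL, as execvp requires.
   line is modified in place. max_args must be at least 1.
   Returns the number of arguments found. */
int split_args(char *line, char **args, int max_args) {
    int count = 0;
    char *curr = line;
    char *token;

    while (count < max_args - 1 && (token = strsep(&curr, " ")) != NULL) {
        if (*token != '\0')        /* skip empty pieces from repeated spaces */
            args[count++] = token;
    }
    args[count] = NULL;            /* the last entry must be NULL */
    return count;
}
```

After split_args(line, args, 8), the array can be passed along as execvp(args[0], args).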


Managing Sub-Processes

  • fork() - <unistd.h>

    • Creates a separate process based on the current one, the new process is called a child, the original is the parent.
    • The child process is a duplicate of the parent process.
    • All parts of the parent process are copied, including stack and heap memory, and the file table.
    • Returns 0 to the child and the child’s pid, or -1 (errno), to the parent.
    • If a parent process ends before the child, the child’s new parent pid is 1
  • wait - <sys/wait.h>
    • Stops a parent process from running until any child has exited.
    • Returns the pid of the child that exited, or -1 (errno), and gathers information about the child process (this is called reaping)
    • If multiple child processes exit, an arbitrary one will be reaped.
    • wait(status)
      • status is used to store information about how the process exited.
      • Status macros
        • Usage: MACRO( status )
        • WIFEXITED: True if child exited normally
        • WEXITSTATUS: The return value of the child
        • WIFSIGNALED: True if child exited due to a signal
        • WTERMSIG: The number of the signal that terminated the child
  • waitpid - <sys/wait.h>
    • Wait for a specific child
    • waitpid(pid, status, options)
      • pid
        • The pid of the specific child to look for
        • If -1, will wait for any child (normal wait)
      • options
        • Can set other behavior for waitpid, if 0, will work normally.
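
fork, execvp and waitpid are commonly used together: the child replaces itself with another program while the parent waits for it. The following is a sketch of that pattern. The function name run_and_wait and the exit code 127 for a failed exec are choices made for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program named in argv[0] in a child
   process and returns its exit value, or -1 if something went wrong.
   argv must end with NULL. */
int run_and_wait(char *const argv[]) {
    pid_t f = fork();
    if (f == -1) return -1;           /* fork failed, no child exists */

    if (f == 0) {
        /* Child: replace this process with the new program. */
        execvp(argv[0], argv);
        _exit(127);                   /* only reached if execvp failed */
    }

    /* Parent: wait for that specific child and reap it. */
    int status;
    if (waitpid(f, &status, 0) == -1) return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* the child's return value */
    return -1;                        /* e.g. the child was ended by a signal */
}
```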

Redirection

  • Changing the usual input/output behavior of a program

  • Command line redirection
    • >
      • Redirects stdout to a file.
      • Overwrites the contents of the file.
    • >>
      • Redirects stdout to a file by appending.
    • <
      • Redirect stdin from a file.
      • The file is treated exactly like stdin, for example scanf() will read up until a newline is found.
    • | (pipe)
      • Redirect stdout from one program to stdin of the next.
      • Very useful for chaining programs together.
  • Redirection in c programs
    • dup2 - <unistd.h>
      • dup2( fd1, fd2 )
      • Redirects fd2 to fd1
      • Any use of fd2 will now act on the file for fd1.
    • dup - <unistd.h>
      • Duplicates an existing entry in the file table.
      • Returns a new file descriptor for the duplicate entry.
      • dup( fd )
    • USING dup and dup2 together:

      fd1 = open("foo", O_WRONLY);
      backup_stdout = dup( STDOUT_FILENO ); // save stdout for later
      dup2(fd1, STDOUT_FILENO); //sets STDOUT_FILENO's entry to the file for fd1.
      dup2(backup_stdout, STDOUT_FILENO); //sets STDOUT_FILENO's entry to backup_stdout, which is stdout
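
Here is a fuller sketch of the same idea with error checks. The function name print_to_file is hypothetical. One detail that matters when mixing printf with dup2: printf output is buffered, so fflush is called before each switch to make sure the text goes to the intended destination.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: temporarily redirects stdout to the file at path,
   prints msg with printf, then restores stdout.
   Returns 0 on success, -1 on failure. */
int print_to_file(const char *path, const char *msg) {
    int fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd1 == -1) return -1;

    fflush(stdout);                              /* flush text meant for the real stdout */
    int backup_stdout = dup(STDOUT_FILENO);      /* save stdout for later */
    if (backup_stdout == -1 || dup2(fd1, STDOUT_FILENO) == -1) {
        if (backup_stdout != -1) close(backup_stdout);
        close(fd1);
        return -1;
    }

    printf("%s", msg);                           /* goes to the file */
    fflush(stdout);                              /* push it out before switching back */

    dup2(backup_stdout, STDOUT_FILENO);          /* stdout is stdout again */
    close(backup_stdout);
    close(fd1);
    return 0;
}
```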
      

System V IPC

  • There are many kinds of Inter-Process Communication, but there is a specific set of IPC features that have competing standardized implementations, these are:
    • Shared Memory
    • Semaphores
    • Message Queues
  • For these features, there are 2 standards:
    • Portable Operating System Interface (POSIX)
    • System V
    • Both of these cover more than IPC, and their functionality, though not their implementations, is mostly the same. We will be using System V IPC.
  • $ ipcs is a useful command line utility to see active System V IPC structures.

Shared Memory

  • A segment of heap-like memory that can be accessed by multiple processes.
  • Shared memory is accessed via a key that is known by any process that needs to access it.
  • Shared memory does not get released when a program exits.
  • 5 Shared memory operations
    • Create the segment (happens once) - shmget
    • Access the segment (happens once per process) - shmget
    • Attach the segment to a variable (once per process) - shmat
    • Detach the segment from a variable (once per process) - shmdt
    • Remove the segment (happens once) - shmctl
  • Using shared memory in C: Headers: <sys/shm.h> <sys/ipc.h> <sys/types.h>

    int *data;
    int shmd;
    shmd = shmget(KEY, sizeof(int), IPC_CREAT | 0640); //create and access
    printf("shmd: %d\n", shmd);
    data = shmat(shmd, 0, 0); //attach
    printf("data: %p\n", data);
    printf("*data: %d\n", *data);
    *data = * data + 10; //work with the segment as a normal pointer
    printf("*data: %d\n", *data);
    shmdt(data); //detach
    shmctl(shmd, IPC_RMID, 0); //remove the segment
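
To see the sharing happen, the sketch below lets a child process modify the segment while the parent reads the result. It uses IPC_PRIVATE in place of a well-known KEY (an assumption made so the example cannot collide with an existing segment), and the function name shared_add_ten is hypothetical.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child adds 10 to an int in shared memory and
   the parent returns the value it sees afterwards, or -1 on failure. */
int shared_add_ten(void) {
    int shmd = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0640);  /* create */
    if (shmd == -1) return -1;

    int *data = shmat(shmd, 0, 0);                 /* attach */
    if (data == (void *)-1) {
        shmctl(shmd, IPC_RMID, 0);
        return -1;
    }
    *data = 0;

    pid_t f = fork();                              /* child inherits the attachment */
    if (f == 0) {
        *data = *data + 10;                        /* work with it as a normal pointer */
        shmdt(data);                               /* detach in the child */
        _exit(0);
    }
    if (f > 0) waitpid(f, NULL, 0);                /* let the child finish first */

    int result = (f == -1) ? -1 : *data;           /* parent sees the child's change */
    shmdt(data);                                   /* detach in the parent */
    shmctl(shmd, IPC_RMID, 0);                     /* remove the segment */
    return result;
}
```

Unlike normal heap memory, which is copied by fork, the change made by the child is visible to the parent.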
    

Semaphores

  • IPC construct used to control access to a shared resource (like a file or shared memory).
  • Most commonly, a semaphore is used as a counter representing how many processes can access a resource at a given time.
    • If a semaphore has a value of 3, then it can have 3 active “users”.
    • If a semaphore has a value of 0, then it is unavailable.
  • Some semaphore operations are atomic, meaning they complete as a single indivisible step that cannot be interleaved with another process's operation on the same semaphore.
  • Semaphore operations
    • “Maintenance operations”
      • Create a semaphore
      • Set an initial value
      • Remove a semaphore
    • Traditional Semaphore Usage
      • Up(S) | V(S) - atomic
        • Release the semaphore to signal you are done with its associated resource
        • Pseudocode
          • S++
      • Down(S) | P(S) - atomic
        • Attempt to take the semaphore.
        • If the semaphore is 0, wait for it to be available.
        • Pseudocode
          • while (S == 0) { block } S--;
        • Important distinction: the blocking is not atomic. If a process checks the semaphore and it is unavailable, other processes can run, but if the semaphore is available, the process will immediately modify the semaphore.
  • Using semaphores in C
    • headers: <sys/types.h> <sys/ipc.h> <sys/sem.h>
    • semget
      • Create/Get access to a semaphore.
      • Returns a semaphore descriptor or -1 (errno)
      • semget( key, amount, flags )
        • key
          • Unique semaphore identifier
        • amount
          • Semaphores are stored as sets of one or more. The number of semaphores to create/get in the set.
        • flags
          • IPC_CREAT: create the semaphore and set value to 0
          • IPC_EXCL: Fail if the semaphore already exists and IPC_CREAT is on.
          • Includes permissions for the semaphore, combine with bitwise or (|).
    • semctl
      • Control the semaphore, including
        • Set the semaphore value
        • Remove the semaphore
        • Get the current value
        • Get/set semaphore metadata
      • semctl(descriptor, index, operation, data)
        • descriptor
          • The return value of semget
        • index
          • The index of the semaphore you want to control in the semaphore set.
        • operation
          • IPC_RMID: remove the semaphore
          • SETVAL: Set the value (requires data)
          • GETVAL: Return the value
      • data
        • Variable for setting/storing semaphore metadata
        • Type is union semun
        • You have to declare this union in your main c file on linux machines.
          • union semun {
              int val;                  //used for SETVAL
              struct semid_ds *buf;     //used for IPC_STAT and IPC_SET
              unsigned short  *array;   //used for SETALL
              struct seminfo  *__buf;
            };
            
        • union?
          • A c structure designed to hold only one value at a time from a group of potential values.
          • Just large enough to hold the largest piece of data it could potentially contain
    • semop
      • Perform an atomic semaphore operation
      • You can Up/Down a semaphore by any integer value, not just 1
      • semop( descriptor, operation, amount )
        • amount
          • The number of operations (struct sembuf entries) you want to perform on the semaphore set.
        • operation
          • A pointer to a struct sembuf
            • struct sembuf {
                unsigned short sem_num;
                short sem_op;
                short sem_flg;
              };
              
            • sem_num
              • The index of the semaphore you want to work on.
            • sem_op
              • Down(S): Any negative number
              • Up(S): Any positive number
              • 0: Block until the semaphore reaches 0
            • sem_flg
              • SEM_UNDO: Allow the OS to undo the given operation. Useful in the event that a program exits before it could release a semaphore.
              • IPC_NOWAIT: Instead of waiting for the semaphore to be available, return an error
    • Putting it all together.
      • Creating a single semaphore and initializing its value to 1:
        int semd = semget(KEY, 1, IPC_CREAT | IPC_EXCL | 0644);
        union semun us;
        us.val = 1;
        int r = semctl(semd, 0, SETVAL, us);
        
      • Removing a semaphore: semctl(semd, 0, IPC_RMID);
      • Getting a semaphore and upping and downing it:
        semd = semget(KEY, 1, 0); //get access
        struct sembuf sb;
        sb.sem_num = 0;
        sb.sem_flg = SEM_UNDO;
        sb.sem_op = -1; //setting the operation to down
        
        semop(semd, &sb, 1); //perform the operation
        printf("got the semaphore!\n");
        sleep(10); //simulate doing something.
        
        sb.sem_op = 1; //set the operation to up
        semop(semd, &sb, 1); //perform the operation
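
The pieces above can be combined into one self-contained sketch that creates a semaphore, downs it, ups it, and removes it, checking the value with GETVAL along the way. It uses IPC_PRIVATE in place of a well-known KEY (an assumption to avoid clashing with existing semaphores), the function name sem_down_up is hypothetical, and union semun is only declared on Linux, where the notes say it is needed.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

#ifdef __linux__
/* On Linux the program has to declare this union itself. */
union semun {
    int val;                  /* used for SETVAL */
    struct semid_ds *buf;     /* used for IPC_STAT and IPC_SET */
    unsigned short *array;    /* used for SETALL */
    struct seminfo *__buf;
};
#endif

/* Hypothetical helper: stores the semaphore's value after a Down and
   after an Up in the two out parameters. Returns 0 on success, -1 on
   failure. */
int sem_down_up(int *after_down, int *after_up) {
    int semd = semget(IPC_PRIVATE, 1, IPC_CREAT | 0644);   /* set of 1 */
    if (semd == -1) return -1;

    union semun us;
    us.val = 1;                                  /* one user at a time */
    if (semctl(semd, 0, SETVAL, us) == -1) {
        semctl(semd, 0, IPC_RMID);
        return -1;
    }

    struct sembuf sb;
    sb.sem_num = 0;                              /* first semaphore in the set */
    sb.sem_flg = SEM_UNDO;

    sb.sem_op = -1;                              /* Down(S) */
    int ok = (semop(semd, &sb, 1) != -1);
    if (ok) *after_down = semctl(semd, 0, GETVAL);

    sb.sem_op = 1;                               /* Up(S) */
    ok = ok && (semop(semd, &sb, 1) != -1);
    if (ok) *after_up = semctl(semd, 0, GETVAL);

    semctl(semd, 0, IPC_RMID);                   /* remove the semaphore */
    return ok ? 0 : -1;
}
```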
        

Pipes

  • A pipe is a conduit in memory between 2 separate processes on the same computer.

  • Pipes have 2 ends, a read end and a write end. Pipes act just like files (i.e. you can use read() and write() to send any kind of data).
  • Unnamed pipes have no external identifier (more on named pipes later).

  • Working with unnamed pipes: pipe - <unistd.h>
    • pipe( descriptors )
    • Create an unnamed pipe.
    • Open the unnamed pipe in the calling program twice, once for reading, and once for writing.
    • Returns 0 if the pipe was created, -1 if not.
    • descriptors
      • Array that will contain the descriptors for each end of the pipe. Must be an int array of size 2.
      • descriptors[0] is the read end.
      • descriptors[1] is the write end.
    • Example:

      #include <unistd.h>

      //it is useful to add these definitions to make your code more readable
      #define READ 0
      #define WRITE 1
      
      int main() {
        int fds[2];
        pipe( fds );
        char line[100];
      
        int f = fork();
        if (f == 0) {
          close( fds[READ] ); //it is a good idea to close the end of the pipe you are not using.
          write( fds[WRITE], "hello!", 7);
        }
        else {
          close( fds[WRITE] );
          read( fds[READ], line, sizeof(line) );
        }
      }
      

Named Pipes

  • Also known as FIFOs.
  • Same as unnamed pipes except FIFOs have a name that can be used to identify them via different programs.
  • Like unnamed pipes, FIFOs are unidirectional.
  • mkfifo
    • Shell command to make a FIFO
    • $ mkfifo name
  • mkfifo - <sys/types.h> <sys/stat.h>
    • mkfifo( name, permissions )
    • c function to create a FIFO
    • Returns 0 on success and -1 on failure
    • Once created, the FIFO acts like a regular file, and we can use open, read, write, and close on it.
    • FIFOs will block on open until both ends of the pipe have a connection.
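
A sketch of a FIFO used between a parent and a child process. The function name fifo_roundtrip is hypothetical, and unlink (from <unistd.h>) is used to remove the FIFO from the filesystem when done. Each open blocks until the other end has also been opened, which is why the two opens can happen in either order.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: creates a FIFO at path, has a child write
   "hello!" into it, and reads the message into out in the parent.
   Returns the number of bytes read, or -1 on failure. */
ssize_t fifo_roundtrip(const char *path, char *out, size_t n) {
    unlink(path);                            /* remove a leftover FIFO, if any */
    if (mkfifo(path, 0644) == -1) return -1;

    pid_t f = fork();
    if (f == -1) {
        unlink(path);
        return -1;
    }
    if (f == 0) {
        int wfd = open(path, O_WRONLY);      /* blocks until the reader opens */
        if (wfd == -1 || write(wfd, "hello!", 7) != 7)
            _exit(1);
        close(wfd);
        _exit(0);
    }

    ssize_t got = -1;
    int rfd = open(path, O_RDONLY);          /* blocks until the writer opens */
    if (rfd != -1) {
        got = read(rfd, out, n);
        close(rfd);
    }
    waitpid(f, NULL, 0);                     /* reap the child */
    unlink(path);                            /* remove the FIFO from the filesystem */
    return got;
}
```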

Pipe Networking

  • For the purposes of these notes, the words server and client will be used only to differentiate two programs with respect to their roles in a connection attempt, and not with respect to their usage once a connection is made.
    • Server will be the program that starts up and awaits an incoming connection.
    • Client will be the program that initiates a connection to a waiting server.
  • Handshake
    • A procedure to ensure that a connection has been established between 2 programs.
    • Both ends of the connection must verify that they can send and receive data to and from each other.
    • 3 way handshake
      • Client sends a message to the server (in TCP networking, this is called SYN). At this point, the server knows it can receive data.
      • Server sends a response to the client based on the client’s initial message (SYN_ACK). At this point the client knows it can receive and send data.
      • Client sends a response back to the server based on the server’s response (ACK). At this point, the server knows it can receive and send data.
  • 3 Way Handshake Implementation:
    • Setup
      • Server creates a FIFO (Well Known Pipe) and waits for a connection.
      • Client creates a “private” FIFO.
        • To use pid: sprintf(buffer, "%d", getpid() );
    • Handshake
      • Client connects to server and sends the private FIFO name. Client waits for a response from the server.
      • Server receives client’s message and removes the WKP.
      • Server connects to client FIFO, sending an initial acknowledgement message.
      • Client receives server’s message, removes its private FIFO.
      • Client sends response to server.
    • Operation
      • Server and client send information back and forth.
    • Reset
      • Client exits, server closes any connections to the client.
      • Server recreates the WKP and waits for another client.
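
The setup and handshake steps can be sketched with two FIFOs as follows. This is one possible arrangement, not a definitive implementation: the function names, the message contents (ACK and REPLY), and naming the private FIFO as the WKP path plus the client's pid are all assumptions made for the example. For simplicity the server and client run as parent and child of the same program.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ACK   "hello client"   /* hypothetical message contents */
#define REPLY "hello server"

/* Client side. Returns 0 on success, -1 on failure. */
static int client_handshake(const char *wkp) {
    char pp[128], buf[128];
    snprintf(pp, sizeof(pp), "%s.%d", wkp, (int)getpid());   /* private FIFO name */
    unlink(pp);
    if (mkfifo(pp, 0644) == -1) return -1;

    int to_server = open(wkp, O_WRONLY);             /* connect to the server */
    if (to_server == -1) { unlink(pp); return -1; }

    int from_server = -1;
    int ok = write(to_server, pp, strlen(pp) + 1) > 0    /* 1: send private FIFO name */
          && (from_server = open(pp, O_RDONLY)) != -1    /* wait for the server */
          && read(from_server, buf, sizeof(buf)) > 0     /* 2: server's message */
          && strcmp(buf, ACK) == 0;
    unlink(pp);                                      /* private FIFO no longer needed */
    ok = ok && write(to_server, REPLY, sizeof(REPLY)) > 0;   /* 3: respond */

    if (from_server != -1) close(from_server);
    close(to_server);
    return ok ? 0 : -1;
}

/* Server side. The WKP must already exist. Returns 0 on success. */
static int server_handshake(const char *wkp) {
    char pp[128], buf[128];
    int from_client = open(wkp, O_RDONLY);           /* blocks until a client connects */
    if (from_client == -1) return -1;

    int ok = read(from_client, pp, sizeof(pp)) > 0;  /* client's private FIFO name */
    unlink(wkp);                                     /* remove the WKP */
    int to_client = ok ? open(pp, O_WRONLY) : -1;    /* connect to the client */
    ok = to_client != -1
      && write(to_client, ACK, sizeof(ACK)) > 0      /* acknowledge */
      && read(from_client, buf, sizeof(buf)) > 0     /* client's final response */
      && strcmp(buf, REPLY) == 0;

    if (to_client != -1) close(to_client);
    close(from_client);
    return ok ? 0 : -1;
}

/* Hypothetical driver: parent acts as server, child as client. */
int handshake_demo(const char *wkp) {
    unlink(wkp);
    if (mkfifo(wkp, 0644) == -1) return -1;          /* server setup */
    pid_t f = fork();
    if (f == -1) { unlink(wkp); return -1; }
    if (f == 0) _exit(client_handshake(wkp) == 0 ? 0 : 1);
    int result = server_handshake(wkp);
    waitpid(f, NULL, 0);
    return result;
}
```

The WKP is created before fork here so the client can never try to open it before it exists. In a real pair of separate programs, the client would be started after the server.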

Server-Client Designs

  • There are a number of different ways to implement server/client systems.
  • Single Use Server
    • In this system, the server exits along with the client.
      1. Handshake
      2. Client: sends data to server.
      3. Server: gets response, processes data, responds.
      4. Client: deals with response.
    • When the client exits, so does the server.
  • Persistent Single Client Server
    • In this system, the server will communicate with a single client. When the client exits, the server will reset to handle a new client.
      1. Handshake
      2. Client: sends data to server.
      3. Server: gets response, processes data, responds.
      4. Client: deals with response.
    • When the client exits, the server goes back to step 1, awaiting a new client.
    • The only way to quit the server is via ctrl-c. This is fine, but it will leave the WKP on the filesystem. A cleaner exit would involve creating a signal handler that catches SIGINT and removes the WKP before exiting.
  • Forking Server
    • In this system, the server will create subservers for client communication. This will allow for multiple simultaneous connections.
    • The main server’s job is to wait for a connection and create subservers.
    • The subserver will handle all communication with the client.
    • The client can work exactly the same as the persistent server client.
      1. Server: creates WKP and blocks until connection.
      2. Client: creates PP, connects to WKP, sends PP name, blocks on connection to PP.
      3. Server: gets connection on WKP, creates subserver.
      4. Server: Closes & removes WKP.
      5. Server: Resets back to step 1.
      6. Subserver: reads PP name from client.
      7. Subserver: sends secret message to client.
      8. Client: gets subserver secret, removes PP, responds accordingly.
      9. Subserver: verifies response from client, completing 3-way handshake.

Sockets

  • A connection between 2 programs using network protocols.
    • This is usually between 2 computers, but does not have to be.
  • A socket corresponds to an IP (internet protocol) Address / Port pair.
  • To use a socket:
    1. create the socket: socket
    2. bind it to an address and port: bind
    3. listen & accept/initiate a connection: listen, accept, connect
    4. send/receive data
      • Functions vary depending on type of socket

Socket Protocols

  • Stream Sockets (TCP)
    • Reliable 2 way communication.
    • Must be connected on both ends.
      • 3 way handshake
    • Data is received in the order it is sent.
  • Datagram Sockets (UDP)
    • “Connectionless”: an established connection is not required.
    • Data sent may be received out of order (or not at all).
    • Cannot use the usual read and write function calls.

Sockets in C

  • Most functions and structures are in <sys/socket.h>

  • socket( domain, type, protocol )
    • Creates a socket, opens it like a file, returning a socket descriptor (int that works like a file descriptor)
    • domain: type of address
      • AF_INET or AF_INET6 or AF_UNSPEC
    • type
      • SOCK_STREAM or SOCK_DGRAM
    • protocol
      • Combination of domain and type settings
      • If set to 0 the OS will choose the correct protocol (TCP or UDP)
    • example: int sd = socket(AF_INET, SOCK_STREAM, 0);
  • getaddrinfo <sys/types.h> <sys/socket.h> <netdb.h>

    • System library calls use a struct addrinfo to represent network addresses (containing information like IP address, port, protocol…)

    • Will lookup information about the desired network address and get one or more matching struct addrinfo entries as a linked list.

    • getaddrinfo(node, service, hints, results)

    • node: String containing an IP address or hostname to lookup
      • If NULL, use the local machine’s IP addresses (all of them).
    • service: String with a port number or service name (if the service is in /etc/services)
    • hints: Pointer to a struct addrinfo used to provide settings for the lookup.
      • Think of this as a filter for the lookup that getaddrinfo performs. For example, if you only want to get IPv4 addresses, you can set the ai_family field of hints to AF_INET.
    • results: Pointer to a linked list of struct addrinfo containing entries for each matching address.
    • getaddrinfo will allocate memory for these structs. Since results will be a linked list of unknown size, you should use freeaddrinfo to release the entire linked list when you are done.
  • bind( socket descriptor, address, address_length) (server only)

    • Binds the socket to an address and port. This allows a server to have a set public address and port.

    • Returns 0 (success) or -1 (failure)

    • socket descriptor: return value of socket

    • address: pointer to a struct sockaddr representing the address.

    • address_length: Size of the address, in bytes

    • address and address_length can be retrieved from getaddrinfo.
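
Since getaddrinfo returns a linked list, code that uses it usually walks the list with the ai_next field. The sketch below just counts the entries. The function name count_addresses is hypothetical, and here hints is a local struct cleared with memset, which is an alternative to allocating it with calloc.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: looks up node/service and returns how many
   matching IPv4 TCP addresses were found, or -1 if the lookup failed. */
int count_addresses(const char *node, const char *service) {
    struct addrinfo hints, *results, *p;

    memset(&hints, 0, sizeof(hints));     /* unused fields must be 0 */
    hints.ai_family = AF_INET;            /* filter: IPv4 only */
    hints.ai_socktype = SOCK_STREAM;      /* filter: TCP only */

    if (getaddrinfo(node, service, &hints, &results) != 0)
        return -1;

    int count = 0;
    for (p = results; p != NULL; p = p->ai_next)   /* walk the linked list */
        count++;

    freeaddrinfo(results);                /* release the entire list */
    return count;
}
```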

Using getaddrinfo, socket & bind on a server

//use getaddrinfo
struct addrinfo * hints, * results;
hints = calloc(1,sizeof(struct addrinfo));
hints->ai_family = AF_INET;
hints->ai_socktype = SOCK_STREAM; //TCP socket
hints->ai_flags = AI_PASSIVE; //only needed on server
getaddrinfo(NULL, "9845", hints, &results);  //Server sets node to NULL, service is a string

//create socket
int sd = socket(results->ai_family, results->ai_socktype, results->ai_protocol);

bind(sd, results->ai_addr, results->ai_addrlen);

//DO STUFF

free(hints);
freeaddrinfo(results);
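
Since getaddrinfo hands back a linked list, it can help to see the list walked from start to finish. The sketch below is one possible way to do that, assuming IPv4 and TCP only. The helper name print_addresses is made up for this example and is not part of any library.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

//Hypothetical helper (not a library function): look up the IPv4/TCP
//addresses for node:service, print each one, and return how many
//entries were in the list (-1 if the lookup failed).
int print_addresses(const char *node, const char *service) {
  struct addrinfo hints, *results, *p;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;        //IPv4 only
  hints.ai_socktype = SOCK_STREAM;  //TCP socket
  if (node == NULL)
    hints.ai_flags = AI_PASSIVE;    //server style lookup

  int err = getaddrinfo(node, service, &hints, &results);
  if (err != 0) {
    //getaddrinfo returns its own error codes, decoded by gai_strerror
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
    return -1;
  }

  int count = 0;
  //follow ai_next until the end of the list
  for (p = results; p != NULL; p = p->ai_next) {
    struct sockaddr_in *addr = (struct sockaddr_in *)p->ai_addr;
    char ip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof(ip));
    printf("%s port %d\n", ip, ntohs(addr->sin_port));
    count++;
  }

  freeaddrinfo(results);  //releases every node in the list
  return count;
}
```

A call like print_addresses(NULL, "9845") would typically list the wildcard address a server binds to, while print_addresses("127.0.0.1", "9845") looks up a specific address the way a client would.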
  • listen (socket_descriptor, backlog) (server only)
    • Set a socket to passively await a connection.
    • Needed for stream sockets.
    • Does not block.
    • socket descriptor: return value of socket
    • backlog: Number of connections that can be queued up.
      • Depending on the protocol, this may not do much.
  • accept (server only)
    • Accept the next client in the queue of a socket in the listen state.
    • Used for stream sockets.
    • Performs the server side of the 3 way handshake
    • Creates a new socket for communicating with the client, the listening socket is not modified.
    • Returns a descriptor to the new socket
    • Blocks until a connection attempt is made
    • accept(socket_descriptor, address, address_length)
    • socket descriptor: descriptor for the listening socket
    • address: Pointer to a struct sockaddr_storage that will contain information about the new socket after accept succeeds.
    • address length: Pointer to a variable that will contain the size of the new socket address after accept succeeds.

Using listen and accept for servers

//use getaddrinfo (not shown)
//create socket
int sd = socket(results->ai_family, results->ai_socktype, results->ai_protocol);
//use bind
bind(sd, results->ai_addr, results->ai_addrlen);
listen(sd, 10);
int client_socket;
socklen_t sock_size;
struct sockaddr_storage client_address;
sock_size = sizeof(client_address);
client_socket = accept(sd,(struct sockaddr *)&client_address, &sock_size);

Using connect for clients

  • connect (client only) <sys/socket.h> <sys/types.h>
    • Connect to a socket currently in the listening state.
    • Used for stream sockets.
    • Performs the client side of the 3 way handshake
    • Implicitly binds the socket to a local address and port chosen by the system, so clients do not need to call bind
    • Blocks until a connection is made (or fails)
    • connect(socket descriptor, address, address length)
    • socket descriptor: descriptor for the socket
    • address: Pointer to a struct sockaddr representing the address.
    • address length: Size of the address, in bytes
    • address and address length can be retrieved from getaddrinfo()
    • Note that the arguments mirror those of bind()
//use getaddrinfo (not shown)
//create socket
int sd = socket(results->ai_family, results->ai_socktype, results->ai_protocol);

connect(sd, results->ai_addr, results->ai_addrlen);
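
To see the server and client calls working together, here is a rough sketch that runs both sides in one program: the parent listens and accepts, while a forked child connects over the loopback address and sends a message. The helper names make_socket and round_trip, and the use of SO_REUSEADDR, are assumptions made for this example only. Error handling is kept minimal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netdb.h>

//Hypothetical helper: if node is NULL, return a bound, listening socket
//on port; otherwise return a socket connected to node:port. -1 on failure.
int make_socket(const char *node, const char *port) {
  struct addrinfo hints, *results;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  if (node == NULL) hints.ai_flags = AI_PASSIVE;
  if (getaddrinfo(node, port, &hints, &results) != 0) return -1;

  int sd = socket(results->ai_family, results->ai_socktype, results->ai_protocol);
  int ok = (sd != -1);
  if (ok && node == NULL) {
    //server side: allow quick reuse of the port, then bind and listen
    int yes = 1;
    setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
    ok = bind(sd, results->ai_addr, results->ai_addrlen) == 0
         && listen(sd, 10) == 0;
  } else if (ok) {
    //client side: connect performs the handshake
    ok = connect(sd, results->ai_addr, results->ai_addrlen) == 0;
  }
  freeaddrinfo(results);
  if (!ok) {
    if (sd != -1) close(sd);
    return -1;
  }
  return sd;
}

//Hypothetical demo: child connects and sends msg, parent accepts and
//reads it into reply. Returns the number of bytes read, or -1.
int round_trip(const char *port, const char *msg, char *reply, size_t size) {
  int listen_sd = make_socket(NULL, port);  //listening before the fork
  if (listen_sd == -1) return -1;

  pid_t pid = fork();
  if (pid == -1) {
    close(listen_sd);
    return -1;
  }
  if (pid == 0) {  //child acts as the client
    close(listen_sd);
    int sd = make_socket("127.0.0.1", port);
    if (sd == -1) _exit(1);
    write(sd, msg, strlen(msg));
    close(sd);
    _exit(0);
  }

  //parent acts as the server, accept blocks until the child connects
  struct sockaddr_storage client_address;
  socklen_t sock_size = sizeof(client_address);
  int client_sd = accept(listen_sd, (struct sockaddr *)&client_address, &sock_size);
  int n = -1;
  if (client_sd != -1) {
    n = read(client_sd, reply, size - 1);
    if (n >= 0) reply[n] = '\0';
    close(client_sd);
  }
  close(listen_sd);
  waitpid(pid, NULL, 0);
  return n;
}
```

Because the parent calls listen before forking, the child's connect can succeed even if it runs before the parent reaches accept: the connection simply waits in the backlog queue.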

Select

  • select is a function that monitors multiple file descriptors, allowing a program to read from different sources.

  • select can also be used for writing to files, but we will not focus on that usage.

  • select will block on all the provided file descriptors, and return when any of them has data to be read.

  • select returns the number of file descriptors that have data to read, and modifies a parameter to indicate which descriptors are available. If an error occurred, it returns -1.

  • To use select, you create a set of potential file descriptors using the type fd_set. Most of what we want from select involves interacting with an fd_set variable by using various macros.

  • To use select we must:
    • Create an fd_set variable.
      • fd_set descriptors;
    • Clear the fd_set (FD_ZERO)
      • FD_ZERO( &descriptors );
    • Add file descriptors to the fd_set (FD_SET).
      • FD_SET(listen_socket, &descriptors);
    • Call select and wait until any of the provided descriptors are available.
      • select(max_descriptor+1, &descriptors, NULL, NULL, NULL);
      • The first argument is 1 more than the largest file descriptor in descriptors. This is an artifact of how select works: it only examines descriptors numbered below this value.
      • The second argument is the fd_set of descriptors waiting to be read from.
      • The third and fourth arguments are for descriptors for other actions, like writing.
      • The final argument allows you to set a timeout, if NULL, select will block indefinitely.
      • Returns the number of file descriptors that are ready (this is usually 1).
      • Modifies descriptors to contain only the descriptors that are available.
    • Once select returns, loop through the potential file descriptors and determine which ones are available (FD_ISSET).
      • FD_ISSET(listen_socket, &descriptors);
    • If you are going to use select multiple times, you’ll have to repeatedly zero out the fd_set and add the descriptors to it, since select modifies it.
    • It is usually a good idea to have a backup fd_set to keep all your descriptors in.
  • Putting it all together for a program that reads from a socket or stdin:
    fd_set read_fds;
    int listen_socket, client_socket;
    char buffer[100];
    
    FD_ZERO(&read_fds);
    //assume this function correctly sets up a listening socket
    listen_socket = server_setup();
    
    //add listen_socket and stdin to the set
    FD_SET(listen_socket, &read_fds);
    //add stdin's file descriptor
    FD_SET(STDIN_FILENO, &read_fds);
    
    int i = select(listen_socket+1, &read_fds, NULL, NULL, NULL);
    
    //if standard in, use fgets
    if (FD_ISSET(STDIN_FILENO, &read_fds)) {
      fgets(buffer, sizeof(buffer), stdin);
    }
    //if socket, accept the connection
    //assume this function works correctly
    if (FD_ISSET(listen_socket, &read_fds)) {
      client_socket = server_connect(listen_socket);
    }
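
The example above calls select once. When select is called in a loop, the backup fd_set idea mentioned earlier might look like the sketch below. It assumes two already-open readable descriptors (pipes work well for experimenting), and the helper name drain_rounds is invented for this example.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>

//Hypothetical helper: call select `rounds` times on fd_a and fd_b,
//reading one byte from each descriptor that is ready in that round.
//Returns the total number of bytes read, or -1 if select fails.
int drain_rounds(int fd_a, int fd_b, int rounds) {
  fd_set all_fds, read_fds;
  char c;
  int total = 0;

  //all_fds is the backup set, it is never passed to select
  FD_ZERO(&all_fds);
  FD_SET(fd_a, &all_fds);
  FD_SET(fd_b, &all_fds);
  int max_fd = fd_a > fd_b ? fd_a : fd_b;

  for (int r = 0; r < rounds; r++) {
    //select overwrites its argument, so start from a fresh copy
    read_fds = all_fds;
    if (select(max_fd + 1, &read_fds, NULL, NULL, NULL) == -1) {
      perror("select");
      return -1;
    }
    //more than one descriptor can be ready after a single call
    if (FD_ISSET(fd_a, &read_fds) && read(fd_a, &c, 1) == 1) total++;
    if (FD_ISSET(fd_b, &read_fds) && read(fd_b, &c, 1) == 1) total++;
  }
  return total;
}
```

Note that this blocks if more rounds are requested than there is data to read, since the timeout argument is NULL.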
    