Ethernet and Coffee

A module-free way to process delimited data with row independence in Perl

Example CSV input content (myfile.csv):

last,first,phone,zipcode,country
jones,jim,314-555-1212,63033,usa
smith,john,314-555-1001,63146,usa
doe,jane,314-555-0019,63141,usa
smith,jim,314-555-1210,65401,usa
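The first row names the columns, so each later row can be stored in its own hash keyed by those names. That per-row hash is what gives this approach its row independence. Here is a minimal standalone sketch of the mapping for the first data row, using a Perl hash slice. The variable names `$header`, `$line` and `%row` are illustrative and do not come from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header row and first data row, copied from the sample myfile.csv above
my $header = "last,first,phone,zipcode,country";
my $line   = "jones,jim,314-555-1212,63033,usa";

# The header names become the hash keys
my @fields = split /,/, $header;

# Hash slice: each key gets the value in the same position of the data row
my %row;
@row{@fields} = split /,/, $line;

print "$row{first} $row{last} lives in zip $row{zipcode}\n";
# prints: jim jones lives in zip 63033
```

Because every value is looked up by column name instead of by position, the code that uses a row does not care about column order.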

The script below reads, processes, and hashes the CSV input data:

#!/usr/bin/perl -w
#
# simplecsv.pl - a quick and module-free way to deal with CSV data
#

use strict;

# declare vars
my (@lines,@fields,%csvhash,$key,@keys,$i,%allzips,%firstnames);

# read the input file into an array, stopping if it can't be opened
open( my $input, '<', 'myfile.csv' ) or die "can't open myfile.csv: $!\n";
@lines = <$input>;
close $input;


# take the first line and create a list of headers out of it
chomp $lines[0];
@fields = split( /,/, $lines[0] );

print "top row fields from csv file are: @fields\n\n";

# delete that first line now
shift @lines;

# now process each remaining line
foreach my $line ( @lines ) {
  chomp $line;
  # Skipping if the line is empty or a comment
  next if ( $line =~ /^\s*$/ ); # skip empty lines
  next if ( $line =~ /^\s*#/ ); # skip lines that start with #

  # store this line into a hash table keyed with the header fields above;
  # the -1 limit keeps trailing empty fields instead of dropping them
  @csvhash{ @fields } = split( /,/, $line, -1 );

  # print some data as we parse each line
  print "$csvhash{first} $csvhash{last} lives in $csvhash{country} and can be reached at $csvhash{phone}\n";

  # store the data in some hash tables using the header names
  $allzips{$csvhash{zipcode}}++;
  $firstnames{$csvhash{first}}++;
}

# now that we have all the data stored, do something with it


print "\n";

# count and display the unique zip codes
@keys = sort keys %allzips;
$i = scalar @keys;
print "we saw $i unique zip codes, they are: @keys\n\n";

# count and display the first names and the number of times that each appeared
@keys = sort keys %firstnames;
$i = scalar @keys;
print "we saw $i first names, they are:\n";
print "\t name\t occurrences\n";
foreach $key (@keys) {
  print "\t $key\t $firstnames{$key}\n";
}

Output from the program is as follows:

gvolk@wumpus:~/scripts$ ./simplecsv.pl
top row fields from csv file are: last first phone zipcode country

jim jones lives in usa and can be reached at 314-555-1212
john smith lives in usa and can be reached at 314-555-1001
jane doe lives in usa and can be reached at 314-555-0019
jim smith lives in usa and can be reached at 314-555-1210

we saw 4 unique zip codes, they are: 63033 63141 63146 65401

we saw 3 first names, they are:
         name    occurrences
         jane    1
         jim     2
         john    1
gvolk@wumpus:~/scripts$
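Slurping the whole file into @lines works well for a small file like myfile.csv. For larger inputs, the same approach (one header-keyed hash per row) can also read one line at a time. The sketch below is one possible variation and is not part of the original script. `count_column` is a hypothetical helper name, and an in-memory filehandle stands in for myfile.csv. The helper tallies any column by its header name with the same `$hash{$value}++` idiom the script uses for %allzips and %firstnames.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: stream rows from an open filehandle and count
# how often each value appears in the column with the given header name.
sub count_column {
    my ( $fh, $column ) = @_;

    # The first line holds the header names
    my $header = <$fh>;
    return {} unless defined $header;
    chomp $header;
    my @fields = split /,/, $header;

    my %counts;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip empty lines
        next if $line =~ /^\s*#/;    # skip lines that start with #

        # A fresh hash per row, so nothing carries over between rows;
        # the -1 limit keeps trailing empty fields
        my %row;
        @row{@fields} = split /,/, $line, -1;

        my $value = $row{$column};
        $counts{$value}++ if defined $value;
    }
    return \%counts;
}

# Usage: an in-memory filehandle stands in for myfile.csv here
my $csv = <<'END';
last,first,phone,zipcode,country
jones,jim,314-555-1212,63033,usa
smith,john,314-555-1001,63146,usa
doe,jane,314-555-0019,63141,usa
smith,jim,314-555-1210,65401,usa
END

open( my $fh, '<', \$csv ) or die "can't open in-memory file: $!\n";
my $zips = count_column( $fh, 'zipcode' );
close $fh;

print scalar( keys %$zips ), " unique zip codes: ",
    join( ' ', sort keys %$zips ), "\n";
# prints: 4 unique zip codes: 63033 63141 63146 65401
```

To read the real file, pass a handle from `open( my $fh, '<', 'myfile.csv' )` instead. Only the current line and the running counts are kept in memory.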