My data contains unescaped quotes within quoted fields or quote
characters in unquoted fields.

libcsv handles such malformed data by default; no special configuration
is required.  There are cases where such malformed data is ambiguous and
might not be parsed the way you would like; see the man page for libcsv
for details.  The csvfix and csvtest programs in the example directory
may be useful when trying to determine how libcsv will parse your data.
The csvvalid program can also be used to check for malformed data files.


My csv file contains comments that should not be parsed as csv data,
how can I handle this?

Although there is no direct support for comment handling in libcsv, you
can preprocess the data before sending it to libcsv.  For example, to
ignore all lines whose first non-space, non-tab character is a hash
(#), you could use the following piece of code:

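#include <stdio.h>
#include <stdlib.h>
#include <csv.h>

/* Signatures below follow csv.h from libcsv 3.x */

static int in_comment, in_record;

/* Field callback: s points to the field data, size is its length */
static void cb1(void *s, size_t size, void *d) { /* Data processing here */ }

/* Record callback: c is the record terminator, or -1 at end of input */
static void cb2(int c, void *d) { in_record = 0; /* Record handling here */ }

int main (void) {
  int c;
  char ch;
  struct csv_parser p;
  if (csv_init(&p, 0))
    return EXIT_FAILURE;

  while ((c = getchar()) != EOF) {
    ch = c;
    if (in_comment) {
      /* A comment runs to the next line feed or carriage return */
      if (ch == '\012' || ch == '\015') {
        in_comment = 0;
      }
    } else if (!in_record) {
      if (ch == ' ' || ch == '\t' || ch == '\012' || ch == '\015') {
        /* Skip leading whitespace and blank lines; libcsv reports no
           record for them, so in_record must stay 0 */
        ;
      } else if (ch == '#') {
        in_comment = 1;
      } else {
        in_record = 1;
        csv_parse(&p, &ch, 1, cb1, cb2, NULL);
      }
    } else {
      csv_parse(&p, &ch, 1, cb1, cb2, NULL);
    }
  }
  csv_fini(&p, cb1, cb2, NULL);
  csv_free(&p);
  return 0;
}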

If you determine that calling csv_parse for each character adds too
much overhead (measure before making this decision; it usually
doesn't), you can optimize this by filtering a larger number of
characters at a time and calling csv_parse on the resulting buffer.
If you know that your data is text-only, you can simplify this by
reading one line at a time, checking the first non-space character,
skipping the line if it is a comment character, and calling csv_parse
if it isn't.
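
The per-line check described above could be sketched as follows.  This
is only an illustration: is_comment_line is a hypothetical helper name,
not part of libcsv, and the hash character is assumed to be the comment
marker as in the example program.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcsv.
 * Returns 1 if the first character of line that is neither a space
 * nor a tab is '#', and 0 otherwise (including blank or empty lines). */
int is_comment_line(const char *line)
{
    size_t i = 0;

    /* Skip leading spaces and tabs */
    while (line[i] == ' ' || line[i] == '\t')
        i++;

    return line[i] == '#';
}
```

A caller would read each line (with fgets, for instance), skip it when
is_comment_line returns 1, and otherwise pass the line and its length
to csv_parse.  Note that this test looks at every physical line, so a
quoted field that spans several lines and has a continuation line
starting with a hash would lose that line.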


My data uses a semicolon as a delimiter instead of comma but otherwise
follows CSV conventions, how can I use libcsv to read my data?

Use the csv_set_delim function introduced in libcsv 2.0.0:
struct csv_parser p;    /* libcsv 3.x: the caller allocates the parser */
csv_init(&p, 0);
csv_set_delim(&p, ';');
...

You can use csv_set_delim to set the delimiter to any character.  Any
field that contains the delimiter must be quoted when using strict
mode.  Do not set the delimiter to the character used as the quote
character, to a space character, or to a line terminator character:
libcsv would be unable to tell whether that character is a field
delimiter, a quote, and so on.
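
If the delimiter comes from user input, a small check along these lines
could reject the problematic choices before csv_set_delim is called.
This is a sketch only: delim_is_usable is a hypothetical helper, and it
assumes the default settings, in which space and tab are the space
characters and carriage return and line feed are the line terminators.

```c
/* Hypothetical helper, not part of libcsv.
 * Returns 1 if delim can be told apart from the quote character, the
 * space characters, and the line terminators; returns 0 otherwise. */
int delim_is_usable(unsigned char delim, unsigned char quote)
{
    if (delim == quote)
        return 0;
    if (delim == ' ' || delim == '\t')   /* assumed default space characters */
        return 0;
    if (delim == '\r' || delim == '\n')  /* assumed default line terminators */
        return 0;
    return 1;
}
```

If you install your own space or line terminator function, the check
would need to consult those functions instead of the fixed characters.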


My data uses single quotes instead of double quotes for quoted
fields, how can I accommodate this?

Use the csv_set_quote function introduced in libcsv 2.0.0:
struct csv_parser p;    /* libcsv 3.x: the caller allocates the parser */
csv_init(&p, 0);
csv_set_quote(&p, '\'');
...

As with csv_set_delim, you can set the quote character to any
character.  Fields containing the quote character must still be quoted,
and each quote character inside a field is expected to be escaped by a
preceding instance of itself.  For example, if you use csv_set_quote to
change the quote character to a single quote, each single quote in
field data should be escaped by a preceding single quote.
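
To make the escaping rule concrete, here is a sketch of how a program
producing such data might quote a field.  quote_field is a hypothetical
helper written for this illustration, not a libcsv function.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcsv.
 * Writes the len bytes at src into dest as a quoted field, doubling
 * every occurrence of the quote character.  Returns the number of
 * bytes written, or 0 if dest is too small.  No terminating null
 * byte is added. */
size_t quote_field(char *dest, size_t dest_size,
                   const char *src, size_t len, char quote)
{
    size_t i, n = 0;

    if (dest_size < 2)
        return 0;

    dest[n++] = quote;                     /* opening quote */
    for (i = 0; i < len; i++) {
        size_t need = (src[i] == quote) ? 2 : 1;
        if (n + need + 1 > dest_size)      /* keep room for closing quote */
            return 0;
        if (src[i] == quote)
            dest[n++] = quote;             /* escape by doubling */
        dest[n++] = src[i];
    }
    dest[n++] = quote;                     /* closing quote */
    return n;
}
```

With a single quote as the quote character, the field data it's is
written as 'it''s', which is the form the parser expects after
csv_set_quote has been used to select the single quote.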


How can I preserve leading and trailing whitespace from non-quoted 
fields?

By default libcsv ignores leading and trailing spaces and tabs in
non-quoted fields, as this is the most common practice and is expected
by many applications.  The csv_set_space_func function introduced in
libcsv 2.0.0 allows you to specify a function that will return 1 if
the provided character should be considered a space character and 0
otherwise.  This allows you to change the characters that libcsv
ignores around unquoted fields.  If you create a function that always
returns 0, then no character will be recognized as a space character
and all characters will be preserved:

/* libcsv 3.x passes the character as unsigned char */
int my_space(unsigned char c) { return 0; }

struct csv_parser p;
csv_init(&p, 0);
csv_set_space_func(&p, my_space);
...
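
The same mechanism can be used more selectively.  As a sketch, the
function below treats only the space character as ignorable, so tabs
around unquoted fields would be preserved.  The name space_only is
made up for this example, and the unsigned char parameter assumes the
prototype used by libcsv 3.x.

```c
/* Hypothetical space function for csv_set_space_func.
 * Returns 1 only for the space character, so tabs are kept. */
int space_only(unsigned char c)
{
    return c == ' ';
}
```

It would be installed the same way as my_space above, by passing it to
csv_set_space_func.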


How can I remove leading and trailing whitespace from quoted fields?

By default libcsv removes surrounding space and tab characters from
unquoted fields but not from quoted fields.  The easiest way to remove
unwanted characters from a quoted field is inside the field callback
function: take the data provided to the callback function and perform
any manipulations directly on it.
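
One possible way to do the trimming is sketched below.  trim_field is
a hypothetical helper, not part of libcsv; it assumes that only spaces
and tabs should be removed and it leaves the original data untouched.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcsv.
 * Finds the part of the field that remains after leading and trailing
 * spaces and tabs are dropped.  Returns a pointer to the first byte
 * that is kept and stores the trimmed length in *out_len.  The data
 * is neither copied nor modified. */
const char *trim_field(const char *s, size_t len, size_t *out_len)
{
    size_t start = 0, end = len;

    while (start < end && (s[start] == ' ' || s[start] == '\t'))
        start++;
    while (end > start && (s[end - 1] == ' ' || s[end - 1] == '\t'))
        end--;

    *out_len = end - start;
    return s + start;
}
```

Inside the field callback you would call trim_field with the data
pointer and length that libcsv passes in, then work with the returned
pointer and length.  The callback is not told whether the field was
quoted, so this trims every field it is applied to.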


I want to be able to do things like extract or search on specific 
fields from a CSV file, sort a CSV file, etc. but the common UNIX 
utilities (cut, grep, sort, etc.) don't work on CSV data that contains 
fields with embedded commas or newlines, etc.  Are there any tools 
like this for managing CSV files?

Take a look at csvutils at https://github.com/rgamble/csvutils.
This package uses libcsv to provide a number of useful CSV utilities 
including csvcut, csvgrep, and others with option syntax resembling 
their non-CSV counterparts.