FAQ

My data contains unescaped quotes within quoted fields or quote
characters in unquoted fields.

libcsv handles such malformed data by default; no special configuration
is required. There are cases where such malformed data is ambiguous and
might not be parsed the way you would like; see the man page for libcsv
for details. The csvfix and csvtest programs in the example directory
may be useful when trying to determine how libcsv will parse your data.
The csvvalid program can also be used to check for malformed data files.

My CSV file contains comments that should not be parsed as CSV data.
How can I handle this?

Although there is no direct support for comment handling in libcsv, you
can preprocess the data before sending it to libcsv. For example, say
that you wish to ignore all lines whose first non-space, non-tab
character is a hash (#); you could use the following piece of code to
accomplish that:

    #include <stdio.h>
    #include <ctype.h>
    #include <stdlib.h>
    #include <csv.h>

    /* State shared between main() and the record callback. */
    int in_comment, in_record;

    void cb1(void *d, char *s, size_t size) { /* Data processing here */ }
    void cb2(void *d, char c) { in_record = 0; /* Record handling here */ }

    int main (void) {
      int c;
      char ch;
      struct csv_parser *p;

      if (csv_init(&p, 0))
        return EXIT_FAILURE;

      while ((c = getchar()) != EOF) {
        ch = c;
        if (in_comment) {
          /* A comment runs until the next LF or CR. */
          if (ch == '\012' || ch == '\015') {
            in_comment = 0;
          }
        } else if (!in_record) {
          if (ch == ' ' || ch == '\t' || ch == '\012' || ch == '\015') {
            ; /* Skip whitespace and blank lines before a record or comment */
          } else if (ch == '#') {
            in_comment = 1;
          } else {
            /* First significant character of a record */
            in_record = 1;
            csv_parse(p, &ch, 1, cb1, cb2);
          }
        } else {
          csv_parse(p, &ch, 1, cb1, cb2);
        }
      }

      csv_fini(p, ...);
      return 0;
    }

If you determine that calling csv_parse for each character takes too
much overhead (do some tests before making this decision, it usually
doesn't), you can optimize this by processing a larger number of
characters and calling csv_parse on a larger resulting buffer.

If you know that your data is text-only, you can simplify this by
reading one line at a time, checking the first non-space character,
skipping the line if it is a comment character, and calling csv_parse
if it isn't.

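The line-at-a-time check described above could be sketched roughly as
follows. The helper name is_comment_line is hypothetical and not part of
libcsv; it assumes the same comment rule as the example above (first
non-space, non-tab character is a hash).

```c
#include <stddef.h>

/* Hypothetical helper (not part of libcsv): returns 1 if the first
   character of the line that is neither a space nor a tab is '#',
   and 0 otherwise, including for empty or all-blank lines. */
int is_comment_line(const char *line, size_t len) {
  size_t i;
  for (i = 0; i < len; i++) {
    if (line[i] == ' ' || line[i] == '\t')
      continue;            /* still in leading whitespace */
    return line[i] == '#'; /* first significant character decides */
  }
  return 0;
}
```

A caller might read each line with fgets, skip it when is_comment_line
returns 1, and pass it to csv_parse otherwise. Note that this is only
safe when no quoted field contains an embedded newline, since a
continuation line that happens to start with a hash would be dropped.
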
My data uses a semicolon as a delimiter instead of a comma but otherwise
follows CSV conventions. How can I use libcsv to read my data?

Use the csv_set_delim function introduced in libcsv 2.0.0:

    struct csv_parser *p;
    csv_init(&p, 0);
    csv_set_delim(p, ';');
    ...

You can use csv_set_delim to set the delimiter to any character. Any
field that contains the delimiter must be quoted when using strict
mode. Be careful, though, not to set the delimiter to the quote
character, a space character, or a line terminator character, as libcsv
won't be able to determine whether the character is a field delimiter,
a quote, etc.

My data uses single quotes instead of double quotes for quoted
fields. How can I accommodate this?

Use the csv_set_quote function introduced in libcsv 2.0.0:

    struct csv_parser *p;
    csv_init(&p, 0);
    csv_set_quote(p, '\'');
    ...

As with csv_set_delim, you can set the quote character to any character,
but fields containing the quote character must still be quoted, and each
instance is expected to be escaped by an instance of itself. For example,
if you use csv_set_quote to change the quote character to a single quote,
instances of a single quote in field data should be escaped by a
preceding single quote.

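To illustrate the escaping rule, here is a minimal sketch of how a field
could be quoted with a custom quote character. The function quote_field
is a hypothetical helper written for this example, not a libcsv function;
later libcsv releases provide csv_write2 for a similar purpose.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libcsv): writes src to dest wrapped in
   `quote` characters, doubling each embedded quote character. Returns the
   number of bytes the quoted field needs; output is truncated (and never
   NUL-terminated) if that exceeds dest_size. */
size_t quote_field(char *dest, size_t dest_size,
                   const char *src, size_t src_size, char quote) {
  size_t n = 0, i;
  if (n < dest_size) dest[n] = quote;   /* opening quote */
  n++;
  for (i = 0; i < src_size; i++) {
    if (src[i] == quote) {              /* escape by doubling */
      if (n < dest_size) dest[n] = quote;
      n++;
    }
    if (n < dest_size) dest[n] = src[i];
    n++;
  }
  if (n < dest_size) dest[n] = quote;   /* closing quote */
  n++;
  return n;
}
```

With a single quote as the quote character, the four-byte field it's
would be written as 'it''s', which is the form a parser configured with
csv_set_quote(p, '\'') expects.
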
How can I preserve leading and trailing whitespace in non-quoted
fields?

By default libcsv ignores leading and trailing spaces and tabs in
non-quoted fields, as this is the most common practice and is expected
by many applications. The csv_set_space_func function introduced in
libcsv 2.0.0 allows you to specify a function that will return 1 if
the provided character should be considered a space character and 0
otherwise. This allows you to change the characters that libcsv
ignores around unquoted fields. If you create a function that always
returns 0, then no character will be recognized as a space character
and all characters will be preserved:

    int my_space(char c) { return 0; }
    struct csv_parser *p;
    csv_init(&p, 0);
    csv_set_space_func(p, my_space);
    ...

How can I remove leading and trailing whitespace from quoted fields?

By default libcsv removes surrounding space and tab characters from
unquoted fields but not from quoted fields. The easiest way to remove
unwanted characters from a quoted field is inside the field callback
function: simply take the data provided to the callback function and
perform any manipulations directly on it.

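As a rough sketch of such a manipulation, the hypothetical helper below
(trim_field is not a libcsv function) trims spaces and tabs from a
pointer-and-length pair of the kind a field callback receives. It does
not assume the data is NUL-terminated.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libcsv): trims leading and trailing
   spaces and tabs from the size bytes at s. Returns a pointer to the
   first kept byte and stores the trimmed length in *out_len. The data
   itself is not modified. */
const char *trim_field(const char *s, size_t size, size_t *out_len) {
  size_t start = 0, end = size;
  while (start < end && (s[start] == ' ' || s[start] == '\t'))
    start++;               /* skip leading whitespace */
  while (end > start && (s[end - 1] == ' ' || s[end - 1] == '\t'))
    end--;                 /* skip trailing whitespace */
  *out_len = end - start;
  return s + start;
}
```

Calling this at the top of the field callback affects quoted and
unquoted fields alike, which is harmless for unquoted fields because
libcsv has already trimmed them by default.
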
I want to be able to do things like extract or search on specific
fields from a CSV file, sort a CSV file, etc., but the common UNIX
utilities (cut, grep, sort, etc.) don't work on CSV data that contains
fields with embedded commas or newlines. Are there any tools like this
for managing CSV files?

Take a look at csvutils at https://github.com/rgamble/csvutils.
This package uses libcsv to provide a number of useful CSV utilities,
including csvcut, csvgrep, and others, with option syntax resembling
their non-CSV counterparts.