Library: (visualscheme data delimited-text)
Use the following functions to parse CSV (delimited text) files, including large ones, and work with their data.
Form Name: default-delimiter
Signature | (default-delimiter [<delimiter>]) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Parameter |
Description | When called with no arguments, returns the currently defined default delimiter (by default, a comma: #\,). When called with one argument, which must be a character, it sets the global default delimiter to that character. Note that parameters in Scheme are like dynamic variables in Common Lisp: a different value can be bound for the dynamic extent of a block of code (typically with parameterize) without affecting other code that relies on the global value. |
Formal Parameters
Parameter | Description |
[<delimiter>] | [Optional] The character to be set as the default delimiter. |
Examples
> (import (visualscheme data delimited-text))
> (default-delimiter)
#\,
> (default-delimiter #\:)
> (default-delimiter)
#\:
Remarks
Setting the default delimiter makes sense when the delimited text file you wish to work with uses something other than a comma to separate the values.
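Because default-delimiter is a parameter, a temporary delimiter can also be bound with parameterize instead of changing the global value. The following is a minimal sketch, not a definitive recipe. It assumes that default-delimiter behaves like a standard parameter object, that parameterize is available from the (ironscheme) library, and that (dt-reader/open-with-defaults) consults (default-delimiter) at the moment the reader is created. The file path is hypothetical.

```scheme
(import (ironscheme)
        (visualscheme data delimited-text))

;; Hypothetical path to a pipe-delimited file (an assumption for illustration).
(define pipe-file "C:/data/example.psv")

;; Bind the delimiter to #\| only while the reader is being created.
(define csv
  (parameterize ((default-delimiter #\|))
    (dt-reader/open-with-defaults pipe-file)))

;; Outside the parameterize body the previous default is in effect again.
(default-delimiter)
```

Compared with calling (default-delimiter #\|) directly, this keeps the change local to one call site, which may matter when several files with different delimiters are processed by the same program.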
Form Name: make-string-pool
Signature | (make-string-pool) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Function |
Description | Returns a new instance of the StringPool class provided from the Sylvan CSV library on which the (visualscheme data delimited-text) library is based. |
Formal Parameters
None.
Examples
> (make-string-pool)
#<clr-type Sylvan.StringPool>
Remarks
The built-in StringPool from Sylvan CSV is good enough for most one-off files. If the same pool may be used with multiple files, or the process is expected to be long-running (e.g., a web service), then the InternPool class from the Ben.Collections.Specialized library, which is included as an alternative (see (make-intern-pool)), is a better choice.
Form Name: make-intern-pool
Signature | (make-intern-pool) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns a new instance of the InternPool class from the included Ben.Collections.Specialized library. |
Formal Parameters
None.
Examples
> (make-intern-pool)
#<clr-type Ben.Collections.Specialized.InternPool>
Remarks
See previous discussion for (make-string-pool).
Form Name: string-pool/string-factory
Signature | (string-pool/string-factory <pool>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Higher-order function. |
Description | Given a <pool> (which must be a StringPool instance), returns a function that, given a character buffer, an offset and a length, returns a string. |
Formal Parameters
Parameter | Description |
<pool> | An instance of the Sylvan CSV StringPool class. |
Examples
(let* ((opts (make-dt-reader-options))
       (string-pool (make-string-pool))
       (factory (string-pool/string-factory string-pool)))
  (begin
    (dt-reader-options/delimiter-set! opts (default-delimiter))
    (dt-reader-options/string-factory-set! opts factory)
    (let ((csv (make-dt-reader file-name opts)))
      csv)))
Remarks
When creating a CSV reader it’s necessary to set several options including the StringFactory to be used as a callback for returning a string from a buffer. The above code was modified from (dt-reader/open-with-defaults) source code to reflect a choice of StringPool and its factory method.
Form Name: intern-pool/string-factory
Signature | (intern-pool/string-factory <pool>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Higher-order function. |
Description | Similar to (string-pool/string-factory…) except the pool must be an instance of the InternPool. |
Formal Parameters
Parameter | Description |
<pool> | An instance of InternPool. |
Examples
(let* ((opts (make-dt-reader-options))
       (string-pool (make-intern-pool))
       (factory (intern-pool/string-factory string-pool)))
  (begin
    (dt-reader-options/delimiter-set! opts (default-delimiter))
    (dt-reader-options/string-factory-set! opts factory)
    (let ((csv (make-dt-reader file-name opts)))
      csv)))
Remarks
When creating a CSV reader it’s necessary to set several options including the StringFactory to be used as a callback for returning a string from a buffer. The above code was copied from (dt-reader/open-with-defaults) source code to reflect a choice of InternPool and its factory method.
If you compare it to the example for string-pool/string-factory you’ll see the code is identical except for the function used to create the string-pool and factory variables in the let* binding. In theory, any function taking a character array, an offset and a length and returning a string could be plugged in there.
Form Name: make-dt-reader
Signature | (make-dt-reader <file-path> [<options>]) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Given a valid <file-path> to a delimited text file, and an optional instance of the CsvDataReaderOptions class, returns a new CsvDataReader instance. |
Formal Parameters
Parameter | Description |
<file-path> | A valid path to a delimited text file. |
[<options>] | [Optional] An instance of the CsvDataReaderOptions class for configuring the new CsvDataReader instance created by this function. |
Examples
See the previous two code examples: intern-pool/string-factory and string-pool/string-factory.
Remarks
Unless you have very specific needs, it’s best to use (dt-reader/open-with-defaults <file-path>). If you do need to customize the reader, then this is the function to call. If the delimiter is the only option you want to change, you can instead set (default-delimiter <char>) before calling (dt-reader/open-with-defaults).
Form Name: dt-reader/open-with-defaults
Signature | (dt-reader/open-with-defaults <file-path>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Given a valid path to a delimited text file, returns a CsvDataReader instance with default options. |
Formal Parameters
Parameter | Description |
<file-path> | A valid path to a delimited text file. |
Form Name: dt-reader/read
Signature | (dt-reader/read <reader>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Given an instance of CsvDataReader, advances to the next “row” in the file being read. It must be called at least once to advance to the first row before data in the columns of a row can be inspected. Returns #t if successful, #f otherwise (when there are no more rows to be read). |
Formal Parameters
Parameter | Description |
<reader> | An instance of the reader whose next row you wish to read. |
Examples
> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
Form Name: dt-reader/close
Signature | (dt-reader/close <reader>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Closes the passed in CsvDataReader instance. |
Formal Parameters
Parameter | Description |
<reader> | The instance of CsvDataReader you wish to close. |
Examples
> (dt-reader/close csv)
> (dt-reader/read csv)
#f
Remarks
Attempting to read a closed reader won’t work; the (dt-reader/read …) function will return #f.
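The open, read, and close calls described so far combine into a simple loop. The sketch below counts the rows that (dt-reader/read) reports for a file. It uses only functions documented in this library; the helper name count-rows is hypothetical, and whether the header row is included in the count depends on the reader's options.

```scheme
(import (rnrs)
        (visualscheme data delimited-text))

;; count-rows is a hypothetical helper name, not part of the library.
;; It advances the reader until (dt-reader/read) returns #f,
;; then closes the reader and returns the number of successful reads.
(define (count-rows file-path)
  (let ((csv (dt-reader/open-with-defaults file-path)))
    (let loop ((n 0))
      (if (dt-reader/read csv)
          (loop (+ n 1))
          (begin
            (dt-reader/close csv)
            n)))))
```

The loop is a tail call, so it should run in constant stack space even for very large files.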
Form Name: dt-reader/field-count
Signature | (dt-reader/field-count <reader>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns the number of columns in a row in the passed in <reader>. |
Formal Parameters
Parameter | Description |
<reader> | The reader whose column count you seek. |
Examples
> (import (visualscheme data delimited-text))
> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
> (dt-reader/field-count csv)
330
Form Name: dt-reader/row-field-count
Signature | (dt-reader/row-field-count <reader>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Provides the field count for the current row (which may be different from the field count of the reader, as determined by the header row). |
Formal Parameters
Parameter | Description |
<reader> | The <reader> whose current row’s field count you seek. |
Examples
> (import (visualscheme data delimited-text))
> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
> (dt-reader/field-count csv)
330
> (dt-reader/row-field-count csv)
330
Remarks
When compared to the result of (dt-reader/field-count), this allows you to test whether the current row is malformed (i.e., too many or too few fields).
Note, this function depends on the state of the current row and may give different results from row to row if the file is malformed or otherwise irregular.
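The comparison described above can be sketched as a scan that collects the row numbers of irregular rows. This is an illustration under assumptions, not the library's own validation routine: the helper name malformed-row-numbers is hypothetical, and it assumes the reader keeps reading past rows whose field count differs rather than raising an error.

```scheme
(import (rnrs)
        (visualscheme data delimited-text))

;; malformed-row-numbers is a hypothetical helper name.
;; Returns the row numbers (as reported by dt-reader/row-number) of every
;; row whose field count differs from the reader's field count.
(define (malformed-row-numbers file-path)
  (let ((csv (dt-reader/open-with-defaults file-path)))
    (let loop ((bad '()))
      (if (dt-reader/read csv)
          (loop (if (= (dt-reader/row-field-count csv)
                       (dt-reader/field-count csv))
                    bad
                    (cons (dt-reader/row-number csv) bad)))
          (begin
            (dt-reader/close csv)
            (reverse bad))))))
```

An empty list as the result would suggest that every row has the expected number of fields.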
Form Name: dt-reader/row-number
Signature | (dt-reader/row-number <reader>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns the 1-based number of the current row. |
Formal Parameters
Parameter | Description |
<reader> | The reader whose current row number you wish to find. |
Examples
> (import (visualscheme data delimited-text))
> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
> (dt-reader/field-count csv)
330
> (dt-reader/row-field-count csv)
330
> (dt-reader/row-number csv)
1
Remarks
This function may help you gauge how far through a file you are, if you know how many rows it is supposed to contain.
Form Name: dt-reader/get-ordinal
Signature | (dt-reader/get-ordinal <reader> <name>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Given the name of a field, returns its 0-based location in the array of data representing a row in the delimited text file. Assumes the first row is the header row. |
Formal Parameters
Parameter | Description |
<reader> | The CsvDataReader instance whose field you wish to look up. |
<name> | The name of the field whose ordinal you seek. |
Examples
> (dt-reader/get-ordinal csv "NPI")
0
Remarks
See also (dt-reader/get-name …) which performs the opposite function: given the ordinal, it returns the name of the field.
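A typical use of the ordinal is to look a field up by name once and then read that column from every row. The sketch below collects one column as a list of strings. The helper name column-values is hypothetical, and the ordinal is resolved after the first successful read only to mirror the order of calls in the REPL examples; it is an assumption that this order is required.

```scheme
(import (rnrs)
        (visualscheme data delimited-text))

;; column-values is a hypothetical helper name.
;; Returns the string value of the named field for every row in the file.
(define (column-values file-path field-name)
  (let ((csv (dt-reader/open-with-defaults file-path)))
    (let loop ((ordinal #f) (acc '()))
      (if (dt-reader/read csv)
          ;; Resolve the ordinal once and reuse it for the remaining rows.
          ;; An ordinal of 0 is a true value in Scheme, so (or ...) is safe.
          (let ((i (or ordinal (dt-reader/get-ordinal csv field-name))))
            (loop i (cons (dt-reader/get-string csv i) acc)))
          (begin
            (dt-reader/close csv)
            (reverse acc))))))
```

For very large files, accumulating every value in a list may use a lot of memory; processing each value inside the loop is likely a better fit in that case.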
Form Name: dt-reader/get-XYZ
Signature | (dt-reader/get-XYZ <reader> <ordinal>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | This is not a single function but a family of functions, where XYZ stands for a common data type, conforming mainly to the ADO.NET specification. Each function takes a <reader> and the <ordinal> of the requested field and returns a value of the type indicated by XYZ, provided the actual value of that field can be converted to that type; otherwise a CLR exception is raised, as the get-date-time call in the examples shows. See the remarks for the full list of functions of this type and their expected return values. |
Formal Parameters
Parameter | Description |
<reader> | A CsvDataReader instance. |
<ordinal> | An integer representing the field requested by position in the current row’s data array. |
Examples
> (dt-reader/get-value csv 0)
"1679576722"
> (dt-reader/get-string csv 0)
"1679576722"
> (dt-reader/get-int32 csv 0)
1679576722
> (dt-reader/get-date-time csv 0)
Unhandled CLR exception during evaluation:
CLR Exception: System.FormatException
System.FormatException: String was not recognized as a valid DateTime.
   at System.DateTimeParse.Parse(String s, DateTimeFormatInfo dtfi, DateTimeStyles styles)
   at Sylvan.Data.Csv.CsvDataReader.GetDateTime(Int32 ordinal)
   at visualscheme.data.delimited-text.dt-reader/get-date-time(Object reader, Object ordinal)
   at #.ironscheme.exceptions::dynamic-wind(Object in, Object proc, Object out)
   at #.ironscheme.exceptions::dynamic-wind(Object in, Object proc, Object out)
   at IronScheme.Runtime.Builtins.CallWithCurrentContinuation(Object fc1)
   at IronScheme.Runtime.R6RS.Exceptions.WithClrExceptionHandler(Object handler, Object thunk)
Remarks
The following table lists the functions covered by this topic:
Functions | Description |
(dt-reader/get-value …) | Returns whatever it finds - no cast is attempted. |
(dt-reader/get-string …) | Returns a string representation of the actual value. |
(dt-reader/get-char …) | Returns a char if the actual value can be cast to it. |
(dt-reader/get-int16 …) | Returns an int16 if the actual value can be cast to it. |
(dt-reader/get-int32 …) | Returns an int32 if the actual value can be cast to it. |
(dt-reader/get-int64 …) | Returns an int64 if the actual value can be cast to it. |
(dt-reader/get-date-time …) | Returns a DateTime if the actual value can be cast to it. |
(dt-reader/get-boolean …) | Returns a boolean if the actual value can be cast to it. |
(dt-reader/get-byte …) | Returns a byte if the actual value can be cast to it. |
(dt-reader/get-time-span …) | Returns a TimeSpan if the actual value can be cast to it. |
(dt-reader/get-date-time-offset …) | Returns a DateTimeOffset if the actual value can be cast to it. |
(dt-reader/get-decimal …) | Returns a Decimal if the actual value can be cast to it. |
(dt-reader/get-double …) | Returns a Double if the actual value can be cast to it. |
(dt-reader/get-float …) | Returns a Float if the actual value can be cast to it. |
(dt-reader/get-guid …) | Returns a Guid if the actual value can be cast to it. |
As mentioned above, the data types supported by CsvDataReader are intentionally modeled on ADO.NET’s DbDataReader abstraction, so working with the CSV reader should feel familiar to database developers.
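When a field may not always hold a valid number, one way to avoid an unhandled exception from a typed getter is to read the field as a string and convert it in Scheme. The sketch below relies on the standard R6RS string->number, which returns #f for text that is not a valid number. The helper name field->number is hypothetical, and this is an alternative to the typed getters rather than a description of how they work.

```scheme
(import (rnrs)
        (visualscheme data delimited-text))

;; field->number is a hypothetical helper name.
;; Returns the field's value as a Scheme number, or #f when the text
;; in the field is not a valid numeric literal.
(define (field->number csv ordinal)
  (string->number (dt-reader/get-string csv ordinal)))
```

A caller can then test the result with a plain (if ...) instead of handling a CLR exception.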
Form Name: dt-reader/get-bytes
Signature | (dt-reader/get-bytes <reader> <ordinal> <data-offset> <buffer> <buffer-offset> <length>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Reads the specified number of bytes from the specified column starting at a specified index and writes them to a buffer starting at the specified position in the buffer. |
Formal Parameters
Parameter | Description |
<reader> | The reader whose bytes at column <ordinal> you wish to read. |
<ordinal> | The zero-based column ordinal. |
<data-offset> | The index within the row from which to begin the read operation. |
<buffer> | The buffer into which to copy the data. |
<buffer-offset> | The index within the buffer to which the data will be copied. |
<length> | The maximum number of bytes to read. |
Remarks
This low-level API call is part of the ADO.NET API, so examples using DbDataReader can illuminate the intended behavior of this call.
Form Name: dt-reader/get-chars
Signature | (dt-reader/get-chars <reader> <ordinal> <data-offset> <buffer> <buffer-offset> <length>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Reads the specified number of chars from the specified column starting at a specified index and writes them to a buffer starting at the specified position in the buffer. |
Formal Parameters
Parameter | Description |
<reader> | The reader whose chars at column <ordinal> you wish to read. |
<ordinal> | The zero-based column ordinal. |
<data-offset> | The index within the row from which to begin the read operation. |
<buffer> | The buffer into which to copy the data. |
<buffer-offset> | The index within the buffer to which the data will be copied. |
<length> | The maximum number of chars to read. |
Remarks
This low-level API call is part of the ADO.NET API, so examples using DbDataReader can illuminate the intended behavior of this call.
Form Name: dt-reader/get-field-type
Signature | (dt-reader/get-field-type <reader> <ordinal>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns the fully qualified .NET RuntimeType of the value found in the current row of the passed-in <reader> at the requested field <ordinal>. |
Formal Parameters
Parameter | Description |
<reader> | The CsvDataReader instance passed in. |
<ordinal> | The zero-based index of the requested field in the current row. |
Examples
> (dt-reader/get-field-type csv 0)
#<clr-type System.RuntimeType "System.String">
Remarks
This can be handy to determine the most appropriate flavor of (dt-reader/get-XYZ …) to call on the field.
Form Name: dt-reader/get-data-type-name
Signature | (dt-reader/get-data-type-name <reader> <ordinal>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns the short name of the type of the value found in the current row of the passed-in <reader> at the requested field <ordinal>. |
Formal Parameters
Parameter | Description |
<reader> | The CsvDataReader instance passed in. |
<ordinal> | The zero-based index of the requested field in the current row. |
Examples
> (dt-reader/get-data-type-name csv 0)
"String"
Remarks
Useful for determining which flavor of (dt-reader/get-XYZ …) to call on a field.
Form Name: dt-reader/get-name
Signature | (dt-reader/get-name <reader> <ordinal>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns the name of the field specified by <ordinal> of the passed-in <reader> if the file has headers. |
Formal Parameters
Parameter | Description |
<reader> | The CsvDataReader instance. |
<ordinal> | The zero-based index of the field requested. |
Examples
> (dt-reader/get-name csv 0)
"NPI"
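Combined with (dt-reader/field-count), this function can be used to list every header name in a file. The following is a minimal sketch that assumes the file has headers and that the reader is already open and positioned as in the examples above. The helper name header-names is hypothetical.

```scheme
(import (rnrs)
        (visualscheme data delimited-text))

;; header-names is a hypothetical helper name.
;; Walks the ordinals from last to first so that consing each name
;; onto the accumulator yields the names in file order.
(define (header-names csv)
  (let loop ((i (- (dt-reader/field-count csv) 1))
             (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1)
              (cons (dt-reader/get-name csv i) acc)))))
```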
Form Name: dt-reader/all-rows
Signature | (dt-reader/all-rows <reader>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Syntax |
Description | Returns an iterator for all rows in the passed in <reader> instance. |
Formal Parameters
Parameter | Description |
<reader> | The CsvDataReader instance whose rows you seek to iterate. |
Examples
> (define all-rows (dt-reader/all-rows csv))
> all-rows
#<clr-type record.9b26080c-35ae-48e7-8667-5540ece81d23.iterator>
Remarks
The resulting iterator can then be used directly with (iterator-reset), (iterator-current), and (iterator-move-next), which require loading (visualscheme data linq) into the current runtime. If only a subset is desired, or evaluation should occur eagerly, then the iterator can be used in a LINQ expression inside a (foreach …) clause.
Form Name: make-dt-reader-options
Signature | (make-dt-reader-options) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns a new instance of the CsvDataReaderOptions class. |
Formal Parameters
None.
Examples
> (define opts (make-dt-reader-options))
> opts
#<clr-type Sylvan.Data.Csv.CsvDataReaderOptions>
Remarks
Used to set options to be passed into the (make-dt-reader) constructor of a new CsvDataReader instance.
Form Name: dt-reader-options/has-headers?
Signature | (dt-reader-options/has-headers? <reader-options>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns #t if the current instance of <reader-options> expects headers, or #f if not. |
Formal Parameters
Parameter | Description |
<reader-options> | The instance of CsvDataReaderOptions passed in. |
Examples
> (dt-reader-options/has-headers? opts)
#t
Form Name: dt-reader-options/has-headers!
Signature | (dt-reader-options/has-headers! <reader-options> <boolean>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Procedure with side effect |
Description | Sets whether the passed-in <reader-options> instance should expect the CSV file to have headers. |
Formal Parameters
Parameter | Description |
<reader-options> | The CsvDataReaderOptions instance passed in. |
<boolean> | #t or #f, depending on whether you wish to set expectations that the csv file will have headers. |
Examples
> (dt-reader-options/has-headers! opts #f)
> (dt-reader-options/has-headers? opts)
#f
Form Name: dt-reader-options/delimiter-get
Signature | (dt-reader-options/delimiter-get <reader-options>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | function |
Description | Returns the delimiter expected by the passed-in <reader-options> instance. |
Formal Parameters
Parameter | Description |
<reader-options> | The CsvDataReaderOptions instance passed in. |
Examples
> (dt-reader-options/delimiter-get opts)
()
Remarks
If the result is null, or '(), then the #\, (comma) character is assumed.
Form Name: dt-reader-options/delimiter-set!
Signature | (dt-reader-options/delimiter-set! <reader-options> <char-delimiter>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Procedure with side effect |
Description | Sets the delimiter which the passed-in <reader-options> instance should expect. |
Formal Parameters
Parameter | Description |
<reader-options> | The CsvDataReaderOptions instance passed-in. |
<char-delimiter> | The character <reader-options> should expect to be the delimiter in the target CSV file. |
Examples
> (dt-reader-options/delimiter-set! opts #\,)
> (dt-reader-options/delimiter-get opts)
#\,
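The option setters are meant to be combined before the options are handed to (make-dt-reader). As an illustration, the sketch below opens a tab-delimited file that has no header row. The helper name open-headerless-tsv is hypothetical, and the choice of a tab delimiter is just an example; only functions documented in this library are used.

```scheme
(import (rnrs)
        (visualscheme data delimited-text))

;; open-headerless-tsv is a hypothetical helper name.
;; Builds a fresh options object, turns header detection off,
;; sets a tab delimiter, and returns a reader configured with it.
(define (open-headerless-tsv file-path)
  (let ((opts (make-dt-reader-options)))
    (dt-reader-options/has-headers! opts #f)
    (dt-reader-options/delimiter-set! opts #\tab)
    (make-dt-reader file-path opts)))
```

Because the options object is created inside the helper, each reader gets its own configuration and the global (default-delimiter) is left untouched.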
Form Name: dt-reader-options/string-factory-set!
Signature | (dt-reader-options/string-factory-set! <reader-options> <string-factory>) |
Exported From | (visualscheme data delimited-text) |
Type of Form | Procedure with side effect |
Description | Sets the StringFactory property of the passed-in <reader-options> to be used for string pooling. |
Formal Parameters
Parameter | Description |
<reader-options> | The CsvDataReaderOptions passed in. |
<string-factory> | A lambda expression taking a buffer, an offset, and a length as input, and returning a string (usually one already interned in a cache, to minimize memory usage and optimize speed of processing). |
Examples
See example usage in the (string-pool/string-factory …) and (intern-pool/string-factory …) topics.
Remarks
When the StringFactory property is set, strings are cached, which speeds up processing and uses memory more efficiently than would otherwise be the case, since CSV files are by their nature highly repetitive and contain many instances of the same string.
The following terms are registered trademarks of the Microsoft group of companies and are used in accordance with Microsoft’s Trade and Brand Guidelines: Microsoft, Microsoft 365, Microsoft Office, Microsoft Excel, Microsoft Edge, Microsoft Edge WebView2, Microsoft Windows, Excel, Office 365
The following terms are registered trademarks of Apex Data Solutions: Visual Scheme, VSA.
Copyright © 2022. Apex Data Solutions, LLC. All Rights Reserved.