Library:  (visualscheme data delimited-text)


Use the following functions to parse CSV and other delimited text files, including large ones, and work with their data.



Form Name:  default-delimiter


Signature

(default-delimiter [<delimiter>])

Exported From

(visualscheme data delimited-text)

Type of Form

Parameter

Description

When called with no arguments, returns the currently defined default delimiter (a comma, #\,, unless it has been changed). When called with one argument, which must be a character, it sets the global default delimiter to that character. Note that parameters in Scheme are like dynamic variables in Common Lisp: a different value can be bound for the dynamic extent of a block of code (typically with parameterize) without affecting other code that relies on the global value.


Formal Parameters


Parameter

Description

[<delimiter>]

[Optional] The character to be set as the default delimiter.


Examples


> (import (visualscheme data delimited-text))
> (default-delimiter)
#\,
> (default-delimiter #\:)
> (default-delimiter)
#\:
>


Remarks


Setting the default delimiter makes sense when the delimited text file you wish to work with uses something other than a comma to separate the values.
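
Because default-delimiter is described above as a Scheme parameter, a temporary override can be sketched with parameterize. This is only a sketch: it assumes default-delimiter behaves as an ordinary parameter object and that parameterize is available in the IronScheme REPL environment. The file path and the name pipe-file are hypothetical.

```scheme
(import (visualscheme data delimited-text))

;; Hypothetical path to a pipe-delimited file.
(define pipe-file "C:/data/example.psv")

;; Bind the default delimiter to #\| only while the reader is created.
;; Assumes default-delimiter is an ordinary parameter object.
(define csv
  (parameterize ((default-delimiter #\|))
    (dt-reader/open-with-defaults pipe-file)))

;; Outside the parameterize body the global default should be unchanged.
(default-delimiter)
```

Unlike calling (default-delimiter #\|) directly, this approach leaves the global value as it was for any other code that reads it.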



Form Name:  make-string-pool


Signature

(make-string-pool)

Exported From

(visualscheme data delimited-text)

Type of Form

Function

Description

Returns a new instance of the StringPool class provided from the Sylvan CSV library on which the (visualscheme data delimited-text) library is based.


Formal Parameters


None.



Examples


> (make-string-pool)
#<clr-type Sylvan.StringPool>
>



Remarks


The built-in StringPool from Sylvan CSV is good enough for most one-off files. If the same pool may be used with multiple files, or the process is expected to be long-running (e.g., a web service), then the InternPool class from the Ben.Collections.Specialized library, which is included as an alternative (see (make-intern-pool)), is a better choice.



Form Name:  make-intern-pool


Signature

(make-intern-pool)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns a new instance of the InternPool class from the included Ben.Collections.Specialized library.


Formal Parameters


None.



Examples


> (make-intern-pool)
#<clr-type Ben.Collections.Specialized.InternPool>
>


Remarks


See previous discussion for (make-string-pool).



Form Name:  string-pool/string-factory


Signature

(string-pool/string-factory <pool>)

Exported From

(visualscheme data delimited-text)

Type of Form

Higher-order function.

Description

Given a <pool> (which must be a StringPool instance), returns a function that, given a character buffer, an offset and a length, returns a string.


Formal Parameters


Parameter

Description

<pool>

An instance of the Sylvan CSV StringPool class.


Examples


(let* ((opts (make-dt-reader-options))
       (string-pool (make-string-pool))
       (factory (string-pool/string-factory string-pool)))
  ;; file-name is assumed to be bound to the path of the delimited text file
  (dt-reader-options/delimiter-set! opts (default-delimiter))
  (dt-reader-options/string-factory-set! opts factory)
  ;; the new reader is the value of the whole expression
  (make-dt-reader file-name opts))



Remarks


When creating a CSV reader it’s necessary to set several options including the StringFactory to be used as a callback for returning a string from a buffer. The above code was modified from (dt-reader/open-with-defaults) source code to reflect a choice of StringPool and its factory method.



Form Name:  intern-pool/string-factory


Signature

(intern-pool/string-factory <pool>)

Exported From

(visualscheme data delimited-text)

Type of Form

Higher-order function.

Description

Similar to (string-pool/string-factory …), except that the pool must be an instance of InternPool.


Formal Parameters


Parameter

Description

<pool>

An instance of InternPool.


Examples


(let* ((opts (make-dt-reader-options))
       (string-pool (make-intern-pool))
       (factory (intern-pool/string-factory string-pool)))
  ;; file-name is assumed to be bound to the path of the delimited text file
  (dt-reader-options/delimiter-set! opts (default-delimiter))
  (dt-reader-options/string-factory-set! opts factory)
  ;; the new reader is the value of the whole expression
  (make-dt-reader file-name opts))

Remarks


When creating a CSV reader it’s necessary to set several options including the StringFactory to be used as a callback for returning a string from a buffer. The above code was copied from (dt-reader/open-with-defaults) source code to reflect a choice of InternPool and its factory method.


If you compare it to the example for string-pool/string-factory you’ll see the code is identical except for the function used to create the string-pool and factory variables in the let* binding. In theory, any function taking a character array, an offset and a length and returning a string could be plugged in there.



Form Name:  make-dt-reader


Signature

(make-dt-reader <file-path> [<options>])

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Given a valid <file-path> to a delimited text file, and an optional instance of the CsvDataReaderOptions class, returns a new CsvDataReader instance.


Formal Parameters


Parameter

Description

<file-path>

A valid path to a delimited text file.

[<options>]

[Optional] An instance of the CsvDataReaderOptions class for configuring the new CsvDataReader instance created by this function.


Examples


See the previous two code examples:  intern-pool/string-factory and string-pool/string-factory.



Remarks


Unless you have very specific needs, it’s best to use (dt-reader/open-with-defaults <file-path>). If you do need to customize the reader, this is the function to call. If the delimiter is the only option you want to change, you can instead set (default-delimiter <char>) before calling (dt-reader/open-with-defaults).



Form Name:  dt-reader/open-with-defaults


Signature

(dt-reader/open-with-defaults <file-path>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Given a valid path to a delimited text file, returns a CsvDataReader instance with default options.


Formal Parameters


Parameter

Description

<file-path>

A valid path to a delimited text file.



Form Name:  dt-reader/read


Signature

(dt-reader/read <reader>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Given an instance of CsvDataReader, advances to the next “row” in the file being read. It must be called at least once to advance to the first row before data in the columns of a row can be inspected. Returns #t if successful, #f otherwise (when there are no more rows to be read).



Formal Parameters


Parameter

Description

<reader>

An instance of the reader whose next row you wish to read.


Examples


> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
>



Form Name:  dt-reader/close


Signature

(dt-reader/close <reader>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Closes the passed in CsvDataReader instance.


Formal Parameters


Parameter

Description

<reader>

The instance of CsvDataReader you wish to close.


Examples


> (dt-reader/close csv)
> (dt-reader/read csv)
#f
>


Remarks


Attempting to read a closed reader won’t work; the (dt-reader/read …) function will return #f.
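
The read and close functions above are typically combined in a loop. The following is a minimal sketch that uses only functions documented in this library; count-rows is a hypothetical helper name, and the file path is supplied by the caller.

```scheme
(import (visualscheme data delimited-text))

;; count-rows is a hypothetical helper, not part of the library.
;; It advances through the file one row at a time, then closes the reader.
(define (count-rows file-path)
  (let ((csv (dt-reader/open-with-defaults file-path)))
    (let loop ((n 0))
      (if (dt-reader/read csv)        ; #t while another row is available
          (loop (+ n 1))
          (begin
            (dt-reader/close csv)     ; release the reader when done
            n)))))
```

Calling (count-rows nppes-npi-file) would then return the number of rows the reader advanced through.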



Form Name:  dt-reader/field-count


Signature

(dt-reader/field-count <reader>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns the number of columns in a row in the passed in <reader>.


Formal Parameters


Parameter

Description

<reader>

The reader whose column count you seek.


Examples


> (import (visualscheme data delimited-text))
> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
> (dt-reader/field-count csv)
330
>


Form Name:  dt-reader/row-field-count


Signature

(dt-reader/row-field-count <reader>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Provides the field count for the current row (which may be different from the field count of the reader, as determined by the header row).


Formal Parameters


Parameter

Description

<reader>

The <reader> whose current row’s field count you seek.


Examples


> (import (visualscheme data delimited-text))
> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
> (dt-reader/field-count csv)
330
> (dt-reader/row-field-count csv)
330
>


Remarks


When compared to the result of (dt-reader/field-count), this allows you to test whether the current row is malformed (i.e., too many or too few fields).

Note, this function depends on the state of the current row and may give different results from row to row if the file is malformed or otherwise irregular.
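
The comparison described above can be sketched as a loop that collects the row numbers of irregular rows. This is an illustrative sketch using only functions documented here; malformed-row-numbers is a hypothetical helper name, and the reader is assumed to be freshly opened.

```scheme
(import (visualscheme data delimited-text))

;; malformed-row-numbers is a hypothetical helper, not part of the library.
;; It returns the row numbers whose field count differs from the reader's.
(define (malformed-row-numbers csv)
  (let loop ((bad '()))
    (if (dt-reader/read csv)
        (loop (if (= (dt-reader/row-field-count csv)
                     (dt-reader/field-count csv))
                  bad
                  (cons (dt-reader/row-number csv) bad)))
        (reverse bad))))              ; restore file order
```

An empty list as the result would indicate that every row had the expected number of fields.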



Form Name:  dt-reader/row-number


Signature

(dt-reader/row-number <reader>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns the 1-based number of the current row.



Formal Parameters


Parameter

Description

<reader>

The reader whose current row number you wish to find.


Examples


> (import (visualscheme data delimited-text))
> (define csv (dt-reader/open-with-defaults nppes-npi-file))
> (dt-reader/read csv)
#t
> (dt-reader/field-count csv)
330
> (dt-reader/row-field-count csv)
330
> (dt-reader/row-number csv)
1
>


Remarks


This function may help you gauge how far through a file you are, if you know how many rows it is supposed to contain.



Form Name:  dt-reader/get-ordinal


Signature

(dt-reader/get-ordinal <reader> <name>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Given the name of a field, returns its 0-based location in the array of data representing a row in the delimited text file. Assumes the first row is the header row.



Formal Parameters


Parameter

Description

<reader>

The CsvDataReader instance whose fields are searched.

<name>

The name of the field whose ordinal you seek.


Examples


> (dt-reader/get-ordinal csv "NPI")
0
>



Remarks


See also (dt-reader/get-name …) which performs the opposite function: given the ordinal, it returns the name of the field.
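
A common use of the ordinal is to fetch a field by its header name. The following is a small sketch built from functions documented in this library; field-by-name is a hypothetical helper name.

```scheme
(import (visualscheme data delimited-text))

;; field-by-name is a hypothetical helper, not part of the library.
;; It looks up the ordinal for a header name, then reads that field
;; from the current row as a string.
(define (field-by-name csv name)
  (dt-reader/get-string csv (dt-reader/get-ordinal csv name)))
```

After a successful (dt-reader/read csv), a call such as (field-by-name csv "NPI") reads the NPI field of the current row. If many rows are processed, looking up the ordinal once and reusing it avoids repeating the name lookup.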



Form Name:  dt-reader/get-XYZ


Signature

(dt-reader/get-XYZ <reader> <ordinal>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

This is not a single function but a number of functions where XYZ represents a common data type, conforming mainly to the ADO.NET specification. All such functions take a <reader> and the <ordinal> of the field requested, and return a value according to the type indicated by XYZ, if indeed the actual value of that field can be cast to the requested type. See remarks for full list of functions of this type and their expected return value(s).



Formal Parameters


Parameter

Description

<reader>

CsvDataReader instance.

<ordinal>

An integer representing the field requested by position in the current row’s data array.


Examples


> (dt-reader/get-value csv 0)
"1679576722"
> (dt-reader/get-string csv 0)
"1679576722"
> (dt-reader/get-int32 csv 0)
1679576722
> (dt-reader/get-date-time csv 0)
Unhandled CLR exception during evaluation:
CLR Exception: System.FormatException
System.FormatException: String was not recognized as a valid DateTime.
   at System.DateTimeParse.Parse(String s, DateTimeFormatInfo dtfi, DateTimeStyles styles)
   at Sylvan.Data.Csv.CsvDataReader.GetDateTime(Int32 ordinal)
   at visualscheme.data.delimited-text.dt-reader/get-date-time(Object reader, Object ordinal)
   at #.ironscheme.exceptions::dynamic-wind(Object in, Object proc, Object out)
   at #.ironscheme.exceptions::dynamic-wind(Object in, Object proc, Object out)
   at IronScheme.Runtime.Builtins.CallWithCurrentContinuation(Object fc1)
   at IronScheme.Runtime.R6RS.Exceptions.WithClrExceptionHandler(Object handler, Object thunk)

>


Remarks


The following table lists the functions covered by this topic:


Functions

Description

(dt-reader/get-value …)

Returns whatever it finds - no cast is attempted.

(dt-reader/get-string …) 

Returns a string representation of the actual value.

(dt-reader/get-char …)

Returns a char if the actual value can be cast to it.

(dt-reader/get-int16 …)

Returns an int16 if the actual value can be cast to it.

(dt-reader/get-int32 …) 

Returns an int32 if the actual value can be cast to it.

(dt-reader/get-int64 …)

Returns an int64 if the actual value can be cast to it.

(dt-reader/get-date-time …) 

Returns a DateTime if the actual value can be cast to it.

(dt-reader/get-boolean …)

Returns a boolean if the actual value can be cast to it.

(dt-reader/get-byte …)

Returns a byte if the actual value can be cast to it.

(dt-reader/get-time-span …)

Returns a TimeSpan if the actual value can be cast to it.

(dt-reader/get-date-time-offset …)

Returns a DateTimeOffset if the actual value can be cast to it.

(dt-reader/get-decimal …)

Returns a Decimal if the actual value can be cast to it.

(dt-reader/get-double …)

Returns a Double if the actual value can be cast to it.

(dt-reader/get-float …) 

Returns a Float if the actual value can be cast to it.

(dt-reader/get-guid …) 

Returns a Guid if the actual value can be cast to it.


As mentioned above, the data types supported by CsvDataReader are intentionally modeled on ADO.NET’s DbDataReader abstraction, so working with the CSV reader should feel familiar to database developers.
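
As the get-date-time example shows, a typed getter raises a CLR exception when the field cannot be converted. One way to avoid that for numeric fields is to read the string and convert it in Scheme. This is a sketch, not the library's recommended approach: it assumes (dt-reader/get-string …) returns an ordinary Scheme string, as the REPL output above suggests, and field->number is a hypothetical helper name.

```scheme
(import (visualscheme data delimited-text))

;; field->number is a hypothetical helper, not part of the library.
;; string->number is standard Scheme and returns #f, rather than raising
;; an error, when the text is not a valid number.
(define (field->number csv ordinal)
  (string->number (dt-reader/get-string csv ordinal)))
```

The caller can then test the result for #f before using it, instead of handling an exception.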



Form Name:  dt-reader/get-bytes


Signature

(dt-reader/get-bytes <reader> <ordinal> <data-offset> <buffer> <buffer-offset> <length>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Reads the specified number of bytes from the specified column starting at a specified index and writes them to a buffer starting at the specified position in the buffer.


Formal Parameters


Parameter

Description

<reader>

The reader whose bytes at column <ordinal> you wish to read.

<ordinal>

The zero-based column ordinal.

<data-offset>

The index within the row from which to begin the read operation.

<buffer>

The buffer into which to copy the data.

<buffer-offset>

The index within the buffer to which the data will be copied.

<length>

The maximum number of bytes to read.


Remarks


This low-level API call is part of the ADO.NET API, so examples using DbDataReader can illuminate the intended behavior of this call.



Form Name:  dt-reader/get-chars


Signature

(dt-reader/get-chars <reader> <ordinal> <data-offset> <buffer> <buffer-offset> <length>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Reads the specified number of chars from the specified column starting at a specified index and writes them to a buffer starting at the specified position in the buffer.


Formal Parameters


Parameter

Description

<reader>

The reader whose chars at column <ordinal> you wish to read.

<ordinal>

The zero-based column ordinal.

<data-offset>

The index within the row from which to begin the read operation.

<buffer>

The buffer into which to copy the data.

<buffer-offset>

The index within the buffer to which the data will be copied.

<length>

The maximum number of chars to read.


Remarks


This low-level API call is part of the ADO.NET API, so examples using DbDataReader can illuminate the intended behavior of this call.


Form Name:  dt-reader/get-field-type


Signature

(dt-reader/get-field-type <reader> <ordinal>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns the fully qualified .NET RuntimeType of the value found in the current row of the passed-in <reader> at the requested field <ordinal>.


Formal Parameters


Parameter

Description

<reader>

The CsvDataReader instance passed in.

<ordinal>

The zero-based index of the requested field in the current row.


Examples


> (dt-reader/get-field-type csv 0)
#<clr-type System.RuntimeType "System.String">
>


Remarks


This can be handy to determine the most appropriate flavor of (dt-reader/get-XYZ …) to call on the field. 



Form Name:  dt-reader/get-data-type-name


Signature

(dt-reader/get-data-type-name <reader> <ordinal>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns the short name of the type of the value found in the current row of the passed-in <reader> at the requested field <ordinal>.


Formal Parameters


Parameter

Description

<reader>

The CsvDataReader instance passed in.

<ordinal>

The zero-based index of the requested field in the current row.


Examples


> (dt-reader/get-data-type-name csv 0)
"String"
>


Remarks


Useful for determining which flavor of (dt-reader/get-XYZ …) to call on a field.



Form Name:  dt-reader/get-name


Signature

(dt-reader/get-name <reader> <ordinal>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns the name of the field specified by <ordinal> of the passed-in <reader> if the file has headers.


Formal Parameters


Parameter

Description

<reader>

The CsvDataReader instance.

<ordinal>

The zero-based index of the field requested.


Examples


> (dt-reader/get-name csv 0)
"NPI"
>
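
Combining (dt-reader/get-name …) with (dt-reader/field-count …) gives the full list of header names. The following is a minimal sketch using only functions documented here; header-names is a hypothetical helper name, and the file is assumed to have headers.

```scheme
(import (visualscheme data delimited-text))

;; header-names is a hypothetical helper, not part of the library.
;; It walks the ordinals from last to first so the resulting list
;; is in column order.
(define (header-names csv)
  (let loop ((i (- (dt-reader/field-count csv) 1))
             (names '()))
    (if (< i 0)
        names
        (loop (- i 1)
              (cons (dt-reader/get-name csv i) names)))))
```

For the reader used in the examples above, the first element of the result would be the name at ordinal 0.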




Form Name:  dt-reader/all-rows


Signature

(dt-reader/all-rows <reader>)

Exported From

(visualscheme data delimited-text)

Type of Form

Syntax

Description

Returns an iterator for all rows in the passed in <reader> instance.


Formal Parameters


Parameter

Description

<reader>

The CsvDataReader instance whose rows you seek to iterate.


Examples


> (define all-rows (dt-reader/all-rows csv))
> all-rows
#<clr-type record.9b26080c-35ae-48e7-8667-5540ece81d23.iterator>
>


Remarks


The resulting iterator can then be used directly with (iterator-reset), (iterator-current), and (iterator-move-next), which require loading (visualscheme data linq) into the current runtime. If only a subset is desired, or evaluation should occur eagerly, the iterator can be used in a LINQ expression inside a (foreach …) clause.



Form Name:  make-dt-reader-options


Signature

(make-dt-reader-options)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns a new instance of the CsvDataReaderOptions class.


Formal Parameters

None.



Examples


> (define opts (make-dt-reader-options))
> opts
#<clr-type Sylvan.Data.Csv.CsvDataReaderOptions>
>


Remarks


Used to set options to be passed into the (make-dt-reader) constructor of a new CsvDataReader instance.



Form Name:  dt-reader-options/has-headers?


Signature

(dt-reader-options/has-headers? <reader-options>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns #t if the current instance of <reader-options> expects headers, or #f if not.


Formal Parameters


Parameter

Description

<reader-options>

The instance of CsvDataReaderOptions passed in.


Examples


> (dt-reader-options/has-headers? opts)
#t
>




Form Name:  dt-reader-options/has-headers!


Signature

(dt-reader-options/has-headers! <reader-options> <boolean>)

Exported From

(visualscheme data delimited-text)

Type of Form

Procedure with side effect

Description

Sets whether the passed-in <reader-options> instance should expect the CSV file to have headers.


Formal Parameters


Parameter

Description

<reader-options>

The CsvDataReaderOptions instance passed in.

<boolean>

#t or #f, depending on whether the CSV file is expected to have headers.


Examples


> (dt-reader-options/has-headers! opts #f)
> (dt-reader-options/has-headers? opts)
#f
>




Form Name:  dt-reader-options/delimiter-get


Signature

(dt-reader-options/delimiter-get <reader-options>)

Exported From

(visualscheme data delimited-text)

Type of Form

function

Description

Returns the delimiter expected by the passed-in <reader-options> instance.


Formal Parameters


Parameter

Description

<reader-options>

The CsvDataReaderOptions instance passed in.


Examples


> (dt-reader-options/delimiter-get opts)
()
>


Remarks


If the result is null, or '(), then the #\, (comma) character is assumed.



Form Name:  dt-reader-options/delimiter-set!


Signature

(dt-reader-options/delimiter-set! <reader-options> <char-delimiter>)

Exported From

(visualscheme data delimited-text)

Type of Form

Procedure with side effect

Description

Sets the delimiter which the passed-in <reader-options> instance should expect.


Formal Parameters


Parameter

Description

<reader-options>

The CsvDataReaderOptions instance passed-in.

<char-delimiter>

The character <reader-options> should expect to be the delimiter in the target CSV file.


Examples


> (dt-reader-options/delimiter-set! opts #\,)
> (dt-reader-options/delimiter-get opts)
#\,
>
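
The options functions can be combined to open a file that does not fit the defaults. The sketch below configures a tab-delimited file without a header row; open-tsv-without-headers is a hypothetical helper name, and the sketch assumes the remaining options can be left at their defaults.

```scheme
(import (visualscheme data delimited-text))

;; open-tsv-without-headers is a hypothetical helper, not part of the library.
;; It builds an options instance, adjusts two settings, and passes it
;; to make-dt-reader.
(define (open-tsv-without-headers file-path)
  (let ((opts (make-dt-reader-options)))
    (dt-reader-options/has-headers! opts #f)        ; first row is data
    (dt-reader-options/delimiter-set! opts #\tab)   ; tab-separated values
    (make-dt-reader file-path opts)))
```

A string factory can be added in the same way with (dt-reader-options/string-factory-set! …) before the reader is created.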




Form Name:  dt-reader-options/string-factory-set!


Signature

(dt-reader-options/string-factory-set! <reader-options> <string-factory>)

Exported From

(visualscheme data delimited-text)

Type of Form

Procedure with side effect

Description

Sets the StringFactory property of the passed-in <reader-options> to be used for string pooling.


Formal Parameters


Parameter

Description

<reader-options>

The CsvDataReaderOptions passed in.

<string-factory>

A lambda expression taking a buffer, an offset, and a length as input, and returning a string (usually one already interned in a cache, to minimize memory usage and speed up processing).


Examples


See example usage in the (string-pool/string-factory …) and (intern-pool/string-factory …) topics.

Remarks


When the StringFactory property is set, strings are cached, which speeds up processing and uses memory more efficiently than would be the case without it, since CSV files are by their nature quite repetitive, containing many instances of the same string.




The following terms are registered trademarks of the Microsoft group of companies and are used in accordance with Microsoft’s Trade and Brand Guidelines: Microsoft, Microsoft 365, Microsoft Office, Microsoft Excel, Microsoft Edge, Microsoft Edge WebView2, Microsoft Windows, Excel, Office 365


The following terms are registered trademarks of Apex Data Solutions: Visual Scheme, VSA.


Copyright © 2022.  Apex Data Solutions, LLC. All Rights Reserved.