{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": null, "outputs": [], "source": [ "#r \"nuget: FSharp.Data,8.1.8\"\n", "\n", "Formatter.SetPreferredMimeTypesFor(typeof\u003cobj\u003e, \"text/plain\")\n", "Formatter.Register(fun (x: obj) (writer: TextWriter) -\u003e fprintfn writer \"%120A\" x)\n", "#endif\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "[![Binder](../img/badge-binder.svg)](https://mybinder.org/v2/gh/fsprojects/FSharp.Data/gh-pages?filepath=library/CsvFile.ipynb)\u0026emsp;\n", "[![Script](../img/badge-script.svg)](https://fsprojects.github.io/FSharp.Data//library/CsvFile.fsx)\u0026emsp;\n", "[![Notebook](../img/badge-notebook.svg)](https://fsprojects.github.io/FSharp.Data//library/CsvFile.ipynb)\n", "\n", "# CSV Parser\n", "\n", "The F# [CSV Type Provider](CsvProvider.html) is built on top of an efficient CSV parser written\n", "in F#. There\u0027s also a simple API that can be used to access values dynamically.\n", "\n", "When working with well-defined CSV documents, it is easier to use the\n", "[type provider](CsvProvider.html), but in a more dynamic scenario or when writing\n", "quick and simple scripts, the parser might be a simpler option.\n", "\n", "## Loading CSV documents\n", "\n", "To load a sample CSV document, we first need to reference the `FSharp.Data` package.\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 2, "outputs": [], "source": [ "open FSharp.Data\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "The `FSharp.Data` namespace contains the [CsvFile](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html) type that provides two static methods\n", "for loading data. The [CsvFile.Parse](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html#Parse) method can be used if we have the data in a `string` value.\n", "The [CsvFile.Load](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html#Load) method allows reading the data from a file or from a web resource (and there\u0027s\n", "also an asynchronous [CsvFile.AsyncLoad](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html#AsyncLoad) version). The following sample calls [CsvFile.Load](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html#Load) with a URL that\n", "points to a live CSV file on the Yahoo finance web site:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 3, "outputs": [ { "data": { "text/plain": ["HLOC: (76.55, 75.86, 9-Oct-17)", "", "HLOC: (76.03, 75.54, 6-Oct-17)", "", "HLOC: (76.12, 74.96, 5-Oct-17)", "", "HLOC: (74.72, 73.71, 4-Oct-17)", "", "HLOC: (74.88, 74.20, 3-Oct-17)", "", "HLOC: (75.01, 74.30, 2-Oct-17)", "", "HLOC: (74.54, 73.88, 29-Sep-17)", "", "HLOC: (73.97, 73.31, 28-Sep-17)", "", "HLOC: (74.17, 73.17, 27-Sep-17)", "", "HLOC: (73.81, 72.99, 26-Sep-17)", "", "val msft: Runtime.CsvFile\u003cCsvRow\u003e", "", "val it: unit = ()"] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }], "source": [ "// Download the stock prices\n", "let msft = CsvFile.Load(__SOURCE_DIRECTORY__ + \"/../data/MSFT.csv\").Cache()\n", "\n", "// Print the prices in the HLOC format\n", "for row in msft.Rows |\u003e Seq.truncate 10 do\n", " printfn \"HLOC: (%s, %s, %s)\" (row.GetColumn \"High\") (row.GetColumn \"Low\") (row.GetColumn \"Date\")\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "Note that unlike `CsvProvider`, [CsvFile](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html) works in streaming mode for performance reasons, which means\n", "that [CsvFile.Rows](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-runtime-csvfile-1.html#Rows) can only be iterated once. If you need to iterate multiple times, use the [CsvFile.Cache](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-runtime-csvfile-1.html#Cache) method,\n", "but please note that this will increase memory usage and should not be used in large datasets.\n", "\n", "## Using CSV extensions\n", "\n", "Now, we look at a number of extensions that become available after\n", "opening the [CsvExtensions](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvextensionsmodule.html) namespace. Once opened, we can write:\n", "\n", "* `row?column` uses the dynamic operator to obtain the column value named `column`;\n", "alternatively, you can also use an indexer `row.[column]`.\n", "\n", "* `value.AsBoolean()` returns the value as boolean if it is either `true` or `false` (see [StringExtensions.AsBoolean](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-stringextensions.html#AsBoolean))\n", "\n", "* `value.AsInteger()` returns the value as integer if it is numeric and can be\n", "converted to an integer; `value.AsInteger64()`, `value.AsDecimal()` and\n", "`value.AsFloat()` behave similarly.\n", "\n", "* `value.AsDateTime()` returns the value as a `DateTime` value using either the\n", "[ISO 8601](http://en.wikipedia.org/wiki/ISO_8601) format, or using the\n", "`\\/Date(...)\\/` JSON format containing number of milliseconds since 1/1/1970.\n", "\n", "* `value.AsDateTimeOffset()` parses the string as a `DateTimeOffset` value using either the\n", "[ISO 8601](http://en.wikipedia.org/wiki/ISO_8601) format, or using the\n", "`\\/Date(...[+/-]offset)\\/` JSON format containing number of milliseconds since 1/1/1970,\n", "[+/-]() the 4 digit offset. Example- `\\/Date(1231456+1000)\\/`.\n", "\n", "* `value.AsTimeSpan()` parses the string as a `TimeSpan` value.\n", "\n", "* `value.AsGuid()` returns the value as a `Guid` value.\n", "\n", "Methods that may need to parse a numeric value or date (such as `AsFloat` and\n", "`AsDateTime`) receive an optional culture parameter.\n", "\n", "The following example shows how to process the sample previous CSV sample using these extensions:\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 4, "outputs": [ { "data": { "text/plain": ["HLOC: (76.550000, 75.86, 10/9/2017 12:00:00 AM)", "", "HLOC: (76.030000, 75.54, 10/6/2017 12:00:00 AM)", "", "HLOC: (76.120000, 74.96, 10/5/2017 12:00:00 AM)", "", "HLOC: (74.720000, 73.71, 10/4/2017 12:00:00 AM)", "", "HLOC: (74.880000, 74.20, 10/3/2017 12:00:00 AM)", "", "HLOC: (75.010000, 74.30, 10/2/2017 12:00:00 AM)", "", "HLOC: (74.540000, 73.88, 9/29/2017 12:00:00 AM)", "", "HLOC: (73.970000, 73.31, 9/28/2017 12:00:00 AM)", "", "HLOC: (74.170000, 73.17, 9/27/2017 12:00:00 AM)", "", "HLOC: (73.810000, 72.99, 9/26/2017 12:00:00 AM)", "", "val it: unit = ()"] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }], "source": [ "open FSharp.Data.CsvExtensions\n", "\n", "for row in msft.Rows |\u003e Seq.truncate 10 do\n", " printfn \"HLOC: (%f, %M, %O)\" (row.[\"High\"].AsFloat()) (row?Low.AsDecimal()) (row?Date.AsDateTime())\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "## Transforming CSV files\n", "\n", "In addition to reading, [CsvFile](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html) also has support for transforming CSV files. The operations\n", "available are [CsvFile.Filter](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html), [CsvFile.Take](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html), [CsvFile.TakeWhile](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html), [CsvFile.Skip](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html), [CsvFile.SkipWhile](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html), and [CsvFile.Truncate](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html). After transforming\n", "you can save the results by using one of the overloads of the `Save` method. You can choose different\n", "separator and quote characters when saving.\n", "\n" ] } , { "cell_type": "code", "metadata": { "dotnet_interactive": { "language": "fsharp" }, "polyglot_notebook": { "kernelName": "fsharp" } }, "execution_count": 5, "outputs": [ { "data": { "text/plain": ["val it: string =", "", " \"Date\tOpen\tHigh\tLow\tClose\tVolume", "", "9-Oct-17\t75.97\t76.55\t75.86\t76.29\t11386502", "", "6-Oct-17\t75.67\t76.03\t75.54\t76.00\t13959814", "", "5-Oct-17\t75.22\t76.12\t74.96\t75.97\t21195261", "", "4-Oct-17\t74.09\t74.72\t73.71\t74.69\t13317681", "", "29-Sep-17\t73.94\t74.54\t73.88\t74.49\t17079114", "", "28-Sep-17\t73.54\t73.97\t73.31\t73.87\t10883787", "", "27-Sep-17\t73.55\t74.17\t73.17\t73.85\t19375099", "", "22-Sep-17\t73.99\t74.51\t73.85\t74.41\t14111365", "", "19-Sep-17\t75.21\t75.71\t75.01\t75.44\t16093344", "", "15-Sep-17\t74.83\t75.39\t74.07\t75.31\t38578441", "", "\""] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }], "source": [ "// Saving the first 10 stock prices where the closing price is higher than the opening price in TSV format:\n", "msft.Filter(fun row -\u003e row?Close.AsFloat() \u003e row?Open.AsFloat()).Truncate(10).SaveToString(\u0027\\t\u0027)\n" ] } , { "cell_type": "markdown", "metadata": {}, "source": [ "## Related articles\n", "\n", "* [CSV Type Provider](CsvProvider.html) - discusses F# type provider\n", "that provides type-safe access to CSV data\n", "\n", "* API Reference: [CsvFile](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvfile.html)\n", "\n", "* API Reference: [CsvRow](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvrow.html)\n", "\n", "* API Reference: [CsvExtensions](https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-csvextensionsmodule.html)\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": ".NET (F#)", "language": "F#", "name": ".net-fsharp" }, "language_info": { "file_extension": ".fs", "mimetype": "text/x-fsharp", "name": "polyglot-notebook", "pygments_lexer": "fsharp" }, "polyglot_notebook": { "kernelInfo": { "defaultKernelName": "fsharp", "items": [ { "aliases": [], "languageName": "fsharp", "name": "fsharp" } ] } } }, "nbformat": 4, "nbformat_minor": 2 }