plotdevice-manual/src/tut/Serialization.html at master · plotdevice/plotdevice-manual

152 lines (130 loc) · 9.51 KB
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html><head>
  <title></title>
</head><body>
  <div class="article">
    <p>In the last few chapters we've seen how you can store values in <a href="Variables.html">variables</a> and group them into collections of <a href="Collections.html#lists">lists</a> and <a href="Collections.html#dictionaries">dictionaries</a>. As your scripts grow more complicated you'll find yourself working with a shared set of these datastructures that you access repeatedly.</p>
    <p>When just starting out, it's natural to include your data right in the script (for instance by building a list of words or a dictionary mapping names to coordinates). But after these intialization blocks grow beyond a certain size, your script can get a little ungainly.</p>
    <p>Once your data has taken on a life of its own, it makes sense to "serialize" it (i.e., "write it out") to a file. You can think of the file as holding the freeze-dried version of a variable. PlotDevice gives you an easy way to "rehydrate" files using the <a href="../ref/Misc.html#read()">read()</a> command. In addition to reading <a href="Strings.html#Unicode">Unicode</a> text, <i>read()</i> can also unpack structured data from a pair of useful formats:</p>
      <li><span class="message">JSON</span> for arbitrarily-nested dictionaries &amp; lists</li>
      <li><span class="message">CSV</span> for speadsheet-style "tabular" data</li>
    <p>When you call <i>read()</i> with a file path as its first arg, it will use the file exension to recognize <code>.json</code> or <code>.csv</code> files and give you options for how they are unpacked into variables.</p>
    <h1>Using <code>json</code> for Datastructures</h1>
    <p>Over the last decade or so, the <a href="http://json.org">JSON</a> format has become something of a lingua-franca for exchanging structured data between systems. Its popularity owes a lot to how naturally its syntax and semantics map onto the native collection classes of the so-called "dynamic" languages.</p>
    <p>In fact, the representations of most values look <em>identical</em> to those used in Python literals. This makes json a great choice for moving custom data out of your scripts and into a file. For instance, here's a list of numbers in json:</p>
    <pre>[1, 2, 3, 4, 5]</pre>
    <p>Looks just like a Python list, right? Now here's a dictionary (note that it's a little different):</p>
    <pre>{"address":"123 Elm St.", "apt":null, "garage":false}</pre>
    <p>The structure of the dictionary itself looks just like Python's <code>{"key":val}</code> syntax, but you'll notice the values are "spelled" a little differently. First, Python's <code class="kw">None</code> value should be written as <code>null</code> inside the json file. Second, <code>true</code> &amp; <code>false</code> aren't capitalized. When you <i>read()</i> from the file, these will be translated back to their Python equivalents automatically. Also keep in mind that strings in json <em>always</em> use double-quotes.</p>
    <p>Things start to get interesting when you combine nested containers to represent more-complex "records". For instance, if we were writing a nostalgic computer game, we might deal with data whose json representation looked like:</p>
  "player":{
    "room":"Flood Control Dam #3",
    "score":53,
    "inventory":["lamp", "leaflet", "sword"]
  "thief":{
    "room":"West of House",
    "score":null,
    "inventory":["egg"]
  "moves":153
    <p>If we save this text into a file called <code>zork.json</code>, we can access it in PlotDevice by reading it into a variable. All of the "fields" in the data can then be accessed using normal dictionary syntax:</p>
game = read("zork.json")
print("Game saved after %i moves" % game["moves"])
print("(%i points so far)" % game["player"]["score"])
if "lamp" in game["player"]["inventory"]:
    print("Player has not yet been eaten by a grue...")
>>> Game saved after 153 moves
>>> (53 points so far)
>>> Player has not yet been eaten by a grue...
    <p>When you <i>read()</i> from a json file, you can also supply an optional keyword argument: <code>dict</code>. This lets you use one of the "specialized" dictionaries like <a href="../ref/Misc.html#adict()">adict</a> or <a href="../ref/Misc.html#odict()">odict</a> if that's more convenient. For instance, we can use an "Attribute Dictionary" to cut down on some of the punctuation noise in the previous example:</p>
<pre>game = read("zork.json", dict=adict)
print("Game saved after %i moves" % game.moves)
print("(%i points so far)" % game.player.score)
if "lamp" in game.player.inventory:
    print("Player has not yet been eaten by a grue...")
    <p>An "Ordered Dictionary" is an appropriate choice if the preserving the file's key ordering is important to you:</p>
print("normal:", list(read("zork.json").keys()))
print("ordered:", list(read("zork.json", dict=odict).keys()))
>>> normal: [u'player', u'moves', u'thief']
>>> ordered: [u'player', u'thief', u'moves']
    <p class="double-dagger">Behind the scenes, the <i>read()</i> command is using the <code class="kw">json</code> module from the standard library. Take a look at the <a href="https://docs.python.org/3/library/json.html">official docs</a> to learn about generating json as well as parsing it.</p>
    <h1>Using <code>csv</code> for Tabular Data</h1>
    <p>While json is great for the kinds of structured-data you generate through code, data that comes from the "real world" often started its life in a spreadsheet or SQL table. Rather than having a variable structure, these formats are rigidly row &amp; column-oriented. Comma-separated Value files represnt these "tabular" structures in a simple text format. Each line of text corresponds to a row, and columns are separated by <code class="str">","</code> characters.</p>
    <h2>Reading from rows &amp; columns</h2>
    <p>In a somewhat contrived example, imagine we have a file called <code>timestable.csv</code> with the following set of rows and columns:</p>
    <pre>0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30
0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40
0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50
0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60
0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70
0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80
0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100</pre>
    <p>When we load this file using <i>read()</i>, it returns a list with one element for each of the rows. Each row is itself a list containing the column values as its elements. You can then use list-index notation to read from particular a particular cell or even pull out a "slice" of values within a row:</p>
    <pre>mult = read('timestable.csv')
print("%i rows and %i columns" % (len(mult), len(mult[0])))
print("6 times 8 is", mult[6][8])
print("multiples of 9:", mult[9][2:10])
>>> 11 rows and 11 columns
>>> 6 times 8 is 48
>>> multiples of 9: [u'18', u'27', u'36', u'45', u'54', u'63', u'72', u'81']
    <p>Note that the parser doesn't do <em>any</em> interpretation of the contents of the cells. Every value is read in as a unicode string (whether it's number-like or not) and it's up to you to call <i>int()</i> or <i>float()</i> to perform the conversion.</p>
    <h2>Reading from "records"</h2>
    <p>Often, each row represents a distinct "object" and the columns correspond to "attributes". For instance, here's a csv-formatted table of data relating to the months of the year. Each row represents a single month and has attributes for the number of days, and the name in a few different languages. We'll save the following into a file called <code>months.csv</code>:</p>
    <pre>days, english, french, german
31, January, Janvier, Januar
28, February, Février, Februar
31, March, Mars, März
30, April, Avril, April
31, May, Mai, Mai
30, June, Juin, Juni
31, July, Juillet, Juli
31, August, Août, August
30, September, Septembre, September
31, October, Octobre, Oktober
30, November, Novembre, November
31, December, Décembre, Dezember</pre>
  <p>A common convention in csv files is to treat the first row of the document as a "header" row. Rather than containing data, its values label what each column represents. If you set your file up in this manner, the <i>read()</i> command makes it easy to unpack each row into a dictionary with keys set to the names in the header row. All you need to do is set the <code>cols</code> arg to <code class="kw">True</code>:</p>
year = read("months.csv", cols=True)
dec = year[-1]
print("%i months in a year" % len(year))
print("The month of", dec['english'])
print("lasts for %i hours" % (24 * int(dec['days'])))
>>> 12 months in a year
>>> The month of December
>>> lasts for 744 hours
  <p>Just as with json files, you can tell <i>read()</i> to use a particular kind of dictionary if you don't want the standard one. An <a href="../ref/Misc.html#adict()">adict</a> will let you access the columns with dot-notation:</p>
  <pre>year = read("months.csv", cols=True, dict=adict)
print(year[9].german)
>>> Oktober
  <p>An <a href="../ref/Misc.html#odict()">odict</a> will let you address columns by order as well as by name:</p>
  <pre>year = read("months.csv", cols=True, dict=odict)
june = year[5]
print("column order:", list(june.keys()))
print("column values:", list(june.values()))
>>> column order: [u'days', u'english', u'french', u'german']
>>> column values: [u'30', u'June', u'Juin', u'Juni']
</body></html>
Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

Serialization.html

Latest commit

History

Serialization.html

File metadata and controls