Tuesday, December 1, 2009

Poetry generator

Continuing from the previous post, here's a nonsense (or poetry) generator. It takes a filename as an argument, with an optional second argument for the number of letters back to remember (default 3). Obviously this would need substantial work to be useful, but I was quite entertained by the return on minimal investment. Uncommenting the "if letter not in alphabet" line, which filters the input through the alphabet string, leads to a different style.


#!/usr/bin/env python

import collections
import random
import sys

filename = sys.argv[1]
know = int(sys.argv[2]) if len(sys.argv) > 2 else 3
num = 1000
alphabet = "abcdefghijklmnopqrstuvwxyz \n"

expected = collections.defaultdict(lambda: collections.defaultdict(int))

text = open(filename)
previous = ""
for line in text:
    for letter in line:
        # if letter not in alphabet: continue
        expected[previous][letter] += 1
        previous = (previous + letter)[-know:]
text.close()

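def weighted_select(weights):
    # pick a letter with probability proportional to its count
    total = sum(weights.values())
    index = random.randrange(total)
    for letter, weight in weights.iteritems():
        index -= weight
        # strictly less than zero, otherwise the first letter is slightly favoured
        if index < 0: return letter
    print "Unexpected", index, weights
    return letter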

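start = random.choice(expected.keys())
thoughts = [letter for letter in start]
for unused in xrange(num):
    last = "".join(thoughts[-know:])
    # the last few letters of the file may never have been followed by anything
    if last not in expected: break
    thoughts.append(weighted_select(expected[last]))
print "".join(thoughts)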
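
For experimenting without a file on disk, the same idea can be sketched as a few functions that work on a string in memory. This is only a rough restatement of the script above, not a replacement for it: the names build_table and generate, the rng and seed parameters, and the sample string are all made up for illustration, and the code is written so that it should run under both Python 2 and Python 3.

```python
import collections
import random


def build_table(text, know=3):
    """Count which letter follows each context of up to `know` letters."""
    expected = collections.defaultdict(lambda: collections.defaultdict(int))
    previous = ""
    for letter in text:
        expected[previous][letter] += 1
        previous = (previous + letter)[-know:]
    return expected


def weighted_select(weights, rng=random):
    """Pick a key with probability proportional to its count."""
    index = rng.randrange(sum(weights.values()))
    for letter, weight in weights.items():
        index -= weight
        if index < 0:
            return letter


def generate(expected, num, know=3, seed=None):
    """Emit up to `num` letters, starting from a randomly chosen context."""
    rng = random.Random(seed)  # hypothetical seed parameter, for repeatable runs
    thoughts = list(rng.choice(sorted(expected.keys())))
    for _ in range(num):
        last = "".join(thoughts[-know:])
        if last not in expected:
            break  # this context was never followed by anything
        thoughts.append(weighted_select(expected[last], rng))
    return "".join(thoughts)


sample = "the quick brown fox jumps over the lazy dog " * 3
table = build_table(sample, know=3)
print(generate(table, num=60, know=3, seed=1))
```

Passing a seed makes a run repeatable, which is handy when comparing different values of know on the same text.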


Example:

./textual.py sample_text 4
writing deeper
it is a fail of you


my timid anxious queries bereft a perhaps it is only have unbeknow amazing has and i are


aimless arms


i done side in three
this curious that my lither mary
has anythin is as for me other mocking thought gain three
that still bring i remember
beform
that further being i remains me i are myself in that of timid anxious queries back trail may you is only as reality is made of hearts leaping in her
and want
but as that we happy returning is not word or recallinger serves ive unbeknow how unliking that furthere beauty a simply by its sublimitation has and bring i remains meager servation unrequited you are sunderstanding me to do known
if i find real as real as unto you most days for their bound
accept a perhaps it is under cling me shadows yet unhappier by
far word or soul to real as unto accurate too much discarded a bring lithe one you


what imprehend
my coverabundant lies back trail their ground
and their like apartake u


If anyone with the free time is clever enough to scrape crappy poetry from the web to use as input for this generator, I'd love to see the results.

The joy of collections

I recently discovered the collections library in Python (http://docs.python.org/library/collections.html), which is a treasure trove of handy data structures. As a fan of math, I had already found myself writing many of these classes and functions by hand, and it is good to know others think the same way. Several of them, defaultdict included, are implemented in C, which obviates the pure-Python scratchings I had been using.

In particular, here are a few cool tricks using the defaultdict type, which functions as a dictionary, but with a default value (saving the use of many d[k] = d.get(k, ) ... lines and helper functions).

defaultdict takes a function to call as the initializer for the default value of a new key. Since int() evaluates to 0 and list() evaluates to [], we can pass int and list directly and save ourselves some lambdas or helper functions.

import collections

d_counts = collections.defaultdict(int)
d_lists = collections.defaultdict(list)

sentence = "the quick brown fox jumps over the lazy dog"
for i, letter in enumerate(sentence):
    d_counts[letter] += 1
    d_lists[letter].append(i)

for letter, count in sorted(d_counts.iteritems()):
    print "%s - %s" % (letter, count)
for letter, indices in sorted(d_lists.iteritems()):
    print "%s - %s" % (letter, indices)
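
To make the saving concrete, here is a small side-by-side sketch of the plain-dict idiom and the defaultdict version. The variable names are illustrative, and the default of 0 passed to get() is my own choice for a counter.

```python
import collections

sentence = "the quick brown fox jumps over the lazy dog"

# Plain dict: every update has to spell out the default.
plain_counts = {}
for letter in sentence:
    plain_counts[letter] = plain_counts.get(letter, 0) + 1

# defaultdict: int() supplies the 0 the first time a key is seen.
default_counts = collections.defaultdict(int)
for letter in sentence:
    default_counts[letter] += 1

# Both approaches end up with the same counts.
assert plain_counts == dict(default_counts)
```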


You can also nest the initializers. This code might be appropriate for use in a word-generating program.

d_dict_int = collections.defaultdict(lambda: collections.defaultdict(int))

text = open("sample_text")
previous = ""
for line in text:
    for letter in line:
        d_dict_int[previous][letter] += 1
        previous = (previous + letter)[-3:]
text.close()

# print the number of times any letter appeared after its (up to) 3 previous letters
for antecedents, letters in sorted(d_dict_int.iteritems()):
    print antecedents
    for letter, count in sorted(letters.iteritems()):
        print " %s - %s" % (letter, count)


You can even be clever and allow for arbitrarily deep dictionaries, for those who don't like to initialize anything.

def inf_dict():
    return collections.defaultdict(inf_dict)
d_infdict = inf_dict()
d_infdict[4][5]['foo'][6]['bar'] = 100
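
A short sketch of what this buys you and what it costs. The to_plain helper is hypothetical, added only to show the resulting structure as ordinary dicts; it is not part of the collections module.

```python
import collections


def inf_dict():
    return collections.defaultdict(inf_dict)


def to_plain(d):
    """Recursively convert nested defaultdicts to ordinary dicts (illustrative helper)."""
    if isinstance(d, collections.defaultdict):
        return dict((k, to_plain(v)) for k, v in d.items())
    return d


tree = inf_dict()
tree["a"]["b"]["c"] = 1   # intermediate levels are created on demand
tree["a"]["x"]            # merely reading a missing key creates an empty level

plain = to_plain(tree)
```

Note that the bare read of tree["a"]["x"] leaves an empty dictionary behind, so a stray lookup quietly changes the structure.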


One caveat to note: while checking for the existence of an entry (with "in") does not create it, referencing it does (you do not need to assign to it). That makes a defaultdict unsuitable for try/except lookups, since the KeyError you would be waiting for never arrives. It should be easy enough to spot, though.

d = collections.defaultdict(int)
print 4 in d # False
print d[4] # 0
print 10 in d # False
print 4 in d # True
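
If you need to look without touching, get() and "in" both leave a defaultdict alone, while a try/except around d[key] never sees a KeyError. A small sketch, with a made-up key:

```python
import collections

d = collections.defaultdict(int)

# get() and `in` do not invoke the default factory, so nothing is stored.
looked_up = d.get("missing")
present = "missing" in d
size_before = len(d)

# Indexing calls the factory and stores the result, so no KeyError is raised.
try:
    value = d["missing"]
    raised = False
except KeyError:
    raised = True

size_after = len(d)
```

After this runs, the dictionary has grown by one entry even though nothing was ever assigned to it.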