Garagiste Invoice Fetcher

About a year ago, I found out about Garagiste, and, since then, I have spent too much money buying wine. That wine started piling up in the basement, and since it was getting hard to keep track of it all, my obsessive organizational self decided to create a spreadsheet to manage it.

A few months ago, I went to a small business conference (Small & Special) and listened to Eric LeVine talk about his website, Cellar Tracker. That seemed like a much better solution than my spreadsheet, so I made a mental note to give it a try. (His talk about how he started his business is worth watching even if you don’t like wine.)

A few weeks ago, I finally had some time to play around with Cellar Tracker. Despite the “throwback” website, the UI was really usable and the AJAX-based search made it easy to find the bottle of wine I wished to catalog and enter it into my virtual cellar. I was immediately hooked, as the website provided quick access to reviews, tasting notes, etc, and it was a lot easier to manage and access than my ghetto spreadsheet.

I now wanted to get ahead of the game and start entering Garagiste purchases before I took delivery of them. But it would be time consuming to pull that information from all the emails (which I did for my initial data upload). I figured this was a good opportunity to dust off my programming skills (which have been languishing since I became a manager 18 months ago), learn a new language (Ruby), and solve a (somewhat manufactured) problem.

The result is below, and, although it likely won’t win any programming awards, I am pretty happy with it as “my first Ruby program”. It takes your Garagiste O.A.R.S. URL (which changes weekly) and fetches the invoice data and generates a tab-delimited text file with your purchases. You can then convert into Excel and filter it, etc. I immediately found it useful to find out how many bottles are ready for pickup at the warehouse (classified, but yikes, where will I put them?) and as an easier way to enter data into Cellar Tracker. I guess the next step would be to automatically upload it, but that’s likely overkill, and I’m worried about data verification issues (it’s nice to manually check that what you’re entering is correct).

Thanks to this post for providing a good sample script from which to learn Ruby / web scraping and to Eric Levine for creating a kickass website. I guess it’s time to stop being a freeloader and pay up.


#!/usr/bin/env ruby

# Simple Ruby program to fetch purchase information from Garagiste and
# generate a tab delimited file, suitable for import into Excel.

require ‘rubygems’
require ‘mechanize’

def main
page_count = 1
more_pages = 1

while (more_pages)
url = get_url(page_count)
more_pages = process_url(url)
page_count = page_count + 1
end
end

# Construct the URL based on the current page
def get_url (page_count)
urlbase = “http://garagistewine.com/oars/?page=”
# Put in the part of the URL after the numeric page number (it’s specific to your account and changes weekly)
urlend = “”
url = urlbase + String(page_count) + urlend
return url
end

# fetch and process the specified URL
def process_url (url)
agent = WWW::Mechanize.new
agent.user_agent_alias = ‘Mac Safari’

page = agent.get(url)

body = page.body

more_pages = nil

if body
more_pages = process_html(body)
end
return more_pages
end

# parse the HTML document and print out the line items
def process_html (body)
doc = Hpricot(body)

# Find the div for the table and process it
content_div = doc.search(“//div[@id='thing-content']“)
if not content_div.empty?
process_invoice_table (content_div)

# See if there is a next page link
next_page_a = doc.search(“a[@class='next']“)
new_url = nil
if not next_page_a.empty?
link = next_page_a.first.get_attribute(“href”)
if link
new_url = “http://garagistewine.com” + link
end
end
return new_url
end
end

# Process the invoice table and print out the line items
def process_invoice_table (content_div)
tab = content_div.search(“table”).first

body = tab.search(“//tbody[@class='records']“)

body.search(“tr”).each do |tr|
quantity = tr.search(“td[@class='quantity-column numeric']“).inner_text.strip()
description = tr.search(“td[@class='description-column ']“).inner_text.strip()
price = tr.search(“td[@class='price-column numeric']“).inner_text.strip()
status = tr.search(“td[@class='status-column ']“).inner_text.strip()
date = tr.search(“td[@class='created_at-column sorted']“).inner_text.strip()
paid_cell = tr.search(“td[@class='paid-column ']“)
paid = “No”

checked = paid_cell.search(“input[@checked='checked']“)

if checked.first
paid = “Yes”
end

printf “%s\t%s\t%s\t%s\t%s\t%s\n”,date,quantity,description,price,status,paid
end
end

main

2 Responses to “Garagiste Invoice Fetcher”

  1. Eric LeVine Says:

    Very fun!

  2. meagan Says:

    just how many bottles of wine ARE in the warehouse, joe? love.

Leave a Reply

jf’s blog