filling a database with historic weather data

I’m one of those people who just loves inclement weather. Maybe it’s because of where I grew up, but I’ve always adored a good thunderstorm. Whenever a storm is heading for my area, I check various sites online to get the latest radar and what-not, but sometimes you just aren’t near a computer (gasp!). Not only that, but most of the radar loops you find only go back so far… maybe an hour or two. That’s why I decided to build my own solution, using data that the NOAA provides for free. Pretty much everything here is written in Python, the greatest language ever. :)

So first of all, I had to decide how to actually collect the data. There are a few methods, but one issue I faced is that I normally look at this stuff on my laptop. What if my laptop is turned off? It’s just not a reliable platform for collecting data. Happily, I have a dreamy webhost that gives me shell access. So there you go… the perfect spot to house the fruits of my scripts. The plan: build a script that runs regularly, contacts a weather site to scrape a radar image, and contacts the NOAA to get the latest weather data… temperature, barometric pressure, etc. Then it inserts all of that into a database housed at the same host.

I’m going to take you through this as if you’re fairly new to Python, and the plan is to be really easy to understand. If it isn’t… well… you get what you pay for, right? :) Seriously though, if you end up with questions, just post them at the end of this article and I’ll answer them.

So I logged into my shell account at my webhost via SSH. I created a new text file called ‘radardl’ using the command vi radardl, and then made it executable with chmod +x radardl. As for the code that goes in that file…

[python]
#!/usr/bin/python2.4
import MySQLdb, time, os
from urllib import urlopen, urlretrieve
[/python]

Nothing special here… just set it up to execute using your Python executable, and import the necessary modules. MySQLdb is used for working with the database, and urllib is used to get the web pages / images that we’ll need.

[python]
db = MySQLdb.connect(host='dbhost', user='dbuser', passwd='dbpasswd', db='dbname')
c = db.cursor()
[/python]

This is the basic connection to your database. Obviously, you’ll need to set one up somewhere so that you’ll actually be able to do this. Replace the placeholders with your specific connection info. You’ll also need to create a new table to hold your data… I used something like "create table weather (recorded timestamp primary key, temperature_string varchar(50), temp_f float, relative_humidity float, pressure_in float, heat_index_f float, windchill_f float, radar varchar(250))". For simplicity, go ahead and use that command to create your table; you can always modify it after we’re done.
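If you’d like to sanity-check that schema before pointing the script at your real database, here’s a quick sketch that runs the same DDL against an in-memory SQLite database. SQLite is forgiving about MySQL-style type names, so this only verifies the basic structure, not MySQL’s exact behavior:

[python]
import sqlite3

ddl = """create table weather (
    recorded timestamp primary key,
    temperature_string varchar(50),
    temp_f float,
    relative_humidity float,
    pressure_in float,
    heat_index_f float,
    windchill_f float,
    radar varchar(250)
)"""

# Run the DDL against a throwaway in-memory database and list the columns.
conn = sqlite3.connect(':memory:')
conn.execute(ddl)
cols = [row[1] for row in conn.execute('pragma table_info(weather)')]
[/python]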

[python]
j = urlopen('http://www.weather.gov/data/current_obs/KTUL.xml')
[/python]

Ok, this is where we download the current weather observations. Take a look at this page and select your state, then find the weather station closest to you. For me, it’s KTUL in Oklahoma. Just find your local page, and replace the url in the code. While you’re looking at it, take a look at all of the data that’s provided in there. It’s a wealth of observation data! I’m only capturing a couple of things with this code, but you can easily add what you want.

[python]
stats = {}
for l in j.readlines():
    stats[l[l.find("<")+1:l.find(">")]] = l[l.find(">")+1:l.rfind("<")]
[/python]

Now, you could use something fancy like the wonderful ElementTree library, but my host doesn’t have it installed and I couldn’t install it myself. So instead I used the above code to get a similar result. Basically, I search for the < and > characters in different ways to pull out the tag name and the value, then store those in the ‘stats’ dictionary. This works great for the XML that the NOAA provides… but keep in mind that XML from other sources may not be so easy to parse.
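For the curious, here’s roughly what the ElementTree version would look like. The XML snippet below is made up, but it’s the same shape as the NOAA feed, and xml.etree.ElementTree ships with newer Pythons:

[python]
import xml.etree.ElementTree as ET

# A made-up sample in the same shape as the NOAA observations XML.
sample = """<current_observation>
  <temp_f>72.0</temp_f>
  <relative_humidity>45</relative_humidity>
  <pressure_in>29.92</pressure_in>
</current_observation>"""

root = ET.fromstring(sample)
# Tag name -> text, just like the 'stats' dictionary built above.
stats = dict((child.tag, child.text) for child in root)
[/python]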

[python]
rightnow = str(time.time())
j = urlopen("[INSERT WEBSITE OF YOUR CHOICE HERE]")

for line in j.readlines():
    if line.find("img src=") > -1:
        img = line.strip()[11:-1]
        urlretrieve(img, rightnow+".gif")
[/python]

Ok… the thing is, everyone has their favorite weather radar, and I don’t want to advocate one over another. I’d also prefer not to give this code out and have my favorite radar image source hammered over and over by everyone in the world. So basically… you need to find a site that predictably shows its most current radar image for your area. Take a look at the page source and find the line containing the image. As you can see above, I simply loop through every line of the page looking for "img src=", because my particular source only shows one image on that page (the [11:-1] slice that extracts the URL is specific to their markup). I’ll have to leave it up to your imagination how to scrape the image out of their code. Or, if the radar images don’t matter to you, you could skip this step altogether!
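If your imagination needs a nudge, a regular expression is one common way to pull the image URL out of a line of HTML. The snippet below is entirely hypothetical — inspect your own source’s page to see what its image tag really looks like:

[python]
import re

# Completely made-up page snippet; your radar source's markup will differ.
html = '<td><img src="http://example.com/radar/latest.gif" alt="radar"></td>'

match = re.search(r'img src="([^"]+)"', html)
if match:
    img = match.group(1)
[/python]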

Incidentally, you might notice the variable ‘rightnow’. That is simply the current unix timestamp. I name my radar images using that variable so I always have a unique name for them.

[python]
if os.path.getsize(rightnow+".gif") > 0:
    print "Got "+rightnow+".gif"
    # Let MySQLdb handle the quoting instead of pasting strings together.
    c.execute('insert into weather values (now(), %s, %s, %s, %s, %s, %s, %s)',
              (stats['temperature_string'], stats['temp_f'],
               stats['relative_humidity'], stats['pressure_in'],
               stats['heat_index_f'], stats['windchill_f'],
               rightnow+'.gif'))
    db.commit()
else:
    print rightnow+".gif"+" Skipped"
    os.unlink(rightnow+".gif")
[/python]

And finally, I check to see if the image size is greater than 0 bytes. You’d be surprised how often empty images are saved, and this just makes sure you’re not filling your database with useless information. Assuming it’s a good image, I insert the data from the NOAA’s XML page, along with the name of the radar image I retrieved, into the weather table. One gotcha: depending on your MySQLdb setup you may need a db.commit() after the insert, or your rows will quietly vanish when the script exits.

And there you have it! Just create a cron job to run this script regularly (radar is generally updated every 6 minutes, but weather observations are only updated hourly) and watch this data accumulate in your new weather table. You can then go on to create neat graphs and other fancy things all based on your historic data. Enjoy!
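In case you’ve never set one up, the crontab entry might look something like the following (the paths here are hypothetical — adjust for wherever your radardl script actually lives). Run crontab -e on your host and add:

[code]
# Run every 6 minutes to match the radar update frequency.
*/6 * * * * cd $HOME/radar && ./radardl >> radardl.log 2>&1
[/code]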

3 Responses to “filling a database with historic weather data”

  1. Anonymous Says:

    “you could use something fancy like the wonderful Elementtree library”

If you can install individual Python modules on your server, you can just drop in the ElementTree.py and ElementPath.py modules and import them with “import ElementTree”. In the worst case, just put them in your script directory. No need to install the whole thing.

  2. pr Says:

Crazy… for some reason, every time I tried that it never worked for me. I definitely couldn’t install my own modules, but dropping it into the script directory wouldn’t work either. Python just never did see the module. Thanks though… now I know. :)

  3. Brian Wigginton Says:

    I did this same thing about half a year ago. I didn’t use a database though, instead I just set up a cron job to run a wget every 15 minutes that downloaded a radar image. All the images were saved in a directory and then I would just use feh in slide show mode to loop through the radars.

The funniest part is that I forgot about this little script and later found I was out of disk space. Lo and behold, I had about 3-4 months of continuous radar images saved in my home directory. It was fascinating watching that loop.

    Great article!
