Changelog Format
@ June 14, 2010, 10:06 p.m.
Filed under: Code Tech Frustration Fedora
This has been bugging me for a while. Many projects and products have a changelog. It's great! We can see in a file what changes have been made from release 1 to release 2. Wouldn't it be great to watch an upstream changelog file and have something like Buildbot trigger a build on a new release? Then my brain started working. It told me that humans are much better than programs at parsing information provided in different textual formats or markups. Let me explain ...
Let's take a look at the victims project. Even though victims doesn't have a dedicated changelog file, we will follow the general SCM changelog information. Victims has a changelog like so:
    2010-05-20 Steve 'Ashcrow' Milner

        * setup.py:
        added archivers module to the setup script
        [4cd8f0133b44] [tip]

    2010-05-18 Steve 'Ashcrow' Milner

        * README, src/victims/__init__.py, src/victims/archivers/__init__.py:
        rpm is now listed as a useable archive closing #8
        [e71ad437f9f4]
Based on this information we can easily create a parser! We care about the date, author/email, description and the release (tag). Through the magic of a little bit of regex, the following works decently enough ...
(\d{4}-\d{2}-\d{2}) (.*) <(.*)>\n\n.*:\n[ ]*(.*)\n[ ]*(.*)
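A minimal sketch of how that pattern might be applied in Python. The sample text mirrors the two victims entries quoted above, laid out with the line breaks the pattern expects. The e-mail address (author@example.com), the function name parse_entries and the dictionary keys are all assumptions made for illustration, not part of the victims project.

```python
import re

# The pattern from the post, unchanged.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2}) (.*) <(.*)>\n\n.*:\n[ ]*(.*)\n[ ]*(.*)"
)

# Hypothetical sample: the layout and the e-mail address are assumptions.
SAMPLE = (
    "2010-05-20 Steve 'Ashcrow' Milner <author@example.com>\n"
    "\n"
    "    * setup.py:\n"
    "    added archivers module to the setup script\n"
    "    [4cd8f0133b44] [tip]\n"
    "\n"
    "2010-05-18 Steve 'Ashcrow' Milner <author@example.com>\n"
    "\n"
    "    * README, src/victims/__init__.py, src/victims/archivers/__init__.py:\n"
    "    rpm is now listed as a useable archive closing #8\n"
    "    [e71ad437f9f4]\n"
)


def parse_entries(text):
    """Return one dict per changelog entry matched by ENTRY."""
    entries = []
    for match in ENTRY.finditer(text):
        date, author, email, description, tags = match.groups()
        entries.append({
            "date": date,
            "author": author,
            "email": email,
            "description": description,
            # Split "[4cd8f0133b44] [tip]" into its bracketed parts.
            "tags": re.findall(r"\[(.*?)\]", tags),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["date"], entry["tags"], entry["description"])
```

Note how much the pattern bakes in: one description line, one line of bracketed markers, and spaces rather than tabs for indentation. Any entry that deviates is silently skipped.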
Now we can parse changelogs! Yay! Oh, but then our brain explodes in fear since this is not the only project out there. Surely everyone uses the same format! Let's use nmap as a second project example.
    # Nmap Changelog ($Id: CHANGELOG 18109 2010-06-14 18:48:07Z drazen $); -*-text-*-

    o [NSE] Added additional vulnerability checks to smb-check-vulns.nse. These checks are intrusive and have MS06-025, MS07-029 designations.

    o [NSE] Added dns-cache-snoop.nse by Eugene Alexeev. This script does cache snooping by either sending non-recursive queries or by measuring response times.
Well, that isn't so bad! With a little regex we could ... wait ... if I have to do this twice for two different projects, am I going to need to do this many, many more times before I create Skynet^H^H^H^H^H^Han uber parser smart enough to figure out what accent, dialect, markup, etc. a changelog may be in? It sure seems that way!
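To make the duplication concrete, here is a rough sketch of what a second, Nmap-specific parser might look like. It assumes each entry begins with "o " at the start of a line, may wrap onto indented continuation lines, and may open with a bracketed category such as [NSE]. The wrapping in the sample, the function name parse_nmap_entries and the dictionary keys are assumptions for illustration.

```python
import re

# Assumed layout: an entry starts with "o " at the beginning of a line.
ENTRY_START = re.compile(r"^o ", re.MULTILINE)
# Optional leading category, e.g. "[NSE] ".
CATEGORY = re.compile(r"^\[([^\]]+)\]\s*")

# Sample based on the excerpt in the post; the line wrapping is an assumption.
SAMPLE = """\
# Nmap Changelog ($Id: CHANGELOG 18109 2010-06-14 18:48:07Z drazen $); -*-text-*-

o [NSE] Added additional vulnerability checks to smb-check-vulns.nse.
  These checks are intrusive and have MS06-025, MS07-029 designations.

o [NSE] Added dns-cache-snoop.nse by Eugene Alexeev. This script does
  cache snooping by either sending non-recursive queries or by
  measuring response times.
"""


def parse_nmap_entries(text):
    """Return one dict per "o "-prefixed entry found in text."""
    entries = []
    # Everything before the first "o " is the file header, so skip it.
    for chunk in ENTRY_START.split(text)[1:]:
        body = " ".join(chunk.split())  # collapse wrapped lines
        match = CATEGORY.match(body)
        entries.append({
            "category": match.group(1) if match else None,
            "description": body[match.end():] if match else body,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_nmap_entries(SAMPLE):
        print(entry["category"], "|", entry["description"])
```

Nothing from the victims parser carries over, and the fields do not even line up: the entries in this excerpt have no per-entry date, author or tag, only a category and free text.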
This is when another thought entered my brain (TWO IN ONE DAY!!!): Surely someone else has thought of this. There must be a commonly used format that shares this information for easy inclusion. As it turns out, I could only find one format for this and it doesn't exactly match. The project I'm talking about is DOAP. While the project does seem interesting, it seems to focus more on information about a project and its services than on project releases and the changes that have happened between those releases.
Long story long ... am I out of luck? Is there not a format in the works to deal with release information such as this in an open way? If there really isn't, is anyone interested in creating a format? It seems to me that this would be quite useful for package maintainers, system administrators and developers. Hit me up on identi.ca or Twitter if you know of a format or want to chat about what one would look like.

