Introducing rjmailer
Today I have decided that it is time to publish rjmailer, a programming project that I have worked on in my spare time for the last two years or so. In my own view, rjmailer is the most useful piece of software I have written yet, and I have some faith that in time others will find it useful as well. Thanks to my amazing partner Alex, it even has its own mascot and webpage to go with the release. I love you man!
rjmailer is a programming library that sends mail. There are other pieces of software that do that, but they usually hand off their messages to the mail system and don't give much feedback to the user. rjmailer is not like that. It goes out of its way to provide as much information as possible about the mail delivery, and can in many cases give quick and detailed information about failures such as misspelled usernames or domain names.
Let's say you run a web based service that requires people to register with an email address. You want to verify that the address is valid, so you send an email to the address that the user provided when signing up and require her to click a link in that message to activate her account. We're all used to this, but there are lots of things that can go wrong. The user can misspell her email address, or there can be some problem with her email server that causes the activation message to bounce. If you are unlucky you lose a member, or even someone who could later have been converted to a paying customer.
If that sounds interesting, please have a look at rjmailer.org. However, please be warned: this is beta software. It is not yet fully tested, has bugs and, for the moment, will probably lose your mail.
Filed under rjmailer | Comment (1)
pwhash in Ruby
I spent some time this weekend re-implementing my pwhash functionality in Ruby. I don't have much experience with Ruby. I got some exposure to it when doing some work for johnlook a while back, but when writing this code it became apparent that I had some gaps in my knowledge.
Learning new programming languages is an interesting thing to do. I've done it a few times now, and if the language is good it gives you a few new perspectives and new ideas on how to be a better programmer. I must say that Ruby is a nice acquaintance. The learning curve is a bit steeper than with languages like Python (or maybe I'm just getting old), but many things are elegant and I hope to get to work more with it in the future.
Anyway, without any further ado I give you pwhash.rb. Feel free to use it in any way that is compatible with GPL3. I'm fully aware that I have yet to master the style and details of Ruby, so if you have any criticisms or ideas on how to improve upon it, feel free to drop me a line.
Filed under Cryptography, Geeky, Programming | Comment (1)
pwhash, password hashing in Java
As promised, here is the code to a Java implementation of the principles of password hashing that I outlined in my previous post. I'll put it on a proper project page later on, but for now the full distribution can be downloaded as pwhash-0.9.zip, the binary jar can be found as pwhash-0.9.jar and the source code with documentation can be found at PasswordHasher.java.
Included in the distribution is also a Base64 implementation, Base64.java, that I wrote. The fact that Sun hasn't included one in Java from the very beginning is a mystery to me. My implementation might not be the fastest or the most robust one around, but it is quite readable and performs okay.
Filed under Geeky, Programming | Comment (1)
How to best protect your users’ passwords
It seems like this blog has turned into more and more of a programming blog, and here is yet another step in that direction. Perhaps somewhat boring to many, but hopefully useful to some.
Online services generally use usernames and passwords to identify their users. The simplest way of doing that is to save the information in clear text, perhaps in a database or in a text file. When a user logs in, you look up the stored password and compare it with the one supplied by the user. If they match, authentication is successful. A simple solution, but a problematic one if the username and password information gets into the hands of the wrong people. Since many users reuse the same password over and over again, it is likely that the username-email-password combinations can be used to log into other services as well.
This is a real problem. Writing secure web applications is difficult, and security problems that give an attacker access to login data can be introduced not only by your own code but also by third party libraries and frameworks.
Because of this it is a good idea to use some sort of scrambling method that makes it more difficult for an attacker to use the password information. Many people call such methods password encryption, but encryption implies that it's possible to decrypt the information with the right key, and keys can be lost, so it is better to make the process one way. The basic idea is to take a plaintext password and change it into something called a hash that cannot be reversed back into the plaintext. However, the scrambling process needs to be repeatable so that a plaintext password can be verified to match the password that was once used to create the hash.
To help in this process there is a family of functions called cryptographic hash functions. They turn a variable length input into a fixed length output in such a way that it's very difficult to a) find two inputs that generate the same output and b) find the input given a specific output. Assuming that the cryptographic hash function works as advertised, shouldn't this solve all our password storing problems? If we store a cryptographic hash of the user supplied password using for example the SHA-1 algorithm, an attacker that gets hold of our login data can't run SHA-1 backwards to get the passwords. All he has is a list of hash values. At the same time we can repeat the SHA-1 function when a user needs to authenticate and compare the hash of the newly supplied password with the stored one. If the hashes match, the passwords must for all practical purposes be the same.
This would be an efficient method if it were not for one little detail: users choose bad passwords. Some use password, some use their first name, some a1. There are lists available with common passwords, and once an attacker puts a computer to the task of trying out passwords many of them get broken. Worse yet, since the SHA-1 algorithm is such a common one, there are almost certainly hard drives out there filled with pre-calculated SHA-1 values for all common passwords, and even for all possible password combinations shorter than for example 8 characters. With such pre-calculated values any short or simple password can be found in seconds.
To solve the second problem, the one with pre-calculated hash values, something called a salt value is used. A salt is a random number that is added to the plaintext password before it is hashed. If the salt can have say a billion possible values, someone pre-calculating hash values needs to do a billion times more pre-calculations and have a billion times more storage to store the hashes. The salt value can be stored in an unaltered form together with the hash value, and be used when verifying a password by simply doing the same hash operation with the same salt a second time. Another benefit of adding salt to the password is that if you have a large number of users, the attacker can't reuse each password guess with all the users, since their salt values differ.
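As a rough illustration of the salting idea, here is a minimal sketch in Java. It is not the PasswordHasher.java mentioned elsewhere on this blog. The class name `SaltedHashSketch`, the 16 byte salt length and the choice of SHA-256 (instead of the SHA-1 used as the example in the text) are all assumptions made for the sake of the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical sketch: one salted hash round, salt stored next to the hash.
final class SaltedHashSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Create a fresh random salt; 16 bytes is an assumed, not a prescribed, size.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    // Hash salt followed by the password bytes. SHA-256 is an assumption.
    static byte[] hash(byte[] salt, String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            md.update(password.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Verification repeats the same operation with the stored salt.
    static boolean verify(byte[] salt, byte[] storedHash, String candidate) {
        return MessageDigest.isEqual(storedHash, hash(salt, candidate));
    }

    public static void main(String[] args) {
        // Two users with the same password end up with different stored hashes.
        byte[] saltA = newSalt();
        byte[] saltB = newSalt();
        byte[] hashA = hash(saltA, "password");
        byte[] hashB = hash(saltB, "password");
        System.out.println("same hash: " + MessageDigest.isEqual(hashA, hashB));
        System.out.println("verifies:  " + verify(saltA, hashA, "password"));
    }
}
```

The salt is not secret. It only has to be different for each user, so that a table of pre-calculated hashes built for one salt is useless for every other.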
The problem with easy to guess passwords is much more difficult to solve. The only thing that helps a bit, besides educating users in methods for choosing better passwords, is to make it more time consuming to do one hash calculation. That way it takes longer to try out millions of common passwords and password combinations, and guessing right will take longer, hopefully too long to be worth it. To make it take a bit longer to calculate the hash you can simply repeat the cryptographic hash function over and over again. Doing it once on my dual core desktop computer takes less than a millisecond. Doing it a thousand times increases the time spent calculating one hash value to about 200 milliseconds.
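The repetition described above could be sketched like this in Java. Again, this is an illustration and not the author's published code. The class name `IteratedHashSketch`, the way each round feeds the previous digest back into the hash function, and the use of SHA-256 are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: slow down guessing by repeating the hash function.
final class IteratedHashSketch {

    // Hash salt + password once, then hash the result (iterations - 1) more times.
    static byte[] hash(byte[] salt, String password, int iterations) {
        if (iterations < 1) {
            throw new IllegalArgumentException("iterations must be at least 1");
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            // digest(byte[]) adds the input, finishes the hash and resets md.
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            for (int i = 1; i < iterations; i++) {
                digest = md.digest(digest);
            }
            return digest;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = {1, 2, 3, 4}; // fixed salt only to keep the demo short
        long start = System.nanoTime();
        hash(salt, "password", 1000);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println("1000 rounds took about " + micros + " microseconds");
    }
}
```

The iteration count has to be stored or fixed somewhere, since verification must repeat exactly the same number of rounds. Timings will differ a lot between machines, so the count is best chosen by measuring on your own hardware.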
So, to take good care of your users' login information you should:
- pay attention to security on your servers. That includes operating system security, backup management as well as avoiding mistakes in your own code.
- encourage your users to use good passwords.
- store hash values of the passwords instead of plaintext versions
- use a good cryptographic hash function to hash them
- use a large enough and random enough salt value when hashing
- repeat the hash function until you reach a reasonable tradeoff between how fast you can verify a password and how slow it is for an attacker to try one.
Soon, I'll publish some Java code I've written that implements recommendations 3-6. But that's for another day; now I need to put this computer to sleep.
Update 090330: Since I wrote this post I have published the code for two implementations of these recommendations, in Java and Ruby.
Filed under Cryptography, Geeky | Comment (1)
The standalone gas pump
Imagine that someone invented a standalone gas pump. One that didn't need to be connected to tanks of gasoline at a gas station or even the power grid, but could be used to fill up the tank of your car all by itself. The pump created gasoline from the carbon dioxide in the air around it and didn't produce any waste. Since the pump didn't require any raw materials and was cheap to produce, you could use it practically for free.
Does it sound like a brilliant machine, and a solution to many of our world's problems? I think so, but on the other hand I'm pessimistic about how such a machine would be received by our society.
Why? Because when it comes to information such a machine has been invented, and some part of society seems to be at war with it. That machine is the idea of peer to peer file sharing on the internet. Digital information can be copied and distributed to millions of users practically for free. I can have access to enormous amounts of high quality music and other media all for just the cost of my computer and my internet connection. The act of accessing that content doesn't stop others from getting it as well. The original creator of the content gets to keep it as well and do what she wishes with it.
Yet some multinational media corporations, not unlike the oil companies that make money off gasoline sales, claim that their revenue streams are hurt by this new invention and that they need compensation. Today the Pirate Bay trial began here in Sweden, where the prosecutor together with some lawyers representing US based media companies will try to put four guys in jail for running a website that lets internet users share and download digital content.
If you are making up your mind about whether they should be convicted or not, think of them as the entrepreneurs responsible for trying to take the standalone gasoline pump from the inventor out to the world market. Sure, their actions probably hurt the companies that have made obscene amounts of money off selling small plastic discs containing music and movies in the last decade. Just like the standalone gas pump would definitely decrease the revenue of large oil companies and put many, many people working in the oil industry out of work.
But look at the bigger picture: a world where anyone and everyone can have access to a large part of the media that is produced is something really valuable for society as a whole, and that value outweighs the dramatic downside of media companies not being able to make as much money as they used to.
Would it be reasonable to call using the standalone gas pump stealing? One could argue that you would be stealing the money that Exxon would otherwise get from filling up your tank. If you wanted to make a more emotional argument you could argue that a small fraction of the money you stole from Exxon would go to Saudi Arabia, and that a small fraction of that would go to poor defenseless children crying in an orphanage.
The core of my argument is that no one should be guaranteed that their current business model will be protected from the effects of new technology, the way that the media companies are trying to do now. Manufacturers of mechanical calculators were put out of business when the electronic calculators came along. It was painful for the companies and their employees, but society as a whole is better off with cheap, smart electronic calculators than with a healthy industry creating expensive, dumb mechanical calculators. This is true when it comes to file sharing on the internet, and it would be true if the day came when someone invents the standalone gas pump. Let's recognise this and change our laws accordingly.
Filed under Uncategorized | Comments (5)
Disruptive technology and the future of media
A few months back I got an invite to Spotify, a streaming music service available in Sweden and some other markets, that I have used increasingly since. The basic service is free for invited users, and once every ten songs or so they play one or two 30 second commercials. For a fee of 99SEK/month (about $12) you get the service free from commercials. They have thousands and thousands of albums to choose from.
I must say that this service is brilliant in several ways. When I first started to use it I was happy to discover that the audio quality was totally acceptable and that a large part of my favourite music was available. Now, a bit later, I find myself changing my music listening behaviour due to the nature of the service. You can click on an artist name to see other music by that artist, and you can click on an album name to see what other songs are on that album. This means that you can surf from artist to artist on compilation albums and listen to find new music that you like. This is something that prior to Spotify you could only do using peer to peer file sharing, and that was quite a bit less convenient than this solution, putting all legal issues aside for a moment.
So, this way I can search for the title of a film I saw a while back that I liked, Lars and the Real Girl, click on Nat King Cole, and find out that an artist named Bebel Gilberto has remixed him. A bit down in the list of albums from that artist I find a compilation album named 20 ways to float through walls, which sounds kind of cool. One of the artists, Snooze, catches my attention. It turns out I like several songs from the album Goingmobile. All this in a matter of seconds, guided by interesting group names, song names and album covers. Doing the same thing using any other technology that I know of would have taken hours.
One of the most encouraging things about this is that it shows that once the media industry moves past just trying to stop unstoppable file sharing and allows creative people to build something better, it can win its customers back.
I have a feeling that the attitude change needed before Spotify could spring into existence would not have happened without the pressure from thepiratebay.org and others.
Filed under Uncategorized | Comment (0)
Sun’s Java MP3 plugin is no friend of ID3 tags
The Java programming environment has a framework for reading audio files that is extensible and can handle new audio formats not originally supported by the standard platform. It turns out that Sun has released a plugin that adds playback support for the popular MP3 audio format. However, a few days back I learned that Sun's plugin doesn't seem to recognize the Bible MP3 files that we sell at Voxbiblia.
When looking into the problem I found out that the audio format identification functionality doesn't play well with the ID3v2 metadata format that we use at Voxbiblia. The ID3 tag enables users to organize our MP3 files into whole books or even whole Bibles on their computers or portable MP3 players, and it even enables them to read the actual Bible text of the passage recorded in a specific audio segment. As you might imagine the ID3 functionality is quite useful, and also close to universally used and accepted in popular playback products such as iTunes and Windows Media Player, so I find it kind of surprising that Sun's MP3 plugin doesn't at least support it by skipping over the tag.
However, it is not impossible to do just that yourself. If you open the MP3 file yourself and skip forward to after the end of the tag before you hand over the InputStream to the AudioSystem framework it can identify and decode the MP3 stream correctly.
So, I wrote some code that did just that. It's a little trickier than you might think, because the ID3 format encodes the tag length information in an unusual format, but other than that the strategy is quite straightforward.
The class can be downloaded here: MP3Identifier.java. Enjoy.
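For readers who just want a feel for the approach, here is a small sketch of the skipping strategy. It is not the contents of MP3Identifier.java. The class and method names (`Id3v2Skipper`, `skipId3v2Tag`, `decodeSynchsafe`) are made up for this example. It relies on the ID3v2 header layout: 10 bytes starting with "ID3", where the last four bytes hold the tag size with only the lower 7 bits of each byte in use, which is the unusual length format mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: position a stream just after a leading ID3v2 tag.
final class Id3v2Skipper {
    private static final int HEADER_LENGTH = 10;

    // Decode a 4 byte "synchsafe" integer: 7 significant bits per byte.
    static int decodeSynchsafe(byte[] b, int offset) {
        return ((b[offset] & 0x7F) << 21)
             | ((b[offset + 1] & 0x7F) << 14)
             | ((b[offset + 2] & 0x7F) << 7)
             | (b[offset + 3] & 0x7F);
    }

    // Returns the same stream, positioned after the tag if one is present.
    static InputStream skipId3v2Tag(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(HEADER_LENGTH);
        byte[] header = new byte[HEADER_LENGTH];
        int read = 0;
        while (read < HEADER_LENGTH) {
            int n = in.read(header, read, HEADER_LENGTH - read);
            if (n < 0) break; // stream ended before a full header
            read += n;
        }
        boolean hasTag = read == HEADER_LENGTH
                && header[0] == 'I' && header[1] == 'D' && header[2] == '3';
        if (!hasTag) {
            in.reset(); // no tag: leave the stream untouched
            return in;
        }
        long toSkip = decodeSynchsafe(header, 6);
        if ((header[5] & 0x10) != 0) {
            toSkip += HEADER_LENGTH; // ID3v2.4 footer flag: 10 more bytes
        }
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                // skip() may return 0 before end of stream; try a single read
                if (in.read() < 0) break;
                skipped = 1;
            }
            toSkip -= skipped;
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        // A 5 byte tag body followed by two bytes standing in for MP3 data.
        byte[] data = {'I', 'D', '3', 3, 0, 0, 0, 0, 0, 5,
                       1, 2, 3, 4, 5, (byte) 0xFF, (byte) 0xFB};
        InputStream in = skipId3v2Tag(new ByteArrayInputStream(data));
        System.out.println(Integer.toHexString(in.read()));
    }
}
```

With a real file you would wrap a FileInputStream in a BufferedInputStream, call the skip method, and then hand the stream to `AudioSystem.getAudioInputStream(InputStream)`.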
Filed under Geeky, Programming | Comment (0)
The accidental thief
It turns out that the name of the software I released on Saturday was a bit too good, so good in fact that someone had thought of it before. There is a much more ambitious piece of software with that name that does the same thing that my software does. To avoid confusion I have renamed mine, so Voxbiblia's jid3 is now id3j. I'm sorry about any confusion.
Filed under Programming | Comment (0)