A java list randomizer
This week I came across what seemed like a simple programming problem. From a large list of items I wanted to create a smaller list with random items copied from the larger list. The real world equivalent of would be something like bring me four random records from your CD collection. Sounds simple? Well, there was an additional requirement on that I put on the solution, and that was that the smaller list should contain no duplicates.
There is a Random class in java that returns pseudorandom numbers. However, there is no guarantee that when calling Random.nextInt() a number of times the same number will not be returned twice. In other words, the simple strategy filling a small list with elements from the large list picked at random with the help of Random.nextInt() will probably produce a result list when the same element is added twice. Not good enough.
One defense against that problem could be to keep track of which numbers the random generator has returned and simply discard any duplicates. For most cases that strategy is probably efficient, but sometimes it could lead to having to discard a whole bunch of random numbers. Moreover, execution times of such a solution would differ a lot depending on which numbers Random.nextInt() returned.
Another solution would be to copy all of the large list to a temporary list and remove each element taken to the smaller list from the temporary list. That way, no item would be added twice to the result list. However, if the source list is long, say a million entries, creating a temporary list to fetch holding all the million entries just to extract 100 random entries would be inefficient.
The elegant solution to this problem in my mind would instead be to create a list of offsets into the large list as long as the small list. The list of offsets would then be modified to emulate the effect of having a copy of the large list and removing entries from it. If the first entry of the offset list is offset 1, and the second offset also is 1, modify that to 2 instead, the element that would have been at position 1 if the original element at that position was removed.
I wrote a class implementing this solution, Randomizer.java. Feel free to use it or modify it anyway you wish.
Filed under Geeky, Programming | Comment (0)The joy of unnecessary optimizations
The overwhelming majority of all professional programming is about solving a large number of simple problems. The intellectually challenging problems are rare and in many situations solving a problem with a complicated but efficient solution is hard to justify. It is much cheaper to buy a faster computer with more memory to run your program than it is to write a smarter program that uses less resources.
However, I really enjoy the challenge of writing efficient and elegant code that solve a particular problem, so sometimes when I get a bit of free time and want to do something relaxing I spend time writing solutions to small programming puzzles that I run across in my day job working for Voxbiblia.
The next post contains the solution to one such puzzle and the solution I just wrote. It might be interesting to some.
Filed under Geeky | Comment (0)pwhash in Ruby
I spent some time this weekend re-implementing my pwhash functionality in ruby. I don't have much experience with ruby. I got some exposure to it when doing some work for johnlook a while back, but when writing this code it became apparent that I had some gaps in my knowledge.
Learning new programming languages is an interesting thing to do. I've done it a few times now and if the language is good it gives you a few new perspectives and new ideas on how to be a better programmer. I must say that ruby is a nice acquaintance. The learning curve is a bit steeper than with languages like python (or maybe I'm just getting old) but many things are elegant and I hope to get to work more with it in the future.
Anyway, without any further ado I give you pwhash.rb. Feel free to use it in any way that is compatible with GPL3. I'm fully aware that I have yet to master the style and details of ruby, so if you have any criticisms or ideas on how to improve upon it, feel free to drop me a line.
Filed under Cryptography, Geeky, Programming | Comment (1)pwhash, password hashing in java
As promised, here is the code to a Java implementation of the principles of password hashing that I outlined in my previous post. I'll put it on a proper project page later on, but for now the full distribution can be downladed as pwhash-0.9.zip, the binary jar can be found as pwhash-0.9.jar and the source code with documentation can be found at PasswordHasher.java.
Included in the distribution is also a Base64 implementation, Base64.java, that I wrote. The fact that Sun hasn't included it in Java from version from the very beginning is a mystery to me. My implementation might not be the fastest or the most robust one around but it is quite readable and preforms okay.
Filed under Geeky, Programming | Comment (1)How to best protect your users’ passwords
It seems like this blog has turned into more and more of a programming blog, and here is yet another step in that direction. Perhaps somewhat boring to many, but hopefully useful to some.
Online services generally use usernames and passwords to identify their users. The simplest way of doing that is to save the information in clear text, perhaps in a database or in a text file. When a user logs in, look up the stored password and compare it with the one supplied by the user. If they match, authentication is successful. A simple solution, but a problematic one. If the username and password information gets in the hands of the wrong people. Since many user the same password over and over again it is likely that the username-email-password combinations can be used to log into other services.
This is a real problem. Writing secure web applications is difficult, and security problems that gives an attacker access to login data can be introduced not only by your own code but also by third party libraries and frameworks.
Because of this it is a good idea to make it somewhat more difficult to use the password information for an attacker using some sort of scrambling method. Many people call such methods password encryption, but encryption implies that its possible to decrypt the information with the right key, and keys can be lost so it is better to make the process one way. The basic idea is to take a plaintext password and change it into something called a hash that can not be reversed back into the plaintext. However, the scrambling process needs to be repeatable so that a plaintext password can be verified to match the password that was once used to create the hash.
To help in this process there is a family of cryptographic functions that called cryptographic hash functions. They work in a way that a variable length input is turned into a fixed length output in a way that it's very difficult to a) find two inputs that generate the same output and b) find the input given a specific output. Assuming that the cryptographic hash function works as advertised, shouldn't this should solve all our password storing problems? If we store a cryptographic hash of the user supplied password using for example the SHA-1 algorithm, an attacker that gets hold of our login data can't run SHA-1 backwards to get the passwords. All he has is a list of hash values, but at the same time we can repeat the SHA-1 function when a user needs to authenticate and compare the hashes of the new and old password. If they match they must be the same.
This would be an efficient method if it were not for one little detail: Users choose bad passwords. Some use password, some use their first name some a1. There are lists available with common passwords, and once an attacker puts a computer to the task of trying out passwords many of them gets broken. Worse yet, since the SHA-1 algorithm is such a common one, there is almost certainly hard drives out there filled with pre-calculated SHA-1 values for all common passords and even the ones for all possible password combinations shorter than for example 8 characters. With such pre-calculated values any short or simple password can be found in seconds.
To solve the second problem, the one with pre-calculated hash values, something called a salt value is used. A salt is a random number that is added to the plaintext password before it is encrypted. If the salt can have say a billion possible values, someone doing pre-calculated hash values need to do a billion times more pre-calculations and have a billion times more storage to store the hashes. The salt value can be stored in an unaltered form together with the hash value, and be used when verifying a password by simply doing the same hash operation with the same salt a second time. Another benefit of adding salt to the password is that if you have a large number of users, the attacker can't reuse each password guess with all the users, since their salt values differ.
The problem with easy to guess passwords is much more difficult to solve. The only thing that helps a bit, besides educating users in methods for choosing better passwords, is to make it more time consuming to do one hash calculation. That way it takes longer to try out millions of common passwords and password combination and guessing right will take longer, hopefully too long to be worth it. To make it take a bit longer to calculate the hash you can simply repeat the cryptographic hash function over and over again. Doing it once on my dual core desktop computer takes less than a millisecond. Doing it a thousand times increases the time spent calculating one hash value to about 200 milliseconds.
So, to take good care of your users' login information you should:
- pay attention to security on your servers. That includes operating system security, backup management as well as avoiding misstakes in your own code.
- encourage them to use good passwords.
- store hash values of the passwords instead of plaintext versions
- use a good cryptographic hash function to hash them
- use a large enough and random enough salt value when hashing
- repeat the hash function until you reach a reasonable tradeoff between efficiency and difficulty of repeating it by an attacker.
Soon, I'll publish some java code I've written that implements recommendations 3-6. But that's for another day, now I need to put this computer to sleep.
Update 090330: Since I wrote this post I have published the code for two implementations of these recommendations, in Java and Ruby.
Filed under Cryptography, Geeky | Comment (1)Sun’s Java MP3 plugin is no friend of ID3 tags
The Java programming environment has a framework for reading audio files, that is extensible and has the ability to to handle new audio formats not originally supported by the standard platform. It turns out that Sun has released a plugin that adds playback support of the popular MP3 audio format. However, a few days back I learned that Sun's plugin doesn't seem to recognize the Bible MP3 files that we sell at Voxbiblia.
When looking into the problem I found out that the audio format identification functionality doesn't play well with the ID3v2 metadata format that we use at Voxbiblia. The ID3 tag enables users to organize our MP3 files into whole books or even whole Bibles on their computers or portable MP3 players, and it even enables them to read the actual Bible text of the passage recorded in a specific audio segment. As you might imagine the ID3 functionality is quite useful, and also close to universally used and accepted in popular playback products such as iTunes and Windows Media Player, so I think that it's kind of strange that Sun's MP3 plugin doesn't at least support it by skipping over the tag is kind of surprising.
However, it is not impossible to do just that yourself. If you open the MP3 file yourself and skip forward to after the end of the tag before you hand over the InputStream to the AudioSystem framework it can identify and decode the MP3 stream correctly.
So, I wrote some code that did just that. It's a little bit more tricky to than you might think, because the ID3 format encodes the tag length information in an unusual format, but other than that the strategy is quite straightforward.
The class can be downloaded here: MP3Identifier.java. Enjoy.
Filed under Geeky, Programming | Comment (0)From Ruby on Rails to Java
I have been tasked with making some extensions and changes to a system written i Ruby on Rails with some parts written in Java. For reasons I will not write about at this time we have decided that we want to move away from Ruby on Rails and instead develop using java and the excellent Spring framework.
Leaving one development framework for another is often a painful experience, where lots of code needs to be thrown away and rewritten in one step. For this particular situation doing that massive rewrite, technology change has been something we wish to avoid and here are some info on how we plan to do that.
The first step was to move both the java and the ruby environment into the same namespace from the point of view of the web browser. Since we already use the Apache httpd as a front end for the Ruby environment—executing it via mod_fcgid—this was a simple matter of configuring the same apache httpd instance to proxy all requests to the java environment. That way we can make relative links from pages created in he ruby environment to pages in the java environment.
The next problem to tackle was session management. More specifically, when a user logged in using Ruby on Rails we and then navigated to a page served by the java environment we needed to propagate the information about the logged in user to the java code, hopefully without doing any drastic changes to the ruby code.
It turns out that doing that was not that difficult. The way our Ruby on Rails environment is configured it uses Active Record to store session information associated with a specific cookie in the web brower in the database. Since both ruby and java lives on the same webserver from the point of view of the brower, any cookies set by rails is also sent by the browser as a part of the reqests that ends up in the java environment.
I have then written code that uses the cookie to look up the session data in the database from java. That data consists of a Base64-encoded binary data using the Marshal.dump() facility of Ruby. It turns out that data is not particularly difficult to parse relevant bits from. Looking for the ASCII string of the key of our user id value in the session object in Ruby, then look for the next double qoute char and parse all ASCII digits before the next double qoute occurs seems to be a fully workable solution.
The code to do this can be looked at in RailsSessionIntegrationHelper.java. If you want to get rid of the Spring Framework dependency, just replace the method getSessionData() with one that does raw JDBC instead, and remember to close your connection when you're done.
Filed under Geeky, Programming | Comment (0)Image resizing in java
Doing good image resizing in your favourite software development environment shouldn't be hard. After all, lots and lots of software that has a graphical user interface of some kind needs to do image resizing. However, in java it isn't as easy to do as it should be. I had some resizing needs, more specifically I needed code to resize a large black-and-white image into thumbnail size.
After googling a bit I came up with a recipe from the official Java 2D FAQ at sun.com and used that. After all, the creators of Java should know how to do it right. However, I was surprised to find out that the visual result of the resizing was terrible. Have a look for yourself:

I started out with this letter a in black and white. When resizing that one to a version 40 pixels wide with the code that sun suggests, it looks like this when magnified:

As you see, there is some grayscale smoothing going on but not much and the overall impression is quite jagged. Before anyone asks, yes I'm using the bilinear interpolation option. Compared to the result when resizing in GIMP, ImageMagick or just about any other tool the result is terrible. So i tried around, googled and looked at code here and there.
I was on my way to accept that java just didn't do this, and restort to calling a command line tool from my web application when I found a piece of code that compares the speed and results of different scaling methods. It turns out that if you use the getScaledInstance() method of the java.awt.Image base class with the Image.SCALE_SMOOTH as last parameter, the result looks much better.
Here is a blown up version of a 40 pixels wide rescaling using that method instead:

Ah, much better. Why is this? I don't know. If there is anyone out there that can give me details on why this is I'm more than happy to be educated.
So, if you want to copy my method, please have a look at ImageResizer.java. The version calling a verbatim copy of the suggested solution from Sun's FAQ is in the method sunResize() and the better looking version is in resize().
Filed under Geeky, Programming | Comment (0)