The remaining 2% were extracted correctly, but they appear to be typos.   

If you're trying to clean the data, including fixing misspellings, it sounds like OpenRefine (http://openrefine.org/) might help. I'm happy to give it a shot.
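
If you'd rather script a quick first pass, here's a rough sketch of the general idea: match each suspect value against a list of spellings you trust and flag the closest one. This is not what OpenRefine does internally; it's a stand-in using Python's standard `difflib`. The function name, the `known_good` list, and the `0.8` cutoff are all my own assumptions, so tune them for your data.

```python
from difflib import get_close_matches


def suggest_corrections(values, known_good, cutoff=0.8):
    """Map each value not in known_good to its closest known-good spelling.

    Hypothetical helper: `known_good` is a list of spellings you trust,
    and `cutoff` (0..1) is an assumed similarity threshold.
    Values with no sufficiently close match map to None.
    """
    suggestions = {}
    for value in values:
        if value in known_good:
            continue  # already spelled correctly, nothing to suggest
        matches = get_close_matches(value, known_good, n=1, cutoff=cutoff)
        suggestions[value] = matches[0] if matches else None
    return suggestions


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    print(suggest_corrections(["Chicago", "Chicgo", "Bostn"], ["Chicago", "Boston"]))
```

I'd review the suggestions by hand rather than apply them automatically, since a similarity cutoff can pair up values that are legitimately different.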