Archive for May, 2009

13 May 2009

libxml Extra content at the end of the document

The Ruby libxml parser that I use to process large XML files in SAX mode refused to process a file that looked perfectly valid, throwing an ‘Extra content at the end of the document’ error somewhere in the middle of the file. It turned out that the parser choked on the control character \x0B (vertical tab), which is not allowed in XML 1.0 documents according to the spec.
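Since the parser error does not say which character is at fault, a small scan of the file can help. The following is a minimal sketch, not part of the original fix: the helper name find_illegal_xml_chars is made up for illustration, and the character class assumes the XML 1.0 rule that the only control characters allowed below U+0020 are tab, line feed and carriage return.

```ruby
# Control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
ILLEGAL_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical helper: returns [line_number, byte_value] pairs for every
# forbidden control character found in the file at +path+.
def find_illegal_xml_chars(path)
  hits = []
  # Binary mode, so that odd bytes elsewhere in the file cannot raise
  # an encoding error while scanning.
  File.open(path, 'rb') do |file|
    file.each_line.with_index(1) do |line, lineno|
      line.scan(ILLEGAL_XML_CHARS) { |ch| hits << [lineno, ch.ord] }
    end
  end
  hits
end

# Usage (file name is a placeholder):
#   find_illegal_xml_chars('file.xml').each do |lineno, byte|
#     puts format('line %d: 0x%02X', lineno, byte)
#   end
```

The file is read line by line, so memory use stays small even for very large inputs.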

To remove the vertical tabs from the file (or, rather, replace them with spaces), I tried using sed like this:

sed 's/\x0B/ /g' file.xml

but I found out that the \xXX syntax is not supported by the version of sed that ships with OS X, which is a shame. So I used a Ruby script instead, which, to my surprise, was quick enough to process an 800 MB file.

File.open('out.xml', 'w') do |output|
  File.foreach('file.xml') { |line| output.puts line.gsub(/\x0B/, ' ') }
end
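The vertical tab is only one of several control characters that XML 1.0 rejects, so a file that trips the parser once may do so again on a different byte. Below is a hedged sketch of a more general cleanup; the helper name strip_forbidden_control_chars is hypothetical, and replacing with a space (rather than deleting) simply mirrors the choice made above.

```ruby
# Assumption: XML 1.0 rules. Below 0x20 only tab, LF and CR are allowed.
FORBIDDEN_CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical helper: copies src_path to dst_path, replacing every
# forbidden control character with a space. Streams line by line.
def strip_forbidden_control_chars(src_path, dst_path)
  File.open(dst_path, 'wb') do |output|
    File.open(src_path, 'rb') do |input|
      input.each_line do |line|
        # write (not puts) so line endings are copied through unchanged
        output.write(line.gsub(FORBIDDEN_CONTROL_CHARS, ' '))
      end
    end
  end
end

# Usage (file names are placeholders):
#   strip_forbidden_control_chars('file.xml', 'out.xml')
```

Both files are opened in binary mode so that the bytes outside the forbidden set are passed through untouched, whatever the document's encoding is.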

10 May 2009

TextMate: lstat – No such file or directory

Sometimes in TextMate I get exceptions like this when trying to run unit tests:

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/pathname.rb:420:in `lstat': No such file or directory - /Users/evgeny/Projects/project/test/unit/test (Errno::ENOENT) from
<... stack trace skipped ...> 

The exception above is caused by the call to realpath() in path_to_url_chunk() in Bundles/Ruby.tmbundle/Support/RubyMate/run_script.rb.

def path_to_url_chunk(path)
  unless path == "untitled"
    file = Pathname.new(path).realpath.to_s
    "url=file://#{e_url(path)}&"
  else
    ''
  end
end

There are two problems here. First, the file variable is never used, so the hyperlink to the failing method in the TextMate output is built from the unresolved path and ends up broken. Second, realpath() raises an exception because, for some reason (I didn’t dig deeper), the current directory is ‘/path/to/test/unit’ while path is ‘test/unit/my_test.rb’, so realpath() looks for the test relative to the wrong directory and can’t find it.
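The second problem can be reproduced outside TextMate. The sketch below is only an illustration under assumed names: it builds a throwaway project layout in a temporary directory and resolves the same relative path from two different working directories. The helper names realpath_or_error and demo_realpath are made up for this example.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Returns the resolved path, or a short message if the file is not found.
def realpath_or_error(path)
  Pathname.new(path).realpath.to_s
rescue Errno::ENOENT => e
  "failed: #{e.class}"
end

# Resolves 'test/unit/my_test.rb' first from the project root,
# then from inside test/unit, and returns both results.
def demo_realpath
  Dir.mktmpdir do |project|
    FileUtils.mkdir_p(File.join(project, 'test/unit'))
    FileUtils.touch(File.join(project, 'test/unit/my_test.rb'))

    from_root = Dir.chdir(project) do
      realpath_or_error('test/unit/my_test.rb')
    end
    from_unit = Dir.chdir(File.join(project, 'test/unit')) do
      realpath_or_error('test/unit/my_test.rb')
    end
    [from_root, from_unit]
  end
end

from_root, from_unit = demo_realpath
puts from_root
puts from_unit
```

From the project root the path resolves normally; from inside test/unit the same relative path points at a non-existent test/unit/test/unit/my_test.rb, and realpath raises Errno::ENOENT.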

The modified version of the function works better:

def path_to_url_chunk(path)
  unless path == "untitled"
    Dir.chdir "../.."
    file = Pathname.new(path).realpath.to_s
    "url=file://#{e_url(file)}&"
  else
    ''
  end
end

This is not a proper solution, because either the file path or the working directory should be corrected before this function is called. If you know a better solution to this problem, please leave a comment.
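One possible alternative, offered only as a sketch and not as the bundle's actual fix, is to leave the working directory alone and look for the relative path in the current directory and each of its ancestors. The helper name resolve_upwards is hypothetical, and the approach assumes that the first ancestor containing the relative path is the right one, which may not hold in every project layout.

```ruby
require 'pathname'

# Hypothetical helper: resolves +path+ against +start_dir+ and then each
# of its parent directories, without changing the working directory.
# Returns the real path as a String, or nil if nothing matches.
def resolve_upwards(path, start_dir = Dir.pwd)
  candidate = Pathname.new(path)
  if candidate.absolute?
    return candidate.exist? ? candidate.realpath.to_s : nil
  end

  # ascend yields start_dir itself first, then every parent up to the root
  Pathname.new(start_dir).ascend do |dir|
    full = dir + candidate
    return full.realpath.to_s if full.exist?
  end
  nil
end
```

Inside path_to_url_chunk, the result of such a helper could stand in for the realpath call, falling back to the original path when it returns nil. Unlike Dir.chdir "../..", this does not change the working directory for the rest of the process, and it does not hard-code how many levels up the project root is.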



