A better way to remove blank lines after Nokogiri Node removal

Developer https://www.devze.com 2022-12-12 04:36 Source: Internet
Perhaps this is nitpicky, but I have to ask.

I'm using Nokogiri to parse XML, remove certain tags, and write over the original file with the results. Using .remove leaves blank lines in the XML. I'm currently using a regex to get rid of the blank lines. Is there some built-in Nokogiri method I should be using?

Here's what I have:

require 'nokogiri'
io_path = "/path/to/metadata.xml"
io = File.read(io_path)
document = Nokogiri::XML(io)
document.xpath('//artwork_files', '//tracks', '//previews').remove

# write to file and remove blank lines with a regular expression
File.open(io_path, 'w') do |x|
  x << document.to_s.gsub(/\n\s+\n/, "\n")
end


There is no built-in method, but we can add one:

class Nokogiri::XML::Document
  # Collapse runs of blank lines inside each text node, then return the document.
  def remove_empty_lines!
    xpath('//text()').each do |text|
      text.content = text.content.gsub(/\n(\s*\n)+/, "\n")
    end
    self
  end
end


This removed the blank lines for me, by deleting every whitespace-only text node:

doc.xpath('//text()').select { |t| t.to_s.strip == '' }.each(&:remove)


Doing a substitution on each text node didn't work for me either. The problem is that after removing nodes, text nodes that just became adjacent don't get merged. When you loop over text nodes, each one has only a single newline, but there are now several of them in a row.

One rather messy solution I found was to reparse the document:

xml = Nokogiri::XML.parse xml.to_xml

Now adjacent text nodes will be merged and you can do regexes on them.

But this looks like it's probably a better option:

https://github.com/tobym/nokogiri-pretty
