How do I deserialize classes in Psych?

Developer https://www.devze.com 2023-02-28 17:04 Source: web
How do I deserialize in Psych to return an existing object, such as a class object?

To do serialization of a class, I can do

require "psych"

class Class
  yaml_tag 'class'
  def encode_with coder
    coder.represent_scalar 'class', name
  end
end

yaml_string = Psych.dump(String) # => "--- !<class> String\n...\n" 

but if I try doing Psych.load on that, I get an anonymous class, rather than the String class.

The normal deserialization method is Object#init_with(coder), but that only changes the state of the existing anonymous class, whereas I want the String class itself.

Psych::Visitors::ToRuby#visit_Psych_Nodes_Scalar(o) has cases where rather than modifying existing objects with init_with, they make sure the right object is created in the first place (for example, calling Complex(o.value) to deserialize a complex number), but I don't think I should be monkeypatching that method.

Am I doomed to working with low level or medium level emitting, or am I missing something?

Background

I'll describe the project, why it needs classes, and why it needs (de)serialization.

Project

The Small Eigen Collider aims to create random tasks for Ruby to run. The initial aim was to see if the different implementations of Ruby (for example, Rubinius and JRuby) returned the same results when given the same random tasks, but I've found that it's also good for detecting ways to segfault Rubinius and YARV.

Each task is composed of the following:

receiver.send(method_name, *parameters, &block)

where receiver is a randomly chosen object, and method_name is the name of a randomly chosen method, and *parameters is an array of randomly chosen objects. &block is not very random - it's basically equivalent to {|o| o.inspect}.

For example, if receiver were "a", method_name was :casecmp, and parameters was ["b"], then you'd be calling

"a".send(:casecmp, "b") {|x| x.inspect}

which is equivalent to (since the block is irrelevant)

"a".casecmp("b")

The Small Eigen Collider runs this code and logs the inputs along with the return value. In this example, most implementations of Ruby return -1, but at one stage Rubinius returned +1. (I filed this as a bug, https://github.com/evanphx/rubinius/issues/518 , and the Rubinius maintainers fixed it.)
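The run-and-log step described above can be sketched roughly as follows. The `Task` struct and `run_task` helper are hypothetical names for illustration only; the actual project may structure this differently.

```ruby
# Hypothetical container for one task (not taken from the real project).
Task = Struct.new(:receiver, :method_name, :parameters)

# Runs one task with the fixed block described above and returns a log entry.
# Exceptions are recorded rather than propagated, so one failing task does
# not stop the whole run.
def run_task(task)
  value = task.receiver.send(task.method_name, *task.parameters) { |o| o.inspect }
  { status: :returned, value: value }
rescue StandardError => e
  { status: :raised, value: e.class }
end

ok_entry  = run_task(Task.new("a", :casecmp, ["b"]))
bad_entry = run_task(Task.new("a", :no_such_method, []))
```

Here `ok_entry` records the ordinary return value and `bad_entry` records the exception class, which is the kind of result that can then be compared across Ruby implementations.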

Why it needs classes

I want to be able to use class objects in my Small Eigen Collider. Typically, they would be the receiver, but they could also be one of the parameters.

For example, I found that one way to segfault YARV is to do

Thread.kill(nil)

In this case, receiver is the class object Thread, and parameters is [nil]. (Bug report: http://redmine.ruby-lang.org/issues/show/4367 )

Why it needs (de)serialization

The Small Eigen Collider needs serialization for a couple of reasons.

One is that regenerating the series of random tasks from a random number generator on every run isn't practical. JRuby has a different built-in random number generator, so even with the same PRNG seed it would produce different tasks from YARV. Instead, I create the list of random tasks once (on the first run of ruby bin/small_eigen_collider), have that initial run serialize the list to tasks.yml, and then have subsequent runs of the program (on different Ruby implementations) read tasks.yml to get the list of tasks.
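As a minimal sketch of that write-once, read-many workflow, the snippet below stores tasks as plain hashes of strings and arrays, which Psych round-trips without any custom tags. The hash keys and the temporary directory are assumptions made for this illustration, not the project's actual file layout.

```ruby
require "psych"
require "tmpdir"

# Tasks kept as plain data so that any Ruby implementation can load them.
tasks = [
  { "receiver" => "a",   "method_name" => "casecmp", "parameters" => ["b"] },
  { "receiver" => "abc", "method_name" => "upcase",  "parameters" => [] }
]

results = Dir.mktmpdir do |dir|
  path = File.join(dir, "tasks.yml")

  # First run: generate the tasks once and serialize them.
  File.write(path, Psych.dump(tasks))

  # Later runs (possibly on another implementation): read the same list back
  # and replay each task.
  reloaded = Psych.load(File.read(path))
  reloaded.map { |t| t["receiver"].send(t["method_name"], *t["parameters"]) }
end
```

Because the file is ordinary YAML, it can also be edited by hand between runs.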

Another reason I need serialization is that I want to be able to edit the list of tasks. If I have a long list of tasks that leads to a segmentation fault, I want to reduce the list to the minimum required to cause a segmentation fault. For example, with the following bug https://github.com/evanphx/rubinius/issues/643 ,

ObjectSpace.undefine_finalizer(:symbol)

by itself doesn't cause a segmentation fault, and nor does

Symbol.all_symbols.inspect

but putting the two together did. I started out with thousands of tasks and needed to pare the list back to just those two.
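Paring a list back like this can be partly automated. The sketch below is a simple greedy reduction, not necessarily how the list was actually reduced: it drops one task at a time and keeps the removal whenever the failure still reproduces. The block is a hypothetical "does this subset still fail?" check. In practice it would have to run the subset in a separate process, since a segfault kills the interpreter.

```ruby
# Greedy one-at-a-time reduction. The block receives a candidate subset and
# must return true if that subset still triggers the failure.
def minimize(tasks)
  kept = tasks.dup
  index = 0
  while index < kept.size
    candidate = kept[0...index] + kept[(index + 1)..-1]
    if yield(candidate)
      kept = candidate   # the failure survives without this task, so drop it
    else
      index += 1         # this task is needed, so keep it and move on
    end
  end
  kept
end

# Stand-in predicate: pretend the crash needs both of these tasks present.
tasks = [:a, :undefine_finalizer, :b, :c, :all_symbols_inspect, :d]
minimal = minimize(tasks) do |subset|
  subset.include?(:undefine_finalizer) && subset.include?(:all_symbols_inspect)
end
```

With this stand-in predicate, `minimal` ends up holding only the two tasks that are needed together.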

Does deserialization returning existing class objects make sense in this context, or do you think there's a better way?


Current state of my research:

To get the behavior you want, you can use the Marshal workaround I mentioned above.

Here is the formatted code example:

string_yaml  = Psych.dump(Marshal.dump(String))
  # => "--- ! \"\\x04\\bc\\vString\"\n"
string_class = Marshal.load(Psych.load(string_yaml))
  # => String

Your hack of modifying Class will probably never work, because real class handling isn't implemented in Psych/YAML.

You can use the tenderlove/psych repo, which is the standalone library.

(Gem: psych. To load it, use gem 'psych'; require 'psych' and check the version with Psych::VERSION.)

As you can see in lines 249-251, objects of class Class aren't handled.

Instead of monkeypatching the class Class, I recommend contributing to the Psych library by extending this class handling.

In my view, the final YAML result should look something like: "--- !ruby/class String"

After a night of thinking about it, I can say this feature would be really nice!
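In the meantime, one way to map a tagged scalar back to an existing class without patching Psych::Visitors::ToRuby might be a domain-type handler. The sketch below is untested across Psych versions. It assumes Psych.add_domain_type calls the block with the tag and the already-parsed scalar value, and that the block's return value becomes the loaded object. The domain string "example.com/2011" and the hand-written "!class" document are placeholders made up for this example.

```ruby
require "psych"

# Register a handler for the local tag "!class".
# "example.com/2011" is a made-up placeholder domain for this sketch.
Psych.add_domain_type("example.com/2011", "class") do |_tag, value|
  # `value` is the parsed scalar (a class name such as "String"). Look up the
  # existing constant instead of building a new anonymous class.
  Object.const_get(value)
end

# Hand-written YAML standing in for the output of a custom dump step.
restored = Psych.load("--- !class String\n")
```

If the handler is picked up as assumed, `restored` is the String class itself rather than an anonymous class. Note that Object.const_get resolves any constant name found in the file, so this should only be used on trusted input.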


Update

I found a small solution that seems to work as intended:

code gist: gist.github.com/1012130 (with descriptive comments)


The Psych maintainer has implemented the serialization and deserialization of classes and modules. It's now in Ruby!
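A minimal round trip using that built-in support might look like the sketch below. Newer Psych versions restrict what plain Psych.load will instantiate and may refuse class tags, so the sketch uses Psych.unsafe_load where it exists and falls back to Psych.load on older versions. Only load trusted YAML this way.

```ruby
require "psych"

# Dumping a class emits a scalar tagged with !ruby/class.
yaml = Psych.dump(String)

# Pick whichever loader this Psych version offers for unrestricted loading.
loader = Psych.respond_to?(:unsafe_load) ? :unsafe_load : :load
restored = Psych.public_send(loader, yaml)
```

Here `restored` is the existing String class object, not a copy, which is the behavior the question asks for.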
