
What is a simple visualization tool to show word counts?

Developer https://www.devze.com 2023-04-05 14:14 Source: Internet
I have a text file that records how many times each phrase appears in a corpus. The file looks like this, with the phrase and its count separated by "=":

phrase1=100
phrase2=156
... and so on

What is a good, simple visualization tool that can take this file (or a slightly modified version of it) and give me a nice visualization in the form of bubbles, where the bubble size is proportional to the count of the phrase? I would prefer the phrase to be written inside the bubble.


The type of plot you referred to in the OP (bubble plot) is also referred to as a balloon plot.

The title of your question is directed to the more general problem of intuitively displaying word frequency in a given text. Given this, perhaps it's worth mentioning that the infographics gurus are critical of bubble plots because the plot is based on mapping data values to circle areas.
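To make the area mapping concrete, here is a minimal Python sketch (not part of the original answer) using the two counts from the question. The names `radius_for_area` and `scale` are hypothetical, chosen only for illustration.

```python
import math

def radius_for_area(count, scale=1.0):
    """Radius of a circle whose area equals scale * count."""
    return math.sqrt(scale * count / math.pi)

# Counts taken from the sample file in the question.
counts = {"phrase1": 100, "phrase2": 156}
radii = {phrase: radius_for_area(c) for phrase, c in counts.items()}

# The counts differ by a factor of 1.56, but the radii differ only by
# sqrt(1.56), roughly 1.25.
count_ratio = counts["phrase2"] / counts["phrase1"]
radius_ratio = radii["phrase2"] / radii["phrase1"]
```

Because area grows with the square of the radius, a 56% larger count gives a circle that is only about 25% wider, which may be one reason differences in a bubble plot can look smaller than they are.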

Unfortunately, the same gurus haven't agreed on a plausible set of alternatives (as far as I know).

The best alternative to a bubble plot for showing term frequency that I can think of is usually referred to as a tag cloud.

On his blog, Statistics, R, Graphics, and Fun, Yihui Xie has written an excellent tutorial for creating tag clouds using R. It is excellent for two reasons: it's nicely written with step-by-step code, and the result is beautiful.

See also this post on R Bloggers for a tutorial on creating a better tag cloud.

But if a bubble (aka balloon) plot is what you want, here you go.

They are simple to create in R. There is a meticulously detailed step-by-step tutorial for creating and polishing bubble charts on the excellent FlowingData site.

In addition, the R Package gplots (available on CRAN) includes a function balloonplot for plotting these directly.
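If R is not at hand, the same idea can be sketched in plain Python. The example below is not the gplots or FlowingData approach; it is a standalone sketch that assumes the `phrase=count` format from the question and positive integer counts, and writes a simple SVG with each phrase inside its bubble. The helper names `parse_counts` and `bubbles_svg`, the left-to-right layout, and the colours are all assumptions made for illustration.

```python
import math
from xml.sax.saxutils import escape

def parse_counts(text):
    """Parse lines of the form 'phrase=count' into (phrase, count) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank and malformed lines
        phrase, _, count = line.rpartition("=")  # split on the last '='
        try:
            pairs.append((phrase.strip(), int(count)))
        except ValueError:
            continue  # skip lines whose count is not an integer
    return pairs

def bubbles_svg(pairs, max_radius=80, gap=10):
    """Lay bubbles out left to right; circle area is proportional to count."""
    biggest = max(count for _, count in pairs)
    height = 2 * max_radius + 2 * gap
    x = gap
    shapes = []
    for phrase, count in pairs:
        # sqrt keeps the area, not the radius, proportional to the count.
        r = max_radius * math.sqrt(count / biggest)
        cx, cy = x + r, height / 2
        shapes.append(
            f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{r:.1f}" '
            f'fill="lightsteelblue" stroke="gray"/>'
        )
        shapes.append(
            f'<text x="{cx:.1f}" y="{cy:.1f}" text-anchor="middle" '
            f'dominant-baseline="middle" font-size="12">{escape(phrase)}</text>'
        )
        x += 2 * r + gap
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{x:.0f}" height="{height}">'
        + "".join(shapes)
        + "</svg>"
    )

sample = "phrase1=100\nphrase2=156\n"
pairs = parse_counts(sample)
svg = bubbles_svg(pairs)
# To view the result, write it to a file and open it in a browser:
# with open("bubbles.svg", "w", encoding="utf-8") as f:
#     f.write(svg)
```

A single row only works for a handful of phrases; for a larger file, a circle-packing layout or a tag cloud would likely be a better fit.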

From the FlowingData site:

[Figure: example bubble chart]


Hmm, I am not sure I completely understand your idea of bubble graphics. With a lot of phrases, it does not look feasible to me. Have you looked at GraphViz?

I have done a similar project to count the words in Wikipedia:

[Figure: word counts in Wikipedia]

The best way I know is to use a log-log (double logarithmic) scale. You can probably label some phrases on the graph. I created all the graphics here with Xmgrace.
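As a rough illustration of preparing data for a log-log plot, the Python sketch below ranks the counts and takes base-10 logarithms of both rank and count. Plotting rank against frequency is an assumption here (the answer does not say which quantities were on the axes), and `log_log_points` is a hypothetical helper name. The resulting pairs could be written to a two-column file for any plotting tool.

```python
import math

def log_log_points(counts):
    """Return (log10(rank), log10(count)) pairs, most frequent first."""
    # Zero or negative counts have no logarithm, so they are dropped.
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    return [
        (math.log10(rank), math.log10(c))
        for rank, c in enumerate(ordered, start=1)
    ]

# Made-up counts for illustration, including the two from the question.
points = log_log_points([100, 156, 12, 3, 1])
```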
