Archive for September, 2007

Review of Carbone et al. 2004

2007 Sep 17 in Reviews

Carbone, M., Gal, Y., Shieber, S., and Grosz, B. (2004). Unifying annotated discourse hierarchies to create a gold standard. In Proceedings of the 5th SIGDIAL Workshop on Discourse and Dialogue.

This paper discusses work attempting to create more authoritative discourse annotations by automatically combining the annotations produced by a few human annotators. It uses the Boston Directions Corpus, which has discourse annotations based on Grosz and Sidner’s theory. The authors note in the introduction that prior work has focused on linear discourse segmentation (LDS) over hierarchical discourse segmentation (HDS) because HDS is difficult to evaluate: annotation is harder and takes more time, it is more subjective (i.e. inter-annotator agreement is lower), and it is unclear what metric to use for measuring agreement or similarity between two annotations. They consider five methods of automatically combining annotations (consensus, i.e. full agreement, with and without hierarchical information; majority consensus with and without hierarchical information; and conflict-free union) and compare them to complete union and to taking the best single annotation as measured by inter-annotator agreement. They then evaluate the original annotations against the unified annotation, using kappa, recall, precision, and non-crossing brackets. As is typically the case, the methods with high recall had low precision. The high-recall methods had high kappa, and the high-precision methods had high non-crossing-brackets scores. The conflict-free union method and flat majority consensus did well on the kappa and recall metrics, but the authors suggest that hierarchical majority consensus is better for the purpose of producing a high-precision, hierarchical gold standard.

They compare the combined annotation with the contributing annotations, rather than with other annotations created for the purpose, “for the sake of scientific validity”. However, considering that only three annotations are used to create the gold standard, the similarity scores reported here are probably artificially high, much like testing on training data. This method gives no indication of how well these comparisons carry over to other annotations. For example, it would have been better to compare these against similarly prepared annotations unified from the non-specialist annotations.
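To make the flat (non-hierarchical) combination methods in this review concrete, here is a minimal sketch in Python. It rests on assumptions of mine that are not from the paper: each annotation is reduced to a set of boundary positions (the index of the utterance after which a segment ends), the function names are made up for illustration, and the hierarchical variants and the conflict-free union are left out because they need the full segment trees. Which annotation plays the role of the reference when computing precision and recall is also just a choice here, not a reproduction of the paper's setup.

```python
from collections import Counter


def consensus(annotations):
    """Boundaries marked by every annotator (full agreement)."""
    return set.intersection(*(set(a) for a in annotations))


def majority(annotations):
    """Boundaries marked by more than half of the annotators."""
    counts = Counter(b for a in annotations for b in set(a))
    return {b for b, n in counts.items() if n > len(annotations) / 2}


def complete_union(annotations):
    """Boundaries marked by at least one annotator."""
    return set.union(*(set(a) for a in annotations))


def precision_recall(candidate, reference):
    """Precision and recall of one boundary set against another.

    Hypothetical helper: an empty set scores 1.0 by convention here.
    """
    candidate, reference = set(candidate), set(reference)
    hits = len(candidate & reference)
    precision = hits / len(candidate) if candidate else 1.0
    recall = hits / len(reference) if reference else 1.0
    return precision, recall


# Three made-up annotations of the same monologue.
annotators = [{3, 7, 12}, {3, 7, 15}, {3, 9, 12}]

print(sorted(consensus(annotators)))       # [3]
print(sorted(majority(annotators)))        # [3, 7, 12]
print(sorted(complete_union(annotators)))  # [3, 7, 9, 12, 15]
print(precision_recall(annotators[0], consensus(annotators)))
```

Even on this toy data the trade-off is visible: the stricter the combination rule, the fewer boundaries survive, so scores against the combined set move in opposite directions for precision and recall.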

Translation for Wikipedia: Li Cheng

2007 Sep 12 in Translations

The following is a translation I have just done, from the Chinese Wikipedia, for the English Wikipedia. Until I submitted this, there was no English article about Li Cheng, one of the greatest Chinese landscape painters. My translation is better than nothing, right?

Since the Chinese Wikipedia is still blocked within China, and I would like to encourage friends and wandering passersby from China to give feedback on my translations, I reproduce the original here in full:

李成(919年-约967年),字咸熙,中国五代宋初画家,青州(今属山东潍坊)人。唐宗室,山水画家,在北宋时期和范宽、关仝并称为“三家鼎峙”,他多作淡墨山水,所谓“惜墨如金”,如在梦雾之中。当时评价他“凡称山水者必以成为古今第一。”,将他的山水和吴道玄的人物相提并论。他主要描绘齐鲁一带山水。后代郭熙等人都是师法他的山水画法。

作品有《盘车图》、《渔乐图》、《寒鸦图》、《山水图》等。宣和御府所藏有一百九十五卷,真伪难辨。只有翟院深的摹本可以乱真,但缺少神气。现存的宋画中有他和王晓合作的《读碑窠石图》。

And here is my version, mostly translation, though borrowing some elements from the pages about Guo Xi and Fan Kuan. You can also visit the current version at Wikipedia.

Li Cheng (Traditional Chinese: 李成; Simplified Chinese: 李成; Hanyu Pinyin: Lǐ Chéng; Wade-Giles: Li Ch’eng) (919–c. 967), style name 咸熙 (Pinyin: Xián Xī), was a Chinese painter of the Five Dynasties and early Song period, from Qingzhou (now part of Weifang, Shandong). He came from the Tang imperial family. During the Northern Song, Li Cheng, Fan Kuan, and Guan Tong together became known as the “three great rival artists”. He did many landscape paintings with diluted ink, known as “treating ink like gold”, which gives the appearance of being in a foggy dream. At that time, he was considered the best landscape painter of all time, and his landscapes were ranked alongside the figure paintings of Wu Daoxuan. He primarily portrayed the landscape of the Shandong area. Later painters, such as Guo Xi, modeled their landscape technique on his.

His works include “Jigger” (?), “Joy in Fishing”, “Cold Crows”, and “Landscape”. The Xuanhe imperial collection held 195 scrolls attributed to him, but it is hard to tell the genuine ones from the copies. Only the copies by Zhai Yuanshen could pass for originals, though they lack spirit. One extant Song painting, “Reading Stele Nest Stone” (?), was a collaboration between him and Wang Xiao.

If you notice points of misunderstanding, or if you can help me understand what 盘车 or 读碑窠石 mean, I would much appreciate your comments.

Review of Hirschberg and Nakatani 1996

2007 Sep 11 in Reviews

Hirschberg, J. and Nakatani, C. (1996). A prosodic analysis of discourse segments in direction-giving monologues. In Proceedings of the 34th ACL.

This article describes an analysis of annotation reliability for the hierarchical discourse segmentation of the Boston Directions Corpus. The Boston Directions Corpus is a set of direction-giving monologues collected by Hirschberg and Grosz. Speakers were prompted to tell another person how to accomplish nine navigation tasks around Boston. Oral reading samples for the same tasks were also obtained by having the subjects return several weeks later to read aloud transcripts of their own monologues. The transcripts were annotated by linguists familiar with ToBI prosodic annotation conventions and Grosz and Sidner’s theory of discourse structure. In this study, the sub-corpus from one speaker is used to compare the reliability of annotation with and without access to the audio. Using raw agreement, the kappa coefficient, and Flammia’s generalized kappa, annotation with the audio was shown to be markedly more reliable, bumping kappa from “unreliable” levels near .5 to “reliable” levels near .7. Furthermore, they report on acoustic features that were found to correlate with a phrase’s position within its discourse segment. On average, initial phrases were found to be higher in pitch and louder, with longer pauses before and shorter pauses after. Medial and final phrases had lower pitch and volume and shorter pauses before. The two differ in that final phrases are spoken faster and are followed by long pauses, whereas medial phrases are followed by short pauses.

None of the results they report are surprising, but they do confirm a large number of previous studies, using a fairly reliable and quantitative methodology. Of course using audio files helps with identifying discourse structure! But here we see how much it helps: without the audio files, the annotators would not be able to achieve reliable annotation, whereas with them they do. On the other hand, the segments where there is agreement show much the same intonation correlations in the text-only annotation and the text+speech annotation. Again, of course we know there are pauses at discourse boundaries, and it’s not too hard to notice that pitch and loudness descend during a discourse segment. But by being quantitative and comprehensive, they provide a foundation for further studies.
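For readers who want to see what a kappa of .5 versus .7 means, here is a minimal sketch of the two-annotator kappa coefficient (Cohen's form) in Python. The setup is my assumption, not the paper's: each annotator labels every phrase as 1 (a segment boundary follows) or 0 (no boundary), and the data below are invented. The paper's exact kappa variant and Flammia's generalized kappa for hierarchies are not reproduced here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random,
    # keeping their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented boundary labels for ten phrases from two annotators.
annotator_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]

print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.524
```

Note that the two annotators agree on 8 of 10 phrases, yet kappa is only about .52, because most phrases are non-boundaries and a lot of that agreement is expected by chance. That is why raw agreement alone overstates reliability for this kind of task.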

Memedia #25

2007 Sep 6 in Translations

Memedia es una publicación de base, escrita por blogueros chinos. Aquí publico una traducción para que el pueblo hispanohablante pueda leerla.

草莓周刊(第25期):科学伪科学,和谐吃河蟹,人民非公民

草莓用特殊的效果标记被GFW封锁的链接,请使用Tor等工具访问这类链接。本周“鲜草莓”添加到网站的顶部,你喜欢吗?

Memedia Semanario (No. 25): La ciencia y seudociencia, armonía come cangrejos, y gente no ciudadanos

Memedia usa un símbolo especial para indicar enlaces bloqueados por el GFW. Favor de usar Tor, etc., para visitar este tipo de enlace.

[科技与互联网]

Blogger’s Day !古人云,天行健,君子当自强不息。不管环境如何恶劣,blogger们的使命和生活,将年复一年地持续,沟镬在网络竹简的纹理中。但是互联网黑洞如此之大,导致国内互联网网站出逃,是否会对国内互联网造成打击呢?WordCamp在北航如期举行,barCampBeijing2007也聚集了很多人气,圆满结束。一周以后,barCampShanghai再启动。

[Ciencia, Tecnología, y el Internet]

¡Día del Bloguero! Los ancianos dijeron: Como los cielos mantienen su salud por moverse, la persona de noble carácter debe siempre luchar por mejorarse. No importa cuán malo sea el ambiente, la misión y la vida de los blogueros continuarán año tras año, [???]. Pero si el agujero negro del internet es tan grande que los sitios web domésticos huyen del país, ¿no será un golpe contra el internet doméstico?

什么是科学?什么是伪科学?这是亘古以来的命题,也不乏戴帽子和挥棒子的先例。但是这应当不只是一个宽容的问题。随着人类认识到自己的局限性,科学本身也成为了更大范畴的一个子话题。因为科学无法解决所有的问题,也甚至无法自圆其说,所以把任何科学之外的事物定义为伪科学也是违背自然的。科学家被立法保护实验失败,从观念上是进步,否则未来还要面临”革命与反革命”相似的争论。但是“纳米门”、“汉芯门”这样的事件恐怕不仅仅是立法保护可以解决的问题了。 谁懂得维基百科 ?国内一些号称学者的人对维基百科的认识,实在令人质疑他的学术素养 。或许大家更感兴趣的是维基百科打印出来会有多少 。

网络巡警

网络巡警来了!每隔半小时,在13个中国主要的网站上,包括新浪和搜狐,就会出现一个警告动画。快点来吧,到分布全世界的草莓园子来巡逻。在十七大之前后,保持良好的噤声是必要的,所以“强制关闭论坛”和各地主机是史上世上最牛的方法。宁可错封万网,也不放过一个? 中国政府的官员思维还停留在“要是没有网络就好了 ”的阶段。

FeedBurner也像flickr一样被和谐了,紫田之殇中一句话毁灭了成千上万的网站 (diglog 就是其中之一)。人们有点被追着跑的感觉,是祈祷,或接受如此忠告,还是加入到反抗互联网审查的全球网络?或许还有人趁机捞钱,来做“五毛党”。无论如何,很多人都在反问:谁有权力关闭上万网站?为什么他们在法律之外 ?面对强权部门,中国互联网前途实在堪忧。我们为何就不能像肯尼亚一样去高声反对呢。

[社会]

公民记者对传统媒体会带来什么样的挑战,何国华先生对此做了专门的论述。 官煤假新闻仍然不断,中国新闻界最高奖项爆出丑闻,伪造报纸居然得了三等奖。CCTV更是令人可笑(还是根本就是新闻道德问题),活人刹那间被说死,幸好“死人”及时做了澄清。

中国人口比例失调和老龄化的问题在2015年后会越来越趋于严重,我们今天的社会服务体系做好准备了吗?有人调侃说“同性恋”可以解决男女比例问题。其实中国的一些其他人口也很值得关注:例如吸烟人口比例全球第一。一个家庭都会为子女做好20年的准备,一个国家的领导者们是否愿意去考虑长久一点。

太平盛世,万众喜迎十七大 ,中国社会科学院的一份调查显示:中国民众认为,最近十年以来中国改革获益最大的是党政官员,而获益最少的是农民和工人。 中国经济问题的症结在以钱为纲 ,郎咸平的观点再次一针见血。而看看令人吃惊的俄罗斯福利制度,可惜中国政府还是高高在上的姿态:“我们已经做得够好了,老百姓实在该为我们的新政起立鼓掌”。西安周至县“地下党小组” 按照党章向党组织反映情况,让上千亩被非法占用的土地得以复耕。我们真的是要“感谢政府,感谢党”呀。其实与科学伪科学的悖论一样,所谓“和谐社会”的理论是有陷阱的,正义才是社会稳定进步的最重要基石。

今年2月3号,沙叶新在香港中文大学演讲时还不忘党恩,号召大学生认真学习温家宝总理讲话 ,最近发现自己竟然被监控好几年了,这是谁干的?谁胆子这么大?敢公然与《宪法》对抗,这样的事情只有王XX政府才干得出!

铁蛋们的成功维权和探险家杨勇的故事不应该被抹杀 ,农民也是公民,必须享受国民待遇,我们不能把农民像猪一样对待,更不能把纳税人的3050亿钱拿去做一件不可能的工程。 在中国历史上,草民是君主专制的牺牲品,人民是新式专政的伪主体,只有公民才是共和国的自由、独立、有尊严的主体。我是公民,不是人民 !
[社会:天灾人祸](总希望是临时的)

家园被毁北京房山两名矿工靠喝自己的尿,奇迹般地在政府宣布停止救援后爬出矿井。大牛说:“全靠我们自己有尿”,其实这中间反映了人和物孰更重要的问题。还是美国人精明,他们的机器人寻找矿工,四个星期过去了,还在不知疲倦地寻找中。而中国的环境危机 更是引起了国际社会的关注,而我们却只能通过收听“敌台 ”才能了解事情的真相。

78岁老人一个月无钱吃饭,愿意持刀抢劫便利店以求入狱温饱,仿佛把我们带到了“万恶的资本主义社会”流浪汉苏比的遭遇中。一场官府纵火,人们的家园被毁,击穿了人类基本善良情感的底线,孩子们会记住,人们也会失去对这样城市的好感,厦门、深圳还有谁。

[商业]

商务部为了民族利益,发出了振聋发聩的声音:诋毁中国制造的实质是贸易保护主义。那么什么是中国制造的实质:大品牌在中国制造产品,中国人的声誉却被拖累受辱,而不是那些品牌。其实全球化之后,政府的监管不力让任何人都无法脱离影响,其实中国自己制造给自己吃用的是更差的产品。人们担心,我们是要保护这样的“中国制造”,还是励精图治,真的做彻底的重生呢? 也难怪有人会感慨中国态度令人失望 。

宏基牵走了美国最大的一只奶牛(Gateway的标志来源于奶牛花纹),将再次超过联想,成为全球第三大PC制造商,也许对联想是暂时的打击,但是未必需要担心。不过真正的冤大头可能是被打得找不到北的易趣网,被Tom收购后重新上路了,也算是有种。可是,新网站竟然不支持不带www的域名访问和RSS,可见还是在晕头转向中。似乎他们根本不了解什么是Web 2.0,因为用户才是打分的主人,就怕他们不买账。研究表明绝大部分网络广告被无视,所以创业公司如何控制节奏

西门子商业贿赂越来越清晰,在中国做生意,跨国大公司都学会了潜规则,中国企业家那艰难的底线 守的住么?但是至于在中国部分的贿赂关联人都有谁,也许未必能够调查清楚了。分众传媒(FMCN)的关联交易调查以及推迟季报等问题正在面临听证和调查挑战,这可能不仅仅是江南春的一次门槛,还是中国众多公司在国际规则下的集中问题表现?
[生活]

据说错过就只能再等四年的红色月全食又来啦(好像从小到大很多次这样的警告),可惜的是,全国很多地方多云无法看到。很多小朋友,可能就是从“天狗吃月亮”的故事中开始认识宇宙,希望他们的未来能够是自由和参与探索的社会。不过神奇的目击UFO事件可总是扑朔迷离,难以捉摸。

猪肉继续涨,沙僧对孙悟空说:“大师兄大师兄,现在二师兄的肉比师傅都贵了喔!”。似乎猪肉对很多中国人的生活还是很重要的,连做清蒸鱼都需要放猪肉,有需求才有价格。不过老外看了中国的菜名翻译可真是要晕菜或恐惧,类似 Red burned lion head(“红烧狮子头”)。北京市当局为了迎接奥运开始提倡统一的菜名翻译法,对老外点菜应当所帮助。不过也有人认为,“为什么要一味迎合外国人呢?

[教育]

中英文学习之间的障碍令很多初学者感到苦恼 ,你的感觉呢? 教科书改变不了历史,更改变不了人,因为人都会长大,包括这位最小的Blogger。

[摘草莓]
草莓的写作方法当然和传统媒体不同,但是任何人都可以掌握其要领。 [本期结束]