From t.hammond at nature.com Thu Jul 19 12:12:52 2007
From: t.hammond at nature.com (Hammond, Tony)
Date: Thu Jul 19 12:16:52 2007
Subject: [Otmi-discuss] Welcome to "otmi-discuss"
In-Reply-To:
Message-ID:

Welcome to "otmi-discuss". This mailing list is for public discussion of the Open Text Mining Interface (OTMI).

The Open Text Mining Interface (OTMI) is an initiative from Nature Publishing Group (NPG). It aims to enable scholarly publishers, among others, to publish their full text for indexing and text-mining purposes. It provides for a range of structured text disclosure options from word vectors (lists of word occurrences with frequency counts) to text 'snippets' in non-narrative order to uninterrupted full text.

We would like to encourage all to contribute to the discussion and to make OTMI a successful initiative for everybody - publishers and consumers alike.

Useful Links:

* Public Discussion (this list) -
  - subscribing
  - mail archives
* Private Feedback -
* Wiki - Resources (Specifications, Code, Information, Etc.) -

Cheers,

Tony

********************************************************************************
DISCLAIMER: This e-mail is confidential and should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage mechanism. Neither Macmillan Publishers Limited nor any of its agents accept liability for any statements made which are clearly the sender's own and not expressly made on behalf of Macmillan Publishers Limited or one of its agents. Please note that neither Macmillan Publishers Limited nor any of its agents accept any responsibility for viruses that may be contained in this e-mail or its attachments and it is your responsibility to scan the e-mail and attachments (if any). No contracts may be concluded on behalf of Macmillan Publishers Limited or its agents by means of e-mail communication.
Macmillan Publishers Limited Registered in England and Wales with registered number 785998
Registered Office Brunel Road, Houndmills, Basingstoke RG21 6XS
********************************************************************************

From t.hammond at nature.com Thu Jul 19 12:41:34 2007
From: t.hammond at nature.com (Hammond, Tony)
Date: Thu Jul 19 12:43:22 2007
Subject: [Otmi-discuss] What's New on the OTMI Wiki
Message-ID:

The wiki 'opentextmining.org' is a public forum for exchanging information about OTMI and opening up discussion on OTMI. This post lists the changes made since we last blogged on OTMI development in a post to Nascent back in February. Do go and take a look at the wiki and consider contributing.

tags: spam, unicode, relax-ng, xsd, gem

Cheers,

Tony

= Access control

We set up the wiki as a public utility with public read/write access but, as is the way of the digital world, the wiki has been regularly targeted by spammers. To address this we first introduced a very lightweight security measure - the requirement to set up an account in order to post. Unfortunately, this was not enough to stop spam attacks and we are currently resorting to limiting write access to approved IP addresses. We are also looking into more robust yet flexible measures. Read access remains public.

Feel free to post (publicly or privately) any suggestions for how best we might manage access to the wiki while preventing spam.

= Changes to spec

We have continued to make changes to the OTMI spec.

  1. Changed version numbering to standard 0.0.0 style
  2. Added new attribute 'data/@version'
  3. Added new attributes 'vectors/@number', 'snippets/@number'
  4. Added new element 'table/title'

= Schemas

As part of the site overhaul we have upgraded the reference grammar (in ABNF) and have added the following concrete schemas:

  1. Relax NG (Compact)
  2. Relax NG
  3. W3C XML Schema
  4. DTD

= Ruby gem

The demo generator Ruby script 'gen_otmi.rb' has been repackaged as a Ruby gem: 'otmi-0.4.2.gem'. Reasons for this are twofold:

  1. To simplify script distribution and installation
  2. To ease code management

Code changes include the following:

  1. Changed version numbering to standard 0.0.0 style
  2. Added in command line options
  3. Modularized the file
  4. Repackaged as a Ruby Gem otmi-0.4.2.gem
     - Added LICENSE, INSTALL and README files
  5. Changed sort order on vectors - now case insensitive
  6. Regex processing
     - Added support for Unicode characters
     - Fixed error which removed errant [<>&'"] chars
  7. Added early support for unknown DTD
     - Added tag: for atom:id when unknown DTD

From t.hammond at nature.com Fri Jul 20 05:47:28 2007
From: t.hammond at nature.com (Hammond, Tony)
Date: Fri Jul 20 05:52:32 2007
Subject: [Otmi-discuss] Publishing Article on OTMI
Message-ID:

Hi:

I'm thinking of writing an article about OTMI. (Anyone care to co-author?)

I'm really looking for some feedback about where to publish. Also, what issues should we be covering? We would like the article to stand out as a useful reference point both for scholarly publishers and text miners.

We have had some success with a couple of previous papers [1,2] which have been widely cited. We would like to reprise that for OTMI.

It has been our intention to position OTMI as a proposed industry standard. OTMI is not meant to be a closed schema but rather a framework which can allow publishers to open up their content to the degree that they feel comfortable with and that is consistent with their business models. It is primarily a set of compromises to bridge the worlds of publisher and text researcher.

Cheers,

Tony

[1] doi:10.1045/december2004-hammond
    The Role of RSS in Science Publishing: Syndication and Annotation on the Web,
    Tony Hammond, Timo Hannay, and Ben Lund, December 2004.
    Available from http://www.dlib.org/dlib/december04/hammond/12hammond.html

[2] doi:10.1045/april2005-hammond
    Social Bookmarking Tools (I): A General Review,
    Tony Hammond, Timo Hannay, Ben Lund, and Joanna Scott, April 2005.
    Available from http://www.dlib.org/dlib/april05/hammond/04hammond.html

From t.hammond at nature.com Fri Jul 20 06:46:31 2007
From: t.hammond at nature.com (Hammond, Tony)
Date: Fri Jul 20 06:49:59 2007
Subject: [Otmi-discuss] Licensing OTMI Content
Message-ID:

Hi:

As far as referencing licenses from an OTMI document goes, there is this Atom License Extension Internet-Draft which looks like it should do nicely:

  Atom License Extension
  draft-snell-atompub-feed-license-11.txt
  http://www.ietf.org/internet-drafts/draft-snell-atompub-feed-license-11.txt

This allows a link to a license page to be placed in an Atom Feed or Entry document (and here I've also included a rights info element):

  Copyright (c) 2005. Some rights reserved. This feed is licensed under a Creative Commons Attribute-NonCommercial Use License. It contains material originally published by Jane Smith at http://www.example.com/entries/1 under the Creative Commons Attribute License.

(This example was taken from the above I-D, Sect. 2.3. "Example".)
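As a rough illustration of the mechanism, here is a minimal Python sketch of an Atom entry that carries both a license link (a 'link' element with rel="license", the relation defined by the Atom License Extension) and a 'rights' element. The entry id, title, timestamp and license URL are placeholders invented for the sketch; they come neither from the I-D nor from any OTMI file, and the helper names 'build_entry' and 'license_links' are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # the Atom 1.0 namespace


def build_entry(entry_id, title, updated, license_href, rights_text):
    """Return an Atom entry element with a license link and a rights statement."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    # Machine-readable pointer to the license page (rel="license").
    ET.SubElement(
        entry, f"{{{ATOM_NS}}}link", {"rel": "license", "href": license_href}
    )
    # Human-readable rights statement (standard Atom element).
    ET.SubElement(entry, f"{{{ATOM_NS}}}rights").text = rights_text
    return entry


def license_links(entry):
    """Return the href of every license link found directly under the entry."""
    return [
        link.get("href")
        for link in entry.findall(f"{{{ATOM_NS}}}link")
        if link.get("rel") == "license"
    ]


if __name__ == "__main__":
    entry = build_entry(
        entry_id="tag:example.org,2007:otmi-demo",       # placeholder id
        title="Example OTMI entry",                      # placeholder title
        updated="2007-07-20T06:46:31Z",                  # placeholder timestamp
        license_href="http://example.org/otmi-license",  # placeholder license page
        rights_text="Copyright (c) 2007. Some rights reserved.",
    )
    print(ET.tostring(entry, encoding="unicode"))
    print(license_links(entry))
```

A consumer of such a file could call 'license_links' on each entry to find the applicable license page before deciding how the content may be reused.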

Note that the I-D was approved by the IANA in May '07, see

  https://datatracker.ietf.org/idtracker/draft-snell-atompub-feed-license/

That's the easy part. Now we just need a license. A license that presumably would allow for non-commercial use with limited rights of redistribution. We want users also to be able to reproduce text snippets along with any annotations as long as there is appropriate attribution.

We're currently looking into the problem of how best to license our OTMI files but meanwhile would be interested in receiving any suggestions or pointers as regards user needs.

Cheers,

Tony