September 4, 2013
David Woolls, CEO of CFL Software Limited, presents "Finding Without Searching." Watch it here.
View the slides for this seminar [PDF]
Abstract: When searching large collections of long, unstructured text documents, it is not always, or even usually, possible to know what to ask a regular search engine. This project explores the potential of using entire documents as the starting point for finding the most relevant companion documents in a collection. We will describe how linguistic principles and the Semantic Web RDF format provide a common framework for heterogeneous text types such as contracts, patents, e-mails, and academic articles. This methodology generates a very large number of queries per document. Because of the number of potential links, finding the most relevant documents then becomes a graph analytics problem. The project seeks to determine the time and resource parameters required to deliver accurate and timely results using YarcData's Urika, known as Sherlock at PSC, which was designed specifically for complex graph analytics using shared-memory, multithreaded technology. The aim is to identify commercial, government, and academic applications, starting from large data collections such as Wikipedia and US patents.
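To illustrate why using a whole document as the "query" turns retrieval into a graph problem, here is a minimal toy sketch. It is not CFL's method or Urika's API. It assumes each document has already been reduced to a set of features (the corpus, feature strings, and the `hasFeature` predicate are all hypothetical) and stored as RDF-style (subject, predicate, object) triples. Each feature of the source document acts as one query, and other documents are ranked by how many feature nodes they share with it.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> set of extracted features.
# In a real system these might come from linguistic analysis of the text.
docs = {
    "patent_1": {"graph", "shared memory", "multithreading"},
    "article_1": {"graph", "rdf", "semantic web"},
    "contract_1": {"indemnity", "liability"},
    "patent_2": {"graph", "multithreading", "scheduler"},
}


def to_triples(docs):
    """Represent each document-feature pair as an RDF-style triple."""
    return [
        (doc, "hasFeature", feat)  # "hasFeature" is a made-up predicate
        for doc, feats in docs.items()
        for feat in sorted(feats)
    ]


def companions(triples, source):
    """Rank other documents by the number of feature nodes shared with `source`."""
    docs_with_feature = defaultdict(set)  # feature -> documents containing it
    features_of = defaultdict(set)        # document -> its features
    for subj, _, obj in triples:
        docs_with_feature[obj].add(subj)
        features_of[subj].add(obj)

    scores = defaultdict(int)
    # Every feature of the source document is effectively one query.
    for feat in features_of[source]:
        for other in docs_with_feature[feat]:
            if other != source:
                scores[other] += 1

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(companions(to_triples(docs), "patent_1"))
    # [('patent_2', 2), ('article_1', 1)]
```

On this toy data `patent_1` links to `patent_2` through two shared features and to `article_1` through one, while `contract_1` has no companions. With millions of long documents, each contributing many features, the number of such links grows very quickly, which is the kind of workload a shared-memory graph platform is intended to handle.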