Meeting order
prices and details

Meeting order form

Printable Meeting
order form

Exhibitor briefing

Hotel order site

This page last changed 01 December 2008

Boston, Massachusetts, April 27-28, 2009

Program

This annual meeting provides a forum and point-of-reference for all those interested in the intricacies of Search and Retrieval. The meeting draws those with a professional interest in search engines -- such as search engine designers and developers -- and those interested in applying search engines in their own professional environments. Search is at the heart of information retrieval, and the Search Engine Meeting provides an annual point of reference as to what is happening in this fast-moving and exciting field.

All presentations are given sequentially; there are no parallel sessions or parallel presentations at this meeting.



  Sunday April 26  

Pre-Conference Tutorial Sunday Afternoon (Stephen Arnold)



  Monday April 27  

THE PAPERS BELOW ARE CURRENTLY NOT IN FINAL PRESENTATION ORDER

Day One Opening Keynote
[Speaker to be announced]


Dmitri Soubbotin
Semantic Engines, New York
The Variety of Goals and Applications of Semantic Approach to Search

This presentation compares different approaches to presenting search results to users. Various types of search queries have been identified based on the user intent. Accordingly, different types of results are identified: a conventional list of links; a hierarchy or a cluster of concepts with underlying links; a direct answer. A multi-document summary of Web sources is introduced as a legitimate type of search result, using SenseBot as an example. A "semantic cloud" of key concepts is suggested as a means of controlling the focus of the summary. The idea is to give the user a quick answer, obviating the need to drill down into the sources in many cases.
Semantic analysis is discussed as a way to augment traditional search with its page ranking system. Examples of intelligent applications based on the approach are presented. Intersection between semantic search engines and Semantic Web is discussed as a mutually beneficial opportunity. Two major challenges facing semantic systems are the ambiguity of natural languages, and high infrastructure requirements. Some ways to deal with these challenges are discussed.

Diane Burley
Nstein, Canada
A Pragmatic Look at the Semantic Web

Research shows that there are two types of site searchers: those who rely on the search bar and those who are link-dominant, who, like a spider, crawl from page to page using inline links, if those links exist. The challenge with the search bar is that unless the reader types in exactly the words that the journalist used, the story will go undiscovered. A story on “great stuffings for your holiday bird” may not appear if the reader happens to type in the word “dressing” and the story was not tagged properly. Move beyond the realm of synonyms to denotations and connotations and you arrive at the semantic web: a world filled with literal and figurative associations that could help readers find what they are looking for, regardless of whether they know they are looking for it. “Tagging” is the simple answer, while rich metadata are the crux of the semantic web.
  The advancement of the semantic web is a transformative time for news sites. If simple tagging seems onerous, how is it possible that we could consistently and comprehensively tag, and more importantly semantically associate, assets, be they article, image, motion or audio? The answer is multifaceted. In this presentation we take a rudimentary look at the components of the semantic web (tagging, taxonomies, authority files, knowledge bases) and look at some of the tools that will help you automatically tag and associate. Further, how can we expose these rich metadata to create a better reader experience? Indeed, how can we expose these metadata on the back end so that we editors can research or package news with greater ease? Does automation obviate the need for mediation? Just how do editors thrive in the semantic web?

Frank Bandach
Eeggi, California
Semantic Coherence and a New Search Paradigm

This presentation discusses the engineering of an indexing-numeric language for the manipulation of semantics, grammar, concept novelty, responsiveness, disambiguation, translation, and its evolution into basic rationality towards a new search engine paradigm.

Kathleen Dahlgren
Cognition Technologies, California
The Puzzle of Semantic Technologies

Semantics is now center stage in search, with various approaches having been proposed. Most current approaches to Web 3.0, or the Semantic Web, primarily tag pages in a tagging language. Others use ontology, so that users can query "car" and see retrievals with "SUV" or "Porsche", or they present users with summaries or pull-downs based on ontology. Still other semantic approaches focus on syntax parsing in order to recover the formal semantics or argument structure of text and query. Another additive approach to semantics is the building of a Semantic Map. A Semantic Map contains word-level and contextual information that enables a search engine to do complete word sense disambiguation, or understanding, at the word level. Our goal should be a complete approach that treats all aspects of semantics, including sense disambiguation, ontology, synonymy, commonsense knowledge, aspect, information to assist in pronoun reference and discourse reasoning and any other information required to replicate full lexical and formal semantic reasoning.

Martin Baumgärtel
bioRASI, California
Advanced Visualization of Search Results: More Risks, or More Chances?

Many products have been deployed and numerous articles have been published about breakthroughs in the visualization of search results. Yet, is search result visualization common practice in everyday information retrieval tasks? This presentation addresses the gap. Results from case studies and from the analysis of human-computer-interaction are presented. Direct user feedback from the visualization of semantic relations is summarized and a general theory concluded. Whether you work on visualization or have investigated visualization technologies/designs to improve the search experience in your environment, this presentation will give you valuable advice, help in maintaining a realistic view and methods to prevent common pitfalls.

Panel: Non-Text Search Technologies: Speech, Images, Video
Chaired by Susan Feldman (IDC)
Speakers to include:
Thomas Wilde (Yahoo, Massachusetts): How Video Gets Found: changing consumer search strategies for audio and video online and implications for content producers

Stephen E. Arnold
AIT, Kentucky
Google Looks Beyond the Laundry List

This presentation describes three of the technologies that are shifting Google from a service which requires the user to enter a query, to a service that presents search within a user's context. Each of these technologies is in use in various Google services. Google's existing and better-known search methods are complemented by functions that operate automatically or semi-automatically to improve the user experience. First, Google's Chrome is a way for Google to connect the user to Google services and Google services to the user. One key component in Chrome is its ability to track a user's behavior, perform predictive analyses, and give the user access to containers or virtual machines. Chrome is not an operating system; Chrome is a connectivity mechanism that operates regardless of the user's computing operating system or device. Second, Google's janitor technology allows the company to "clean up" structured and unstructured information. One way to use the cleaned up data is to produce an automatic dossier about a person, place or thing. Third, Google's dataspace technology provides an environment in which Google can generate new types of metadata about information processed by Google's indexing system.
   Google has not issued public information about these innovations, but each is disclosed in open source documents such as technical papers, patent documents, and public presentations by Google professionals. The conclusion drawn from this review of three interesting Google innovations from the 2007-08 period is that the company is shifting from keyword queries to search-enabled applications. These applications present the user with solutions to information problems, not a laundry list of results.

Francisco Corella and Karen Lewison
Pomcor, Oregon
Searching the Web More Effectively with Multiple Simultaneous Queries

We describe a Web search facility that reduces the time and effort that it takes the user to home in on the desired results for difficult search problems. When the user enters a query the search facility anticipates possible follow-up queries, issues them immediately, and allows the user to browse the search results of the original query and these additional queries simultaneously. Additional queries may include a respelling of the original query, related queries, and/or sub-queries. (By sub-query we mean a query consisting of a subset of the search terms of the original query.) We describe a parallel algorithm that efficiently produces an optimal set of sub-queries and their results in the important special case where the original query has zero results; although it is rare for a query that targets the Web at large to have no results, the zero-result case is important for queries that target a particular site.
   We have built a prototype of such a search facility as a client-side script, implemented on the Adobe Flex platform, and thus running on the Flash plug-in, that accesses the Yahoo search engine via the Yahoo Astra Web APIs library. The Yahoo search engine has not been modified for this purpose, so our innovations are implemented entirely on the client side. We point out, however, that it would be beneficial to transfer parts of the implementation to the server side, and explain how this could be done. The prototype only handles purely conjunctive queries, but we also describe a method for handling general Boolean queries, and we describe an extension of the parallel zero-result algorithm to the general case.
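
To make the notion of a sub-query concrete, here is a minimal sequential sketch in Python. It is not the authors' parallel algorithm, which the abstract does not specify. It assumes that an "optimal set" means the largest sub-queries that still return results, and it replaces the search engine with a hypothetical in-memory corpus (DOCS) and a hypothetical search function.

```python
from itertools import combinations

# Hypothetical toy corpus; a real system would call a search engine instead.
DOCS = {
    1: "cheap flights boston",
    2: "boston hotel deals",
    3: "cheap hotel paris",
}


def search(terms):
    """Return ids of documents containing every term (purely conjunctive query)."""
    return {doc_id for doc_id, text in DOCS.items()
            if all(term in text.split() for term in terms)}


def maximal_subqueries(terms):
    """Return the largest sub-queries that still have results, with their results.

    Sub-queries are tried from largest to smallest. A subset is skipped when it
    is contained in a sub-query already known to have results, since it would
    add nothing new (assumption: larger sub-queries are preferred).
    """
    found = {}
    for size in range(len(terms) - 1, 0, -1):
        for subset in combinations(terms, size):
            if any(set(subset) <= set(kept) for kept in found):
                continue
            hits = search(subset)
            if hits:
                found[subset] = hits
    return found


if __name__ == "__main__":
    query = ["cheap", "boston", "hotel"]
    print(search(query))              # the full query matches nothing
    print(maximal_subqueries(query))  # each two-term sub-query matches one document
```

In this toy corpus the three-term query has zero results, while each of its two-term sub-queries matches exactly one document, so the user can be shown three near-miss result sets at once.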

Marguerite Leenhardt
Université Paris 3 Sorbonne nouvelle, CLA2T/SYLED, France
A Study of Evaluative Language in SMS Messages: Towards a Characterization of Opinion

At the moment, the results of tools for analysing information exchange have a significant commercial value. This study is a textual and linguistic evaluation of a corpus of text messages sent by mobile phone. The approach brings out distributional characteristics at different levels of linguistic description, with the aim of modeling the linguistic content of the knowledge contained in the corpus. The goal is to contribute to the characterization of evaluative language in SMS messages. Looking ahead, we point to possible industrial applications of the analysis of such textual content, especially in marketing.
    We support the idea that the subjective knowledge gained from a large body of messages can be used for automated analysis of the views contained in brief texts published on the web, such as messages posted on Twitter.


  Tuesday April 28  

Day Two Opening Keynote
David A. Evans
JustSystems Evans Research, Pennsylvania
E-Discovery: A Signature Challenge for Search

Corporations increasingly use and retain information only in the form of electronically held data and documents. As a result, the production and sharing of information in legal proceedings throughout the U.S. will depend heavily on techniques for accessing, searching, organizing and analyzing electronic data -- the principal focus of E-Discovery. Large corporations may have terabytes of e-mail and other files spanning many years that are potentially relevant to a case. In response to a court order, an E-Discovery team must identify, assemble, individuate and categorize an organization's files, segregate all "privileged" material (which may be withheld legally), and deliver a minimally comprehensive and exhaustive set of data to the opposing party -- all in a relatively short amount of time. The techniques needed to accomplish such a task necessarily include search, clustering, classification, filtering, social network analysis, extraction, and more -- and no one of these is sufficient. Such requirements challenge our traditional models for search. In particular, the appropriate user models do not reflect the standard "web" or "enterprise" conditions. This presentation explicates the requirements and types of solutions that dominate E-D.

David Milward
Linguamatics, Cambridge, UK
Accessible Knowledge Discovery Using Agile Natural Language-Based Text Mining

This presentation reviews the challenges faced by the pharmaceutical industry and other knowledge-intensive industries in answering business-critical questions using diverse text resources. It discusses a selection of case studies where an NLP-based approach for discovering relevant facts and relationships from unstructured text is delivering significant value - both in terms of improved productivity and in discovering new knowledge by combining information extracted from different sources.

Brian J Buck
RiverGlass, Illinois
The Science of Search: How the Enterprise Intelligence Cycle (EIC) is Critical to Success

For the knowledge professional, there is more to ‘search’ than simply looking up a specific piece of information; it is about information discovery, relevance assessment, analytical summary and reporting in support of the entire intelligence-gathering process.
   The Enterprise Intelligence Cycle (EIC) is one of the most important information processes within an organization. It encompasses everything from the identification of critical information needs, through the entire process of information seeking and collection and the incorporation of new information into organizational and individual knowledge models, to the application of human judgment to create the essential intelligence for timely business decision making. Recognizing and optimizing the EIC will be a prerequisite for success in a world increasingly overloaded with the volume, variety, and velocity of unstructured data – as will the adoption of search techniques that truly take into account user context and purpose.

Miles Kehoe and Mark Bennett
New Idea Engineering, California
Search Security Issues for the Enterprise

Enterprise search must take access control and privacy issues into account. In particular, sensitive documents need to be searchable so that they can be shared with the appropriate audience, but not visible to everyone behind the firewall. For example, nobody wants 401K account summaries to display except after appropriate access has been granted, and corporate strategy documents relating to outsourcing or layoff plans should not be viewable by all until they are announced. Security must be handled at the document, sub-document and sub-field levels. Here are our best practices for these leading search engines. And here are actual "gotchas" that we have seen at consumer sites and that you can learn from.
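
As a rough illustration of document-level security, the following Python sketch filters search hits against a per-document access list. It is only an assumption about how such trimming might look, not the presenters' method: the Document class, its allowed_groups field and the secure_search function are hypothetical names, and sub-document and sub-field security are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Hypothetical access list: groups allowed to see this document.
    allowed_groups: set = field(default_factory=set)


def secure_search(query, docs, user_groups):
    """Return ids of documents that match the query AND are visible to the user.

    A document is visible when the user shares at least one group with it.
    Matching is a plain case-insensitive substring test, for illustration only.
    """
    query = query.lower()
    return [doc.doc_id for doc in docs
            if query in doc.text.lower() and doc.allowed_groups & user_groups]


if __name__ == "__main__":
    docs = [
        Document("401k-summary", "401K account summary", {"hr"}),
        Document("seminar", "Invitation to the 401K seminar", {"hr", "staff"}),
    ]
    print(secure_search("401k", docs, {"staff"}))  # only the seminar invitation
    print(secure_search("401k", docs, {"hr"}))     # both documents
```

Filtering inside the search function, rather than in the page that renders the results, means hit counts and result snippets cannot leak documents the user is not entitled to see.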

Sid Probstein
Attivio, Massachusetts
Intelligent Integration: Combining Search and BI Capabilities for Unified Information Access

Enterprise search technologies are efficient in filtering unstructured content such as emails and documents. While corporate reports and dashboards display transactional database information, there is a disconnect between technologies. How do you integrate these two sources of data to make them more useful? What about important content that exists outside your organization? There is a new generation of innovative technologies that enable the integration of unstructured content with structured data, bringing together enterprise search with business intelligence capabilities. By enabling automatic updates and alerts in real time, these technologies can affect business processes when it matters: at the convergence of business decisions and actions. For example, drug companies could be alerted whenever a product is mentioned in connection with any terms implying “adverse effects.” Consumer goods companies could search blogs for comments on their products by tying their structured product catalog with the ability to analyze unstructured content and apply sentiment analysis.
  This presentation provides real-world examples and explores new tools that combine enterprise search with business intelligence capabilities that provide faster time to value by:

  • Enabling decisions based on an intuitive complete view of your information landscape: both structured and unstructured content including databases, web pages, office documents, email, and media files.
  • Providing a single repository: eliminating the need for jumping from application to application based on the type of question, or format of the information you examine.
  • Enabling users to use a simple search interface to access all your information assets, rather than learning complex BI applications to access structured data.
  • Offering comprehensive connectivity and language support, easy installation, and linear scalability.
  • Integrating data with key business processes in real-time to affect enterprise-wide processes.

David Seuss
Northern Light, Massachusetts
Using Text Analytics for the Automated Analysis and Discovery of Meaning From Large Stores of Market Intelligence

There has been much recent coverage at places like the Search Engine Meeting on using text analytics for reputation management in brand metric tracking applications, but how do you create systems that assist in business analysis, strategic research and competitive intelligence from volumes of news and market research reports? For example, rather than merely tracking mentions of your brands and measuring the sentiment toward them, you could find out which technologies your competitors are working on, uncover where your competitors are using pricing to gain market share, and identify what product marketing tactics are being employed in your target markets. Text analytics can greatly assist in this process, but using text analytics for strategic research is different from using it for reputation management, and requires completely different solutions. This presentation describes the opportunities and challenges in creating systems for the automated analysis and discovery of strategic meaning from market intelligence content. It describes what it takes to create such systems and outlines the pitfalls to avoid in developing and deploying them.

Jeff Catlin
Lexalytics, Massachusetts
Taking Search to the Next Stage with the Power of Text Analytics

Enterprise search is still growing, evolving and improving, and its main purpose continues to be to help users find the answers to their business questions hidden in a complex array of sources. But many people are beginning to ask "what's next?" for the enterprise search industry. The answer: combining search technologies with the fast-evolving area of entity extraction and sentiment analysis. Extracting important metadata and providing insight into the sentiment of those data complements enterprise search by helping the user uncover the questions they may not think to ask. In fact, those familiar with text analytics would argue that enterprise search is more important than ever to maintain a competitive edge, and that text analytics will play an increasingly large part in that equation.

Christian Reuschling and Andreas Dengel
German Research Center for Artificial Intelligence, Germany
DynaQ - Dynamic Queries for Document-Based, Personal Information Spaces

The paradigms of common, keyword-based document search engines are often not sufficient for the natural searching behavior of human beings. In most systems, the only possibility for searching is to formulate a query from scratch and obtain the results.
  If we have not found what we were looking for, we usually have to start again, reformulating our query. While it is hard for humans to explain an entity completely, it is easier for us to 'navigate' through the document space step-wise, to have an overview of the current state of the search, and, having several tools at hand to support us, to refine the initial query.
  DFKI has developed an inquiry system called 'DynaQ'. Its aim is to enable searchers to explore their personal information space, supporting them with this step-wise searching paradigm called 'orienteering'. For that, the system offers several tools in order to fulfill the Visual Information-Seeking Mantra "overview, zoom & filter, details-on-demand". Some key features of DynaQ will be demonstrated:

  • Bird's-eye view of the result list
  • Dynamic query sliders allowing search terms to be weighted, thus dynamically re-sorting the results on-the-fly
  • Thumbnail generation for indexed documents (pdf, office, etc)
  • Relevance feedback: Queries can be contextualized by marking one or more documents as relevant. Documents that are similar to them will be ranked higher in the result list. Users can choose between two kinds of similarity:
  1. Textual content similarity
  2. Image similarity (for text represented in bitmaps or for image files)
  • Push search dialogue showing details and related documents according to the attribute similarity (e.g., same author, similar full text)
  • Indexing of all common file formats (e.g., pdf, MS Office, rtf, gif, jpeg) and emails
  • Availability of the complete Wikipedia index.
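
The effect of the dynamic query sliders in the list above can be sketched in a few lines of Python. The abstract does not describe DynaQ's actual scoring, so the weighted term count used here is purely an assumption, and the rank function and sample documents are hypothetical.

```python
def rank(docs, weights):
    """Sort document names by the weighted count of query terms, highest first.

    docs maps a document name to its text; weights maps each search term to
    the slider value the user has chosen for it.
    """
    def score(text):
        words = text.lower().split()
        return sum(weight * words.count(term) for term, weight in weights.items())

    return sorted(docs, key=lambda name: score(docs[name]), reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "search engine search ranking",
        "b": "ranking ranking evaluation",
    }
    print(rank(docs, {"search": 1.0, "ranking": 0.2}))  # "a" first
    print(rank(docs, {"search": 0.1, "ranking": 1.0}))  # "b" first
```

Moving a slider only changes the weights, so the result list can be re-sorted without running the query again.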

Daniel Tunkelang
Endeca, Massachusetts
Enabling the Information Seeking Process

In the early days of information science, the process of finding information was conceived as precisely that: a process. But the success of commercial search engines had the unfortunate side effect of reducing this process to a guessing game of relevance. Given that search engines are not mind readers and cannot reliably infer a person's intent from a two-word query, we need to remember that information seeking calls for a process, a dialogue between the user and the system.
  This presentation outlines the principles of information seeking as a dialogue and walks through concrete examples that illustrate the principles of human-computer information retrieval (HCIR), a vision that is reshaping approaches to information access. Specifically, the presentation shows how designing an application in terms of bi-directional communication between the user and the system addresses the inherent limitations of conventional search engine approaches.

Peter Noerr
MuseGlobal, California
The Underground Information Ecosystem:  Connectors

This presentation explains how these vital, but fragile, items allow different systems to connect and interact, and how they need to be maintained and supported for the information to flow. Like plumbing, they stay out of sight, but are critical for system integration and services, such as federated searching, content harvesting, semantic mapping, and any activity which requires information from more than one source.


Presentation of the Everett Brenner Award for the Best Paper at the 2009 Search Engine Meeting
Meeting Wrap-up Panel: What we Liked. What we Learned

Two expert industry commentators reflect on what was said during the two days of the 2009 Search Engine Meeting and, with the help of the audience, draw some lessons and conclusions.


Conference Ends at approximately 4.30 pm