
Q1 2009 Objectives for Yandex Labs MLR

Owner:
arkady
Last modified:
18 February 2009, 22:15

Deployment of Owner Specificity Features  

  • status: seeing pfound improvement with owner-aggregated click-view features
  • specifically, smoothed owner-level query entropy
  • straightforward deployment strategy
  • expected delivery: mid January
  • actual delivery:
    • 2009-01-20 - specificity.host, and specificity.owner tables on the sdf cluster
    • 2009-02-03 - checked into the production click-processing pipeline (Andrey Khropov)
  • status:
    • COMPLETE
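
The entropy feature named above can be illustrated with a minimal sketch. It assumes additive (Laplace-style) smoothing over the set of queries seen in the data and an input of `(owner, query, clicks)` rows; the page does not say which smoothing or table layout is actually used, so the function names, the `alpha` parameter, and the row format are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def smoothed_query_entropy(query_clicks, alpha=1.0, vocab_size=None):
    """Entropy in bits of an additively smoothed query distribution.

    query_clicks: {query: click count} for one owner.
    alpha and vocab_size are illustrative assumptions, not documented values.
    """
    k = vocab_size if vocab_size is not None else len(query_clicks)
    total = sum(query_clicks.values()) + alpha * k
    if total <= 0:
        return 0.0
    probs = [(c + alpha) / total for c in query_clicks.values()]
    # Queries never seen for this owner still receive the smoothing mass.
    probs += [alpha / total] * (k - len(query_clicks))
    return sum(-p * math.log2(p) for p in probs if p > 0)

def owner_entropies(click_rows, alpha=1.0):
    """click_rows: iterable of (owner, query, clicks) tuples (assumed layout)."""
    per_owner = defaultdict(Counter)
    for owner, query, clicks in click_rows:
        per_owner[owner][query] += clicks
    vocab = {q for counts in per_owner.values() for q in counts}
    return {owner: smoothed_query_entropy(counts, alpha, len(vocab))
            for owner, counts in per_owner.items()}
```

Under this reading, an owner whose clicks concentrate on few queries gets a low entropy (high specificity), while an owner clicked for many different queries gets a high one.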

User Sessions Analysis Framework  

  • unification of redir and reqans logs (added access logs)
  • user session and event sequencing (UES tables)
  • status:
    • Perl prototype complete
    • Wiki specification complete
    • Production implementation transferred to Igor Kurleon with a change in scope (lightweight rather than complete UES logs)
    • Igor committed to completing a stable first version by 2009-02-20
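
As a rough illustration of log unification and session sequencing, the sketch below merges events from several logs per user, orders them by time, and cuts a new session after a period of inactivity. The event tuple layout and the 30-minute gap are assumptions made for the example; the actual UES specification is on a separate wiki page and is not reproduced here.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the spec

def build_sessions(events, gap=SESSION_GAP_SECONDS):
    """Group events into per-user, time-ordered sessions.

    events: iterable of (user_id, timestamp, source_log, payload) tuples,
    where source_log might be "redir", "reqans" or "access" (assumed layout).
    Returns {user_id: [session, ...]}, each session a list of events.
    """
    by_user = defaultdict(list)
    for event in events:
        by_user[event[0]].append(event)

    sessions = {}
    for user, user_events in by_user.items():
        user_events.sort(key=lambda e: e[1])
        user_sessions = [[user_events[0]]]
        for prev, cur in zip(user_events, user_events[1:]):
            if cur[1] - prev[1] > gap:
                user_sessions.append([])  # inactivity gap starts a new session
            user_sessions[-1].append(cur)
        sessions[user] = user_sessions
    return sessions
```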

User Sessions Features  

  • SERP derived duration features
    • mean time on page
      • query independent
      • query dependent
    • mean time on page of owner
      • query independent
      • query dependent (if there is a deployment methodology)
    • expected delivery: end of February (query independent), end of March (query dependent)
    • status:
      • Perl prototype complete
      • C++ implementation will be complete by 2009-02-24
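
One plausible way to derive the duration features is to treat the time between a click and the next click in the same session as time on page. The sketch below does that and averages per URL (query independent) and per query-URL pair (query dependent). The input layout is hypothetical, and the last click of a session is skipped because its duration cannot be observed this way; the prototype may handle that case differently.

```python
from collections import defaultdict

def mean_time_on_page(sessions):
    """Estimate mean time on page from consecutive clicks.

    sessions: list of time-ordered lists of (timestamp, query, url) clicks
    (assumed layout). Returns (per_url, per_query_url) mean durations in seconds.
    """
    sums_url = defaultdict(lambda: [0.0, 0])
    sums_query_url = defaultdict(lambda: [0.0, 0])
    for clicks in sessions:
        # Pair each click with the next one; the final click has no successor.
        for (t, query, url), (t_next, _, _) in zip(clicks, clicks[1:]):
            dwell = t_next - t
            for acc, key in ((sums_url, url), (sums_query_url, (query, url))):
                acc[key][0] += dwell
                acc[key][1] += 1

    def to_means(acc):
        return {key: total / count for key, (total, count) in acc.items()}

    return to_means(sums_url), to_means(sums_query_url)
```

The owner-level variants would follow the same pattern with the URL replaced by its owner.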

  • Session Weighted Click Features
    • navigational target (only click in session)
    • last click in session
    • skipped URL discounting (click under)
    • ...
    • expected delivery: end of March
    • status:
      • Perl prototype complete (these features are included in the initial SERP prototype)
      • C++ implementation will be complete by 2009-02-27 (ahead of schedule)
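
The three click signals listed above can be sketched for a single result page as follows. This is only an interpretation of the bullet names: the signal labels, the function name, and the assumption that every clicked URL appears in the shown list are illustrative, and the weights applied to each signal are not specified on this page.

```python
def session_click_signals(shown_urls, clicked_urls):
    """Label the URLs of one result page with session-level click signals.

    shown_urls: results in ranked order; clicked_urls: clicks in time order.
    Assumes every clicked URL is in shown_urls. Returns {url: set of labels}.
    """
    signals = {url: set() for url in shown_urls}
    if not clicked_urls:
        return signals

    clicked = set(clicked_urls)
    if len(clicked) == 1:
        # The only click in the session: a likely navigational target.
        signals[clicked_urls[0]].add("only_click")
    signals[clicked_urls[-1]].add("last_click")

    # Unclicked results ranked above some clicked result were skipped.
    lowest_clicked_rank = max(shown_urls.index(url) for url in clicked)
    for rank, url in enumerate(shown_urls):
        if rank < lowest_clicked_rank and url not in clicked:
            signals[url].add("skipped")
    return signals
```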


Added Objectives

URL Match Features  

  • Developed detailed ideas and research plan
  • Fedor is doing the actual work and driving the project and the evaluation utility (consulting only at this point)
  • status:
    • working out production implementation issues for character n-gram matching
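
For orientation, character n-gram matching between a query and a URL could look like the sketch below: the score is the fraction of the query's character trigrams that also occur in the URL. The scoring formula, the choice of n = 3, and the removal of spaces from the query are assumptions for illustration, not the feature definition used in the project.

```python
def char_ngrams(text, n=3):
    """Set of lowercase character n-grams of text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_match_score(query, url, n=3):
    """Fraction of the query's character n-grams that occur in the URL.

    Spaces are dropped from the query so that "yandex maps" can match
    "maps.yandex"; this is an illustrative choice, not a documented one.
    """
    query_grams = char_ngrams(query.replace(" ", ""), n)
    if not query_grams:
        return 0.0
    return len(query_grams & char_ngrams(url, n)) / len(query_grams)
```

Character n-grams tolerate partial and concatenated words in host names, which whole-word matching would miss.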

GBRTree modeling framework  

  • dynamic variable threshold identification
  • avoid [0,1] normalization of variables.
  • complete and, for the most part, compliant with the Yandex coding standard
  • needs validation
  • status:
    • 2009-02-18 updated version checked into SVN - supports depth and support constraints on tree building, and cross-validation
    • verifying correctness
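
The sketch below shows one common way to pick a split threshold for a regression tree directly from raw variable values, with a minimum-support constraint on each side and no [0,1] normalization. It is a generic least-squares split search written for illustration; it is not taken from the GBRTree code, and `best_threshold` and `min_support` are hypothetical names.

```python
def best_threshold(values, targets, min_support=1):
    """Find the least-squares split threshold for one raw-valued variable.

    Returns (threshold, sse), or None when no split leaves at least
    min_support samples on each side.
    """
    pairs = sorted(zip(values, targets))
    n = len(pairs)
    total = sum(t for _, t in pairs)
    total_sq = sum(t * t for _, t in pairs)

    best = None
    left_sum = left_sq = 0.0
    for i in range(1, n):
        v_prev, t_prev = pairs[i - 1]
        left_sum += t_prev
        left_sq += t_prev * t_prev
        # Cannot split between equal values; enforce the support constraint.
        if pairs[i][0] == v_prev or i < min_support or n - i < min_support:
            continue
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared errors around the mean on each side of the split.
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best is None or sse < best[1]:
            best = ((v_prev + pairs[i][0]) / 2, sse)
    return best
```

Because candidate thresholds are midpoints between observed values, the result does not depend on the scale of the variable, which is one reason normalization can be skipped.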

Expanded feature evaluation framework  

  • initial/internal version complete (2009-02-28)
  • enhanced exploratory data analysis of existing features
  • status:
    • Dmitry has initial version ready for evaluation

Evaluation of Dmitri Pavlovski's Host Graph features  

  • based on relevance performance this may develop into a larger project
  • initial evaluation complete
    • factor_pfound_boost indicates there is improvement associated with some of the factors
    • GBRTree models do not utilize the factors
    • TreeNet model generation using the above framework shows marginal utility
  • status:
    • including as part of exploratory data analysis effort
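
Since the evaluation above is expressed in terms of pfound, a short sketch of the metric may help. It follows the cascade form in which a user scans results top down, stops at a relevant result, or abandons with probability `p_break`. The value 0.15 and the mapping from judgments to per-result relevance probabilities are assumptions here; the production settings are not stated on this page.

```python
def pfound(p_rel, p_break=0.15):
    """Probability that a user finds a relevant result in a ranked list.

    p_rel: per-position relevance probabilities, top result first.
    p_break: assumed probability of abandoning after each non-satisfying result.
    """
    p_look = 1.0  # probability that the user examines the current position
    total = 0.0
    for rel in p_rel:
        total += p_look * rel
        p_look *= (1.0 - rel) * (1.0 - p_break)
    return total
```

For example, moving a fully relevant result from position 2 to position 1 raises the score from 0.85 to 1.0 when the other result is irrelevant.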

Training Set Sampling Methodology Review  

  • build a probability model of the utility of obtaining a judgment for an unjudged query-URL pair
  • status:
    • determining utility of approach

Additional Features  

  • YABAR Features (server response time)
  • Per-query click distribution features (click/show log processing)
  • status:
    • exploration, research, prototyping