Research
Andrei's research area covers scalable semi-supervised learning
with graphs, and combination thereof with other learning techniques,
applied to NLP problems, particularly on resource-poor languages. He
works under the direction of
Katrin Kirchhoff.
Publications
A. Alexandrescu and K. Kirchhoff, "Graph-Based Learning for
Statistical Machine Translation", Proceedings of the North American Chapter of
the Association for Computational Linguistics - Human Language
Technologies (NAACL HLT) 2009 conference [PDF] [BibTeX].
A. Alexandrescu and K. Kirchhoff, "Graph-Based Learning for Phonetic
Classification", Proceedings of The 2007 IEEE Automatic Speech
Recognition and Understanding (ASRU) Workshop [PDF] [BibTeX]
A. Alexandrescu and K. Kirchhoff, "Data-Driven Graph Construction for
Semi-Supervised Graph-Based Learning in NLP", Proceedings of
HLT-NAACL
2007 [PDF] [BibTeX]
A. Alexandrescu and K. Kirchhoff, "Factored Neural Language Models",
Proceedings of HLT-NAACL
2006 [PDF] [BibTeX]
A. Alexandrescu and K. Kirchhoff, "Factored Neural Language Models",
UW EE Technical Report UWEETR-2006-0014
[HTML abstract]
[PDF]
[Gzipped PS] [BibTeX]
Doctoral Dissertation
Andrei's dissertation (a.k.a. thesis) is a tad unusual because it
includes as-of-yet unpublished research (usually the doctoral
dissertation is an expansion of the author's published
research). Also, the text aims at being more readable than others, but
it is unclear whether it succeeded at that. Doctoral dissertations are
highly specialized and need to build extensive background, which
inevitably ends up in referring to previous work that the putative
reader would need to absorb in addition to the dissertation proper.
Below is a quick list of the contributions of the dissertation and
other items of possible interest:
- An alternative proof of convergence for iterative label
propagation. Compared to the original proof in Jerry Zhu's
thesis [PDF] (pages
6-7), the proof in the dissertation rigorously uses only the minimal
requirements for convergence, while remaining simple and terse.
- A data-driven approach to graph construction. That
approach uses a supervised classifier that provides features for the
graph-based learner.
- A framework for applying graph-based learning to structured
inputs and outputs, in a formalization that is applicable to a
large variety of tasks. Experiments are done on a real-world
translation task.
- An in-place label propagation algorithm that is always faster
than the original iterative algorithm (experimentally converges
in roughly one third of the number of steps).
- A multicore label propagation algorithm that uses
parallel processing and benign data races to distribute work on
label propagation.
- A graph reduction algorithm that reduces the size of the graph
by orders of magnitude without affecting the result of label
propagation.
- A description and analysis of the inverted index
data structure, along with an efficient search algorithm.
- A thorough overview of kd-trees with
proofs and algorithms. The proofs include a fix of a 40-years old
bug: in their seminal analysis of kd-trees (An
algorithm for finding best matches in logarithmic expected
time), Friedman, Bentley, and Finkel describe the requirements
that a distance function must fulfill in order to be usable with
kd-trees. These properties include symmetry and monotonicity, but
oddly enough the authors forgot the essential requirement that the
coordinate distance applied to identical values yields zero:
fi(x, x) = 0 for all pertinent x in all dimensions
i. Follow-up work on kd-trees also don't mention this constraint (of
which necessity is obvious in hindsight). It was about time to fix
that bug.
- An original algorithm called DynTrie that optimizes
string kernel computations over a set of strings, which
experimentally is three times faster than existing approaches.
- Descriptions of experiments conducted using label propagation
on tasks in Natural Language Processing, Automatic Speech Recognition,
and Machine Translation. The results show that the proposed
approach improves significantly over state-of-the-art systems.
Read Andrei's dissertation: