Apache Tika is an interesting project. It is not a very big one, but IMHO it is poised to become the project that every team serious about processing complex, unstructured, binary content will talk about and use.
The power of Apache Tika lies in the simplicity it offers for processing binary and other kinds of complex data. Consider a simple situation: your project needs to support analyzing PDF files. One approach is to write a routine tied to a specific PDF library. This approach stops scaling as soon as you need to support Excel and ODT files too. And it stops working altogether once you are asked to support a possibly unlimited number of data types.
Apache Tika helps generalize the processing of arbitrary types of data, and thus gives a project a unique opportunity to offer real added value.
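To make the "one entry point instead of one routine per format" idea a bit more concrete, here is a toy sketch in plain Java. It is not how Tika is implemented, and the class name `FormatSniffer` and the `Kind` labels are made up for illustration. It only looks at the leading bytes of the content to tell a PDF from a ZIP-based container (ODT and newer Excel files are ZIP archives) or an OLE2 container (legacy Excel files), so the caller never has to know the format up front. Tika does far more than this, and its own single entry point is the `org.apache.tika.Tika` facade with methods such as `detect` and `parseToString`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FormatSniffer {

    /** Hypothetical labels for this sketch; these are not Tika's media type names. */
    public enum Kind { PDF, ZIP_CONTAINER, OLE2_CONTAINER, UNKNOWN }

    // Leading "magic" bytes of the formats mentioned in the post.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};               // ODT, XLSX
    private static final byte[] OLE2_MAGIC = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0}; // legacy XLS

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Single entry point: callers pass bytes and never name the format themselves. */
    public static Kind detect(byte[] data) {
        if (startsWith(data, PDF_MAGIC)) {
            return Kind.PDF;
        }
        if (startsWith(data, ZIP_MAGIC)) {
            return Kind.ZIP_CONTAINER;
        }
        if (startsWith(data, OLE2_MAGIC)) {
            return Kind.OLE2_CONTAINER;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
    }
}
```

Supporting one more format here means adding one more check behind `detect`, while the calling code stays the same. That is the scaling property the PDF-only routine lacks.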
I really liked this presentation at the recent ApacheCon NA. It was absolutely packed with interesting content, and Chris talked a lot about applying Tika to real-life problems. Andriy Redko gave a brilliant talk about CXF and Tika integration. There were more Tika presentations, and I regret I could not make it to all of them.
The future is bright for Tika. And for the projects that will use it :-)