Sunday, February 22, 2009

SINEQUA: SPEED AND VOLUME, while keeping relevance and rich functionality


 

Sinequa has just finished a first series of tests on the new version of Sinequa CS. I must confess I'm very proud.

Even without any specific optimization, the results are generating a lot of enthusiasm here. Sinequa has long been ahead in terms of relevance and functionality. When others did not see the point of managing security, linguistics or connectivity, we had already solved these issues three years ago. We have now developed a new architecture that builds the options needed for enterprise search into the kernel of the technology, while at the same time delivering first-class performance. Sinequa technology today offers an unparalleled level of performance for this level of functionality. Detailed product data sheets are coming soon, but in the meantime, here are a few points:

Query load from a large number of users: up to 1,700 simultaneous queries per second on one dual-processor server (average response time around 10 milliseconds). In production, our most demanding customer today handles up to 400 queries per second, but across multiple servers; the new figure represents an improvement of around 50 times over the previous release of the technology. More importantly, it is more than sufficient to serve any customer's needs.

Capacity to manage large volumes per server: a single server indexed around 100 million documents (enterprise documents) in a few dozen hours, and the server's limits had not been reached. The server is a four-processor machine with 32 GB of RAM (yes… it takes what it takes), so this is very promising; it represents a huge improvement for Sinequa, especially considering that performance scales linearly with the number of servers. We can now index the entirety of the enterprise content without consuming a lot of hardware resources, in a reasonable time, and with sufficiently frequent refreshes. For precise indexing times and volumes, I'll wait until I have all the data per type of document, since a PDF, a Word document, an Excel spreadsheet and an HTML document can be quite different. As an example, one entry-level server (4 processors and 8 GB of RAM) can index a little more than 1,000 press documents per second, which means around 100 million documents in 24 hours (per server).
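
As a rough sanity check on the entry-level figure above, here is a minimal Python sketch of the arithmetic. It assumes a constant indexing rate over the full 24 hours, which is a simplification; the function names are purely illustrative and are not part of any Sinequa API.

```python
# Back-of-the-envelope check of the per-server indexing figures.
# Assumption: the indexing rate stays constant for the whole period.

SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def docs_per_day(docs_per_second: float) -> float:
    """Documents indexed in 24 hours at a constant rate."""
    return docs_per_second * SECONDS_PER_DAY


def rate_needed(total_docs: float, hours: float) -> float:
    """Constant rate (documents per second) needed to index total_docs in the given hours."""
    return total_docs / (hours * SECONDS_PER_HOUR)


# Daily volume at exactly 1,000 documents per second.
print(f"{docs_per_day(1_000):,.0f} documents per day")

# Rate required to reach 100 million documents in 24 hours.
print(f"{rate_needed(100_000_000, 24):,.0f} documents per second")
```

At exactly 1,000 documents per second the daily total is 86.4 million, so "around 100 million in 24 hours" corresponds to a sustained rate of roughly 1,160 documents per second, which is consistent with "a little more than 1,000".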

Capacity to index a database on an entry-level server (4 processors and 8 GB of RAM): 5,000 rows (or database objects) per second, which gave around 20 million rows per hour and, in the end, 100 million database objects indexed in 5 hours. Maximum number of insertions per second: 10,000, which means 100 million in less than three hours. I recently read about the performance of a competitor who was proudly indexing 30 million database objects in ten hours on a server. Sinequa is 6 to 7 times faster, and we are talking about a competitor whose main competitive advantage is supposed to be scalability.
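
The database figures can be cross-checked the same way. The sketch below is only an illustration under the assumption of constant throughput; the helper names are made up for this example, and the competitor's rate is simply derived from the "30 million objects in ten hours" quoted above.

```python
# Cross-check of the database indexing figures, assuming constant throughput.

SECONDS_PER_HOUR = 3600


def hours_to_index(total_objects: float, objects_per_second: float) -> float:
    """Hours needed to index total_objects at a constant rate."""
    return total_objects / objects_per_second / SECONDS_PER_HOUR


def objects_per_second(total_objects: float, hours: float) -> float:
    """Average rate implied by indexing total_objects in the given hours."""
    return total_objects / (hours * SECONDS_PER_HOUR)


sustained_hours = hours_to_index(100_000_000, 5_000)    # at 5,000 objects per second
peak_hours = hours_to_index(100_000_000, 10_000)        # at 10,000 insertions per second
competitor_rate = objects_per_second(30_000_000, 10)    # 30 million objects in ten hours

# Speed-up at the quoted 5,000 per second, and at the rate implied by
# "100 million in 5 hours".
speedup_low = 5_000 / competitor_rate
speedup_high = objects_per_second(100_000_000, 5) / competitor_rate

print(f"100 million at 5,000/s:  {sustained_hours:.2f} hours")
print(f"100 million at 10,000/s: {peak_hours:.2f} hours")
print(f"Competitor rate: {competitor_rate:.0f} objects per second")
print(f"Speed-up: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

The rounded figures in the text hold up: 5,000 objects per second gives about five and a half hours for 100 million objects, 10,000 per second gives under three hours, and the ratio to the competitor's roughly 830 objects per second falls between 6 and 7.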

We are eager to see this new release of Sinequa exposed to the users and content inside the enterprise; the rich functionality of Sinequa, combined with this level of performance, should give results that users will notice and vote for. We don't have long to wait, as the first customer will be in production next month…

Monday, February 2, 2009

Desktop Search vs. Enterprise Search: a very different game

I was pleased by my conversation on Friday with a knowledge management executive from a large international firm. He now considers that desktop search has little to do with enterprise search, which was not how he saw things six months ago. Customers and analysts sometimes ask me why Sinequa doesn’t create a desktop search product, except when we address very specific customer needs. There are two reasons: one is functional and linked to usage, to our vision and our value proposition, and the other is technical. The two work quite well together.

The functional reason is simple: Sinequa is an Enterprise 2.0 specialist. This means that through our enterprise search solution, we deliver individual productivity as well as collective intelligence.

  • I am convinced that Enterprise 2.0 serves this goal: making sure that everyone in the organisation is efficient and in step with the rest of the company (what is new and disruptive here is the idea that productivity comes just as much from rich interactions as from organisation schemes and processes, cf. my December 2008 post « Taking advice from the ants »). In other words, collective intelligence comes from better interactions between employees. A prerequisite is that each employee must have access to shared information within the appropriate context. That means access to shared knowledge according to his/her profile (e.g. a salesperson must not have access to the knowledge of the CFO). This knowledge includes but is not limited to: documents, information within applications, employees who could provide valuable advice, who are interested in the same topic, or whom it would be relevant for the user to know of, customers that will be impacted, and so on...
  • An exhaustive enterprise search solution such as Sinequa CS, equipped with all the necessary connectors, managing security and access rights, and providing advanced extraction functionality and appropriate scalability, can offer all this. All that needs to be done is to deploy indexing on all the applications (CRM, ERP, PLM, HRS, ...), the intranets, the file systems, the mail servers, …

  • Some say that desktop content should be added to that shared content. I think this is highly inappropriate. As a matter of fact, information on the desktop happens to be… personal. Sure, it must be easily searchable, but it should not be mixed with enterprise shared information and knowledge. The two applications (desktop search and enterprise search) should be different, including in the functionality they offer. If not, you would get the worst of both worlds. One can legitimately compare desktop information with real-world desk and office documents: every one of us organises his/her files according to his/her own needs. I file documents in a way that helps me stay efficient. What is on my desk or in my drawers is there to help me do my job, and there is no concern for capitalising on or sharing knowledge there. When I capitalise or share, it's from outside my desktop. That does not mean that things should not be easily accessible and archived on my desktop (of course I need to be able to retrieve things quickly from my drawer). But it could be dangerous to mix that content with the rest of the enterprise content. That could lead to a massive slowdown of individual productivity. Indeed, it is important that when an employee searches for something other than his/her own files, he/she searches only the updated, validated, complete data sources, the ones that are in the shared environment. If the enterprise search always brings back personal desktop results, the employee will tend to go to those first (they are already known; I don't have to read them, just recognise them), and the risk of missing the right information increases.
  • By contrast, when I search within the shared content of the enterprise, I search, then navigate, then need to check what I have found,... It's a different mental process from retrieving a file on my hard drive. In the end, mixing the two search applications is therefore dangerous and confusing, and it will also slow down the shift to Enterprise 2.0. Guess what: employees are more likely to continue to work alone.

I’ll be more concise on the second, technical reason.

  • Desktop search is a discipline in itself: it has to fit perfectly into the ergonomics of the desktop, without using so many resources that it slows the desktop down. Moreover, I'm already familiar with the documents on my desktop, since I am the only one downloading them to my hard drive. As a consequence, I can be satisfied with a very basic keyword search to find a document I already know exists. I do not need to search within context: the date, format, or location on my hard drive are enough to help me remember the context of a document. And desktop search must integrate completely with the operating system. Virtualisation does not change the argument.
  • It is quite interesting to note that vendors selling both a desktop search and an enterprise search solution actually sell two different solutions with no real technical integration. There are no synergies, not even commercially, as most desktop search solutions are free. In that respect, desktop search has a lot to do with World Wide Web search; I'll do a specific post on that...

In conclusion, I recommend Microsoft Windows Desktop Search if you are using Windows (free), or Google Desktop (free). For your enterprise search solution, it shouldn't be a surprise if I tell you I would pick Sinequa CS. But most of all, I strongly recommend testing the solution in the real environment, keeping in mind the complete deployment scope of the project, and being sure to talk to existing customers of enterprise search vendors. By the way, the best enterprise search solution integrates seamlessly with good desktop search products.