Friday, March 16, 2012

How Google Search works?

The first important thing to note: when you search on google.com, you are not really searching the web (www); you are searching Google's indexed databases!

How are these indexed databases created?

This is where Google's bots, or spiders, come in. They crawl the entire web, retrieving web pages, then the pages linked from those pages, then the pages linked from those, and so on. They store each word along with the pages it appears on in Google's database, which is spread across, say, 100 to several thousand machines.
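
The crawl-and-store idea can be sketched as a breadth-first walk over links that builds an inverted index (word -> pages containing it). This is a toy under stated assumptions, not how Googlebot is implemented: the "web" is a hypothetical in-memory map instead of HTTP fetches, ToyCrawler and Page are made-up names, and words are split naively on non-word characters (Java 16+ for the record).

```java
import java.util.*;

// Toy crawler + inverted index. The "web" is a hypothetical in-memory map
// (URL -> page); a real crawler would fetch pages over HTTP.
class ToyCrawler {
    // A hypothetical page: its text and the URLs it links to.
    record Page(String text, List<String> links) {}

    // Breadth-first crawl from a seed URL: visit a page, index its words,
    // then queue the pages it links to (pages within pages...).
    static Map<String, Set<String>> crawlAndIndex(Map<String, Page> web, String seed) {
        Map<String, Set<String>> index = new TreeMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            Page page = web.get(url);
            if (page == null || !visited.add(url)) continue; // unknown or already crawled
            for (String word : page.text().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) continue;
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
            }
            queue.addAll(page.links()); // follow outgoing links
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Page> web = Map.of(
            "a.com", new Page("Upendra films", List.of("b.com")),
            "b.com", new Page("Upendra interview", List.of("a.com", "c.com")),
            "c.com", new Page("Cricket scores", List.of()));
        Map<String, Set<String>> index = crawlAndIndex(web, "a.com");
        // A search is now a lookup in the index, not a walk over the web.
        System.out.println(index.get("upendra")); // [a.com, b.com]
    }
}
```

This is also why a search only sees what the crawler has already visited: pages not yet crawled are simply absent from the index.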

These spiders were not so intelligent a few years back. When September 11 happened, Google users were searching for the New York twin towers, and Google's results showed nothing relevant to that sad event because the index was one month old!

But by now, I guess, these crawlers are smart enough to revisit frequently changing websites every hour and less frequently changing ones every month.

If you create a new website, then for it to appear in Google's results you either have to submit the link to Google or wait until the crawlers find your website's pages.


courtesy: www.googleguide.com


What happens when you type some text, say 'upendra', my favourite actor :), into the Google search box and hit Search?

There might be billions of pages in Google's indexed database. Which of these will be displayed first?
That depends on many factors, such as:

1. The PageRank concept, created by Google founders Sergey Brin and Lawrence Page.
   Each page is given a rank based on its references (links) from other websites, weighted by how important those linking websites are; this acts as a measure of the page's authority and quality.
2. Word frequency in the pages
3. Any synonyms of the words in the pages.
4. Presence of the words in the page URL and page title.

Google combines all of the above into an overall score and displays the page with the highest score first.
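
The PageRank part of that score can be sketched with the textbook power-iteration formulation: every page starts with equal rank, and on each round passes its rank equally to the pages it links to, plus a small "random jump" share for every page. This is a simplified sketch, not Google's production ranking; the damping factor 0.85 is the commonly cited textbook value, and PageRankSketch is a made-up name.

```java
import java.util.*;

// Textbook PageRank by power iteration. Pages are numbered 0..n-1 and
// links[i] lists the pages that page i links to.
class PageRankSketch {
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start with equal rank for every page
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n); // "random jump" share
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    // Page with no outgoing links: spread its rank evenly.
                    for (int j = 0; j < n; j++) next[j] += damping * rank[i] / n;
                } else {
                    // Pass rank equally to every page this page links to.
                    for (int target : links[i]) {
                        next[target] += damping * rank[i] / links[i].length;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Pages 0 and 1 both link to page 2; page 2 links back to page 0.
        int[][] links = { {2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In this tiny graph page 2, which receives the most links, ends up with the highest rank, while page 1, which nobody links to, keeps only the random-jump share.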

We have already reached the stage where we depend on Google for even the smallest of our daily tasks. As Google Fellow Amit Singhal says, their aim is to answer questions like "What is the best time for me to sow seeds in some village in India, if there is an early monsoon this time?" in the very near future.

Google Search is becoming intelligent and gradually making us dumb!!

Wednesday, October 19, 2011

Understanding the JIT (Just-In-Time Compiler) in the JVM

There is no reason a Java programmer should not know how the JIT works inside the JVM.
The picture below shows the main components of a JVM.

The main components of a JVM are the Class Loader, Bytecode Verifier, Interpreter and Garbage Collector. In this post I will cover only the Interpreter.
Interpreter: the interpreter comes with a sub-component called the JIT compiler. The main task of the JIT is to improve the runtime performance of a Java program by compiling bytecode to native machine code and caching the resulting machine code. (This compilation should not be confused with javac compilation, which turns source code into bytecode.)

So when the JIT is ON, it is JIT COMPILATION + INTERPRETATION.
When the JIT is OFF, it is only INTERPRETATION.

Let us take IBM's J9 JVM and try to understand how the JIT works.
When the JVM starts, there may be a lot of methods, and the JIT does not compile all of them. The first few calls to every method are in fact interpreted. The JVM maintains a counter for each method that is incremented on every call. When this counter crosses a threshold, the method is JIT compiled and the compiled code is cached, so all future calls to that method are no longer interpreted but run from the JIT cache. Once the counter reaches the threshold it is reset to 0, and when it reaches the threshold a second time, the method is recompiled with more optimization and cached again. So the more often a method is called, the more optimized it becomes. The threshold value is chosen so that there is neither a start-up delay nor degraded performance.
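
The counting scheme described above can be simulated in a few lines. This is a toy model, not J9's actual implementation: the threshold of 3 and the two optimization tiers are made-up values chosen so the effect is visible, and ToyJit is a hypothetical name.

```java
import java.util.*;

// Toy simulation of counter-based, tiered JIT compilation. THRESHOLD and
// MAX_OPT_LEVEL are hypothetical; real JVMs use their own heuristics.
class ToyJit {
    static final int THRESHOLD = 3;      // hypothetical invocation threshold
    static final int MAX_OPT_LEVEL = 2;  // hypothetical highest optimization tier

    private final Map<String, Integer> counters = new HashMap<>();
    private final Map<String, Integer> optLevel = new HashMap<>(); // 0 = interpreted

    // Records one call of the method and reports how that call was executed.
    String call(String method) {
        int level = optLevel.getOrDefault(method, 0);
        String how = level == 0 ? "interpreted" : "compiled@" + level;
        if (level < MAX_OPT_LEVEL) {
            int count = counters.merge(method, 1, Integer::sum);
            if (count >= THRESHOLD) {
                optLevel.put(method, level + 1); // (re)compile at a higher level and cache it
                counters.put(method, 0);         // reset the counter for the next tier
            }
        }
        return how;
    }

    public static void main(String[] args) {
        ToyJit jit = new ToyJit();
        for (int i = 1; i <= 8; i++) {
            System.out.println("call " + i + ": " + jit.call("hotMethod"));
        }
    }
}
```

Running it shows the first three calls interpreted, the next three served by the first compiled version, and later calls by the more optimized version; a method called only once or twice is never compiled at all.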

The JIT improves performance with optimization techniques like inlining, control-flow optimization, and local/global optimizations.
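
To picture what inlining means, here it is written out by hand at the source level. This is only a conceptual illustration: the JIT performs the transformation on compiled code at run time and never touches your .java files, and the class and method names here are made up.

```java
// Conceptual picture of inlining, written at the source level.
class InliningIllustration {
    static final class Point {
        private final int x;
        Point(int x) { this.x = x; }
        int getX() { return x; } // tiny and called often: a typical inlining candidate
    }

    // What we write: a method call on every loop iteration.
    static long sumWithCalls(Point[] points) {
        long sum = 0;
        for (Point p : points) sum += p.getX();
        return sum;
    }

    // Roughly what the loop looks like after inlining: no call overhead, and
    // the loop body is now visible to further optimizations.
    static long sumInlined(Point[] points) {
        long sum = 0;
        for (Point p : points) sum += p.x;
        return sum;
    }

    public static void main(String[] args) {
        Point[] pts = { new Point(1), new Point(2), new Point(3) };
        System.out.println(sumWithCalls(pts) + " " + sumInlined(pts));
    }
}
```

Both versions compute the same result; the point is that after inlining, the JIT no longer pays for a call per element and can optimize the loop as a whole.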

The client option in the JVM does fewer JIT compilations and hence fewer optimizations, while the server option does more JIT compilations and optimizations. Hence there is more start-up delay with the server option.
Below is a small program that shows the performance benefit we get from JIT compilation.



By default, the JIT is ON. To switch it off (which is not advisable), use the option -Djava.compiler=none

With JIT
When we run the above program with the JIT enabled, it takes 1-2 milliseconds, as seen below.


Without JIT
When we run the above program with the JIT disabled, it takes 8-9 milliseconds, as seen below.
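
For anyone who wants to repeat the experiment, here is a hypothetical timing program in the same spirit (not the original program measured above, and no particular timings are implied; they depend on the machine and JVM). JitTiming and sumOfSquares are made-up names. Run it once normally and once with -Djava.compiler=none; note that recent HotSpot releases may no longer honor the java.compiler property, and there -Xint is the interpreter-only option.

```java
// Hypothetical micro-benchmark: compile with javac JitTiming.java, then run
//   java JitTiming                         (JIT on, the default)
//   java -Djava.compiler=none JitTiming    (JIT off on JVMs that honor it)
//   java -Xint JitTiming                   (HotSpot interpreter-only mode)
class JitTiming {
    // A small, hot method: a good candidate for JIT compilation.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) total += (long) i * i;
        return total;
    }

    public static void main(String[] args) {
        long result = 0;
        long start = System.nanoTime();
        for (int run = 0; run < 1_000; run++) result += sumOfSquares(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Print the result so the work cannot be optimized away entirely.
        System.out.println("result=" + result + ", time=" + elapsedMs + " ms");
    }
}
```

A single timed run like this is only a rough indication; it includes the interpreted warm-up calls that happen before the method crosses the JIT threshold.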

Friday, March 11, 2011

My favorite quotes related to software and programming

In order to understand recursion, one must first understand recursion - Author Unknown

Before software can be reusable it first has to be usable. - Ralph Johnson

A good programmer is someone who always looks both ways before crossing a one-way street - Doug Linder

Programmers always confuse Halloween with Christmas because OCT 31 = DEC 25 - Author Unknown

There are two ways to write error-free programs; only the third one works. - Alan J. Perlis

Real programmers don't comment their code. It was hard to write, it should be hard to understand - Author Unknown

I would like to change the world, but I do not have the source code - Author Unknown

Low-level programming is good for the programmer’s soul. - John Carmack

"Kevorkian Virus: helps your computer shut down whenever it wants to." - Author Unknown

"Cannot delete tmp150---3.tmp: There is not enough free disk space. Delete one or more files to free disk space, and then try again." - Author Unknown

I think Microsoft named .Net so it wouldn’t show up in a Unix directory listing. - Oktal

If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization. - Gerald Weinberg

My programs never have bugs, they just develop random features - Author Unknown

Programs for sale: fast, reliable, cheap - choose two. - Author Unknown

You cannot teach beginners top-down programming, because they don't know which end is up. - C.A.R. Hoare

'Error, no keyboard ... press F1 to continue.' - Author Unknown

Friday, December 3, 2010

A Few Basic Things about String.intern() in Java

Consider a scenario where we read a CSV file with a very large number of records: we may end up with a lot of duplicate String objects. To avoid keeping these duplicates in memory, we can use the String.intern() method, so that equal values share a single String instance and the duplicate copies can be garbage collected.

How does it work? Internally, the JVM maintains a pool (a table) of Strings. The first time intern() is called on a String whose value is not yet in the pool, that String is added to the pool. Subsequent intern() calls on any equal String return a reference to the String already in the pool.
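
The table idea can be sketched with a plain HashMap. ToyStringPool is a hypothetical class for illustration only; the JVM's real string pool is internal and is not a java.util.HashMap, but the contract is the same: equal values map to one canonical instance.

```java
import java.util.*;

// Hand-rolled sketch of the "table" behind String.intern().
class ToyStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for this value, adding it on first use.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        ToyStringPool pool = new ToyStringPool();
        // Simulate two CSV rows producing equal but distinct String objects.
        String city1 = new String("Bangalore");
        String city2 = new String("Bangalore");
        System.out.println(city1 == city2);                           // false
        System.out.println(pool.intern(city1) == pool.intern(city2)); // true
        // The real String.intern() gives the same guarantee for equal strings.
        System.out.println(city1.intern() == city2.intern());         // true
    }
}
```

If each parsed field is replaced by its interned copy, the second "Bangalore" object is no longer referenced and can be garbage collected.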

One benefit of string interning is that interned Strings can be compared with == (a reference comparison), which is much faster than equals(). Now say you interned a few Strings, which got added to the table mentioned above. How do you remove any of those from the table? Oops! Are we stuck with them until the program ends? No: in modern JVMs the intern table does not keep Strings alive by itself, so an interned String that is no longer referenced anywhere else can be garbage collected.

String literals (and compile-time constant expressions) are interned automatically, but Strings created at run time (like command-line arguments) are not.
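
A small check of which Strings are interned automatically. InternDemo is a made-up name; the behavior shown (literals and constant expressions share one instance, a String built at run time does not until intern() is called) follows from the Java Language Specification.

```java
// Literals and compile-time constant expressions are interned; Strings built
// at run time are not, until intern() is called on them.
class InternDemo {
    static final String PART = "hel"; // a compile-time constant

    static boolean literalsShared() { return "hello" == "hello"; }

    static boolean constantExprShared() { return (PART + "lo") == "hello"; }

    static boolean runtimeStringShared() {
        String built = new StringBuilder("hel").append("lo").toString(); // created at run time
        return built == "hello";
    }

    static boolean runtimeAfterIntern() {
        String built = new StringBuilder("hel").append("lo").toString();
        return built.intern() == "hello"; // intern() returns the pooled literal
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());      // true
        System.out.println(constantExprShared());  // true
        System.out.println(runtimeStringShared()); // false
        System.out.println(runtimeAfterIntern());  // true
    }
}
```

This is also why == on Strings read from a file or the command line usually fails even when the text matches: those Strings are created at run time.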

Monday, November 22, 2010

Difference between ClassNotFoundException and NoClassDefFoundError

When I was googling about this, I found that the information in some blogs was misleading. Hence this attempt to clear up the confusion.

ClassNotFoundException

The Java API specification for ClassNotFoundException says:
Thrown when an application tries to load in a class through its string name using:
  • The forName() method in class Class.
  • The findSystemClass() method in class ClassLoader.
  • The loadClass() method in class ClassLoader.
but no definition for the class with the specified name could be found.
So a ClassNotFoundException is thrown when an explicit attempt to load a class by name, such as a call to Class.forName() or loadClass(), fails.
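
A minimal sketch of the explicit case: Class.forName() with a name that does not exist throws a checked ClassNotFoundException, which the caller can catch and handle. com.example.DoesNotExist is a deliberately made-up class name, and CnfeDemo is a hypothetical name too.

```java
// Explicit loading by name: a missing class surfaces as a checked
// ClassNotFoundException that the caller is expected to handle.
class CnfeDemo {
    static String tryLoad(String className) {
        try {
            Class<?> c = Class.forName(className);
            return "loaded " + c.getName();
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLoad("java.util.ArrayList"));      // an existing class
        System.out.println(tryLoad("com.example.DoesNotExist")); // hypothetical, missing
    }
}
```

Because the class name is only known at run time, the compiler cannot check it, which is exactly why this is a checked exception rather than an Error.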


NoClassDefFoundError

The Java API specification for NoClassDefFoundError says:
Thrown if the Java virtual machine or a ClassLoader instance tries to load in the definition of a class (as part of a normal method call or as part of creating a new instance using the new expression) and no definition of the class could be found.

The searched-for class definition existed when the currently executing class was compiled, but the definition can no longer be found.
Essentially, this means that a NoClassDefFoundError is thrown as a result of an unsuccessful implicit class load.
// MyNoClassDefFoundTest.java
public class MyNoClassDefFoundTest {
    public static void main(String[] args) {
        X x = new X(); // loading X requires loading its superclass Y
    }
}

// X.java
public class X extends Y {
    public void myShow() {
        System.out.println("In X");
    }
}

// Y.java
public class Y {
    public void myShow() {
        System.out.println("In Y");
    }
}
Once you have compiled the code, delete Y.class and run MyNoClassDefFoundTest. You will see that it throws a NoClassDefFoundError.
Please note that the same error can also occur if X references Y in some other way, for example as a method parameter or as an instance field type, once the JVM needs to load Y.

Wednesday, June 2, 2010

"Cannot open connection" error on large Hibernate updates/inserts

Problem: Recently I was getting a "Cannot open connection/Transaction Inactive" error when doing a large number (100,000) of updates with Hibernate:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
}
tx.commit();
session.close();
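Solution: When making new objects persistent, call flush() and then clear() on the session regularly in order to control the size of the first-level cache. Otherwise every saved Customer stays in the session's first-level cache until the commit, and memory use keeps growing.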
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //similar to JDBC batch
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();