Tuesday, 13 December 2011

Linux Command for Disk Space

$ du
Shows the disk usage of the current directory and all of its subdirectories, in kilobytes

$ du -s
Prints only the total size of the directory

$ du -c
Prints the size of every directory recursively, followed by a grand total

$ du -a
Prints the size of every file and folder in the directory and its subdirectories

$ du -h
Prints sizes in human-readable format, like 12M, 1.2G

$ du -h -c
Prints the grand total in human-readable format

$ du / -c | sort -nr | head -n 10
Finds the 10 directories that use the most disk space

$ du / -a | sort -nr | head -n 10
Finds the 10 files and folders with the largest size

$ df -lk
Shows the disk space usage of every local filesystem, in kilobytes
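
The two top-10 pipelines above scan the whole of / every time. Here is a minimal sketch of the same idea as a reusable function. The name top_usage and its arguments are made up for this example, and it assumes a du that supports -a, -k and -x (GNU and BSD du both do).

```shell
# top_usage DIR [N]: hypothetical helper, not a standard command.
# Prints the N largest files and directories under DIR, sizes in kilobytes.
top_usage() {
  dir=${1:-.}      # directory to scan, defaults to the current one
  count=${2:-10}   # how many entries to show, defaults to 10
  # -a lists files as well as directories, -k forces 1 KB units,
  # -x keeps du on a single filesystem; unreadable paths are ignored.
  du -a -k -x "$dir" 2>/dev/null | sort -nr | head -n "$count"
}
```

Usage: "top_usage /var/log 5". The first line is normally the directory itself, because its total includes everything beneath it.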

Linux CPU/Memory Utilization - Tips/Tricks/Commands

As is commonly known, the CPU is shared among multiple processes in round-robin fashion, each process getting a time slice. But when a process starts consuming the CPU without releasing it, because of heavy computation either in depth or in volume, the system's load average increases and other processes have to wait in the queue for the CPU. This can leave the system hogged or apparently hung. I am going to share my experience of fixing such issues on Linux servers, plus a few more tricks for web systems.

A few commands to figure out the CPU usage
$ top
This shows the overall usage. Make sure you note the load average: if it stays above the number of CPU cores (say above 2 to 3 on a small server), the CPU is heavily loaded.
Next, look at the processes at the top of the list, especially those with a high %CPU value: the higher the value, the more CPU that program is consuming.
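
Whether a load average is high depends on how many cores share it, so it can help to divide one by the other. This is only a sketch: load_per_core is a hypothetical helper, and the block assumes a Linux system with /proc/loadavg and a getconf that knows _NPROCESSORS_ONLN.

```shell
# load_per_core LOAD CORES: hypothetical helper for this sketch.
# Prints LOAD divided by CORES with two decimals.
load_per_core() {
  awk -v load="$1" -v cores="$2" \
    'BEGIN { if (cores > 0) printf "%.2f\n", load / cores }'
}

# Assumption: Linux, where /proc/loadavg starts with the 1-minute average.
if [ -r /proc/loadavg ]; then
  load=$(cut -d ' ' -f 1 /proc/loadavg)
  cores=$(getconf _NPROCESSORS_ONLN)
  echo "1-minute load per core: $(load_per_core "$load" "$cores")"
fi
```

A value that stays above 1.00 suggests that processes are queuing for the CPU.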

$ mpstat
Use this command to see the CPU utilization; add -P ALL to see every processor individually

$ atop
Use this command to see the CPU utilization of every processor (or core)

$ apt-get install sysstat
Install this package to track the system usage at regular intervals

$ sar
Shows the history of CPU utilization, so you can track when the CPU usage or IO wait was high

$ sar -u 2 10
Prints the current CPU usage every 2 seconds, 10 times
  • %user: Percentage of CPU utilization that occurred while executing at the user level (application).
  • %nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
  • %system: Percentage of CPU utilization that occurred while executing at the system level (kernel).
  • %iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
  • %idle: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

Who the hell is eating the CPU?
$ ps -eo pcpu,pid,user,args | sort -k1,1 -nr
Prints all the processes sorted numerically by CPU consumption, highest first

$ ps -eo pcpu,pid,user,args | sort -k1,1 -nr | head -n 10
Prints the 10 processes consuming the most CPU, highest first

$ iostat
Prints the statistics of IO usage since system boot

$ iostat -xtc 2 10
Prints the IO usage every 2 seconds, 10 times

Check Memory Usage

$ ps aux | awk '{print $2, $4, $11}' | sort -k2,2 -nr | head -n 10
Prints the PID, %MEM and command of the 10 processes consuming the most memory

You can see the free and available memory using commands like
$ free
$ atop
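
If you want a single number instead of the full table, the same information can be read from /proc/meminfo. A minimal sketch, assuming a Linux kernel recent enough to report the MemAvailable field; mem_used_percent is a hypothetical helper name.

```shell
# mem_used_percent [FILE]: hypothetical helper; FILE defaults to /proc/meminfo.
# Prints the share of memory in use as a percentage with one decimal.
mem_used_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", (total - avail) * 100 / total }' \
      "${1:-/proc/meminfo}"
}
```

Usage: "mem_used_percent" prints the percentage for the running system.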

---
Some concepts to resolve the common issues in web systems.
1. Most of the time a slow server is caused by high IO wait, so check the slow query log in the DB and fix those queries; try to optimize the MySQL usage with caching/indexing

2. Make sure that Tomcat is given enough memory, and GC parameters that use the memory efficiently
Like this (as Java runtime options)
java -Xms512M -Xmx512M -Xss128K -XX:PermSize=64M -XX:MaxPermSize=128M -XX:NewRatio=4 -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.awt.headless=true -jar xyz.jar

3. $ jmap -histo:live process_id
Command to check the memory usage of a Java process - keep an eye on it so that you are not loading too many objects in memory by mistake

4. $ jstack process_id
Command to print the stack traces of a running Java process - keep an eye out for any method that shows up regularly in the traces, since it is probably taking a long time to execute (then optimize that method)

Monday, 21 November 2011

Making your chatter Bot using Jabber/XMPP

What is Jabber/XMPP?

Jabber is the original name of the XMPP technology. A not-for-profit organization, the XMPP Standards Foundation (formerly the Jabber Software Foundation), oversees the development of XMPP.

XMPP : Extensible Messaging and Presence Protocol, an open XML-based communications technology, widely used for IM and chat.

There are many server implementations of XMPP, and many Jabber clients too. Gtalk/MSN/AOL/Mac support XMPP, so using any Jabber client you can communicate with buddies who hold any of those accounts.

Q. Where to get a Jabber client?
A. Here you can find the list : http://xmpp.org/xmpp-software/clients/

Q. Where to get XMPP server?
A. I used Openfire, which you can download from here : http://www.igniterealtime.org/downloads/index.jsp


Steps for Making your chatter Bot

Step 1: Download and start your server
Download the XMPP server and install it on a machine. If you are a Windows/Mac user then you can directly download the exe/dmg file and install it.

But if you are installing it on Linux, then the simplest way is to download the tar file, extract it and run the openfire executable file from the bin folder, like "$ ./openfire start". Make sure you have Java set up on your system before you start Openfire.
In the openfire script, just set some environment variables to locate Java, like this.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export JRE_HOME=/usr/lib/jvm/java-6-openjdk/jre

And while logging it might ask for access to certain directories or files. Instead of granting that access, just edit the log4j.xml file (which you might find in the lib directory) and change the log file location: you can either hardcode the paths or point them to a directory where you already have access.

In case of Windows and Mac you can start your Openfire from program files or Applications.

Step 2: Install
Once you have started Openfire, you can open this URL in a browser, http://localhost:9090/, and configure it. It's very simple; let me know if you face any problem with it. For this step you will also need a DB; you might use MySQL or anything like that.

Step 3: Writing a Client
Write a client using the Smack API. Before that, download the Smack API from the Openfire download link (above), or precisely this will do: http://www.igniterealtime.org/downloads/download-landing.jsp?file=smack/smack_3_2_1.zip

Smack has very good tutorials on "how to start" and more, so using those, write a Java-based application which can act as a bot. Let's call it "xyz@abc.com".

Step 4: Certification
SSL Certification: If you want to send friend requests to Gtalk/MSN or any other Jabber service, most likely they will ask for an SSL certificate because they prefer to communicate securely. If you are deploying this for local use (intranet), you can create a self-signed certificate and deploy that in Openfire. Otherwise create a CSR, give it to a CA, get a cert file and deploy that in Openfire.
For how to generate the keyfile, CSR and so on, you can take help from this URL : http://www.igniterealtime.org/builds/openfire/docs/latest/documentation/ssl-guide.html


So, cool, everything is ready. One simple illustration of how it will work: if you want to send a friend request to aramis_123@gmail.com, that can be done using the client "xyz". Once "aramis" confirms the friend request, "xyz" can send any chat message to aramis, and when aramis sends a message back to xyz, the client program of xyz will receive it in the Chatlistener.

So whoever is online in their chat client is reachable by you... cool

Wednesday, 16 November 2011

Canonicalization

What is Canonicalization? (also known as c14n or standardization or normalization)

Canonicalization is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

For example, the search text "Is there any side effect of taking paracetamol during cancer?" can be represented in other forms, like "Side effects of paracetamol during cancer" or "During cancer taking paracetamol has what all side effects", but every representation is talking about the same thing. If you canonicalize them, they become something like "cancer during effect paracetamol side taking": all I did was remove the stop words and sort the terms alphabetically. Now every representation will eventually match this canonical form.

Q. Why Canonicalization?
There are many benefits to it:
1. After canonicalizing the text you know its core meaning, whatever the presentation.
2. Many variations of presentation can be mapped to a single title.
3. Search quality can be improved by searching on the relevant terms only.

How to do Canonicalization?
1. Define your character set as per your domain and remove the other characters which are not required. For example, if you are dealing with English language text data, then you can remove any character other than alphanumerics.

2. Remove the stop words

3. Do the stemming

4. Sort the terms in alphabetical order
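
The four steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name canonicalize is made up, the stop word list is a tiny hypothetical subset, and the "stemming" is a crude rule that strips one trailing "s" rather than a real stemmer.

```shell
# Hypothetical, deliberately tiny stop word list for this sketch.
STOP_WORDS="is there any of has what all"

# canonicalize TEXT: hypothetical helper that prints the canonical form of TEXT.
canonicalize() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |   # normalise case
    tr -cs 'a-z0-9' '\n' |         # step 1: keep alphanumerics, one term per line
    awk -v stops="$STOP_WORDS" '
      BEGIN { n = split(stops, s, " "); for (i = 1; i <= n; i++) stop[s[i]] = 1 }
      # step 2: drop stop words; step 3: crude stemming (strip one trailing "s")
      $0 != "" && !($0 in stop) { sub(/s$/, ""); print }' |
    LC_ALL=C sort -u |             # step 4: alphabetical order, duplicates removed
    paste -sd ' ' -                # join the terms back into one line
}

canonicalize "Is there any side effect of taking paracetamol during cancer?"
# prints: cancer during effect paracetamol side taking
```

With this tiny list, "During cancer taking paracetamol has what all side effects" yields the same line. A real system would plug in a full stop word list and a proper stemmer.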

Monday, 14 November 2011

Stop Word Listing

Stop words, in the computing domain, are the terms which are removed before processing natural language text (data) because they do not carry any significant meaning.

A precise definition:

“Words that do not appear in the index in a particular database because they are either insignificant (i.e., articles, prepositions) or so common that the results would be higher than the system can handle (as in the case of IUCAT where terms such as United States or Department are stop words in keyword searching.) Stop words vary from system to system. Also, some systems will merely ignore stop words where use of stop words in other systems will result in retrieving zero hits. ”

You have to build your own stop word list (manually) according to its use. Suppose you want to build a topic Canonicalization: then the stop word list could be quite big. For topics you should remove negations like "not", "nothing", and you should remove question tokens like "why", "what", "how". But if you are building Canonicalization for questions, then you should keep negations and question tokens.
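
One way to apply this rule is to derive the question list from the topic list by taking out the tokens that must be kept. A rough sketch, assuming the topic list is stored one word per line in a file; question_stop_words and the KEEP_FOR_QUESTIONS list are hypothetical and only cover the examples named above.

```shell
# Tokens that must survive in questions (examples from the text, not exhaustive).
KEEP_FOR_QUESTIONS="not nothing why what how"

# question_stop_words FILE: hypothetical helper.
# Prints the words of FILE (one per line) except the tokens listed above.
question_stop_words() {
  awk -v keep="$KEEP_FOR_QUESTIONS" '
    BEGIN { n = split(keep, k, " "); for (i = 1; i <= n; i++) keepset[k[i]] = 1 }
    !($0 in keepset)' "$1"
}
```

Usage: "question_stop_words topic_stop_words.txt > question_stop_words.txt", where both file names are placeholders.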

Stop words for Topic Canonicalization
------------------------------------
a
about
above
across
again
against
all
almost
alone
along
already
also
although
always
am
among
an
and
another
any
anybody
anyone
anything
anywhere
are
area
areas
around
as
ask
asked
asking
asks
at
away
b
backed
backing
backs
be
became
because
become
becomes
been
began
behind
being
beings
best
better
between
big
both
but
by
c
came
can
cannot
case
cases
certain
certainly
clear
clearly
come
communityid
could
d
did
differ
do
does
done
down
downed
downing
downs
e
each
early
either
end
ended
ending
ends
enough
even
evenly
ever
every
everybody
everyone
everything
everywhere
f
face
faces
fact
facts
far
felt
few
find
finds
first
for
four
from
full
fully
further
furthered
furthering
furthers
g
gave
general
generally
get
gets
gif
give
given
gives
go
going
good
goods
got
great
greater
greatest
group
grouped
grouping
groups
h
had
has
have
having
he
her
here
herself
high
higher
highest
him
himself
his
how
however
i
icon
if
im
important
in
interest
interested
interesting
interests
into
is
it
its
itself
j
just
k
keep
keeps
kind
knew
know
known
knows
l
large
largely
last
later
latest
least
less
let
lets
like
likely
long
longer
longest
m
made
make
making
man
many
may
me
member
members
men
might
more
most
mostly
mr
mrs
much
must
my
myself
n
necessary
need
needed
needing
needs
never
new
newer
newest
next
no
nobody
non
noone
not
nothing
now
nowhere
number
numbers
o
of
off
often
older
oldest
on
once
one
only
open
opened
opening
opens
or
ordered
ordering
orders
other
others
our
out
over
p
part
parted
parting
parts
per
perhaps
pl
place
places
pls
plz
pointed
pointing
possible
presented
presenting
presents
put
puts
q
quite
r
rather
really
regname
right
room
rooms
s
said
same
saw
say
says
second
seconds
see
seem
seemed
seeming
seems
sees
several
shall
she
should
show
showed
showing
shows
side
sides
since
small
smaller
smallest
so
some
somebody
someone
something
somewhere
state
states
still
such
sure
t
take
taken
than
that
the
their
them
then
there
therefore
these
they
thing
things
think
thinks
this
those
though
thought
thoughts
three
through
thus
to
today
together
too
took
toward
turn
turned
turning
turns
two
u
under
until
up
upon
us
use
used
uses
v
very
w
want
wanted
wanting
wants
was
way
ways
we
well
wells
went
were
what
when
where
whether
which
while
who
whole
whose
why
will
with
within
without
work
worked
working
works
would
x
y
yet
you
young
younger
youngest
your
yours
yr
z

Stop words for Question Canonicalization
----------------------------------------------
a
about
above
across
again
against
all
almost
alone
already
also
although
always
am
among
an
and
any
anybody
anyone
anything
anywhere
are
area
areas
around
as
ask
at
away
b
backed
backing
backs
be
became
because
become
becomes
been
being
beings
best
better
but
by
c
came
cannot
communityid
d
differ
do
does
done
downed
downing
downs
e
each
either
end
ended
ending
ends
enough
even
evenly
ever
every
everybody
everyone
everything
everywhere
f
fact
facts
far
felt
few
find
finds
first
for
four
from
full
fully
further
furthered
furthering
furthers
g
general
generally
get
gets
gif
give
given
gives
greater
greatest
h
had
has
have
having
he
her
here
herself
him
himself
his
however
i
icon
if
im
important
in
into
is
it
its
itself
j
just
k
keep
keeps
l
large
largely
last
later
latest
least
less
let
lets
m
many
may
me
might
more
most
mostly
mr
mrs
much
must
my
myself
n
needed
needing
needs
new
newer
newest
nobody
noone
now
nowhere
o
of
off
on
one
or
others
our
out
p
parted
parting
per
perhaps
pl
please
pls
plz
put
puts
q
quite
r
rather
really
regname
s
said
same
saw
say
says
see
seem
seemed
seeming
seems
sees
several
she
so
some
somebody
someone
somewhere
such
sure
t
that
the
their
them
then
there
therefore
these
they
this
those
though
three
through
thus
to
today
too
toward
u
up
upon
us
v
very
w
wanted
wanting
was
we
well
wells
went
were
will
with
would
x
y
yet
you
your
yours
yr
z

Thursday, 10 November 2011

Optimizing Solr

Check the stats and note these numbers (you can find these stats in the admin console of Solr):
1. queryRequestHandlers - First find out which one you are using, most likely it is "dismax"
   1. requests
   2. avgTimePerRequest
2. queryResultCache
3. documentCache

Now the 3 things above can be optimized...

1. If the number of requests for the queryRequestHandler is fairly high, say more than 10000, and avgTimePerRequest is more than 10ms, then I am pretty sure that it can be optimized.
2. Increase the documentCache size to the number of documents that you have in the index, or if that is too big, set it to the number of documents you think could get requested per day; 20000 to 30000 should suffice if you have traffic of 50000.
3. Increase the queryResultCache size to the number of queries which you think are requested frequently
4. In both caches, queryResultCache and documentCache, you should aim for a hit ratio of more than 95%; only then is the caching worth using
5. Keep observing the avgTimePerRequest for the query handler
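
The 95% target in point 4 is simply hits divided by lookups. A minimal sketch of that arithmetic, assuming you copy the hits and lookups counters of a cache from the stats page by hand; cache_hit_percent is a hypothetical helper name.

```shell
# cache_hit_percent HITS LOOKUPS: hypothetical helper for this sketch.
# Prints the hit ratio as a percentage with one decimal.
cache_hit_percent() {
  awk -v hits="$1" -v lookups="$2" 'BEGIN {
    if (lookups == 0) print "0.0"
    else printf "%.1f\n", hits * 100 / lookups
  }'
}

cache_hit_percent 9600 10000
# prints: 96.0
```

Anything below 95.0 suggests the cache is too small or the queries are too varied for that cache to help.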

Friday, 6 May 2011

Install MySql on Mac 10.6 Snow Leopard

Download:
http://www.simonwhatley.co.uk/installing-mysql-on-mac-osx-10-6-snow-leopard
http://dev.mysql.com/downloads/mysql/5.1.html#macosx-dmg

Follow the instructions; it will install MySQL at the following location: /usr/local/mysql

$ cd /usr/local/mysql
$ sudo scripts/mysql_install_db --user=mysql
$ cd bin
$ ./mysqladmin -u root password root
$ ./mysql -u root -proot
$ PATH=$PATH:/usr/local/mysql/bin
$ export PATH
$ mysql -u root -proot

Done :)
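
Note that the PATH change above lasts only for the current shell session. A small sketch of a guard that adds the directory once, which you could place in a shell startup file such as ~/.profile; path_append is a hypothetical helper, not a built-in.

```shell
# path_append DIR: hypothetical helper.
# Appends DIR to PATH only if it is not already listed.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

path_append /usr/local/mysql/bin
export PATH
```

Running it more than once leaves PATH unchanged, so re-reading the startup file does not pile up duplicates.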

Wednesday, 4 May 2011

Make your development system up on Mac

This post records my experience of setting up Mac 10.6.3 (Snow Leopard) on my personal computer.

First of all you should decide how you want to install things. If you want to install from binaries, I have no idea. What I can tell you is how to install many things easily using the MacPorts software manager.

So first of all follow this link to download and install MacPorts http://www.macports.org/install.php

Basic command usage for MacPorts
---------------------------
port list variant:no_ssl
port uninstall name:sql
port echo depof:mysql5
port echo apache*
port install mysql5
--------------------------

Now you need Apache + PHP + MySQL
Step 1: sudo port selfupdate
Step 2: sudo port install gawk
Step 3: sudo port install nawk (For me this step failed, but I didn't give a damn!)
Step 4: sudo port install php5 +apache2 +mysql5-server

Now use the command below to set up the MySQL db.
$ sudo /opt/local/lib/mysql5/bin/mysql_install_db --user=mysql

You can change the MySQL root password using this command
$ /opt/local/lib/mysql5/bin/mysqladmin -u root password 'new-password'

Next, you can configure Apache and MySQL to start automatically by entering the commands below:-
$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.apache2.plist
$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.mysql5.plist

Now you need to configure Apache to load PHP files: open /opt/local/apache2/conf/httpd.conf and add these 2 lines:-
LoadModule php5_module modules/libphp5.so
AddType application/x-httpd-php .php

Once saved, you may restart Apache

To start apache2 and mysql5 manually, type the commands below:-
$ sudo /opt/local/etc/LaunchDaemons/org.macports.apache2/apache2.wrapper start
$ sudo /opt/local/etc/LaunchDaemons/org.macports.mysql5/mysql5.wrapper start


Now you'll want Java
Java is usually installed with Mac OS and XCode (and yes, you need to install XCode as the very first thing before all of the above; you can find XCode on one of the DVDs that came with your Mac, or you can download it). You will usually find that the java and javac commands already work. The working Java home is usually /Library/Java/Home

Eclipse : Download it from here http://eclipse.org/downloads/

Tomcat :
Step 1: Download it from here http://tomcat.apache.org/download-60.cgi
Step 2: After downloading Tomcat, extract the file and go to bin
Step 3: Create a file setenv.sh and write this line: JAVA_HOME=/Library/Java/Home (or whatever Java home you have)
Step 4: Save the file, run "chmod +x setenv.sh" and you are ready with Tomcat
Step 5: ./startup.sh will run Tomcat; you can see the log at ~tomcat_home/logs/catalina.out
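
For reference, a minimal setenv.sh along the lines of Step 3 could look like the sketch below. The CATALINA_OPTS line is an optional addition assumed for this example, with placeholder heap sizes; it is not part of the steps above.

```shell
# bin/setenv.sh: read by Tomcat's startup scripts if the file exists.
# The path is the usual Java home on this Mac setup; adjust it to your own.
JAVA_HOME=/Library/Java/Home
export JAVA_HOME

# Optional, an assumption of this sketch: JVM options applied to Tomcat only.
CATALINA_OPTS="-Xms512M -Xmx512M"
export CATALINA_OPTS
```

Keeping these settings in setenv.sh means the stock startup.sh and catalina.sh scripts stay untouched when you upgrade Tomcat.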