Showing posts with label Apache. Show all posts

Wednesday, March 27, 2013

Apache Lucene and Apache Tika

Apache Lucene Core: Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Apache Tika: The Apache Tika toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.

Using Tika, we can parse a file in any format Tika supports and extract its text, then pass the extracted text to Lucene, which in turn indexes it and makes it ready for searching.
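A minimal sketch of that pipeline follows. It assumes tika-core and tika-parsers (for actual format support), lucene-core and lucene-queryparser are on the classpath, and it targets the Lucene 8/9-era API (ByteBuffersDirectory, IndexSearcher.count). The class TikaLuceneSketch, the method indexAndCount and the field names "name" and "content" are hypothetical, not from the original post. It indexes a single file in memory and reports how many documents match a query; a real application would index many files into an on-disk FSDirectory.

```java
import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.tika.Tika;

public class TikaLuceneSketch {

    // Indexes one document (file name + extracted text) in an in-memory index,
    // then returns how many documents match the query string.
    static int indexAndCount(String name, String content, String queryText) throws Exception {
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                // StringField: stored and indexed as one exact token (not tokenized)
                doc.add(new StringField("name", name, Field.Store.YES));
                // TextField: tokenized by the analyzer so individual words are searchable
                doc.add(new TextField("content", content, Field.Store.NO));
                writer.addDocument(doc);
            } // closing the writer commits the document
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("content", analyzer).parse(queryText);
                return searcher.count(query);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);
        // Tika detects the file type and extracts plain text
        // (the facade truncates very long documents at its default maximum length)
        String text = new Tika().parseToString(file);
        System.out.println(indexAndCount(file.getName(), text, args[1]));
    }
}
```

Usage: `java TikaLuceneSketch report.pdf lucene` prints the number of matching documents (0 or 1 here, since only one file is indexed).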

Monday, August 13, 2012

Split String to equal length substrings in Java

Solution 1: Use a regular expression
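// needs: import java.util.Arrays;
// \G marks the end of the previous match, so the lookbehind splits after every 4 characters
System.out.println(Arrays.toString(
    "abcdefghij".split("(?<=\\G.{4})")
));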

Output1:
[abcd, efgh, ij]

Solution 2: Use a simple for loop
String text = "abcdefghij";
int size = 4;

for (int start = 0; start < text.length(); start += size) {
    // Math.min stops the last, shorter piece from running past the end of the string
    System.out.println(text.substring(start, Math.min(text.length(), start + size)));
}

Output2:
abcd
efgh
ij

Solution 3: Use the Splitter class (Guava, from Google)
// needs: import com.google.common.base.Splitter;
for (final String token :
    Splitter.fixedLength(4).split("abcdefghij")) {
    System.out.println(token); // prints abcd, efgh, ij, one per line
}
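The loop from Solution 2 is easy to wrap in a reusable method that returns the pieces instead of printing them. This is a minimal sketch; FixedLengthSplit and splitEvery are hypothetical names. It rejects a non-positive size, which would otherwise make the loop run forever (size 0) or throw from substring (negative size). It also returns an empty list for an empty string, whereas "".split(...) returns [""]. Like Solution 2, it counts UTF-16 chars, so a surrogate pair (such as an emoji) can be cut in half.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedLengthSplit {

    // Splits text into consecutive pieces of `size` chars; the last piece may be shorter.
    static List<String> splitEvery(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            parts.add(text.substring(start, Math.min(text.length(), start + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitEvery("abcdefghij", 4)); // [abcd, efgh, ij]
    }
}
```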

Friday, August 10, 2012

StringUtils

public static int getLevenshteinDistance(String s, String t):

Find the Levenshtein distance between two Strings.

This is the number of changes needed to change one String into another, where each change is a single character modification (deletion, insertion or substitution).


Example:
// StringUtils.reverse("prmaod") returns "doamrp", so this compares "pramod" with "doamrp"
StringUtils.getLevenshteinDistance("pramod", StringUtils.reverse("prmaod"))


Output:
4
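The distance is usually computed with a dynamic-programming table in which cell (i, j) holds the distance between the first i characters of s and the first j characters of t. Below is a plain-Java sketch of that standard algorithm, keeping only two rows of the table. Levenshtein and distance are hypothetical names, and this is not the Commons Lang source. Note that StringUtils.getLevenshteinDistance was deprecated in Commons Lang 3.6 in favor of LevenshteinDistance in Apache Commons Text.

```java
public class Levenshtein {

    // Standard dynamic-programming edit distance using two rows.
    // At the start of iteration i, prev[j] is the distance between
    // the first i-1 chars of s and the first j chars of t.
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of t takes j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning the first i chars of s into "" takes i deletions
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        curr[j - 1] + 1,      // insertion
                        prev[j] + 1),         // deletion
                        prev[j - 1] + cost);  // substitution (or match when cost is 0)
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("pramod", "doamrp"));  // 4, same as the StringUtils example
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```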

Q: How can I display a Collection's elements in a custom format, for example one element per line?

Solution:
// needs java.util.ArrayList, java.util.List and StringUtils from Apache Commons Lang
List<String> list = new ArrayList<String>();
list.add("one");
list.add("two");
list.add("three");

System.out.println(StringUtils.join(list, '\n'));

Output:
one
two
three
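Without Commons Lang, Java 8 and later can do the same with String.join, and Collectors.joining adds an optional prefix and suffix. This is a small sketch; JoinLines, onePerLine and bracketed are hypothetical names. One difference worth knowing: StringUtils.join renders null elements as empty strings, while String.join writes the text "null".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class JoinLines {

    // One element per line, using only the JDK (Java 8+)
    static String onePerLine(List<String> items) {
        return String.join("\n", items);
    }

    // Collectors.joining(delimiter, prefix, suffix) for a bracketed, comma-separated form
    static String bracketed(List<String> items) {
        return items.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("one");
        list.add("two");
        list.add("three");
        System.out.println(onePerLine(list)); // one, two, three on separate lines
        System.out.println(bracketed(list));  // [one, two, three]
    }
}
```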

Thursday, July 19, 2012

Apache Commons

Apache Commons Lang has an excellent utility class for replacing variables in a string: StrSubstitutor.

Example 1:

String testString = "This is a test string @1@ @2@ @3@ that needs to be replaced with value";

Map<String,String> properties = new HashMap<String,String>();
properties.put("1", "one");
properties.put("2", "two");
properties.put("3", "three");

// "@" is both the variable prefix and suffix, so @1@ refers to the key "1"
StrSubstitutor substitutor = new StrSubstitutor(properties, "@", "@");

System.out.println(substitutor.replace(testString));

Output:
This is a test string one two three that needs to be replaced with value

Example 2:

By default, this class uses the same ${var_name} placeholder syntax as Velocity templates.

Map<String, String> map = new HashMap<String,String>();
map.put("name", "pramod");
map.put("city", "karimnagar");

String text = "My name is ${name}. I am from ${city}.";

StrSubstitutor substitutor = new StrSubstitutor(map);

System.out.println(substitutor.replace(text));

Output:
My name is pramod. I am from karimnagar.
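To see roughly what StrSubstitutor does in both examples, here is a simplified JDK-only sketch built on java.util.regex. SimpleSubstitutor and replace are hypothetical names, and this is not how Commons Lang implements it: it finds each prefix + key + suffix, looks the key up in the map, and leaves unknown keys untouched (as StrSubstitutor does by default). It does not support StrSubstitutor's escape character or nested variables.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleSubstitutor {

    // Replaces every prefix + key + suffix with values.get(key); unknown keys stay as they are.
    static String replace(String text, Map<String, String> values, String prefix, String suffix) {
        // Pattern.quote treats "${" or "@" literally; (.+?) captures the shortest key
        Pattern p = Pattern.compile(Pattern.quote(prefix) + "(.+?)" + Pattern.quote(suffix));
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer overload works on Java 8
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value != null ? value : m.group();
            // quoteReplacement stops '$' or '\' in the value being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("name", "pramod");
        map.put("city", "karimnagar");
        // prints: My name is pramod. I am from karimnagar.
        System.out.println(replace("My name is ${name}. I am from ${city}.", map, "${", "}"));
    }
}
```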

Monday, July 16, 2012

Velocity Templates

Velocity is a Java-based template engine and an Apache project. It substitutes values for the variables in a template at runtime.

Known Uses:
  1. Sending the same email to a list of users, filled in with each user's details.
Note: Velocity variable references begin with the $ symbol, for example $name or ${name}.
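A minimal sketch of the email use case, assuming Apache Velocity 2.x (velocity-engine-core) is on the classpath. VelocityMailSketch, render, TEMPLATE and the second sample user are hypothetical. It renders one shared template per user with Velocity.evaluate; actually sending the mail is left out.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.Velocity;

public class VelocityMailSketch {

    static {
        Velocity.init(); // initialize the singleton engine once, with default settings
    }

    // One template shared by every user; only the context values change.
    static final String TEMPLATE = "Hello $name, your city on file is ${city}.";

    static String render(String template, Map<String, String> values) {
        VelocityContext context = new VelocityContext();
        for (Map.Entry<String, String> e : values.entrySet()) {
            context.put(e.getKey(), e.getValue());
        }
        StringWriter out = new StringWriter();
        // "mail-template" is only a log tag used in error messages
        Velocity.evaluate(context, out, "mail-template", template);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample users; a real mailer would load these from a database
        String[][] users = { {"pramod", "karimnagar"}, {"user2", "city2"} };
        for (String[] user : users) {
            Map<String, String> values = new LinkedHashMap<>();
            values.put("name", user[0]);
            values.put("city", user[1]);
            System.out.println(render(TEMPLATE, values)); // body of one personalized email
        }
    }
}
```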