Writing a text filter

From EditPlus Wiki

This is an outline of how to write your own text filter user tool.

Why?

I worked out how to do this because I wanted to filter error messages from a huge SQL script output file, but you can use this technique to manipulate a file in any way. Imagine if search and replace had even more power than regular expressions. The only limits are your programming ability and imagination.

How?

Set up a user tool to run your filter and select the "Run as text filter" option. You don't need any of the special arguments such as $(FileName), because a text filter always runs on the content of the current EditPlus window. The command and argument settings depend on how your filter has to be called.

Of course, you also have to write the filter. My example below is in Java, but any language that can read the standard input stream and write to the standard output stream will do. If you have ever written a utility that runs in a command-line pipe, this is very similar.

The general approach is that your program is fed the content of the current file on standard input, your code decides what to do with it, and the result goes to standard output. You can output some or all of the input, add or replace sections, or generate something entirely new. Meanwhile, you can also do anything else you fancy with the text, like e-mail the juicy bits to your granny.
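As a rough illustration of that read, decide, write loop, here is a minimal sketch in Python 3. The transform function and its upper-casing rule are placeholders invented for this example; swap in whatever your filter should really do.

```python
import sys

def transform(line):
    """Hypothetical per-line rule for this sketch: upper-case the line."""
    return line.upper()

def run_filter(source, sink):
    """Read every line from source, transform it, and write it to sink."""
    for line in source:
        sink.write(transform(line))
    # Make sure nothing is left sitting in a buffer when the filter exits
    sink.flush()

if __name__ == "__main__":
    # The text arrives on standard input and leaves on standard output
    run_filter(sys.stdin, sys.stdout)
```

Assuming the script is saved as filter.py (a made-up name), the user tool would call it like any other script, for example python "c:\path to tool\filter.py", with "Run as text filter" selected.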

What if it goes wrong?

Just like using search and replace, if you don't like what the filter has done to your text, you can undo.

Examples

Java

This Java code strips from SQL script output the messages that indicate success, leaving only the error messages.

import java.io.*;
import java.util.HashSet;
public class SPOutStripper
{
   static HashSet<String> strippers = new HashSet<String> ();
   static
   {
      strippers.add ( ""                  ); // a blank line
      strippers.add ( "Table dropped."    );
      strippers.add ( "Table created."    );
      strippers.add ( "1 row created."    );
      strippers.add ( "Commit complete."  );
      strippers.add ( "Table altered."    );
      strippers.add ( "1 row updated."    );
      // ...and many others
   }
   public static void main ( String [] args )
         throws Exception // Lazy programmer hopes IOException won't bite him
   {
      BufferedReader in    = new BufferedReader ( new InputStreamReader ( System.in ) );
      PrintWriter    out   = new PrintWriter ( new BufferedWriter ( new OutputStreamWriter ( System.out ) ) );
      String         line;
      // Loop through lines of input
      while ( null != ( line = in.readLine () ) )
      {
         // Check whether line should be stripped out
         if ( ! strippers.contains ( line ) )
         {
            // If it shouldn't, send it back out again
            out.println ( line );
         }
      }
      out.flush (); // Important!
      // Finished - tidy up
      out.close ();
      in.close ();
   }
}

Perl

This Perl code removes leading and trailing whitespace (spaces and tabs) from each line.

#!/usr/bin/perl
use warnings;
use strict;
while (my $text = <STDIN>) {
	chomp $text;
	$text =~ s/^[ \t]+|[ \t]+$//g;
	print "$text\n";
}

JavaScript or VBScript

This example is in JavaScript; it works much the same way in VBScript. Run it as: cscript //NoLogo "c:\path to tool\tool.js"

var stdin   = WScript.StdIn;
var stdout  = WScript.StdOut;
var input = stdin.ReadAll();
/*
Here you do something with the input.
But since this is a demo, we're just going to write it back out.
*/
stdout.Write(input);

Python

This example attempts to tidy XML. It can be run as an EditPlus text filter tool, or from the command line.

# Written for Python 3
import re
import sys
import urllib.request
from io import StringIO

def openAnything(source):
	"""Cribbed from diveintopython.org """
	if source == "-":
		return sys.stdin

	# try to open with urllib (if source is http, ftp, or file URL)
	try:
		return urllib.request.urlopen(source)
	except (OSError, ValueError):
		# ValueError means source is not a URL at all (e.g. a plain pathname)
		pass

	# try to open with native open function (if source is pathname)
	try:
		return open(source, 'r')
	except OSError:
		pass

	# treat source as string
	return StringIO(str(source))

def readAll(source):
	""" Read the whole source as text (urlopen hands back bytes) """
	data = openAnything(source).read()
	if isinstance(data, bytes):
		data = data.decode('utf-8', 'replace')
	return data

def prettyUp ( xml ):
	""" Based on http://www.faqts.com/knowledge_base/view.phtml/aid/4334/fid/538 """
	parts = re.split ( '(<.*?>)', xml )
	level = 0
	wasText = False
	for part in parts:
		# ignore empty part
		if part.strip ( ) == '':
			continue
		# opening tags
		if part [ 0 ] == '<' and part [ 1 ] != '/' and part [ 1 ] != '?' and part [ 1 ] != '!':
			sys.stdout.write ( '\n' )
			sys.stdout.write ( '\t' * ( level ) + part )
			# short-cut empty tag
			if part [ -2 : ] != '/>':
				level += 1
			wasText = False
		# closing tags
		elif part [ : 2 ]  == '</':
			level -= 1
			if not wasText:
				sys.stdout.write ( '\n' )
				sys.stdout.write ( '\t' * ( level ) )
			sys.stdout.write ( part )
			wasText = False
		# text
		else:
			sys.stdout.write ( part )
			wasText = True

if len ( sys.argv ) == 1:
	xml = readAll ( "-" )
elif len ( sys.argv ) == 2:
	xml = readAll ( sys.argv [ 1 ] )
else:
	xml = None
	sys.stderr.write ( "Wrong number of arguments.\n" )

if xml is not None:
	prettyUp ( xml )

Python again

This is surprisingly useful. It lines up text into columns by inserting spaces, for example from:

9 whatever
999 whatever
99 whatever
9999 whatever

to:

9    whatever
999  whatever
99   whatever
9999 whatever

Note: This code has some quirks - but you can hit Undo if you don't like the result.

You'll need to use "Prompt for arguments" ("$(Prompt)") after the script name to get a dialog where you can specify the marker text to be lined up. For a regular expression match, start the marker with / (so /c.t will line up cat, cot, etc.).

# Written for Python 3
import re
import sys
import urllib.request
from io import StringIO

def openAnything ( source ):
   """Cribbed from diveintopython.org """
   if source == "-":
      return sys.stdin

   # try to open with urllib (if source is http, ftp, or file URL)
   try:
      return urllib.request.urlopen ( source )
   except ( OSError, ValueError ):
      # ValueError means source is not a URL at all (e.g. a plain pathname)
      pass

   # try to open with native open function (if source is pathname)
   try:
      return open ( source, 'r' )
   except OSError:
      pass

   # treat source as string
   return StringIO ( str ( source ) )

def readAll ( source ):
   """ Read the whole source as text (urlopen hands back bytes) """
   data = openAnything ( source ).read ()
   if isinstance ( data, bytes ):
      data = data.decode ( 'utf-8', 'replace' )
   return data

def findMarker ( line, marker ):
   # a marker starting with / is treated as a regular expression
   if "/" == marker [ : 1 ]:
      match = re.search ( marker [ 1 : ], line )
      if match is None:
         return -1
      return match.start ()
   return line.find ( marker )

def lineUp ( text, marker ):
   lines = text.split ( '\n' )
   maxStartLen = max ( findMarker ( line, marker ) for line in lines )
   for line in lines:
      if 0 < len ( line ):
         pos = findMarker ( line, marker )
         if pos < 0:
            # marker not found: pass the line through unchanged
            print ( line )
            continue
         start = line [ : pos ]
         end = line [ pos : ]
         print ( start + ( ' ' * ( maxStartLen - len ( start ) ) ) + end )

if len ( sys.argv ) == 2:
   text = readAll ( "-" )
   marker = sys.argv [ 1 ]
elif len ( sys.argv ) == 3:
   text = readAll ( sys.argv [ 1 ] )
   marker = sys.argv [ 2 ]
else:
   text = None
   sys.stderr.write ( "Wrong number of arguments.\n" )

if text is not None:
   lineUp ( text, marker )
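
The column example and the two marker forms described above can be checked outside EditPlus with a small self-contained sketch. It is a simplified Python 3 reworking, not the script above: the names marker_position and line_up are made up for illustration, and passing through lines that lack the marker is an assumption about the desired behaviour.

```python
import re

def marker_position(line, marker):
    """Hypothetical helper: index where the marker starts, or -1 if absent.

    A marker beginning with / is treated as a regular expression.
    """
    if marker.startswith("/"):
        match = re.search(marker[1:], line)
        return match.start() if match else -1
    return line.find(marker)

def line_up(text, marker):
    """Pad each non-empty line with spaces so the marker starts in one column."""
    lines = [line for line in text.split("\n") if line]
    positions = [marker_position(line, marker) for line in lines]
    width = max(positions, default=0)
    aligned = []
    for line, pos in zip(lines, positions):
        if pos < 0:
            # Assumption: lines without the marker are left as they are
            aligned.append(line)
        else:
            aligned.append(line[:pos].ljust(width) + line[pos:])
    return "\n".join(aligned)

sample = "9 whatever\n999 whatever\n99 whatever\n9999 whatever"
print(line_up(sample, "whatever"))        # plain-text marker
print(line_up("a cat\nabc cot", "/c.t"))  # regular-expression marker
```

The first call reproduces the four-line table shown earlier; the second lines up cat and cot in the same column.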