From EditPlus Wiki

Revision as of 07:20, 15 July 2008

Writing a text filter

This is an outline of how to write your own text filter user tool.

Why?

I worked out how to do this because I wanted to filter error messages from a huge SQL script output file, but you can use this technique to manipulate a file in any way. Imagine if search and replace had even more power than regular expressions. The only limits are your programming ability and imagination.

How?

You set up a user tool to run your filter and select the "Run as text filter" option. You don't need any of the special arguments like $(FileName), because a text filter is always run on the content of the current EditPlus window. The command and argument settings depend on how your filter must be called.

Of course, you also have to write the filter. My example below is in Java, but any language that can read the standard input stream and write to the standard output stream is fine. If you're familiar with writing a utility that runs in a command line pipe, this is very similar.

The general approach is that your program is fed the content of the current file on standard input and reads it. Your code decides what to do with this input: it can output some or all of it, add or replace sections, or generate something entirely new. Meanwhile, you can also do anything else you fancy with the text, like e-mail the juicy bits to your granny.
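
As a rough illustration of the approach just described, here is a minimal sketch of a filter in Python. The function name run_filter and the upper-casing step are placeholders made up for this example, not anything EditPlus requires. The sketch assumes only that the tool hands over the window content on standard input and collects the result from standard output.

```python
import sys


def run_filter(source, sink):
    """Copy lines from source to sink, transforming each one on the way.

    The transformation (upper-casing) is a hypothetical placeholder;
    replace it with whatever your filter should do.
    """
    for line in source:
        sink.write(line.upper())


if __name__ == "__main__":
    # Assumption: the text filter tool supplies the window content on stdin
    # and reads the filtered text back from stdout.
    run_filter(sys.stdin, sys.stdout)
    sys.stdout.flush()
```

Saved under a hypothetical name such as upper.py, a script like this can be tried outside EditPlus first in a command line pipe, for example: type notes.txt | python upper.py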

What if it goes wrong?

Just like using search and replace, if you don't like what the filter has done to your text, you can undo.

Example 1: Java

This Java code strips from SQL script output the messages that merely report success, leaving only the error messages.

import java.io.*;
import java.util.HashSet;
public class SPOutStripper
{
   static HashSet<String> strippers = new HashSet<String> (); // lines to strip out
   static
   {
      strippers.add ( ""                  ); // a blank line
      strippers.add ( "Table dropped."    );
      strippers.add ( "Table created."    );
      strippers.add ( "1 row created."    );
      strippers.add ( "Commit complete."  );
      strippers.add ( "Table altered."    );
      strippers.add ( "1 row updated."    );
      // ...and many others
   }
   public static void main ( String [] args )
         throws Exception // Lazy programmer hopes IOException won't bite him
   {
      BufferedReader in    = new BufferedReader ( new InputStreamReader ( System.in ) );
      PrintWriter    out   = new PrintWriter ( new BufferedWriter ( new OutputStreamWriter ( System.out ) ) );
      String         line;
      // Loop through lines of input
      while ( null != ( line = in.readLine () ) )
      {
         // Check whether line should be stripped out
         if ( ! strippers.contains ( line ) )
         {
            // If it shouldn't, send it back out again
            out.println ( line );
         }
      }
      out.flush (); // Important!
      // Finished - tidy up
      out.close ();
      in.close ();
   }
}

Example 2: Perl

This Perl code removes leading and trailing whitespace (spaces and tabs) from every line.

#!/usr/bin/perl
use warnings;
use strict;
while (my $text = <STDIN>) {
	chomp $text;
	$text =~ s/^[ \t]+|[ \t]+$//g;
	print "$text\n";
}

Example 3: JavaScript or VBScript

This example is in JavaScript (JScript, as run by Windows Script Host). It works basically the same way in VBScript. Run it as: cscript //NoLogo "c:\path to tool\tool.js"

var stdin   = WScript.StdIn;
var stdout  = WScript.StdOut;
var input = stdin.ReadAll();
/*
Here you do something with the input.
But since this is a demo, we're just going to write it back out.
*/
stdout.Write(input);
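
Reading everything before writing anything, as above, suits filters that need the whole text before they can produce any output. As a hedged sketch of what the "do something with the input" step might look like, here is the same pattern in Python with a sorting step. The name sort_lines is made up for this illustration.

```python
import sys


def sort_lines(text):
    """Return the lines of text in sorted order, each ending with a newline."""
    lines = text.splitlines()
    return "".join(line + "\n" for line in sorted(lines))


if __name__ == "__main__":
    # Read the whole input at once, transform it, then write it back out.
    sys.stdout.write(sort_lines(sys.stdin.read()))
```

Sorting is only an example; any function that takes the full text and returns the new text fits in the same place.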

Example 4: Python

This example attempts to tidy XML. It can be run as an EditPlus text filter tool, or from the command line.

# Written for Python 3.
import io,re,sys
import urllib.request

def openAnything(source):
	"""Cribbed from diveintopython.org """
	if source == "-":
		return sys.stdin

	# try to open with urllib (if source is http, ftp, or file URL)
	try:
		data = urllib.request.urlopen(source).read()
	except (OSError, ValueError):
		# ValueError: source is not a URL at all (e.g. a plain path name)
		data = None
	if data is not None:
		# assumption: the document is UTF-8 encoded
		return io.StringIO(data.decode("utf-8"))

	# try to open with native open function (if source is pathname)
	try:
		return open(source, 'r')
	except OSError:
		pass

	# treat source as string
	return io.StringIO(str(source))

def prettyUp ( xml ):
	""" Based on http://www.faqts.com/knowledge_base/view.phtml/aid/4334/fid/538 """
	# split into tags and the text between them
	parts = re.split ( '(<.*?>)', xml )
	level = 0
	wasText = 0
	for part in parts:
		# ignore empty part
		if part.strip () == '':
			continue
		# opening tags
		if part [ 0 ] == '<' and part [ 1 ] != '/' and part [ 1 ] != '?' and part [ 1 ] != '!':
			print ()
			sys.stdout.write ( '\t' * ( level ) + part )
			# short-cut empty tag
			if part [ -2 : ] != '/>':
				level = level + 1
			wasText = 0
		# closing tags
		elif part [ : 2 ]  == '</':
			level = level - 1
			if not wasText:
				print ()
				sys.stdout.write ( '\t' * ( level ) )
			sys.stdout.write ( part )
			wasText = 0
		# text
		else:
			sys.stdout.write ( part )
			wasText = 1

if len ( sys.argv ) == 1:
	# no argument: read from standard input (text filter mode)
	xml = openAnything ( "-" ).read ()
elif len ( sys.argv ) == 2:
	# one argument: a URL, a path name or literal XML
	xml = openAnything ( sys.argv [ 1 ] ).read ()
else:
	xml = None
	sys.stderr.write ( "Wrong number of arguments.\n" )

if xml is not None:
	prettyUp ( xml )