Thursday, May 5, 2005

First Rev design done

Have completed a first paper design of FRET. It will analyse a group of files in three phases, each performing a different task:
  • Phase 1: Parse each file individually and store detected structures in a database.
  • Phase 2: Parse each file, comparing its raw data against the structures already detected, and use the matches to identify new structures.
  • Phase 3: Compare all the detected structures for each file, and all the files' raw data, against each other to identify new structures.
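
As a rough illustration of the control flow only, the three phases might be wired together as below. Everything in it is hypothetical: the function names, the plain dict standing in for the database, and the toy detector that treats any run of four or more identical bytes as a "structure". Phase 3 is also simplified to comparing detected structures, leaving out the raw-data comparison.

```python
import re

# Toy detector (an assumption for illustration): any run of four or more
# identical bytes is treated as a detected structure.
RUN = re.compile(rb"(.)\1{3,}", re.DOTALL)

def phase1(files):
    """Parse each file on its own and record (offset, bytes) per structure."""
    return {name: [(m.start(), m.group()) for m in RUN.finditer(data)]
            for name, data in files.items()}

def phase2(files, database):
    """Search each file's raw data for structures already found in other files."""
    hits = {}
    for name, data in files.items():
        known = {s for other, found in database.items() if other != name
                 for _, s in found}
        hits[name] = sorted((data.find(s), s) for s in known if s in data)
    return hits

def phase3(database):
    """Compare detected structures across files; keep those common to every file."""
    per_file = [{s for _, s in found} for found in database.values()]
    return set.intersection(*per_file) if per_file else set()

# Two made-up buffers standing in for files on disk.
files = {
    "a.bin": b"HDR\x00\x00\x00\x00tail",
    "b.bin": b"xx\x00\x00\x00\x00\x00\x00yy",
}
database = phase1(files)
new_hits = phase2(files, database)
common = phase3(database)
```

Here phase 2 finds the four-byte zero run from a.bin inside the longer run in b.bin, which phase 1 alone recorded only as a single six-byte structure.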

I've also developed (borrowed really) the following terminology to describe what's happening: GRAM - a data structure or pattern of bytes detected in a file or buffer. Each GRAM will have a position, length, type, confidence level and parent. The term GRAM is taken from the classical Greek for a letter. I'm using it because it is the root of the words bigram and trigram, which are used in cryptography when performing statistical analysis. FRET, after all, was inspired by coincidence counting - why not treat an unknown file format like an encrypted file for analysis purposes?
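
A minimal sketch of how a GRAM record could be represented, using the five attributes listed above. The field types, the 0.0 to 1.0 confidence scale, the `end` helper and the example type names are all assumptions made for illustration, not part of the paper design.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Gram:
    position: int                    # byte offset within the file or buffer
    length: int                      # size of the structure in bytes
    type: str                        # label for the kind of structure (hypothetical names)
    confidence: float                # assumed scale: 0.0 (a guess) to 1.0 (certain)
    parent: Optional["Gram"] = None  # enclosing GRAM, if any

    @property
    def end(self) -> int:
        """Offset of the first byte after this GRAM (illustrative helper)."""
        return self.position + self.length

# A made-up 16-byte header containing a 4-byte length field.
header = Gram(position=0, length=16, type="header", confidence=0.9)
field = Gram(position=4, length=4, type="length-field", confidence=0.6, parent=header)
```

The parent link lets nested structures form a tree, so a field can be reported in the context of the header that contains it.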

Wednesday, May 4, 2005

FRET - it has a name

I have seen it and it has a name! Actually, I haven't seen it but I have mapped it out roughly on paper. Since early March, I've been investigating needs within the free software community with the idea of starting development of a Unix tool. I've looked at tools such as hex editors and binary diff tools and think I've spotted a gap. There is no tool available that will use sheer computing muscle to analyse black-box files and describe their internal structure - this could potentially save man-months if not man-years of development time - now I need to decide if there is a real need.

Sunday, May 1, 2005

And then there were 6

FRET now has 6 phases of analysis. I've done some pretty deep thinking (and scribbling) about this, settled on the 6 phases and optimised their order of execution. The phases are:
  • Buffer preprocessing - remove obfuscation etc.
  • Single Buffer Scans - analyse raw data in each buffer
  • Raw data to Gram comparison for each buffer
  • Comparison of Grams for each buffer
  • Comparison of raw data of each buffer
  • Rationalise and amalgamate results
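
Coincidence counting, the technique that inspired FRET, is one way the raw data of two buffers could be compared. The sketch below counts matching bytes at a given alignment and picks the alignment with the most matches. That this is how the raw-data comparison phase would work is an assumption, and the function names are hypothetical.

```python
def coincidences(a: bytes, b: bytes, shift: int = 0) -> int:
    """Count positions i where a[i + shift] == b[i]."""
    if shift < 0:
        # A negative shift slides b forward instead of a.
        return coincidences(b, a, -shift)
    return sum(x == y for x, y in zip(a[shift:], b))

def best_shift(a: bytes, b: bytes, max_shift: int) -> int:
    """Return the shift in [-max_shift, max_shift] with the most coincidences."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: coincidences(a, b, s))

# Made-up buffers: the same nine bytes appear four bytes later in buf_a.
buf_a = b"\x01\x02\x03\x04MAGIC1234"
buf_b = b"MAGIC1234"
shift = best_shift(buf_a, buf_b, max_shift=6)
matches = coincidences(buf_a, buf_b, shift)
```

A sharp peak in the count at one particular shift suggests the two buffers share a structure at that relative offset.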

Don't worry, it may all make sense in the end. I'm not sure if this will work either.