



General transcription guidelines when using CAQDAS software

Author of this page: Ann Lewins

Affiliation: University of Surrey

Date written: July 2005




Preparation of Textual Data

Data is often transcribed inconsistently within a single file, and much of the final preparation of data (which may have more to do with analysis-stage decisions) may be your responsibility as the researcher. If you are planning to use a CAQDAS / QDA software package to help you handle your textual data, your preparation will be more efficient if some basic rules are applied at an early transcription stage, whatever software you plan to use.

Choose one of the options below:

Choose this option if you do NOT know which CAQDAS software you will use

Choose the relevant option below if you DO know which software you will use.

File formats and preparation

The list of DOs and DON'Ts in this section is general and tries not to be too 'software-specific'. It does not cover all the steps and considerations required to prepare data for individual software packages. Before going any further, it may be useful to have some idea of the range of file formats expected by a variety of software. Note that the last column indicates whether the software can handle additional graphic formats (each package does so in a different way), but detailed information about these is not included here.

CAQDAS Package    | Text format accepted                                      | Can handle additional graphic formats
N5/N6             | Text Only                                                 | No
MAXqda 1          | Rich Text Format (.rtf)                                   | No
MAXqda2           | .rtf                                                      | Yes
Atlas.ti V4.2     | Text only with line breaks                                | Yes
Atlas.ti V5       | .rtf or Word                                              | Yes
NVivo Version 1-2 | .rtf (Text only, Plain text, Text only with line breaks)  | Yes
QDA Miner         | Word, .rtf, Text only                                     | No
QUALRUS           | .rtf                                                      | Yes
The Ethnograph    | Word (copied and pasted into an Editor window)            | No
HyperRESEARCH     | Plain Text                                                | Yes

General Rules

If you are going to use CAQDAS software, text search or autocoding tools can 'search' your entire dataset for words or strings that you define. The resulting finds can be autocoded. YOU DO NOT EVER HAVE TO USE THESE TOOLS AND WE WOULD SUGGEST THAT IT IS A MISTAKE TO MAKE THEIR USE THE PRIME OBJECTIVE OF INTERVIEW DESIGN. However, certain types of data lend themselves to these tools: very structured interviews, open-ended questions from a survey, and focus group data (where you have your speakers identified). The issues below have implications for the efficient use of these tools:

  • Consistency of transcription
  • Speaker IDs, headers etc.
  • Units of text used in the transcription e.g. paragraphs, sentences, sections, heading levels.

a) Consistency

Some CAQDAS software programs allow you to edit your data freely. However, it is rare for them to include a spell checker or easy Edit/Find and Replace tools. These tools are invaluable in Word and should be used to clean up your data before the file is assigned to or imported into the CAQDAS software. Anonymising data where necessary, using Find and Replace, should also be done as far as possible in Word.

b) Subheaders, identifiers, question numbers

Always make the spelling, spacing etc. of repeating speaker identifiers, question headers, section headers and topic headers absolutely uniform throughout the text, not an inconsistent mixture of different forms.





You may need to depend on this uniformity when performing text searches and saving or autocoding the results. It is easier to use Text search tools which look for exact strings of characters, not approximations.
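
As a rough illustration of why uniformity matters, the sketch below scans a transcript for line-opening labels and reports any label that appears in more than one spelling or spacing. It is not part of any CAQDAS package: the function name `inconsistent_labels`, the label pattern (letters, an optional space, optional digits, then a colon or hyphen) and the sample text are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed label shape: letters, optional single space, optional digits,
# then a colon or hyphen, at the start of a line (e.g. "Q1:", "INT:", "q1-").
LABEL = re.compile(r"^[ \t]*([A-Za-z]+[ \t]?\d*)[ \t]*([:-])", re.MULTILINE)

def inconsistent_labels(text):
    """Return labels that occur in more than one written form."""
    variants = defaultdict(set)
    for match in LABEL.finditer(text):
        raw = match.group(0).strip()                        # label as typed
        key = re.sub(r"\s+", "", match.group(1)).upper()    # normalised form
        variants[key].add(raw)
    # Keep only the labels whose typed forms disagree with each other.
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}

# Hypothetical transcript fragment with one question header typed three ways.
sample = (
    "Q1: How often do you listen?\n"
    "INT: Most days.\n"
    "Q 1: And at weekends?\n"
    "q1- Sometimes.\n"
    "INT: Rarely.\n"
)

report = inconsistent_labels(sample)
print(report)
```

Any label listed in the report would be missed by an exact-string search for just one of its forms, so it is worth correcting with Find and Replace before the file goes into the software.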

Use a clear speaker identifier, preferably in UPPER CASE. This will allow CASE SENSITIVE searches specifically for SPEAKER IDENTIFIERS or for words used in SUB-HEADERS, as opposed to general text or transcribed speech. If you use only one letter as the identifier, always add a colon or hyphen to distinguish that identifier from the first capital letter of a normal word.

I: So tell me how you feel.


C- Well, it's hard to describe but when I listen to Capital Radio I feel as if I'm keeping in touch with the London scene...

(If you use a text search tool to find a particular speaker's text, a search for a simple I or C would pick up all sorts of text in the general speech that you do not want. The I: or the C- makes a search expression you are unlikely to find elsewhere. But it's your data, you know what will work for you!)
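
The point about search expressions can be checked with a few lines of Python. This is only a sketch of an exact, case-sensitive string search, not the search tool of any particular package; the helper name `count_hits` is made up for the example, and the transcript is the two-line exchange shown above.

```python
# The two-line exchange from the example above.
transcript = (
    "I: So tell me how you feel.\n"
    "C- Well, it's hard to describe but when I listen to Capital Radio "
    "I feel as if I'm keeping in touch with the London scene...\n"
)

def count_hits(text, expression):
    """Count exact, case-sensitive occurrences of a search expression."""
    # str.count is case sensitive and matches the characters exactly as given.
    return text.count(expression)

# Compare the bare letters with the identifiers that carry a colon or hyphen.
for expression in ("I", "I:", "C", "C-"):
    print(f"{expression!r}: {count_hits(transcript, expression)} hit(s)")
```

The bare letters also match ordinary words in the speech, while the identifiers with their punctuation each match only the start of the relevant speaker's turn.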

Additionally, using the Edit/Find and Replace tools in Word, you can for instance replace all basic speaker IDs with codified information: 33-F-Md-PT: (respondent 33, female, married, part-time worker). Using text search/autocoding tools you might then quickly identify and code all -PT workers and their speaker sections within a focus group. Such an autocoding task is also made easier by considering what units of text will be useful. See below.
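
The same idea can be sketched outside Word. The example below replaces basic speaker IDs with codified ones and then picks out the speaker sections of part-time workers. Everything apart from the 33-F-Md-PT: identifier is hypothetical: the basic IDs R33: and R34:, the second codified ID, the function names and the sample focus group text are assumptions for illustration, and each speaker section is assumed to be a single paragraph that begins with its ID.

```python
import re

# Hypothetical mapping from basic speaker IDs to codified IDs.
# Only "33-F-Md-PT:" comes from the text above; the rest is invented.
CODED_IDS = {
    "R33:": "33-F-Md-PT:",
    "R34:": "34-M-Sg-FT:",
}

def apply_coded_ids(text, mapping):
    """Replace each basic ID with its codified form, at line starts only."""
    for basic, coded in mapping.items():
        pattern = re.compile(rf"^{re.escape(basic)}", re.MULTILINE)
        # A function replacement inserts the codified ID literally.
        text = pattern.sub(lambda _match, coded=coded: coded, text)
    return text

def sections_for_attribute(text, attribute):
    """Return the paragraphs whose codified ID contains -attribute."""
    pattern = re.compile(rf"^\d+(?:-\w+)*-{re.escape(attribute)}(?:-\w+)*:")
    return [para for para in text.split("\n") if pattern.match(para)]

focus_group = (
    "R33: I only work mornings.\n"
    "R34: I am in the office all week.\n"
    "R33: It suits me.\n"
)

coded = apply_coded_ids(focus_group, CODED_IDS)
part_time = sections_for_attribute(coded, "PT")
print(part_time)
```

Because the attribute is matched as a whole hyphen-separated segment of the ID, a search for PT does not pick up the letters PT elsewhere in the speech.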

c) Units of text

Paragraphs, sentences, sections and heading levels all break up the data into units of context which may be useful to you. Unfortunately these structures have slightly different implications for each software package. More on this once you know what software you will be using.

However, do consider the use of paragraphs (not indented). Text searches can usually make use of paragraphs, since the autocoded finds can be extended to include the surrounding paragraph (if you want that).

Software packages, however, define paragraphs in different ways. The basic commonality is that most software defines the first level of paragraph as the text separated by a forced line break (in other words, one hard return/Enter). These paragraphs will be useful in most software, especially when using text searching or autocoding tools.
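
To make the paragraph unit concrete, here is a minimal sketch that treats each hard return as the end of a paragraph and returns the whole paragraph around every find, roughly mimicking a paragraph-level autocode. The function names and the sample text are assumptions for this example, and real packages may define paragraphs differently.

```python
def split_paragraphs(text):
    """Split text into paragraphs, one per hard return, skipping empty lines."""
    return [line for line in text.splitlines() if line.strip()]

def autocode_paragraphs(text, term):
    """Return every paragraph that contains the search term (case sensitive)."""
    return [para for para in split_paragraphs(text) if term in para]

# Each speaker turn is typed as one paragraph ending in a single hard return.
transcript = (
    "I: So tell me how you feel.\n"
    "C- When I listen to Capital Radio I feel in touch with the London scene.\n"
)

hits = autocode_paragraphs(transcript, "Radio")
print(hits)
```

If a speaker's turn were broken by extra hard returns, only the fragment containing the search term would be returned, which is one reason to keep each turn in a single paragraph.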

Generally speaking, it is a good idea NOT to put any clear/empty line space between speaker identifiers, topic headers etc. and the text associated with them. Find out more about this when you have chosen a package.

Sentences: Using full stops, exclamation and question marks is fairly normal. These sentences may be especially useful in some software as a unit of text to be autocoded. However normal speech does not always signal where these should be; so, if transcribing or instructing a transcriber, it might be useful to consciously insert full stops at reasonable moments in the transcription.

Line numbering: if you have traditionally used line numbering to reference and locate text when using more traditional methods of handling data, you will find it difficult to keep those line numbers (and those line breaks) when transferring the data in Rich Text Format to a software package.
