A new tool for converting food frequency questionnaire data into nutrient and food group values: FETA research methods and availability

Objectives To describe the research methods for the development of a new open source, cross-platform tool which processes data from the European Prospective Investigation into Cancer and Nutrition Norfolk Food Frequency Questionnaire (EPIC-Norfolk FFQ). A further aim was to compare nutrient and food group values derived from the current tool (FETA, FFQ EPIC Tool for Analysis) with the previously validated but less accessible tool, CAFÉ (Compositional Analyses from Frequency Estimates). The effect of text matching on intake data was also investigated. Design Cross-sectional analysis of a prospective cohort study—EPIC-Norfolk. Setting East England population (city of Norwich and its surrounding small towns and rural areas). Participants Complete FFQ data from 11 250 men and 13 602 women (mean age 59 years; range 40–79 years). Outcome measures Nutrient and food group intakes derived from FETA and CAFÉ analyses of EPIC-Norfolk FFQ data. Results Nutrient outputs from FETA and CAFÉ were similar; mean (SD) energy intake from FETA was 9222 kJ (2633) in men, 8113 kJ (2296) in women, compared with CAFÉ intakes of 9175 kJ (2630) in men, 8091 kJ (2298) in women. The majority of differences resulted in one or less quintile change (98.7%). Only mean daily fruit and vegetable food group intakes were higher in women than in men (278 vs 212 and 284 vs 255 g, respectively). Quintile changes were evident for all nutrients, with the exception of alcohol, when text matching was not executed; however, only the cereals food group was affected. Conclusions FETA produces similar nutrient and food group values to the previously validated CAFÉ but has the advantages of being open source, cross-platform and complete with a data-entry form directly compatible with the software. The tool will facilitate research using the EPIC-Norfolk FFQ, and can be customised for different study populations.


INTRODUCTION
Food Frequency Questionnaires (FFQs) are commonly used in epidemiological studies to assess the dietary intake of large populations. Their popularity derives from ease of administration, ability to assess dietary intake over a defined period of time and low costs. 1 The European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk FFQ is semiquantitative and designed to record the average intake of foods during the previous year. The principles involved in data collection and processing of the EPIC-Norfolk FFQ and the development of the structure and content of the CAFÉ (Compositional Analyses from Frequency Estimates) programme for calculating nutrient intakes have been published previously. 2 The EPIC-Norfolk FFQ has been extensively validated and has been widely used. [3][4][5] However, the programmes used to process these FFQs, including CAFÉ, have not been easily accessible to end-users.
Our objective was to develop a new, open source, cross-platform processing tool (FETA, FFQ EPIC Tool for Analysis), building on the earlier system, CAFÉ. 2 The aim of this report was to describe the research methods used in the development of FETA, and to compare nutrient output from the FETA and CAFÉ programmes. Food group intake data from FETA are also described, as is the effect of free text matching on nutrient and food group intake data. Free text matching refers to the assigning of an appropriate food code to handwritten text in the FFQ and is described further in the methods section.

Strengths and limitations of this study
▪ FETA (Food Frequency Questionnaire European Prospective Investigation into Cancer and Nutrition Tool for Analysis) has been tested using a large study sample of food intake data.
▪ No independent reference method was used in the comparisons of FETA and CAFÉ (Compositional Analyses from Frequency Estimates) nutrient intake data, although the CAFÉ system has been previously validated.
▪ The underlying data files in FETA can be modified to customise it for different study populations.

EPIC-FFQ design
The questionnaire consists of two parts. Part 1 consists of a food list of 130 lines; each line has a portion size attached to it: medium serving, standard unit or household measure. Study participants were requested to select an appropriate frequency of consumption for each line, from the nine frequency categories. As an example, figure 1 illustrates the sections relating to bread, savoury biscuits and breakfast cereals. A pdf copy of the EPIC-Norfolk FFQ may be downloaded from http://www.srl.cam.ac.uk/epic/epicffq/websitedocumentation.html; information on how to complete and code the FFQ is also available here. The questionnaire lines are either individual foods, combinations of individual foods or food types. The FFQ food list is based on items from an FFQ widely used within the USA, 6 7 but modified to reflect differences in American versus UK brand names, and some further food items were added. Part 2 contains further questions, a number of which ask for more detailed information that link back to food lines in part 1, as illustrated in figure 2. Detailed information was requested for breakfast cereals and fats as these are nutritionally important foods in the UK diet.

Data collection
The EPIC-Norfolk FFQ was posted to 25 639 participants in the EPIC-Norfolk cohort study. 8 The participants were aged 40-79 years, and the questionnaire was completed between 1993 and 1997. The study was approved by the Norfolk Local Research Ethics Committee, adhered to the Declaration of Helsinki, and all participants gave written informed consent. The FFQ was returned at a health examination, where it was checked and completed, if required, by trained nursing staff. In total, 25 351 (99%) participants returned the completed questionnaire.

Comparison of FETA and CAFÉ programmes
FETA uses a comma-separated values input file. Part 1 is coded as numeric values and part 2 is coded as numeric values and food codes, using the flowcharts and look-up lists provided (http://www.srl.cam.ac.uk/epic/epicffq/). We have also created a Microsoft Access form-based entry tool to facilitate FFQ data entry, based on the EPIC-Norfolk FFQ. The tool exports data in a format directly compatible with FETA. The FETA software was written in C and C++, which enables faster processing times than SAS; the C/C++ software can also be run from the command line. The step-based graphical wizard for running FETA was written in Perl. For the CAFÉ programme, in contrast, an Oracle-based entry system (Oracle Corporation, Redwood Shores, California, USA) was created to enter part 1 frequency data as numeric codes and part 2 data as numeric codes and free text. CAFÉ was written using SAS (SAS Software, V.8 of the SAS System for UNIX, SAS Institute Inc, Cary, North Carolina, USA) and links to tables in an Oracle relational database.

Part 1: Data entry
Data were manually entered into a spreadsheet as numeric codes, using '1' for 'never or less than once a month', to '9' for '6+ times per day'. A code of '−9' was used to mark data where a frequency was not recorded. Where two frequencies were provided for a line, this was coded as '−4' and treated by CAFÉ and FETA programmes as missing data. However, in FETA, both frequencies may now be entered, separated by a semicolon, for example, '2;3', and FETA will process the first value.
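
The part 1 coding rules above can be illustrated with a minimal sketch. It is not FETA's own parser (FETA is written in C and C++); the function name `parse_frequency` is hypothetical, and treating out-of-range or non-numeric cells as missing is an assumption made for this example only.

```python
MISSING = -9       # no frequency recorded for the line
DOUBLE_TICK = -4   # two frequencies ticked; treated as missing data


def parse_frequency(cell):
    """Return a frequency code from 1 to 9, or None when the cell counts as missing."""
    text = cell.strip()
    if ";" in text:
        # Two frequencies entered as e.g. '2;3': only the first value is processed.
        text = text.split(";")[0].strip()
    try:
        code = int(text)
    except ValueError:
        return None  # assumption: blank or non-numeric cells count as missing
    if code in (MISSING, DOUBLE_TICK):
        return None
    if 1 <= code <= 9:
        return code
    return None  # assumption: codes outside 1..9 count as missing


if __name__ == "__main__":
    for raw in ["1", "9", "2;3", "-9", "-4"]:
        print(repr(raw), "->", parse_frequency(raw))
```

Returning `None` for every missing case keeps the later count of missing lines per questionnaire to a single test.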

Part 2: Assigning of food codes to ticked boxes and free text
Part 2 contains handwritten text for milk, breakfast cereals and cooking fats (see figure 2, questions 3, 5, 6 and 7), which needs to be matched to the most appropriate food code in order to obtain nutrient data; this process is known as free text matching. The data in part 2 were coded using reference lists of food codes for varieties of milk, breakfast cereal and cooking fat. Where there is no clear match, it is suggested that a researcher consults the ingredients and nutrient information of the commercial item and compares this information with the nutrient profile of similar items from the reference lists. These reference lists, and figures relating to food codes that may be assigned to appropriate ticked boxes, may be found at http://www.srl.cam.ac.uk/epic/epicffq/websitedocumentation.html. Differences between FETA and CAFÉ processing may also be found at the same address; these differences relate to breakfast cereals, frying and baking fats, the outcome of selecting the 'None' or 'No' box, and default milk, cereal and fat codes.

Databases
Each line in part 1 of the FFQ is mapped to up to six food codes. Decisions regarding which food codes to use were based on data from UK government surveys and other UK population data. 7 9 10 These decisions were based on data for individuals aged 40-74 years. 7 Data for portion weights were sourced from UK population data and weighed records in 40-74-year-old study participants. 7 11 The EPIC-Norfolk FFQ uses 290 foods from the UK food composition database, McCance and Widdowson's 'The Composition of Foods' (5th edition) and its associated supplements. 12-21 A number of new food items were added to the EPIC-Norfolk FFQ food list, which are used in the FETA and CAFÉ programmes. These include low-calorie/diet fizzy drinks and crunchy oat cereal, as well as modified home-baked and fried foods (without their fat), to enable an individual's fat type, as recorded in part 2 of the FFQ, to be incorporated. However, the nutrient data of six of the nine new foods used in the CAFÉ programme were modified in FETA. These foods include crunchy oat cereal, milk nonspecific, low-calorie/diet fizzy drinks, solid vegetable oil, Crisp 'n Dry (solid fat), and oil and fat non-specific. Modifications to the nutrient data were made to ensure a more accurate nutrient profile and/or to better reflect the foods consumed, in the case of non-specific items, such as milk and oil/fat; these changes relate to nutrient/food data at the time of FFQ completion.
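
As a rough illustration of how a frequency code, portion weights and food composition values combine into a daily intake for one FFQ line, the sketch below may help. It is not the FETA algorithm: the times-per-day weights, the food codes, the gram weights and the nutrient values are all hypothetical placeholders, and attaching a separate gram weight to each mapped food is an assumption about how a line's portion is shared among its food codes.

```python
from dataclasses import dataclass

# Hypothetical times-per-day weights for frequency codes 1..9 (placeholders only).
TIMES_PER_DAY = {1: 0.0, 2: 0.07, 3: 0.14, 4: 0.43, 5: 0.79,
                 6: 1.0, 7: 2.5, 8: 4.5, 9: 6.0}

MAX_FOODS_PER_LINE = 6  # each part 1 line is mapped to up to six food codes


@dataclass
class MappedFood:
    food_code: str             # hypothetical identifier
    grams_per_serving: float   # assumed share of the line's portion, in grams
    nutrients_per_100g: dict   # nutrient name -> amount per 100 g


def daily_line_intake(frequency_code, foods):
    """Average daily nutrient intake contributed by one FFQ line."""
    if len(foods) > MAX_FOODS_PER_LINE:
        raise ValueError("a line maps to at most six food codes")
    per_day = TIMES_PER_DAY[frequency_code]
    totals = {}
    for food in foods:
        for nutrient, per_100g in food.nutrients_per_100g.items():
            amount = per_day * food.grams_per_serving * per_100g / 100.0
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    return totals


if __name__ == "__main__":
    line = [MappedFood("X001", 20.0, {"energy_kj": 1000.0}),
            MappedFood("X002", 10.0, {"energy_kj": 500.0})]
    print(daily_line_intake(6, line))
```

Summing these per-line totals over all lines of the questionnaire would give a participant's daily intake under the same assumptions.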

Identification of outliers
Outliers were defined as detailed previously. 2 In brief, the ratio of energy intake (EI) to basal metabolic rate (BMR) was calculated, where BMR was calculated using sex-specific Schofield equations, which included age and body weight. 22 Individuals in the top and bottom 0.5% of EI:BMR ratio were identified and excluded, as were individuals with FFQs containing 10 or more missing lines of data in part 1 of the FFQ.
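
The two exclusion rules can be sketched as follows. This is a minimal illustration rather than the code used in the study: the record keys and the function name are hypothetical, BMR is taken as a precomputed input in the same units as energy intake (the Schofield equations are not reproduced here), and trimming by rank with the count rounded down is an assumed way of identifying the top and bottom 0.5%.

```python
def exclude_outliers(records, tail_fraction=0.005, max_missing_lines=9):
    """Drop incomplete FFQs, then trim both tails of the EI:BMR distribution.

    Each record is a dict with the hypothetical keys 'id', 'energy_kj',
    'bmr_kj' and 'missing_lines'.
    """
    # Step 1: exclude FFQs with 10 or more missing lines in part 1.
    complete = [r for r in records if r["missing_lines"] <= max_missing_lines]

    # Step 2: rank the remaining participants by EI:BMR ratio.
    ranked = sorted(complete, key=lambda r: r["energy_kj"] / r["bmr_kj"])

    # Step 3: remove the lowest and highest tail_fraction of the ranking.
    n_cut = int(tail_fraction * len(ranked))
    return ranked[n_cut:len(ranked) - n_cut]


if __name__ == "__main__":
    sample = [{"id": i, "energy_kj": 8000.0 + 10 * i, "bmr_kj": 6000.0,
               "missing_lines": 0} for i in range(400)]
    print(len(exclude_outliers(sample)))
```

Applying the missing-line rule before the EI:BMR trim mirrors the order in which exclusions are reported in the results.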

Nutrient and food group outputs
FETA produces four nutrient output formats and a sample of each of these can be viewed at http://www.srl.cam.ac.uk/epic/epicffq/websitedocumentation.html. Output 1 contains average daily nutrient and food group intakes for an individual from all FFQ foods consumed, in wide format, suitable for import into a spreadsheet or statistical package. Intake data for 46 nutrients are provided as well as data for 14 basic food groups; however, only a selection of these nutrients is shown in this report. Output 2 contains the same nutrient intake data as output 1, but in long format, which is mostly suitable for programmers. Output 3 contains average daily nutrient and food group intakes (and amount of food consumed) for an individual for each FFQ line; this output file will be very large and is mostly suitable for programmers. The most detailed output (output 4) contains average daily nutrient and food group intakes, in addition to the amount of food consumed for an individual, for each food code, for each FFQ line (meal_id). An online description of each meal_id and nutrient code, including units of measurement, can be found in the data entry template. This output will also be very large and is mostly suitable for programmers.
A log file is created along with each output file, which records the processing of the data and provides useful error information (see online supplementary appendix 1 for log file of output 1). In these files, notes (general process information) and error messages are recorded, with a date and time stamp. The log files make it possible to calculate the number of missing frequencies based on part 1 (main grid) of the FFQ in order to exclude individuals with 10 or more missing ticks. The log files also record situations where a food code does not have any nutrient data attached to it.
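
The difference between the wide layout (output 1) and the long layout (output 2) can be shown with a small sketch. The column names used here (`id`, `energy_kj`, `fruit_g`) are hypothetical and do not reproduce FETA's actual output headers; the example only illustrates the reshaping idea.

```python
def wide_to_long(wide_rows, id_field="id"):
    """Reshape one-row-per-participant data into one row per participant and variable."""
    long_rows = []
    for row in wide_rows:
        for name, value in row.items():
            if name == id_field:
                continue  # the identifier is repeated on every long row instead
            long_rows.append({"id": row[id_field], "name": name, "value": value})
    return long_rows


if __name__ == "__main__":
    wide = [{"id": "p1", "energy_kj": 9000.0, "fruit_g": 200.0},
            {"id": "p2", "energy_kj": 8100.0, "fruit_g": 280.0}]
    for long_row in wide_to_long(wide):
        print(long_row)
```

The wide form has one column per nutrient or food group and suits spreadsheets, whereas the long form is easier to filter and aggregate in code.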

Statistical analyses
The data were analysed using STATA V.10 (STATA Corp, Texas, USA). Intake data were described using mean, SD, median, minimum and maximum for FETA and CAFÉ programme outputs, stratified by sex. The nutrients selected for comparison are those described in the original CAFÉ paper. Where data on quintile changes are shown, cut-off points were calculated using CAFÉ nutrient data in order to compare quintile shift between FETA and CAFÉ output data.
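
The quintile comparison can be sketched as below. This is an illustration in Python rather than the STATA code used for the analyses: the function name is hypothetical, `statistics.quantiles` with its default method is an assumed stand-in for the cut-off calculation, and placing a value equal to a cut-off in the lower quintile is likewise an assumption. In the study the comparison was stratified by sex, so the function would be applied to men and women separately.

```python
from bisect import bisect_left
from statistics import quantiles


def quintile_shift_percentages(cafe_values, feta_values):
    """Percentage of participants whose quintile differs between CAFE and FETA.

    Cut-offs come from the CAFE values only; both lists must be in the same
    participant order.
    """
    cuts = quantiles(cafe_values, n=5)  # four cut-offs defining five groups
    shifts = [abs(bisect_left(cuts, feta) - bisect_left(cuts, cafe))
              for cafe, feta in zip(cafe_values, feta_values)]
    n = len(shifts)
    return {
        "any_change": 100.0 * sum(s >= 1 for s in shifts) / n,
        "more_than_one": 100.0 * sum(s > 1 for s in shifts) / n,
    }


if __name__ == "__main__":
    cafe = [float(v) for v in range(1, 101)]
    feta = list(cafe)
    feta[19] = 21.0  # one participant crosses the first cut-off
    print(quintile_shift_percentages(cafe, feta))
```

Fixing the cut-offs on the CAFÉ data means a shift reflects a change in a participant's value between programmes, not a change in the boundaries themselves.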

RESULTS
We received FFQs from 25 351 participants (11 451 men and 13 900 women), with a mean age of 59 years. From this set, 249 FFQs (90 men and 159 women) containing 10 or more missing lines of data in part 1 of the FFQ were excluded, followed by a further exclusion of 250 FFQs (111 men and 139 women) from the top and bottom 0.5% of EI:BMR. This resulted in the final analytical dataset of 24 852 participants (11 250 men and 13 602 women).
Table 1 shows the average daily intake data for a number of selected nutrients for 11 250 men. The data were similar for most nutrients across the two programmes. The nutrients which had the highest percentage of quintile change (≥10%) were monounsaturated fat, saturated fat, iron, vitamin D and vitamin E. However, only 1.3% of the men changed more than one quintile, for two of these five nutrients. The nutrients which had the lowest percentage of quintile changes were alcohol, calcium and carotene, with less than 3% change (table 1).
Table 2 shows average daily intake data for the selected nutrients for 13 602 women, from FETA and CAFÉ programmes. The quintile changes observed in women were similar to those found in men for the selected nutrients; 4 of the 19 nutrients had a quintile change of greater than 10%: polyunsaturated fat, saturated fat, iron and vitamin E. However, the number of women who shifted more than one quintile was generally lower than the number observed in men. The nutrients which had the greatest percentage of women who changed more than one quintile were vitamins D and E, with 0.7% and 0.9%, respectively.

Nutrient intake data from FETA and CAFÉ programmes
Detailed (output 4) nutrient intake data at the individual level obtained from the two programmes were compared for approximately half of the participants (n=12 500; data not shown). All differences (>0.1%) found were investigated and explanations for these differences are considered in the discussion.

Food group intake data from FETA
Average daily intakes for men and women of the 14 food groups readily available from FETA are shown in table 3. Mean daily intakes of six of the food groups were higher in men than in women: alcohol, cereals, fats, meat, potatoes and sugars. However, women had higher intakes of fruit (278 vs 212 g) and vegetables (284 vs 255 g). Mean daily intakes of eggs, fish, milk, non-alcoholic beverages, nuts and seeds, and soups and sauces were similar in men and women.

The effect of text matching in FETA
Tables 4 and 5 illustrate the variation in nutrient and food group intake data obtained in a random subset of 1159 men and 1340 women, respectively, depending on whether text matching of milks, breakfast cereals, and baking and frying fats was applied. In general, mean nutrient intakes were higher when text matching was carried out. In men (table 4), quintile changes (>15%) were most evident in the following nutrients: Englyst fibre, polyunsaturated fat, folate, vitamin D and vitamin E. The food group 'cereals and cereal products' was the only 1 of the 14 groups where there was a difference, with 31 men moving one quintile. In women (table 5), quintile changes (>15%) were also most evident in the same five nutrients. However, almost 21% of women also changed quintile for iron. Once again, the 'cereals and cereal products' food group was the only food group where there was any difference, with 40 women moving one quintile.

DISCUSSION
FETA provides a new, freely available, stand-alone tool that can produce nutrient and food group intake values from data collected using the EPIC-Norfolk FFQ. It makes the EPIC-Norfolk FFQ readily accessible to end-users and enables them to process and analyse nutritional data. The data can be entered either into a spreadsheet, using the instructions provided, or through the specifically developed Microsoft Access form-based entry tool. The Access entry tool allows easier entry without requiring knowledge of specific food codes. The software for FETA for Windows and Linux can be downloaded from the website, as can the Microsoft Access data entry utility (http://www.srl.cam.ac.uk/epic/epicffq/). Users are encouraged to register with EPIC-Norfolk, as this enables them to request assistance and support. The various types of output (with four levels of information) available should prove beneficial to researchers, especially those requiring more detailed information. There is an ongoing need for information on the intake of food groups. While the data from either output 3 or 4 could be used to generate more detailed food group data, we have treated food groups as another type of nutrient, a pseudonutrient. The FETA input/look-up files can be easily modified to create new groups, greatly adding to the flexibility of the system for analysing food group consumption, while requiring no spreadsheet or programming skills on the part of the analyst. A helpful feature of FETA is the log file, which documents errors relating to FFQ data and/or default food codes assigned.
FETA was designed and based on the extensively validated EPIC-Norfolk FFQ, originally developed in 1988, to assess the nutrient and food group intake of 40-79-year-olds, who completed the FFQ between 1993 and 1997. The food list and look-up lists of milks, breakfast cereals and fats reflect this time period and the study population, as do the default milk, cereal, baking fat and frying fat codes assigned. However, the programme was created in such a way that it can be customised for different study populations, easily enabled by the separation of the processing algorithm in the FETA programme implementation from the data model text files. It is possible to delete/add foods and/or FFQ lines, and modify portion sizes as desired for a study. Nutrient data may also be easily modified or added. It is also possible for FETA to be used with other questionnaires containing a different set of line items or different numbers of frequencies.
Comparisons were carried out for a number of selected nutrients obtained from FETA and the previously validated CAFÉ programme. These showed that the nutrient output from both programmes was generally similar. All differences (>0.1%) found from the comparison of detailed food/nutrient data at the individual level for 12 500 participants from FETA and the CAFÉ programmes can be explained by one or more of the following reasons: up to four cereal foods assigned by FETA, as compared to a maximum of two cereal foods assigned by CAFÉ; differences in default baking and frying fat codes assigned; correction for muesli portion size in cereal data; exclusion of porridge from cereal data (free text); default codes assigned for milk, cereals or fats to participants using FETA (where no food codes were assigned by CAFÉ programme); rounding error (only where percentage absolute differences were
Although nutrient intakes as calculated by FETA and CAFÉ were similar, some relatively small differences existed, but these and the quintile shift of men and women can be explained. In FETA, a number of changes were made to the processing of breakfast cereals, affecting carbohydrate, starch, Englyst fibre, iron and folate estimates. The vitamin C content per 100 g of low-calorie/diet fizzy drinks was changed from 5 to 0 mg, and the vitamin E content of crunchy oat cereal and oil and fat non-specific was increased. Changes made to the processing of fats in questions 6 and 7 in part 2 of the FFQ, in addition to changes made to the fatty acid profile of the three new fats, could help explain the small differences observed in monounsaturated, polyunsaturated and saturated fat intakes.
There was quite a large range in intake in the 14 food groups, with a minimum intake of zero for each of the food groups. It is difficult to compare food group intake data as the groupings of foods often vary. However, the combined mean intake of fruit (excluding juices) and vegetables for men and women was 467 and 562 g, respectively, achieving the Government's 'Five a day' recommendation, 23 using a portion size of 80 g.
While text matching affected only one food group (cereals and cereal products), more than 15% of men and women changed quintile for a number of nutrients: Englyst fibre, polyunsaturated fat, folate, vitamin D and vitamin E, and iron (women only). Once again, these nutrients relate to the text matching of breakfast cereals and baking and frying fats. The inclusion of these data illustrates the effect of text matching on the ranking of individuals for certain nutrients and will enable future researchers using FETA to make informed decisions on the benefit of text matching for their study.
We have not addressed or discussed common FFQ issues, such as the number of items in a food list or the use of a single average portion size, as these are not the focus of this paper and have been reviewed previously. 24 25 It is anticipated that future updates of FETA might contain a number of improvements and overcome some of the limitations of FETA, currently released as V.2.53 for Windows and Linux (last updated 15 March and 21 February 2013, respectively). The source code has been made available online which enables users to make modifications and improvements to the programme. Currently, we have made available Windows and Linux versions and it is hoped that an OS X version will follow soon. We are currently working on a LibreOffice version of the Microsoft Access form-based entry tool.
In conclusion, we have created a new, open source, stand-alone, cross-platform FFQ processing tool, FETA, to produce nutrient and food group data for researchers using the EPIC-Norfolk FFQ. The tool produces similar nutrient and food group values to the previously validated CAFÉ programme, but is more accessible. Although FETA was designed and based on the EPIC-Norfolk FFQ, the programme was created in such a way that it can be customised for different study populations. It is anticipated that the development and availability of FETA will be a useful addition to the field of nutritional epidemiology and dietary public health.