Please write C++ code that analyses a block of text and provides the following statistics:
1. The start position of every smiley in the text. A smiley is defined as a colon, followed by an optional dash, followed by a bracket, e.g. :-] or :(
2. The 10 most frequently used words (excluding smileys).
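A minimal sketch of both statistics follows. Several decisions here are assumptions, not part of the spec: a "bracket" is one of ( ) [ ] { }; positions are zero-based byte offsets into the UTF-8 text rather than character indices; words are compared case-sensitively with punctuation left attached; only tokens that consist entirely of a smiley are excluded from the word count (so `great:)` remains a single word); and ties in the ranking are broken alphabetically. The names `findSmileyPositions`, `isSmileyToken` and `topWords` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumption: a "bracket" is one of ( ) [ ] { }.
constexpr std::string_view kBrackets = "()[]{}";

bool isBracket(char c) { return kBrackets.find(c) != std::string_view::npos; }

// Statistic 1: zero-based byte offsets of every smiley such as ":)" or ":-]".
// Scanning bytes is safe on UTF-8 because ':', '-' and the brackets are ASCII
// and never occur inside a multi-byte sequence.
std::vector<std::size_t> findSmileyPositions(std::string_view text) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != ':') continue;
        std::size_t next = i + 1;
        if (next < text.size() && text[next] == '-') ++next;  // optional dash
        if (next < text.size() && isBracket(text[next])) {
            positions.push_back(i);
            i = next;  // resume scanning after this smiley
        }
    }
    return positions;
}

// True if the whole token is a smiley; smileys glued to a word are not matched.
bool isSmileyToken(std::string_view token) {
    if (token.size() == 2) return token[0] == ':' && isBracket(token[1]);
    return token.size() == 3 && token[0] == ':' && token[1] == '-' &&
           isBracket(token[2]);
}

using WordCount = std::pair<std::string, std::size_t>;

// Statistic 2: up to `n` most frequent whitespace-separated words, most
// frequent first; ties are ordered alphabetically for deterministic output.
std::vector<WordCount> topWords(const std::string& text, std::size_t n = 10) {
    std::unordered_map<std::string, std::size_t> counts;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {  // >> skips any run of spaces, tabs and newlines
        if (!isSmileyToken(token)) ++counts[token];
    }

    std::vector<WordCount> ranked(counts.begin(), counts.end());
    const auto byFrequency = [](const WordCount& a, const WordCount& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    };
    const std::size_t k = std::min(n, ranked.size());
    std::partial_sort(ranked.begin(), ranked.begin() + static_cast<std::ptrdiff_t>(k),
                      ranked.end(), byFrequency);
    ranked.resize(k);
    return ranked;
}
```

`std::partial_sort` only orders the first k entries, so ranking costs roughly O(n log k) for n distinct words and k = 10, instead of sorting every distinct word.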
The application must support the following output formats: console, simple text file, XML file.
Any combination of these backends can be specified through command-line arguments (console only, text file only, text file + XML file, etc.). The output should contain all of the above information (items 1 and 2 of the list above). The formatting is up to the developer.
Additional information:
UTF-8 encoding can be assumed
Lines are separated by '\n'
Words are separated by whitespace
There are some edge cases that should be considered:
Whitespace can be '\t', a run of multiple consecutive whitespace characters, etc.
The text can consist of a single line without '\n' at the end. The line count should be 1 in such a case.
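A minimal sketch of both edge cases, under stated assumptions: whitespace means the ASCII bytes that `std::isspace` accepts in the default C locale (Unicode spaces such as U+00A0 are not treated as separators), a trailing '\n' terminates the last line rather than starting an empty one, and an empty text has 0 lines, a case the spec leaves open. `countLines` and `splitWords` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Lines are separated by '\n'; a last line without a trailing '\n' still
// counts, so a single unterminated line gives 1. Assumption: a trailing '\n'
// ends the last line instead of starting an empty one, and "" has 0 lines.
std::size_t countLines(std::string_view text) {
    if (text.empty()) return 0;
    std::size_t lines = 0;
    for (char c : text) {
        if (c == '\n') ++lines;
    }
    if (text.back() != '\n') ++lines;  // unterminated final line
    return lines;
}

// Splits on any run of ASCII whitespace (' ', '\t', '\n', '\v', '\f', '\r').
// Bytes of multi-byte UTF-8 characters are never ASCII, so they stay intact.
std::vector<std::string> splitWords(std::string_view text) {
    std::vector<std::string> words;
    std::string current;
    for (char c : text) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            if (!current.empty()) words.push_back(std::move(current));
            current.clear();  // reset the moved-from buffer
        } else {
            current += c;
        }
    }
    if (!current.empty()) words.push_back(std::move(current));
    return words;
}
```

The cast to `unsigned char` matters: passing a negative `char` (any UTF-8 lead or continuation byte on platforms where `char` is signed) to `std::isspace` is undefined behaviour.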
The desired solution should have the following:
The solution has to run on Linux.
Design with classes and clean APIs.
Good documentation.
Code quality comparable to production code.
Usage of C++11/14/17 features.
Use of STL and Boost features is preferred wherever applicable.
Unit test coverage > 90%
A CMake file that builds the application and adds the tests
If you find that this spec is ambiguous or incomplete, please decide on your own and document your decision.