Incredibly Precise Sorter - prosty projekt w C++

Cześć,

Jakiś czas temu zacząłem pisać projekt w C++.

Jego działanie jest dość proste:

Bierzemy plik txt z liczbami całkowitymi --> wczytujemy je do programu --> sortujemy --> zapisujemy posortowane liczby do nowego pliku.

Głównym założeniem dla mnie było to, żeby nauczyć się różnych algorytmów do sortowania. Więc z czasem dodawałem nowe algorytmy. Jednak po pewnym czasie pomyślałem sobie, że to za mało, więc postanowiłem mierzyć czas sortowania dla każdego z algorytmów.

Na końcu tej przygody pomyślałem, że warto jest jeszcze nauczyć się testów jednostkowych w C++ - wybrałem GoogleTest.

To jest cały zarys projektu.

Kilka pytań:

Prosiłbym o taki bardzo ogólny code review projektu (zarówno ze strony designu strukturyzacji, jak i ze strony czysto technicznej samego kodu)
Czy ja w ogóle poprawnie implementuje testy jednostkowe? Patrząc na testy samych algorytmów sortowania to wygląda to dla mnie całkiem sensownie (daje jakąś pewność, że algorytm rzeczywiście poprawnie sortuje liczby), ale mam wątpliwości jeżeli chodzi o implementację testów do modułu IO - przykładowo funkcję zapisującą liczby do pliku testuję w taki sposób, że: wywołuję testowaną funkcję zapisującą liczby (podaję testową nazwę pliku) -> pobieram zawartość testowego pliku do zmiennej (jako string) -> usuwam ze zmiennej spacje i znaki nowej linii -> na końcu sprawdzam czy zawartość pliku jest taka sama jak vector z liczbami przekonwertowany na stringa
Nie do końca wiem w jaki sposób mógłbym przetestować funkcje:
output_data i output_result z modułu IO, ponieważ pierwsza wypisuje dane na podany ostream, a druga na cout

Niżej zamieszczam niepełny kod, ponieważ mogłoby być go za dużo i stałoby się to nieczytelne.

Także reszta kodu znajduje się na Github: https://github.com/fzwolinski/Incredibly-Precise-Sorter

Tutaj zamieszczę tylko kod do modułu IO i jego testy:

io.cpp (moduł do inputu i outputu)

#include <fstream>
#include <algorithm>
#include <iterator>
#include <filesystem>
#include <vector>
#include <string>
#include <iostream>
#include <sstream>

#include "io.hpp"

std::vector<int> IO::load_data(std::filesystem::path const& path) {
  /* 
   * Loads and returns data (integer numbers) from file
   *
   * :param path: Path to the data file
   * :return: data as vector of integers
  */
  auto data = std::vector<int>{};

  auto file = std::ifstream{path};

  std::copy(
    std::istream_iterator<int>{file},
    std::istream_iterator<int>{},
    std::back_inserter(data)
  );

  if (data.empty() == 1) {
    throw std::invalid_argument("Error reading file");
  }

  return data;
}

void IO::save_data_to_file(std::filesystem::path const& path, std::vector<int> const& data) {
  /* 
   * Saves data (integer numbers) to the file
   *
   * :param path: The path to the file for saving the data
   * :param data: Data (int nums) to be saved in file
  */
  std::ofstream out (path);

  auto first = true;
  for (auto&& number : data) {
    if (not first)
      out << ", ";

    first = false;

    out << number;
  }
}

void IO::output_data(std::vector<int> const& data, std::ostream& out) {
  /* 
   * Outputs data (integer numbers) to the defined output stream
   *
   * :param data: Data to output
   * :param out: output stream
  */
  auto first = true;
  for (auto&& number : data) {
    if (not first)
      out << ", ";

    first = false;

    out << number;
  }
}

void IO::output_result(std::string const& alg_name, double const& duration, std::size_t num_qty, bool avg) {
  /* 
   * Outputs (to the console) results of sort measurement
   *
   * :param alg_name: name of the algorithm
   * :param duration: sort measurement time (in microseconds)
   * :param num_qty: quantity of numbers to be sorted
   * :param avg: flag, specifying if this is output of single measurement or average time
  */

  double miliseconds = duration * 0.001;
  int seconds = miliseconds * 0.001;
  std::string out_type = " Took:";

  if (avg) {
    out_type = " Avg:";
  }

  std::cout.width(16);
  std::cout << "[" + alg_name + "]";

  std::cout.width(7);
  std::cout << out_type;

  auto duration_os = std::ostringstream{};
  duration_os.precision(0);
  duration_os << std::fixed << duration;

  std::cout.width(10);
  std::cout << duration_os.str();

  std::cout.width(2);
  std::cout << " \xE6s"; // microseconds symbol

  auto ms_os = std::ostringstream{};
  ms_os.precision(0);
  ms_os << " (" << std::fixed << miliseconds << " ms)";

  std::cout.width(13);
  std::cout << ms_os.str();

  std::cout.width(6);
  std::cout << " (" + std::to_string(seconds) + "s)";

  std::cout.width(20);
  std::cout << " to sort " + std::to_string(num_qty) + " numbers\n";
  if (avg) {
    std::cout << "\n";
  }
}

void IO::save_result_to_file(std::filesystem::path const& path, std::string const& alg_name, double const& duration, std::size_t num_qty, bool avg) {
  /* 
   * Saves test results to specified file (path)
   *
   * :param path: path to save the file with res
   * :param alg_name: name of the algorithm
   * :param duration: sort measurement time (in microseconds)
   * :param num_qty: quantity of numbers to be sorted
   * :param avg: flag, specifying if this is output of single measurement or average time
  */

  std::ofstream out;
  out.open(path, std::ios_base::app); // Append do file

  double miliseconds = duration * 0.001;
  int seconds = miliseconds * 0.001;
  std::string out_type = " Took:";

  if (avg) {
    out_type = " Avg:";
  }

  out.width(16);
  out << "[" + alg_name + "]";

  out.width(7);
  out << out_type;

  auto duration_os = std::ostringstream{};
  duration_os.precision(0);
  duration_os << std::fixed << duration;

  out.width(10);
  out << duration_os.str();

  out.width(2);
  out << " mcs"; // microseconds symbol

  auto ms_os = std::ostringstream{};
  ms_os.precision(0);
  ms_os << " (" << std::fixed << miliseconds << " ms)";

  out.width(13);
  out << ms_os.str();

  out.width(6);
  out << " (" + std::to_string(seconds) + "s)";

  out.width(20);
  out << " to sort " + std::to_string(num_qty) + " numbers\n";
  if (avg) {
    out << "\n";
  }
}

io_load_data_test.cpp

#include "../io.hpp"

#include <gtest/gtest.h>
#include <fstream>

class IOLoadDataTest : public ::testing::Test {
protected:
  virtual void SetUp() { 
    test_numbers = {5, 0, 2, 5, 6, -99, 3};
    
    // Create file with numbers
    std::ofstream f ("test_file_with_numbers.txt");
    for(auto num : test_numbers) {
      f << num << "\n";
    }
    f.close();

    // Empty file [For checking exception]
    std::ofstream empty_f ("empty_test_file.txt");
    empty_f << "";
    empty_f.close();
   }

  virtual void TearDown() {
    // Remove test files
    std::remove("test_file_with_numbers.txt");
    std::remove("empty_test_file.txt");
  }

  std::vector<int> test_numbers;
};

TEST_F(IOLoadDataTest, LoadDataShouldProperlyLoadNumbersFromFile) {
  auto data = IO::load_data("test_file_with_numbers.txt");

  EXPECT_EQ(test_numbers, data);  
}

TEST_F(IOLoadDataTest, ShouldThrowExceptionWhileLoadingFromEmptyFile) {
  try {
    auto data = IO::load_data("empty_test_file.txt");
  }
  catch(std::invalid_argument const& e) {
    EXPECT_STREQ("Error reading file", e.what());
  }
}

TEST_F(IOLoadDataTest, ShouldThrowExceptionWhileLoadingDataFromNotExistingFile) {
  try {
    auto data = IO::load_data("this_file_does_not_exist.txt");
  }
  catch(std::invalid_argument const& e) {
    EXPECT_STREQ("Error reading file", e.what());
  }
}

io_save_data_to_file_test.cpp

#include "../io.hpp"

#include <gtest/gtest.h>

class IOSaveDataToFileTest : public ::testing::Test {
protected:
  virtual void SetUp() {
    // Save test data  
    IO::save_data_to_file(filename_0, test_data_0);
    IO::save_data_to_file(filename_1, test_data_1);
  }

  virtual void TearDown() {
    // Remove test file
    std::remove(filename_0.c_str());
    std::remove(filename_1.c_str());
  }

  // Normal data
  const std::vector<int> test_data_0{1, 0, -5, 22, -1, 99};
  const std::string correct_0 = "1,0,-5,22,-1,99";
  const std::string filename_0 = "test_saved_data.txt";

  // Empty data
  const std::vector<int> test_data_1{};
  const std::string correct_1 = "";
  const std::string filename_1 = "test_saved_empty_data.txt";
};


TEST_F(IOSaveDataToFileTest, SavesCorrectDataToFile) {
  std::ifstream test_file(filename_0);
  
  std::string f_content((std::istreambuf_iterator<char>(test_file)),
                         std::istreambuf_iterator<char>());

  f_content.erase(std::remove_if(f_content.begin(), f_content.end(), isspace), f_content.end());
  f_content.erase(std::remove(f_content.begin(), f_content.end(), '\n'), f_content.end());

  EXPECT_EQ(correct_0, f_content);
}

TEST_F(IOSaveDataToFileTest, SavesEmptyDataToFile) {
  std::ifstream test_file(filename_1);
  
  std::string f_content((std::istreambuf_iterator<char>(test_file)),
                         std::istreambuf_iterator<char>());

  f_content.erase(std::remove_if(f_content.begin(), f_content.end(), isspace), f_content.end());
  f_content.erase(std::remove(f_content.begin(), f_content.end(), '\n'), f_content.end());

  EXPECT_EQ(correct_1, f_content);
}

PS. Nie przywiązujcie zbyt dużej uwagi do tytuły projektu. Oczywiście nie jest to niesamowicie precyzyjny sorter. To tylko nazwa :)

Rzeczy z review (nie patrzyłem na poprawność algorytmów):

- ogólnie taki design że masz klasy z samymi statycznymi metodami nie ma sensu. Jak chcesz po prostu odizolować przestrzenie nazw, to do dokładnie tego służy namespace. Funkcje prywatne (helpery) mogą być po prostu static funkcjami w .cpp i w ogóle nie być eksponowane w nagłówku.

- wszystkie Twoje funkcje sortujące kopiują wejście, podczas gdy kanonicznie powinny edytować wejście. (i user może po prostu przekazać kopię jak chce).

- Twój mergesort kopiuje vectory jak szalony, podczas gdy na papierze do algorytmu powinna wystarczyć jedna dodatkowa tablica rozmiaru N.

- cmake:

cmake_minimum_required(VERSION 2.6)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++2a")
find_package(GTest REQUIRED)
include_directories(${GTEST_INCLUDE_DIRS})

...tak się ani trochę współcześnie cmake'a nie używa. Ani CMAKE_CXX_FLAGS, ani include_directories. Sam cmake ma narzędzia do osobnego odpalania testów (ctest). Rzuć okiem na oficjalną dokumentację:

https://google.github.io/googletest/quickstart-cmake.html

Punkty 2/3:

Ogólnie powinieneś odizolować funkcję czytającą / formatującą tekst (niech albo bierze istream&/ostream&, albo bierze i zwraca stringi) i osobną funkcję manipulującą plikami. Tą pierwszą można łatwo unit testować, a drugiej w zasadzie nie trzeba unit testować (bo wtedy w zasadzie testujesz bibliotekę standardową), najwyżej testować integracyjnie.