TDEIO

From Trinity Desktop Project Wiki

Latest revision as of 10:26, 14 June 2022

In the TDE libraries, network transparency is implemented in the TDEIO API. The central concept of this architecture is an IO job. A job may copy or delete files, or perform similar operations. Once a job is started, it works in the background and does not block the application. Any communication from the job back to the application - like delivering data or progress information - is integrated with the TQt event loop.

Background

In the age of the world wide web, it is essential that desktop applications can access resources over the internet: they should be able to download files from a web server, write files to an FTP server or read mail from an IMAP or POP server. Often, the ability to access files regardless of their location is called network transparency.

In the past, different approaches to this goal were implemented. The old NFS file system is an attempt to implement network transparency at the level of the POSIX API. While this approach works quite well in local, closely coupled networks, it does not scale to resources to which access is unreliable and possibly slow. Here, asynchronicity is important. While you are waiting for your web browser to download a page, the user interface should not block. Also, page rendering should not wait until the page is completely available, but should be updated regularly as data comes in.

IO Slaves

Background operation is achieved by starting ioslaves to perform certain tasks. ioslaves are started as separate processes and communicate with the application through UNIX domain sockets. In this way, no multi-threading is necessary and unstable slaves cannot crash the application that uses them.

URLs

File locations are expressed by the widely used URLs. But in Trinity, URLs do not only expand the range of addressable files beyond the local file system. They also go in the opposite direction - e.g. you can browse into tar archives. This is achieved by nesting URLs. For example, a file in a tar archive on a http server could have the URL http://www-com.physik.hu-berlin.de/~bernd/article.tgz#tar:/paper.tex
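
The nesting can be pictured as a chain of sub-URLs separated by '#', with the outermost resource first. The following is a minimal standalone sketch and not TDE code: splitNestedUrl() is a hypothetical helper, and it assumes that every '#' starts a nested sub-URL, which ignores ordinary fragment references such as HTML anchors. The KURL class has its own support for nested URLs; the sketch only illustrates the idea.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of TDE): splits a nested URL into its
// sub-URLs, outermost first. Assumes each '#' starts a nested sub-URL.
std::vector<std::string> splitNestedUrl(const std::string &url)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type hash = url.find('#', start);
        if (hash == std::string::npos) {
            // No further separator: the rest is the innermost sub-URL.
            parts.push_back(url.substr(start));
            break;
        }
        parts.push_back(url.substr(start, hash - start));
        start = hash + 1;
    }
    return parts;
}
```

Applied to the URL above, the helper yields the http URL of the archive followed by tar:/paper.tex, the path inside the archive.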

Using TDEIO

In most cases, jobs are created by calling functions in the TDEIO namespace. These functions take one or two URLs as arguments, and possibly other necessary parameters. When the job is finished, it emits the signal result(TDEIO::Job*). After this signal has been emitted, the job deletes itself. Thus, a typical use case will look like this:

void FooClass::makeDirectory()
{
    // The job starts in the background; it deletes itself after result().
    TDEIO::SimpleJob *job = TDEIO::mkdir(KURL("file:/home/bernd/tdeiodir"));
    connect( job, SIGNAL(result(TDEIO::Job*)),
             this, SLOT(mkdirResult(TDEIO::Job*)) );
}

void FooClass::mkdirResult(TDEIO::Job *job)
{
    if (job->error())
        job->showErrorDialog();
    else
        kdDebug() << "mkdir went fine" << endl;
}

Depending on the type of the job, you may also connect to other signals.

Here is an overview of the available functions:

TDEIO::mkdir(const KURL &url, int permission)
Creates a directory, optionally with certain permissions.
TDEIO::rmdir(const KURL &url)
Removes a directory.
TDEIO::chmod(const KURL &url, int permissions)
Changes the permissions of a file.
TDEIO::rename(const KURL &src, const KURL &dest, bool overwrite)
Renames a file.
TDEIO::symlink(const TQString &target, const KURL &dest, bool overwrite, bool showProgressInfo)
Creates a symbolic link.
TDEIO::stat(const KURL &url, bool showProgressInfo)
Finds out certain information about the file, such as size, modification time and permissions. The information can be obtained from TDEIO::StatJob::statResult() after the job has finished.
TDEIO::get(const KURL &url, bool reload, bool showProgressInfo)
Transfers data from a URL.
TDEIO::put(const KURL &url, int permissions, bool overwrite, bool resume, bool showProgressInfo)
Transfers data to a URL.
TDEIO::http_post(const KURL &url, const TQByteArray &data, bool showProgressInfo)
Posts data. Special for HTTP.
TDEIO::mimetype(const KURL &url, bool showProgressInfo)
Tries to find the MIME type of the URL. The type can be obtained from TDEIO::MimetypeJob::mimetype() after the job has finished.
TDEIO::file_copy(const KURL &src, const KURL &dest, int permissions, bool overwrite, bool resume, bool showProgressInfo)
Copies a single file.
TDEIO::file_move(const KURL &src, const KURL &dest, int permissions, bool overwrite, bool resume, bool showProgressInfo)
Renames or moves a single file.
TDEIO::file_delete(const KURL &url, bool showProgressInfo)
Deletes a single file.
TDEIO::listDir(const KURL &url, bool showProgressInfo)
Lists the contents of a directory. Each time some new entries are known, the signal TDEIO::ListJob::entries() is emitted.
TDEIO::listRecursive(const KURL &url, bool showProgressInfo)
Similar to the listDir() function, but this one is recursive.
TDEIO::copy(const KURL &src, const KURL &dest, bool showProgressInfo)
Copies a file or directory. Directories are copied recursively.
TDEIO::move(const KURL &src, const KURL &dest, bool showProgressInfo)
Moves or renames a file or directory.
TDEIO::del(const KURL &src, bool shred, bool showProgressInfo)
Deletes a file or directory.

Directory entries

Both the TDEIO::stat() and TDEIO::listDir() jobs return their results as the types UDSEntry and UDSEntryList, respectively. The latter is defined as TQValueList<UDSEntry>. The acronym UDS stands for "Universal directory service". The principle behind it is that a directory entry only carries the information which an ioslave can provide, not more. For example, the http slave does not provide any information about access permissions or file owners. For this reason, a UDSEntry is a list of UDSAtoms. Each atom provides a specific piece of information. It consists of a type stored in m_uds and either an integer value in m_long or a string value in m_str, depending on the type.

The following types are currently defined:

  • UDS_SIZE (integer) - Size of the file.
  • UDS_USER (string) - User owning the file.
  • UDS_GROUP (string) - Group owning the file.
  • UDS_NAME (string) - File name.
  • UDS_ACCESS (integer) - Permission rights of the file, as e.g. stored by the libc function stat() in the st_mode field.
  • UDS_FILE_TYPE (integer) - The file type, as e.g. stored by stat() in the st_mode field. Therefore you can use the usual libc macros like S_ISDIR to test this value. Note that the data provided by ioslaves corresponds to stat(), not lstat(), i.e. in case of symbolic links, the file type here is the type of the file pointed to by the link, not the link itself.
  • UDS_LINK_DEST (string) - In case of a symbolic link, the name of the file pointed to.
  • UDS_MODIFICATION_TIME (integer) - The time (as in the type time_t) when the file was last modified, as e.g. stored by stat() in the st_mtime field.
  • UDS_ACCESS_TIME (integer) - The time when the file was last accessed, as e.g. stored by stat() in the st_atime field.
  • UDS_CREATION_TIME (integer) - The time when the file was created, as e.g. stored by stat() in the st_ctime field.
  • UDS_URL (string) - Provides a URL of a file, if it is not simply the concatenation of directory URL and file name.
  • UDS_MIME_TYPE (string) - MIME type of the file.
  • UDS_GUESSED_MIME_TYPE (string) - MIME type of the file as guessed by the slave. The difference to the previous type is that the one provided here should not be taken as reliable (because determining it in a reliable way would be too expensive). For example, the KRun class explicitly checks the MIME type if it does not have reliable information.

Although the way of storing information about files in a UDSEntry is flexible and practical from the ioslave point of view, it is a mess to use for the application programmer. For example, in order to find out the MIME type of the file, you have to iterate over all atoms and test whether m_uds is UDS_MIME_TYPE. Fortunately, there is an API which is a lot easier to use: the class KFileItem.
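
To make the iteration concrete, here is a minimal sketch of looking up atoms in an entry. It does not use the TDE headers: UDSAtom and UDSEntry below are simplified stand-ins that only borrow the member names m_uds, m_long and m_str from the description above, the enum values are placeholders, and the helpers findAtom(), mimeTypeOf() and isDirectory() are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/stat.h>

// Simplified stand-ins for the TDEIO types described above. The member
// names follow the text; the enum values are placeholders, not the real ones.
enum UdsType { UDS_SIZE, UDS_NAME, UDS_FILE_TYPE, UDS_MIME_TYPE };

struct UDSAtom {
    UdsType m_uds;      // which piece of information this atom carries
    long long m_long;   // used by integer-valued types
    std::string m_str;  // used by string-valued types
};

typedef std::vector<UDSAtom> UDSEntry;

// Returns the atom of the requested type, or a null pointer if the
// slave did not provide it.
const UDSAtom *findAtom(const UDSEntry &entry, UdsType type)
{
    for (UDSEntry::const_iterator it = entry.begin(); it != entry.end(); ++it) {
        if (it->m_uds == type)
            return &*it;
    }
    return 0;
}

// MIME type of the entry, or an empty string if it is unknown.
std::string mimeTypeOf(const UDSEntry &entry)
{
    const UDSAtom *atom = findAtom(entry, UDS_MIME_TYPE);
    return atom ? atom->m_str : std::string();
}

// True if the entry carries a file type that denotes a directory.
bool isDirectory(const UDSEntry &entry)
{
    const UDSAtom *atom = findAtom(entry, UDS_FILE_TYPE);
    return atom != 0 && S_ISDIR(static_cast<mode_t>(atom->m_long));
}
```

Every lookup has to cope with a missing atom, because a slave only reports what it knows. This is the bookkeeping that KFileItem takes off the application programmer's hands.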

Synchronous usage

Often, the asynchronous API of TDEIO is too complex to use and therefore implementing full asynchronicity is not a priority. For example, in a program that can only handle one document file at a time, there is little that can be done while the program is downloading a file anyway. For these simple cases, there is a much simpler API in the form of a set of static functions in TDEIO::NetAccess. For example, in order to copy a file, use

KURL source, target;
source = ...;
target = ...;
TDEIO::NetAccess::copy(source, target);

The function will return after the complete copying process has finished. Still, this method provides a progress dialog, and it makes sure that the application processes repaint events.

A particularly interesting combination of functions is download() together with removeTempFile(). The former downloads a file from a given URL and stores it in a temporary file with a unique name. The name is stored in the second argument. If the URL is local, the file is not downloaded, and instead the second argument is set to the local file name. The function removeTempFile() deletes the file given by its argument if the file is the result of a former download. If that is not the case, it does nothing. Thus, a very easy way of loading files regardless of their location is the following code snippet:

KURL url;
url = ...;
TQString tempFile;
if (TDEIO::NetAccess::download(url, tempFile)) {
    // load the file with the name tempFile
    TDEIO::NetAccess::removeTempFile(tempFile);
}

Meta data

As can be seen above, the interface to IO jobs is quite abstract and does not consider any exchange of information between application and IO slave that is protocol specific. This is not always appropriate. For example, you may give certain parameters to the HTTP slave to control its caching behavior or send a bunch of cookies with the request. For this need, the concept of meta data has been introduced. When a job is created, you can configure it by adding meta data to it. Each item of meta data consists of a key/value pair. For example, in order to prevent the HTTP slave from loading a web page from its cache, you can use:

void FooClass::reloadPage()
{
    KURL url("http://www.trinitydesktop.org/about.php");
    TDEIO::TransferJob *job = TDEIO::get(url, true, false);
    job->addMetaData("cache", "reload");
    ...
}

The same technique is used in the other direction, i.e. for communication from the slave to the application. The method Job::queryMetaData() asks for the value of a certain key delivered by the slave. For the HTTP slave, one such example is the key "modified", which contains a string representation of the date when the web page was last modified. An example of how you can use this is the following:

void FooClass::printModifiedDate()
{
    KURL url("http://www.trinitydesktop.org");
    TDEIO::TransferJob *job = TDEIO::get(url, true, false);
    connect( job, SIGNAL(result(TDEIO::Job*)),
             this, SLOT(transferResult(TDEIO::Job*)) );
}

void FooClass::transferResult(TDEIO::Job *job)
{
    if (job->error())
        job->showErrorDialog();
    else {
        // The job was created by TDEIO::get(), so it is a TransferJob.
        TDEIO::TransferJob *transferJob = static_cast<TDEIO::TransferJob*>(job);
        TQString modified = transferJob->queryMetaData("modified");
        kdDebug() << "Last modified: " << modified << endl;
    }
}

Scheduling

When using the TDEIO API, you usually do not have to cope with the details of starting IO slaves and communicating with them. The normal use case is to start a job with some parameters and handle the signals the job emits.

Behind the curtains, the scenario is a lot more complicated. When you create a job, it is put in a queue. When the application goes back to the event loop, TDEIO allocates slave processes for the jobs in the queue. For the first jobs started, this is trivial: an IO slave for the appropriate protocol is started. However, after the job (like a download from an HTTP server) has finished, the slave is not immediately killed. Instead, it is put in a pool of idle slaves and killed after a certain time of inactivity (currently 3 minutes). If a new request for the same protocol and host arrives, the slave is reused. The obvious advantage is that for a series of jobs for the same host, the cost of creating new processes and possibly going through an authentication handshake is saved.

Of course, reusing is only possible when the existing slave has already finished its previous job. When a new request arrives while an existing slave process is still running, a new process must be started and used. In the API usage in the examples above, there is no limit on creating new slave processes: if you start a consecutive series of downloads for 20 different files, then TDEIO will start 20 slave processes. This scheme of assigning slaves to jobs is called direct. It is not always the most appropriate scheme, as it may need much memory and put a high load on both the client and server machines.
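
The reuse rule of the direct scheme can be illustrated with a toy model. This is a sketch of the behaviour described above, not of the TDEIO implementation: the class IdleSlavePool, its methods and the integer slave ids are all hypothetical, and time is passed in explicitly as seconds so that the timeout can be followed by hand.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model of the idle-slave pool; none of these names exist in TDEIO.
class IdleSlavePool {
public:
    explicit IdleSlavePool(long idleTimeoutSeconds)
        : m_timeout(idleTimeoutSeconds), m_nextId(1) {}

    // Returns the id of a slave for (protocol, host). An idle slave is
    // reused if one exists and has not timed out; otherwise a new id
    // stands for a newly started process.
    int acquire(const std::string &protocol, const std::string &host, long now)
    {
        expire(now);
        Pool::iterator it = m_idle.find(std::make_pair(protocol, host));
        if (it != m_idle.end()) {
            int id = it->second.id;
            m_idle.erase(it);
            return id;
        }
        return m_nextId++;
    }

    // Called when a job has finished: the slave becomes idle, it is not killed.
    void release(int id, const std::string &protocol,
                 const std::string &host, long now)
    {
        IdleSlave s = { id, now };
        m_idle.insert(std::make_pair(std::make_pair(protocol, host), s));
    }

private:
    struct IdleSlave { int id; long idleSince; };
    typedef std::multimap<std::pair<std::string, std::string>, IdleSlave> Pool;

    // Drops slaves that have been idle for longer than the timeout.
    void expire(long now)
    {
        for (Pool::iterator it = m_idle.begin(); it != m_idle.end(); ) {
            if (now - it->second.idleSince > m_timeout)
                m_idle.erase(it++);
            else
                ++it;
        }
    }

    Pool m_idle;
    long m_timeout;
    int m_nextId;
};
```

With a timeout of 180 seconds, a second request to the same host arriving while the first slave is still busy gets a new slave, a request arriving shortly after release() gets the old one back, and a request arriving more than three minutes later starts from scratch.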

So there is a different way. You can schedule jobs. If you do this, only a limited number (currently 3) of slave processes for a protocol will be created. If you create more jobs than that, they are put in a queue and are processed when a slave process becomes idle. This is done as follows:

KURL url("http://www.trinitydesktop.org");
TDEIO::TransferJob *job = TDEIO::get(url, true, false);
TDEIO::Scheduler::scheduleJob(job);
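
What scheduling changes can be sketched with a small queue model. The class ToyScheduler below is hypothetical and covers a single protocol only; it merely mirrors the rule stated above (a fixed upper limit on running slaves, with further jobs waiting in order) and says nothing about how TDEIO::Scheduler is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Toy model of scheduled mode for a single protocol; not TDEIO code.
class ToyScheduler {
public:
    explicit ToyScheduler(std::size_t maxSlaves)
        : m_maxSlaves(maxSlaves), m_running(0) {}

    // Starts the job at once if a slave slot is free, otherwise queues it.
    // Returns true if the job started immediately.
    bool schedule(const std::string &url)
    {
        if (m_running < m_maxSlaves) {
            ++m_running;
            return true;
        }
        m_waiting.push_back(url);
        return false;
    }

    // A running job has finished. If jobs are waiting, the freed slave
    // takes the oldest one, whose URL is returned; otherwise the slave
    // becomes idle and an empty string is returned.
    std::string jobFinished()
    {
        assert(m_running > 0);
        if (m_waiting.empty()) {
            --m_running;
            return std::string();
        }
        std::string next = m_waiting.front();
        m_waiting.pop_front();
        return next;
    }

    std::size_t running() const { return m_running; }
    std::size_t waiting() const { return m_waiting.size(); }

private:
    std::size_t m_maxSlaves;
    std::size_t m_running;
    std::deque<std::string> m_waiting;
};
```

With a limit of 3, scheduling five downloads starts three of them and leaves two waiting; each call to jobFinished() then hands the next waiting URL to the slave that has just become free.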

A third possibility is connection oriented. For example, for the IMAP slave, it does not make any sense to start multiple processes for the same server. Only one IMAP connection at a time should be enforced. In this case, the application must explicitly deal with the notion of a slave. It has to allocate a slave for a certain connection and then assign all jobs which should go through the same connection to the same slave. This can again be easily achieved by using the TDEIO::Scheduler:

KURL baseUrl("imap://bernd@albert.physik.hu-berlin.de");
TDEIO::Slave *slave = TDEIO::Scheduler::getConnectedSlave(baseUrl);

TDEIO::TransferJob *job1 = TDEIO::get(KURL(baseUrl, "/INBOX;UID=79374"));
TDEIO::Scheduler::assignJobToSlave(slave, job1);

TDEIO::TransferJob *job2 = TDEIO::get(KURL(baseUrl, "/INBOX;UID=86793"));
TDEIO::Scheduler::assignJobToSlave(slave, job2);

...

TDEIO::Scheduler::disconnectSlave(slave);

You may only disconnect the slave after all jobs assigned to it are guaranteed to be finished.

Defining an ioslave

In the following we discuss how you can add a new ioslave to the system. In analogy to services, new ioslaves are advertised to the system by installing a little configuration file. The following CMakeLists.txt snippet installs the ftp protocol:


install(
  FILES ftp.protocol
  DESTINATION ${SERVICES_INSTALL_DIR}
)

The contents of the file ftp.protocol are as follows:

[Protocol]
exec=tdeio_ftp
protocol=ftp
input=none
output=filesystem
listing=Name,Type,Size,Date,Access,Owner,Group,Link,
reading=true
writing=true
makedir=true
deleting=true
Icon=ftp

The protocol entry defines for which protocol this slave is responsible. "exec" is (in contrast to what you might naively expect) the name of the library that implements the slave. When the slave is supposed to start, the "tdeinit" executable is started which in turn loads this library into its address space. So in practice, you can think of the running slave as a separate process although it is implemented as a library. The advantage of this mechanism is that it saves a lot of memory and reduces the time needed by the runtime linker.

The input and output lines are not used currently.

The remaining lines in the .protocol file define which abilities the slave has. In general, the features a slave must implement are much simpler than the features the TDEIO API provides for the application. The reason for this is that complex jobs are broken down into a number of subjobs. For example, in order to list a directory recursively, one job will be started for the toplevel directory. Then for each subdirectory reported back, new subjobs are started. A scheduler in TDEIO makes sure that not too many jobs are active at the same time. Similarly, in order to copy a file within a protocol that does not support copying directly (like the ftp: protocol), TDEIO can read the source file and then write the data to the destination file. For this to work, the .protocol file must advertise the actions its slave supports.
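
As an illustration of how the advertised abilities can drive such a decision, the following standalone sketch reads the key=value lines of the [Protocol] group and checks whether a copy could be emulated by reading and writing. It is an assumption-laden toy: parseProtocolFile(), supports() and canEmulateCopy() are hypothetical names, the parser does not trim whitespace or handle comments, and TDE uses its own configuration classes for this job.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> ProtocolEntries;

// Reads the key=value lines of the [Protocol] group. A hypothetical helper
// for illustration; TDE has its own configuration classes for this.
ProtocolEntries parseProtocolFile(std::istream &in)
{
    ProtocolEntries entries;
    std::string line;
    bool inProtocolGroup = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '[') {
            // A group header: only the [Protocol] group is of interest.
            inProtocolGroup = (line == "[Protocol]");
            continue;
        }
        std::string::size_type eq = line.find('=');
        if (inProtocolGroup && eq != std::string::npos)
            entries[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return entries;
}

// True if the entry for the given ability is present and set to "true".
bool supports(const ProtocolEntries &entries, const std::string &ability)
{
    ProtocolEntries::const_iterator it = entries.find(ability);
    return it != entries.end() && it->second == "true";
}

// A copy can be emulated by reading the source and writing the destination.
bool canEmulateCopy(const ProtocolEntries &src, const ProtocolEntries &dest)
{
    return supports(src, "reading") && supports(dest, "writing");
}
```

For the ftp.protocol file shown earlier, both reading and writing are set to true, so a copy between two FTP locations passes this check.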

Since slaves are loaded as shared libraries, but constitute standalone programs, their code framework looks a bit different from normal shared library plugins. The function which is called to start the slave is called kdemain(). This function does some initializations and then goes into an event loop and waits for requests by the application using it. This looks as follows:

#include <stdio.h>   // fprintf()
#include <stdlib.h>  // exit()

extern "C" { int kdemain(int argc, char **argv); }

int kdemain(int argc, char **argv)
{
    TDELocale::setMainCatalogue("tdelibs");
    TDEInstance instance("tdeio_ftp");
    (void) TDEGlobal::locale();

    if (argc != 4) {
        fprintf(stderr, "Usage: tdeio_ftp protocol "
                        "domain-socket1 domain-socket2\n");
        exit(-1);
    }

    FtpSlave slave(argv[2], argv[3]);
    slave.dispatchLoop();
    return 0;
}

Implementing an ioslave

Slaves are implemented as subclasses of TDEIO::SlaveBase (FtpSlave in the above example). Thus, the actions listed in the .protocol file correspond to certain virtual functions in TDEIO::SlaveBase that the slave implementation must reimplement. Here is a list of possible actions and the corresponding virtual functions:

void get(const KURL &url)
Reading - Reads data from a URL
void put(const KURL &url, int permissions, bool overwrite, bool resume)
Writing - Writes data to a URL and creates the file if it does not exist yet.
void rename(const KURL &src, const KURL &dest, bool overwrite)
Moving - Renames a file.
void del(const KURL &url, bool isFile)
Deleting - Deletes a file or directory.
void listDir(const KURL &url)
Listing - Lists the contents of a directory.
void mkdir(const KURL &url, int permissions)
Makedir - Creates a directory.

Additionally, there are reimplementable functions not listed in the .protocol file. For these operations, TDEIO automatically determines whether they are supported or not (i.e. the default implementation returns an error).

void stat(const KURL &url)
Delivers information about a file, similar to the C function stat()
void chmod(const KURL &url, int permissions)
Changes the access permissions of a file.
void mimetype(const KURL &url)
Determines the MIME type of a file.
void copy(const KURL &src, const KURL &dest, int permissions, bool overwrite)
Copies a file.
void symlink(const TQString &target, const KURL &dest, bool overwrite)
Creates a symbolic link.

All these implementations should end with one of two calls: If the operation was successful, they should call finished(). If an error has occurred, error() should be called with an error code as first argument and a string in the second. Possible error codes are listed as enum TDEIO::Error. The second argument is usually the URL in question. It is used e.g. in TDEIO::Job::showErrorDialog() in order to parametrize the human-readable error message.
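
The rule that every operation ends in exactly one of the two calls can be sketched without the TDE headers. In the following toy, ToySlaveBase is a stand-in that only records the calls instead of sending them to the application, TOY_ERR_DIR_ALREADY_EXIST is a placeholder for a code from enum TDEIO::Error, and MemorySlave is a hypothetical slave that keeps its directories in memory.

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-in for the parts of TDEIO::SlaveBase used here. The real class
// sends these calls to the application; this one only records them.
class ToySlaveBase {
public:
    ToySlaveBase() : m_finished(false), m_errorCode(0) {}
    virtual ~ToySlaveBase() {}

    void finished() { m_finished = true; }
    void error(int code, const std::string &text)
    {
        m_errorCode = code;
        m_errorText = text;
    }

    bool m_finished;
    int m_errorCode;
    std::string m_errorText;
};

// Placeholder error code; the real ones are listed in enum TDEIO::Error.
enum { TOY_ERR_DIR_ALREADY_EXIST = 1 };

// A slave that keeps its "directories" in memory.
class MemorySlave : public ToySlaveBase {
public:
    // Every path through the function ends in exactly one of
    // finished() or error().
    void mkdir(const std::string &url)
    {
        if (m_dirs.count(url)) {
            // The URL is passed along so the message can name it.
            error(TOY_ERR_DIR_ALREADY_EXIST, url);
            return;
        }
        m_dirs.insert(url);
        finished();
    }

private:
    std::set<std::string> m_dirs;
};
```

Returning right after error() keeps the two outcomes mutually exclusive, which is the property the application side relies on.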

For slaves that correspond to network protocols, it might be interesting to reimplement the method SlaveBase::setHost(). This is called to tell the slave process about the host and port, and the user name and password to log in. In general, meta data set by the application can be queried by SlaveBase::metaData(). You can check for the existence of meta data of a certain key with SlaveBase::hasMetaData().

Communicating back to the application

Various actions implemented in a slave need some way to communicate data back to the application using the slave process:

  • get() sends blocks of data. This is done with data(), which takes a TQByteArray as argument. Of course, you do not need to send all data at once. If you send a large file, call data() with smaller data blocks, so the application can process them. Call finished() when the transfer is finished.
  • listDir() reports information about the entries of a directory. For this purpose, call listEntries() with a TDEIO::UDSEntryList as argument. Analogously to data(), you can call this several times. When you are finished, call listEntry() with the second argument set to true. You may also call totalSize() to report the total number of directory entries, if known.
  • stat() reports information about a file like size, MIME type, etc. Such information is packaged in a TDEIO::UDSEntry, which is discussed in the section on directory entries above. Use statEntry() to send such an item to the application.
  • mimetype() calls mimeType() with a string argument.
  • get() and copy() may want to provide progress information. This is done with the methods totalSize(), processedSize(), speed(). The total size and processed size are reported as bytes, the speed as bytes per second.
  • You can send arbitrary key/value pairs of meta data with setMetaData().
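
The advice to send a large file in smaller blocks can be sketched as follows. This is a standalone illustration, not slave code: ToyReceiver is a hypothetical stand-in whose data() and finished() only record what they are given, std::string takes the place of TQByteArray, and the block size is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in receiver: records what a slave would pass to data() and finished().
struct ToyReceiver {
    std::vector<std::string> blocks;
    bool finishedCalled;
    ToyReceiver() : finishedCalled(false) {}
    void data(const std::string &block) { blocks.push_back(block); }
    void finished() { finishedCalled = true; }
};

// Sends the content in blocks of at most blockSize bytes, then signals
// the end of the transfer.
void sendInBlocks(const std::string &content, std::size_t blockSize,
                  ToyReceiver &receiver)
{
    assert(blockSize > 0);
    for (std::size_t pos = 0; pos < content.size(); pos += blockSize)
        receiver.data(content.substr(pos, blockSize));
    receiver.finished();
}
```

Sending ten bytes with a block size of four produces three calls to data(), the last one carrying the remaining two bytes, followed by a single finished().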

Interacting with the user

Sometimes a slave has to interact with the user. Examples include informational messages, authentication dialogs and confirmation dialogs when a file is about to be overwritten.

  • infoMessage() - This is for informational feedback, such as the message "Retrieving data from <host>" from the http slave, which is often displayed in the status bar of the program. On the application side, this method corresponds to the signal TDEIO::Job::infoMessage().
  • warning() - Displays a warning in a message box with KMessageBox::information(). If a message box is still open from a former call of warning() from the same slave process, nothing happens.
  • messageBox() - This is richer than the previous method. It allows opening a message box with text, a caption and some buttons. See the enum SlaveBase::MessageBoxType for reference.
  • openPassDlg() - Opens a dialog for the input of user name and password.


Initial Author: Bernd Gehrmann