Blog about tips & tricks for CMS enhancement

eric.petersson

Get related hits on attached documents for pages in Episerver Search & Navigation


Working with Episerver Search & Navigation (I still tend to call it by its former name, Episerver Find 😆), you may want your global search to return a page when the text content of one of its attached documents matches the query.

When I looked into this, the official Episerver documentation was rather vague. It mentions a NuGet package called EPiServer.Find.Cms.AttachmentFilter, which you are supposed to use to achieve this.

This blog post walks through how I put it together and the issues I ran into along the way:

Initialize the dependencies

Start off by registering (if needed) the IAttachmentHelper from EPiServer.Find.Cms.AttachmentFilter in your dependency injection initializer:

using EPiServer.Cms.Shell.UI.Rest;
using EPiServer.Find.Cms;
using EPiServer.Find.Cms.AttachmentFilter;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.ServiceLocation;
using EPiServer.Shell.UI.Rest;
using EPiServer.Web.Mvc;
using EPiServer.Web.Routing;
using System.Web.Http;
using System.Web.Mvc;

[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
[ModuleDependency(typeof(ServiceContainerInitialization))]
public class DependencyResolverInitialization : IConfigurableModule
{
    public void ConfigureContainer(ServiceConfigurationContext context)
    {

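        context.ConfigurationComplete += (o, e) =>
        {
            // Register the attachment helper so it can be resolved through the ServiceLocator later
            context.Services.AddScoped<IAttachmentHelper, DefaultAttachmentHelper>();
        };

        // structureMapContainer and StructureMapDependencyResolver come from your existing
        // setup (e.g. the Alloy template); keep whatever resolver registration you already have.
        var resolver = new StructureMapDependencyResolver(structureMapContainer);
        DependencyResolver.SetResolver(resolver);
        GlobalConfiguration.Configuration.DependencyResolver = resolver;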
    }

    public void Initialize(InitializationEngine context)
    {
        // Not in use
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Not in use
    }

    public void Preload(string[] parameters)
    {
        // Not in use
    }

}

Then, in your FindInitialization module, extend the conventions for the content type you want to index so that it includes a SearchAttachmentText field, as described in the Episerver documentation:

using EPiServer.Find.ClientConventions;
using EPiServer.Find.Cms;
using EPiServer.Find.Cms.Conventions;
using EPiServer.Find.Cms.Module;
using EPiServer.Find.Framework;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;

[InitializableModule]
[ModuleDependency(typeof(IndexingModule))]
public class FindInitialization : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        var searchConventions = SearchClient.Instance.Conventions;
        
        searchConventions.ForInstancesOf<NewsArticlePage>()
            .IncludeField(x => x.SearchAttachmentText());
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Not in use
    }
}

In my example, I extend the NewsArticlePage content type so that Search & Navigation indexes an extra field containing the text extracted from the attached media/documents. Here, editors attach documents through a Documents property of type IEnumerable<ContentReference>, so the example below is based on that. In your case, the document might instead be attached through a block in a content area, or in some other way.
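Whatever the property looks like, the core step is the same: turn a list of references into the documents behind them, skipping anything that is missing or is not a document, and never handing back null to a foreach. The sketch below shows only that idea with plain .NET types. MediaItem and the dictionary are hypothetical stand-ins for Episerver media content and IContentLoader, not Episerver types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a media item; IsDocument plays the role of "is DocumentMedia".
public record MediaItem(string Name, bool IsDocument);

public static class ReferenceResolverSketch
{
    // Resolves each reference against the store (a stand-in for IContentLoader),
    // skipping missing items and non-document media.
    // Returns an empty sequence, never null, when there is nothing to resolve.
    public static IEnumerable<MediaItem> ResolveDocuments(
        IEnumerable<int> references,
        IReadOnlyDictionary<int, MediaItem> store)
    {
        if (references == null)
            return Enumerable.Empty<MediaItem>();

        var result = new List<MediaItem>();
        foreach (var reference in references)
        {
            // Mirrors a TryGet-style lookup: a missing or mismatched item is skipped instead of throwing
            if (store.TryGetValue(reference, out var item) && item.IsDocument)
                result.Add(item);
        }

        return result;
    }
}
```

Returning an empty sequence keeps every caller free of null checks, which matters when the result is fed straight into a foreach during indexing.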

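using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using EPiServer;
using EPiServer.Core;
using EPiServer.Find.Cms.AttachmentFilter;
using EPiServer.Logging;
using EPiServer.ServiceLocation;
// ...plus the namespaces of your NewsArticlePage and DocumentMedia types

// Extension methods must live in a non-generic static class
public static class NewsArticlePageExtensions
{
    private static readonly ILogger Logger = LogManager.GetLogger(typeof(NewsArticlePageExtensions));

    public static string SearchAttachmentText(this NewsArticlePage page)
    {
        var sb = new StringBuilder();

        foreach (var document in page.GetDocumentMedia())
        {
            // One line per document so words from two files are not glued together
            sb.AppendLine(document.ExtractDocumentContent());
        }

        return sb.ToString();
    }

    // Named GetDocumentMedia rather than Documents: an extension method with the same
    // name as the Documents property cannot be called as page.Documents()
    public static IEnumerable<DocumentMedia> GetDocumentMedia(this NewsArticlePage page)
    {
        // Never return null here, since SearchAttachmentText iterates the result directly
        if (page.Documents == null)
            return Enumerable.Empty<DocumentMedia>();

        var contentLoader = ServiceLocator.Current.GetInstance<IContentLoader>();
        var documents = new List<DocumentMedia>();

        foreach (var reference in page.Documents)
        {
            // TryGet skips references that cannot be loaded as DocumentMedia instead of throwing
            if (contentLoader.TryGet(reference, out DocumentMedia document))
                documents.Add(document);
        }

        return documents;
    }

    public static string ExtractDocumentContent(this DocumentMedia document)
    {
        try
        {
            var attachmentHelper = ServiceLocator.Current.GetInstance<IAttachmentHelper>();

            using (var stream = new MemoryStream())
            {
                // The helper writes the extracted text into the writer;
                // leaveOpen keeps the stream readable after the writer is disposed (and flushed)
                using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
                {
                    attachmentHelper.ExtractFileText(document, writer);
                }

                // Rewind before reading back what the helper wrote
                stream.Position = 0;

                using (var reader = new StreamReader(stream))
                {
                    return reader.ReadToEnd();
                }
            }
        }
        catch (Exception ex)
        {
            Logger.Error($"Extracting attachment text failed for {document.Name}", ex);
        }

        return string.Empty;
    }
}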

Here, we iterate through each document attached to the page and load it as DocumentMedia (the only media type we care about in this case). For each document, the IAttachmentHelper's ExtractFileText writes the extracted text into a StreamWriter on top of a MemoryStream; we then rewind the stream, read the text back with a StreamReader and append it to the StringBuilder. The combined string is what gets indexed as the SearchAttachmentText field.
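The write-then-read-back pattern can be tried outside Episerver. In this sketch, the Action<TextWriter> delegate is a hypothetical stand-in for a call such as attachmentHelper.ExtractFileText(document, writer); the rest is plain .NET: capture what a writer produced through a MemoryStream, then join the text of several documents with a space so they do not run together.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class AttachmentTextSketch
{
    // Runs an extractor that writes into a TextWriter and returns what it wrote.
    // The extractor delegate is a hypothetical stand-in for the attachment helper call.
    public static string CaptureText(Action<TextWriter> extract)
    {
        using (var stream = new MemoryStream())
        {
            // leaveOpen: true keeps the stream usable after the writer is disposed
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
            {
                extract(writer);
            } // disposing the writer flushes any buffered text into the stream

            stream.Position = 0; // rewind before reading back

            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Joins the text of several documents with a space so the last word of one
    // document does not merge with the first word of the next
    public static string JoinDocumentTexts(IEnumerable<Action<TextWriter>> extractors)
    {
        var sb = new StringBuilder();
        foreach (var extract in extractors)
        {
            if (sb.Length > 0)
                sb.Append(' ');
            sb.Append(CaptureText(extract));
        }
        return sb.ToString();
    }
}
```

Disposing the writer (rather than only calling Flush) guarantees all buffered text reaches the stream, and leaveOpen avoids the writer closing the stream before it is read.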

You should now get hits on your search page when a query matches text inside the attached documents!